Coyote Quick Start

This is a brief introductory section outlining the steps to run Coyote and the essential prerequisites for its setup.

Prerequisites

The full Vivado/Vitis suite is needed to build hardware designs. The Vivado hardware server alone is sufficient for deployment-only scenarios. Various Ubuntu versions should be supported (we advise using 20.04).

Warning

Coyote runs with Vivado 2022.1. Previous versions can be used at one’s own peril.

The following AMD data center cards are supported: Alveo u50, Alveo u55c, Alveo u200, Alveo u250 and Alveo u280. You can also deploy Coyote on a development board, for example the vcu118.

Coyote has continuously been developed on the HACC cluster at ETH Zurich. For more information and possible external access, check out the following link: ETH Zurich HACC.

CMake (version 3.0 or newer) is used for project creation and builds. Additionally, the Jinja2 template engine for Python is used for some of the code generation.
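If Jinja2 is not already available on the build machine, it can typically be installed via pip (a minimal sketch; a system package may be preferred in your environment):

> pip3 install jinja2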

The API is written in C++; C++17 should suffice (for now).

If networking services are used, a valid UltraScale+ Integrated 100G Ethernet Subsystem license must be set up in Vivado/Vitis to generate the design.

To run virtual machines on top of individual vFPGAs, the following packages are needed: qemu-kvm, build-essential and kmod.
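On Ubuntu, these can typically be installed via apt (a minimal sketch, assuming the standard package names):

> sudo apt-get install qemu-kvm build-essential kmod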

Initialization

You can clone the full Coyote repository if you intend to modify the framework itself. When cloning the repository, be sure to check out all the submodules:

> git clone --recurse-submodules https://github.com/fpgasystems/Coyote

Otherwise, we suggest using Coyote as a submodule within your own projects:

> git submodule add https://github.com/fpgasystems/Coyote
> git submodule update --init --recursive

An example of how this can be done can be seen in the following repo: Coyote-PROJECT.

Building and Loading the Hardware

The CMake build is coupled with the project flow within Vivado. This flow makes it possible to synthesize individual application projects (to be loaded in different virtual FPGAs) independently and subsequently merge them into a single overarching project after the synthesis step.

This is the basis for the nested system layers exposed by Coyote. The following sub-layers (each represented by a different Vivado project) exist:

  1. Static Layer : This is the static portion of the system. It is always the same for every project (for the same chip). It provides the bare essential functionality to support the rest of the system.

  2. Dynamic (Shell, Service) Layer : This layer houses all the services offered by Coyote which are shared among all applications. It is the first dynamic layer that can be swapped during runtime and represents the current shell configuration.

  3. Application Layer : This layer houses all user projects, each representing a separate user application. The number of overall projects depends on two factors:

    1. the number of virtual FPGAs (vFPGAs) within the dynamic layer (N_REGIONS), and

    2. the number of different configurations of these vFPGAs within the overarching shell (N_CONFIG).

Hardware Configuration

The hardware configuration is provided via CMake. The following is an example of a project configuration:

cmake_minimum_required(VERSION 3.0)
project(example_prj)

set(CYT_DIR ${CMAKE_SOURCE_DIR}/) # Path to Coyote dir
set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} ${CYT_DIR}/cmake)

find_package(CoyoteHW REQUIRED)

# Shell configuration
set(FDEV_NAME "u55c") # Link to u55c static
set(N_REGIONS 2) # Number of vFPGAs in this specific shell
set(EN_PR 1) # Enable 2nd level dynamic reconfiguration
set(N_CONFIG 2) # Number of app dynamic configurations
set(EN_STRM 1) # Interface to host memory
set(EN_MEM 1) # Interface to FPGA-side memory (HBM/DRAM)
set(HBM_SPLIT 1) # Specific HBM configuration

validation_checks_hw() # Validate configuration

load_apps ( # Load arbitrary user applications for all configs
    VFPGA_C0_0 "<some_path_to_the_cores>/addmul"
    VFPGA_C0_1 "<some_path_to_the_cores>/cntmin"
    VFPGA_C1_0 "<some_path_to_the_cores>/shifter"
    VFPGA_C1_1 "<some_path_to_the_cores>/hloglog"
)

create_hw() # Generate all targets

This project will link to the existing static design floorplanned for the Alveo u55c chip.

After indicating the path to the Coyote repository, the shell configuration is chosen. In the example, we enable two distinct vFPGAs. Each of these functions as an independent hardware process, accommodating one user application (user process) at a time. Applications in these vFPGAs can also be swapped during runtime without disrupting the surrounding shell operations (EN_PR flag). Multiple dynamic configurations can be compiled within one project (N_CONFIG). Additional flags can then be provided; together, these define the exact configuration of the shell.

Note

For the complete list of the shell configuration options please check Shell Configurations.

Be sure to include the validation_checks_hw() and create_hw() functions, necessary for properly setting up the environment.

Project Structure

The load_apps() function facilitates the automatic loading of user hardware applications into the corresponding vFPGAs. This process also performs any essential high-level synthesis compilations, if needed. When utilized, users must explicitly provide paths for all configurations (N_CONFIG x N_REGIONS).

The hardware applications (in the provided path) should be structured as follows:

├── <coyote submodule>
├── CMakeLists.txt (an example can be the one shown previously)
└── <path_to_your_hw_project>
    ├── vfpga_top.svh (this is the integration wrapper, connect your circuits to the interfaces)
    ├── init_ip.tcl (all extra IP cores can be instantiated here: buffers, ILAs, VIOs, ...)
    ├── hls (put all your HLS cores under this directory)
    │   ├── kernel_1
    │   │   └── kernel_1 files (should contain the kernel_1.cpp top level)
    │   ├── kernel_2
    │   └── kernel_3
    └── hdl
        └── all RTL cores and files that might be used (.v, .sv, .svh, .vhd, ...)

Note

Be sure to create the vfpga_top.svh. This is the main integration header file. It is used to connect your circuits to the interfaces exposed by each vFPGA.

It is not necessary to use the load_apps() function. You can also integrate your circuits manually into the provided wrappers (available after the project creation step).

Builds

The projects can be built after the configuration and directories have been set up. First, create the build directory:

> mkdir build_hw && cd build_hw

The CMake configuration can then be invoked:

> cmake <path_to_cmake_config> <any_additional_configs_if_needed>
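For example, assuming the CMakeLists.txt shown earlier sits one directory up, and passing an option on the command line (illustrative only; options that the CMakeLists.txt hard-codes via set() cannot be overridden this way):

> cmake ../ -DFDEV_NAME=u55c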

If all validation checks pass without errors, all the necessary build files will be generated after this step. Project creation can then be invoked with the following command:

> make project

This will create all projects:

<project_name>_static # Only if static region is being generated (BUILD_STATIC = 1)
<project_name>_shell  # This is the dynamic shell
<project_name>_config_0/<project_name>_user_c0_0 # (vFPGA_C0_0)
<project_name>_config_0/<project_name>_user_c0_1 # (vFPGA_C0_1)
...
<project_name>_config_1/<project_name>_user_c1_0 # (vFPGA_C1_0)
...

If load_apps() was used, there is nothing else that users need to do after this command. Otherwise each of the user projects (vFPGA_CX_Y) will contain wrappers under the project/hdl directory where users can instantiate their circuits as they please.

Compilation

Compilation can then be executed. To generate all bitstreams straight away, users can run:

> make bitgen

The command consists of the following incremental steps:

> make synth # Synth all layers
> make link # Link all layers
> make shell # Compile the shell (static + dynamic layers)
> make app # Compile the application layer (only if EN_PR is enabled)
> make bitgen # Generate all bitstreams

If EN_PR is enabled, floorplanning of the applications (vFPGAs) needs to be done explicitly by users after the make shell step. This can be done by opening the generated shell_subdivided.dcp checkpoint. Check out the following link for the detailed floorplanning guide. Alternatively, users can provide pre-existing vFPGA floorplanning via the FPLAN_PATH variable during configuration.
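For instance, a pre-existing floorplan could be supplied during configuration (a sketch; the path is a placeholder and the assumption is that the file holds the vFPGA Pblock constraints):

set(FPLAN_PATH "<path_to_existing_floorplan_file>") # placeholder; assumed to contain the vFPGA Pblock constraints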

Once the (typically quite lengthy) compilation is done, the bitstreams will be generated for each application and configuration. The shell bitstream (the dynamic layer bitstream) with the initial (config 0) configuration will also be generated. This one can be used to load the shell dynamically and swap out other shells during runtime. All of these will be present in the bitstreams directory.

[Figure: build_struct.png — overall bitstream structure]

The overall bitstream structure should roughly resemble the one in the figure above.

Linking Additional Projects to the Existing Shell

One advantage of the nested framework organization is that additional user applications can easily be added to already compiled shells.

For instance, an application in a networking shell with RDMA enabled can easily be added without having to resynthesize the complete shell.

To do this, users can create an additional project and link it against an existing shell project. An example of CMake configuration in this case is shown below:

# Shell configuration
set(BUILD_SHELL 0) # We are reusing a shell, not building a new one ...
set(BUILD_APP 1) # App flow, instead of the default shell flow
set(N_CONFIG 2) # Number of additional configurations to be added
set(SHELL_PATH "path_to_the_existing_shell_dir")

validation_checks_hw() # Validate configuration

load_apps ( # Load additional apps
    VFPGA_C0_0 "<some_path_to_the_cores>/aes_cbc"
    VFPGA_C1_0 "<some_path_to_the_cores>/aes_ecb"
)

create_hw() # Generate all targets

The rest of the compilation flow doesn’t differ from the one already covered.

> make project
> make bitgen

After the compilation, additional bitstreams for the newly added applications will be created which can be dynamically swapped within the existing shell.

Loading the bitstreams

The initial bitstreams (static layer) can be loaded via JTAG through Vivado’s hardware programmer. The script under /util/program_alveo.tcl can be used for this loading procedure. All other bitstreams (shell and application bitstreams) are loaded through the Coyote framework and do not require external tools.
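The programming script can be driven from Vivado batch mode, for example (a sketch; the arguments expected by the script are placeholders here and should be checked against the script itself):

> vivado -mode batch -source util/program_alveo.tcl -tclargs <bitstream_path> <target_device>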

Note

Bitstreams with .bin extensions should be used when loading the designs dynamically through Coyote. If bitstreams are being programmed through Vivado programmer, use the ones with .bit extension.

Hot Plug

Whenever the static image of Coyote is loaded for the first time, the interconnect has to be rescanned.

For this purpose, the script under /util/flow_alveo.tcl can be used. The ETHZ-HACC cluster contains all the necessary infrastructure to automate this for the end users.
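On a typical Linux machine, such a rescan roughly corresponds to removing the FPGA device and rescanning the PCIe bus (a sketch; <BDF> stands for the bus/device/function of the card as reported by lspci):

> sudo sh -c "echo 1 > /sys/bus/pci/devices/<BDF>/remove"
> sudo sh -c "echo 1 > /sys/bus/pci/rescan"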

Note

The rescanning needs to be done only for the initial loading of the static shell. Subsequent shells can be loaded dynamically during runtime.

Building and Loading the Driver

The driver can be built by running make within the driver directory:

> cd driver && make

Note

Be sure to compile the driver on the target deployment machine.

Driver Insertion

After compilation, you can insert the driver (sudo privileges are required):

> insmod coyote_drv.ko <args>

The following arguments can be supplied during driver insertion:

Argument          Function
config_fname      Configuration file with the device mapping; used when multiple FPGAs are available.
cyt_arch          Target underlying platform; used to target different interconnects such as ECI.
en_hmm            Enable heterogeneous memory management.
en_hypervisor     Run Coyote in hypervisor mode; used when running VMs on top of vFPGAs.
ip/mac_addr       IP and MAC addresses for the network stacks.
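A possible insertion with a couple of these arguments set might look as follows (a sketch; the value formats are placeholders and should be checked against the driver sources):

> sudo insmod coyote_drv.ko config_fname=<path_to_device_config> en_hmm=1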

Exposed vFPGA Devices

Once the driver is inserted, a range of device files will be exposed under /dev. These can be used from user space to access the target devices.

> ls -la /dev/fpga*
  fpga_0_v0
  fpga_0_v1
  ...
  fpga_0_pr
  fpga_1_v0
  ...

If the system contains multiple FPGAs, the first number indicates the ID of that FPGA. These IDs can be manually assigned by providing an external config_fname file during driver insertion.

The second number, after the v, indicates the target vFPGA. Additionally, the pr device is available, which is used by the user-space scheduler to control dynamic reconfiguration.

Users can also interact with the system through the sysfs file system. Coyote exposes a number of internal registers which can be read from and written to in order to control and debug the live system. These can typically be found under /sys/kernel/coyote_cnfg.
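For example, the available entries can be listed first and then read individually (a sketch; the attribute names depend on the shell configuration and are not listed here):

> ls /sys/kernel/coyote_cnfg/
> cat /sys/kernel/coyote_cnfg/<some_attribute>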

Building the Software

The procedure to build the software is similar to the one for hardware, albeit simpler.

First create a build directory:

> mkdir build_sw && cd build_sw

Then, in the CMakeLists.txt, set the TARGET_DIR variable to point to the main sources:

set(TARGET_DIR "${CYT_DIR}/<some_path_to_sources>")
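Putting this together, a minimal software CMakeLists.txt might look roughly as follows (a hedged sketch: it assumes the software CMake package mirrors the hardware one, i.e. that CoyoteSW and create_sw() exist; check the cmake directory of the repository for the exact names):

cmake_minimum_required(VERSION 3.0)
project(example_sw)

set(CYT_DIR ${CMAKE_SOURCE_DIR}/) # Path to Coyote dir
set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} ${CYT_DIR}/cmake)

find_package(CoyoteSW REQUIRED) # Assumed software counterpart of CoyoteHW

set(TARGET_DIR "${CYT_DIR}/<some_path_to_sources>") # Main application sources

create_sw() # Assumed software counterpart of create_hw(); generates the build targets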

Additional header files can be included in the path under include. After that, invoke the CMake configuration and run make:

> cmake <path_to_cmake_config>
> make

Running Coyote as a Service

Coyote can also be deployed as a background daemon. Check out the example to see how this can be invoked.

Communication with the daemon is done through either Unix domain sockets or TCP sockets, for local and remote services respectively.

Shell Loading

User shells (different versions of the dynamic layer) can be loaded at any point, as long as the common static layer is online.

These can be loaded via the provided example software application (examples_sw/apps/reconfigure_shell). Users can also load shells dynamically from within their code via the cRnfg class.

Similarly, applications in the app layer can be loaded via the same class. The loading can also be controlled by the derived cSched class, which additionally handles the necessary scheduling.