Software

class bFunc

#include <bFunc.hpp>

Base user function.

This class should not be used directly; instead, it is inherited by the cFunc class, which defines the arbitrary user functions and the corresponding bitstreams. Therefore, this class is fully abstract and all the functions are pure virtual. Additionally, since it is a fully abstract class, there is no corresponding .cpp file.

The only reason this class exists is that the cFunc class has a variadic template, making it difficult to store in other classes. For example, the cService class keeps a map of functions added to the Coyote background service, and, since each function is user-defined, each one can have different template parameters and arguments.

Subclassed by coyote::cFunc< ret, args >
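The pattern described above (a non-templated base class that lets differently-typed templates share one container) can be sketched as follows. This is a minimal illustration, not the Coyote implementation: BaseFunc, TypedFunc, FunctionMap, getArgumentCount and makeExampleMap are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <memory>
#include <utility>

// Hypothetical stand-in for bFunc: a non-templated, abstract interface.
struct BaseFunc {
    virtual ~BaseFunc() = default;
    virtual int32_t getFid() const = 0;
    virtual std::size_t getArgumentCount() const = 0;
};

// Hypothetical stand-in for cFunc: every instantiation is a distinct type.
template <typename Ret, typename... Args>
class TypedFunc : public BaseFunc {
public:
    TypedFunc(int32_t fid, std::function<Ret(Args...)> fn)
        : fid(fid), fn(std::move(fn)) {}

    int32_t getFid() const override { return fid; }
    std::size_t getArgumentCount() const override { return sizeof...(Args); }
    Ret call(Args... values) const { return fn(values...); }

private:
    int32_t fid;
    std::function<Ret(Args...)> fn;
};

// One map can hold functions with unrelated signatures,
// because it only ever refers to the base type.
using FunctionMap = std::map<int32_t, std::unique_ptr<BaseFunc>>;

inline FunctionMap makeExampleMap() {
    FunctionMap functions;
    functions.emplace(0, std::make_unique<TypedFunc<int, int, int>>(
                             0, [](int a, int b) { return a + b; }));
    functions.emplace(1, std::make_unique<TypedFunc<double, float>>(
                             1, [](float x) { return 2.0 * x; }));
    assert(functions.size() == 2u);
    return functions;
}
```

Without the base class, the map's value type would have to name one specific template instantiation, so functions with different signatures could not be stored together.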

Public Functions

inline virtual ~bFunc()

virtual std::vector<char> run(cThread *coyote_thread, const std::vector<std::vector<char>> &args) = 0

virtual int32_t getFid() const = 0

virtual std::string getBitstreamPath() const = 0

virtual std::pair<void*, uint32_t> getBitstreamPointer() const = 0

virtual void setBitstreamPointer(std::pair<void*, uint32_t> bitstream_pointer) = 0

virtual std::vector<size_t> getArgumentSizes() const = 0

virtual size_t getReturnSize() const = 0

class cBench

#include <cBench.hpp>

Helper class for benchmarking various functions in Coyote.

At a high level, it executes some function a number of times and records the duration of each run. It can then be used for outputting run-time statistics, such as the average, minimum, maximum etc.

Public Functions

cBench(unsigned int n_runs = 1000, unsigned int n_warmups = 100): Default constructor; user can define number of test runs and also the number of warm-up runs, which don’t affect time measurements.

template<class BenchFunc, typename ...BenchArgs, class PrepFunc, typename ...PrepArgs> inline void execute(BenchFunc const &bench_func, BenchArgs... bench_args, PrepFunc const &prep_func, PrepArgs... prep_args)

Benchmark function execution (measure the duration)

We use functional programming + variadic templates: A function takes another function as argument + arbitrary number of other arguments

Note

Even though this is a header file, we define the function here (but only this one; the others are defined in cBench.cpp). The reason is that, otherwise, we get an -fpermissive warning at compile time about a missing definition for this function. Declaring it here and defining it in cBench.cpp doesn’t work, as that requires a template specialization. See here for more details: https://isocpp.org/wiki/faq/templates#templates-defn-vs-decl

Parameters:

bench_func – Function to be benchmarked
bench_args – Arguments passed to benchmark function
prep_func – Function executed before benchmark (any prep work)
prep_args – Arguments passed to prep function

double getAvg(): Returns the mean execution time; averaged over n_runs.

double getMin(): Returns the minimum execution time out of the n_runs recorded times.

double getMax(): Returns the maximum execution time out of the n_runs recorded times.

double getP25(): Returns the P25 execution time out of the n_runs recorded times.

double getP50(): Returns the P50 execution time out of the n_runs recorded times.

double getP75(): Returns the P75 execution time out of the n_runs recorded times.

double getP95(): Returns the P95 execution time out of the n_runs recorded times.

double getP99(): Returns the P99 execution time out of the n_runs recorded times.

Private Members

unsigned int n_runs

unsigned int n_warmups

std::vector<double> measured_times

class cConn

#include <cConn.hpp>

Coyote connection class.

A utility class that allows clients to connect to a Coyote background service and submit tasks to be executed on the server side. The class supports both blocking and non-blocking tasks.

Public Functions

cConn(std::string sock_name)

Default constructor for local connections.

When called, this constructor creates a local connection to a Coyote service, as implemented in cService.hpp

Parameters:: sock_name – The name of the Coyote socket, as registered by the server

~cConn(): Default destructor; sends a request to close the connection.

bool isTaskCompleted(int32_t tid)

Checks if a task with the given ID is completed.

Parameters:: tid – Task ID to check
Returns:: true if the task is completed, false otherwise

template<typename ret, typename ...args> inline ret task(int32_t fid, args... msg)

Submits a task to the Coyote service; blocking - waits until the task is completed.

Note

Implemented in the header file, since it is a template function.

Note

This function can throw a runtime_error if there are failures in sending the payload to the server or if the server returns a non-zero code (e.g., timeouts, function not found, etc.)

Note

Users must ensure they pass the correct template arguments, matching the function signature on the server; the server simply deserializes a byte array into the target arguments, so if incorrect template arguments are passed, a wrong value may be returned.

Parameters:

fid – Function ID of the request
msg – Variable number of arguments to be sent to the server

Returns:

The return value of executed function
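The warning about mismatched template arguments can be made concrete with a small sketch. It assumes arguments travel as raw bytes of trivially copyable types and that float is an IEEE-754 single-precision value; toBytes and fromBytes are hypothetical helpers, not part of cConn.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Copies the raw bytes of a trivially copyable value into a char buffer.
template <typename T>
std::vector<char> toBytes(const T &value) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "only trivially copyable types can be sent as raw bytes");
    std::vector<char> buffer(sizeof(T));
    std::memcpy(buffer.data(), &value, sizeof(T));
    return buffer;
}

// Reinterprets a char buffer as T. The buffer carries no type information,
// so the caller alone decides how the bytes are interpreted.
template <typename T>
T fromBytes(const std::vector<char> &buffer) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "only trivially copyable types can be read from raw bytes");
    assert(buffer.size() == sizeof(T));
    T value;
    std::memcpy(&value, buffer.data(), sizeof(T));
    return value;
}
```

Reading the bytes of `1.0f` back as `float` gives 1.0, but reading the same four bytes as `int32_t` gives 0x3F800000. Nothing fails at run time; the value is simply wrong, which is why the template arguments on the client must match the server-side signature.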

template<typename ret, typename ...args> inline int32_t iTask(int32_t fid, args... msg)

Submits a task to the Coyote service; non-blocking - exits immediately after sending the request.

Users should use isTaskCompleted(int32_t tid) to query the status of the task and, if complete, getTaskReturnValue(int32_t tid) to retrieve the return value.

Note

Implemented in the header file, since it is a template function.

Note

This function can throw a runtime_error if there are failures in sending the payload to the server.

Note

Users must ensure they pass the correct template arguments, matching the function signature on the server; the server simply deserializes a byte array into the target arguments, so if incorrect template arguments are passed, a wrong value may be returned.

Parameters:

fid – Function ID of the request
msg – Variable number of arguments to be sent to the server

Returns:

Unique task ID

template<typename ret> inline ret getTaskReturnValue(int32_t tid)

Obtains the task return value from the server.

This function should only be called after the task has been submitted and marked as completed, by checking isTaskCompleted(int32_t tid). Otherwise, the return value may be wrong / zero.

Note

Implemented in the header file, since it is a template function.

Note

This function can throw a runtime_error if the server returns a non-zero code for the task. This can happen if the requested function doesn’t exist, on timeouts, on wrong argument serialization etc.

Note

Users must ensure they pass the correct template argument for ret, matching the function signature on the server; the server simply serializes the response into a byte array, so if an incorrect template argument is passed, a wrong value may be returned.

Parameters:: tid – Task ID, as obtained from iTask()
Returns:: The return value of executed function
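The submit, poll and retrieve sequence for non-blocking tasks might look like the sketch below. Because a real cConn needs a running Coyote service, the example uses MockConn, a hypothetical in-memory stand-in whose tasks "complete" after a fixed number of polls. Only the method names isTaskCompleted and getTaskReturnValue mirror the documented interface; the mock's iTask is a simplified, non-template version, and waitForResult is an assumed helper.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical mock of a connection; it adds two integers "remotely".
class MockConn {
public:
    // Non-blocking submit: returns a task ID immediately.
    int32_t iTask(int32_t fid, int32_t a, int32_t b) {
        (void) fid;  // the mock only knows one function
        int32_t tid = task_counter++;
        tasks[tid] = Task{3, a + b};  // pretend the task needs three polls
        return tid;
    }

    bool isTaskCompleted(int32_t tid) {
        auto it = tasks.find(tid);
        if (it == tasks.end()) return false;
        if (it->second.polls_left > 0) it->second.polls_left--;
        return it->second.polls_left == 0;
    }

    template <typename ret>
    ret getTaskReturnValue(int32_t tid) {
        const Task &task = tasks.at(tid);
        assert(task.polls_left == 0);  // only valid once the task is completed
        return static_cast<ret>(task.result);
    }

private:
    struct Task {
        int polls_left;
        int32_t result;
    };
    std::map<int32_t, Task> tasks;
    int32_t task_counter = 0;
};

// Polls until the task is done, then fetches the result.
// Works with any connection type exposing the two methods used here.
template <typename Ret, typename Conn>
Ret waitForResult(Conn &conn, int32_t tid) {
    while (!conn.isTaskCompleted(tid)) {
        // A real client would do other useful work or sleep here.
    }
    return conn.template getTaskReturnValue<Ret>(tid);
}
```

The point of the non-blocking variant is the gap between submission and retrieval: the caller keeps the task ID, continues with other work, and only reads the return value after the completion check succeeds.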

Private Functions

void checkCompletedTasks(): Periodically checks for completed tasks and updates the task map.

Private Members

int sockfd = -1: Connection socket file descriptor.

std::atomic<int32_t> task_counter: An atomic variable; used for generating unique IDs for tasks.

std::map<int32_t, std::unique_ptr<cTask>> tasks: A map of submitted tasks.

std::thread completion_thread: A dedicated thread that periodically checks for completed tasks.

bool run_thread: Set to true when the completion thread is running.

template<typename ret, typename ...args> class cFunc : public coyote::bFunc 

#include <cFunc.hpp>

User-defined functions.

This class is a template for user-defined functions. Each function is associated with a specific application bitstream and the corresponding software-side function to be executed. The functions are implemented using variadic templates to allow for a variable number of parameters to be passed. This class is expected to be used in conjunction with Coyote services (cService) and requests (cReq). For an example, refer to Example 9 in examples/.

Note

Since this class is a template, it must be implemented in the header file; otherwise, it leads to compilation errors. An alternative is to use template specialization, but it is not applicable in this case, since the function arguments are arbitrary and not known ahead of time.

Public Functions

inline cFunc(int32_t fid, std::string app_bitstream, std::function<ret(cThread*, args...)> fn): Default constructor; converts the app_bitstream path to an absolute path.

inline ~cFunc(): Default destructor.

inline virtual std::vector<char> run(cThread *coyote_thread, const std::vector<std::vector<char>> &x) override

Executes the function with the given arguments.

Note

The cService holds a list of functions registered with the background service. To do so, we need to implement a base (non-templated) bFunc class (otherwise it becomes very hard to store the functions in a map). However, since the base class is not templated, this function must also be non-templated. Therefore, the run function takes the arguments as a vector of char buffers (std::vector<char>). Each char buffer is then unpacked into the corresponding argument. There are alternatives to this implementation (e.g., using std::any); however, using char buffers provides one of the simplest solutions, with no reliance on complex data types. Additionally, when the function arguments are received in the server (processRequests() function), they are naturally written to a char buffer, since char buffers are contiguous, byte-addressable and easily cast to other data types.

Parameters:

coyote_thread – Pointer to the cThread object
x – List of arguments passed as vector of char buffers, one buffer per argument

Returns:

The result of the function execution, serialized into a char buffer

inline virtual std::pair<void*, uint32_t> getBitstreamPointer() const override: Returns a pointer to the bitstream memory and its size.

inline virtual void setBitstreamPointer(std::pair<void*, uint32_t> bitstream_pointer) override

Sets the bitstream memory for this function.

Once the bitstream has been loaded from disk to memory (using functions from cRcnfg), this function updates the bitstream_pointer variable with its address and size.

Parameters:: bitstream_pointer – A pair containing the pointer to the bitstream memory and its size

inline virtual std::vector<size_t> getArgumentSizes() const override

Returns a vector of sizes, one for each of the function arguments.

Example: For args = {int64_t, float, bool}, the return is std::vector<size_t> = {8, 4, 1}

Returns:: A vector of sizes of the function arguments
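One way to obtain such a vector from a template parameter pack is to expand sizeof over the pack, as in the sketch below. The free functions argumentSizes, returnSize and exampleSizes are hypothetical; cFunc may compute the sizes differently. The values 8, 4 and 1 hold on common platforms, since the sizes of float and bool are implementation-defined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Expands the parameter pack into one sizeof per argument type.
template <typename... Args>
std::vector<std::size_t> argumentSizes() {
    return std::vector<std::size_t>{sizeof(Args)...};
}

// Size of the return type, analogous in spirit to getReturnSize().
template <typename Ret>
std::size_t returnSize() {
    return sizeof(Ret);
}

// Reproduces the example from the text on common platforms.
inline void exampleSizes() {
    const std::vector<std::size_t> expected = {8, 4, 1};
    assert((argumentSizes<int64_t, float, bool>() == expected));
}
```

A receiver that knows these sizes can split an incoming byte stream into one buffer per argument before any argument is deserialized.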

inline virtual size_t getReturnSize() const override: Similar to above, returns the size of the return value of the function.

inline virtual int32_t getFid() const override: Getter: Function ID.

inline virtual std::string getBitstreamPath() const override: Getter: Bitstream path.

Private Functions

template<std::size_t... I> inline std::tuple<args...> unpackArgs(const std::vector<std::vector<char>> &x, std::index_sequence<I...>)

Utility function; unpacks the arguments from a vector of char buffers into a tuple.

This function uses parameter pack expansion and lambda function to unpack the arguments

Parameters:

x – Vector of char buffers, one for each argument
I – Index sequence for unpacking

Returns:

A tuple containing the unpacked arguments
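A sketch of this kind of unpacking is shown below. It assumes every argument is trivially copyable and stored as its raw bytes, and it uses a helper function template (fromBuffer) where the description mentions a lambda. The free functions unpackArgs and unpack are illustrative and not copied from cFunc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <tuple>
#include <utility>
#include <vector>

// Rebuilds a single value of type T from its raw bytes.
template <typename T>
T fromBuffer(const std::vector<char> &buffer) {
    assert(buffer.size() == sizeof(T));
    T value;
    std::memcpy(&value, buffer.data(), sizeof(T));
    return value;
}

// Args... and I... are expanded in lockstep: the I-th buffer is
// deserialized as the I-th argument type.
template <typename... Args, std::size_t... I>
std::tuple<Args...> unpackArgs(const std::vector<std::vector<char>> &x,
                               std::index_sequence<I...>) {
    return std::tuple<Args...>{fromBuffer<Args>(x[I])...};
}

// Convenience wrapper that generates the index sequence 0, 1, ..., N-1.
template <typename... Args>
std::tuple<Args...> unpack(const std::vector<std::vector<char>> &x) {
    assert(x.size() == sizeof...(Args));
    return unpackArgs<Args...>(x, std::index_sequence_for<Args...>{});
}
```

The resulting tuple can then be forwarded to the user function, for example with std::apply in C++17 or with a second index-sequence expansion in C++14.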

Private Members

int32_t fid: Unique function identifier.

std::string app_bitstream: Path to the application bitstream.

std::pair<void*, uint32_t> bitstream_pointer: Once the bitstream is loaded from disk, this variable holds the pointer to the bitstream memory and its size.

std::function<ret(cThread*, args...)> fn

Body of the software function to be executed.

Each function is a callable object, which by definition takes a cThread pointer, which interacts with the vFPGA, and a variable number of arguments that represent the parameters of the function. ret represents the return type of the function.

struct CoyoteAlloc

#include <cOps.hpp>

Public Members

CoyoteAllocType alloc = {CoyoteAllocType::REG}: Type of allocated memory.

uint32_t size = {0}: Size of the allocated memory.

bool remote = {false}: Is this buffer used for remote operations?

uint32_t gpu_dev_id = {0}: GPU device ID (when alloc == CoyoteAllocType::GPU)

int32_t gpu_dmabuf_fd = {0}: File descriptor for the DMABuff used for GPU memory.

void *mem = {nullptr}: Pointer to the allocated memory; the struct keeps track of it so that it can be freed automatically after use.

class cRcnfg

#include <cRcnfg.hpp>

Coyote reconfiguration class. Used for loading partial bitstreams to FPGA memory and triggering reconfiguration. Used for both shell reconfiguration (dynamic + user layer) and app reconfiguration (2nd-level PR).

The most important function for users here is reconfigureShell(std::string bitstream_path), which is used to reconfigure the entire shell (dynamic + user layer); for more details, see the comments below.

In general, the flow of reconfiguration is (all of which is abstracted by reconfigureShell):

Allocate host-side, kernel memory to hold the partial bitstream (void* getMem, internally calling the Coyote driver)
Map the allocated memory to the user-space (using reconfig_mmap from the Coyote driver)
Load the bitstream from disk and store it to the allocated memory
Trigger reconfiguration by writing the memory to FPGA memory and asserting the correct registers
Once complete, release the allocated memory (void freeMem, internally calling the Coyote driver)
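The "load the bitstream from disk" step of the flow above can be approximated in plain C++ as shown below. This sketch deliberately leaves out everything driver-specific: a std::vector stands in for the driver-allocated, memory-mapped buffer, the functions streamSize, loadBitstream and exampleLoad are hypothetical, and any byte reordering or header handling the real reader may perform is not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <ios>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Returns the total number of bytes in a seekable stream and rewinds it.
inline std::size_t streamSize(std::istream &in) {
    in.seekg(0, std::ios_base::end);
    std::streamoff end = in.tellg();
    in.seekg(0, std::ios_base::beg);
    assert(end >= 0);  // tellg() reports failure with -1
    return static_cast<std::size_t>(end);
}

// Copies the whole stream into a buffer. In the real flow the destination
// would be the memory obtained from the driver, not a std::vector.
inline std::vector<char> loadBitstream(std::istream &in) {
    std::size_t size = streamSize(in);
    std::vector<char> buffer(size);
    in.read(buffer.data(), static_cast<std::streamsize>(size));
    assert(static_cast<std::size_t>(in.gcount()) == size);
    return buffer;
}

// Example: an in-memory stream stands in for a .bin file
// that would normally be opened with std::ifstream in binary mode.
inline bool exampleLoad() {
    std::istringstream fake_file(std::string("\x01\x02\x7f", 3));
    return loadBitstream(fake_file).size() == 3u;
}
```

Taking a std::istream rather than a file path keeps the loading logic testable without touching the file system; a std::ifstream can be passed in unchanged.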

Subclassed by coyote::cSched

Public Functions

cRcnfg(unsigned int device = 0)

Default reconfiguration constructor.

Parameters:: device – Target (physical) FPGA device to reconfigure. Only important for systems with multiple FPGA cards; e.g., reconfiguring the 2nd FPGA in a system would mean device = 1

~cRcnfg(): Default destructor; free up dynamically allocated bitstream_t memory, remove mutex etc.

void reconfigureShell(std::string bitstream_path)

Shell reconfiguration. Loads the partial bitstream into the internal memory and triggers reconfiguration.

Parameters:: bitstream_path – Path to partial bitstream (typically shell_top.bin inside build/bitstreams)

void reconfigureApp(std::string bitstream_path, int vfid)

App reconfiguration. Loads the partial bitstream into the internal memory and triggers reconfiguration of the specific vFPGA.

Parameters:

bitstream_path – Path to partial bitstream (typically shell_top.bin inside build/bitstreams)
vfid – vFPGA ID to be reconfigured

Protected Functions

uint8_t readByte(std::ifstream &fb): Helper function, pops and returns the first byte from the input stream (fb)

bitstream_t readBitstream(std::ifstream &fb)

Read bitstream from a file stream, that can be used for reconfiguration.

Parameters:: fb – File input stream, corresponding to a .bin file (most likely shell_top.bin)
Returns:: bitstream, an in-memory object of type bitstream_t with virtual address and length

void reconfigureBase(bitstream_t bitstream, uint32_t vfid = -1)

Base reconfiguration function, can be used to reconfigure the whole shell or individual vFPGAs.

Parameters:

bitstream – Partial bitstream to use for reconfiguration, obtainable from readBitstream
vfid – (optional) vFPGA to reconfigure; default = -1, which reconfigures the entire shell

void *getMem(CoyoteAlloc &&alloc)

Allocates a buffer for storing partial bitstream.

Parameters:: alloc – Allocation parameters; most importantly number of pages for the buffer

void freeMem(void *vaddr)

Releases dynamically allocated memory (allocated using the above function). Similar to the standard C/C++ free() function.

Parameters:: vaddr – Virtual address corresponding to the buffer to be freed

Protected Attributes

int reconfig_dev_fd = {0}

pid_t pid: Host-side process ID.

uint32_t crid: Unique configuration ID.

boost::interprocess::named_mutex mlock: Global mutex, ensuring no two processes are simultaneously allocating bitstream memory on the same object.

std::unordered_map<void*, CoyoteAlloc> mapped_pages

Protected Static Attributes

static std::atomic_uint32_t crid_gen: A unique generator for crid.
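A static atomic counter is a common way to hand out unique IDs without a lock. The sketch below shows the idea with a hypothetical class Reconfigurator; whether cRcnfg uses fetch_add or another atomic operation, and which initial value it uses, are not stated here and are assumptions of the example.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical class; only the member names crid and crid_gen
// are borrowed from the documentation above.
class Reconfigurator {
public:
    // fetch_add returns the previous value and increments atomically,
    // so two objects constructed concurrently never share an ID.
    Reconfigurator() : crid(crid_gen.fetch_add(1)) {}

    uint32_t getCrid() const { return crid; }

private:
    uint32_t crid;
    static std::atomic<uint32_t> crid_gen;
};

// Definition of the static generator; starting at 0 is an assumption.
std::atomic<uint32_t> Reconfigurator::crid_gen{0};

inline void exampleUniqueIds() {
    Reconfigurator first;
    Reconfigurator second;
    assert(first.getCrid() != second.getCrid());
}
```

The same technique applies to the task_counter members of cConn and cService, which are also described as atomic variables used for generating unique IDs.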

class cSched : public coyote::cRcnfg 

#include <cSched.hpp>

Coyote run-time scheduler.

The scheduler is responsible for managing tasks and functions in Coyote. Users can submit arbitrary functions, defined through the cFunc class, to the scheduler. Each function contains a path to its app bitstream and the corresponding software-side code. Then, the tasks can be submitted to the scheduler (most commonly done from the cService, though it is possible to write code that interacts directly with the scheduler), which dispatches the tasks based on a scheduling policy. Where needed, the scheduler will also reconfigure the vFPGA with the correct bitstream for the function. Currently, there are two scheduling policies implemented: (1) first-come, first-served (FCFS) and (2) minimize reconfigurations. The second one will always execute all the tasks with the same bitstream, avoiding the latency incurred by partial reconfiguration, before proceeding to the next task with a different bitstream.

TODO:

Implement more scheduling policies, such as priority-based scheduling
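To make the difference between the two existing policies concrete, the sketch below orders a small task list both ways and counts how often the loaded bitstream would change. It is a simplified model, not the cSched code: Task, orderFcfs, orderMinReconfig and countReconfigurations are hypothetical, and the real scheduler works on a live queue rather than a fixed list.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, minimal task description.
struct Task {
    int tid;
    std::string bitstream;
};

// Counts how often the loaded bitstream changes for a given execution order.
inline std::size_t countReconfigurations(const std::vector<Task> &order,
                                         std::string loaded = "") {
    std::size_t count = 0;
    for (const Task &task : order) {
        if (task.bitstream != loaded) {
            loaded = task.bitstream;
            count++;
        }
    }
    return count;
}

// FCFS: tasks run in submission order.
inline std::vector<Task> orderFcfs(const std::vector<Task> &tasks) {
    return tasks;
}

// Minimize reconfigurations: take the oldest outstanding task, then run every
// other outstanding task that uses the same bitstream before moving on.
inline std::vector<Task> orderMinReconfig(std::vector<Task> tasks) {
    std::vector<Task> order;
    while (!tasks.empty()) {
        const std::string current = tasks.front().bitstream;
        // stable_partition keeps the submission order within each group.
        auto mid = std::stable_partition(
            tasks.begin(), tasks.end(),
            [&](const Task &t) { return t.bitstream == current; });
        assert(mid != tasks.begin());  // the front task always matches
        order.insert(order.end(), tasks.begin(), mid);
        tasks.erase(tasks.begin(), mid);
    }
    return order;
}
```

For four tasks that alternate between two bitstreams, FCFS reconfigures four times (starting from an empty vFPGA), while the reordering policy reconfigures twice. The trade-off is fairness: a task whose bitstream is not currently loaded may wait behind later arrivals.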

Public Functions

void start(): Start the scheduler.

void stop(): Stops the scheduler and cleans up resources.

bool addTask(std::unique_ptr<cTask> task)

Adds a task to list of tasks to be executed by the scheduler.

Parameters:: task – Unique pointer to the cTask object representing the task
Returns:: true if the task was added successfully, false if the task ID already exists or if the task is associated with a function that is not registered

bool isTaskCompleted(int32_t tid)

Checks if a task with a given ID is completed.

Note

If task is not found, false is returned

Parameters:: tid – Task ID to check
Returns:: true if the task is completed, false otherwise

cTask *getTask(int32_t tid)

Gets the task with the given ID.

Parameters:: tid – Task ID to get
Returns:: Pointer to the cTask object if found, nullptr otherwise

bool isFunctionRegistered(int32_t fid)

Checks if a function with the given ID is registered in the scheduler.

Parameters:: fid – Function ID to check

bFunc *getFunction(int32_t fid)

Gets the function with the given ID.

Parameters:: fid – Function ID to get
Returns:: Pointer to the cFunc object if found, nullptr otherwise

inline int addFunction(std::unique_ptr<bFunc> fn)

Adds an arbitrary user function to the scheduler.

Each function is uniquely identified by its ID and holds information about the function: path to its bitstream and the corresponding software-side code.

Note

Implemented in the header file, since the function is a template.

Parameters:: fn – Unique pointer to the bFunc object representing the function
Returns:: 0 if the function was added successfully, 1 if bitstream cannot be opened, 2 if the function ID already exists

Public Static Functions

static inline cSched *getInstance(int32_t vfid, uint32_t device = 0, bool reorder = true, std::string current_bitstream = "")

Creates an instance of the scheduler for a vFPGA.

If an instance already exists, return the existing instance (“singleton” implementation)

Note

When partial reconfiguration is enabled, the parameter current_bitstream is ignored, since the scheduler will automatically reconfigure the vFPGA with the bitstream corresponding to the function of the task being executed. However, when partial reconfiguration is not enabled, users must specify the current bitstream, so that all tasks that match the bitstream can still be executed without reconfiguration. This scenario could be beneficial for priority-based scheduling or in conjunction with the cService class with one type of function but multiple clients connected to the service. See schedule(…) for more details.

Parameters:

vfid – Virtual FPGA ID associated with the service
device – Device number, for systems with multiple vFPGAs
reorder – If true, the scheduler will reorder tasks to minimize the number of reconfigurations.
current_bitstream – If a user has already loaded an application bitstream, it can be marked as the active one

Returns:

Pointer to a cSched instance

Private Functions

cSched(int32_t vfid, uint32_t device, bool reorder, std::string current_bitstream): Default constructor; private to ensure the class is implemented as a singleton.

bool taskChecker(int32_t tid)

A utility function that is reused throughout the scheduler. It does the following checks:

Checks if the task ID is present in the task_id_map
Checks whether the task is non-NULL
As a common sanity check, ensures the ID in the map and the ID in the vector match. If not, something went seriously wrong when inserting the task to the list of tasks.

Parameters:: tid – Task ID to check
Returns:: true if the task is found and valid, false otherwise

void schedule()

The main function of the scheduler.

It iterates through the list of tasks and executes the outstanding ones. The scheduling policy depends on the variable scheduling_policy, passed to the class constructor. This function will also reconfigure the vFPGA bitstream, if needed.

Private Members

int32_t vfid: vFPGA ID associated with the scheduler

bool reorder: Allow reordering of tasks to minimize number of reconfigurations.

fpgaCnfg fcnfg: Shell configuration as set before hardware synthesis in CMake.

std::map<int32_t, std::unique_ptr<bFunc>> functions: A map of the functions loaded to the scheduler, each identified by a unique function ID.

std::vector<std::unique_ptr<cTask>> tasks: A list of tasks submitted to the scheduler.

std::map<int32_t, int> task_id_map: A simple map from task ID to its position in the tasks vector; simply used for faster lookups of individual tasks.

std::mutex tlock: Task lock; there are multiple concurrent threads that access the tasks vector, and since vectors can relocate data (e.g., when adding new elements), this can lead to undefined behaviour. E.g., the scheduler thread iterates through the tasks vector, while the addTask() function could be called in the meantime, which would cause the vector to be resized and may cause the iterators to become invalid.

std::thread scheduler_thread: A dedicated thread that runs the scheduler.

bool scheduler_running: A flag indicating whether the scheduler thread is running.

std::string current_bitstream: The currently loaded bitstream.

Private Static Attributes

static std::map<std::string, cSched*> schedulers

Instances of the schedulers.

We only allow one instance of the scheduler per vFPGA on a single device, to avoid conflicting scheduling decisions and reconfigurations.
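A keyed singleton of this kind can be sketched as follows. The class Scheduler, the key format (device and vFPGA ID joined by a dash) and the function-local static map are assumptions of the example; the sketch is also not thread-safe and never frees its instances.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical scheduler with one instance per (device, vFPGA) pair.
class Scheduler {
public:
    // Returns the existing instance for this key, or creates it on first use.
    static Scheduler *getInstance(int32_t vfid, uint32_t device = 0) {
        assert(vfid >= 0);
        const std::string key = std::to_string(device) + "-" + std::to_string(vfid);
        auto it = instances().find(key);
        if (it == instances().end()) {
            it = instances().emplace(key, new Scheduler(vfid, device)).first;
        }
        return it->second;
    }

    int32_t getVfid() const { return vfid; }
    uint32_t getDevice() const { return device; }

private:
    // Private constructor: instances can only be obtained through getInstance().
    Scheduler(int32_t vfid, uint32_t device) : vfid(vfid), device(device) {}

    // Function-local static avoids a separate out-of-class definition.
    static std::map<std::string, Scheduler *> &instances() {
        static std::map<std::string, Scheduler *> map;
        return map;
    }

    int32_t vfid;
    uint32_t device;
};
```

Two calls with the same vFPGA ID and device return the same pointer, so every caller shares one view of the vFPGA's scheduling state.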

class cService

#include <cService.hpp>

Coyote background service.

This class implements the Coyote background service, which runs on a server node and can execute pre-defined Coyote functions. On the client side, users can connect to this service through the helper class cConn and submit requests to the loaded functions. The service will automatically reconfigure the vFPGA with the correct bitstream. The requests can be local or remote.

TODO:

Remote connections

Note

There is currently a bug in terminating the signals. Since the signal handler is static and limited in parameters, it is not aware of which instance should be terminated. Therefore, for now, the signal handler terminates all instances of the service. Users should only terminate the service once all vFPGAs have finished processing requests.

Public Functions

void start()

Starts the service.

This function initializes the daemon, sets up the socket for communication, and starts the scheduler thread to handle incoming requests. It will also accept connections from clients and register them.

inline int addFunction(std::unique_ptr<bFunc> fn)

Adds an arbitrary user function to the service.

Note

Implemented in the header file since it is a templated function

Parameters:: fn – Unique pointer to the bFunc object representing the function
Returns:: 0 if the function was added successfully, 1 if bitstream cannot be opened, 2 if the function ID already exists

Public Static Functions

static inline cService *getInstance(std::string name, bool remote, int32_t vfid, uint32_t device = 0, bool reorder = true, uint16_t port = DEF_PORT)

Creates an instance of the service for a vFPGA.

If an instance already exists, return the existing instance (“singleton” implementation)

Parameters:

name – Unique name for the service
remote – Local or remote service
vfid – Virtual FPGA ID associated with the service
device – Device number, for systems with multiple vFPGAs
reorder – Allow the scheduler to reorder tasks, to minimize reconfigurations
port – Port for remote connections

Private Functions

cService(std::string name, bool remote, int32_t vfid, uint32_t device, bool reorder, uint16_t port): Default constructor; private to ensure the class is implemented as a singleton.

void daemonSigHandler(int signum): Handles signals sent to the background service.

Note

Currently only SIGTERM is handled, which is used to gracefully terminate the service and clean up resources. Other signals are ignored.

void initDaemon(): Initializes the background daemon for this service.

void initSocket(): Initializes the socket for connections to this service, either local or remote.

void cleanConns(): Function that periodically iterates and releases resources (e.g., threads) held by stale connections.

void acceptConnectionLocal(): Accepts a local connection (IPC) to this service.

void acceptConnectionRemote(): Accepts a connection from a remote client to this service.

void processRequests(int connfd)

Processes client requests in a dedicated thread.

This function continuously loops to accept incoming requests for a connected client. It stores the requests in a list and can close the connection if the client sends a close request.

Parameters:: connfd – The connection file descriptor for the client

void sendResponses(int connfd)

Send client responses in a dedicated thread.

This function continuously loops to check whether a client task has been completed and sends the response back to the client.

Parameters:: connfd – The connection file descriptor for the client

Private Members

std::string service_id: Unique service ID, derived from the name.

std::string socket_name: Name of the socket for communication.

bool is_running: Boolean flag indicating whether the service is running; prevents double-starting daemon.

int sockfd: Socket file descriptor for communication.

bool remote: Whether the service receives requests from a remote node or locally.

int32_t vfid: vFPGA ID associated with the service

uint32_t device: Device number, for systems with multiple vFPGAs.

uint16_t port: Port for remote connections.

std::map<int, std::unique_ptr<cThread>> coyote_threads: A map of the connected clients and their corresponding Coyote threads which are used for executing the functions.

std::map<int, std::pair<std::thread, std::thread>> connection_threads: Dedicated threads which process the requests for each connected client; one for incoming request and one for writing the result back.

cSched *scheduler: Scheduler instance; handles the execution of tasks as well as reconfiguration, where required.

std::atomic<int32_t> task_counter: An atomic variable; used for generating unique IDs for tasks on the server side.

std::map<int, std::vector<std::pair<int32_t, int32_t>>> tasks: A map of client-submitted tasks. When a client submits a task, it holds an ID which is written back with the task result, so that the client can link the result to the task (see cConn::checkCompletedTasks() for details). However, there is no guarantee that the client-submitted task ID is globally unique (because of multiple clients), so for each task, the cService also stores a server-generated task ID.
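The need for two IDs per task can be illustrated with the sketch below, in which two clients both use task ID 0. TaskRegistry and its methods are hypothetical, and the order of the pair (client ID first, server ID second) is an assumption rather than something stated above.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical registry keyed by connection file descriptor.
class TaskRegistry {
public:
    // Records the client's task ID and returns a server-wide unique ID.
    int32_t registerTask(int connfd, int32_t client_tid) {
        assert(connfd >= 0);
        int32_t server_tid = task_counter.fetch_add(1);
        tasks[connfd].emplace_back(client_tid, server_tid);
        return server_tid;
    }

    // Finds the client-side ID to write back with the result; -1 if unknown.
    int32_t findClientTid(int connfd, int32_t server_tid) const {
        auto conn = tasks.find(connfd);
        if (conn == tasks.end()) return -1;
        for (const auto &ids : conn->second) {
            if (ids.second == server_tid) return ids.first;
        }
        return -1;
    }

private:
    std::atomic<int32_t> task_counter{0};
    // connection -> list of (client task ID, server task ID); order assumed
    std::map<int, std::vector<std::pair<int32_t, int32_t>>> tasks;
};
```

The scheduler only ever sees the server-generated ID, while each response carries the ID the client originally chose, so clients do not need to coordinate their numbering.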

std::map<int, std::unique_ptr<std::mutex>> task_locks

Task map locks. Since the map is written by the processRequests() thread and read by the sendResponses() thread, thread safety must be ensured when accessing the map.

std::map<int, bool> conns_to_clean: A list of connections to be cleaned up; if the bool value is true, the connection is stale and should be cleaned up; false indicates it has already been cleaned up.

std::thread cleanup_thread: Dedicated thread that periodically iterates over conns_to_clean and releases stale connection threads and resources.

bool run_cleanup_thread: A boolean flag indicating whether the clean-up thread is running.

Private Static Functions

static void sigHandler(int signum): Just a wrapper around daemonSigHandler; since the handler must be static.

Private Static Attributes

static std::map<std::string, cService*> services

Instances of the services.

We only allow one instance of the service per vFPGA on a single device, to ensure that multiple services do not run in parallel on the same vFPGA, which can lead to multiple reconfigurations, execution conflicts etc. The key of the map is the device ID concatenated with the vFPGA ID.

class cTask

#include <cTask.hpp>

A task represents a single request to execute a function.

This class encapsulates all the necessary logic and metadata to execute a Coyote function (cFunc). For example, the Coyote scheduler (cSched) keeps track of all the tasks and based on the scheduling policy, decides the next one to be executed.

Public Functions

cTask(int32_t tid, int32_t fid, size_t ret_val_size, cThread *cthread = nullptr, std::vector<std::vector<char>> fn_args = {}): Default constructor; sets the unique task ID and the associated function, sets the args, init other params to default value.

int32_t getTid() const: Getter: Task ID.

int32_t getFid() const: Getter: Function ID.

bool isCompleted() const: Checks if the task is completed.

void setCompleted(bool val): Sets the value of is_completed.

cThread *getCThread() const: Getter: Pointer to associated cThread.

std::vector<std::vector<char>> getArgs() const: Getter: Function arguments.

std::vector<char> getRetVal() const: Getter: Function return value.

void setRetVal(const std::vector<char> retval): Setter: Function return value.

size_t getRetValSize() const: Getter: Function return value size.

int32_t getRetCode() const: Getter: Function return code.

void setRetCode(int32_t retcode): Setter: Function return code.

Private Members

int32_t tid: Unique task identifier.

int32_t fid: ID of the function to be executed for this task o.

bool is_completed: Set to true when the task is completed.

cThread *cthread: Pointer to the cThread that executes this task (it is passed to the cFunc as the first argument)

std::vector<std::vector<char>> fn_args: Arguments for the function to be executed; see cFunc for detail on why a vector of char buffers is used.

std::vector<char> ret_val: Function return value; see cFunc for detail on why a char buffer is used.

size_t ret_val_size: Size of the function return value; primarily a util value used for deserializing the char buffer.

int32_t ret_code: Function return code; a non-zero value indicates an error in the function execution.

class cThread

#include <cThread.hpp>

The cThread class is the core component of Coyote for interacting with vFPGAs.

This class provides methods for memory management, data transfer operations, and synchronization with the vFPGA device. It also handles user interrupts and out-of-band set-up for RDMA operations. It abstracts the interaction with the char vfpga_device in the driver, providing a high-level interface for Coyote operations.

Public Functions

cThread(int32_t vfid, pid_t hpid, uint32_t device = 0, std::function<void(int)> uisr = nullptr)

Default constructor for the cThread.

Parameters:

vfid – Virtual FPGA ID
hpid – Host process ID
device – Device number, for systems with multiple vFPGAs
uisr – User interrupt (notifications) service routine, called when an interrupt from the vFPGA is received

~cThread()

Default destructor for the cThread.

Cleans up the resources used by the cThread, including memory and file descriptors.

void userMap(void *vaddr, uint32_t len)

Maps a buffer to the vFPGA’s TLB.

Parameters:

vaddr – Virtual address of the buffer
len – Length of the buffer, in bytes

void userUnmap(void *vaddr)

Unmaps a buffer from the vFPGA’s TLB.

Parameters:: vaddr – Virtual address of the buffer

void *getMem(CoyoteAlloc &&alloc)

Allocates memory for this cThread and maps it into the vFPGA’s TLB.

Parameters:: alloc – CoyoteAlloc object containing the allocation parameters, including size, type (e.g., hugepage, GPU) etc.
Returns:: Pointer to the allocated memory

void freeMem(void *vaddr)

Frees and unmaps previously allocated memory.

Parameters:: vaddr – Virtual address of the buffer to be freed

void setCSR(uint64_t val, uint32_t offs)

Sets a control register in the vFPGA at the specified offset.

Parameters:

val – Register value to be set
offs – Offset of the control register to be set

uint64_t getCSR(uint32_t offs) const

Reads from a register in the vFPGA at the specified offset.

Parameters:: offs – Offset of the register to be read
Returns:: Value of the register at the specified offset

void invoke(CoyoteOper oper, syncSg sg)

Invokes a Coyote sync or offload operation with the specified scatter-gather list (sg)

Note

Syncs and off-loads are blocking (synchronous) by design

Parameters:

oper – Operation to be invoked, in this case must be either CoyoteOper::LOCAL_SYNC or CoyoteOper::LOCAL_OFFLOAD
sg – Scatter-gather entry, specifying the memory address and length for the operation

void invoke(CoyoteOper oper, localSg sg, bool last = true)

Invokes a one-sided local Coyote operation with the specified scatter-gather list (sg)

Note

Local operations are non-blocking (asynchronous) by design, so users should poll for completion using checkCompleted()

Note

Whenever last is passed as true, the completion counter for the operation is incremented by 1 and an acknowledgement is sent on the hardware-side cq_* interface of the vFPGA with ack_t.host = 1; otherwise it is not

Parameters:

oper – Operation to be invoked, in this case must be either CoyoteOper::LOCAL_READ or CoyoteOper::LOCAL_WRITE
sg – Scatter-gather entry, specifying the memory address, length and stream for the operation
last – Indicates whether this is the last operation in a sequence (default: true)

void invoke(CoyoteOper oper, localSg src_sg, localSg dst_sg, bool last = true)

Invokes a two-sided local Coyote operation with the specified source and destination scatter-gather entries (src_sg, dst_sg)

Note

Local operations are non-blocking (asynchronous) by design, so users should poll for completion using checkCompleted()

Note

Whenever last is passed as true, the completion counter for the operation is incremented by 1 and an acknowledgement is sent on the hardware-side cq_* interface of the vFPGA with ack_t.host = 1; otherwise it is not

Parameters:

oper – Operation to be invoked, in this case must be CoyoteOper::LOCAL_TRANSFER
src_sg – Source scatter-gather entry, specifying the memory address, length and stream
dst_sg – Destination scatter-gather entry, specifying the memory address, length and stream
last – Indicates whether this is the last operation in a sequence (default: true)

void invoke(CoyoteOper oper, rdmaSg sg, bool last = true)

Invokes an RDMA operation with the specified scatter-gather list (sg)

Note

Remote operations are non-blocking (asynchronous) by design, so users should poll for completion using checkCompleted()

Note

Whenever last is passed as true, the completion counter for the operation is incremented by 1 and an acknowledgement is sent on the hardware-side cq_* interface of the vFPGA with ack_t.host = 1; otherwise it is not

Parameters:

oper – Operation to be invoked, in this case must be CoyoteOper::REMOTE_RDMA_WRITE or CoyoteOper::REMOTE_RDMA_READ
sg – Scatter-gather entry, specifying the RDMA operation parameters
last – Indicates whether this is the last operation in a sequence (default: true)

void invoke(CoyoteOper oper, tcpSg sg, bool last = true)

Invokes a TCP operation with the specified scatter-gather list (sg)

Note

TCP operations aren’t fully stable in Coyote 0.2.1, to be updated in the future

Parameters:

oper – Operation to be invoked, in this case must be CoyoteOper::REMOTE_TCP_SEND
sg – Scatter-gather entry, specifying the TCP operation parameters
last – Indicates whether this is the last operation in a sequence (default: true)

uint32_t checkCompleted(CoyoteOper oper) const

Returns the number of completed operations for a given Coyote operation type.

Parameters:: oper – Operation to be queried
Returns:: Cumulative number of completed operations for the specified operation type, since the last clearCompleted() call

void clearCompleted(): Clears all the completion counters (for all operations)

void connSync(bool client)

Synchronizes the connection between the client and server.

Parameters:: client – If true, this cThread acts as a client; otherwise, it acts as a server

void *initRDMA(uint32_t buffer_size, uint16_t port, const char *server_address = nullptr)

Sets up the cThread for RDMA operations.

This function creates an out-of-band connection to the server, which is used to exchange the queue pair (QP) between the nodes. Additionally, it allocates a buffer for the RDMA operations and returns a pointer to the allocated buffer.

Parameters:

buffer_size – Size of the buffer to be allocated for RDMA operations
port – Port number to be used for the out-of-band connection
server_address – Optional server address to connect to; if not provided, this cThread acts as the server

void closeConn(): Opposite of initRDMA; releases the the out-of-band connection which was used to exchange QP.

void lock()

Locks the vFPGA for exclusive access by this cThread.

Locking ensures no other operation (even from other processes) is performed on the vFPGA concurrently. However, this may not always be desirable, as shown in Example 8 multi-threading. This method is typically not required; it is mainly needed when multiple software processes/threads target the same vFPGA simultaneously, which can otherwise lead to undefined behaviour.

void unlock(): Unlocks the vFPGA for exclusive access by this cThread.

int32_t getVfid() const: Getter: vFPGA ID (vfid)

int32_t getCtid() const: Getter: Coyote thread ID (ctid)

pid_t getHpid() const: Getter: Host process ID (hpid)

void printDebug() const: Utility function, prints stats about this cThread including the number of commands invalidations etc.

Protected Functions

void mmapFpga(): Utility function, memory mapping all the vFPGA control registers and writeback regions.

void munmapFpga(): Utility function, unmapping all the vFPGA control registers and writeback regions.

void postCmd(uint64_t offs_3, uint64_t offs_2, uint64_t offs_1, uint64_t offs_0)

Posts a DMA command to the vFPGA.

This function triggers a DMA command by writing the provided offsets to the appropriate control registers.

Parameters:

offs_3 – Destination address
offs_2 – Destination control signals (e.g., size, offset, stream etc.)
offs_1 – Source address
offs_0 – Source control signals (e.g., size, offset, stream etc.)

void sendAck(uint32_t ack)

Sends an ack to the connected remote node via the out-of-band channel.

Note

Utility function, primarily used for syncing clients and servers between benchmarks and operations

Parameters:: ack – Acknowledgment value to be sent

uint32_t readAck()

Reads an ack from the connected remote node via the out-of-band channel.

Note

Utility function, primarily used for syncing clients and servers between benchmarks and operations. This function works in conjunction with sendAck() to synchronize operations between the client and server.

Returns:: Acknowledgment value received from the remote node
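To illustrate how a sendAck()/readAck() pair can act as a simple barrier, here is a minimal POSIX sketch that passes a 32-bit value across a local socket pair. It assumes the ack travels as a raw 4-byte value, which the documentation does not state; sendAckOver, readAckFrom and ackRoundTrip are hypothetical helpers, and the socket pair stands in for the real client/server connection.

```cpp
#include <cassert>
#include <cstdint>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: write a 4-byte ack to fd; true if all bytes were written.
inline bool sendAckOver(int fd, uint32_t ack) {
    return ::write(fd, &ack, sizeof(ack)) == static_cast<ssize_t>(sizeof(ack));
}

// Hypothetical helper: blocking read of a 4-byte ack from fd.
inline bool readAckFrom(int fd, uint32_t &ack) {
    return ::read(fd, &ack, sizeof(ack)) == static_cast<ssize_t>(sizeof(ack));
}

// Round trip over a local socket pair. Returns the received value,
// or 0 if any step fails.
inline uint32_t ackRoundTrip(uint32_t value) {
    int fds[2];
    if (::socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) return 0;
    uint32_t received = 0;
    const bool ok = sendAckOver(fds[0], value) && readAckFrom(fds[1], received);
    ::close(fds[0]);
    ::close(fds[1]);
    return ok ? received : 0;
}
```

Because the read blocks until the peer has written, the reading side cannot proceed past this point before the sending side reaches it, which is what makes the pair usable for synchronization.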

void doArpLookup(uint32_t ip_addr)

Writes an IP address to a config register so it can be used for ARP lookup.

Parameters:: ip_addr – IP address to be looked up

void writeQpContext(uint32_t port)

Writes the exchanged QP information to the vFPGA config registers.

Parameters:: port – Port number

Protected Attributes

int32_t fd = {0}: vFPGA device file descriptor

int32_t vfid = {-1}: vFPGA virtual ID

int32_t ctid = {-1}: Coyote thread ID.

pid_t hpid = {0}: Host process ID.

fpgaCnfg fcnfg: Shell configuration, as set by the user in CMake config.

std::unique_ptr<ibvQp> qpair: RDMA queue pair.

uint32_t cmd_cnt = {0}: Number data transfer commands sent to the vFPGA.

int32_t efd = {-1}: User interrupt file descriptor.

int32_t terminate_efd = {-1}: Termination event file descriptor for stopping the user interrupt thread.

std::thread event_thread: Dedicated thread for handling user interrupts.

volatile uint64_t *cnfg_reg = {0}

vFPGA config registers, used mainly for starting DMA commands; implemented in cnfg_slave_avx.sv if AVX is enabled and in cnfg_slave.sv if AVX is disabled.

volatile uint64_t *ctrl_reg = {0}: User-defined control registers, which can be parsed using axi_ctrl in the vFPGA.

volatile uint32_t *wback = {0}: Pointer to writeback region, if enabled.

std::unordered_map<void*, CoyoteAlloc> mapped_pages: A map of all the pages that have been allocated and mapped for this thread.

int connfd = {-1}: Out-of-band connection file descriptor to a remote node This connection is primarily used for exchanging of QPs and syncing (barriers) between operations

int sockfd = {-1}: Out-of-band socket file descriptor for the cThread This socket is initially used to establish an out-of-band connection (connfd) to a remote node for exchanging QP information and for sending/receiving acknowledgments.

bool is_connected: Set to true if there is an active out-of-band connection to a remote node for this cThread.

boost::interprocess::named_mutex vlock: Inter-process vFPGA lock, see lock() and unlock() functions for more details.

bool lock_acquired = {false}: Set to true if the vFPGA lock is acquired by this cThread; used to release the lock in the destructor.

struct localSg

#include <cOps.hpp>

Scatter-gather entry for local operations (LOCAL_READ, LOCAL_WRITE, LOCAL_TRANSFER)

Public Members

void *addr = {nullptr}: Buffer address.

uint32_t len = {0}: Buffer length in bytes.

uint32_t stream = {STRM_HOST}: Buffer stream: HOST or CARD.

uint32_t dest = {0}: Target AXI4 destination stream in the vFPGA; a value of i will use the to axis_(host|card)_(recv|send)[i] in the vFPGA.

struct rdmaSg

#include <cOps.hpp>

Scatter-gather entry for RDMA operations (REMOTE_RDMA_READ, REMOTE_RDMA_WRITE). NOTE: There is no field for the source/destination address, since these are defined when exchanging queue pair information. Each cThread holds exactly one queue pair, so the source and destination addresses are always the same.

Public Members

uint64_t local_offs = {0}: Offset from the local buffer address; in case the buffer to be sent doesn’t need to start from the exchanged virtual address.

uint32_t local_stream = {STRM_HOST}: Source buffer stream: HOST or CARD.

uint32_t local_dest = {0}: Target AXI4 source stream in the vFPGA; a value of i will write pull data for the RDMA operation from axis_(host|card)_recv[i] in the vFPGA.

uint64_t remote_offs = {0}: Offset from the remote buffer address, analogous to local_offs.

uint32_t remote_dest = {0}: Target AXI4 destination stream; a value of i will write write data to axis_(host|card)_send[i] in the remote vFPGA.

uint32_t len = {0}: Lenght of the RDMA transfer, in bytes.

struct syncSg

#include <cOps.hpp>

Scatter-gather entry for sync and offload operations.

Public Members

void *addr = {nullptr}: Buffer address to be synced/offloaded.

uint64_t len = {0}: Size of the buffer in bytes.

struct tcpSg

#include <cOps.hpp>

Scatter-gather entry for TCP operations (REMOTE_TCP_SEND)

Public Members

uint32_t stream = {STRM_TCP}

uint32_t dest = {0}

uint32_t len = {0}

namespace coyote

Typedefs

using bitstream_t = std::pair<void*, uint32_t>: Bitstream alias: pointer to buffer holding its contents and its length.

Enums

enum class CoyoteOper

Various Coyote operations that allow users to move data from/to host memory, FPGA memory and remote nodes.

Values:

enumerator NOOP: No operation.

enumerator LOCAL_READ: Transfers data from CPU or FPGA memory to the vFPGA stream (axis_(host|card)_recv[i]), depending on sgEntry.local.src_stream.

enumerator LOCAL_WRITE: Transfers data from a vFPGA stream (axis_(host|card)_send[i]) to CPU or FPGA memory, depending on sgEntry.local.src_stream.

enumerator LOCAL_TRANSFER: LOCAL_READ and LOCAL_WRITE in parallel; dataflow is (CPU or FPGA) memory => vFPGA => (CPU or FPGA) memory.

enumerator LOCAL_OFFLOAD: Migrates data from CPU memory to FPGA memory (HBM/DDR)

enumerator LOCAL_SYNC: Migrates data from FPGA memory (HBM/DDR) to CPU memory.

enumerator REMOTE_RDMA_READ: One-side RDMA read operation.

enumerator REMOTE_RDMA_WRITE: One-sided RDMA write operation.

enumerator REMOTE_RDMA_SEND: Two-sided RDMA send operation.

enumerator REMOTE_TCP_SEND: TCP send operation; NOTE: Currently unsupported due to bugs; to be brought back in future releases of Coyote.

enum class CoyoteAllocType

Different types of memory allocation that can be used in Coyote.

Values:

enumerator REG: Regular pages (typically 4KB on Linux)

enumerator THP: Transparent huge pages (THP); obtained by allocating consecutve regular pages; NOTE: Users should use HPF where possible; THP should be used if the system doesn’t natively support huge pages

enumerator HPF: Huge pages (HPF) (typically 2MB on Linux)

enumerator PRM: Partial reconfiguration memory, used for storing reconfiguration bitstreams.

enumerator GPU: Memory on the GPU (for GPU-FPGA DMA)

Functions

inline bool isLocalRead(CoyoteOper oper)

inline bool isLocalWrite(CoyoteOper oper)

inline bool isLocalSync(CoyoteOper oper)

inline bool isRemoteRdma(CoyoteOper oper)

inline bool isRemoteRead(CoyoteOper oper)

inline bool isRemoteWrite(CoyoteOper oper)

inline bool isRemoteSend(CoyoteOper oper)

inline bool isRemoteWriteOrSend(CoyoteOper oper)

inline bool isRemoteTcp(CoyoteOper oper)
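The helper predicates above are listed without descriptions. As a rough sketch of how classification helpers over such an enum could look, the code below uses a stand-in enum in a separate namespace. It assumes that LOCAL_TRANSFER counts as both a local read and a local write, based on its description as "LOCAL_READ and LOCAL_WRITE in parallel"; this is an assumption for illustration and may not match the library's actual definitions.

```cpp
#include <cassert>

namespace sketch {

// Stand-in mirroring the documented enumerator names; NOT the Coyote enum.
enum class Oper {
    NOOP, LOCAL_READ, LOCAL_WRITE, LOCAL_TRANSFER, LOCAL_OFFLOAD, LOCAL_SYNC,
    REMOTE_RDMA_READ, REMOTE_RDMA_WRITE, REMOTE_RDMA_SEND, REMOTE_TCP_SEND
};

// Assumption: a transfer involves a read from memory into the vFPGA.
inline bool isLocalRead(Oper o) {
    return o == Oper::LOCAL_READ || o == Oper::LOCAL_TRANSFER;
}

// Assumption: a transfer also involves a write from the vFPGA back to memory.
inline bool isLocalWrite(Oper o) {
    return o == Oper::LOCAL_WRITE || o == Oper::LOCAL_TRANSFER;
}

// Any of the three RDMA operations.
inline bool isRemoteRdma(Oper o) {
    return o == Oper::REMOTE_RDMA_READ || o == Oper::REMOTE_RDMA_WRITE ||
           o == Oper::REMOTE_RDMA_SEND;
}

// RDMA operations that push data to the remote side.
inline bool isRemoteWriteOrSend(Oper o) {
    return o == Oper::REMOTE_RDMA_WRITE || o == Oper::REMOTE_RDMA_SEND;
}

}  // namespace sketch
```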

file bFunc.hpp: #include <string>

#include <vector>

#include "cThread.hpp"

file cBench.hpp: #include <chrono>

#include <vector>

#include <algorithm>

#include "cDefs.hpp"

file cConn.hpp: #include <atomic>

#include <string>

#include <vector>

#include <iostream>

#include <unistd.h>

#include <sys/un.h>

#include <sys/socket.h>

#include "cTask.hpp"

#include "cDefs.hpp"

file cFunc.hpp: #include <vector>

#include <string>

#include <cstdint>

#include <functional>

#include <filesystem>

#include "bFunc.hpp"

#include "cThread.hpp"

file cGpu.hpp

file cOps.hpp: #include "cDefs.hpp"

file cRcnfg.hpp: #include <atomic>

#include <fcntl.h>

#include <fstream>

#include <unistd.h>

#include <sys/mman.h>

#include <unordered_map>

#include <boost/interprocess/sync/named_mutex.hpp>

#include "cOps.hpp"

#include "cDefs.hpp"

file cSched.hpp: #include <map>

#include <mutex>

#include <vector>

#include <fstream>

#include <cstdint>

#include <syslog.h>

#include "bFunc.hpp"

#include "cTask.hpp"

#include "cRcnfg.hpp"

file cService.hpp: #include <map>

#include <mutex>

#include <vector>

#include <string>

#include <signal.h>

#include <unistd.h>

#include <sys/un.h>

#include <syslog.h>

#include <sys/stat.h>

#include "cFunc.hpp"

#include "cSched.hpp"

#include "cThread.hpp"

file cTask.hpp: #include <map>

#include <vector>

#include <cstdint>

#include "cThread.hpp"

file cThread.hpp: #include <thread>

#include <chrono>

#include <string>

#include <random>

#include <fstream>

#include <iostream>

#include <functional>

#include <unordered_map>

#include <fcntl.h>

#include <netdb.h>

#include <syslog.h>

#include <unistd.h>

#include <sys/mman.h>

#include <sys/ioctl.h>

#include <sys/epoll.h>

#include <sys/eventfd.h>

#include <linux/mman.h>

#include <boost/interprocess/sync/named_mutex.hpp>

#include "cDefs.hpp"

#include "cOps.hpp"

#include "cGpu.hpp"