Software
-
class bFunc
- #include <bFunc.hpp>
Base user function.
This class should not be used directly; instead, it is inherited by the cFunc class, which defines the arbitrary user functions and the corresponding bitstreams. This class is therefore fully abstract, and all of its functions are pure virtual. Since it is fully abstract, there is no corresponding .cpp file.
The only reason this class exists is that cFunc is a variadic template, which makes it difficult to store in other classes. For example, the cService class keeps a map of the functions added to the Coyote background service; since each function is user-defined, each can have different template parameters and arguments.
Subclassed by coyote::cFunc< ret, args >
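The role of bFunc can be illustrated with a minimal, self-contained sketch of the same pattern: a non-templated abstract base class lets variadic-template subclasses with different signatures share one container. BaseFn, TypedFn and makeRegistry are hypothetical names used only for illustration; the real bFunc/cFunc interfaces (run(), bitstream getters, the cThread argument) are richer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <memory>
#include <utility>

// Hypothetical non-templated base class, playing the role of bFunc
struct BaseFn {
    virtual ~BaseFn() = default;
    virtual int32_t getFid() const = 0;
    virtual std::size_t getReturnSize() const = 0;
};

// Hypothetical variadic-template subclass, playing the role of cFunc
template <typename Ret, typename... Args>
class TypedFn : public BaseFn {
public:
    TypedFn(int32_t fid, std::function<Ret(Args...)> fn) : fid(fid), fn(std::move(fn)) {}
    int32_t getFid() const override { return fid; }
    std::size_t getReturnSize() const override { return sizeof(Ret); }

private:
    int32_t fid;
    std::function<Ret(Args...)> fn;
};

// Functions with different signatures can live in one map through the base pointer
std::map<int32_t, std::unique_ptr<BaseFn>> makeRegistry() {
    std::map<int32_t, std::unique_ptr<BaseFn>> registry;
    registry[1] = std::make_unique<TypedFn<int64_t, int64_t, int64_t>>(
        1, [](int64_t a, int64_t b) { return a + b; });
    registry[2] = std::make_unique<TypedFn<bool, double>>(
        2, [](double x) { return x > 0.0; });
    return registry;
}
```

Without the common base, each TypedFn instantiation would be an unrelated type and no single map could hold them.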
Public Functions
-
inline virtual ~bFunc()
-
virtual std::vector<char> run(cThread *coyote_thread, const std::vector<std::vector<char>> &args) = 0
-
virtual int32_t getFid() const = 0
-
virtual std::string getBitstreamPath() const = 0
-
virtual std::pair<void*, uint32_t> getBitstreamPointer() const = 0
-
virtual void setBitstreamPointer(std::pair<void*, uint32_t> bitstream_pointer) = 0
-
virtual std::vector<size_t> getArgumentSizes() const = 0
-
virtual size_t getReturnSize() const = 0
-
class cBench
- #include <cBench.hpp>
Helper class for benchmarking various functions in Coyote.
At a high level, it executes a function a number of times and records each duration. The recorded times can then be used to output run-time statistics, such as the average, minimum, maximum, etc.
Public Functions
-
cBench(unsigned int n_runs = 1000, unsigned int n_warmups = 100)
Default constructor; the user can set the number of test runs and the number of warm-up runs, which are excluded from the time measurements.
-
template<class BenchFunc, typename ...BenchArgs, class PrepFunc, typename ...PrepArgs>
inline void execute(BenchFunc const &bench_func, BenchArgs... bench_args, PrepFunc const &prep_func, PrepArgs... prep_args)
Benchmarks function execution (measures the duration).
Uses functional programming and variadic templates: the function takes another function as an argument, plus an arbitrary number of other arguments.
Note
Even though this is a header file, this function is defined here (it is the only one; the others are defined in cBench.cpp). Declaring it here and defining it in cBench.cpp does not work: it results in an -fpermissive compile-time warning that there is no definition for this function, and making the split work would require a template specialization. See here for more details: https://isocpp.org/wiki/faq/templates#templates-defn-vs-decl
- Parameters:
bench_func – Function to be benchmarked
bench_args – Arguments passed to benchmark function
prep_func – Function executed before benchmark (any prep work)
prep_args – Arguments passed to prep function
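A simplified sketch of what such a benchmarking loop might look like. MiniBench is hypothetical: instead of variadic argument packs, the benchmark and prep functions are callables that capture their own arguments, and running the prep function before every warm-up and measured run (but outside the timed region) is an assumption about cBench's behaviour.

```cpp
#include <cassert>
#include <chrono>
#include <vector>

// Hypothetical, simplified stand-in for cBench::execute(); the callables
// capture their own arguments instead of taking variadic packs.
class MiniBench {
public:
    MiniBench(unsigned n_runs = 1000, unsigned n_warmups = 100)
        : n_runs(n_runs), n_warmups(n_warmups) {}

    template <class BenchFunc, class PrepFunc>
    void execute(BenchFunc const &bench_func, PrepFunc const &prep_func) {
        // Warm-up runs: executed, but their durations are not recorded
        for (unsigned i = 0; i < n_warmups; i++) {
            prep_func();
            bench_func();
        }
        times_ns.clear();
        for (unsigned i = 0; i < n_runs; i++) {
            prep_func(); // preparation happens outside the timed region
            auto start = std::chrono::steady_clock::now();
            bench_func();
            auto end = std::chrono::steady_clock::now();
            times_ns.push_back(std::chrono::duration<double, std::nano>(end - start).count());
        }
    }

    const std::vector<double> &getTimes() const { return times_ns; }

private:
    unsigned n_runs, n_warmups;
    std::vector<double> times_ns; // one entry per measured run, in nanoseconds
};
```

std::chrono::steady_clock is used because it is monotonic, so measured durations are not affected by wall-clock adjustments.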
-
double getAvg()
Returns the mean execution time; averaged over n_runs.
-
double getMin()
Returns the minimum execution time out of the n_runs recorded times.
-
double getMax()
Returns the maximum execution time out of the n_runs recorded times.
-
double getP25()
Returns the P25 execution time out of the n_runs recorded times.
-
double getP50()
Returns the P50 execution time out of the n_runs recorded times.
-
double getP75()
Returns the P75 execution time out of the n_runs recorded times.
-
double getP95()
Returns the P95 execution time out of the n_runs recorded times.
-
double getP99()
Returns the P99 execution time out of the n_runs recorded times.
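The statistics above can be derived from the recorded times along these lines. The nearest-rank percentile definition used here is an assumption; cBench may use a different (e.g., interpolating) definition, and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <stdexcept>
#include <vector>

// Every statistic below needs at least one recorded run
void requireSamples(const std::vector<double> &times) {
    if (times.empty()) throw std::invalid_argument("no recorded times");
}

double mean(const std::vector<double> &times) {
    requireSamples(times);
    return std::accumulate(times.begin(), times.end(), 0.0) / static_cast<double>(times.size());
}

double minimum(const std::vector<double> &times) {
    requireSamples(times);
    return *std::min_element(times.begin(), times.end());
}

double maximum(const std::vector<double> &times) {
    requireSamples(times);
    return *std::max_element(times.begin(), times.end());
}

// Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it
double percentile(std::vector<double> times, double p) {
    requireSamples(times);
    std::sort(times.begin(), times.end());
    auto rank = static_cast<std::size_t>(std::ceil(p / 100.0 * static_cast<double>(times.size())));
    rank = std::clamp<std::size_t>(rank, 1, times.size()); // rank is 1-based
    return times[rank - 1];
}
```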
-
class cConn
- #include <cConn.hpp>
Coyote connection class.
A utility class that allows clients to connect to a Coyote background service and submit tasks to be executed on the server side. The class supports both blocking and non-blocking tasks.
Public Functions
-
cConn(std::string sock_name)
Default constructor for local connections.
When called, this constructor creates a local connection to a Coyote service, as implemented in cService.hpp.
- Parameters:
sock_name – The name of the Coyote socket, as registered by the server
-
~cConn()
Default destructor; sends a request to close the connection.
-
bool isTaskCompleted(int32_t tid)
Checks if a task with the given ID is completed.
- Parameters:
tid – Task ID to check
- Returns:
true if the task is completed, false otherwise
-
template<typename ret, typename ...args>
inline ret task(int32_t fid, args... msg)
Submits a task to the Coyote service; blocking, i.e., waits until the task is completed.
Note
Implemented in the header file, since it is a template function.
Note
This function can throw a runtime_error if sending the payload to the server fails or if the server returns a non-zero code (e.g., timeouts, function not found, etc.)
Note
Users must ensure they pass the correct template arguments, matching the function signature on the server; the server simply deserializes a byte array into the target arguments, so if incorrect templates are passed, a wrong value may be returned.
- Parameters:
fid – Function ID of the request
msg – Variable number of arguments to be sent to the server
- Returns:
The return value of executed function
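To illustrate why the template arguments must match the server-side signature, the sketch below serializes each argument into its own raw byte buffer and reads it back. toBytes, serializeArgs and fromBytes are hypothetical helpers, and a memcpy-based raw-byte encoding of trivially copyable types is an assumption about the wire format. The point is that nothing in the bytes records the type, so reading them with the wrong type silently produces a wrong value.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical: copy the raw bytes of one argument into its own buffer
template <typename T>
std::vector<char> toBytes(const T &value) {
    static_assert(std::is_trivially_copyable_v<T>, "raw-byte encoding needs trivially copyable types");
    std::vector<char> buf(sizeof(T));
    std::memcpy(buf.data(), &value, sizeof(T));
    return buf;
}

// Hypothetical: one buffer per argument, in argument order
template <typename... Args>
std::vector<std::vector<char>> serializeArgs(const Args &...args) {
    return {toBytes(args)...};
}

// Hypothetical: reinterpret a buffer as T; nothing checks that T matches the sender's type
template <typename T>
T fromBytes(const std::vector<char> &buf) {
    static_assert(std::is_trivially_copyable_v<T>, "raw-byte decoding needs trivially copyable types");
    T value{};
    std::memcpy(&value, buf.data(), std::min(sizeof(T), buf.size()));
    return value;
}
```

For example, decoding the buffer of an int64_t argument with fromBytes<float> compiles without complaint but does not return the original value.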
-
template<typename ret, typename ...args>
inline int32_t iTask(int32_t fid, args... msg)
Submits a task to the Coyote service; non-blocking, i.e., returns immediately after sending the request.
Users should use isTaskCompleted(int32_t tid) to query the status of the task and, once it is complete, getTaskReturnValue(int32_t tid) to retrieve the return value.
Note
Implemented in the header file, since it is a template function.
Note
This function can throw a runtime_error if sending the payload to the server fails.
Note
Users must ensure they pass the correct template arguments, matching the function signature on the server; the server simply deserializes a byte array into the target arguments, so if incorrect templates are passed, a wrong value may be returned.
- Parameters:
fid – Function ID of the request
msg – Variable number of arguments to be sent to the server
- Returns:
Unique task ID
-
template<typename ret>
inline ret getTaskReturnValue(int32_t tid)
Obtains the task return value from the server.
This function should only be called after the task has been submitted and marked as completed, as checked with isTaskCompleted(int32_t tid). Otherwise, the returned value may be wrong or zero.
Note
Implemented in the header file, since it is a template function.
Note
This function can throw a runtime_error if the server returns a non-zero code for the task. This can happen if the requested function does not exist, if the task times out, if the arguments were serialized incorrectly, etc.
Note
Users must ensure they pass the correct template for ret, matching the function signature on the server; the server simply writes the raw bytes of the return value into the response, so if an incorrect template is passed, a wrong value may be returned.
- Parameters:
tid – Task ID, as obtained from iTask()
- Returns:
The return value of executed function
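The non-blocking flow (iTask, then isTaskCompleted, then getTaskReturnValue) relies on client-side bookkeeping of task IDs and results. A minimal sketch of such bookkeeping follows; TaskTracker is hypothetical, and in cConn the results would arrive over the socket and be recorded by the completion thread rather than through a direct call.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <map>
#include <mutex>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical client-side bookkeeping for non-blocking tasks
class TaskTracker {
public:
    // Reserves a unique task ID (cf. the atomic task_counter member)
    int32_t newTaskId() { return task_counter++; }

    // Records a result; in cConn this would be done by the completion thread
    void complete(int32_t tid, std::vector<char> ret_val) {
        std::lock_guard<std::mutex> guard(lock);
        results[tid] = std::move(ret_val);
    }

    bool isTaskCompleted(int32_t tid) {
        std::lock_guard<std::mutex> guard(lock);
        return results.count(tid) > 0;
    }

    // Reinterprets the stored bytes as T; T must match the server-side return type
    template <typename T>
    T getTaskReturnValue(int32_t tid) {
        std::lock_guard<std::mutex> guard(lock);
        auto it = results.find(tid);
        if (it == results.end() || it->second.size() < sizeof(T))
            throw std::runtime_error("task not completed or result too small");
        T value{};
        std::memcpy(&value, it->second.data(), sizeof(T));
        return value;
    }

private:
    std::atomic<int32_t> task_counter{0};
    std::mutex lock; // results is shared between the caller and the completion thread
    std::map<int32_t, std::vector<char>> results;
};
```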
Private Functions
-
void checkCompletedTasks()
Periodically checks for completed tasks and updates the task map.
Private Members
-
int sockfd = -1
Connection socket file descriptor.
-
std::atomic<int32_t> task_counter
An atomic variable; used for generating unique IDs for tasks.
-
std::thread completion_thread
A dedicated thread that periodically checks for completed tasks.
-
bool run_thread
Set to true when the completion thread is running.
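A dedicated thread that runs periodically until a flag is cleared can be sketched as follows. PeriodicWorker is hypothetical; it uses std::atomic<bool> for the run flag, an assumption made here because the flag is written and read by different threads, which is a data race for a plain bool.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical periodic background worker with a start/stop flag
class PeriodicWorker {
public:
    template <class Fn>
    void start(Fn fn, std::chrono::milliseconds period) {
        if (worker.joinable()) return; // already started
        running = true;
        worker = std::thread([this, fn, period] {
            while (running) {
                fn();
                std::this_thread::sleep_for(period);
            }
        });
    }

    // Clears the flag and waits for the current iteration to finish
    void stop() {
        running = false;
        if (worker.joinable()) worker.join();
    }

    ~PeriodicWorker() { stop(); }

private:
    std::atomic<bool> running{false};
    std::thread worker;
};
```

Joining in stop() (and in the destructor) guarantees the thread no longer touches shared state once the owning object goes away.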
-
template<typename ret, typename ...args>
class cFunc : public coyote::bFunc
- #include <cFunc.hpp>
User-defined functions.
This class is a template for user-defined functions. Each function is associated with a specific application bitstream and the corresponding software-side function to be executed. The functions are implemented using variadic templates to allow a variable number of parameters to be passed. This class is expected to be used in conjunction with Coyote services (cService) and requests (cReq). For an example, refer to Example 9 in examples/.
Note
Since this class is a template, it must be implemented in the header file; otherwise, compilation errors occur. An alternative is to use template specialization, but it is not applicable in this case, since the function arguments are arbitrary and not known ahead of time.
Public Functions
-
inline cFunc(int32_t fid, std::string app_bitstream, std::function<ret(cThread*, args...)> fn)
Default constructor; converts the app_bitstream path to an absolute path.
-
inline ~cFunc()
Default destructor.
-
inline virtual std::vector<char> run(cThread *coyote_thread, const std::vector<std::vector<char>> &x) override
Executes the function with the given arguments.
Note
The cService holds a list of functions registered with the background service. To do so, a base (non-templated) bFunc class is needed; otherwise, it becomes very hard to store the functions in a map. However, since the base class is not templated, this function must also be non-templated. Therefore, the run function takes the arguments as a vector of char buffers (std::vector<char>), and each char buffer is then unpacked into the corresponding argument. There are alternatives to this implementation (e.g., using std::any); however, char buffers provide one of the simplest solutions, with no reliance on complex data types. Additionally, when the function arguments are received by the server (in the processRequests() function), they are naturally written to a char buffer, since char buffers are contiguous, byte-addressable and easily cast to other data types.
- Parameters:
coyote_thread – Pointer to the cThread object
x – List of arguments passed as vector of char buffers, one buffer per argument
- Returns:
The result of the function execution, serialized into a char buffer
-
inline virtual std::pair<void*, uint32_t> getBitstreamPointer() const override
Returns a pointer to the bitstream memory and its size.
-
inline virtual void setBitstreamPointer(std::pair<void*, uint32_t> bitstream_pointer) override
Sets the bitstream memory for this function.
Once the bitstream has been loaded from disk to memory (using functions from cRcnfg), this function updates the bitstream_pointer variable with its address and size.
- Parameters:
bitstream_pointer – A pair containing the pointer to the bitstream memory and its size
-
inline virtual std::vector<size_t> getArgumentSizes() const override
Returns a vector of sizes, one for each of the function arguments.
Example: For args = {int64_t, float, bool}, the return is std::vector<size_t> = {8, 4, 1}
- Returns:
A vector of the sizes of the function arguments
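The per-argument sizes can be obtained with a single pack expansion over sizeof. SignatureSizes is a hypothetical helper showing the idea; note that sizeof(bool) is implementation-defined, so the {8, 4, 1} example above assumes the common case of a 1-byte bool.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: sizeof(Args)... expands to one entry per argument type
template <typename Ret, typename... Args>
struct SignatureSizes {
    static std::vector<std::size_t> argumentSizes() { return {sizeof(Args)...}; }
    static std::size_t returnSize() { return sizeof(Ret); }
};
```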
-
inline virtual size_t getReturnSize() const override
Similar to getArgumentSizes(), returns the size of the function's return value.
-
inline virtual int32_t getFid() const override
Getter: Function ID.
-
inline virtual std::string getBitstreamPath() const override
Getter: Bitstream path.
Private Functions
-
template<std::size_t... I>
inline std::tuple<args...> unpackArgs(const std::vector<std::vector<char>> &x, std::index_sequence<I...>)
Utility function; unpacks the arguments from a vector of char buffers into a tuple.
This function uses parameter pack expansion and a lambda function to unpack the arguments.
- Parameters:
x – Vector of char buffers, one for each argument
I – Index sequence for unpacking
- Returns:
A tuple containing the unpacked arguments
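A self-contained sketch of this unpacking step, combined with invoking the function and serializing its result, is shown below. Unpacker is hypothetical; it uses a static helper template instead of a lambda, assumes trivially copyable argument types and a non-void return type, and checks buffer sizes, which the real unpackArgs may or may not do.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <stdexcept>
#include <tuple>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical: unpack one char buffer per argument, call fn, serialize the result
template <typename Ret, typename... Args>
class Unpacker {
public:
    explicit Unpacker(std::function<Ret(Args...)> fn) : fn(std::move(fn)) {}

    std::vector<char> run(const std::vector<std::vector<char>> &x) {
        if (x.size() != sizeof...(Args)) throw std::invalid_argument("wrong number of arguments");
        std::tuple<Args...> tuple_args = unpackArgs(x, std::index_sequence_for<Args...>{});
        Ret result = std::apply(fn, tuple_args);
        std::vector<char> out(sizeof(Ret));
        std::memcpy(out.data(), &result, sizeof(Ret));
        return out;
    }

private:
    // Copies the raw bytes of one buffer into a value of type T
    template <typename T>
    static T unpackOne(const std::vector<char> &buf) {
        static_assert(std::is_trivially_copyable_v<T>, "raw-byte unpacking needs trivially copyable types");
        if (buf.size() != sizeof(T)) throw std::invalid_argument("argument size mismatch");
        T value{};
        std::memcpy(&value, buf.data(), sizeof(T));
        return value;
    }

    // Expands Args and I in lockstep: the I-th buffer becomes the I-th argument
    template <std::size_t... I>
    std::tuple<Args...> unpackArgs(const std::vector<std::vector<char>> &x, std::index_sequence<I...>) {
        return std::tuple<Args...>{unpackOne<Args>(x[I])...};
    }

    std::function<Ret(Args...)> fn;
};
```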
Private Members
-
int32_t fid
Unique function identifier.
-
std::string app_bitstream
Path to the application bitstream.
-
std::pair<void*, uint32_t> bitstream_pointer
Once the bitstream is loaded from disk, this variable holds the pointer to the bitstream memory and its size.
-
std::function<ret(cThread*, args...)> fn
Body of the software function to be executed.
Each function is a callable object, which by definition takes a cThread pointer (used to interact with the vFPGA) and a variable number of arguments that represent the parameters of the function. ret represents the return type of the function.
-
struct CoyoteAlloc
- #include <cOps.hpp>
Public Members
-
CoyoteAllocType alloc = {CoyoteAllocType::REG}
Type of allocated memory.
-
uint32_t size = {0}
Size of the allocated memory.
-
bool remote = {false}
Is this buffer used for remote operations?
-
uint32_t gpu_dev_id = {3}
GPU device ID (when alloc == CoyoteAllocType::GPU)
-
int32_t gpu_dmabuf_fd = {0}
File descriptor for the DMA-BUF used for GPU memory.
-
void *mem = {nullptr}
Pointer to the allocated memory; the struct keeps track of it so that it can be freed automatically after use.
-
class cRcnfg
- #include <cRcnfg.hpp>
Coyote reconfiguration class. Used for loading partial bitstreams into FPGA memory and triggering reconfiguration, both for shell reconfiguration (dynamic + user layer) and for app reconfiguration (2nd-level PR).
The most important function for users is reconfigureShell(std::string bitstream_path), which reconfigures the entire shell (dynamic + user layer); see its description below for details. In general, the reconfiguration flow, all of which is abstracted by reconfigureShell, is:
Allocate host-side kernel memory to hold the partial bitstream (void* getMem, which internally calls the Coyote driver)
Map the allocated memory to user space (using reconfig_mmap from the Coyote driver)
Load the bitstream from disk and store it in the allocated memory
Trigger reconfiguration by writing the buffer to FPGA memory and asserting the correct registers
Once complete, release the allocated memory (void freeMem, which internally calls the Coyote driver)
Subclassed by coyote::cSched
Public Functions
-
cRcnfg(unsigned int device = 0)
Default reconfiguration constructor.
- Parameters:
device – Target (physical) FPGA device to reconfigure. Only relevant for systems with multiple FPGA cards; e.g., reconfiguring the 2nd FPGA in a system would mean device = 1
-
~cRcnfg()
Default destructor; frees dynamically allocated bitstream_t memory, removes the mutex, etc.
-
void reconfigureShell(std::string bitstream_path)
Shell reconfiguration: loads the partial bitstream into internal memory and triggers reconfiguration.
- Parameters:
bitstream_path – Path to partial bitstream (typically shell_top.bin inside build/bitstreams)
-
void reconfigureApp(std::string bitstream_path, int vfid)
App reconfiguration: loads the partial bitstream into internal memory and triggers reconfiguration of the specified vFPGA.
- Parameters:
bitstream_path – Path to the partial app bitstream
vfid – vFPGA ID to be reconfigured
Protected Functions
-
uint8_t readByte(std::ifstream &fb)
Helper function; pops and returns the first byte from the input stream (fb).
-
bitstream_t readBitstream(std::ifstream &fb)
Reads a bitstream from a file stream, so that it can be used for reconfiguration.
- Parameters:
fb – File input stream, corresponding to a .bin file (most likely shell_top.bin)
- Returns:
bitstream, an in-memory object of type bitstream_t with a virtual address and length
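Loading a .bin file into host memory can be sketched with standard file streams. readBitstreamFile is hypothetical and returns a std::vector; the real readBitstream fills a bitstream_t backed by driver-allocated memory, which is not reproduced here.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for readBitstream(): reads the whole file, byte by byte, into host memory
std::vector<uint8_t> readBitstreamFile(const std::string &path) {
    std::ifstream fb(path, std::ios::binary);
    if (!fb) throw std::runtime_error("cannot open bitstream: " + path);
    return std::vector<uint8_t>(std::istreambuf_iterator<char>(fb),
                                std::istreambuf_iterator<char>());
}
```

Opening the stream in binary mode matters: bitstreams are raw bytes, and text mode could translate line endings on some platforms.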
-
void reconfigureBase(bitstream_t bitstream, uint32_t vfid = -1)
Base reconfiguration function, can be used to reconfigure the whole shell or individual vFPGAs.
- Parameters:
bitstream – Partial bitstream to use for reconfiguration, obtainable from readBitstream
vfid – (optional) vFPGA to reconfigure; default = -1, which reconfigures the entire shell
-
void *getMem(CoyoteAlloc &&alloc)
Allocates a buffer for storing partial bitstream.
- Parameters:
alloc – Allocation parameters; most importantly number of pages for the buffer
-
void freeMem(void *vaddr)
Releases dynamically allocated memory (allocated using getMem). Similar to the standard C/C++ free() function.
- Parameters:
vaddr – Virtual address of the buffer to be freed
Protected Attributes
-
int reconfig_dev_fd = {0}
-
pid_t pid
Host-side process ID.
-
uint32_t crid
Unique configuration ID.
-
boost::interprocess::named_mutex mlock
Global mutex, ensuring no two processes are simultaneously allocating bitstream memory on the same object.
-
std::unordered_map<void*, CoyoteAlloc> mapped_pages
Protected Static Attributes
-
static std::atomic_uint32_t crid_gen
A unique generator for crid.
-
class cSched : public coyote::cRcnfg
- #include <cSched.hpp>
Coyote run-time scheduler.
The scheduler is responsible for managing tasks and functions in Coyote. Users can submit arbitrary functions, defined through the cFunc class, to the scheduler. Each function contains a path to its app bitstream and the corresponding software-side code. Tasks can then be submitted to the scheduler (most commonly from cService, though code can also interact with the scheduler directly), which dispatches them according to a scheduling policy. Where needed, the scheduler also reconfigures the vFPGA with the bitstream required by the function. Currently, two scheduling policies are implemented: (1) first-come, first-served (FCFS) and (2) minimize reconfigurations. The second policy executes all pending tasks that use the currently loaded bitstream, avoiding the latency incurred by partial reconfiguration, before proceeding to tasks that require a different bitstream.
TODO:
Implement more scheduling policies, such as priority-based scheduling
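The two policies can be modelled with a small simulation that counts reconfigurations. PendingTask, ScheduleResult, pickNext and runAll are hypothetical, and the exact tie-breaking rules of cSched are an assumption: here, with reordering enabled, the oldest task matching the loaded bitstream runs first, and otherwise the oldest task overall runs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>
#include <vector>

// Hypothetical pending task: an ID plus the bitstream its function needs
struct PendingTask {
    int32_t tid;
    std::string bitstream;
};

struct ScheduleResult {
    std::vector<int32_t> order; // task IDs in execution order
    int reconfigurations = 0;   // number of bitstream switches
};

// FCFS picks the oldest task; with reorder, the oldest task matching the loaded bitstream wins
std::size_t pickNext(const std::deque<PendingTask> &queue, const std::string &current, bool reorder) {
    if (reorder) {
        for (std::size_t i = 0; i < queue.size(); i++)
            if (queue[i].bitstream == current) return i;
    }
    return 0;
}

ScheduleResult runAll(std::deque<PendingTask> queue, std::string current, bool reorder) {
    ScheduleResult result;
    while (!queue.empty()) {
        std::size_t i = pickNext(queue, current, reorder);
        if (queue[i].bitstream != current) {
            current = queue[i].bitstream; // a real scheduler would reconfigure the vFPGA here
            result.reconfigurations++;
        }
        result.order.push_back(queue[i].tid);
        queue.erase(queue.begin() + static_cast<std::ptrdiff_t>(i));
    }
    return result;
}
```

For the queue a, b, a, b (starting with no bitstream loaded), FCFS switches bitstreams four times, while reordering runs both "a" tasks and then both "b" tasks with only two switches.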
Public Functions
-
void start()
Starts the scheduler.
-
void stop()
Stops the scheduler and cleans up resources.
-
bool addTask(std::unique_ptr<cTask> task)
Adds a task to the list of tasks to be executed by the scheduler.
- Parameters:
task – Unique pointer to the cTask object representing the task
- Returns:
true if the task was added successfully, false if the task ID already exists or if the task is associated with a function that is not registered
-
bool isTaskCompleted(int32_t tid)
Checks if a task with a given ID is completed.
Note
If the task is not found, false is returned.
- Parameters:
tid – Task ID to check
- Returns:
true if the task is completed, false otherwise
-
cTask *getTask(int32_t tid)
Gets the task with the given ID.
- Parameters:
tid – Task ID to get
- Returns:
Pointer to the cTask object if found, nullptr otherwise
-
bool isFunctionRegistered(int32_t fid)
Checks if a function with the given ID is registered in the scheduler.
- Parameters:
fid – Function ID to check
-
bFunc *getFunction(int32_t fid)
Gets the function with the given ID.
- Parameters:
fid – Function ID to get
- Returns:
Pointer to the cFunc object if found, nullptr otherwise
-
inline int addFunction(std::unique_ptr<bFunc> fn)
Adds an arbitrary user function to the scheduler.
Each function is uniquely identified by its ID and holds information about the function: the path to its bitstream and the corresponding software-side code.
Note
Implemented inline in the header file.
- Parameters:
fn – Unique pointer to the bFunc object representing the function
- Returns:
0 if the function was added successfully, 1 if bitstream cannot be opened, 2 if the function ID already exists
Public Static Functions
-
static inline cSched *getInstance(int32_t vfid, uint32_t device = 0, bool reorder = true, std::string current_bitstream = "")
Creates an instance of the scheduler for a vFPGA.
If an instance already exists, returns the existing instance (“singleton” implementation).
Note
When partial reconfiguration is enabled, the parameter current_bitstream is ignored, since the scheduler automatically reconfigures the vFPGA with the bitstream corresponding to the function of the task being executed. However, when partial reconfiguration is not enabled, users must specify the current bitstream, so that all tasks that match the bitstream can still be executed without reconfiguration. This scenario could be beneficial for priority-based scheduling or in conjunction with the cService class, with one type of function but multiple clients connected to the service. See schedule(…) for more details.
- Parameters:
vfid – Virtual FPGA ID associated with the scheduler
device – Device number, for systems with multiple FPGA devices
reorder – If true, the scheduler will reorder tasks to minimize the number of reconfigurations.
current_bitstream – If a user has already loaded an application bitstream, it can be marked as the active one
- Returns:
Pointer to a cSched instance
Private Functions
-
cSched(int32_t vfid, uint32_t device, bool reorder, std::string current_bitstream)
Default constructor; private to ensure the class is implemented as a singleton.
-
bool taskChecker(int32_t tid)
A utility function that is reused throughout the scheduler. It performs the following checks:
Checks if the task ID is present in the task_id_map
Checks whether the task is non-NULL
As a common sanity check, ensures the ID in the map and the ID in the vector match; if not, something went seriously wrong when inserting the task into the list of tasks.
- Parameters:
tid – Task ID to check
- Returns:
true if the task is found and valid, false otherwise
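The three checks, together with the task lock, can be sketched on a simplified task store. Task and TaskStore are hypothetical stand-ins for cTask and the scheduler's tasks vector and task_id_map.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical minimal task type
struct Task {
    int32_t tid;
};

// Hypothetical task store: a vector of tasks plus a tid -> index map, guarded by one mutex
class TaskStore {
public:
    bool addTask(std::unique_ptr<Task> task) {
        std::lock_guard<std::mutex> guard(tlock);
        if (!task || task_id_map.count(task->tid)) return false; // null or duplicate ID
        task_id_map[task->tid] = static_cast<int>(tasks.size());
        tasks.push_back(std::move(task));
        return true;
    }

    // Check 1: ID present in the map; check 2: task non-null; check 3: IDs consistent
    bool taskChecker(int32_t tid) {
        std::lock_guard<std::mutex> guard(tlock);
        auto it = task_id_map.find(tid);
        if (it == task_id_map.end()) return false;
        int idx = it->second;
        if (idx < 0 || idx >= static_cast<int>(tasks.size())) return false;
        const auto &task = tasks[static_cast<std::size_t>(idx)];
        if (!task) return false;
        return task->tid == tid;
    }

private:
    std::mutex tlock;
    std::vector<std::unique_ptr<Task>> tasks;
    std::map<int32_t, int> task_id_map;
};
```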
-
void schedule()
The main function of the scheduler.
It iterates through the list of tasks and executes the outstanding ones. The scheduling policy depends on the reorder flag passed to the class constructor. This function will also reconfigure the vFPGA bitstream, if needed.
Private Members
-
int32_t vfid
vFPGA ID associated with the scheduler
-
bool reorder
Allow reordering of tasks to minimize number of reconfigurations.
-
fpgaCnfg fcnfg
Shell configuration as set before hardware synthesis in CMake.
-
std::map<int32_t, std::unique_ptr<bFunc>> functions
A map of the functions loaded to the scheduler, each identified by a unique function ID.
-
std::map<int32_t, int> task_id_map
A simple map from task ID to its position in the tasks vector; simply used for faster lookups of individual tasks.
-
std::mutex tlock
Task lock; there are multiple concurrent threads that access the tasks vector, and since vectors can relocate data (e.g., when adding new elements), this can lead to undefined behaviour. E.g., the scheduler thread iterates through the tasks vector, while the addTask() function could be called in the meantime, which would cause the vector to be resized and may cause the iterators to become invalid.
-
std::thread scheduler_thread
A dedicated thread that runs the scheduler.
-
bool scheduler_running
A flag indicating whether the scheduler thread is running.
-
std::string current_bitstream
The currently loaded bitstream.
-
class cService
- #include <cService.hpp>
Coyote background service.
This class implements the Coyote background service, which runs on a server node and can execute pre-defined Coyote functions. On the client side, users can connect to this service through the helper class cConn and submit requests to the loaded functions. The service automatically reconfigures the vFPGA with the correct bitstream. Requests can be local or remote.
TODO:
Remote connections
Note
There is currently a bug in terminating the service through signals. Since the signal handler is static and limited in its parameters, it is not aware of which instance should be terminated. Therefore, for now, the signal handler terminates all instances of the service. Users should only terminate the service once all vFPGAs have finished processing requests.
Public Functions
-
void start()
Starts the service.
This function initializes the daemon, sets up the socket for communication, and starts the scheduler thread to handle incoming requests. It will also accept connections from clients and register them.
-
inline int addFunction(std::unique_ptr<bFunc> fn)
Adds an arbitrary user function to the service.
Note
Implemented inline in the header file.
- Parameters:
fn – Unique pointer to the bFunc object representing the function
- Returns:
0 if the function was added successfully, 1 if bitstream cannot be opened, 2 if the function ID already exists
Public Static Functions
-
static inline cService *getInstance(std::string name, bool remote, int32_t vfid, uint32_t device = 0, bool reorder = true, uint16_t port = DEF_PORT)
Creates an instance of the service for a vFPGA.
If an instance already exists, returns the existing instance (“singleton” implementation).
- Parameters:
name – Unique name for the service
remote – Local or remote service
vfid – Virtual FPGA ID associated with the service
device – Device number, for systems with multiple FPGA devices
reorder – Allow the scheduler to reorder tasks, to minimize reconfigurations
port – Port for remote connections
Private Functions
-
cService(std::string name, bool remote, int32_t vfid, uint32_t device, bool reorder, uint16_t port)
Default constructor; private to ensure the class is implemented as a singleton.
-
void daemonSigHandler(int signum)
Handles signals sent to the background service.
Note
Currently only SIGTERM is handled, which is used to gracefully terminate the service and clean up resources. Other signals are ignored.
-
void initDaemon()
Initializes the background daemon for this service.
-
void initSocket()
Initializes the socket for connections to this service, either local or remote.
-
void cleanConns()
Periodically iterates over stale connections and releases the resources (e.g., threads) they hold.
-
void acceptConnectionLocal()
Accepts a local connection (IPC) to this service.
-
void acceptConnectionRemote()
Accepts a connection from a remote client to this service.
-
void processRequests(int connfd)
Processes client requests in a dedicated thread.
This function continuously loops to accept incoming requests for a connected client. It stores the requests in a list and can close the connection if the client sends a close request.
- Parameters:
connfd – The connection file descriptor for the client
-
void sendResponses(int connfd)
Send client responses in a dedicated thread.
This function continuously loops to check whether a client task has been completed and sends the response back to the client.
- Parameters:
connfd – The connection file descriptor for the client
Private Members
-
std::string service_id
Unique service ID, derived from the name.
-
std::string socket_name
Name of the socket for communication.
-
bool is_running
Boolean flag indicating whether the service is running; prevents double-starting daemon.
-
int sockfd
Socket file descriptor for communication.
-
bool remote
Whether the service receives requests from a remote node or locally.
-
int32_t vfid
vFPGA ID associated with the service
-
uint32_t device
Device number, for systems with multiple FPGA devices.
-
uint16_t port
Port for remote connections.
-
std::map<int, std::unique_ptr<cThread>> coyote_threads
A map of the connected clients and their corresponding Coyote threads which are used for executing the functions.
-
std::map<int, std::pair<std::thread, std::thread>> connection_threads
Dedicated threads which process the requests for each connected client; one for incoming requests and one for writing the results back.
-
cSched *scheduler
Scheduler instance; handles the execution of tasks as well as reconfiguration, where required.
-
std::atomic<int32_t> task_counter
An atomic variable; used for generating unique IDs for tasks on the server side.
-
std::map<int, std::vector<std::pair<int32_t, int32_t>>> tasks
A map of client-submitted tasks. When a client submits a task, it attaches an ID which is written back with the task result, so that the client can link the result to the task (see cConn::checkCompletedTasks() for details). However, client-submitted task IDs are not guaranteed to be globally unique (multiple clients may use the same ID), so for each task, the cService also stores a server-generated task ID.
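Per-connection pairing of client and server task IDs can be sketched as follows. TaskIdMapper is hypothetical, and storing each pair as (client ID, server ID) is an assumption about the order used in the tasks map.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <map>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical: per connection, pairs of (client-supplied ID, server-generated ID)
class TaskIdMapper {
public:
    // Assigns a globally unique server ID to a (possibly colliding) client ID
    int32_t registerTask(int connfd, int32_t client_tid) {
        int32_t server_tid = task_counter++;
        std::lock_guard<std::mutex> guard(lock);
        tasks[connfd].emplace_back(client_tid, server_tid);
        return server_tid;
    }

    // Returns the client ID to send back with the result, or -1 if unknown
    int32_t clientTidFor(int connfd, int32_t server_tid) {
        std::lock_guard<std::mutex> guard(lock);
        auto it = tasks.find(connfd);
        if (it == tasks.end()) return -1;
        for (const auto &entry : it->second)
            if (entry.second == server_tid) return entry.first;
        return -1;
    }

private:
    std::atomic<int32_t> task_counter{0};
    std::mutex lock;
    std::map<int, std::vector<std::pair<int32_t, int32_t>>> tasks;
};
```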
-
std::map<int, std::unique_ptr<std::mutex>> task_locks
Task map locks. Since the map is written by the processRequests() thread and read by the sendResponses() thread, thread safety must be ensured when accessing the map.
-
std::map<int, bool> conns_to_clean
A list of connections to be cleaned up; if the bool value is true, the connection is stale and should be cleaned up; false indicates it has already been cleaned up.
-
std::thread cleanup_thread
Dedicated thread that periodically iterates over conns_to_clean and releases stale connection threads and resources.
-
bool run_cleanup_thread
A boolean flag indicating whether the clean-up thread is running.
Private Static Functions
-
static void sigHandler(int signum)
A wrapper around daemonSigHandler, needed because the signal handler must be static.
Private Static Attributes
-
static std::map<std::string, cService*> services
Instances of the services.
We only allow one instance of the service per vFPGA on a single device, to ensure that multiple services do not run in parallel on the same vFPGA, which could lead to multiple reconfigurations, execution conflicts, etc. The key of the map is the device ID concatenated with the vFPGA ID.
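A per-vFPGA "singleton" keyed by device and vFPGA ID can be sketched as follows. Service is hypothetical; it owns instances through std::unique_ptr rather than raw pointers, and the key format (device ID and vFPGA ID joined with an underscore) is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical per-vFPGA singleton: at most one instance per (device, vFPGA) key
class Service {
public:
    static Service *getInstance(const std::string &name, int32_t vfid, uint32_t device = 0) {
        static std::mutex lock;
        std::lock_guard<std::mutex> guard(lock);
        std::string key = std::to_string(device) + "_" + std::to_string(vfid);
        auto it = instances().find(key);
        if (it == instances().end())
            it = instances().emplace(key, std::unique_ptr<Service>(new Service(name, vfid, device))).first;
        return it->second.get(); // existing instance is returned for a repeated key
    }

    const std::string &getName() const { return name; }
    int32_t getVfid() const { return vfid; }
    uint32_t getDevice() const { return device; }

private:
    // Private constructor: instances can only be created through getInstance()
    Service(std::string name, int32_t vfid, uint32_t device)
        : name(std::move(name)), vfid(vfid), device(device) {}

    static std::map<std::string, std::unique_ptr<Service>> &instances() {
        static std::map<std::string, std::unique_ptr<Service>> map;
        return map;
    }

    std::string name;
    int32_t vfid;
    uint32_t device;
};
```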
-
class cTask
- #include <cTask.hpp>
A task represents a single request to execute a function.
This class encapsulates all the necessary logic and metadata to execute a Coyote function (cFunc). For example, the Coyote scheduler (cSched) keeps track of all the tasks and based on the scheduling policy, decides the next one to be executed.
Public Functions
-
cTask(int32_t tid, int32_t fid, size_t ret_val_size, cThread *cthread = nullptr, std::vector<std::vector<char>> fn_args = {})
Default constructor; sets the unique task ID, the associated function and the arguments, and initializes the other parameters to their default values.
-
int32_t getTid() const
Getter: Task ID.
-
int32_t getFid() const
Getter: Function ID.
-
bool isCompleted() const
Checks if the task is completed.
-
void setCompleted(bool val)
Sets the value of is_completed.
-
std::vector<std::vector<char>> getArgs() const
Getter: Function arguments.
-
std::vector<char> getRetVal() const
Getter: Function return value.
-
void setRetVal(const std::vector<char> retval)
Setter: Function return value.
-
size_t getRetValSize() const
Getter: Function return value size.
-
int32_t getRetCode() const
Getter: Function return code.
-
void setRetCode(int32_t retcode)
Setter: Function return code.
Private Members
-
int32_t tid
Unique task identifier.
-
int32_t fid
ID of the function to be executed for this task.
-
bool is_completed
Set to true when the task is completed.
-
cThread *cthread
Pointer to the cThread that executes this task (it is passed to the cFunc as the first argument)
-
std::vector<std::vector<char>> fn_args
Arguments for the function to be executed; see cFunc for details on why a vector of char buffers is used.
-
std::vector<char> ret_val
Function return value; see cFunc for details on why a char buffer is used.
-
size_t ret_val_size
Size of the function return value; primarily a utility value used for deserializing the char buffer.
-
int32_t ret_code
Function return code; a non-zero value indicates an error in the function execution.
-
class cThread
- #include <cThread.hpp>
The cThread class is the core component of Coyote for interacting with vFPGAs.
This class provides methods for memory management, data transfer operations, and synchronization with the vFPGA device. It also handles user interrupts and out-of-band set-up for RDMA operations. It abstracts the interaction with the vfpga_device character device in the driver, providing a high-level interface for Coyote operations.
Public Functions
-
cThread(int32_t vfid, pid_t hpid, uint32_t device = 0, void (*uisr)(int) = nullptr)
Default constructor for the cThread.
- Parameters:
vfid – Virtual FPGA ID
hpid – Host process ID
device – Device number, for systems with multiple FPGA devices
uisr – User interrupt (notifications) service routine, called when an interrupt from the vFPGA is received
-
~cThread()
Default destructor for the cThread.
Cleans up the resources used by the cThread, including memory and file descriptors.
-
void userMap(void *vaddr, uint32_t len)
Maps a buffer to the vFPGA's TLB.
- Parameters:
vaddr – Virtual address of the buffer
len – Length of the buffer, in bytes
-
void userUnmap(void *vaddr)
Unmaps a buffer from the vFPGA’s TLB.
- Parameters:
vaddr – Virtual address of the buffer
-
void *getMem(CoyoteAlloc &&alloc)
Allocates memory for this cThread and maps it into the vFPGA’s TLB.
- Parameters:
alloc – CoyoteAlloc object containing the allocation parameters, including size, type (e.g., hugepage, GPU) etc.
- Returns:
Pointer to the allocated memory
-
void freeMem(void *vaddr)
Frees and unmaps previously allocated memory.
- Parameters:
vaddr – Virtual address of the buffer to be freed
-
void setCSR(uint64_t val, uint32_t offs)
Sets a control register in the vFPGA at the specified offset.
- Parameters:
val – Register value to be set
offs – Offset of the control register to be set
-
uint64_t getCSR(uint32_t offs) const
Reads from a register in the vFPGA at the specified offset.
- Parameters:
offs – Offset of the register to be read
- Returns:
Value of the register at the specified offset
-
void invoke(CoyoteOper oper, syncSg sg)
Invokes a Coyote sync or offload operation with the specified scatter-gather list (sg)
Note
Syncs and offloads are blocking (synchronous) by design
- Parameters:
oper – Operation to be invoked; in this case must be either CoyoteOper::LOCAL_SYNC or CoyoteOper::LOCAL_OFFLOAD
sg – Scatter-gather entry, specifying the memory address and length for the operation
-
void invoke(CoyoteOper oper, localSg sg, bool last = true)
Invokes a one-sided local Coyote operation with the specified scatter-gather list (sg)
Note
Local operations are non-blocking (asynchronous) by design, so users should poll for completion using checkCompleted()
Note
Whenever last is passed as true, the completion counter for the operation is incremented by 1 and an acknowledgement is sent on the hardware-side cq_* interface of the vFPGA with ack_t.host = 1; otherwise it is not
- Parameters:
oper – Operation to be invoked; in this case must be either CoyoteOper::LOCAL_READ or CoyoteOper::LOCAL_WRITE
sg – Scatter-gather entry, specifying the memory address, length and stream for the operation
last – Indicates whether this is the last operation in a sequence (default: true)
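Because local operations are asynchronous, a typical pattern is to issue several invocations and then poll checkCompleted() until the counter reaches the number of invocations that passed last = true. The sketch below reproduces only that counting and polling logic with a hypothetical `MockThread`; it does not talk to Coyote or an FPGA. It drops the `oper` argument of the real checkCompleted() and simulates hardware progress by retiring one request per poll, both of which are assumptions made to keep it self-contained.

```cpp
#include <cstdint>
#include <deque>

// Hypothetical stand-in for cThread that models only the completion counter.
// There is no FPGA here: "hardware progress" is simulated by retiring one
// outstanding request every time the counter is polled.
class MockThread {
public:
    // Models invoke(CoyoteOper::LOCAL_READ, sg, last): the call only queues the
    // request and returns immediately, like the real non-blocking invoke().
    void invokeLocalRead(bool last = true) { pending.push_back(last); }

    // Models checkCompleted(): cumulative number of completed requests that were
    // issued with last == true. Non-const only because polling drives the mock.
    uint32_t checkCompleted() {
        if (!pending.empty()) {
            if (pending.front()) completed++;
            pending.pop_front();
        }
        return completed;
    }

    // Models clearCompleted(): resets the counter.
    void clearCompleted() { completed = 0; }

private:
    std::deque<bool> pending;  // outstanding requests and their `last` flags
    uint32_t completed = 0;
};

// Issues n local reads and busy-polls until they are all accounted for.
// If every read passes last = true, n completions are expected; if only the
// final read does, a single completion marks the end of the whole sequence.
uint32_t issueAndWait(MockThread &t, uint32_t n, bool last_on_every_read) {
    t.clearCompleted();  // the counter is cumulative, so start from zero
    if (n == 0) return 0;
    for (uint32_t i = 0; i < n; i++) {
        t.invokeLocalRead(last_on_every_read || i == n - 1);
    }
    const uint32_t expected = last_on_every_read ? n : 1;
    uint32_t done = 0;
    while ((done = t.checkCompleted()) < expected) {
        // spin, as one would with cThread::checkCompleted(CoyoteOper::LOCAL_READ)
    }
    return done;
}
```

Setting last only on the final entry of a batch trades per-request visibility for fewer acknowledgements; with the mock, `issueAndWait(t, 4, true)` returns 4 while `issueAndWait(t, 4, false)` returns 1.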
-
void invoke(CoyoteOper oper, localSg src_sg, localSg dst_sg, bool last = true)
Invokes a two-sided local Coyote operation with the specified scatter-gather list (sg)
Note
Local operations are non-blocking (asynchronous) by design, so users should poll for completion using checkCompleted()
Note
Whenever last is passed as true, the completion counter for the operation is incremented by 1 and an acknowledgement is sent on the hardware-side cq_* interface of the vFPGA with ack_t.host = 1; otherwise it is not
- Parameters:
oper – Operation to be invoked; in this case must be CoyoteOper::LOCAL_TRANSFER
src_sg – Source scatter-gather entry, specifying the memory address, length and stream
dst_sg – Destination scatter-gather entry, specifying the memory address, length and stream
last – Indicates whether this is the last operation in a sequence (default: true)
-
void invoke(CoyoteOper oper, rdmaSg sg, bool last = true)
Invokes an RDMA Coyote operation with the specified scatter-gather list (sg)
Note
Remote operations are non-blocking (asynchronous) by design, so users should poll for completion using checkCompleted()
Note
Whenever last is passed as true, the completion counter for the operation is incremented by 1 and an acknowledgement is sent on the hardware-side cq_* interface of the vFPGA with ack_t.host = 1; otherwise it is not
- Parameters:
oper – Operation to be invoked; in this case must be CoyoteOper::REMOTE_RDMA_WRITE or CoyoteOper::REMOTE_RDMA_READ
sg – Scatter-gather entry, specifying the RDMA operation parameters
last – Indicates whether this is the last operation in a sequence (default: true)
-
void invoke(CoyoteOper oper, tcpSg sg, bool last = true)
Invokes a TCP Coyote operation with the specified scatter-gather list (sg)
Note
TCP operations aren’t fully stable in Coyote 0.2.1 and will be updated in the future
- Parameters:
oper – Operation to be invoked; in this case must be CoyoteOper::REMOTE_TCP_SEND
sg – Scatter-gather entry, specifying the TCP operation parameters
last – Indicates whether this is the last operation in a sequence (default: true)
-
uint32_t checkCompleted(CoyoteOper oper) const
Returns the number of completed operations for a given Coyote operation type.
- Parameters:
oper – Operation to be queried
- Returns:
Cumulative number of completed operations for the specified operation type, since the last clearCompleted() call
-
void clearCompleted()
Clears all the completion counters (for all operations)
-
void connSync(bool client)
Synchronizes the connection between the client and server.
- Parameters:
client – If true, this cThread acts as a client; otherwise, it acts as a server
-
void *initRDMA(uint32_t buffer_size, uint16_t port, const char *server_address = nullptr)
Sets up the cThread for RDMA operations.
This function creates an out-of-band connection to the server, which is used to exchange the queue pair (QP) between the nodes. Additionally, it allocates a buffer for the RDMA operations and returns a pointer to the allocated buffer.
- Parameters:
buffer_size – Size of the buffer to be allocated for RDMA operations
port – Port number to be used for the out-of-band connection
server_address – Optional server address to connect to; if not provided, this cThread acts as the server
-
void closeConn()
Opposite of initRDMA; releases the out-of-band connection which was used to exchange QP.
-
void lock()
Locks the vFPGA for exclusive access by this cThread.
Locking ensures no other operation (even from other processes) is performed on the vFPGA concurrently. However, this may not always be desirable, as shown in Example 8 (multi-threading). This method is generally not required; it is mainly needed when multiple software processes or threads target the same vFPGA simultaneously, which could otherwise lead to undefined behaviour.
-
int32_t getVfid() const
Getter: vFPGA ID (vfid)
-
int32_t getCtid() const
Getter: Coyote thread ID (ctid)
-
pid_t getHpid() const
Getter: Host process ID (hpid)
Protected Functions
-
void mmapFpga()
Utility function, memory mapping all the vFPGA control registers and writeback regions.
-
void munmapFpga()
Utility function, unmapping all the vFPGA control registers and writeback regions.
-
void postCmd(uint64_t offs_3, uint64_t offs_2, uint64_t offs_1, uint64_t offs_0)
Posts a DMA command to the vFPGA.
This function triggers a DMA command by writing the provided offsets to the appropriate control registers.
- Parameters:
offs_3 – Destination address
offs_2 – Destination control signals (e.g., size, offset, stream etc.)
offs_1 – Source address
offs_0 – Source control signals (e.g., size, offset, stream etc.)
-
void sendAck(uint32_t ack)
Sends an ack to the connected remote node via the out-of-band channel.
Note
Utility function, primarily used for syncing clients and servers between benchmarks and operations
- Parameters:
ack – Acknowledgment value to be sent
-
uint32_t readAck()
Reads an ack from the connected remote node via the out-of-band channel.
Note
Utility function, primarily used for syncing clients and servers between benchmarks and operations. This function works in conjunction with sendAck() to synchronize operations between the client and server.
- Returns:
Acknowledgment value received from the remote node
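sendAck() and readAck() exchange a single uint32_t over the out-of-band connection. A minimal sketch of such an exchange, assuming the ack travels as 4 raw bytes with no byte-order conversion (so both nodes are assumed to share endianness); `sendAckOn` and `readAckOn` are hypothetical helpers taking the socket explicitly, and in the test a local socketpair stands in for connfd.

```cpp
#include <cstdint>
#include <stdexcept>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: writes exactly one uint32_t ack, looping over short writes.
void sendAckOn(int fd, uint32_t ack) {
    const char *p = reinterpret_cast<const char *>(&ack);
    size_t left = sizeof(ack);
    while (left > 0) {
        ssize_t n = ::write(fd, p, left);
        if (n <= 0) throw std::runtime_error("failed to send ack");
        p += n;
        left -= static_cast<size_t>(n);
    }
}

// Hypothetical helper: blocks until one full uint32_t ack has been read.
uint32_t readAckOn(int fd) {
    uint32_t ack = 0;
    char *p = reinterpret_cast<char *>(&ack);
    size_t left = sizeof(ack);
    while (left > 0) {
        ssize_t n = ::read(fd, p, left);
        if (n <= 0) throw std::runtime_error("failed to read ack (peer closed?)");
        p += n;
        left -= static_cast<size_t>(n);
    }
    return ack;
}
```

A simple two-node barrier can be built from these: one side sends and then reads, while the other reads and then sends. Whether connSync() is implemented exactly this way is not stated in this reference. The sketch also treats EINTR as a failure, which a production version would retry.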
-
void doArpLookup(uint32_t ip_addr)
Writes an IP address to a config register so it can be used for ARP lookup.
- Parameters:
ip_addr – IP address to be looked up
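doArpLookup() takes the IPv4 address as a uint32_t. A sketch of converting a dotted-quad string into such a value with the POSIX inet_pton call, assuming the register expects host byte order (the byte order Coyote actually expects is not stated here); `ipToUint32` is a hypothetical helper.

```cpp
#include <arpa/inet.h>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: "10.0.0.1" -> 0x0A000001, in host byte order (assumption).
uint32_t ipToUint32(const std::string &ip) {
    in_addr addr{};
    if (inet_pton(AF_INET, ip.c_str(), &addr) != 1) {
        throw std::invalid_argument("not a valid IPv4 address: " + ip);
    }
    // inet_pton stores the address in network byte order; convert to host order.
    return ntohl(addr.s_addr);
}
```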
-
void writeQpContext(uint32_t port)
Writes the exchanged QP information to the vFPGA config registers.
- Parameters:
port – Port number to be written together with the QP context
Protected Attributes
-
int32_t fd = {0}
vFPGA device file descriptor
-
int32_t vfid = {-1}
vFPGA virtual ID
-
int32_t ctid = {-1}
Coyote thread ID.
-
pid_t hpid = {0}
Host process ID.
-
fpgaCnfg fcnfg
Shell configuration, as set by the user in CMake config.
-
std::unique_ptr<ibvQp> qpair
RDMA queue pair.
-
uint32_t cmd_cnt = {0}
Number of data transfer commands sent to the vFPGA.
-
int32_t efd = {-1}
User interrupt file descriptor.
-
int32_t terminate_efd = {-1}
Termination event file descriptor for stopping the user interrupt thread.
-
std::thread event_thread
Dedicated thread for handling user interrupts.
-
volatile uint64_t *cnfg_reg = {0}
vFPGA config registers, as implemented in cnfg_slave_avx.sv (if AVX is enabled) or cnfg_slave.sv (if AVX is disabled); used mainly for starting DMA commands
-
volatile uint64_t *ctrl_reg = {0}
User-defined control registers, which can be parsed using axi_ctrl in the vFPGA.
-
volatile uint32_t *wback = {0}
Pointer to writeback region, if enabled.
-
std::unordered_map<void*, CoyoteAlloc> mapped_pages
A map of all the pages that have been allocated and mapped for this thread.
-
int connfd = {-1}
Out-of-band connection file descriptor to a remote node. This connection is primarily used for exchanging QPs and syncing (barriers) between operations.
-
int sockfd = {-1}
Out-of-band socket file descriptor for the cThread. This socket is initially used to establish an out-of-band connection (connfd) to a remote node for exchanging QP information and for sending/receiving acknowledgments.
-
bool is_connected
Set to true if there is an active out-of-band connection to a remote node for this cThread.
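The efd, terminate_efd and event_thread attributes suggest a dedicated thread that waits for user interrupts and forwards them to the uisr callback until told to stop. The sketch below models that pattern with Linux eventfd and epoll; it is inferred from the attribute descriptions only. `InterruptDispatcher` and `raise()` are hypothetical names, and `raise()` merely stands in for the driver signalling an interrupt, which the real cThread registers through the driver instead.

```cpp
#include <cstdint>
#include <stdexcept>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <thread>
#include <unistd.h>

// Hypothetical interrupt-dispatch loop: a dedicated thread waits on two eventfds,
// passing values from `efd` to the user callback and exiting once
// `terminate_efd` is signalled.
class InterruptDispatcher {
public:
    explicit InterruptDispatcher(void (*handler)(int)) : uisr(handler) {
        efd = eventfd(0, 0);
        terminate_efd = eventfd(0, 0);
        if (efd < 0 || terminate_efd < 0) throw std::runtime_error("eventfd failed");
        event_thread = std::thread([this] { loop(); });
    }

    ~InterruptDispatcher() {
        uint64_t one = 1;
        (void)::write(terminate_efd, &one, sizeof(one));  // wake the thread so it exits
        event_thread.join();
        ::close(efd);
        ::close(terminate_efd);
    }

    // Stand-in for the driver raising a user interrupt. Note that eventfd adds up
    // values written before they are read, so two raises may arrive as one sum.
    void raise(uint64_t value) { (void)::write(efd, &value, sizeof(value)); }

private:
    void loop() {
        int ep = epoll_create1(0);
        epoll_event ev{};
        ev.events = EPOLLIN;
        ev.data.fd = efd;
        epoll_ctl(ep, EPOLL_CTL_ADD, efd, &ev);
        ev.data.fd = terminate_efd;
        epoll_ctl(ep, EPOLL_CTL_ADD, terminate_efd, &ev);

        bool running = true;
        while (running) {
            epoll_event events[2];
            int n = epoll_wait(ep, events, 2, -1);  // n < 0 (e.g. EINTR): just wait again
            for (int i = 0; i < n; i++) {
                uint64_t value = 0;
                (void)::read(events[i].data.fd, &value, sizeof(value));
                if (events[i].data.fd == terminate_efd) {
                    running = false;  // finish this batch, then leave the loop
                } else if (uisr) {
                    uisr(static_cast<int>(value));
                }
            }
        }
        ::close(ep);
    }

    void (*uisr)(int);
    int efd = -1;
    int terminate_efd = -1;
    std::thread event_thread;
};
```

Using a separate termination eventfd lets the destructor wake a thread blocked in epoll_wait without relying on signals or timeouts.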
-
struct localSg
- #include <cOps.hpp>
Scatter-gather entry for local operations (LOCAL_READ, LOCAL_WRITE, LOCAL_TRANSFER)
-
struct rdmaSg
- #include <cOps.hpp>
Scatter-gather entry for RDMA operations (REMOTE_RDMA_READ, REMOTE_RDMA_WRITE). NOTE: There are no fields for the source/destination addresses, since these are defined when exchanging queue pair information; each cThread holds exactly one queue pair, so the source and destination addresses are always the same.
Public Members
-
uint64_t local_offs = {0}
Offset from the local buffer address; in case the buffer to be sent doesn’t need to start from the exchanged virtual address.
-
uint32_t local_stream = {STRM_HOST}
Source buffer stream: HOST or CARD.
-
uint32_t local_dest = {0}
Target source stream in the vFPGA; a value of i will pull the data for the RDMA operation from axis_(host|card)_recv[i] in the vFPGA.
-
uint64_t remote_offs = {0}
Offset from the remote buffer address; the remote counterpart of local_offs.
-
uint32_t remote_dest = {0}
Target destination stream; a value of i will write data to axis_(host|card)_send[i] in the remote vFPGA.
-
uint32_t len = {0}
Length of the RDMA transfer, in bytes.
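Since rdmaSg carries offsets rather than addresses, a large transfer can be split into several entries by advancing local_offs and remote_offs together. The sketch below uses a local mirror of the documented fields (`RdmaSgSketch`) instead of the real header, and `STRM_HOST_SKETCH` is a hypothetical placeholder for STRM_HOST, whose real value lives in cDefs.hpp.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical placeholder for STRM_HOST (real value defined in cDefs.hpp).
constexpr uint32_t STRM_HOST_SKETCH = 0;

// Local mirror of the documented rdmaSg fields; not the real struct.
struct RdmaSgSketch {
    uint64_t local_offs = 0;
    uint32_t local_stream = STRM_HOST_SKETCH;
    uint32_t local_dest = 0;
    uint64_t remote_offs = 0;
    uint32_t remote_dest = 0;
    uint32_t len = 0;
};

// Splits one transfer of `total` bytes into entries of at most `chunk` bytes.
// Local and remote offsets advance together, so each chunk lands at the
// matching position in the remote buffer.
std::vector<RdmaSgSketch> splitTransfer(uint64_t total, uint32_t chunk) {
    std::vector<RdmaSgSketch> sgs;
    if (chunk == 0) return sgs;  // avoid an infinite loop
    for (uint64_t offs = 0; offs < total; offs += chunk) {
        RdmaSgSketch sg;
        sg.local_offs = offs;
        sg.remote_offs = offs;
        sg.len = static_cast<uint32_t>(std::min<uint64_t>(chunk, total - offs));
        sgs.push_back(sg);
    }
    return sgs;
}
```

For 10000 bytes in 4096-byte chunks this yields three entries at offsets 0, 4096 and 8192, the last one 1808 bytes long. With the real types, each entry would presumably be passed to invoke(CoyoteOper::REMOTE_RDMA_WRITE, sg, last), setting last only on the final entry if a single completion per batch is enough.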
-
namespace coyote
Typedefs
-
using bitstream_t = std::pair<void*, uint32_t>
Bitstream alias: pointer to buffer holding its contents and its length.
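A bitstream_t only borrows its buffer, so whoever creates it must keep the underlying memory alive. A sketch of loading a bitstream file into a caller-owned buffer and wrapping it in the alias; `loadBitstream` is a hypothetical helper, and the real code may place bitstreams in PRM memory rather than in a `std::vector`.

```cpp
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

using bitstream_t = std::pair<void*, uint32_t>;

// Hypothetical helper: reads a bitstream file into `storage` and returns a
// bitstream_t view of it. `storage` must outlive the returned pair.
bitstream_t loadBitstream(const std::string &path, std::vector<char> &storage) {
    std::ifstream in(path, std::ios::binary | std::ios::ate);  // open at end to get size
    if (!in) throw std::runtime_error("cannot open " + path);
    const std::streamsize size = in.tellg();
    if (size < 0) throw std::runtime_error("cannot determine size of " + path);
    in.seekg(0, std::ios::beg);
    storage.resize(static_cast<size_t>(size));
    if (size > 0 && !in.read(storage.data(), size)) {
        throw std::runtime_error("cannot read " + path);
    }
    return {storage.data(), static_cast<uint32_t>(storage.size())};
}
```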
Enums
-
enum class CoyoteOper
Various Coyote operations that allow users to move data from/to host memory, FPGA memory and remote nodes.
Values:
-
enumerator NOOP
No operation.
-
enumerator LOCAL_READ
Transfers data from CPU or FPGA memory to the vFPGA stream (axis_(host|card)_recv[i]), depending on sgEntry.local.src_stream.
-
enumerator LOCAL_WRITE
Transfers data from a vFPGA stream (axis_(host|card)_send[i]) to CPU or FPGA memory, depending on sgEntry.local.src_stream.
-
enumerator LOCAL_TRANSFER
LOCAL_READ and LOCAL_WRITE in parallel; dataflow is (CPU or FPGA) memory => vFPGA => (CPU or FPGA) memory.
-
enumerator LOCAL_OFFLOAD
Migrates data from CPU memory to FPGA memory (HBM/DDR)
-
enumerator LOCAL_SYNC
Migrates data from FPGA memory (HBM/DDR) to CPU memory.
-
enumerator REMOTE_RDMA_READ
One-sided RDMA read operation.
-
enumerator REMOTE_RDMA_WRITE
One-sided RDMA write operation.
-
enumerator REMOTE_RDMA_SEND
Two-sided RDMA send operation.
-
enumerator REMOTE_TCP_SEND
TCP send operation; NOTE: Currently unsupported due to bugs; to be brought back in future releases of Coyote.
-
enum class CoyoteAllocType
Different types of memory allocation that can be used in Coyote.
Values:
-
enumerator REG
Regular pages (typically 4KB on Linux)
-
enumerator THP
Transparent huge pages (THP), obtained by allocating consecutive regular pages. NOTE: Users should use HPF where possible; THP should only be used if the system doesn’t natively support huge pages.
-
enumerator HPF
Huge pages (HPF) (typically 2MB on Linux)
-
enumerator PRM
Partial reconfiguration memory, used for storing reconfiguration bitstreams.
-
enumerator GPU
Memory on the GPU (for GPU-FPGA DMA)
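Because REG and HPF differ mainly in page size, the allocation type affects how much memory a small buffer actually occupies. A sketch of the rounding arithmetic, assuming the typical 4 KB and 2 MB sizes quoted above and assuming THP also works at 2 MB granularity; whether getMem() itself rounds requests this way is not stated in this reference, and `AllocTypeSketch` is a hypothetical local enum.

```cpp
#include <cstdint>

// Hypothetical local enum covering the host-memory allocation types.
enum class AllocTypeSketch { REG, THP, HPF };

// Assumed page sizes: 4 KiB regular pages, 2 MiB for THP and HPF
// (typical Linux values, not queried from the system).
constexpr uint64_t pageSizeFor(AllocTypeSketch t) {
    return t == AllocTypeSketch::REG ? (4ULL << 10) : (2ULL << 20);
}

// Rounds a requested size up to a whole number of pages of the given type.
constexpr uint64_t roundToPages(uint64_t bytes, AllocTypeSketch t) {
    const uint64_t page = pageSizeFor(t);
    return ((bytes + page - 1) / page) * page;
}
```

Under these assumptions a 5000-byte buffer occupies 8192 bytes with REG but a full 2 MiB with HPF, which is why huge pages pay off mainly for large buffers.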
Functions
-
inline bool isLocalRead(CoyoteOper oper)
-
inline bool isLocalWrite(CoyoteOper oper)
-
inline bool isLocalSync(CoyoteOper oper)
-
inline bool isRemoteRdma(CoyoteOper oper)
-
inline bool isRemoteRead(CoyoteOper oper)
-
inline bool isRemoteWrite(CoyoteOper oper)
-
inline bool isRemoteSend(CoyoteOper oper)
-
inline bool isRemoteWriteOrSend(CoyoteOper oper)
-
inline bool isRemoteTcp(CoyoteOper oper)
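The helper predicates above are listed without descriptions. One plausible reading, inferred only from the enumerator descriptions (for instance, LOCAL_TRANSFER being LOCAL_READ and LOCAL_WRITE in parallel), is sketched below. It uses a local copy of the enumerators, is not copied from cOps.hpp, and the real definitions may differ.

```cpp
// Local copy of the documented enumerators (not the real cOps.hpp; values unassigned).
enum class CoyoteOper {
    NOOP, LOCAL_READ, LOCAL_WRITE, LOCAL_TRANSFER, LOCAL_OFFLOAD, LOCAL_SYNC,
    REMOTE_RDMA_READ, REMOTE_RDMA_WRITE, REMOTE_RDMA_SEND, REMOTE_TCP_SEND
};

// LOCAL_TRANSFER performs a read and a write in parallel, so it counts as both.
inline bool isLocalRead(CoyoteOper o) {
    return o == CoyoteOper::LOCAL_READ || o == CoyoteOper::LOCAL_TRANSFER;
}
inline bool isLocalWrite(CoyoteOper o) {
    return o == CoyoteOper::LOCAL_WRITE || o == CoyoteOper::LOCAL_TRANSFER;
}
// Offloads and syncs both migrate data between CPU memory and FPGA memory.
inline bool isLocalSync(CoyoteOper o) {
    return o == CoyoteOper::LOCAL_OFFLOAD || o == CoyoteOper::LOCAL_SYNC;
}
inline bool isRemoteRead(CoyoteOper o) { return o == CoyoteOper::REMOTE_RDMA_READ; }
inline bool isRemoteWrite(CoyoteOper o) { return o == CoyoteOper::REMOTE_RDMA_WRITE; }
inline bool isRemoteSend(CoyoteOper o) { return o == CoyoteOper::REMOTE_RDMA_SEND; }
inline bool isRemoteWriteOrSend(CoyoteOper o) { return isRemoteWrite(o) || isRemoteSend(o); }
// Any of the three RDMA verbs.
inline bool isRemoteRdma(CoyoteOper o) { return isRemoteRead(o) || isRemoteWriteOrSend(o); }
inline bool isRemoteTcp(CoyoteOper o) { return o == CoyoteOper::REMOTE_TCP_SEND; }
```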
- file bFunc.hpp
- #include <string>
#include <vector>
#include "cThread.hpp"
- file cBench.hpp
- #include <chrono>
#include <vector>
#include <algorithm>
#include "cDefs.hpp"
- file cConn.hpp
- #include <atomic>
#include <string>
#include <vector>
#include <iostream>
#include <unistd.h>
#include <sys/un.h>
#include <sys/socket.h>
#include "cTask.hpp"
#include "cDefs.hpp"
- file cFunc.hpp
- #include <vector>
#include <string>
#include <cstdint>
#include <functional>
#include <filesystem>
#include "bFunc.hpp"
#include "cThread.hpp"
- file cGpu.hpp
- file cOps.hpp
- #include "cDefs.hpp"
- file cRcnfg.hpp
- #include <atomic>
#include <fcntl.h>
#include <fstream>
#include <unistd.h>
#include <sys/mman.h>
#include <unordered_map>
#include <boost/interprocess/sync/named_mutex.hpp>
#include "cOps.hpp"
#include "cDefs.hpp"
- file cSched.hpp
- #include <map>
#include <mutex>
#include <vector>
#include <fstream>
#include <cstdint>
#include <syslog.h>
#include "bFunc.hpp"
#include "cTask.hpp"
#include "cRcnfg.hpp"
- file cService.hpp
- #include <map>
#include <mutex>
#include <vector>
#include <string>
#include <signal.h>
#include <unistd.h>
#include <sys/un.h>
#include <syslog.h>
#include <sys/stat.h>
#include "cFunc.hpp"
#include "cSched.hpp"
#include "cThread.hpp"
- file cTask.hpp
- #include <map>
#include <vector>
#include <cstdint>
#include "cThread.hpp"
- file cThread.hpp
- #include <thread>
#include <chrono>
#include <string>
#include <random>
#include <fstream>
#include <iostream>
#include <unordered_map>
#include <fcntl.h>
#include <netdb.h>
#include <syslog.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/ioctl.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <linux/mman.h>
#include <boost/interprocess/sync/named_mutex.hpp>
#include "cDefs.hpp"
#include "cOps.hpp"
#include "cGpu.hpp"