2025-11-14
Summary: Deep dive into HAMi-core's CUDA compatibility mechanisms, exploring how this lightweight GPU resource control solution maintains stability across different CUDA versions through intelligent function version fallback, multi-version mapping, and transparent API interception. Learn about HAMi-core's compatibility strategies, potential risks, and production validation approaches.
In the GPU resource management space, HAMi has gained increasing attention as an open-source project.
Within HAMi's architecture, HAMi-core operates closest to the NVIDIA driver layer, providing fine-grained control over GPU memory usage and SM (Streaming Multiprocessor) allocation during container or task execution.
HAMi-core achieves "hard limits" on GPU compute and memory through lightweight injection of libvgpu.so into containers via LD_PRELOAD.
This approach enables GPU pooling, sharing, and multi-tenant isolation without relying entirely on hardware capabilities like MIG or heavyweight virtualization solutions—making it a practical technical path for many cloud-native AI workloads.
However, this "lightweight GPU resource control" approach raises an important question:
HAMi-core's core mechanism intercepts calls between the CUDA Runtime and Driver layers. Can it remain stable when CUDA versions update and driver interfaces change?
This concern is valid, because HAMi-core's implementation relies on intercepting calls between the CUDA Runtime and Driver layers, and compatibility across the surrounding stack (Docker, containerd, the Kubernetes device plugin) and across CUDA Toolkit versions can all affect stability. In other words, HAMi-core's lightweight design and flexibility depend on precise control over CUDA API interception, and any change in the CUDA version chain can challenge this precision.
To address these concerns, we'll publish a series of articles exploring HAMi-core features in depth to help you better understand its implementation mechanisms.
This first article examines CUDA compatibility mechanisms—a critical yet often overlooked aspect. We'll explore how HAMi-core interacts with different CUDA versions, its compatibility strategies, potential risks, and how to validate and mitigate these risks in production environments.
To ensure stable operation across different CUDA versions, HAMi-core implements a multi-layered compatibility mechanism.
HAMi-core implements an intelligent function version fallback mechanism to handle API changes across CUDA versions. When a specific versioned function cannot be found, the system automatically attempts to locate older versions:
Different CUDA versions may use different API suffixes (e.g., _v2, _v3), and some older versions have no suffix.
When a lookup fails, HAMi-core's prior_function() decrements the version suffix (or strips it entirely once it reaches _v2):

```c
// Project-HAMi/HAMi-core/blob/main/src/cuda/hook.c
int prior_function(char tmp[500]) {
    char *pos = tmp + strlen(tmp) - 3;
    if (pos[0] == '_' && pos[1] == 'v') {
        if (pos[2] == '2')
            pos[0] = '\0';   // "_v2" -> unversioned name
        else
            pos[2]--;        // e.g. "_v3" -> "_v2"
        return 1;
    }
    return 0;
}
```
```c
for (i = 0; i < CUDA_ENTRY_END; i++) {
    LOG_DEBUG("LOADING %s %d", cuda_library_entry[i].name, i);
    cuda_library_entry[i].fn_ptr = real_dlsym(table, cuda_library_entry[i].name);
    if (!cuda_library_entry[i].fn_ptr) {
        cuda_library_entry[i].fn_ptr = real_dlsym(RTLD_NEXT, cuda_library_entry[i].name);
        if (!cuda_library_entry[i].fn_ptr) {
            LOG_INFO("can't find function %s in %s", cuda_library_entry[i].name, cuda_filename);
            memset(tmpfunc, 0, 500);
            strcpy(tmpfunc, cuda_library_entry[i].name);
            // Walk down the version chain until a symbol resolves.
            while (prior_function(tmpfunc)) {
                cuda_library_entry[i].fn_ptr = real_dlsym(RTLD_NEXT, tmpfunc);
                if (cuda_library_entry[i].fn_ptr) {
                    LOG_INFO("found prior function %s", tmpfunc);
                    break;
                }
            }
        }
    }
}
```
For interfaces with significant cross-version differences, HAMi-core uses g_func_map to maintain mappings between different versions' actual function names.

This mapping table enables HAMi-core to route to the correct function variant at runtime based on the actual environment, without hardcoding version-specific logic.
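The article does not reproduce g_func_map here, but the idea can be sketched in a few lines. The entries and layout below are illustrative assumptions, not HAMi-core's actual table:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch of a version-mapping table: each logical API maps
 * to its candidate symbol names, newest first. The real g_func_map in
 * HAMi-core differs in layout and contents. */
typedef struct {
    const char *logical_name;    /* name the hook layer uses        */
    const char *candidates[4];   /* actual symbols, newest first    */
} func_map_entry;

static const func_map_entry g_func_map[] = {
    {"cuCtxCreate",  {"cuCtxCreate_v3",  "cuCtxCreate_v2", "cuCtxCreate", NULL}},
    {"cuCtxDestroy", {"cuCtxDestroy_v2", "cuCtxDestroy",   NULL}},
};

/* Return the newest candidate for a logical name, or NULL if unknown. */
static const char *first_candidate(const char *logical) {
    for (size_t i = 0; i < sizeof g_func_map / sizeof g_func_map[0]; i++)
        if (strcmp(g_func_map[i].logical_name, logical) == 0)
            return g_func_map[i].candidates[0];
    return NULL;
}
```

At load time, a table like this lets the hook layer try candidates from newest to oldest until one resolves in the installed driver.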

Since CUDA 11, cuGetProcAddress has become the standard way for programs to dynamically resolve Driver API functions. Many modern deep learning frameworks call this function.
HAMi-core intercepts this function and returns its own hook function pointers, so that even dynamically resolved Driver API calls still pass through HAMi-core's interception layer. This is critical for compatibility with modern CUDA environments.
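The shape of such an interception can be sketched without the real CUDA headers. Everything below is a simplified stand-in (the names and the stubbed resolver are assumptions, not HAMi-core's identifiers):

```c
#include <string.h>
#include <stddef.h>

typedef int CUresult;
#define CUDA_SUCCESS 0

/* Stand-in for a HAMi-core hook function. */
static int hooked_cuMemAlloc(void) { return CUDA_SUCCESS; }

/* Stand-in for the driver's own symbol resolver. */
static void *real_lookup(const char *symbol) {
    (void)symbol;
    return NULL;
}

/* Hooked cuGetProcAddress: hand back the hook for symbols HAMi-core
 * manages, otherwise defer to the driver's resolver. */
static CUresult hooked_cuGetProcAddress(const char *symbol, void **pfn) {
    if (strcmp(symbol, "cuMemAlloc") == 0) {
        *pfn = (void *)hooked_cuMemAlloc;   /* route through the hook */
        return CUDA_SUCCESS;
    }
    *pfn = real_lookup(symbol);             /* fall through to the driver */
    return CUDA_SUCCESS;
}
```

A framework that resolves cuMemAlloc through cuGetProcAddress therefore receives the hook pointer, not the raw driver entry point, which is what keeps quota enforcement intact even for dynamic lookups.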
HAMi-core hooks a wide range of CUDA Driver API functions; the excerpt below shows the context-related entries:

```c
// Project-HAMi/HAMi-core/blob/main/src/cuda/hook.c
/* Context Part */
{.name = "cuDevicePrimaryCtxGetState"},
{.name = "cuDevicePrimaryCtxRetain"},
{.name = "cuDevicePrimaryCtxSetFlags_v2"},
{.name = "cuDevicePrimaryCtxRelease_v2"},
{.name = "cuCtxGetDevice"},
{.name = "cuCtxCreate_v2"},
{.name = "cuCtxCreate_v3"},
{.name = "cuCtxDestroy_v2"},
{.name = "cuCtxGetApiVersion"},
{.name = "cuCtxGetCacheConfig"},
{.name = "cuCtxGetCurrent"},
{.name = "cuCtxGetFlags"},
```
Hook logic delegates to the original function implementation via CUDA_OVERRIDE_CALL, adding resource management logic only where necessary. This preserves NVIDIA's native semantics and keeps functionality complete.
```c
// Project-HAMi/HAMi-core/blob/main/src/include/libcuda_hook.h
#define CUDA_OVERRIDE_CALL(table, sym, ...)                          \
    ({                                                               \
        LOG_DEBUG("Hijacking %s", #sym);                             \
        cuda_sym_t _entry = (cuda_sym_t)CUDA_FIND_ENTRY(table, sym); \
        _entry(__VA_ARGS__);                                         \
    })
```
HAMi-core uses pthread_once to guarantee thread-safe, idempotent initialization, avoiding race conditions in multi-threaded environments:
```c
// Project-HAMi/HAMi-core/blob/main/src/libvgpu.c
CUresult cuInit(unsigned int Flags) {
    LOG_INFO("Into cuInit");
    pthread_once(&pre_cuinit_flag, (void (*)(void))preInit);
    ENSURE_INITIALIZED();
    CUresult res = CUDA_OVERRIDE_CALL(cuda_library_entry, cuInit, Flags);
    if (res != CUDA_SUCCESS) {
        LOG_ERROR("cuInit failed:%d", res);
        return res;
    }
    pthread_once(&post_cuinit_flag, (void (*)(void))postInit);
    return CUDA_SUCCESS;
}
```
When applications query device status such as total memory, HAMi-core returns the configured quota rather than the physical value:
```c
// Project-HAMi/HAMi-core/blob/main/src/cuda/device.c
CUresult cuDeviceTotalMem_v2(size_t *bytes, CUdevice dev) {
    LOG_DEBUG("into cuDeviceTotalMem");
    ENSURE_INITIALIZED();
    size_t limit = get_current_device_memory_limit(dev);
    *bytes = limit;
    return CUDA_SUCCESS;
}
```
HAMi-core's approach is elegant: rather than rewriting GPU virtualization, it achieves resource quota control by intercepting CUDA's "entry points," enabling container-level constraints on memory and SM allocation. Its core principle:
Minimum intrusion, maximum compatibility.
```c
// Project-HAMi/HAMi-core/blob/main/src/libvgpu.c
void *__dlsym_hook_section(void *handle, const char *symbol) {
    int it;
    for (it = 0; it < CUDA_ENTRY_END; it++) {
        if (strcmp(cuda_library_entry[it].name, symbol) == 0) {
            if (cuda_library_entry[it].fn_ptr == NULL) {
                LOG_WARN("NEED TO RETURN NULL");
                return NULL;
            } else {
                break;
            }
        }
    }
    DLSYM_HOOK_FUNC(cuInit);
    DLSYM_HOOK_FUNC(cuGetProcAddress);
    DLSYM_HOOK_FUNC(cuGetProcAddress_v2);
```
The primary value of this approach:
It provides a "software-controllable" fine-grained isolation mechanism for cloud-native GPU computing, without requiring complete reliance on vendor ecosystems or hardware features.
For this reason, compatibility becomes the most critical consideration for such solutions, directly impacting feature availability.
Understanding CUDA's evolution, mastering interception boundaries, and establishing reliable version validation mechanisms are key to bringing such lightweight control solutions into production environments.
In upcoming articles, we'll continue exploring other critical aspects of HAMi-core. Stay tuned!
To learn more about RiseUnion's GPU pooling, virtualization and computing power management solutions, please contact us: contact@riseunion.io