2025-04-29
HAMi is an intelligent platform designed for heterogeneous GPU resource pooling and scheduling. To support flexible configuration across different environments and requirements, HAMi's scheduler and device plugins provide a comprehensive set of startup parameters. This guide provides a detailed overview of these parameters and their default behaviors to help you get started quickly and optimize your deployment.
Note: This documentation is based on the current master branch; parameters may change in future releases.
HAMi's scheduler component supports the following configuration options:
Specifies the HTTP server binding address. Default: 127.0.0.1:8080.
Path to the TLS certificate file for HTTPS communication.
Path to the TLS private key file, used in conjunction with cert_file.
Defines the scheduler name written to pod.spec.schedulerName. If empty, uses the default Kubernetes scheduler.
Default GPU memory allocation for pods when not explicitly specified.
Default GPU core utilization percentage for pods when not explicitly specified.
Default number of GPUs allocated per pod when not specified. Default: 1.
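These three defaults only apply when a pod does not request GPU resources explicitly. For reference, a pod that sets them itself might look like the following sketch (the resource names nvidia.com/gpu, nvidia.com/gpumem, and nvidia.com/gpucores follow HAMi's documented conventions; adjust them to whatever resource names your deployment is configured with):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: vgpu-demo
spec:
  containers:
    - name: cuda
      image: nvidia/cuda:12.4.0-base-ubuntu22.04
      resources:
        limits:
          nvidia.com/gpu: 1        # vGPU count (scheduler default applies when omitted)
          nvidia.com/gpumem: 3000  # device memory in MiB
          nvidia.com/gpucores: 30  # percentage of GPU core utilization
```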
Node scheduling policy. Default: "binpack" to consolidate resource allocation.
GPU scheduling policy. Default: "spread" to distribute workloads across GPUs.
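These cluster-wide policy defaults can also be overridden per pod via annotations. A hedged sketch, using the annotation keys documented in the HAMi repository (verify them against your installed version):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: policy-demo
  annotations:
    hami.io/node-scheduler-policy: "spread"  # override the node-level policy for this pod
    hami.io/gpu-scheduler-policy: "binpack"  # override the GPU-level policy for this pod
spec:
  containers:
    - name: cuda
      image: nvidia/cuda:12.4.0-base-ubuntu22.04
      resources:
        limits:
          nvidia.com/gpu: 1
```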
Prometheus metrics endpoint binding address. Default: :9395.
Node selection based on labels, with multiple key-value pairs separated by commas.
Queries per second (QPS) limit for kube-apiserver communication. Default: 5.0.
Maximum burst request limit. Default: 10.
Timeout, in seconds, for communicating with the kube-apiserver. Default: 30.
Enables pprof performance profiling via HTTP server. Default: false.
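Taken together, these options are typically set as container args on the scheduler Deployment (often via Helm values). A minimal sketch, assuming flag names as they appear in the HAMi repository (--http_bind, --default-mem, and so on — confirm against your release before use):

```yaml
# Fragment of a hami-scheduler Deployment spec (illustrative flag names)
containers:
  - name: scheduler
    image: projecthami/hami:latest
    args:
      - --http_bind=0.0.0.0:443          # HTTP(S) server bind address
      - --cert_file=/tls/tls.crt
      - --key_file=/tls/tls.key
      - --default-gpu=1                  # GPUs per pod when not specified
      - --node-scheduler-policy=binpack  # consolidate pods onto fewer nodes
      - --gpu-scheduler-policy=spread    # distribute workloads across GPUs
      - --metrics-bind-address=:9395
```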
HAMi's NVIDIA device plugin supports the following configuration options:
Current node name, automatically read from environment variables by default.
Number of virtual devices to create from a single GPU. Default: 2.
GPU memory scaling factor. Default: 1.0 (no scaling).
GPU core scaling factor. Default: 1.0.
Disables GPU core utilization limits when set. Default: false.
Resource field name for GPU requests in containers. Default: "nvidia.com/gpu".
Resource exposure strategy for MIG-capable (Multi-Instance GPU) devices. Options: none (ignore MIG devices), single (expose a single MIG profile per node), or mixed (expose each MIG profile as a distinct resource).
Terminates plugin execution on initialization errors. Default: true (strict mode).
Root path for NVIDIA driver installation. Default: /.
Controls whether the list of DeviceSpecs is passed to kubelet during Allocate(). Default: false.
Method for passing device lists to the container runtime. Options: envvar (the default, via environment variables), volume-mounts, or CDI-based strategies.
Device ID passing method. Options: uuid (default) or index.
Ensures GPU Direct Storage (GDS) is enabled for containers at launch.
Ensures Mellanox OpenFabrics (MOFED) is enabled for containers at launch.
Path to configuration file for overriding command-line args or environment variables.
Prefix for CDI annotation keys. Uses a preset value by default.
Path to nvidia-ctk tool for CDI spec generation.
NVIDIA driver directory path mounted inside containers for CDI specs.
Logging verbosity level. Default: 0; higher values increase detail.
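On the device-plugin side, the options above land in the DaemonSet args. A hedged sketch, using flag names that follow HAMi/NVIDIA device-plugin conventions (confirm against your installed version):

```yaml
# Fragment of a hami-device-plugin DaemonSet spec (illustrative flag names)
containers:
  - name: device-plugin
    image: projecthami/hami:latest
    args:
      - --device-split-count=4        # up to 4 virtual devices per physical GPU
      - --device-memory-scaling=2.0   # oversubscribe device memory 2x
      - --device-cores-scaling=1.0    # no core oversubscription
      - --resource-name=nvidia.com/gpu
      - --mig-strategy=none
    env:
      - name: NODE_NAME               # node name read from the environment
        valueFrom:
          fieldRef:
            fieldPath: spec.nodeName
```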
HAMi's startup parameters are designed with both flexibility and security in mind, enabling fine-grained tuning based on your deployment scale, workload characteristics, and resource-utilization goals.
Whether you're testing on a single node or deploying in a large-scale production environment, properly configuring these parameters helps maximize GPU resource utilization while enhancing scheduling efficiency and system stability.
To learn more about RiseUnion's GPU pooling, virtualization and computing power management solutions, please contact us: contact@riseunion.io