KAI Scheduler is a Kubernetes component that manages how workloads are scheduled. Integrating it with your algorithms lets you control how they use cluster resources: you can assign algorithms to queues, limit their memory, and share a single GPU between multiple algorithms by allocating GPU fractions, improving overall efficiency. It also supports queue prioritization.
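To make GPU sharing concrete, here is a toy first-fit sketch of how fractional GPU requests can pack onto whole GPUs. This is illustrative only — KAI's actual placement logic is more sophisticated and is not shown here:

```python
def assign_gpus(fractions):
    """First-fit packing of fractional GPU requests onto whole GPUs.

    Illustrative helper only -- not KAI's real placement algorithm.
    Returns (placement, gpu_count), where placement[i] is the GPU
    index assigned to the i-th request.
    """
    free = []        # remaining capacity per GPU
    placement = []
    for f in fractions:
        for i, cap in enumerate(free):
            if f <= cap + 1e-9:      # fits on an existing GPU
                free[i] -= f
                placement.append(i)
                break
        else:                        # open a new GPU
            free.append(1.0 - f)
            placement.append(len(free) - 1)
    return placement, len(free)

# Two 0.5-fraction algorithms share one GPU; the 0.7 request needs a second.
print(assign_gpus([0.5, 0.5, 0.7]))  # ([0, 0, 1], 2)
```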
The KAI Scheduler source code and installation guide are available in the KAI GitHub repository.
Important:
To use this feature, the KAI Scheduler must be installed and configured in your cluster. It requires Kubernetes version 1.24 or higher. Make sure all prerequisites listed in the KAI GitHub repository are met before enabling this integration.
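The version requirement above can be checked programmatically. A minimal sketch — the helper below is illustrative; in practice you would feed it the server version reported by your cluster:

```python
def meets_minimum_k8s(version: str, minimum=(1, 24)) -> bool:
    """Return True if a Kubernetes version string such as "v1.25.3"
    satisfies the required minimum (illustrative helper)."""
    major, minor = version.lstrip("v").split(".")[:2]
    return (int(major), int(minor)) >= minimum

print(meets_minimum_k8s("v1.25.3"))  # True
print(meets_minimum_k8s("1.23.9"))   # False
```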
Note that these assignments are not strictly enforced at runtime: if your algorithm tries to use more resources than it was assigned, the excess request will not be hard-limited.
KAI-related settings are defined in the kaiObject section inside your algorithm specification.
This section allows you to control how the algorithm interacts with the scheduler, including queue assignment, memory allocation, and GPU fraction usage.
Example of a `kaiObject` configuration:

```json
{
  "kaiObject": {
    "queue": "gpu-queue",
    "memory": "512Mi",
    "fraction": 0.5
  }
}
```
| Field | Type | Description | Required |
|---|---|---|---|
| queue | string | Name of the KAI queue to assign the algorithm to. | ✅ Yes |
| memory | string | Memory limit for the algorithm (e.g., "512Mi", "1Gi"). | ❌ No |
| fraction | number | Fraction of GPU usage (e.g., 0.5 for 50% GPU). | ❌ No |
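A small validation helper can catch misconfigured `kaiObject` sections before submission. The sketch below checks the fields against the table above; it is illustrative and not part of the KAI or HKube APIs, and the memory pattern is simplified (Kubernetes quantity syntax allows more forms than the binary suffixes shown):

```python
import re

def validate_kai_object(kai):
    """Return a list of problems with a kaiObject dict (empty if valid).

    Illustrative helper based on the field table; not an official API.
    """
    errors = []
    # queue: required, non-empty string
    if not isinstance(kai.get("queue"), str) or not kai["queue"]:
        errors.append("queue is required and must be a non-empty string")
    # memory: optional, simplified quantity check ("512Mi", "1Gi", ...)
    if "memory" in kai and not re.fullmatch(r"\d+(Ki|Mi|Gi|Ti)", kai["memory"]):
        errors.append('memory must look like "512Mi" or "1Gi"')
    # fraction: optional, a number in (0, 1]
    if "fraction" in kai:
        f = kai["fraction"]
        if not isinstance(f, (int, float)) or not 0 < f <= 1:
            errors.append("fraction must be a number in (0, 1]")
    return errors

print(validate_kai_object({"queue": "gpu-queue", "memory": "512Mi", "fraction": 0.5}))  # []
print(validate_kai_object({"fraction": 1.5}))
```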
Below is a minimal algorithm configuration that includes `kaiObject`:

```json
{
  "name": "gpu-fraction-algorithm",
  "algorithmImage": "docker.io/hkubedevtest/my-gpu-algo:latest",
  "cpu": 0.5,
  "mem": "512Mi",
  "gpu": 1,
  "kaiObject": {
    "queue": "gpu-queue",
    "memory": "512Mi",
    "fraction": 0.5
  }
}
```
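If you generate algorithm specifications programmatically, the same configuration can be assembled and serialized like this. The helper name and its defaults are illustrative, not an HKube API:

```python
import json

def make_algorithm_spec(name, image, queue, fraction,
                        memory="512Mi", cpu=0.5, mem="512Mi", gpu=1):
    """Assemble an algorithm spec matching the minimal example above.

    Hypothetical convenience helper; field names come from the example.
    """
    return {
        "name": name,
        "algorithmImage": image,
        "cpu": cpu,
        "mem": mem,
        "gpu": gpu,
        "kaiObject": {"queue": queue, "memory": memory, "fraction": fraction},
    }

spec = make_algorithm_spec(
    "gpu-fraction-algorithm",
    "docker.io/hkubedevtest/my-gpu-algo:latest",
    queue="gpu-queue",
    fraction=0.5,
)
print(json.dumps(spec, indent=2))
```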
In this example: