Built on Kubernetes container orchestration, AISpike optimizes resource management
and scheduling to improve resource utilization and AI training efficiency.
• Multi-granularity GPU strategy
AISpike supports resource isolation and scheduling at the granularity of GPU
memory. Users can request as little as 1 GB of GPU memory for
model development or training, which can increase overall cluster resource
utilization by 30%.
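
As a rough illustration of memory-granular allocation, the sketch below places requests on GPU cards in whole-gigabyte units using a best-fit rule. It is not AISpike's actual scheduler: the `GpuCard` class, the `allocate` function, the best-fit policy, and the 32 GB card size are all assumptions made for the example. Only the 1 GB minimum comes from the text above.

```python
from dataclasses import dataclass, field

MIN_REQUEST_GB = 1  # smallest request size, taken from the text above


@dataclass
class GpuCard:
    """Hypothetical model of one GPU card and its memory reservations."""
    name: str
    total_gb: int
    allocations: dict = field(default_factory=dict)  # task name -> GB reserved

    @property
    def free_gb(self) -> int:
        return self.total_gb - sum(self.allocations.values())


def allocate(cards, task, request_gb):
    """Best-fit (an assumed policy): use the fullest card that still fits."""
    if request_gb < MIN_REQUEST_GB:
        raise ValueError("request must be at least 1 GB")
    candidates = [c for c in cards if c.free_gb >= request_gb]
    if not candidates:
        return None  # no single card can hold the request
    best = min(candidates, key=lambda c: c.free_gb)
    best.allocations[task] = request_gb
    return best.name


# Two assumed 32 GB cards shared by three tasks.
cards = [GpuCard("gpu-0", 32), GpuCard("gpu-1", 32)]
placements = [
    allocate(cards, "notebook-a", 4),
    allocate(cards, "notebook-b", 8),
    allocate(cards, "train-c", 24),
]
print(placements)
```

Packing the two small notebook tasks onto the same card leaves the second card free for the larger training job, which is the kind of sharing that memory-level granularity makes possible.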
• Resource affinity scheduling
AISpike provides a sophisticated affinity scheduling strategy (data affinity and GPU topology affinity) based on the user's specific business scenarios and accumulated experience.
Data affinity makes full use of the application stack images and sample data already cached on a computing node, which shortens task build time.
GPU topology affinity prioritizes scheduling GPU resources under the same NVLink or PCIe switch, making full use of the communication links between GPU cards to improve training efficiency.
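
One way to picture the two affinities is as a per-node score. The sketch below is a minimal illustration under stated assumptions, not AISpike's algorithm: the `Node` and `Task` classes, the scoring weights (1 point per cache hit, 2 points for a same-switch fit), and names such as `pytorch:2.1` and `imagenet` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical view of a computing node as seen by the scheduler."""
    name: str
    cached_images: set
    cached_datasets: set
    free_gpus_by_switch: dict  # switch id -> free GPUs under that switch


@dataclass
class Task:
    image: str
    dataset: str
    gpus: int


def affinity_score(node, task):
    """Assumed weights: 1 per cache hit, 2 if one switch can host all GPUs."""
    score = 0
    if task.image in node.cached_images:
        score += 1  # data affinity: image already on the node
    if task.dataset in node.cached_datasets:
        score += 1  # data affinity: sample data already on the node
    if any(free >= task.gpus for free in node.free_gpus_by_switch.values()):
        score += 2  # topology affinity: all GPUs share one switch
    return score


def pick_node(nodes, task):
    """Return the name of the best-scoring node with enough free GPUs."""
    feasible = [n for n in nodes
                if sum(n.free_gpus_by_switch.values()) >= task.gpus]
    if not feasible:
        return None
    return max(feasible, key=lambda n: affinity_score(n, task)).name


nodes = [
    Node("node-a", {"pytorch:2.1"}, {"imagenet"}, {"sw0": 1, "sw1": 1}),
    Node("node-b", {"pytorch:2.1"}, set(), {"sw0": 4}),
    Node("node-c", set(), set(), {"sw0": 1}),
]
task = Task(image="pytorch:2.1", dataset="imagenet", gpus=2)
chosen = pick_node(nodes, task)
print(chosen)
```

Here `node-a` has both caches but would split the two GPUs across switches, so `node-b` scores higher with these weights. A real deployment would tune the weights to its own workloads.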
• User quota and priority strategy
AISpike adds priority scheduling and per-user resource quota restrictions on top of Kubernetes. In a complex environment where multiple users run tasks on shared resources, the strategy ensures flexible allocation and scheduling of computing resources to maximize resource utilization.
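
The interaction between priority and quota can be sketched as a single admission pass. This is an illustrative assumption rather than AISpike's implementation: the `schedule` function, the tuple layout, and the rule that a rejected task is skipped rather than blocking the queue are all hypothetical.

```python
from collections import defaultdict


def schedule(pending, quotas, free_gpus):
    """Admit tasks by descending priority, subject to quota and capacity.

    pending: list of (task, user, priority, gpus) in submission order.
    quotas:  user -> maximum GPUs that user may hold at once.
    """
    used = defaultdict(int)
    admitted = []
    # sorted() is stable, so equal priorities keep submission order.
    for task, user, _priority, gpus in sorted(pending, key=lambda t: -t[2]):
        if used[user] + gpus > quotas.get(user, 0):
            continue  # would exceed the user's quota
        if gpus > free_gpus:
            continue  # not enough free capacity in the cluster
        used[user] += gpus
        free_gpus -= gpus
        admitted.append(task)
    return admitted


pending = [
    ("alice-1", "alice", 1, 2),
    ("bob-1", "bob", 5, 4),
    ("alice-2", "alice", 9, 4),
    ("bob-2", "bob", 5, 2),
]
quotas = {"alice": 4, "bob": 4}
admitted = schedule(pending, quotas, free_gpus=8)
print(admitted)
```

With these inputs the highest-priority task from each user is admitted and the remaining tasks wait, because each would push its owner past the 4-GPU quota.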
AISpike also introduces an intelligent identification mechanism for idle resources. It identifies tasks with low utilization and inactive users, promptly reminds the administrator to release idle resources, and further improves cluster resource performance.
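
Idle-resource identification can be approximated with simple thresholds. The sketch below assumes a 10% mean GPU utilization cutoff and a 7-day inactivity window. Both values, and the `idle_tasks` and `inactive_users` helpers, are hypothetical and do not describe AISpike's actual detection logic.

```python
from datetime import datetime, timedelta
from statistics import mean


def idle_tasks(samples, threshold=0.10):
    """Flag tasks whose mean GPU utilization is below the assumed threshold.

    samples: task name -> list of utilization fractions in [0, 1].
    """
    return sorted(task for task, util in samples.items()
                  if util and mean(util) < threshold)


def inactive_users(last_seen, now, max_idle=timedelta(days=7)):
    """Flag users with no activity inside the assumed inactivity window."""
    return sorted(user for user, seen in last_seen.items()
                  if now - seen > max_idle)


samples = {
    "train-a": [0.90, 0.85, 0.95],
    "notebook-b": [0.00, 0.02, 0.01],
    "debug-c": [0.30, 0.00, 0.05],
}
last_seen = {
    "alice": datetime(2024, 1, 30),
    "bob": datetime(2024, 1, 2),
}
now = datetime(2024, 2, 1)

idle = idle_tasks(samples)
inactive = inactive_users(last_seen, now)
print(idle, inactive)
```

The two lists would then feed a reminder to the administrator. Whether flagged resources are reclaimed automatically or only reported is a policy choice outside this sketch.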