Data is the “fuel” of AI algorithms and the basis for AI development. It is receiving growing attention as enterprises build and industrialize AI applications. AISpike helps enterprise users establish a stable and efficient data management suite for those applications by addressing data storage, data circulation, and computing acceleration.
• Data lifecycle management to facilitate collaborative development
AISpike provides data lifecycle management and supports enterprise-level data sharing and isolation by dividing data into three categories: personal data, group data, and public data.
The enterprise administrator has read and write access to public data and can maintain the enterprise's public data samples, models, or algorithms. By default, a development user can only read public data.
A development user must import public data into personal data before performing data preprocessing and model training. This keeps a single authoritative copy of the public data and avoids data conflicts caused by multiple users operating on it at once.
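The access rules described above can be sketched as a small permission model. This is an illustrative sketch only, not AISpike's actual API: the names `Scope`, `Role`, `User`, `Dataset`, `can_read`, `can_write`, and `import_to_personal` are all hypothetical, and the rule that group members may read and write their group's data is an assumption, since the text only specifies the rules for public data.

```python
import copy
from dataclasses import dataclass, field
from enum import Enum


class Scope(Enum):
    PERSONAL = "personal"
    GROUP = "group"
    PUBLIC = "public"


class Role(Enum):
    ADMIN = "admin"
    DEVELOPER = "developer"


@dataclass
class User:
    name: str
    role: Role
    groups: frozenset = frozenset()


@dataclass
class Dataset:
    name: str
    scope: Scope
    owner: str  # user name for PERSONAL, group name for GROUP, unused for PUBLIC
    samples: list = field(default_factory=list)


def can_read(user: User, ds: Dataset) -> bool:
    if ds.scope is Scope.PUBLIC:
        return True  # every user can read public data
    if ds.scope is Scope.GROUP:
        return ds.owner in user.groups  # assumption: group members only
    return ds.owner == user.name


def can_write(user: User, ds: Dataset) -> bool:
    if ds.scope is Scope.PUBLIC:
        return user.role is Role.ADMIN  # only the administrator maintains public data
    if ds.scope is Scope.GROUP:
        return ds.owner in user.groups  # assumption: group members only
    return ds.owner == user.name


def import_to_personal(user: User, ds: Dataset) -> Dataset:
    """Copy a readable dataset into the user's personal space."""
    if not can_read(user, ds):
        raise PermissionError(f"{user.name} cannot read {ds.name}")
    # Deep copy so later preprocessing never touches the shared original.
    return Dataset(ds.name, Scope.PERSONAL, user.name, copy.deepcopy(ds.samples))
```

In this sketch a developer who calls `import_to_personal` on a public dataset gets a writable private copy, while the public dataset stays unchanged no matter how the copy is preprocessed.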
• Accelerate data caching to improve training speed
Strong GPU computing power needs the support of high-performance data I/O. AISpike strengthens data cache acceleration with a data pre-reading scheme and cache-based resource affinity scheduling. It makes full use of network idle time and task waiting time to preload data, which greatly reduces sample data downloads and shortens the training build time. Training efficiency is improved by 2 to 3 times.
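The general idea behind data pre-reading can be sketched with a background thread that loads the next samples while the current one is being consumed. This is a minimal illustration of the technique, not AISpike's implementation: `prefetch`, `load_fn`, and `depth` are hypothetical names, and the cache-based affinity scheduling mentioned above is not modeled here.

```python
import queue
import threading


def prefetch(load_fn, keys, depth=2):
    """Yield load_fn(key) for each key, loading up to `depth` items ahead.

    A worker thread fills a bounded queue, so loading overlaps with
    whatever the consumer does between iterations (e.g. a training step).
    """
    buf = queue.Queue(maxsize=depth)

    def worker():
        try:
            for key in keys:
                buf.put(("ok", load_fn(key)))  # blocks while the buffer is full
        except Exception as exc:
            buf.put(("err", exc))  # forward loader failures to the consumer
        finally:
            buf.put(("done", None))

    # Daemon thread: it will not keep the process alive if the consumer stops early.
    threading.Thread(target=worker, daemon=True).start()

    while True:
        tag, item = buf.get()
        if tag == "done":
            return
        if tag == "err":
            raise item
        yield item
```

A training loop would iterate over `prefetch(download_sample, sample_ids)`, where `download_sample` stands in for whatever function fetches one sample from remote storage. Items are yielded in the original order, and `depth` bounds how much memory the read-ahead buffer can use.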
• Support multiple file formats and seamlessly connect to enterprise data systems
AISpike connects to storage systems such as NFS, HDFS, and BeeGFS, and integrates with cloud storage platforms within the enterprise through a standard API. This ensures the smooth migration and use of user data.
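One common way to put several storage systems behind a single interface is an adapter layer, sketched below. This is a hypothetical design and not AISpike's API: `StorageBackend`, `PosixBackend`, and `migrate` are invented names. The sketch assumes NFS or BeeGFS is already mounted as an ordinary directory, and an HDFS adapter would need a separate client library that is not shown.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class StorageBackend(ABC):
    """Minimal uniform interface over a storage system (hypothetical)."""

    @abstractmethod
    def list_files(self) -> list:
        """Return relative paths of all files, sorted."""

    @abstractmethod
    def read(self, rel_path: str) -> bytes: ...

    @abstractmethod
    def write(self, rel_path: str, data: bytes) -> None: ...


class PosixBackend(StorageBackend):
    """Backend for any storage mounted as a directory (e.g. an NFS mount)."""

    def __init__(self, root):
        self.root = Path(root)

    def list_files(self) -> list:
        return sorted(
            p.relative_to(self.root).as_posix()
            for p in self.root.rglob("*")
            if p.is_file()
        )

    def read(self, rel_path: str) -> bytes:
        return (self.root / rel_path).read_bytes()

    def write(self, rel_path: str, data: bytes) -> None:
        target = self.root / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)


def migrate(src: StorageBackend, dst: StorageBackend) -> int:
    """Copy every file from src to dst; return the number of files copied."""
    count = 0
    for rel in src.list_files():
        dst.write(rel, src.read(rel))
        count += 1
    return count
```

Because `migrate` depends only on the abstract interface, the same call would work between any two backends that implement it, which is the property that makes migration between storage systems straightforward.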