Platform
A platform for managing cloud workloads across regions, machine types, and lifecycles with advanced scheduling and orchestration capabilities.
Overview
The YellowDog Platform enables comprehensive management of cloud workloads, offering flexibility across geographical regions, machine types, and instance lifecycles. It integrates a powerful scheduler and intelligent orchestration engine to provide fine-grained control and prioritization of workload scheduling in cloud, hybrid, and multi-cloud environments.
The platform supports Linux and Windows, container architectures, and all xPU types, including Intel, AMD, NVIDIA, AWS Trainium/Inferentia, and the more cost- and energy-efficient Graviton/ARM-based instances. Instance selection can be policy-based or attribute-based, and adjusts automatically to meet customer requirements.
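As a rough illustration of what attribute-based selection means, the following Python sketch filters a small catalogue by required attributes and ranks the matches by price per vCPU. It is not the YellowDog API: the `InstanceType` class, the `select_instances` function, the instance names, and the prices are all hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    """Hypothetical catalogue entry; not a YellowDog data structure."""
    name: str
    arch: str            # e.g. "x86_64" or "arm64"
    vcpus: int
    memory_gib: float
    hourly_price: float  # made-up figure, not a real quote


def select_instances(candidates, min_vcpus, min_memory_gib, allowed_archs):
    """Keep instances that satisfy every required attribute,
    then rank them by price per vCPU, cheapest first."""
    matches = [
        c for c in candidates
        if c.vcpus >= min_vcpus
        and c.memory_gib >= min_memory_gib
        and c.arch in allowed_archs
    ]
    return sorted(matches, key=lambda c: c.hourly_price / c.vcpus)


# Illustrative catalogue with invented names and prices.
CATALOGUE = [
    InstanceType("example-x86-small", "x86_64", 4, 16.0, 0.20),
    InstanceType("example-x86-large", "x86_64", 16, 64.0, 0.72),
    InstanceType("example-arm-large", "arm64", 16, 32.0, 0.56),
    InstanceType("example-arm-small", "arm64", 2, 4.0, 0.07),
]

ranked = select_instances(
    CATALOGUE,
    min_vcpus=8,
    min_memory_gib=32.0,
    allowed_archs={"x86_64", "arm64"},
)
for instance in ranked:
    print(instance.name)
```

A policy-based approach would differ mainly in where the constraints come from: instead of being passed in per request, they would be taken from a stored policy.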
YellowDog is designed for massive scale: it can reach 1 million vCPUs in 7 minutes and 3.2 million vCPUs in just over half an hour. It can scale to 200,000 nodes and over 30 million vCPUs, and has demonstrated throughput of 3,000 tasks per second.
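To put these figures in perspective, the short Python calculation below converts them into average rates. It assumes a constant ramp over each interval and treats "just over half an hour" as exactly 30 minutes, so the second result is an upper bound on the true average; the function name is illustrative.

```python
def average_rate_per_second(total, minutes):
    """Average units provisioned per second over the given interval."""
    return total / (minutes * 60)


# 1 million vCPUs in 7 minutes.
first_ramp = average_rate_per_second(1_000_000, 7)

# 3.2 million vCPUs in "just over half an hour"; 30 minutes is assumed
# here, so the real average is slightly lower than this value.
second_ramp_upper_bound = average_rate_per_second(3_200_000, 30)

# 3,000 tasks per second expressed per minute.
tasks_per_minute = 3_000 * 60

print(f"{first_ramp:,.0f} vCPUs/s on average over the first 7 minutes")
print(f"at most {second_ramp_upper_bound:,.0f} vCPUs/s on average over the longer ramp")
print(f"{tasks_per_minute:,} tasks per minute")
```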
The platform enhances cost efficiency by utilizing low-price Spot instances, maximizing node utilization, and reducing compute wastage. It also minimizes data transfer and storage costs by optimizing workload mobility, considering data gravity, latency, and confidentiality constraints.
A user-friendly web-based interface provides a real-time dashboard for monitoring usage and performance, and for managing compute provisioning, object storage, work scheduling, and platform administration. The platform's security is ISO 27001 certified, covering data confidentiality, integrity, and availability.
YellowDog integrates with popular workflow tools and cloud service providers, offering out-of-the-box connectors to object storage services and HPC cluster file systems. It also integrates with third-party schedulers, so it can act as a unified workload submission platform across cloud and on-premises resources.
