
Headless GPU SaaS Platform

Enterprise-grade AI platform back end

LOCATION: LatAm
INDUSTRY: Cloud/AI Services
SERVICE PROVIDED: Estimate
01. The CLIENT

About the Client

BUSINESS

Our client is a Latin American cloud service provider pivoting to AI services, offering managed, headless, on-demand GPU compute to enterprises.

BACKGROUND

They wanted a cloud-style platform whose GPU VMs could be plugged into most existing enterprise architectures.

02. The Project Challenge

INITIAL REQUEST

The client's vision was a GPU-as-a-service platform with no front end, where enterprise customers could access GPU instances and related infrastructure on demand, for their own workloads, without owning the hardware.

THE CHALLENGE

The Maven Solutions team presented clarifications and technical options for a BlackCore API that would orchestrate Canonical LXD/KVM to safely slice physical GPUs (passthrough or MIG) and handle lifecycle, quotas, and metered billing. Built-in observability (Prometheus/Grafana) would need to offer at-a-glance insight into usage and health.

The client sought a solution that would:
01
Offer on-demand GPU compute with integrated metered billing & invoicing without having to manage hardware.
02
Be accessible either from the customer's internal platforms or from developer environments.
03
Support the full user journey, from picking a GPU profile all the way to FinOps integration.
03. The SOLUTION

PROJECT SOLUTION

Our Strategic Approach

Maven Solutions compared various implementation options, factoring in the customer's additional requirement to offer services in regions and for use cases where "plain vanilla" offerings fell short. After thorough analysis, Maven Solutions recommended a BlackCore API orchestrating Canonical LXD/KVM to safely slice physical GPUs (passthrough or MIG) and handle lifecycle, quotas, and metered billing as the best-fit approach.
The solution offers a practical path to an AWS-like experience that plugs into existing architecture as an additional AI/ML resource, so enterprises can integrate a single backend API and get the following features:

Billing integration

Stripe integration to support metered usage, invoicing, and multi-user team accounts for transparency and flexibility. Usage-based billing tracks compute and storage consumption in real time, while automated invoicing streamlines payment cycles. Team account features provide centralized management of billing, permissions, and cost reporting across multiple users.
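As an illustration of the metered-billing flow, the sketch below shows how a Go billing worker could report GPU-hours to Stripe. It assumes the stripe-go v76 client; the subscription item ID, quantity, and environment variable name are placeholders rather than the platform's actual values.

```go
package main

import (
	"log"
	"os"
	"time"

	"github.com/stripe/stripe-go/v76"
	"github.com/stripe/stripe-go/v76/usagerecord"
)

// reportGPUHours pushes one metered-usage sample to Stripe.
// The subscription item ID is a placeholder; in practice it would be
// looked up from the tenant's billing record.
func reportGPUHours(subscriptionItemID string, gpuHours int64) error {
	_, err := usagerecord.New(&stripe.UsageRecordParams{
		SubscriptionItem: stripe.String(subscriptionItemID),
		Quantity:         stripe.Int64(gpuHours),
		Timestamp:        stripe.Int64(time.Now().Unix()),
		Action:           stripe.String("increment"),
	})
	return err
}

func main() {
	stripe.Key = os.Getenv("STRIPE_API_KEY")
	if err := reportGPUHours("si_placeholder", 42); err != nil {
		log.Fatal(err)
	}
}
```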

Usage observability

Comprehensive observability and usage tracking, powered by Prometheus, DCGM, and Grafana, gives teams deep visibility into performance, utilization, and costs. Metrics on GPU, CPU, memory, and networking are collected and exposed in real time, enabling fine-grained monitoring of workloads from experiments to enterprise-scale deployments.
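A minimal sketch of the metrics-exposure side, using the prometheus/client_golang library; the metric name, labels, and sample value are illustrative, and in the real platform GPU metrics would come from DCGM exporters rather than hand-written collectors.

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// gpuUtilization is a hypothetical per-instance gauge used here only to
// show how a metric would be registered and scraped.
var gpuUtilization = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "gpu_utilization_percent",
		Help: "GPU utilization per tenant instance.",
	},
	[]string{"tenant", "instance"},
)

func main() {
	prometheus.MustRegister(gpuUtilization)

	// Example sample; a collector loop would update this from DCGM/NVML data.
	gpuUtilization.WithLabelValues("acme-corp", "vm-a100-01").Set(73.5)

	// Expose /metrics for Prometheus to scrape and Grafana to visualize.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":9100", nil))
}
```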

GPU VM user experience

Users can create GPU virtual machines with customizable configurations, spin up interactive Jupyter workspaces for research and development, or schedule batch jobs for large-scale training and inference. Clear workflows, preset templates, and resource scaling options reduce setup time and errors.
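For a sense of what a headless, API-first workflow looks like, here is a hypothetical launch request in Go. The LaunchRequest type, its field names, and the values used are assumptions for illustration, not the platform's published contract.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// LaunchRequest is a hypothetical payload for the headless provisioning API.
type LaunchRequest struct {
	Name       string `json:"name"`
	Kind       string `json:"kind"`        // "vm", "jupyter", or "batch"
	GPUProfile string `json:"gpu_profile"` // e.g. a full GPU or a MIG slice profile
	GPUs       int    `json:"gpus"`
	CPUCores   int    `json:"cpu_cores"`
	MemoryGiB  int    `json:"memory_gib"`
	Image      string `json:"image"`
}

func main() {
	req := LaunchRequest{
		Name:       "llm-finetune-01",
		Kind:       "batch",
		GPUProfile: "a100-40gb",
		GPUs:       4,
		CPUCores:   32,
		MemoryGiB:  256,
		Image:      "pytorch-2.3-cuda12",
	}
	body, _ := json.MarshalIndent(req, "", "  ")
	fmt.Println(string(body)) // body would be POSTed to the provisioning endpoint
}
```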

GPU orchestration

An orchestration layer that supports both virtual machines and containerized workloads, with Multi-Instance GPU (MIG) capabilities. Enterprises can allocate GPUs flexibly, whether by spinning up dedicated VMs or by running containerized workloads in Kubernetes for greater scalability and automation. MIG support enables fine-grained GPU partitioning, ensuring optimal utilization for diverse workloads ranging from lightweight inference to large-scale training.
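A rough sketch of how the orchestration layer might ask LXD for a VM with a MIG slice attached, assuming the LXD Go client (github.com/canonical/lxd/client); the instance name, image alias, and MIG UUID are placeholders.

```go
package main

import (
	"log"

	lxd "github.com/canonical/lxd/client"
	"github.com/canonical/lxd/shared/api"
)

func main() {
	// Connect to the local LXD daemon over its default unix socket.
	c, err := lxd.ConnectLXDUnix("", nil)
	if err != nil {
		log.Fatal(err)
	}

	// Request a virtual machine with a single MIG slice passed through.
	req := api.InstancesPost{
		Name: "gpu-vm-01",
		Type: "virtual-machine",
		Source: api.InstanceSource{
			Type:  "image",
			Alias: "ubuntu/22.04",
		},
		InstancePut: api.InstancePut{
			Devices: map[string]map[string]string{
				"gpu0": {
					"type":     "gpu",
					"gputype":  "mig",
					"mig.uuid": "MIG-placeholder-uuid",
				},
			},
		},
	}

	op, err := c.CreateInstance(req)
	if err != nil {
		log.Fatal(err)
	}
	if err := op.Wait(); err != nil {
		log.Fatal(err)
	}
	log.Println("instance created")
}
```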

Admin console

An API-accessible console for managing capacity, usage, and credits across teams and projects lets administrators monitor resource allocation in real time, track consumption trends, and enforce quotas to maintain fair and efficient usage. Credit-based management provides flexibility for assigning budgets, controlling costs, and supporting departmental or project-level accounting.
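To make the quota and credit model concrete, a small Go sketch of an admission check follows; the Quota and Usage types and the canProvision helper are hypothetical and simply illustrate enforcing both a GPU ceiling and a credit budget.

```go
package main

import (
	"fmt"
	"log"
)

// Quota and Usage are hypothetical types for the admin console's
// credit/quota model; the real schema is defined by the platform.
type Quota struct {
	MaxGPUs     int
	CreditLimit float64 // prepaid credits in the billing currency
}

type Usage struct {
	GPUsInUse    int
	CreditsSpent float64
}

// canProvision decides whether a team may start another GPU instance,
// enforcing both the GPU count ceiling and the credit budget.
func canProvision(q Quota, u Usage, requestedGPUs int, estCost float64) error {
	if u.GPUsInUse+requestedGPUs > q.MaxGPUs {
		return fmt.Errorf("quota exceeded: %d GPUs in use, %d requested, limit %d",
			u.GPUsInUse, requestedGPUs, q.MaxGPUs)
	}
	if u.CreditsSpent+estCost > q.CreditLimit {
		return fmt.Errorf("credit limit exceeded: %.2f spent, %.2f estimated, limit %.2f",
			u.CreditsSpent, estCost, q.CreditLimit)
	}
	return nil
}

func main() {
	q := Quota{MaxGPUs: 16, CreditLimit: 5000}
	u := Usage{GPUsInUse: 12, CreditsSpent: 3200}
	if err := canProvision(q, u, 8, 900); err != nil {
		log.Println("request rejected:", err)
	}
}
```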
Maven Solutions offered to deliver a working MVP within six months of the start date, staffing the project with Golang developers, Quality Assurance specialists, a Scrum Master, a SysAdmin, and a Solution Architect.
04. The Results

Value Delivered

Better Efficiency
  • Accelerated Innovation: faster realization of test use cases leading to ~60% faster on-demand and scheduled GPU access
  • Operational Cost Reduction: ~50% cost reduction with tools that help handle lifecycle, quotas, and metered billing
  • Customer Retention and Experience: better overall user experience with easy-to-use self-service access via API
Better Effectiveness
  • Faster Decision-Making: reliable and timely access to resources, accelerating analytics and operational insights
  • Flexibility and Agility: self-service approach allows rapid testing of new use cases without delay
  • Improved Data Consistency: centralized and standardized data management from multiple sources acting as a single source of truth
Better Service

  • Cloud-Native Scalability: Kubernetes platform for high availability, scalability, and performance
  • Service Level Improvement: improved service levels with better SLA management and real-time analytics
  • Compliance and Auditability: consistent enforcement of data, transformation, and access control policies
Founder, CTO
GPU VM Provider
With Maven Solutions, I could easily refine several ideas for an on-demand GPU cloud service for enterprises. I got a proof of concept with an actionable roadmap to make my non-technical ideas into work plans with solid technical backing.

Connect with a Kubernetes Expert


Andrew Korolov

Founder & Solution Architect
