NVIDIA-powered GPU VM Platform

On-demand user convenience without owning hardware or running ops

LOCATION: LatAm
INDUSTRY: Cloud/AI Services
SERVICE PROVIDED: Brainstorm Meeting, Estimate
01. The CLIENT

About the Client

BUSINESS

Our client is a Latin American cloud service provider pivoting into AI services, offering managed, on-demand GPU compute with cloud-like hosting.

BACKGROUND

They wanted a cloud-style platform where users launch NVIDIA-powered GPU VMs or workspaces on demand via a web portal or Terraform.

02. The Project Challenge

INITIAL REQUEST

The client’s vision was a GPU-as-a-service platform with a self-service portal where customers can purchase and manage GPU instances and related infrastructure for their needs.

THE CHALLENGE

The Maven Solutions team presented clarifying questions and technical options for the BlackCore API, which would orchestrate Canonical LXD/KVM to safely slice physical GPUs (passthrough or MIG) and handle lifecycle, quotas, and metered billing. Built-in observability (Prometheus/Grafana) would need to offer insights into usage and health at a glance.

The client sought a solution that would:
01
Offer on-demand GPU compute with integrated metered billing & invoicing without having to manage hardware.
02
Be accessible either from the customer's cloud web UI or through an IaC tool such as Terraform.
03
Support a user journey from picking a GPU profile to creating a workspace to monitoring usage and paying per metered consumption.
03. The SOLUTION

PROJECT SOLUTION

Our Strategic Approach

Maven Solutions compared various implementation options, factoring in the customer's additional consideration of offering services in regions and for use cases where "plain vanilla" offerings fall short. After thorough analysis, Maven Solutions recommended the BlackCore API orchestrating Canonical LXD/KVM to safely slice physical GPUs (passthrough or MIG) and handle lifecycle, quotas, and metered billing as the best-fit approach.
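
As a rough illustration only, the Go sketch below shows how an orchestration layer in the spirit of the BlackCore API might map a customer-selected GPU profile onto an LXD virtual-machine device definition, choosing between full passthrough and a MIG slice. The GPUProfile type, the lxdDevices helper, and the exact device keys are illustrative assumptions, not the delivered implementation.

```go
package main

import "fmt"

// GPUProfile is a hypothetical catalogue entry a customer picks in the portal.
type GPUProfile struct {
	Name      string // e.g. "a100-full" or "a100-1g.10gb"
	UseMIG    bool   // true if the profile maps to a MIG slice
	MIGUUID   string // pre-provisioned MIG device UUID (assumed)
	PCIAddr   string // PCI address for full passthrough
	VCPU      int
	MemoryGiB int
}

// lxdDevices builds an LXD-style GPU device map for the requested profile.
// The key names ("gputype", "mig.uuid", "pci") follow LXD's GPU device
// documentation but should be treated as illustrative here.
func lxdDevices(p GPUProfile) map[string]map[string]string {
	gpu := map[string]string{"type": "gpu"}
	if p.UseMIG {
		gpu["gputype"] = "mig"
		gpu["mig.uuid"] = p.MIGUUID
	} else {
		gpu["gputype"] = "physical"
		gpu["pci"] = p.PCIAddr
	}
	return map[string]map[string]string{"gpu0": gpu}
}

func main() {
	profile := GPUProfile{Name: "a100-1g.10gb", UseMIG: true, MIGUUID: "MIG-abc123", VCPU: 4, MemoryGiB: 16}
	// In a real platform this device map would be passed to the LXD API
	// when the virtual machine is created.
	fmt.Printf("devices for %s: %v\n", profile.Name, lxdDevices(profile))
}
```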

The screenshot mockup below offers a preview of the UI for end users.
The solution is a practical step toward an AWS-like experience that democratizes access to high-performance compute for AI/ML, research, and enterprise workloads: a self-service web portal, composed of a frontend and backend APIs, delivering the following features:

Billing integration

Stripe integration to support metered usage, invoicing, and multi-user team accounts for transparency and flexibility. Usage-based billing tracks compute and storage consumption in real time, while automated invoicing streamlines payment cycles. Team account features provide centralized management of billing, permissions, and cost reporting across multiple users.
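
As one possible shape of the metering path (not the delivered code), the sketch below reports accumulated GPU hours to Stripe as a usage record via the stripe-go client, pinned to v72 where metered billing uses usage records; newer Stripe API versions replace these with billing meter events. The subscription item ID, quantity source, and API key are placeholders.

```go
package main

import (
	"fmt"
	"time"

	stripe "github.com/stripe/stripe-go/v72"
	"github.com/stripe/stripe-go/v72/usagerecord"
)

// reportGPUHours pushes one metered-usage sample for a customer's GPU
// subscription item. The real platform would derive the item ID and the
// consumed hours from its own metering pipeline.
func reportGPUHours(subscriptionItemID string, gpuHours int64) error {
	params := &stripe.UsageRecordParams{
		SubscriptionItem: stripe.String(subscriptionItemID),
		Quantity:         stripe.Int64(gpuHours),
		Timestamp:        stripe.Int64(time.Now().Unix()),
		Action:           stripe.String("increment"),
	}
	_, err := usagerecord.New(params)
	return err
}

func main() {
	stripe.Key = "sk_test_..." // test-mode API key placeholder
	if err := reportGPUHours("si_exampleItem", 3); err != nil {
		fmt.Println("usage report failed:", err)
	}
}
```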

Usage observability

Comprehensive observability and usage tracking, powered by Prometheus, DCGM, and Grafana, gives teams deep visibility into performance, utilization, and costs. Metrics on GPU, CPU, memory, and networking are collected and exposed in real time, enabling fine-grained monitoring of workloads from experiments to enterprise-scale deployments.
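
A minimal sketch of the platform-side half of that stack: exposing a custom gauge with Prometheus's client_golang so it can be scraped alongside the DCGM exporter and graphed in Grafana. The metric name, labels, and port are illustrative assumptions.

```go
package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// gpuUtilization is an illustrative per-tenant, per-GPU gauge. Most raw GPU
// metrics would come from NVIDIA's DCGM exporter; a custom exporter like this
// would cover platform-level figures (quota use, workspace counts, billing units).
var gpuUtilization = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "platform_gpu_utilization_percent",
		Help: "GPU utilization as observed by the platform, labelled by tenant and GPU.",
	},
	[]string{"tenant", "gpu"},
)

func main() {
	prometheus.MustRegister(gpuUtilization)

	// Placeholder sample; a real collector would poll DCGM or the hypervisor.
	gpuUtilization.WithLabelValues("team-a", "gpu-0").Set(73.5)

	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":9101", nil) // scraped by Prometheus, graphed in Grafana
}
```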

GPU VMs User Experience

Users can create GPU virtual machines with customizable configurations, launch interactive Jupyter workspaces for research and development, or schedule batch jobs for large-scale training and inference. Clear workflows, preset templates, and resource scaling options reduce setup time and errors.
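
For illustration, one plausible shape of the "create workspace" request the React frontend could send to the backend API, with a preset template filling in defaults; every field name here is an assumption, not the portal's actual contract.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CreateWorkspaceRequest is a hypothetical payload for launching a VM,
// Jupyter workspace, or batch job from the self-service portal.
type CreateWorkspaceRequest struct {
	Kind       string `json:"kind"`        // "vm", "jupyter", or "batch"
	Template   string `json:"template"`    // preset template, e.g. "pytorch-notebook"
	GPUProfile string `json:"gpu_profile"` // catalogue entry, e.g. "a100-1g.10gb"
	DiskGiB    int    `json:"disk_gib"`
	TeamID     string `json:"team_id"`
}

func main() {
	req := CreateWorkspaceRequest{
		Kind:       "jupyter",
		Template:   "pytorch-notebook",
		GPUProfile: "a100-1g.10gb",
		DiskGiB:    100,
		TeamID:     "team-a",
	}
	body, _ := json.MarshalIndent(req, "", "  ")
	fmt.Println(string(body)) // would be POSTed to the workspace endpoint
}
```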

GPU orchestration

An orchestration layer supports both dedicated virtual machines and containerized Kubernetes workloads, with Multi-Instance GPU (MIG) capabilities. Users can allocate GPUs flexibly, whether by spinning up dedicated VMs or leveraging containerized workloads in Kubernetes for greater scalability and automation. MIG support enables fine-grained GPU partitioning, ensuring optimal utilization for diverse workloads ranging from lightweight inference to large-scale training.
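
As a simplified illustration of MIG-aware placement, the sketch below picks the smallest A100 80GB MIG profile that covers a requested memory size and falls back to a dedicated GPU otherwise; the profile table and selection rule are deliberate simplifications, not the production scheduler.

```go
package main

import "fmt"

// migProfile describes one MIG slice size; the values reflect NVIDIA's
// published A100 80GB profiles and are included only for illustration.
type migProfile struct {
	Name      string
	MemoryGiB int
	Slices    int // share of the GPU's compute, in sevenths
}

var a100Profiles = []migProfile{
	{"1g.10gb", 10, 1},
	{"2g.20gb", 20, 2},
	{"3g.40gb", 40, 3},
	{"4g.40gb", 40, 4},
	{"7g.80gb", 80, 7},
}

// smallestFit returns the smallest profile whose memory covers the request,
// or reports that a dedicated full GPU is needed instead.
func smallestFit(requestGiB int) (migProfile, bool) {
	for _, p := range a100Profiles {
		if p.MemoryGiB >= requestGiB {
			return p, true
		}
	}
	return migProfile{}, false
}

func main() {
	if p, ok := smallestFit(16); ok {
		fmt.Println("selected MIG profile:", p.Name)
	}
}
```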

Admin console

A powerful admin console for managing capacity, usage, and credits across teams and projects lets administrators monitor real-time resource allocation, track consumption trends, and enforce quotas to maintain fair and efficient usage. Credit-based management provides flexibility for assigning budgets, controlling costs, and supporting departmental or project-level accounting.
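
To make the quota and credit checks concrete, here is a hypothetical sketch of the two gates such a console would typically apply before scheduling a new instance; the data model and pricing rule are illustrative assumptions, not the console's actual schema.

```go
package main

import (
	"errors"
	"fmt"
)

// TeamAccount is a hypothetical view of a team's quota and credit state.
type TeamAccount struct {
	GPUQuota    int     // maximum concurrent GPUs
	GPUsInUse   int     // currently allocated GPUs
	Credits     float64 // remaining prepaid credits
	HourlyPrice float64 // price of the requested profile per GPU-hour
}

// canLaunch enforces a concurrency quota and a remaining-credit check
// before a new instance is allowed to schedule.
func canLaunch(acct TeamAccount, gpus int, estimatedHours float64) error {
	if acct.GPUsInUse+gpus > acct.GPUQuota {
		return errors.New("GPU quota exceeded for this team")
	}
	if cost := float64(gpus) * estimatedHours * acct.HourlyPrice; cost > acct.Credits {
		return fmt.Errorf("insufficient credits: need %.2f, have %.2f", cost, acct.Credits)
	}
	return nil
}

func main() {
	acct := TeamAccount{GPUQuota: 8, GPUsInUse: 6, Credits: 500, HourlyPrice: 2.5}
	if err := canLaunch(acct, 2, 24); err != nil {
		fmt.Println("launch rejected:", err)
	} else {
		fmt.Println("launch allowed")
	}
}
```
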
Maven Solutions offered to deliver a working MVP within 6 months of the start date, staffing the project with Golang developers, front-end React developers, Quality Assurance specialists, a Scrum Master, a SysAdmin, and a Solution Architect.
04. The Results

Value Delivered

Better Efficiency
  • Accelerated Innovation: faster realization of test use cases leading to ~60% faster on-demand and scheduled GPU access
  • Operational Cost Reduction: ~50% cost reduction with tools that help handle lifecycle, quotas, and metered billing
  • Customer Retention and Experience: better overall user experience with easy-to-use self-service access
Better Effectiveness
  • Faster Decision-Making: reliable and timely access to resources, accelerating analytics and operational insights
  • Flexibility and Agility: self-service approach allows rapid testing of new use cases without delay
  • Improved Data Consistency: centralized and standardized data management from multiple sources acting as a single source of truth
Better Service
  • Cloud-Native Scalability: Kubernetes platform for high availability, scalability, and performance
  • Service Level Improvement: improved service levels with better SLA management and real-time analytics
  • Compliance and Auditability: consistent enforcement of data, transformation, and access control policies
“With Maven Solutions, I could easily refine the initial idea for a specialized on-demand GPU cloud service that would be as easy to use as AWS, or better. We quickly progressed from an idea to an actionable roadmap.”
Founder & CTO, GPU VM Provider
