Platform · Architecture

FlexSys platform architecture.

How grid-aware compute actually works. The operator, the control plane, the gateway, the grid integration — and where the inference router fits.

Three pillars

Compute · Economics · Grid

One platform with three things to do: schedule the work, follow the price, respect the network.

01 · Compute

Kubernetes operator + CRDs

A FlexSysJob CRD captures every workload type — Train, ImageGen, VideoGen, InferenceService — with first-class ResourceRequirements, QoS tier and priority. Real scheduling, not best-effort.

FlexSysJob CRDs · Train / ImageGen / VideoGen / InferenceService
ResourceRequirements + QoS + priority
Open-source operator runs in our cluster or yours

02 · Economics

Real-time grid pricing

AEMO + grid spot pricing is baked directly into the scheduler. Workloads land in the cheapest region right now. Every routing decision writes a row in the audit trail you can inspect per request.

Live AEMO + spot pricing in the planner
Per-request routing decision audit trail
Same price for the customer regardless of where it lands

03 · Grid

CURTAIL + demand response

When the network asks for less load, FlexSys gracefully drains GPU work and bids the freed capacity back into the spot market. Built on Flipped Energy's wholesale + retail experience and the Transgrid DMIA project.

Graceful drain on curtailment signals
Bids freed capacity into wholesale spot
Transgrid DMIA · CSIRO · UQ M&V partners

The full picture

Five layers, all replaceable

From the customer SDK at the top, down to the GPU at the bottom. Each layer has a clear responsibility, a public CRD or HTTP surface, and a repo.

01 · OpenAI client

Customer SDK

Your app keeps using the OpenAI SDK. We're a drop-in URL swap. Same chat/completions, completions and embeddings surface — same tokens, same usage object.

built onOpenAI Python · Node · any HTTP clientrepoOpenAI Python SDK ↗

02 · Auth · Route · Meter

Gateway

YARP-based reverse proxy. Authenticates the API key, asks the router for a routing decision, forwards to the chosen backend, captures usage, stamps decision/cost/latency headers, and writes the audit trail.

built on.NET 9 · YARP · Asp.Net minimal APIsrepogitea.flipped.energy / flexsys-gateway ↗

03 · Plan · Pricing · Audit

ControlPlane

The brain. Owns tenants, plans, model SKUs, pricing per 1M tokens, backend availability, and routing decisions. Exposes an admin API to activate/deactivate backends and inspect routing audit rows.

built on.NET 9 · EF Core · Postgres · open-telemetryrepogitea.flipped.energy / flexsys-control-plane ↗

04 · Kubernetes CRDs

Operator

Watches FlexSysJob custom resources. When a model is activated, it materialises Deployment + Service in the cluster, advertises the URL back to the ControlPlane, and the router targets it on the next decision.

built onkubebuilder · controller-runtime · client-gorepogitea.flipped.energy / flexsys-operator ↗

05 · GPU workloads

Cluster

The actual GPUs serving tokens. Self-hosted in our DC today, or in your cluster (BYOC). The agent reports health and price signal back to the operator. We never see your data plane.

built onkind / k3s / EKS · NVIDIA device plugin · Ollama / vLLMrepogitea.flipped.energy / flexsys-agent ↗

Workload types

What you can submit

Four kinds of FlexSysJob, all sharing the same scheduler and audit trail. We mark each one honestly — what's exercised end-to-end, what's smoke-tested, and what's still scaffolded.

Kind	Typical use case	ResourceRequirements	QoS default	Status
Train	Fine-tuning, LoRA / QLoRA, distillation runs	nvidia.com/gpu: 1–8 · 64–256Gi mem	Best-effort or Burstable · low priority	shipped (smoke-tested in kind)
ImageGen	SDXL / Flux batch + interactive generation	nvidia.com/gpu: 1 · 24–48Gi mem	Burstable · medium priority	shipped (smoke-tested in kind)
VideoGen	Text-to-video / image-to-video pipelines	nvidia.com/gpu: 1–4 · 64Gi+ mem	Burstable · medium priority	scaffolded, not exercised yet
InferenceService	Chat / completions / embeddings (this site's playground)	nvidia.com/gpu: 1 · 16–48Gi mem	Guaranteed · high priority	shipped, live in this demo

Train

shipped (smoke-tested in kind)

Fine-tuning, LoRA / QLoRA, distillation runs

resources: nvidia.com/gpu: 1–8 · 64–256Gi mem
qos: Best-effort or Burstable · low priority

ImageGen

shipped (smoke-tested in kind)

SDXL / Flux batch + interactive generation

resources: nvidia.com/gpu: 1 · 24–48Gi mem
qos: Burstable · medium priority

VideoGen

scaffolded, not exercised yet

Text-to-video / image-to-video pipelines

resources: nvidia.com/gpu: 1–4 · 64Gi+ mem
qos: Burstable · medium priority

InferenceService

shipped, live in this demo

Chat / completions / embeddings (this site's playground)

resources: nvidia.com/gpu: 1 · 16–48Gi mem
qos: Guaranteed · high priority

Bring your own cluster · Shape 3

Run the data plane in your DC

Same operator, your hardware, our control plane.

Your data plane stays in your DC
Tokens, prompts and responses never leave your network. We see usage counts and routing metadata, nothing more.
Same operator runs at Flipped + at you
One open-source kube operator. Identical FlexSysJob CRs. What we run in our cluster is what runs in yours.
Outbound-only agent — no firewall changes
The agent dials out to our control plane over TLS. No inbound NAT, no exposed ingress, no shared VPC.

Talk to us about your DC

opens email to sales@flipped.energy

The control plane is SaaS-hosted by Flipped Energy — that's where tenants, pricing and routing decisions live. The data plane (the operator and the GPUs) stays in your DC. The agent is outbound-only over TLS, so there are no firewall changes, no inbound NAT, and no shared VPC. We see usage counts and routing metadata; prompts and responses never leave your network.

Grid integration · CURTAIL + DMIA

This is real, not theoretical

A regulator-overseen demonstration project on the live Transgrid network.

FlexSys is being demonstrated on the live Transgrid network under the DMIA project (Demand Management Innovation Allowance), with independent panel endorsement secured March 2026. DMIA is a regulator-overseen demonstration project — this is real infrastructure on a real grid, not vapourware.

The CURTAIL leg of the platform pairs the inference scheduler with Flipped Energy’s existing wholesale and retail energy market integrations: when the network operator asks for less load, FlexSys drains GPU work without dropping in-flight requests, and bids the freed capacity back into the wholesale spot market.

StatusDMIA panel endorsement · March 2026

AEMO Pre-Dispatch integration
The scheduler reads AEMO Pre-Dispatch and 5-minute spot signals directly. Workload placement reflects the price the grid is paying right now, not yesterday's tariff curve.
High-resolution DC metering
Targeting IEC 61000-4-30 Class A measurement at the DC point of common coupling. Real instrumentation, regulator-grade.
Independent M&V partners
CSIRO and the University of Queensland are independent measurement and verification partners on the FlexSys CURTAIL trial.

What we are not

Infrastructure, not a middleman

Three things FlexSys deliberately is not. Helps procurement triangulate.

Not a reseller

We don't margin up someone else's GPUs and pretend that's the product. We run the operator, the scheduler and the gateway ourselves.

Not just a router

The inference router on the homepage is one surface. The real product is the platform underneath: the operator, the CRDs, and the grid integration.

Not a marketplace

There is no bidding stack, no third-party seller fan-out, no opaque price discovery. One scheduler, one audit trail, one bill.

Bring it to us

Bring your AI workloads. Or your cluster. Or both.

Get a demo key and try the inference router in 30 seconds, or talk to engineering about running the operator in your own DC.

Get a demo API key Talk to engineering

engineering@flipped.energy · live operator demos by request