FlexSys platform architecture.
How grid-aware compute actually works. The operator, the control plane, the gateway, the grid integration — and where the inference router fits.
Compute · Economics · Grid
One platform with three things to do: schedule the work, follow the price, respect the network.
Kubernetes operator + CRDs
A FlexSysJob CRD captures every workload type — Train, ImageGen, VideoGen, InferenceService — with first-class ResourceRequirements, QoS tier and priority. Real scheduling, not best-effort.
- FlexSysJob CRDs · Train / ImageGen / VideoGen / InferenceService
- ResourceRequirements + QoS + priority
- Open-source operator runs in our cluster or yours
Real-time grid pricing
AEMO + grid spot pricing is baked directly into the scheduler. Workloads land in the cheapest region right now. Every routing decision writes a row in the audit trail you can inspect per request.
- Live AEMO + spot pricing in the planner
- Per-request routing decision audit trail
- Same price for the customer regardless of where it lands
CURTAIL + demand response
When the network asks for less load, FlexSys gracefully drains GPU work and bids the freed capacity back into the spot market. Built on Flipped Energy's wholesale + retail experience and the Transgrid DMIA project.
- Graceful drain on curtailment signals
- Bids freed capacity into wholesale spot
- Transgrid DMIA · CSIRO · UQ M&V partners
Five layers, all replaceable
From the customer SDK at the top, down to the GPU at the bottom. Each layer has a clear responsibility, a public CRD or HTTP surface, and a repo.
Customer SDK
Your app keeps using the OpenAI SDK. We're a drop-in URL swap. Same chat/completions, completions and embeddings surface — same tokens, same usage object.
Gateway
YARP-based reverse proxy. Authenticates the API key, asks the router for a routing decision, forwards to the chosen backend, captures usage, stamps decision/cost/latency headers, and writes the audit trail.
ControlPlane
The brain. Owns tenants, plans, model SKUs, pricing per 1M tokens, backend availability, and routing decisions. Exposes an admin API to activate/deactivate backends and inspect routing audit rows.
Operator
Watches FlexSysJob custom resources. When a model is activated, it materialises Deployment + Service in the cluster, advertises the URL back to the ControlPlane, and the router targets it on the next decision.
Cluster
The actual GPUs serving tokens. Self-hosted in our DC today, or in your cluster (BYOC). The agent reports health and price signal back to the operator. We never see your data plane.
What you can submit
Four kinds of FlexSysJob, all sharing the same scheduler and audit trail. We mark each one honestly — what's exercised end-to-end, what's smoke-tested, and what's still scaffolded.
| Kind | Typical use case | ResourceRequirements | QoS default | Status |
|---|---|---|---|---|
| Train | Fine-tuning, LoRA / QLoRA, distillation runs | nvidia.com/gpu: 1–8 · 64–256Gi mem | Best-effort or Burstable · low priority | shipped (smoke-tested in kind) |
| ImageGen | SDXL / Flux batch + interactive generation | nvidia.com/gpu: 1 · 24–48Gi mem | Burstable · medium priority | shipped (smoke-tested in kind) |
| VideoGen | Text-to-video / image-to-video pipelines | nvidia.com/gpu: 1–4 · 64Gi+ mem | Burstable · medium priority | scaffolded, not exercised yet |
| InferenceService | Chat / completions / embeddings (this site's playground) | nvidia.com/gpu: 1 · 16–48Gi mem | Guaranteed · high priority | shipped, live in this demo |
Fine-tuning, LoRA / QLoRA, distillation runs
- resources
- nvidia.com/gpu: 1–8 · 64–256Gi mem
- qos
- Best-effort or Burstable · low priority
SDXL / Flux batch + interactive generation
- resources
- nvidia.com/gpu: 1 · 24–48Gi mem
- qos
- Burstable · medium priority
Text-to-video / image-to-video pipelines
- resources
- nvidia.com/gpu: 1–4 · 64Gi+ mem
- qos
- Burstable · medium priority
Chat / completions / embeddings (this site's playground)
- resources
- nvidia.com/gpu: 1 · 16–48Gi mem
- qos
- Guaranteed · high priority
Run the data plane in your DC
Same operator, your hardware, our control plane.
- Your data plane stays in your DCTokens, prompts and responses never leave your network. We see usage counts and routing metadata, nothing more.
- Same operator runs at Flipped + at youOne open-source kube operator. Identical FlexSysJob CRs. What we run in our cluster is what runs in yours.
- Outbound-only agent — no firewall changesThe agent dials out to our control plane over TLS. No inbound NAT, no exposed ingress, no shared VPC.
The control plane is SaaS-hosted by Flipped Energy — that's where tenants, pricing and routing decisions live. The data plane (the operator and the GPUs) stays in your DC. The agent is outbound-only over TLS, so there are no firewall changes, no inbound NAT, and no shared VPC. We see usage counts and routing metadata; prompts and responses never leave your network.
This is real, not theoretical
A regulator-overseen demonstration project on the live Transgrid network.
FlexSys is being demonstrated on the live Transgrid network under the DMIA project (Demand Management Innovation Allowance), with independent panel endorsement secured March 2026. DMIA is a regulator-overseen demonstration project — this is real infrastructure on a real grid, not vapourware.
The CURTAIL leg of the platform pairs the inference scheduler with Flipped Energy’s existing wholesale and retail energy market integrations: when the network operator asks for less load, FlexSys drains GPU work without dropping in-flight requests, and bids the freed capacity back into the wholesale spot market.
- AEMO Pre-Dispatch integrationThe scheduler reads AEMO Pre-Dispatch and 5-minute spot signals directly. Workload placement reflects the price the grid is paying right now, not yesterday's tariff curve.
- High-resolution DC meteringTargeting IEC 61000-4-30 Class A measurement at the DC point of common coupling. Real instrumentation, regulator-grade.
- Independent M&V partnersCSIRO and the University of Queensland are independent measurement and verification partners on the FlexSys CURTAIL trial.
Infrastructure, not a middleman
Three things FlexSys deliberately is not. Helps procurement triangulate.
We don't margin up someone else's GPUs and pretend that's the product. We run the operator, the scheduler and the gateway ourselves.
The inference router on the homepage is one surface. The real product is the platform underneath: the operator, the CRDs, and the grid integration.
There is no bidding stack, no third-party seller fan-out, no opaque price discovery. One scheduler, one audit trail, one bill.
Bring your AI workloads. Or your cluster. Or both.
Get a demo key and try the inference router in 30 seconds, or talk to engineering about running the operator in your own DC.