AI inference is the next frontier.

base compute is an AI inference lab. We build the runtimes and infrastructure that make powerful AI run on-device - fast, private, and at near-zero marginal cost.

Already shipped

baseRT - the fastest LLM runtime on Apple Silicon.

baseRT is tuned to get the most out of Apple Silicon for local LLMs, so you get the throughput that makes on-device inference practical at scale. It's the performance baseline we extend across our edge stack.

Open source

Inference throughput (tok/s) on Apple M4 Pro

                     baseRT   UZU   MLX   Llama.cpp
Qwen 3 0.6B (4bit)      462   398   385         255
Llama 3.2 1B (4bit)     302   290   278         207
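For reference, the tok/s figures above are decode throughput: generated tokens divided by wall-clock generation time. A minimal measurement sketch in Python, with a hypothetical `generate` callable standing in for any runtime's generation API:

```python
import time

def decode_throughput(generate, prompt: str, max_tokens: int = 256) -> float:
    """Tokens per second of generation; `generate` is a hypothetical
    stand-in for any local runtime's generation call."""
    start = time.perf_counter()
    tokens = generate(prompt, max_tokens=max_tokens)  # returns generated tokens
    return len(tokens) / (time.perf_counter() - start)

# At 462 tok/s, 256 tokens take roughly 0.55 s on an M4 Pro.
```

A careful benchmark would also warm up the runtime first and report prefill and decode separately; the sketch only shows what the headline number means.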

Inference is the next frontier of AI.
And it's moving to the edge.

Running frontier AI in the cloud costs $300-1,000 a day. The destination is $20 a month.

That gap closes when AI runs on hardware organisations already own - not in a data centre they don't control. Local inference has no usage meter. At scale, it approaches zero marginal cost.
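A back-of-the-envelope comparison of the two cost models. The daily cloud figure is the low end of the range above; the hardware price and lifetime are assumptions for illustration:

```python
# Illustrative only: cloud API spend vs. amortised on-device hardware.
cloud_per_day = 300.0                    # low end of the $300-1,000/day range
cloud_per_year = cloud_per_day * 365     # ~$109,500 - scales with usage

device_price = 2_500.0                   # assumed price of an AI-ready laptop
device_lifetime_years = 3
local_per_year = device_price / device_lifetime_years  # ~$833 - flat, no meter

print(f"cloud: ${cloud_per_year:,.0f}/yr, local: ${local_per_year:,.0f}/yr, "
      f"ratio: {cloud_per_year / local_per_year:.0f}x")
```

Every extra token widens the gap: the cloud line grows with usage while the local line stays flat. That is the zero-marginal-cost argument in two variables.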

Cloud-only AI is unsustainable on privacy, latency, connectivity and cost. The industry has to move to the edge. The question is who builds the infrastructure that gets it there.

What changed

The hardware is ready. The models are ready.

From pro laptops to low-power edge devices, AI-ready hardware is already mainstream.

Open-source models have caught up to the frontier.

89% of common chatbot queries can be answered correctly by local models.

Stanford University Research, 2026

GPT-4 level, open-weight.

The capability gap between open and closed models has been bridged: DeepSeek R1, Qwen 3, and Llama 4 deliver GPT-4-level capability with open weights.

The problem

The software layer for enterprise edge AI is immature.

Where current runtimes stop

llama.cpp, MLX, Ollama, ExecuTorch, ONNX Runtime

These tools are strong for local experiments, but they stop at raw inference and leave major production gaps.

Performance is also unresolved: current runtimes still leave significant throughput on the table and are rarely tuned to the silicon they actually run on.

The hardware is ready. The models are ready. The infrastructure to deploy them in production is not.

What enterprises still need

Out of the box, today's runtimes give you:

  • No model distribution
  • No fleet management
  • No monitoring
  • No audit trails
  • No governance

What we're building

The enterprise stack for edge AI.

Runtime

Natively optimised for each hardware target - Apple Silicon, Nvidia Jetson, commodity x86. We extract what the defaults leave on the table.
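As an illustration of per-target optimisation, a runtime can probe the host silicon and dispatch to kernels built for it. A hypothetical sketch - the backend names are placeholders, not baseRT's actual API:

```python
import platform

def select_backend() -> str:
    """Pick an inference backend for the host silicon (names illustrative)."""
    system, machine = platform.system(), platform.machine().lower()
    if system == "Darwin" and machine == "arm64":
        return "metal-apple-silicon"   # Metal/ANE-tuned kernels
    if system == "Linux" and machine == "aarch64":
        return "cuda-jetson"           # Nvidia Jetson class devices
    if machine in ("x86_64", "amd64"):
        return "cpu-avx"               # commodity x86 with AVX kernels
    return "portable-fallback"         # generic, unoptimised path
```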

Fleet management

Deploy, update and manage local AI across an entire organisation - from a handful of devices to thousands.
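Operationally, "thousands of devices" means staged rollouts: a runtime or model update reaches a small canary slice first and only then the rest of the fleet. A sketch under assumed stage fractions:

```python
from dataclasses import dataclass

@dataclass
class Stage:
    name: str
    fraction: float  # share of the fleet updated in this stage

# Illustrative plan; a real controller would gate each stage on
# device health checks and support automatic rollback.
PLAN = [Stage("canary", 0.01), Stage("early", 0.10),
        Stage("broad", 0.50), Stage("full", 1.00)]

def devices_updated(fleet_size: int, stage: Stage) -> int:
    return max(1, int(fleet_size * stage.fraction))

for stage in PLAN:
    print(f"{stage.name}: {devices_updated(5_000, stage)} devices")
```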

Model distribution

Secure, auditable delivery of models to any device or on-premises environment. Nothing in transit to third-party infrastructure.
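At minimum, "secure, auditable delivery" means a device refuses to load a model whose bytes don't match the published artifact. A sketch pinning a SHA-256 digest - the file name and digest are placeholders:

```python
import hashlib
from pathlib import Path

def verify_model(path: Path, expected_sha256: str) -> None:
    """Raise if the model file's digest doesn't match the pinned value."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
            digest.update(chunk)
    if digest.hexdigest() != expected_sha256:
        raise ValueError(f"model digest mismatch for {path}")

# verify_model(Path("qwen3-0.6b-4bit.bin"), "<pinned digest>")  # placeholder
```

Production distribution would sign a manifest rather than pin a bare hash, but the invariant is the same: the device proves provenance before it loads weights.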

Monitoring & governance

Usage visibility with the audit trails enterprises and governments actually require. Built into the architecture from day one.
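Concretely, an audit trail here is an append-only record per inference request - metadata, not prompt content. A sketch with assumed field names:

```python
import json, time, uuid

def audit_line(device_id: str, model_id: str,
               prompt_tokens: int, output_tokens: int) -> str:
    """One structured, append-only audit record; no prompt text is stored."""
    return json.dumps({
        "id": str(uuid.uuid4()),
        "ts": time.time(),
        "device": device_id,
        "model": model_id,
        "prompt_tokens": prompt_tokens,
        "output_tokens": output_tokens,
    })

with open("audit.log", "a") as log:
    log.write(audit_line("mac-0042", "qwen3-0.6b-4bit", 128, 256) + "\n")
```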

We're building the edge AI stack.
Come talk to us.

Working on edge AI, building regulated infrastructure, or just want to follow what we're doing? We'd like to hear from you.

Contact