Architecture

Two tracks, one owned spine.

StretchAI builds in two tracks on a shared, from-scratch foundation: a general-intelligence line (AGPT) and a dedicated agentic line (SIA). Different jobs, one principle — the weights are always ours.

The spine

Built from nothing, owned end to end.

Most "custom" models are fine-tunes of a licensed foundation. We took the harder path: our own tokenizer, our own architecture, weights trained from random. The same from-scratch pipeline produces every owned model, one after another.

01

AGPT-1 · embeddings

A ~23M-parameter encoder with our own 32k byte-level tokenizer — trained, serving, and built to bring semantic search to the platform.

02

AGPT-Nano · decoder

A 110M-parameter from-scratch decoder, ours from random initialization — a fully-owned generative model, small by design.

03

The agentic brain

The same spine grows the owned weights behind the SIA agentic line — teacher-assisted today, fully our own as it matures.

Track 1 — General

AGPT — Agile Generative Pre-trained Transformer.

AGPT is not one monolithic model. It's a routing layer that dispatches each request to the right size and specialty, with hot-swappable skill modules that add expertise without full retraining — a system-level mixture of experts. The product calls the router; the router resolves which model serves it. Nothing changes for the user when a model graduates underneath.

Sizes — Mini (fast, high-volume), Pro (balanced), Max (deepest reasoning).
Text specialties — code, business, domain, and utility skills, added as modular adapters rather than separate models.
Distinct classes — embeddings, vision, voice, and imaging run on their own runtimes, a separate build track from the text router.

The full catalog — twenty-eight models across eight capability lanes — is registered and graduates one model at a time, from candidate to production, under governed sign-off.

Track 2 — Agentic

SIA — a generation of agents that use tools.

SIA is a dedicated agentic line: a generalist that chats, reasons, and — critically — calls tools reliably. It's scoped deliberately to tool and function calling with bounded-domain agency, not open-ended autonomous coding or web control. The line is numbered by capability era — SIA-1.0 → 1.5 → 2.0 → 3.0 — and the public name stays constant as the brain underneath grows from teacher-assisted to fully owned.

The canonical tokenizer

The foundation everything else is trained against, built once and on purpose: a byte-level BPE vocabulary with atomic tool-call tags (so a tool call is a single, unambiguous token sequence), a real separate pad and stop token (the fix for run-on generation), a shipped chat template so training and serving are identical, and reserved slots for what comes next. License-clean, ours.

A modern decoder

The 2025 decoder stack, implemented from scratch: RMSNorm · RoPE · SwiGLU · grouped-query attention (GQA) · QK-Norm · z-loss, tied embeddings, no biases. Right-sized for reliability and on-prem serving rather than raw scale.

The verified tool-call dataset — our moat

We don't borrow agent datasets; we generate our own and verify every example. Open models author candidate tool calls, then a rule verifier checks each one against the real tool schema — correct tool name, required parameters present, no hallucinated keys, right types — and only verified examples survive. The set spans single, parallel, and dependent calls, plus a heavy share of abstention cases (knowing when not to call a tool). Tens of thousands of verified examples, growing.

Training and the harness

Post-training is light supervised fine-tuning, then reinforcement learning where the verifier becomes the reward — the model is rewarded for tool calls that actually pass schema validation. At serving time a tool-execution harness runs the loop: the model proposes a call, the harness validates and executes it, then feeds the result back. The model never executes anything itself. That harness — planning, memory, grounding, scoped search — is where most of the agent's real-world value lives.

Modalities as tools

Vision, image generation, speech, and retrieval are wired in as tools the agent orchestrates, self-hosted from the best open models rather than rented behind an API. We don't try to build every model in the world — the moat is the owned agent that conducts them, on infrastructure we control.

Evaluation

Measured, not asserted.

SIA-1.0 is trained and measured against a held-out harness that includes tools the model never saw in training — so we're scoring generalization, not memorization.

95.2%SIA-1.0 (3B) overall, internal harness

100%Correct tool selection

100%Valid tool-call format

Novel toolsGeneralizes beyond training

These are our own numbers from an in-development harness, reported honestly: the remaining gap is abstention — knowing when to hold back — which is the explicit target of the next generation's reinforcement stage. We're wiring the public, comparable benchmarks (BFCL, τ-bench) next, with a stated goal of matching the best small open agent models at three billion parameters or fewer.

Honest bound: a small model does not match a frontier model's raw intelligence — that's a scale gap no fine-tuning closes. "Acts like a frontier agent" means the system — a small, reliable core plus the harness — scoped to its domain. That's the bet, stated plainly.

Ownership

Owned now, more owned every generation.

The general AGPT models — AGPT-1 and AGPT-Nano — are ours from random initialization: no foundation model underneath. The agentic SIA line is teacher-assisted today: open models help generate and distill the training signal, but the weights we ship are always our own, and the from-scratch brain takes over entirely as the line matures. Today, StretchGPT's live chat already runs on one of our own models, fine-tuned and served on our own hardware. We never ship someone else's weights. Most AI is rented. Ours is owned — and we can show you exactly how.