Shipyard Inference

Cut your LLM bill
without changing providers

Shipyard Inference gives production teams a self-hostable gateway and hosted control plane for routing, caching, compression, budgets, and reliability across Anthropic, OpenAI, UsePod, Nous, and OpenRouter. Start with one workload. Prove savings in days, not quarters.

Run a savings audit | View on GitHub

“Your agents are one rate limit, one deprecation, one surprise bill away from going dark.”

How it works

Instrument one workload

Start with a single production app. Shipyard records usage, cost, cache hits, and routing decisions so you can see the baseline clearly before you change traffic.

Prove savings in shadow mode

Compare Shipyard's routing and caching against your default model path, then see the delta in a signed report you can trust.

Move to a managed plan

Once you trust the baseline, graduate to the hosted control plane for policy, budgets, reliability, and optional SLA coverage.

Capabilities

Routing, payments, and telemetry as composable primitives — without ripping out the providers you already trust.

One LLMProvider interface

Anthropic, OpenAI, UsePod, Nous/Hermes, and OpenRouter behind the same chat() and chatStream() API. Every provider accepts baseURL, so you can point at any OpenAI- or Anthropic-compatible proxy without changing app code.
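To see why a configurable baseURL keeps app code unchanged, here is a minimal sketch. The ProviderOptions shape, the resolveChatEndpoint helper, and the proxy hostname are assumptions made for illustration and are not the SDK's own types; only the baseURL option comes from the description above.

```typescript
// Hypothetical options shape; only `baseURL` is named in the text above.
interface ProviderOptions {
  apiKey: string;
  baseURL?: string;
}

// OpenAI's public API base, used here as the assumed default.
const DEFAULT_BASE_URL = "https://api.openai.com/v1";

// Resolve the chat endpoint a provider would call.
// Trailing slashes are trimmed so "/v1" and "/v1/" behave the same.
function resolveChatEndpoint(options: ProviderOptions): string {
  const base = (options.baseURL ?? DEFAULT_BASE_URL).replace(/\/+$/, "");
  return `${base}/chat/completions`;
}

// Same call site, different destination: only configuration changes.
const directEndpoint = resolveChatEndpoint({ apiKey: "sk-example" });
const proxiedEndpoint = resolveChatEndpoint({
  apiKey: "sk-example",
  baseURL: "https://llm-proxy.example.com/v1/",
});
```

The point of the sketch is that the destination lives in configuration, so swapping a direct provider for a compatible proxy touches one option rather than every call site.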

Cost-aware Router + auto-tier

Router is itself an LLMProvider — drop it in anywhere a provider is expected. costOptimized() picks the cheapest capable model per request; auto-tier infers the quality floor from the prompt (size, tools, requested output) so it lands on the cheapest model that's good enough, not the globally cheapest. Routers compose, and routingHints.tier always overrides.
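The difference between "cheapest capable" and "globally cheapest" can be sketched in a few lines. Everything below is illustrative: the tier names, thresholds, prices, and the inferTier and pickCheapestCapable helpers are assumptions, not the Router's actual heuristics.

```typescript
type Tier = "small" | "mid" | "frontier";
const TIER_RANK: Record<Tier, number> = { small: 0, mid: 1, frontier: 2 };

interface Candidate { id: string; tier: Tier; usdPerMTok: number }
interface RoutedRequest {
  promptChars: number;
  toolCount: number;
  maxOutputTokens: number;
  tierHint?: Tier; // stands in for routingHints.tier
}

// Toy quality-floor inference; the thresholds are invented for this sketch.
function inferTier(req: RoutedRequest): Tier {
  if (req.tierHint) return req.tierHint; // an explicit hint always wins
  if (req.promptChars > 20000) return "frontier";
  if (req.toolCount > 0 || req.promptChars > 2000 || req.maxOutputTokens > 1000) return "mid";
  return "small";
}

// Filter to candidates at or above the floor, then take the cheapest.
function pickCheapestCapable(candidates: Candidate[], req: RoutedRequest): Candidate {
  const floor = TIER_RANK[inferTier(req)];
  const capable = candidates.filter((c) => TIER_RANK[c.tier] >= floor);
  if (capable.length === 0) throw new Error("no candidate meets the quality floor");
  return capable.reduce((best, c) => (c.usdPerMTok < best.usdPerMTok ? c : best));
}

// Made-up prices, purely to exercise the selection logic.
const pool: Candidate[] = [
  { id: "tiny", tier: "small", usdPerMTok: 0.5 },
  { id: "balanced", tier: "mid", usdPerMTok: 3 },
  { id: "flagship", tier: "frontier", usdPerMTok: 15 },
];
```

With this pool, a short prompt with no tools resolves to "tiny", a request that carries tools resolves to "balanced", and a tierHint of "frontier" forces "flagship" regardless of prompt size.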

withFailover routing

Wrap any pair of providers with withFailover(primary, fallback). Retryable errors — 429, 5xx, model-deprecated — fall through to the next candidate via the same SDK call. Streaming commits after the first token, so output never duplicates.
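As a rough illustration of the failover rule, the sketch below wraps two plain async functions rather than real providers. The ChatFn type, the withFailoverSketch name, and the "model_deprecated" error code are hypothetical stand-ins; the SDK's withFailover may classify errors differently, and streaming commit behavior is not modeled here.

```typescript
type ChatFn = (prompt: string) => Promise<string>;

// Treat 429, any 5xx, and a (hypothetical) deprecation code as retryable.
function isRetryable(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const e = err as { status?: unknown; code?: unknown };
  if (e.code === "model_deprecated") return true; // assumed error code
  return typeof e.status === "number" && (e.status === 429 || (e.status >= 500 && e.status <= 599));
}

// Try the primary; on a retryable error, make the same call on the fallback.
// Non-retryable errors (for example a 400) are rethrown untouched.
function withFailoverSketch(primary: ChatFn, fallback: ChatFn): ChatFn {
  return async (prompt) => {
    try {
      return await primary(prompt);
    } catch (err) {
      if (!isRetryable(err)) throw err;
      return fallback(prompt);
    }
  };
}
```

Because the wrapper returns the same function type it accepts, it can be nested to build longer fallback chains.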

x402 USDC on Solana

createPayingFetch settles HTTP 402 responses transparently using createSolanaPayProvider (keypair signer) or createPayboxPaymentProvider (Paybox card credential). Idempotent per nonce; spend caps per-request and per-process.
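The two safety properties named above, per-nonce idempotency and spend caps, can be sketched independently of any payment rail. The SpendGuard class below is a hypothetical illustration, not part of the SDK, and it deliberately leaves out the x402 wire format and the on-chain settlement itself.

```typescript
interface SpendCaps { perRequestUsd: number; perProcessUsd: number }

class SpendGuard {
  private readonly caps: SpendCaps;
  private readonly settledNonces = new Set<string>();
  private spentUsd = 0;

  constructor(caps: SpendCaps) {
    this.caps = caps;
  }

  // Returns true when a new payment should be made, false when this nonce
  // was already settled (so a retry never pays twice). Throws on a cap breach.
  authorize(nonce: string, amountUsd: number): boolean {
    if (this.settledNonces.has(nonce)) return false;
    if (amountUsd > this.caps.perRequestUsd) throw new Error("per-request cap exceeded");
    if (this.spentUsd + amountUsd > this.caps.perProcessUsd) throw new Error("per-process cap exceeded");
    this.settledNonces.add(nonce);
    this.spentUsd += amountUsd;
    return true;
  }

  totalSpentUsd(): number {
    return this.spentUsd;
  }
}
```

In a real paying fetch, a check like authorize would run before signing anything, so a runaway loop hits the cap instead of the wallet.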

Drop-in OpenAI gateway

shipyard-gateway runs an OpenAI-compatible HTTP server in front of the Router. Point any OpenAI client at it — streaming included. Responses carry x-shipyard-model / x-shipyard-provider / x-shipyard-cost-usd, and it auto-reports telemetry for every app that passes through, no per-app wiring.
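If you want to log which model answered and what it cost, the response headers can be read with a small helper like the one below. The header names come from the description above; the readRoutingInfo helper, the HeaderBag type, and the assumption that the cost header holds a plain decimal string are illustrative guesses.

```typescript
// Minimal structural type so the helper works with any header container
// that exposes get(name), without depending on a specific runtime.
interface HeaderBag { get(name: string): string | null }

interface RoutingInfo {
  model: string | null;
  provider: string | null;
  costUsd: number | null;
}

function readRoutingInfo(headers: HeaderBag): RoutingInfo {
  const rawCost = headers.get("x-shipyard-cost-usd");
  // Missing or blank values become null rather than a misleading 0.
  const parsed = rawCost === null || rawCost.trim() === "" ? NaN : Number(rawCost);
  return {
    model: headers.get("x-shipyard-model"),
    provider: headers.get("x-shipyard-provider"),
    costUsd: isFinite(parsed) ? parsed : null,
  };
}
```

The headers object on a fetch Response has a compatible get method, so passing response.headers should work against a live gateway.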

Semantic cache + compression

SemanticCacheStore matches paraphrases by embedding similarity (exact-match MemoryCacheStore ships too). slidingWindowCompression keeps recent turns intact; summarizeCompression rolls older ones up with a cheap model. Off by default — flip on per Router.
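For intuition, here is a toy version of both ideas: a cache that matches by cosine similarity over precomputed embeddings, and a sliding window that keeps only the most recent turns. TinySemanticCache, slidingWindow, and the 0.9 threshold in the usage are assumptions for illustration and do not reflect the internals of SemanticCacheStore or slidingWindowCompression.

```typescript
// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

interface CacheEntry { embedding: number[]; response: string }

class TinySemanticCache {
  private readonly threshold: number;
  private readonly entries: CacheEntry[] = [];

  constructor(threshold: number) {
    this.threshold = threshold;
  }

  set(embedding: number[], response: string): void {
    this.entries.push({ embedding, response });
  }

  // Return the most similar stored response at or above the threshold.
  get(embedding: number[]): string | undefined {
    let best: CacheEntry | undefined;
    let bestScore = this.threshold;
    for (const entry of this.entries) {
      const score = cosine(entry.embedding, embedding);
      if (score >= bestScore) {
        best = entry;
        bestScore = score;
      }
    }
    return best?.response;
  }
}

// Keep only the last `keep` turns; older turns are dropped in this sketch.
function slidingWindow<T>(turns: T[], keep: number): T[] {
  return keep <= 0 ? [] : turns.slice(-keep);
}
```

A linear scan is fine for a demo; a production store would more likely use an approximate nearest-neighbor index, and the threshold would need tuning per workload to avoid serving a cached answer to a question that only looks similar.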

Operator command center

One central dashboard for every place Shipyard inference runs. A drop-in reporter — or the gateway automatically — feeds every routing decision, failover, retry, cache hit, and completed request to a hub that persists to disk and serves a live SPA: requests/RPM, actual vs baseline vs saved $, gross margin, latency p50/p95/p99, per-user/model/provider breakdowns, and billing & settlement.
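Latency percentiles like p50, p95, and p99 can be computed from raw request timings in a few lines. The sketch below uses the nearest-rank method, which is an assumption; the dashboard may compute its percentiles differently (for example with interpolation or streaming estimates).

```typescript
// Nearest-rank percentile: the smallest sample such that at least
// p percent of all samples are less than or equal to it.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = samplesMs.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// Illustrative latencies in milliseconds.
const latenciesMs = [120, 80, 300, 95, 110];
const p50 = percentile(latenciesMs, 50);
const p95 = percentile(latenciesMs, 95);
```

With only five samples, p95 and p99 both land on the slowest request, which is why tail percentiles are only meaningful over a reasonably large window.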

Provable savings telemetry

Set a baselineModel and every request reports what it actually cost versus what the same request would have cost on the baseline: measured, not guessed. The baseline falls back to the request's intended model. MemoryUsageRecorder rolls up per-user totals, and Anthropic prompt caching, on by default, bills the stable prefix at the ~0.1x cache-read rate.
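The savings figure is simple arithmetic: price the same token usage twice, once at the model that actually served the request and once at the baseline. The sketch below shows that calculation; the type names, the computeSavings helper, and all prices are made up for illustration and do not reflect real provider pricing or the SDK's telemetry schema.

```typescript
interface Price { inputUsdPerMTok: number; outputUsdPerMTok: number }
interface Usage { inputTokens: number; outputTokens: number }

// Cost of one request at a given per-million-token price.
function costUsd(usage: Usage, price: Price): number {
  return (
    (usage.inputTokens * price.inputUsdPerMTok +
      usage.outputTokens * price.outputUsdPerMTok) / 1e6
  );
}

// Price identical usage on both models and report the difference.
function computeSavings(usage: Usage, actual: Price, baseline: Price) {
  const actualUsd = costUsd(usage, actual);
  const baselineUsd = costUsd(usage, baseline);
  return { actualUsd, baselineUsd, savedUsd: baselineUsd - actualUsd };
}

// Invented prices: a baseline model at $3 / $15 per million tokens
// versus a cheaper routed model at $0.25 / $1.25.
const report = computeSavings(
  { inputTokens: 2000, outputTokens: 500 },
  { inputUsdPerMTok: 0.25, outputUsdPerMTok: 1.25 },
  { inputUsdPerMTok: 3, outputUsdPerMTok: 15 },
);
```

For these numbers the baseline cost is $0.0135, the actual cost is $0.001125, and the saved amount is $0.012375 for the single request.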

Quick start

One Router across Anthropic, OpenAI, and UsePod — cheapest capable model per request, drop in anywhere a provider is expected.

shipyard-inference

// npm install shipyard-inference

import {
  Router,
  AnthropicProvider,
  createUsePodProvider,
  costOptimized,
} from "shipyard-inference"

const router = new Router({
  candidates: [
    {
      id: "anthropic",
      provider: new AnthropicProvider({ apiKey: process.env.ANTHROPIC_API_KEY }),
    },
    {
      id: "usepod",
      provider: createUsePodProvider({ token: process.env.USEPOD_TOKEN }),
    },
  ],
  strategy: costOptimized(),
})

// Cheapest capable model per request.
const response = await router.chat({
  system: "You are a helpful assistant.",
  messages: [{ role: "user", content: "Hello!" }],
  tools: [],
})

// Auto-tier infers the quality floor per request; routingHints: { tier: "frontier" } overrides.
// Add hundreds more models via createOpenRouterProvider() as another candidate.
// Stream with router.chatStream(...) — usage + cost on every done event.
// Wire onEvent to the operator hub for live cost / savings / margin across every app.

Want to see it end to end? examples/chat-portal is a runnable chat UI on the Router — boots in demo mode with no keys, and shows live cost, baseline, and savings on every reply.

Built on

Open standards and open infrastructure. No proprietary lock-in.

Anthropic + OpenAI

Inference engine

Frontier providers. Shipyard Inference is additive — you keep using what you trust, the Router just makes them cheaper and more resilient.

UsePod

Inference engine

Wallet-funded inference proxy in front of Anthropic and OpenAI. Fund a USDC balance, get a per-account proxy URL, and the Router uses it like any other candidate.

usepod.ai

Nous Research / Hermes

createNousProvider() speaks to the Nous portal, and Hermes Agent plugs straight in through the OpenAI-compatible gateway.

portal.nousresearch.com

OpenRouter

createOpenRouterProvider() adds hundreds more models — Gemini, Llama, Mistral, and the long tail — as candidates the Router can pick from.

openrouter.ai

x402 on Solana

HTTP-native USDC payments. createSolanaPayProvider settles 402 responses with a keypair signer; createPayboxPaymentProvider routes through a Paybox card credential with passkey-gated approvals.

Building inference infrastructure? Let’s talk.

Powers the Shipyard stack

Every Shipyard product that talks to an LLM runs on Shipyard Inference. The continuity story compounds across the stack.

Dock

Consumer-facing first mate. First Shipyard product to consume the SDK.


Shipyard OS

Agent command center. Always-On Crews tier sits on top of Shipyard Inference.


Buoy

API gating. Wrapped APIs pay their own LLM bills via the same primitive.


Join the waitlist

We’ll reach out as new releases ship, and again when there’s a managed Always-On Crews tier to try.

Roadmap

v0.14

Ready now. The operator command center — a central hub + live dashboard that the gateway feeds automatically (telemetry for every app, no per-app wiring), showing actual vs baseline vs saved $, margin, latency, and billing & settlement. Plus auto-tier routing: the Router infers each request’s quality floor and lands on the cheapest model that’s good enough.

v0.12

Ready. payboxSettle() — the billing half: meter accrued USDC, then settle it on-chain from a Paybox wallet to a treasury. Provable savings, with the baseline falling back to the request’s intended model.

v0.5–v0.11

Ready. Semantic cache + compression, createOpenRouterProvider, the full Paybox surface (payboxSigner / payboxSecret), MPP session settlement, createWalletInference, UsePod prepaid funding, and retry-with-jitter.

v0.4

Ready. Anthropic, OpenAI, UsePod, and Nous/Hermes providers. Cost-aware Router, withFailover, streaming with usage/$ telemetry, the OpenAI-compatible shipyard-gateway, and x402-on-Solana payments (keypair signer + Paybox).

Agents that don’t get shut off.

Open source. MIT licensed. Built for the Shipyard stack, usable anywhere.

Get Started with Shipyard Inference | View on GitHub
