The always-available inference layer for the agent economy
A smart middleman between your app and the AI models it uses. Cost-aware routing across Anthropic, OpenAI, UsePod, Nous, and OpenRouter, with semantic caching, context compression, and x402 USDC payments on Solana. Point any OpenAI-compatible app at the gateway by changing one URL.
“Your agents are one rate limit, one deprecation, one surprise bill away from going dark.”
How it works
Route to cheapest capable
The Router scores your configured candidates — Anthropic, OpenAI, UsePod, Nous — and sends each request to the cheapest model that can handle it. Bump quality per-request with routingHints.
Fail over on retryable errors
withFailover catches 429, 5xx, and model-deprecated responses and re-routes to the next candidate via the same SDK call. Failover happens before the first streamed token, so output never duplicates.
Pay per call with x402
A paying fetch settles HTTP 402 responses with USDC on Solana — keypair signer or Paybox card credential — and retries transparently. Per-request and per-process spend caps stop runaway bills.
Capabilities
Routing, payments, and telemetry as composable primitives — without ripping out the providers you already trust.
One LLMProvider interface
Anthropic, OpenAI, UsePod, Nous/Hermes, and OpenRouter behind the same chat() and chatStream() API. Every provider accepts baseURL, so you can point at any OpenAI- or Anthropic-compatible proxy without changing app code.
Cost-aware Router
Router is itself an LLMProvider — drop it in anywhere a provider is expected. The default costOptimized() strategy picks the cheapest capable model across your candidates per request. Routers compose: cheapest-capable with UsePod always last is one line.
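As a rough illustration of what "cheapest capable" selection means, here is a minimal sketch. It does not use the shipyard-inference API: the Candidate shape, the tier labels other than "frontier", the prices, and pickCheapestCapable are all hypothetical names invented for this example.

```typescript
// Hypothetical shapes for illustration only; not the shipyard-inference types.
type Tier = "basic" | "standard" | "frontier";

interface Candidate {
  id: string;
  tier: Tier; // assumed capability label
  usdPerMillionTokens: number; // assumed blended price, made up for the demo
}

const TIER_RANK: Record<Tier, number> = { basic: 0, standard: 1, frontier: 2 };

// Keep only candidates at or above the requested tier, then take the cheapest.
function pickCheapestCapable(candidates: Candidate[], minTier: Tier = "basic"): Candidate {
  const capable = candidates.filter((c) => TIER_RANK[c.tier] >= TIER_RANK[minTier]);
  if (capable.length === 0) {
    throw new Error(`no candidate satisfies tier "${minTier}"`);
  }
  return capable.reduce((best, c) =>
    c.usdPerMillionTokens < best.usdPerMillionTokens ? c : best
  );
}

const candidates: Candidate[] = [
  { id: "small", tier: "basic", usdPerMillionTokens: 0.5 },
  { id: "mid", tier: "standard", usdPerMillionTokens: 3 },
  { id: "large", tier: "frontier", usdPerMillionTokens: 15 },
];

console.log(pickCheapestCapable(candidates).id); // "small"
console.log(pickCheapestCapable(candidates, "frontier").id); // "large"
```

Raising the minimum tier per request plays the same role as a routing hint: it shrinks the capable set, and the cheapest member of what remains wins.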
withFailover routing
Wrap any pair of providers with withFailover(primary, fallback). Retryable errors (429, 5xx, model-deprecated) fall through to the next candidate via the same SDK call. A stream commits to its provider once the first token arrives, so failover only happens before any output and nothing is duplicated.
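The "fail over only before the first token" rule can be sketched with async generators. This is an assumption-laden illustration, not the withFailover implementation: StreamProvider, ProviderError, isRetryable, and failoverStream are hypothetical names, and only 429 and 5xx are treated as retryable because the source does not say how a model-deprecated response is represented.

```typescript
// Hypothetical minimal shapes; the real LLMProvider interface may differ.
class ProviderError extends Error {
  status: number;
  constructor(status: number, message: string) {
    super(message);
    this.status = status;
  }
}

type TokenStream = AsyncGenerator<string, void, void>;

interface StreamProvider {
  chatStream(prompt: string): TokenStream;
}

// Assumed retryable set: HTTP 429 and any 5xx status.
function isRetryable(err: unknown): boolean {
  const status = (err as { status?: unknown } | null)?.status;
  return typeof status === "number" && (status === 429 || status >= 500);
}

function failoverStream(primary: StreamProvider, fallback: StreamProvider): StreamProvider {
  return {
    async *chatStream(prompt: string): TokenStream {
      let committed = false; // true once a token has been handed to the caller
      try {
        for await (const token of primary.chatStream(prompt)) {
          committed = true;
          yield token;
        }
        return;
      } catch (err) {
        // After the first token, re-routing would duplicate output, so rethrow.
        if (committed || !isRetryable(err)) throw err;
      }
      yield* fallback.chatStream(prompt);
    },
  };
}

// Demo providers: one that is rate limited immediately, one that works.
const flaky: StreamProvider = {
  async *chatStream(): TokenStream {
    throw new ProviderError(429, "rate limited");
  },
};
const steady: StreamProvider = {
  async *chatStream(): TokenStream {
    yield "Hello";
    yield " world";
  },
};

async function collect(stream: TokenStream): Promise<string> {
  let text = "";
  for await (const token of stream) text += token;
  return text;
}

collect(failoverStream(flaky, steady).chatStream("Hi")).then((text) => console.log(text));
```

An error that arrives mid-stream is surfaced to the caller instead of being retried, which is the trade-off that keeps output from ever being emitted twice.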
x402 USDC on Solana
createPayingFetch settles HTTP 402 responses transparently using createSolanaPayProvider (keypair signer) or createPayboxPaymentProvider (Paybox card credential). Idempotent per nonce; spend caps per-request and per-process.
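The pay-and-retry loop with spend caps can be sketched without a network or a wallet. Everything below is hypothetical and simplified: SimpleFetch, Pay, SpendCaps, makePayingFetch, the priceMicroUsdc field, and the "x-payment" header name are assumptions made for the example, not the createPayingFetch API or the x402 wire format. Amounts are integers in millionths of a USDC to avoid floating-point drift.

```typescript
// Hypothetical minimal shapes so the sketch runs offline.
interface SimpleResponse {
  status: number;
  priceMicroUsdc?: number; // assumed: price quoted by a 402 response
  body?: string;
}
type SimpleFetch = (url: string, headers?: Record<string, string>) => Promise<SimpleResponse>;
// Assumed payer: settles the quote and returns a proof to attach on retry.
type Pay = (url: string, amountMicroUsdc: number) => Promise<string>;

interface SpendCaps {
  perRequestMicroUsdc: number;
  perProcessMicroUsdc: number;
}

function makePayingFetch(baseFetch: SimpleFetch, pay: Pay, caps: SpendCaps): SimpleFetch {
  let spent = 0; // running total for this process
  return async (url, headers = {}) => {
    const first = await baseFetch(url, headers);
    if (first.status !== 402) return first;

    const price = first.priceMicroUsdc;
    if (price === undefined) throw new Error("402 response carried no price");
    if (price > caps.perRequestMicroUsdc) throw new Error("per-request spend cap exceeded");
    if (spent + price > caps.perProcessMicroUsdc) throw new Error("per-process spend cap exceeded");

    // Reserve the amount before paying so concurrent calls cannot overshoot the cap.
    spent += price;
    let proof: string;
    try {
      proof = await pay(url, price);
    } catch (err) {
      spent -= price; // payment failed, release the reservation
      throw err;
    }
    // Retry once with the proof attached (header name is an assumption).
    return baseFetch(url, { ...headers, "x-payment": proof });
  };
}

// Demo: a fake endpoint that demands payment unless a proof header is present.
const fakeServer: SimpleFetch = async (_url, headers = {}) =>
  headers["x-payment"] ? { status: 200, body: "ok" } : { status: 402, priceMicroUsdc: 10000 };
const fakePay: Pay = async () => "signed-proof";

const payingFetch = makePayingFetch(fakeServer, fakePay, {
  perRequestMicroUsdc: 20000,
  perProcessMicroUsdc: 25000,
});
payingFetch("https://example.invalid/v1/chat").then((res) => console.log(res.status)); // 200
```

Checking both caps before any payment is attempted means a request that would exceed the budget fails without spending anything.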
Drop-in OpenAI gateway
shipyard-gateway runs an OpenAI-compatible HTTP server in front of the Router. Point any OpenAI client at it — streaming included. Responses carry x-shipyard-model / x-shipyard-provider / x-shipyard-cost-usd so callers see exactly what ran and what it cost.
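On the client side, the three telemetry headers can be read with a small helper. The header names come from the description above; the helper name readShipyardTelemetry, the HeaderReader shape, and the assumption that x-shipyard-cost-usd holds a plain decimal number are illustrative guesses.

```typescript
// Anything with a get(name) method works, including the Headers object of a fetch Response.
interface HeaderReader {
  get(name: string): string | null;
}

interface ShipyardTelemetry {
  model: string | null;
  provider: string | null;
  costUsd: number | null;
}

function readShipyardTelemetry(headers: HeaderReader): ShipyardTelemetry {
  const rawCost = headers.get("x-shipyard-cost-usd");
  // Assumption: the cost header is a decimal string such as "0.00042".
  const parsed = rawCost === null || rawCost.trim() === "" ? NaN : Number(rawCost);
  return {
    model: headers.get("x-shipyard-model"),
    provider: headers.get("x-shipyard-provider"),
    costUsd: Number.isFinite(parsed) ? parsed : null,
  };
}

// Demo with a stand-in header map (values are made up).
const demoHeaders = new Map<string, string>([
  ["x-shipyard-model", "example-model"],
  ["x-shipyard-provider", "anthropic"],
  ["x-shipyard-cost-usd", "0.00042"],
]);
const reader: HeaderReader = { get: (name) => demoHeaders.get(name) ?? null };

console.log(readShipyardTelemetry(reader));
```

With a real request, passing response.headers from fetch should work the same way, since it exposes the same get(name) method.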
Semantic cache + compression
SemanticCacheStore matches paraphrases by embedding similarity (exact-match MemoryCacheStore ships too). slidingWindowCompression keeps recent turns intact; summarizeCompression rolls older ones up with a cheap model. Off by default — flip on per Router.
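Both ideas are easy to picture in miniature. The sketch below is not SemanticCacheStore or slidingWindowCompression: TinySemanticCache, keepRecentTurns, the 0.9 threshold, and the two-dimensional toy vectors are hypothetical, and a real setup would get its vectors from an embedding model.

```typescript
type Vector = number[];

function cosineSimilarity(a: Vector, b: Vector): number {
  if (a.length !== b.length) throw new Error("vectors must have equal length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

interface CacheEntry {
  embedding: Vector;
  response: string;
}

// Returns a cached response when a stored prompt is similar enough to the query.
class TinySemanticCache {
  private entries: CacheEntry[] = [];
  private threshold: number;

  constructor(threshold: number) {
    this.threshold = threshold;
  }

  set(embedding: Vector, response: string): void {
    this.entries.push({ embedding, response });
  }

  lookup(embedding: Vector): string | undefined {
    let best: CacheEntry | undefined;
    let bestScore = this.threshold;
    for (const entry of this.entries) {
      const score = cosineSimilarity(embedding, entry.embedding);
      if (score >= bestScore) {
        best = entry;
        bestScore = score;
      }
    }
    return best?.response;
  }
}

interface Turn {
  role: "user" | "assistant";
  content: string;
}

// Sliding window: keep only the most recent maxTurns turns, untouched.
function keepRecentTurns(turns: Turn[], maxTurns: number): Turn[] {
  return maxTurns <= 0 ? [] : turns.slice(-maxTurns);
}

const cache = new TinySemanticCache(0.9);
cache.set([1, 0], "cached answer");
console.log(cache.lookup([0.99, 0.1])); // "cached answer" (near-identical direction)
console.log(cache.lookup([0, 1])); // undefined (orthogonal, so a miss)
```

The threshold is the knob that matters: set it too low and unrelated prompts share answers, too high and paraphrases stop hitting.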
Quick start
One Router across Anthropic and UsePod (add OpenAI or other providers as further candidates): cheapest capable model per request, and it drops in anywhere a provider is expected.
// npm install shipyard-inference
import {
  Router,
  AnthropicProvider,
  createUsePodProvider,
  costOptimized,
} from "shipyard-inference"

const router = new Router({
  candidates: [
    {
      id: "anthropic",
      provider: new AnthropicProvider({ apiKey: process.env.ANTHROPIC_API_KEY }),
    },
    {
      id: "usepod",
      provider: createUsePodProvider({ token: process.env.USEPOD_TOKEN }),
    },
  ],
  strategy: costOptimized(),
})

// Cheapest capable model per request.
const response = await router.chat({
  system: "You are a helpful assistant.",
  messages: [{ role: "user", content: "Hello!" }],
  tools: [],
})

// Bump quality per-request with routingHints: { tier: "frontier" }
// Add hundreds more models via createOpenRouterProvider() as another candidate.
// Stream with router.chatStream(...): usage + cost on every done event.
Built on
Open standards and open infrastructure. No proprietary lock-in.
Anthropic + OpenAI
Inference engine
Frontier providers. Shipyard Inference is additive: you keep using what you trust, and the Router just makes them cheaper and more resilient.
UsePod
Inference engine
Wallet-funded inference proxy in front of Anthropic and OpenAI. Fund a USDC balance, get a per-account proxy URL, and the Router uses it like any other candidate.
usepod.ai
Nous Research / Hermes
createNousProvider() speaks to the Nous portal, and Hermes Agent plugs straight in through the OpenAI-compatible gateway.
portal.nousresearch.com
OpenRouter
createOpenRouterProvider() adds hundreds more models — Gemini, Llama, Mistral, and the long tail — as candidates the Router can pick from.
openrouter.ai
x402 on Solana
HTTP-native USDC payments. createSolanaPayProvider settles 402 responses with a keypair signer; createPayboxPaymentProvider routes through a Paybox card credential with passkey-gated approvals.
Building inference infrastructure? Let’s talk.
Powers the Shipyard stack
Every Shipyard product that talks to an LLM runs on Shipyard Inference. The continuity story compounds across the stack.
Join the waitlist
We’ll reach out as the UsePodProvider and withFailover() routing ship — and again when there’s a managed Always-On Crews tier to try.
Roadmap
Ready now. Adds SemanticCacheStore (paraphrase-aware caching via openAIEmbedder), context compression (slidingWindowCompression / summarizeCompression), and createOpenRouterProvider for hundreds more candidate models.
Ready. Anthropic, OpenAI, UsePod, and Nous/Hermes providers. Cost-aware Router, withFailover, streaming with usage/$ telemetry, the OpenAI-compatible shipyard-gateway, and x402-on-Solana payments (keypair signer + Paybox).
Retry-with-jitter. Smarter backoff for failover events, plus richer observability hooks for the candidate that won and what it cost.
MPP session settlement and a Paybox SignIntent on-chain path.
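For readers unfamiliar with the retry-with-jitter item above, one common form is exponential backoff with "full jitter". This is a generic sketch of that technique, not a description of what Shipyard Inference will ship; backoffDelayMs and its default base and cap values are hypothetical.

```typescript
// Full-jitter backoff: the delay is drawn uniformly from
// [0, min(capMs, baseMs * 2^attempt)).
// The random source is injectable so the function can be tested deterministically.
function backoffDelayMs(
  attempt: number,
  baseMs: number = 200,
  capMs: number = 10000,
  random: () => number = Math.random
): number {
  const ceiling = Math.min(capMs, baseMs * Math.pow(2, attempt));
  return Math.floor(random() * ceiling);
}

// With a fixed random value of 0.5 the ceilings are easy to see.
const half = () => 0.5;
console.log(backoffDelayMs(0, 200, 10000, half)); // 100
console.log(backoffDelayMs(3, 200, 10000, half)); // 800
console.log(backoffDelayMs(10, 200, 10000, half)); // 5000 (capped)
```

Randomizing the whole interval, instead of adding a small random offset, spreads simultaneous retries out so that many clients hitting the same rate limit do not all return at once.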
Agents that don’t get shut off.
Open source. MIT licensed. Built for the Shipyard stack, usable anywhere.