Meter AI tokens
Meter LLM input and output tokens end to end — declared in the product, reported from the backend.
An AI proxy bills on tokens, not request counts. The pattern is a two-part loop:
declare token dimensions as @Meters and bind them to a chat route's reports
in the product, then send the real per-request token counts from your backend
with withUsage. The gateway reserves an estimate before the call and settles
the actual value on the response.
This is the same reports + withUsage mechanism described in
Add a metered capability, specialized for LLM
usage. We build a small llm-api product to keep the example self-contained.
estimate is the pre-request admission value the gateway holds before the
upstream reports the truth — set it to a typical input size so admission checks
are realistic.
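The reserve-then-settle flow described above can be pictured with a small in-memory model. This is only a sketch of the idea, not the gateway's implementation: the Ledger type, the reserve and settle functions, and the notion of a remaining token allowance are all hypothetical names introduced for illustration.

```typescript
// Hypothetical model of reserve-then-settle; not the gateway's real code.
type Ledger = { balance: number; held: number };

// Hold the estimate before the upstream call; refuse if it cannot be covered.
function reserve(ledger: Ledger, estimate: number): boolean {
  if (ledger.balance - ledger.held < estimate) return false;
  ledger.held += estimate;
  return true;
}

// Release the hold and charge the value the upstream actually reported.
function settle(ledger: Ledger, estimate: number, actual: number): void {
  ledger.held -= estimate;
  ledger.balance -= actual;
}

// Assumed starting allowance of 2000 tokens, purely for the example.
const ledger: Ledger = { balance: 2000, held: 0 };
const admitted = reserve(ledger, 1000); // route-level input_tokens estimate
if (admitted) settle(ledger, 1000, 742); // upstream reported 742 input tokens
```

In this model an estimate that is too low admits requests the allowance cannot really cover, and one that is too high rejects requests that would have fit, which is why a typical input size is the sensible choice.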
// product/product.config.ts
import { Product, Requests, Meter, Feature, Plan } from "@farthershore/product";

@Product({
  name: "llm-api",
  origin: "https://api.llm.example.com",
  displayName: "LLM API",
})
export default class LlmApi {
  @Requests()
  requests!: unknown;

  // SUM meters: the upstream reports a value per request; the platform adds
  // them up. `estimate` is reserved pre-request for input tokens.
  @Meter("input_tokens", { display: "Input Tokens", unit: "token", estimate: 500 })
  inputTokens!: unknown;

  @Meter("output_tokens", { display: "Output Tokens", unit: "token", estimate: 500 })
  outputTokens!: unknown;

  @Feature("chat", {
    plans: ["pro"],
    routes: {
      // Both token meters are dynamic reports; you can override the input-side
      // admission estimate per route if this endpoint runs larger prompts.
      "POST /v1/chat/completions": {
        reports: ["input_tokens", "output_tokens"],
        estimates: { input_tokens: 1000 },
      },
    },
  })
  chat!: unknown;

  // Pay-as-you-go: no recurring fee, priced per token. micros are micro-dollars
  // per unit, so 15 micros = $0.000015 / input token.
  @Plan("pro", {
    name: "Pro",
    meter: {
      input_tokens: { micros: 15 },
      output_tokens: { micros: 60 },
    },
    limits: { requests: { rate: 600, interval: "minute", enforcement: "enforce" } },
  })
  pro!: unknown;
}
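To make the micros pricing concrete, here is a worked calculation for a single request under the pro plan's per-token prices. It is a sketch only: costMicros and MICROS_PER_UNIT are hypothetical names, and the example token counts are made up. It covers the two token meters and ignores anything else the platform may bill.

```typescript
// Per-unit prices in micro-dollars, copied from the "pro" plan above.
const MICROS_PER_UNIT: Record<string, number> = {
  input_tokens: 15,
  output_tokens: 60,
};

// Sum units * price across the reported meters; reject meters with no price.
function costMicros(usage: Record<string, number>): number {
  let total = 0;
  for (const [meter, units] of Object.entries(usage)) {
    const price = MICROS_PER_UNIT[meter];
    if (price === undefined) throw new Error(`unpriced meter: ${meter}`);
    total += units * price;
  }
  return total;
}

// Example request: 1200 input tokens and 300 output tokens.
const micros = costMicros({ input_tokens: 1200, output_tokens: 300 });
const dollars = micros / 1_000_000; // 36000 micros is $0.036
```

Keeping the arithmetic in integer micros and converting to dollars only for display avoids accumulating floating-point error across many small charges.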
A meter listed under reports must carry an estimate (here, via the meter's
own estimate or the route's estimates). A pre-request meter with no estimate
fails the build — the gateway needs a number to reserve before your upstream
runs. requests = 1 is still inherited on every metered route automatically.
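The estimate rule above can be sketched as a small resolution step: a route-level estimate wins, the meter's own estimate is the fallback, and a reported meter with neither is an error. This is an illustration of the rule as described here, not the build tool's actual code; resolveEstimates, MeterDecl, and RouteDecl are hypothetical names.

```typescript
// Hypothetical shapes mirroring the product config; not the real build IR.
type MeterDecl = { estimate?: number };
type RouteDecl = { reports: string[]; estimates?: Record<string, number> };

// Route-level estimate wins; otherwise fall back to the meter's own estimate.
function resolveEstimates(
  meters: Record<string, MeterDecl>,
  route: RouteDecl,
): Record<string, number> {
  const resolved: Record<string, number> = {};
  for (const key of route.reports) {
    const meter = meters[key];
    if (meter === undefined) throw new Error(`unknown meter: ${key}`);
    const estimate = route.estimates?.[key] ?? meter.estimate;
    if (estimate === undefined) throw new Error(`meter ${key} has no estimate`);
    resolved[key] = estimate;
  }
  return resolved;
}

// Same values as the llm-api product: input is overridden on the route.
const resolved = resolveEstimates(
  { input_tokens: { estimate: 500 }, output_tokens: { estimate: 500 } },
  { reports: ["input_tokens", "output_tokens"], estimates: { input_tokens: 1000 } },
);
```

For the llm-api product this resolves to 1000 for input_tokens and 500 for output_tokens.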
Install @farthershore/backend, set FS_RUNTIME_TOKEN (see
Operate via an agent for minting one), verify the
gateway signature, then return the model's usage figures with withUsage. The
helper signs usage into the gateway response path — no extra network call.
import { fartherShore, withUsage } from "@farthershore/backend";

const fs = fartherShore.initFromEnv(); // derives everything from FS_RUNTIME_TOKEN

export async function POST(request: Request) {
  const url = new URL(request.url);
  const body = new Uint8Array(await request.clone().arrayBuffer());

  // Fail-closed verification: identity comes only from the gateway's signature,
  // never from the plaintext X-FS-* headers.
  await fs.verifyRequest({
    method: request.method,
    path: url.pathname,
    query: url.search,
    headers: request.headers,
    body,
  });

  const completion = await callModel(await request.json());

  return withUsage(
    request,
    Response.json(completion),
    {
      input_tokens: completion.usage.prompt_tokens,
      output_tokens: completion.usage.completion_tokens,
    },
    // Optional free-form pricing/analytics context persisted with the event.
    { measureContext: { model: completion. } },
  );
}
The meter keys you pass to withUsage must match the @Meter keys in the
product (input_tokens, output_tokens); a mismatch is rejected. Values are
validated locally before signing — they must be non-negative finite numbers.
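If you want to catch bad usage values in your own tests before they reach withUsage, a pre-check along these lines mirrors the two rules stated above. It is a sketch under assumptions: checkUsage and DECLARED_METERS are hypothetical helpers written for this example, not part of @farthershore/backend, and the library's own validation may differ in detail.

```typescript
// Meter keys declared in the product config above.
const DECLARED_METERS = new Set(["input_tokens", "output_tokens"]);

// Return a list of problems; an empty list means the usage object looks valid.
function checkUsage(usage: Record<string, number>): string[] {
  const problems: string[] = [];
  for (const [key, value] of Object.entries(usage)) {
    if (!DECLARED_METERS.has(key)) {
      problems.push(`unknown meter: ${key}`);
    }
    // Rejects NaN, Infinity, and negative numbers.
    if (!Number.isFinite(value) || value < 0) {
      problems.push(`invalid value for ${key}: ${value}`);
    }
  }
  return problems;
}

const ok = checkUsage({ input_tokens: 812, output_tokens: 0 });
const wrongKey = checkUsage({ prompt_tokens: 812 });
const badValue = checkUsage({ input_tokens: Number.NaN });
```

A common mistake this catches is passing the model's own field names (prompt_tokens, completion_tokens) instead of the product's meter keys.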
If your upstream is Express, verify with the middleware and report the same way:
const fs = fartherShore.initFromEnv();
app.use(fs.middleware()); // fail-closed verify → req.fartherShore
farthershore build --format json
Push product/**, then drive a real chat request through a
test persona and read the breakdown:
# Token meters appear per-dimension once traffic flows.
farthershore usage summary llm-api --format json
farthershore build succeeds and the IR lists input_tokens + output_tokens as SUM meters with estimates.
A POST /v1/chat/completions call is allowed and forwarded.
input_tokens / output_tokens in the usage summary match the model's usage.prompt_tokens / usage.completion_tokens.
Add a metered capability: the reports + withUsage loop.
Operate via an agent: FS_RUNTIME_TOKEN and a test persona via the CLI.