TokenLensAI

$5 free credits · no card required · 60-second setup

Cut your LLM bill 40-70%.
Auditable to the token.

A one-line change to your OpenAI base URL. We semantically cache, route to the cheapest capable model, and price every request against a versioned book. Your provider keys, your data, your savings.

Drop-in for OpenAI & Anthropic SDKs · Bring your own provider keys · No prompt storage by default · SOC 2 controls in place

↓ try it now, no signup ↓

terminal · drop-in compatible

# Anonymous demo — no API key, no signup, 8 calls/hour per IP
curl https://tokenlens.co.in/api/public/v1/demo \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"explain semantic caching in one line"}]}'

# Production — same OpenAI-compatible contract, your provider keys
curl https://tokenlens.co.in/api/public/v1/chat/completions \
  -H "Authorization: Bearer $TOKENLENS_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"auto","messages":[{"role":"user","content":"hi"}]}'


Public Model Economics Index — see the data

Try it now — no signup

Run a real call. See your savings.

Hit the public demo endpoint with your own prompt. We use the actual provider token count to size your monthly savings — no estimation.

Live demo · gemini-2.5-flash-lite · 8 free calls/hr

Requests per day: 50,000

Avg input tokens: 1,500

Avg output tokens: 500

Current default model (what you'd use without TokenLens): GPT-4o

Semantic cache hit: 35%

Routed to cheaper model: 60%

Estimated monthly spend

$13,125 without TokenLens → $3,617 with TokenLens

You save with TokenLens

$9,508/month (72%)

≈ $114,093 per year

Compare savings across providers

Same workload, three baselines.

50,000 req/day · 1,500 input → 500 output tokens per request

Baseline model	Monthly spend	With TokenLens	You save	Savings
GPT-4o (OpenAI)	$13,125	$3,617	$9,508	72%
Claude Sonnet 4 (Anthropic)	$18,000	$4,885	$13,115	73%
Gemini 2.5 Pro (Google)	$10,313	$2,886	$7,427	72%

Assumes a 35% semantic cache hit rate and 60% of uncached requests routed to Gemini 2.5 Flash-Lite.
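The figures above follow from a simple cost model: cache hits cost nothing, and of the uncached requests a fixed share is billed at the cheaper model's rates. The TypeScript sketch below reproduces the GPT-4o row under two assumptions: a 30-day month, and per-1M-token list prices hard-coded for illustration (GPT-4o $2.50 input / $10 output, Gemini 2.5 Flash-Lite $0.10 / $0.40). Providers re-price, so treat the constants as placeholders.

```typescript
// List prices in USD per 1M tokens. Assumed for illustration; check current price sheets.
interface Price {
  input: number;
  output: number;
}

const GPT_4O: Price = { input: 2.5, output: 10 };
const FLASH_LITE: Price = { input: 0.1, output: 0.4 };

interface Workload {
  requestsPerDay: number;
  inputTokens: number; // average per request
  outputTokens: number; // average per request
}

// Monthly USD cost of sending the whole workload to one model (30-day month assumed).
function monthlyCost(w: Workload, p: Price, daysPerMonth = 30): number {
  const requests = w.requestsPerDay * daysPerMonth;
  return (requests * (w.inputTokens * p.input + w.outputTokens * p.output)) / 1e6;
}

// Cache hits cost nothing; of the misses, `routedShare` is billed at the cheap model.
function costWithTokenLens(
  w: Workload,
  baseline: Price,
  cheap: Price,
  cacheHitRate: number,
  routedShare: number,
): number {
  const missRate = 1 - cacheHitRate;
  return missRate * ((1 - routedShare) * monthlyCost(w, baseline) + routedShare * monthlyCost(w, cheap));
}

const workload: Workload = { requestsPerDay: 50_000, inputTokens: 1500, outputTokens: 500 };
const before = monthlyCost(workload, GPT_4O);
const after = costWithTokenLens(workload, GPT_4O, FLASH_LITE, 0.35, 0.6);
const savedPct = Math.round((1 - after / before) * 100);
console.log(Math.round(before), Math.round(after), savedPct); // 13125 3617 72
```

Swapping in Claude Sonnet 4 ($3 / $15) or Gemini 2.5 Pro ($1.25 / $10) as the baseline price gives the other two rows.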

How it works

Live in 10 minutes, not 10 weeks

One config change. Your provider keys stay yours. We never train on your data.

Point to TokenLens

Swap your OpenAI or Anthropic base URL. No SDK rewrite, no prompt changes — your existing code keeps working.

baseURL: 'https://tokenlens.co.in/api/public/v1'

We govern every request

Each call is matched against the semantic cache, scored against routing policy, redacted for PII, and forwarded to the cheapest capable model.

cache · route · redact · forward

Observe, attribute, forecast

Live stream every request, attribute spend per team/prompt, forecast next-month cost, and run evals — all from one console.

/live · /cost-attribution · /forecast · /evals

The platform

Nine surfaces. One console.

Everything you need to run production LLMs — already shipping, not on a roadmap.

Smart Model Routing

Per-request routing across OpenAI, Anthropic, Gemini and self-hosted models with quality-aware fallbacks.

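Quality-aware routing can be pictured as a filter-then-sort over a model catalog. A minimal sketch, assuming each model carries a blended price and a quality score from your own evals (both hypothetical fields; the catalog values below are made up): drop unhealthy models and those under the request's quality bar, order the rest by price, and treat everything after the first entry as the fallback chain.

```typescript
// Hypothetical catalog entry. pricePerMTok is a blended USD price per 1M tokens and
// quality is a 0-1 score from your own evals; both are illustrative assumptions.
interface ModelOption {
  id: string;
  pricePerMTok: number;
  quality: number;
  healthy: boolean;
}

// Cheapest-first list of healthy models that clear the quality bar. The first entry is
// the primary route; the rest are fallbacks, tried in order if a call fails.
function routePlan(models: ModelOption[], minQuality: number): string[] {
  return models
    .filter((m) => m.healthy && m.quality >= minQuality)
    .sort((a, b) => a.pricePerMTok - b.pricePerMTok)
    .map((m) => m.id);
}

// Made-up catalog for the example.
const catalog: ModelOption[] = [
  { id: "large-model", pricePerMTok: 6.0, quality: 0.92, healthy: true },
  { id: "frontier-model", pricePerMTok: 9.0, quality: 0.95, healthy: true },
  { id: "small-model", pricePerMTok: 0.3, quality: 0.78, healthy: true },
  { id: "mid-model", pricePerMTok: 2.0, quality: 0.88, healthy: false },
];

console.log(routePlan(catalog, 0.75).join(", ")); // small-model, large-model, frontier-model
console.log(routePlan(catalog, 0.9).join(", ")); // large-model, frontier-model
```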

Semantic Cache

Embedding-based cache returns sub-50 ms answers for semantically similar prompts, with no model call.

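A semantic cache keys on meaning rather than exact text: the prompt is embedded, and a stored answer is reused when its embedding is close enough. A minimal sketch using cosine similarity, assuming the caller supplies embeddings from some embedding model and a similarity threshold of 0.95 (an assumed tuning value); the linear scan stands in for the vector index a production cache would use.

```typescript
// Cosine similarity of two equal-length vectors; 0 if either vector is all zeros.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

class SemanticCache {
  private entries: { embedding: number[]; response: string }[] = [];
  private threshold: number;

  // Higher threshold means fewer, safer hits. 0.95 is an assumed default.
  constructor(threshold = 0.95) {
    this.threshold = threshold;
  }

  // Returns the stored response with the most similar embedding above the threshold.
  get(embedding: number[]): string | undefined {
    let best: { score: number; response: string } | undefined = undefined;
    for (const e of this.entries) {
      const score = cosine(embedding, e.embedding);
      if (score >= this.threshold && (!best || score > best.score)) {
        best = { score, response: e.response };
      }
    }
    return best?.response;
  }

  set(embedding: number[], response: string): void {
    this.entries.push({ embedding, response });
  }
}

const cache = new SemanticCache(0.95);
cache.set([1, 0, 0.1], "Semantic caching reuses answers for prompts with the same meaning.");
console.log(cache.get([0.98, 0.02, 0.12]) !== undefined); // true: near-duplicate prompt
console.log(cache.get([0, 1, 0]) !== undefined); // false: unrelated prompt
```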

Governance & Guardrails

Policy engine with PII redaction, prompt firewalls, region restriction and per-key budgets.


Live Stream

Every request as it lands — inspect raw provider usage, latency, route decision, cache verdict.


Cost Attribution

Tag spend by team, customer, prompt or feature flag. Chargeback-ready exports.

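Attribution amounts to grouping per-request cost by a caller-supplied tag. A minimal sketch, assuming each request record carries its USD cost and a free-form tags map (hypothetical shapes), that totals spend for one tag key and emits a chargeback-style CSV:

```typescript
// Hypothetical per-request record: cost in USD plus free-form tags set by the caller.
interface RequestRecord {
  costUsd: number;
  tags: Record<string, string>;
}

// Totals spend per value of one tag key; requests without that tag go to "(untagged)".
function spendBy(records: RequestRecord[], key: string): Map<string, number> {
  const totals = new Map<string, number>();
  for (const r of records) {
    const value = r.tags[key] ?? "(untagged)";
    totals.set(value, (totals.get(value) ?? 0) + r.costUsd);
  }
  return totals;
}

// Chargeback-style CSV, largest spender first. Values are not CSV-escaped in this sketch.
function toCsv(totals: Map<string, number>, key: string): string {
  const rows = Array.from(totals.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([value, usd]) => `${value},${usd.toFixed(2)}`);
  return [`${key},spend_usd`, ...rows].join("\n");
}

const records: RequestRecord[] = [
  { costUsd: 0.5, tags: { team: "search" } },
  { costUsd: 1.25, tags: { team: "support", customer: "acme" } },
  { costUsd: 0.25, tags: { team: "search" } },
  { costUsd: 0.1, tags: {} },
];
console.log(toCsv(spendBy(records, "team"), "team"));
// team,spend_usd
// support,1.25
// search,0.75
// (untagged),0.10
```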

Spend Forecast

Trend-based projections with confidence bands and budget burn alerts before you blow the month.

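One way to get a trend-based projection with a confidence band is an ordinary least-squares line through daily spend, extended to month end. A rough sketch under simplifying assumptions: residuals are treated as independent from day to day, the band is ±1.96 residual standard deviations scaled by the square root of the remaining days, and uncertainty in the fitted slope is ignored, so the band is narrower than a full prediction interval.

```typescript
// Fits spend[t] = a + b*t by ordinary least squares over the daily spend observed so far.
function fitTrend(daily: number[]): { a: number; b: number; sigma: number } {
  const n = daily.length;
  const tMean = (n - 1) / 2;
  const yMean = daily.reduce((s, y) => s + y, 0) / n;
  let sxy = 0;
  let sxx = 0;
  daily.forEach((y, t) => {
    sxy += (t - tMean) * (y - yMean);
    sxx += (t - tMean) ** 2;
  });
  const b = sxx === 0 ? 0 : sxy / sxx;
  const a = yMean - b * tMean;
  // Residual standard deviation with n - 2 degrees of freedom sets the band width.
  const sse = daily.reduce((s, y, t) => s + (y - (a + b * t)) ** 2, 0);
  const sigma = n > 2 ? Math.sqrt(sse / (n - 2)) : 0;
  return { a, b, sigma };
}

// Month-end projection: observed spend plus the trend for the remaining days,
// with a rough ~95% band of +/- 1.96 * sigma * sqrt(remaining days).
function forecastMonth(daily: number[], daysInMonth: number) {
  const { a, b, sigma } = fitTrend(daily);
  const spent = daily.reduce((s, y) => s + y, 0);
  let projected = spent;
  for (let t = daily.length; t < daysInMonth; t++) projected += Math.max(0, a + b * t);
  const half = 1.96 * sigma * Math.sqrt(daysInMonth - daily.length);
  return { projected, low: projected - half, high: projected + half };
}

// A burn alert fires when the upper edge of the band crosses the monthly budget.
const f = forecastMonth([100, 110, 120, 130, 140], 30);
const budgetUsd = 5000;
if (f.high > budgetUsd) {
  console.log(`budget burn alert: month-end spend projected at $${f.projected.toFixed(0)}`);
}
```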

Evals

Side-by-side prompt & model evals on your real traffic. Promote winners without redeploying.


Batch Pipelines

Inngest-powered batch jobs for offline scoring, backfills and reconciliation.


Provider Health

Real-time provider status, latency and error rates feed routing decisions automatically.

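Health signals that feed routing are usually smoothed so that one slow call does not flip a provider's status. A minimal sketch using exponentially weighted moving averages (EWMA) of error rate and latency; the smoothing factor (0.2) and the thresholds (20% errors, 5,000 ms) are assumptions chosen for illustration.

```typescript
// Per-provider EWMA of error rate and latency. alpha and thresholds are assumed values.
class ProviderHealth {
  private errorRate = 0;
  private latencyMs: number | undefined = undefined;
  private alpha: number;

  constructor(alpha = 0.2) {
    this.alpha = alpha;
  }

  // Folds one observed call into both averages; the first latency seeds the EWMA.
  record(ok: boolean, latencyMs: number): void {
    const a = this.alpha;
    this.errorRate = a * (ok ? 0 : 1) + (1 - a) * this.errorRate;
    this.latencyMs = this.latencyMs === undefined ? latencyMs : a * latencyMs + (1 - a) * this.latencyMs;
  }

  healthy(maxErrorRate = 0.2, maxLatencyMs = 5000): boolean {
    return this.errorRate <= maxErrorRate && (this.latencyMs ?? 0) <= maxLatencyMs;
  }
}

const provider = new ProviderHealth();
for (let i = 0; i < 10; i++) provider.record(true, 800);
console.log(provider.healthy()); // true
for (let i = 0; i < 3; i++) provider.record(false, 800);
console.log(provider.healthy()); // false: three straight failures push the error EWMA to ~0.49
```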

Developer platform

Drop-in SDKs and a full Management API

Native Node and Python packages that extend the official OpenAI and Anthropic clients. A REST API for provisioning keys, budgets and policies from Terraform, Helm or your CI.

@tokenlens/sdk (Node) & tokenlens (Python) — extend OpenAI/Anthropic clients
REST: GET / POST / PATCH / DELETE on /api/public/v1/admin/keys
Terraform & Helm snippets for declarative key + policy management
OpenAI-compatible /chat/completions, /embeddings, /models and Anthropic-compatible /messages

node · openai-compatible

import OpenAI from "@tokenlens/sdk/openai";

const ai = new OpenAI({
  apiKey: process.env.TOKENLENS_KEY,
  // baseURL defaults to https://tokenlens.co.in/api/public/v1
});

const res = await ai.chat.completions.create({
  model: "gpt-4o-mini",       // routed → cheapest capable
  messages: [{ role: "user", content: "summarize this PR" }],
});

// usage, cache verdict & route decision exposed on res._tokenlens
console.log(res._tokenlens.cache_hit, res._tokenlens.routed_to);

How we count tokens

Every number is auditable

We don't estimate your bill — we record the provider's own usage payload, price it against a dated book, and re-verify it on a schedule.

Provider returns exact usage

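Every OpenAI, Anthropic and Gemini response includes a usage payload: prompt_tokens and completion_tokens from OpenAI, input_tokens and output_tokens from Anthropic, promptTokenCount and candidatesTokenCount from Gemini. We record it byte-for-byte, with no estimation.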

token_source = exact
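Because each provider names its counters differently, an exact-usage ledger needs a small normalization step. A minimal sketch that copies the provider's integers into one shape without estimating anything; the provider tag on each response and the Usage type are hypothetical wrappers, and Anthropic's separate prompt-cache counters are ignored here.

```typescript
// Normalized usage row. tokenSource mirrors the "exact" label above; the shape is illustrative.
interface Usage {
  inputTokens: number;
  outputTokens: number;
  tokenSource: "exact";
}

// Field names follow the public OpenAI Chat Completions, Anthropic Messages and
// Gemini generateContent responses; the `provider` discriminator is a hypothetical wrapper.
type ProviderResponse =
  | { provider: "openai"; usage: { prompt_tokens: number; completion_tokens: number } }
  | { provider: "anthropic"; usage: { input_tokens: number; output_tokens: number } }
  | {
      provider: "gemini";
      usageMetadata: { promptTokenCount: number; candidatesTokenCount?: number; thoughtsTokenCount?: number };
    };

function normalizeUsage(r: ProviderResponse): Usage {
  switch (r.provider) {
    case "openai":
      return { inputTokens: r.usage.prompt_tokens, outputTokens: r.usage.completion_tokens, tokenSource: "exact" };
    case "anthropic":
      return { inputTokens: r.usage.input_tokens, outputTokens: r.usage.output_tokens, tokenSource: "exact" };
    case "gemini": {
      const m = r.usageMetadata;
      // Gemini reports thinking tokens separately; they are billed at the output rate.
      const out = (m.candidatesTokenCount ?? 0) + (m.thoughtsTokenCount ?? 0);
      return { inputTokens: m.promptTokenCount, outputTokens: out, tokenSource: "exact" };
    }
  }
}

console.log(normalizeUsage({ provider: "anthropic", usage: { input_tokens: 1500, output_tokens: 500 } }));
```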

Priced against a versioned book

Costs come from model_price_history, snapshotted at the moment of the request. Old rows keep their historical price even when providers re-price.

price_at(model, created_at)
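A lookup like price_at(model, created_at) can be read as: among the price rows for that model, pick the latest one whose effective date is not after the request. A minimal sketch, assuming a hypothetical PriceRow shape with per-1M-token input and output rates; the model name and prices in the example are made up.

```typescript
// Hypothetical price-book row: USD per 1M tokens, in force from effectiveFrom onward.
interface PriceRow {
  model: string;
  effectiveFrom: Date;
  inputPerM: number;
  outputPerM: number;
}

// The row in force at createdAt: latest effectiveFrom that is not after createdAt.
function priceAt(book: PriceRow[], model: string, createdAt: Date): PriceRow | undefined {
  let best: PriceRow | undefined = undefined;
  for (const row of book) {
    if (row.model !== model || row.effectiveFrom.getTime() > createdAt.getTime()) continue;
    if (!best || row.effectiveFrom.getTime() > best.effectiveFrom.getTime()) best = row;
  }
  return best;
}

function costUsd(row: PriceRow, inputTokens: number, outputTokens: number): number {
  return (inputTokens * row.inputPerM + outputTokens * row.outputPerM) / 1e6;
}

// Made-up book: the provider cut prices on 2024-08-01.
const book: PriceRow[] = [
  { model: "example-model", effectiveFrom: new Date("2024-01-01T00:00:00Z"), inputPerM: 5, outputPerM: 15 },
  { model: "example-model", effectiveFrom: new Date("2024-08-01T00:00:00Z"), inputPerM: 2.5, outputPerM: 10 },
];

// A May request keeps the old price even though the book has since changed.
const may = priceAt(book, "example-model", new Date("2024-05-10T12:00:00Z"));
if (may) console.log(costUsd(may, 1000, 500));
```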

Reconciled hourly, automatically

A scheduled job re-verifies every request from the last 7 days against the price book and rolls up per-day savings — dashboards never drift.

cron · 15 * * * *
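The schedule 15 * * * * fires at minute 15 of every hour. A minimal sketch of what such a job might do, assuming a hypothetical StoredRequest shape and a reprice callback (for example, a price-book lookup): re-price each request from the last 7 days, flag any whose stored cost disagrees, and sum per-day savings by UTC date.

```typescript
// Hypothetical stored request: what was charged, plus what the baseline model would have cost.
interface StoredRequest {
  id: string;
  createdAt: Date;
  costUsd: number; // snapshotted at request time
  baselineCostUsd: number; // the customer's default model, same token counts
}

const DAY_MS = 86_400_000;

// Re-prices every request inside the window, collects mismatched ids, and rolls up
// savings (baseline minus charged) per UTC day in YYYY-MM-DD form.
function reconcile(
  requests: StoredRequest[],
  reprice: (r: StoredRequest) => number,
  now: Date,
  windowDays = 7,
): { mismatched: string[]; savingsByDay: Map<string, number> } {
  const cutoff = now.getTime() - windowDays * DAY_MS;
  const mismatched: string[] = [];
  const savingsByDay = new Map<string, number>();
  for (const r of requests) {
    if (r.createdAt.getTime() < cutoff) continue;
    if (Math.abs(reprice(r) - r.costUsd) > 1e-9) mismatched.push(r.id);
    const day = r.createdAt.toISOString().slice(0, 10);
    savingsByDay.set(day, (savingsByDay.get(day) ?? 0) + (r.baselineCostUsd - r.costUsd));
  }
  return { mismatched, savingsByDay };
}
```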

Open any request in /live → Inspect to see the raw provider usage payload that produced the number.

Enterprise-grade

Built for the things procurement asks about

PII redaction

Localized regex engine strips emails, phone numbers, Aadhaar/PAN and card numbers before the prompt leaves your tenant.
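A minimal sketch of regex-based redaction, assuming simplified patterns: PAN as five letters, four digits and a letter; Aadhaar as 12 digits starting 2-9, optionally grouped 4-4-4; Indian mobile numbers as 10 digits starting 6-9 with an optional +91; card numbers as 13-19 digits with optional spaces or dashes. These are illustrations, not production validators (a real engine would, for example, add a Luhn check to cut false positives on long numeric IDs). Rule order matters: cards are replaced first, because the Aadhaar pattern would otherwise match the first 12 digits of a 16-digit card number.

```typescript
// Simplified, illustrative patterns applied in order. Each match becomes [LABEL].
const RULES: [string, RegExp][] = [
  ["EMAIL", /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g],
  ["PAN", /\b[A-Z]{5}[0-9]{4}[A-Z]\b/g],
  ["CARD", /\b(?:\d[ -]?){12,18}\d\b/g], // 13-19 digits, optional separators
  ["AADHAAR", /\b[2-9]\d{3}[ -]?\d{4}[ -]?\d{4}\b/g],
  ["PHONE", /(?:\+91[ -]?)?\b[6-9]\d{9}\b/g],
];

function redact(text: string): string {
  let out = text;
  for (const [label, pattern] of RULES) {
    // String.prototype.replace resets lastIndex for global regexes, so reuse is safe.
    out = out.replace(pattern, `[${label}]`);
  }
  return out;
}

console.log(
  redact("Mail a.b@example.com, PAN ABCDE1234F, card 4111 1111 1111 1111, Aadhaar 2345 6789 0123, call +91 9876543210."),
);
// Mail [EMAIL], PAN [PAN], card [CARD], Aadhaar [AADHAAR], call [PHONE].
```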

Per-key budgets

Daily and monthly USD caps enforced at the gateway — keys auto-throttle when they hit the line.
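Because exact usage is only known after the provider responds, a gateway cap check typically admits a request against an estimated cost and then charges the exact cost afterwards. A minimal sketch with a hypothetical KeyBudget shape; resetting the counters at day and month boundaries is left to the caller.

```typescript
// Hypothetical per-key budget state. Counters are reset by the caller at day/month rollover.
interface KeyBudget {
  dailyCapUsd: number;
  monthlyCapUsd: number;
  spentTodayUsd: number;
  spentMonthUsd: number;
}

type Decision = { allow: true } | { allow: false; reason: "daily_cap" | "monthly_cap" };

// Admits a request only if its estimated cost fits under both caps.
function checkBudget(b: KeyBudget, estimatedCostUsd: number): Decision {
  if (b.spentTodayUsd + estimatedCostUsd > b.dailyCapUsd) return { allow: false, reason: "daily_cap" };
  if (b.spentMonthUsd + estimatedCostUsd > b.monthlyCapUsd) return { allow: false, reason: "monthly_cap" };
  return { allow: true };
}

// After the provider responds, charge the exact cost to both counters.
function charge(b: KeyBudget, actualCostUsd: number): void {
  b.spentTodayUsd += actualCostUsd;
  b.spentMonthUsd += actualCostUsd;
}

const key: KeyBudget = { dailyCapUsd: 10, monthlyCapUsd: 100, spentTodayUsd: 9.5, spentMonthUsd: 20 };
console.log(checkBudget(key, 0.4)); // { allow: true }
```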

Region restriction

Pin a key to specific provider regions for data-residency compliance.

Soft-revoke audit

DELETE never destroys a key; it is soft-revoked with a timestamp for forensic replay.
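Soft revocation means DELETE stamps a revoked-at time instead of removing the row, so an auditor can still ask whether a key was valid when a given request was made. A minimal sketch with a hypothetical ApiKey shape:

```typescript
// Hypothetical API-key row: DELETE sets revokedAt instead of removing the row.
interface ApiKey {
  id: string;
  createdAt: Date;
  revokedAt: Date | null;
}

// Returns a revoked copy. Idempotent: a repeat DELETE keeps the original timestamp.
function softRevoke(key: ApiKey, at: Date = new Date()): ApiKey {
  return key.revokedAt ? key : { ...key, revokedAt: at };
}

// True if the key was live at `at`, so historical requests replay against the state they saw.
function isActiveAt(key: ApiKey, at: Date): boolean {
  const t = at.getTime();
  return t >= key.createdAt.getTime() && (key.revokedAt === null || t < key.revokedAt.getTime());
}

const k1: ApiKey = { id: "k1", createdAt: new Date("2025-01-01T00:00:00Z"), revokedAt: null };
const revokedKey = softRevoke(k1, new Date("2025-02-01T00:00:00Z"));
console.log(isActiveAt(revokedKey, new Date("2025-01-15T00:00:00Z"))); // true
console.log(isActiveAt(revokedKey, new Date("2025-02-02T00:00:00Z"))); // false
```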

RBAC

Admin role enforced via a separate user_roles table and a SECURITY DEFINER has_role() function.

Versioned price book

Costs snapshotted at request time so historical analytics never drift.

Pricing

Pay for value, not tokens

Start free. Upgrade when your usage grows.

Starter

$5 free credits + $50/mo gateway spend. No card.

$5 free credits, no card required
$50/mo gateway spend included
1 workspace
OpenAI-compatible gateway
Semantic cache + Live stream
Governance Marketplace access
Community support

Ship your LLM control plane today

Free to start. Bring your own provider keys. See your first auditable cost report in minutes.
