Tool Ingestion

How tokenuse discovers, validates, parses, deduplicates, and prices local AI tool records.

tokenuse reads usage data directly from local files written by AI coding tools. There is no proxy, no platform API key, no telemetry endpoint, and no live watcher. Optional quota sync actions can write local limit sidecars under the tokenuse config directory; Copilot quota sync uses the existing local Copilot OAuth token, while Claude.ai and ChatGPT (Codex) quota sync use a session cookie you store in the OS keychain. All three are opt-in and triggered explicitly from the Config page.

The UI calls these sources tools. Internally each one is implemented as a ToolAdapter under src/tools/<name>/.

Supported Tools

| Tool | Status | Source format | Token quality | Doc |
| --- | --- | --- | --- | --- |
| Claude Code | implemented | JSONL session files under ~/.claude/projects/, Claude Desktop agent sessions, optional status-line limits sidecar | exact usage, cache reads/writes, tool calls, file-backed 5h/weekly limit snapshots | claude-code.md |
| Cursor | implemented | SQLite state.vscdb and ~/.cursor/projects/**/agent-transcripts | exact when tokenCount exists; estimated fallback otherwise | cursor.md |
| Codex | implemented | JSONL rollouts under ~/.codex/sessions/ | exact per-turn token-count deltas | codex.md |
| GitHub Copilot | implemented | JSONL events from legacy CLI, VS Code Copilot Chat transcripts, optional quota sidecar | legacy output exact when present; transcripts estimated; quota snapshots from confirmed local sync | copilot.md |
| Gemini | implemented | JSON/JSONL chat sessions under ~/.gemini/tmp/<project_hash>/chats/ | exact usage, cache reads, thoughts, tool calls | gemini.md |
| Claude.ai subscription | implemented (limits-only) | sidecar written by opt-in Config-page sync of claude.ai/api/organizations/{uuid}/usage and /overage_spend_limit | exact 5h / 7d / Opus / Sonnet / Extra Usage gauges; rendered inside the Claude Code section | claude-subscription.md |
| ChatGPT (Codex) subscription | implemented (limits-only) | sidecar written by opt-in Config-page sync of chatgpt.com/backend-api/wham/usage | exact 5h / 7d / credits gauges; rendered inside the Codex section | codex-subscription.md |

Data Path

flowchart LR
    A["local tool files"] --> B["adapter.discover()"]
    B --> C["Vec<SessionSource>"]
    C --> D["archive source fingerprint"]
    D -->|changed| E["adapter.parse(source, seen)"]
    E --> F["append ParsedCall rows"]
    C --> L["adapter.parse_limits(source)"]
    L --> M["append LimitSnapshot rows"]
    F --> G["archive.db"]
    M --> G
    G --> H["DashboardData"]
    E -. "dedup_key" .-> I["shared seen set"]

The same seen: &mut HashSet<String> is shared across every tool adapter during one sync, so a local record that is read more than once contributes only once. The archive also enforces uniqueness on (tool, dedup_key), which lets changed sources be reparsed without duplicating historical calls.
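
The shared-set behavior can be illustrated with a minimal sketch. The `Record` struct and `keep_unseen` function are hypothetical stand-ins, not names from the tokenuse codebase; the only point is that `HashSet::insert` returns `false` for a key that is already present, so a second sighting of the same `dedup_key` is dropped.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a parsed record; only the dedup key matters here.
#[derive(Debug, Clone, PartialEq)]
pub struct Record {
    pub tool: &'static str,
    pub dedup_key: String,
}

/// Keep only records whose `dedup_key` has not been seen during this sync.
/// `HashSet::insert` returns `false` when the key was already in the set.
pub fn keep_unseen(records: Vec<Record>, seen: &mut HashSet<String>) -> Vec<Record> {
    records
        .into_iter()
        .filter(|r| seen.insert(r.dedup_key.clone()))
        .collect()
}
```

Passing the same `seen` set to two successive `keep_unseen` calls mirrors how one sync hands the set from adapter to adapter: a key emitted by the first batch is filtered out of the second.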

Internal Adapter Contract

All tool adapters implement the same trait in src/tools/mod.rs:

pub trait ToolAdapter: Send + Sync {
    fn id(&self) -> &'static str;
    fn display_name(&self) -> &'static str;
    fn discover(&self) -> Result<Vec<SessionSource>>;
    fn parse(
        &self,
        source: &SessionSource,
        seen: &mut HashSet<String>,
    ) -> Result<Vec<ParsedCall>>;
    fn parse_limits(&self, source: &SessionSource) -> Result<Vec<LimitSnapshot>> { /* default */ }
    fn source_fingerprint(&self, source: &SessionSource) -> Result<String> { /* default */ }

    fn model_display(&self, model: &str) -> String { /* default */ }
    fn tool_display(&self, tool: &str) -> String { /* default */ }
}

ParsedCall from src/tools/types.rs is the normalized record every adapter emits and every dashboard aggregator consumes. See architecture.md for field meanings and aggregation behavior.
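
As a rough illustration of the contract, the sketch below implements a trimmed copy of the trait for an in-memory adapter. Everything here is an assumption made for the example: the `SessionSource` and `ParsedCall` field sets, the `Result` alias, the pass-through default for `model_display`, and the `ExampleTool` type itself. The real definitions live in src/tools/mod.rs and src/tools/types.rs.

```rust
use std::collections::HashSet;
use std::path::PathBuf;

// Simplified stand-ins for the real types (assumed shapes, not the actual fields).
pub struct SessionSource {
    pub path: PathBuf,
}
pub struct ParsedCall {
    pub dedup_key: String,
    pub model: String,
    pub input_tokens: u64,
    pub output_tokens: u64,
}

// Assumed error type for the sketch; the real crate may use a different Result.
type Result<T> = std::result::Result<T, String>;

// Trimmed copy of the trait: the required methods plus one defaulted method.
pub trait ToolAdapter: Send + Sync {
    fn id(&self) -> &'static str;
    fn display_name(&self) -> &'static str;
    fn discover(&self) -> Result<Vec<SessionSource>>;
    fn parse(&self, source: &SessionSource, seen: &mut HashSet<String>)
        -> Result<Vec<ParsedCall>>;
    // Assumed default: show the model name unchanged.
    fn model_display(&self, model: &str) -> String {
        model.to_string()
    }
}

/// Hypothetical adapter that reads (key, model, input, output) tuples from memory.
pub struct ExampleTool {
    pub lines: Vec<(&'static str, &'static str, u64, u64)>,
}

impl ToolAdapter for ExampleTool {
    fn id(&self) -> &'static str {
        "example"
    }
    fn display_name(&self) -> &'static str {
        "Example Tool"
    }
    fn discover(&self) -> Result<Vec<SessionSource>> {
        // A real adapter would scan the filesystem; this returns one fixed source.
        Ok(vec![SessionSource { path: PathBuf::from("example/session.jsonl") }])
    }
    fn parse(&self, _source: &SessionSource, seen: &mut HashSet<String>)
        -> Result<Vec<ParsedCall>> {
        let mut calls = Vec::new();
        for (key, model, input, output) in &self.lines {
            // Skip records whose key was already emitted during this sync.
            if !seen.insert(key.to_string()) {
                continue;
            }
            calls.push(ParsedCall {
                dedup_key: key.to_string(),
                model: model.to_string(),
                input_tokens: *input,
                output_tokens: *output,
            });
        }
        Ok(calls)
    }
}
```

Parsing the same source twice with the same `seen` set yields an empty second result, which is the property the archive relies on when a changed source is reparsed.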

Pricing

Pricing is embedded as two books: costs/pricing-upstream.json for broad upstream coverage and costs/pricing-overrides.json for official rows, aliases, tool-scoped rows, provenance, and effective dates. Usage ingestion never fetches pricing; the Config page can download local copies of pricing-upstream.json and pricing-overrides.json, but only after you confirm the action. See Pricing and cache rates for source evidence and tool-specific caveats.

cost = multiplier * (
    input_tokens * input_rate
  + output_tokens * output_rate
  + cache_creation_input_tokens * cache_write_rate
  + cache_read_input_tokens * cache_read_rate
  + web_search_requests * web_search_rate
)
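
The formula translates directly into code. The sketch below is illustrative only: the `Usage` and `Rates` structs and the `call_cost` function are hypothetical names, and rates are assumed to be expressed per token (per request for web search).

```rust
/// Token and request counts for one call (field names follow the formula above).
pub struct Usage {
    pub input_tokens: u64,
    pub output_tokens: u64,
    pub cache_creation_input_tokens: u64,
    pub cache_read_input_tokens: u64,
    pub web_search_requests: u64,
}

/// Hypothetical pricing row: USD per token, and USD per web search request.
pub struct Rates {
    pub input_rate: f64,
    pub output_rate: f64,
    pub cache_write_rate: f64,
    pub cache_read_rate: f64,
    pub web_search_rate: f64,
}

/// Apply the cost formula: a weighted sum of counts, scaled by the multiplier.
pub fn call_cost(u: &Usage, r: &Rates, multiplier: f64) -> f64 {
    multiplier
        * (u.input_tokens as f64 * r.input_rate
            + u.output_tokens as f64 * r.output_rate
            + u.cache_creation_input_tokens as f64 * r.cache_write_rate
            + u.cache_read_input_tokens as f64 * r.cache_read_rate
            + u.web_search_requests as f64 * r.web_search_rate)
}
```

With made-up rates of 3e-6, 15e-6, 3.75e-6, 0.3e-6, and 0.01, a call with 1,000,000 input tokens, 100,000 output tokens, 200,000 cache-write tokens, 2,000,000 cache-read tokens, and 2 web searches costs 3.00 + 1.50 + 0.75 + 0.60 + 0.02 = 5.87 at a multiplier of 1, and twice that at a multiplier of 2.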

Model lookup canonicalizes model names, resolves tool-scoped aliases first, applies effective dates, then falls back through global aliases and prefix matches to a default Sonnet row. cursor-auto is a direct Cursor Auto pricing row. Claude Opus fast mode applies the row’s fast_multiplier.
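
The fallback order can be sketched as follows. This is a simplified illustration rather than the actual lookup code: the `PriceBook` layout, the `canonicalize` rule (trim and lowercase), and the longest-prefix tie-break are all assumptions, effective dates are left out, and every row name other than cursor-auto is invented for the example.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory price book; values are pricing row keys.
pub struct PriceBook {
    pub tool_aliases: HashMap<(String, String), String>, // (tool, model) -> row
    pub global_aliases: HashMap<String, String>,         // model -> row
    pub rows: Vec<String>,                               // known row keys
    pub default_row: String,
}

/// Assumed canonical form: surrounding whitespace removed, lowercased.
fn canonicalize(model: &str) -> String {
    model.trim().to_lowercase()
}

impl PriceBook {
    /// Resolve a model to a row key: tool-scoped alias, then global alias,
    /// then the longest row key that prefixes the name, then the default row.
    pub fn resolve(&self, tool: &str, model: &str) -> &str {
        let name = canonicalize(model);
        if let Some(row) = self.tool_aliases.get(&(tool.to_string(), name.clone())) {
            return row.as_str();
        }
        if let Some(row) = self.global_aliases.get(&name) {
            return row.as_str();
        }
        self.rows
            .iter()
            .filter(|row| name.starts_with(row.as_str()))
            .max_by_key(|row| row.len())
            .map(|row| row.as_str())
            .unwrap_or(self.default_row.as_str())
    }
}
```

Checking the tool-scoped map first means the same model string can price differently per tool, while unknown names still land on a predictable default instead of producing an error.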

Refresh the embedded maintainer books with:

cargo run -- --refresh-prices

Adding a New Tool

  1. Create src/tools/<name>/{mod.rs, config.rs, discovery.rs, parser.rs}.
  2. Put every path, env var, glob, SQL query, and source constant in config.rs.
  3. Implement ToolAdapter in mod.rs and register it in tools::registry().
  4. Add a variant to app::Tool, update its label and cycle order, and update ingest::matches_tool.
  5. Add display names in aggregation helpers such as tool_short_label when needed.
  6. Override source_fingerprint only when the default file/directory metadata fingerprint is too broad or too narrow for the source.
  7. Write docs/development/tools/<name>.md and add it to the supported tools table above.
  8. Add parser tests for source validation, token mapping, deduplication, project detection, and tool/bash extraction.
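
For step 6, it helps to picture what a metadata-based fingerprint looks like before deciding whether to override it. The sketch below is an assumption about the general technique, not the default implementation in tokenuse: `metadata_fingerprint` is a hypothetical helper that combines file size and modification time, using only the standard library.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::UNIX_EPOCH;

/// Build a fingerprint string from file size and modification time.
/// A changed fingerprint signals that the source should be reparsed.
pub fn metadata_fingerprint(path: &Path) -> io::Result<String> {
    let meta = fs::metadata(path)?;
    // Fall back to 0 if the timestamp predates the Unix epoch.
    let modified_nanos = meta
        .modified()?
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_nanos())
        .unwrap_or(0);
    Ok(format!("{}:{}", meta.len(), modified_nanos))
}
```

A fingerprint like this is too broad when a tool rewrites a file without changing its records, and too narrow when relevant state lives in a sibling file, such as a SQLite write-ahead log; those are the cases where an override is worth considering.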

Verification

  • cargo test runs parser unit tests, pricing lookup tests, aggregation tests, and render smoke tests.
  • cargo run launches the TUI and falls back to sample data when the archive has no local calls.
  • cargo run -- --list-projects syncs the archive and prints normalized project/tool inventory rows for debugging source attribution.