Tool Ingestion

How tokenuse discovers, validates, parses, deduplicates, and prices local AI tool records.

tokenuse reads usage data directly from local files written by AI coding tools. There is no proxy, no platform API key, no telemetry endpoint, and no live watcher. Optional quota sync actions can write local limit sidecars under the tokenuse config directory. Copilot quota sync uses the existing local Copilot OAuth token, while Claude.ai and ChatGPT (Codex) quota sync use a session cookie that you store in the OS keychain. All three are opt-in and are triggered explicitly from the Config page.

The UI calls these sources tools. Internally each one is implemented as a ToolAdapter under src/tools/<name>/.

Supported Tools

Tool	Status	Source format	Token quality	Doc
Claude Code	implemented	JSONL session files under `~/.claude/projects/`, Claude Desktop agent sessions, optional status-line limits sidecar	exact usage, cache reads/writes, tool calls, file-backed 5h/weekly limit snapshots	claude-code.md
Cursor	implemented	unified `state.vscdb`, AgentKv/request context, `~/.cursor/chats/**/store.db`, transcripts, and AI tracking	one canonical user turn; exact/mixed/estimated provenance; tools, files, code, timing and modes	cursor.md
Codex	implemented	JSONL rollouts under `~/.codex/sessions/`	exact per-turn token-count deltas	codex.md
GitHub Copilot	implemented	JSONL events from legacy CLI, VS Code Copilot Chat transcripts, optional quota sidecar	legacy output exact when present; transcripts estimated; quota snapshots from confirmed local sync	copilot.md
Gemini	implemented	JSON/JSONL chat sessions under `~/.gemini/tmp/<project_hash>/chats/`	exact usage, cache reads, thoughts, tool calls	gemini.md
Claude.ai subscription	implemented (limits-only)	sidecar written by opt-in Config-page sync of `claude.ai/api/organizations/{uuid}/usage` and `/overage_spend_limit`	exact 5h / 7d / Opus / Sonnet / Extra Usage gauges; rendered inside the Claude Code section	claude-subscription.md
ChatGPT (Codex) subscription	implemented (limits-only)	sidecar written by opt-in Config-page sync of `chatgpt.com/backend-api/wham/usage`	exact 5h / 7d / credits gauges; rendered inside the Codex section	codex-subscription.md

Data Path

flowchart LR
    A["local tool files"] --> B["adapter.discover()"]
    B --> C["Vec<SessionSource>"]
    C --> D["archive source fingerprint"]
    D -->|changed| E["adapter.parse(source, seen)"]
    E --> F["append ParsedCall rows"]
    C --> L["adapter.parse_limits(source)"]
    L --> M["append LimitSnapshot rows"]
    F --> G["archive.db"]
    M --> G
    G --> H["DashboardData"]
    E -. "dedup_key" .-> I["shared seen set"]

The same seen: &mut HashSet<String> is shared across every tool adapter during one sync, so re-reading the same local record only contributes once. The archive also enforces uniqueness on (tool, dedup_key), which lets changed sources be reparsed without duplicating historical calls.
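The two layers of deduplication described above can be sketched as follows. This is a minimal illustration, not the project's code: `Call`, `parse_records`, and the in-memory `Archive` are hypothetical stand-ins, and the real archive is a database rather than a `HashMap`.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for a parsed record; the real `ParsedCall` has more fields.
#[derive(Debug, Clone, PartialEq)]
pub struct Call {
    pub tool: &'static str,
    pub dedup_key: String,
}

/// Keeps only records whose key has not been seen earlier in this sync.
/// `HashSet::insert` returns `false` when the key is already present.
pub fn parse_records(
    tool: &'static str,
    raw_keys: &[&str],
    seen: &mut HashSet<String>,
) -> Vec<Call> {
    raw_keys
        .iter()
        .copied()
        .filter(|key| seen.insert(key.to_string()))
        .map(|key| Call { tool, dedup_key: key.to_string() })
        .collect()
}

/// Hypothetical in-memory archive that is unique on `(tool, dedup_key)`.
#[derive(Default)]
pub struct Archive {
    rows: HashMap<(String, String), Call>,
}

impl Archive {
    /// Stores the call unless its `(tool, dedup_key)` pair already exists.
    /// Returns `true` when a new row was stored.
    pub fn insert(&mut self, call: Call) -> bool {
        let key = (call.tool.to_string(), call.dedup_key.clone());
        if self.rows.contains_key(&key) {
            return false;
        }
        self.rows.insert(key, call);
        true
    }

    /// Number of stored rows.
    pub fn len(&self) -> usize {
        self.rows.len()
    }
}
```

Passing one `seen` set to every `parse_records` call in a sync drops repeats within that sync, while the archive's key check covers records that were already stored by an earlier sync.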

ParsedCall also carries the archive v5 quality fields for interaction, token, and timestamp data. Its transient superseded_dedup_keys field is used only by Cursor’s unified reconstruction: exact legacy rows are removed transactionally after their canonical turn is accepted, and the list itself is never persisted.

Internal Adapter Contract

All tool adapters implement the same trait in src/tools/mod.rs:

pub trait ToolAdapter: Send + Sync {
    fn id(&self) -> &'static str;
    fn display_name(&self) -> &'static str;
    fn discover(&self) -> Result<Vec<SessionSource>>;
    fn parse(
        &self,
        source: &SessionSource,
        seen: &mut HashSet<String>,
    ) -> Result<Vec<ParsedCall>>;
    fn parse_limits(&self, source: &SessionSource) -> Result<Vec<LimitSnapshot>> { /* default */ }
    fn source_fingerprint(&self, source: &SessionSource) -> Result<String> { /* default */ }

    fn tool_display(&self, tool: &str) -> String { /* default */ }
}

ParsedCall from src/tools/types.rs is the normalized record every adapter emits and every dashboard aggregator consumes. Adapters preserve the observed model id; the shared registry resolves display name, canonical id, provider, and family after parsing. See Model normalisation for that contract and architecture.md for field meanings and aggregation behavior.
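To make the contract concrete, here is a rough sketch of an adapter written against a reduced copy of the trait. Everything except the trait's method names is an assumption made for illustration: the `Result` alias, the fields of `SessionSource` and `ParsedCall`, the `ExampleAdapter` type, and the comma-separated record format are all hypothetical and do not mirror the real definitions.

```rust
use std::collections::HashSet;
use std::path::PathBuf;

// Stand-ins so the sketch compiles alone; the real types have more fields.
pub type Result<T> = std::result::Result<T, String>;

#[derive(Debug, Clone)]
pub struct SessionSource {
    pub path: PathBuf,
    /// Hypothetical: raw record lines held in memory so no file I/O is needed.
    pub lines: Vec<String>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ParsedCall {
    pub model: String,
    pub input_tokens: u64,
    pub output_tokens: u64,
    pub dedup_key: String,
}

/// Reduced copy of the trait: only the four methods without defaults.
pub trait ToolAdapter: Send + Sync {
    fn id(&self) -> &'static str;
    fn display_name(&self) -> &'static str;
    fn discover(&self) -> Result<Vec<SessionSource>>;
    fn parse(&self, source: &SessionSource, seen: &mut HashSet<String>)
        -> Result<Vec<ParsedCall>>;
}

/// Hypothetical adapter for records shaped like `id,model,input,output`.
pub struct ExampleAdapter;

impl ToolAdapter for ExampleAdapter {
    fn id(&self) -> &'static str { "example" }
    fn display_name(&self) -> &'static str { "Example Tool" }

    fn discover(&self) -> Result<Vec<SessionSource>> {
        // A real adapter would scan the tool's local directories here.
        Ok(Vec::new())
    }

    fn parse(&self, source: &SessionSource, seen: &mut HashSet<String>)
        -> Result<Vec<ParsedCall>> {
        let mut calls = Vec::new();
        for line in &source.lines {
            let fields: Vec<&str> = line.split(',').collect();
            if fields.len() != 4 {
                return Err(format!("malformed record: {line}"));
            }
            let dedup_key = format!("{}:{}", self.id(), fields[0]);
            // Skip records already emitted earlier in this sync.
            if !seen.insert(dedup_key.clone()) {
                continue;
            }
            calls.push(ParsedCall {
                // Keep the observed model id unchanged; naming is resolved later.
                model: fields[1].to_string(),
                input_tokens: fields[2].parse::<u64>().map_err(|e| e.to_string())?,
                output_tokens: fields[3].parse::<u64>().map_err(|e| e.to_string())?,
                dedup_key,
            });
        }
        Ok(calls)
    }
}
```

The sketch returns an error for a malformed record and silently skips a repeated one, which is one plausible split between source validation and deduplication.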

Pricing

Pricing is embedded as two books: costs/pricing-upstream.json for broad upstream coverage and costs/pricing-overrides.json for official rows, aliases, tool-scoped rows, provenance, and effective dates. Usage ingestion never fetches pricing. The Config page can download local copies of pricing-upstream.json and pricing-overrides.json, but only after confirmation. See Pricing and cache rates for source evidence and tool-specific caveats.

cost = multiplier * (
    input_tokens * input_rate
  + output_tokens * output_rate
  + cache_creation_input_tokens * cache_write_rate
  + cache_read_input_tokens * cache_read_rate
  + web_search_requests * web_search_rate
)
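The formula above translates almost directly into code. The sketch below assumes that every rate is expressed per single token (or per single web search request); the `Usage` and `Rates` structs and their field names are hypothetical and exist only to keep the example self-contained.

```rust
/// Rates for one pricing row (hypothetical field names), each per single unit.
#[derive(Debug, Clone, Copy)]
pub struct Rates {
    pub input: f64,
    pub output: f64,
    pub cache_write: f64,
    pub cache_read: f64,
    pub web_search: f64,
}

/// Usage counters for one call, named after the terms in the formula.
#[derive(Debug, Clone, Copy, Default)]
pub struct Usage {
    pub input_tokens: u64,
    pub output_tokens: u64,
    pub cache_creation_input_tokens: u64,
    pub cache_read_input_tokens: u64,
    pub web_search_requests: u64,
}

/// Applies the cost formula: a multiplier over the sum of count * rate terms.
pub fn cost(usage: &Usage, rates: &Rates, multiplier: f64) -> f64 {
    multiplier
        * (usage.input_tokens as f64 * rates.input
            + usage.output_tokens as f64 * rates.output
            + usage.cache_creation_input_tokens as f64 * rates.cache_write
            + usage.cache_read_input_tokens as f64 * rates.cache_read
            + usage.web_search_requests as f64 * rates.web_search)
}
```

A multiplier of 1.0 leaves the summed cost unchanged, so a row without a special mode can pass 1.0 and share the same function.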

Model lookup canonicalizes model names, resolves tool-scoped aliases first, applies effective dates, then falls back through global aliases and prefix matches to a default Sonnet row. cursor-auto is a direct Cursor Auto pricing row. Claude Opus fast mode applies the row’s fast_multiplier.
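One way to picture that lookup order is the sketch below. It is an interpretation rather than the project's implementation: the `PriceBook` layout, the lowercase-and-trim `canonicalize` step, the direct match on the canonical name, the longest-prefix rule, and the use of ISO date strings for effective dates are all assumptions.

```rust
use std::collections::HashMap;

/// One pricing row. `effective_from` is an ISO date (YYYY-MM-DD), so plain
/// string comparison orders dates correctly.
pub struct PriceRow {
    pub id: String,
    pub effective_from: String,
    pub input_rate: f64,
}

/// Hypothetical price book with tool-scoped and global alias tables.
pub struct PriceBook {
    pub rows: Vec<PriceRow>,
    pub tool_aliases: HashMap<(String, String), String>,
    pub global_aliases: HashMap<String, String>,
    pub default_id: String,
}

/// Assumed canonical form: trimmed and lowercased.
fn canonicalize(model: &str) -> String {
    model.trim().to_ascii_lowercase()
}

impl PriceBook {
    /// Latest row for `id` that is already effective on `date`.
    fn row_on(&self, id: &str, date: &str) -> Option<&PriceRow> {
        self.rows
            .iter()
            .filter(|r| r.id == id && r.effective_from.as_str() <= date)
            .max_by_key(|r| r.effective_from.clone())
    }

    /// Tool-scoped alias, then direct name, then global alias, then the
    /// longest matching prefix, then the default row.
    pub fn lookup(&self, tool: &str, model: &str, date: &str) -> Option<&PriceRow> {
        let name = canonicalize(model);
        let mut candidates: Vec<String> = Vec::new();
        if let Some(id) = self.tool_aliases.get(&(tool.to_string(), name.clone())) {
            candidates.push(id.clone());
        }
        candidates.push(name.clone());
        if let Some(id) = self.global_aliases.get(&name) {
            candidates.push(id.clone());
        }
        for id in &candidates {
            if let Some(row) = self.row_on(id, date) {
                return Some(row);
            }
        }
        self.rows
            .iter()
            .filter(|r| name.starts_with(r.id.as_str()) && r.effective_from.as_str() <= date)
            .max_by_key(|r| (r.id.len(), r.effective_from.clone()))
            .or_else(|| self.row_on(&self.default_id, date))
    }
}
```

Collecting candidate ids in priority order keeps the effective-date filter in one place (`row_on`), so every resolution path applies it the same way.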

Refresh the embedded maintainer books with:

cargo run -- --refresh-prices

Adding a New Tool

1. Create src/tools/<name>/{mod.rs, config.rs, discovery.rs, parser.rs}.
2. Put every path, env var, glob, SQL query, and source constant in config.rs.
3. Implement ToolAdapter in mod.rs and register it in tools::registry().
4. Add a variant to app::Tool, update its label and cycle order, and update ingest::matches_tool.
5. Add display names in aggregation helpers such as tool_short_label when needed.
6. Override source_fingerprint only when the default file/directory metadata fingerprint is too broad or too narrow for the source.
7. Write docs/development/tools/<name>.md and add it to the supported tools table above.
8. Add parser tests for source validation, token mapping, deduplication, project detection, and tool/bash extraction.
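For the source_fingerprint step, it may help to see what a metadata-based fingerprint can look like. The function below is only a guess at the general idea: the name `metadata_fingerprint` and the choice of file size plus modification time are assumptions, and the default in src/tools/mod.rs may combine different fields.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::UNIX_EPOCH;

/// Builds a cheap change-detection string from file size and modification
/// time, without reading the file contents.
pub fn metadata_fingerprint(path: &Path) -> io::Result<String> {
    let meta = fs::metadata(path)?;
    let modified = meta
        .modified()?
        .duration_since(UNIX_EPOCH)
        .map_err(|e| io::Error::new(io::ErrorKind::Other, e))?;
    Ok(format!(
        "{}:{}:{}",
        meta.len(),
        modified.as_secs(),
        modified.subsec_nanos()
    ))
}
```

A fingerprint like this would be too broad for a source whose metadata changes on every write by another process even though the relevant records do not, and too narrow for one whose records live in several files. Those are the kinds of cases where an override makes sense.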

Verification

cargo test runs parser unit tests, pricing lookup tests, aggregation tests, and render smoke tests.
cargo run launches the TUI and falls back to sample data when the archive has no local calls.
cargo run -- --list-projects syncs the archive and prints normalized project/tool inventory rows for debugging source attribution.