GitHub Copilot

Copilot CLI and VS Code transcript ingestion, model inference, and tool normalization.

Copilot has two supported on-disk layouts: the legacy CLI agent under ~/.copilot/ and VS Code Copilot Chat transcripts under workspace storage. tokenuse reads both through src/tools/copilot/.

Status: implemented.

Where the Data Lives

Legacy CLI Agent

~/.copilot/session-state/<session-id>/
    events.jsonl
    workspace.yaml

workspace.yaml is parsed for a scalar cwd: line and used as the project path. events.jsonl is the timeline.
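The scalar cwd: lookup can be sketched as below. This is a minimal illustration, not the code in src/tools/copilot/: the function name parse_workspace_cwd is hypothetical, and stripping surrounding quotes is an assumption about how the value may be written.

```rust
/// Returns the value of the first unindented `cwd:` scalar, if any.
/// Assumption: optional surrounding quotes are stripped; nested or
/// multi-line YAML values are deliberately not handled.
fn parse_workspace_cwd(yaml: &str) -> Option<String> {
    for line in yaml.lines() {
        // Only a top-level `cwd:` key counts as the scalar line.
        if let Some(rest) = line.strip_prefix("cwd:") {
            let value = rest.trim();
            let value = value
                .strip_prefix('"')
                .and_then(|v| v.strip_suffix('"'))
                .or_else(|| value.strip_prefix('\'').and_then(|v| v.strip_suffix('\'')))
                .unwrap_or(value);
            if !value.is_empty() {
                return Some(value.to_string());
            }
        }
    }
    None
}
```

A file without a cwd: line yields None, in which case the caller would fall back to whatever other source it discovered.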

VS Code Extension

| Platform | Workspace storage |
| --- | --- |
| macOS | ~/Library/Application Support/Code/User/workspaceStorage/<hash>/ |
| macOS Insiders | ~/Library/Application Support/Code - Insiders/User/workspaceStorage/<hash>/ |
| Linux | ~/.config/Code/User/workspaceStorage/<hash>/ |
| Linux Insiders/server | ~/.config/Code - Insiders/User/workspaceStorage/<hash>/, ~/.vscode-server/data/User/workspaceStorage/<hash>/ |
| Windows | %APPDATA%/Code/User/workspaceStorage/<hash>/ |
| Windows Insiders | %APPDATA%/Code - Insiders/User/workspaceStorage/<hash>/ |

Inside each workspace hash directory:

GitHub.copilot-chat/transcripts/<session>.jsonl

A transcript file only parses as Copilot when its first line has type == "session.start" and data.producer == "copilot-agent". When that session.start event includes data.context.cwd, the cwd is the authoritative project path. If absent, tokenuse falls back to workspace.yaml, the VS Code workspace.json folder name, and then the workspace hash.
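The project-path fallback order for a VS Code transcript could look roughly like the sketch below. The ProjectSources struct and resolve_project function are hypothetical names used only for illustration; the sketch assumes each candidate has already been extracted and that the workspace hash is always available as a last resort.

```rust
/// Hypothetical bundle of candidates gathered for one transcript.
struct ProjectSources<'a> {
    /// session.start data.context.cwd, when present.
    session_cwd: Option<&'a str>,
    /// cwd: scalar from workspace.yaml, when present.
    workspace_yaml_cwd: Option<&'a str>,
    /// Folder name from the VS Code workspace.json, when present.
    workspace_json_folder: Option<&'a str>,
    /// Name of the workspaceStorage/<hash>/ directory.
    workspace_hash: &'a str,
}

/// Picks the first available source in the documented order.
fn resolve_project(src: &ProjectSources<'_>) -> String {
    src.session_cwd
        .or(src.workspace_yaml_cwd)
        .or(src.workspace_json_folder)
        .unwrap_or(src.workspace_hash)
        .to_string()
}
```

Chaining Option::or keeps the precedence visible in one expression, so adding or reordering a fallback is a one-line change.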

flowchart TD
    A["legacy session-state dir"] --> B["events.jsonl"]
    A --> C["workspace.yaml cwd"]
    D["VS Code workspaceStorage"] --> E["transcripts/*.jsonl"]
    J["tokenuse limits/copilot.json"] --> K["quota_snapshots"]
    E --> F["first line data.producer == copilot-agent"]
    B --> G["legacy parser"]
    F --> H["transcript parser"]
    C --> G
    C --> H
    G --> I["ParsedCall output"]
    H --> I
    K --> L["LimitSnapshot output"]

Record Format

Legacy events.jsonl

Legacy events store their payload under data. A legacy assistant message only emits a ParsedCall when the current model has been set by session.model_change and data.outputTokens is positive.

{ "type": "session.model_change",
  "timestamp": "2026-04-26T10:00:00Z",
  "data": { "newModel": "claude-sonnet-4-5" } }

{ "type": "user.message",
  "timestamp": "2026-04-26T10:00:01Z",
  "data": { "content": "fix the typo in README" } }

{ "type": "assistant.message",
  "timestamp": "2026-04-26T10:00:02Z",
  "data": {
    "messageId": "m1",
    "outputTokens": 220,
    "toolRequests": [
      { "toolCallId": "tooluse_xyz", "name": "bash",
        "arguments": "{\"command\":\"ls -la | wc -l\"}" },
      { "toolCallId": "tooluse_yyy", "name": "edit_file" }
    ]
  } }
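The emit rule for legacy events can be illustrated with a small state machine. The types below (LegacyEvent, Call) are simplified, hypothetical stand-ins for already-deserialized events, not the project's real ParsedCall; the sketch only shows that a call is produced when a model has been set and the output token count is positive.

```rust
/// Simplified stand-ins for deserialized legacy events (hypothetical).
enum LegacyEvent {
    ModelChange { new_model: String },
    AssistantMessage { message_id: String, output_tokens: u64 },
    Other,
}

#[derive(Debug, PartialEq)]
struct Call {
    model: String,
    message_id: String,
    output_tokens: u64,
}

fn collect_calls(events: &[LegacyEvent]) -> Vec<Call> {
    let mut current_model: Option<String> = None;
    let mut calls = Vec::new();
    for event in events {
        match event {
            // session.model_change updates the model used for later messages.
            LegacyEvent::ModelChange { new_model } => {
                current_model = Some(new_model.clone());
            }
            LegacyEvent::AssistantMessage { message_id, output_tokens } => {
                // Emit only when a model is known and output tokens are positive.
                if let Some(model) = &current_model {
                    if *output_tokens > 0 {
                        calls.push(Call {
                            model: model.clone(),
                            message_id: message_id.clone(),
                            output_tokens: *output_tokens,
                        });
                    }
                }
            }
            LegacyEvent::Other => {}
        }
    }
    calls
}
```

Under this sketch, an assistant message that arrives before any session.model_change is dropped, as is one whose output token count is zero.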

VS Code Transcripts

VS Code transcript payloads also live under data. The parser validates the first session.start line, uses data.context.cwd for the project path, and estimates tokens from message text.

{ "type": "session.start",
  "data": {
    "sessionId": "x",
    "producer": "copilot-agent",
    "model": "gpt-5",
    "context": { "cwd": "/Users/me/Code/tokens" }
  } }

{ "type": "user.message",
  "data": { "content": "hello world" } }

{ "type": "assistant.message",
  "data": {
    "messageId": "abc",
    "content": "sure thing",
    "reasoningText": "let me think",
    "toolRequests": [
      { "toolCallId": "toolu_bdrk_01ZZ", "name": "read_file" },
      { "toolCallId": "toolu_bdrk_02YY", "name": "edit_file" }
    ]
  } }

The current transcript parser does not use data.model for pricing. It infers one model alias per transcript from tool-call id prefixes.

Token & Cost Mapping

| ParsedCall field | Legacy source | VS Code transcript source |
| --- | --- | --- |
| input_tokens | 0 | latest data.content.len() / 4, rounded up |
| output_tokens | data.outputTokens | data.content.len() / 4 plus data.reasoningText.len() / 4, both rounded up, unless explicit data.outputTokens exists |
| reasoning_tokens | 0 | data.reasoningText.len() / 4, rounded up |
| cache_creation_input_tokens | 0 | 0 |
| cache_read_input_tokens | 0 | 0 |
| model | latest session.model_change.data.newModel | inferred alias from tool-call ids |
| timestamp | top-level timestamp, parsed as RFC3339 | top-level timestamp when present; otherwise None |
| project | workspace.yaml cwd:, then discovered source | session.start.data.context.cwd, then workspace.yaml, then VS Code workspace.json folder name or workspace hash |

Transcript reasoning tokens are preserved in reasoning_tokens and folded into output_tokens so estimated transcript cost includes generated reasoning text.
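The transcript estimate can be worked through in a few lines. This is a sketch under stated assumptions: the names estimate_tokens, estimate_call, and TranscriptEstimate are hypothetical, the length is the byte length returned by str::len, and the "latest" content used for input is assumed to be the most recent user message.

```rust
/// ceil(len / 4) using byte length, matching `str::len()`.
fn estimate_tokens(text: &str) -> u64 {
    ((text.len() + 3) / 4) as u64
}

#[derive(Debug, PartialEq)]
struct TranscriptEstimate {
    input_tokens: u64,
    output_tokens: u64,
    reasoning_tokens: u64,
}

/// Assumption: `latest_user_content` is the most recent user message text.
fn estimate_call(
    latest_user_content: &str,
    content: &str,
    reasoning_text: &str,
    explicit_output_tokens: Option<u64>,
) -> TranscriptEstimate {
    let reasoning_tokens = estimate_tokens(reasoning_text);
    // Reasoning is folded into output unless an explicit count exists.
    let output_tokens =
        explicit_output_tokens.unwrap_or(estimate_tokens(content) + reasoning_tokens);
    TranscriptEstimate {
        input_tokens: estimate_tokens(latest_user_content),
        output_tokens,
        reasoning_tokens,
    }
}
```

For the sample transcript above, "hello world" (11 bytes), "sure thing" (10 bytes), and "let me think" (12 bytes) each round up to 3 tokens, giving 3 input tokens, 3 reasoning tokens, and 6 output tokens.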

Model Inference

When parsing VS Code transcripts, count recognized data.toolRequests[].toolCallId prefixes across the whole transcript and use the most common alias:

| Prefix | Alias | Pricing target |
| --- | --- | --- |
| toolu_bdrk_ | anthropic-auto | Sonnet alias |
| toolu_vrtx_ | anthropic-auto | Sonnet alias |
| tooluse_ | anthropic-auto | Sonnet alias |
| call_ | openai-auto | GPT-5 alias |

If no recognized prefix appears, the parser uses copilot-auto, which currently falls through pricing lookup to the snapshot fallback.
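A minimal sketch of the prefix vote follows. The function name infer_model_alias is hypothetical, and the tie-break (the alphabetically first alias wins when counts are equal) is an assumption made only to keep the example deterministic; the source does not say how ties are resolved.

```rust
use std::collections::HashMap;

/// Prefix-to-alias pairs taken from the table above.
const PREFIX_ALIASES: [(&str, &str); 4] = [
    ("toolu_bdrk_", "anthropic-auto"),
    ("toolu_vrtx_", "anthropic-auto"),
    ("tooluse_", "anthropic-auto"),
    ("call_", "openai-auto"),
];

/// Counts recognized prefixes across all tool-call ids in a transcript
/// and returns the most common alias, or "copilot-auto" if none match.
fn infer_model_alias(tool_call_ids: &[&str]) -> &'static str {
    let mut counts: HashMap<&'static str, usize> = HashMap::new();
    for id in tool_call_ids {
        if let Some((_, alias)) = PREFIX_ALIASES
            .iter()
            .find(|(prefix, _)| id.starts_with(*prefix))
        {
            *counts.entry(*alias).or_insert(0) += 1;
        }
    }
    counts
        .into_iter()
        // Assumed tie-break: on equal counts the alphabetically first alias wins.
        .max_by(|a, b| a.1.cmp(&b.1).then_with(|| b.0.cmp(a.0)))
        .map(|(alias, _)| alias)
        .unwrap_or("copilot-auto")
}
```

Because the vote runs over the whole transcript, a session that switches models midway is still attributed to a single alias.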

GitHub’s usage-based Copilot billing starts on June 1, 2026 and includes cached tokens, but these local transcript sources do not expose reliable cache buckets today. tokenuse therefore keeps cache_read_input_tokens and cache_creation_input_tokens at 0 for Copilot and treats local cost as an estimate. See Pricing and cache rates.

Deduplication

  • Legacy: copilot:<session_id>:<message_id>, where session_id is the parent directory name and message_id is data.messageId.
  • VS Code: copilot:<session_id>:<message_id>, where session_id is the transcript file stem and message_id is data.messageId.
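The two key shapes above can be sketched as follows. All function names here are hypothetical, and the sketch assumes paths are valid UTF-8; it only shows where each session id comes from and how a seen-set would reject a repeated key.

```rust
use std::collections::HashSet;
use std::path::Path;

/// Legacy: session id is the directory that contains events.jsonl.
fn legacy_session_id(events_path: &Path) -> Option<String> {
    events_path
        .parent()?
        .file_name()?
        .to_str()
        .map(str::to_string)
}

/// VS Code: session id is the transcript file stem.
fn transcript_session_id(transcript_path: &Path) -> Option<String> {
    transcript_path.file_stem()?.to_str().map(str::to_string)
}

fn dedup_key(session_id: &str, message_id: &str) -> String {
    format!("copilot:{session_id}:{message_id}")
}

/// Returns true the first time a key is seen, false for duplicates.
fn is_new(seen: &mut HashSet<String>, key: String) -> bool {
    seen.insert(key)
}
```

Since both layouts share the copilot: prefix, a legacy session and a transcript collide only if their session ids and message ids both match.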

Tools / Bash Extraction

Walk data.toolRequests[] and normalize each name:

| Copilot name | Normalized |
| --- | --- |
| bash, run_in_terminal, kill_terminal | Bash |
| read_file | Read |
| edit_file, write_file, replace_string_in_file, apply_patch | Edit |
| create_file | Write |
| delete_file | Delete |
| search_files, file_search | Grep |
| find_files | Glob |
| list_directory, list_dir | LS |
| web_search | WebSearch |
| fetch_webpage | WebFetch |
| github_repo | GitHub |
| memory | Memory |

For Bash-class calls, parse arguments as a JSON string and split command or cmd with tools::jsonl::split_bash_commands.

flowchart LR
    A["data.toolRequests array"] --> B["normalize tool name"]
    A -->|bash class| C["parse arguments JSON"]
    C --> D["command or cmd"]
    D --> E["split_bash_commands"]
    B --> F["tools"]
    E --> G["bash_commands"]
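The name normalization amounts to a lookup over the table above. In this sketch the function names are hypothetical, and returning None for an unlisted name is an assumption, since the source does not say how unknown tools are handled. Parsing arguments and calling tools::jsonl::split_bash_commands are left out because their exact behavior is not described here.

```rust
/// Maps a Copilot tool name to its normalized label.
/// Assumption: names missing from the table yield None.
fn normalize_tool_name(name: &str) -> Option<&'static str> {
    match name {
        "bash" | "run_in_terminal" | "kill_terminal" => Some("Bash"),
        "read_file" => Some("Read"),
        "edit_file" | "write_file" | "replace_string_in_file" | "apply_patch" => Some("Edit"),
        "create_file" => Some("Write"),
        "delete_file" => Some("Delete"),
        "search_files" | "file_search" => Some("Grep"),
        "find_files" => Some("Glob"),
        "list_directory" | "list_dir" => Some("LS"),
        "web_search" => Some("WebSearch"),
        "fetch_webpage" => Some("WebFetch"),
        "github_repo" => Some("GitHub"),
        "memory" => Some("Memory"),
        _ => None,
    }
}

/// True when the request's arguments should be inspected for a command.
fn is_bash_class(name: &str) -> bool {
    normalize_tool_name(name) == Some("Bash")
}
```

Keeping the mapping in a single match means a newly observed Copilot tool name needs only one extra arm.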

Known Limitations

  • Legacy events without a positive data.outputTokens value are skipped.
  • Legacy input tokens are currently recorded as 0 because the legacy format only exposes output tokens in the supported path.
  • VS Code transcript token counts are estimates based on text length divided by 4, rounded up; treat Copilot totals as approximate.
  • VS Code data.model is currently ignored for pricing; tool-call id inference picks one model alias for the whole transcript. Auto aliases are displayed as Copilot-specific model buckets.
  • workspace.yaml parsing reads only the scalar cwd: line used by Copilot session-state files. If Copilot starts writing richer YAML, replace the small parser with a YAML crate.

Rate-Limit Snapshots

Copilot transcripts do not include quota state. tokenuse imports Copilot limits from a local sidecar:

<config dir>/tokenuse/limits/copilot.json

The sidecar can be either the raw GET https://api.github.com/copilot_internal/user payload or the wrapper object written by the Config page sync action:

{
  "observed_at": "2026-01-15T12:00:00Z",
  "source": "https://api.github.com/copilot_internal/user",
  "payload": {
    "copilot_plan": "individual_pro",
    "quota_reset_date": "2026-02-01",
    "quota_snapshots": {
      "premium_interactions": {
        "entitlement": 300,
        "percent_remaining": 31.16,
        "remaining": 93,
        "unlimited": false,
        "timestamp_utc": "2026-01-15T12:02:00Z"
      }
    }
  }
}

tokenuse skips unlimited snapshots with no entitlement, converts percent_remaining into used_percent, and emits one LimitSnapshot per constrained quota key. quota_reset_date is treated as a monthly reset at 00:00 UTC unless a future quota key indicates a weekly window.
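The conversion from percent_remaining to used_percent can be sketched as below. QuotaSnapshot and UsedQuota are hypothetical, simplified types rather than the project's LimitSnapshot, and two details are assumptions: "no entitlement" is modeled as an entitlement of 0, and the result is clamped to the 0 to 100 range.

```rust
/// Simplified view of one entry under quota_snapshots (hypothetical).
struct QuotaSnapshot<'a> {
    key: &'a str,
    entitlement: u64,
    percent_remaining: f64,
    unlimited: bool,
}

#[derive(Debug)]
struct UsedQuota {
    key: String,
    used_percent: f64,
}

/// Emits one row per constrained quota key.
fn to_used_quotas(snapshots: &[QuotaSnapshot<'_>]) -> Vec<UsedQuota> {
    snapshots
        .iter()
        // Assumption: "no entitlement" means an entitlement of 0.
        .filter(|s| !(s.unlimited && s.entitlement == 0))
        .map(|s| UsedQuota {
            key: s.key.to_string(),
            // Clamping is an assumption to guard against out-of-range input.
            used_percent: (100.0 - s.percent_remaining).clamp(0.0, 100.0),
        })
        .collect()
}
```

With the sample payload above, premium_interactions at 31.16 percent remaining maps to roughly 68.84 percent used.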

The Config page’s Copilot sync action is explicit and confirmed. It reads the existing GitHub Copilot OAuth token from local github-copilot config files, fetches the quota payload from GitHub, writes the sidecar above, then syncs the archive so Usage gauges update immediately. Builds without the quota-sync feature keep this action unavailable.