v1.0 · Apr 9, 2026

Scoring Rubric

Eight dimensions, five maturity levels, weighted by team size. Every criterion is tool-agnostic and research-backed. Old scores stay valid under the version they were scored against. See the changelog for the full evolution from v0.1 to v1.0.

Core Principles

Tool-agnostic

We score practices, never tools. A team using only open-source tools can hit a perfect score. Every dimension passes the Tool Swap Test: swap any tool for an equivalent and the score changes by at most ±5 points.

Research-backed

Every level criterion cites a primary source (IBM documentation study, Anthropic attention research, AVRS framework, AI Harness Scorecard).

Versioned, never retroactive

Old scores remain valid under the version they were scored against. Users can opt to re-score under the latest rubric.

Publicly documented

Full rubric, weights, criteria, and changelog are published and diffable.

Maturity Levels

None        0–20
Basic       21–40
Developing  41–60
Advanced    61–80
Elite       81–100
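The level boundaries above can be encoded as a small lookup. A sketch in TypeScript; the function name and types are illustrative, not the benchmark's actual code:

```typescript
// Map a 0-100 dimension or composite score to its maturity level.
// Boundaries follow the table above: None 0-20, Basic 21-40,
// Developing 41-60, Advanced 61-80, Elite 81-100.
type Level = "None" | "Basic" | "Developing" | "Advanced" | "Elite";

function levelFor(score: number): Level {
  if (score <= 20) return "None";
  if (score <= 40) return "Basic";
  if (score <= 60) return "Developing";
  if (score <= 80) return "Advanced";
  return "Elite";
}
```

Note that the boundaries are inclusive on the upper end, so a score of exactly 20 is still "None" and 21 is the first "Basic" score.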

Weights by Team Size

Indie weights skew toward what one person controls. Large team weights shift toward organizational capabilities that prevent drift at scale.

Dimension                        Indie   Small (2-10)   Medium (10-50)   Large (50+)
Project Context Files            0.20    0.18           0.14             0.10
Memory & Persistence             0.20    0.14           0.10             0.08
Documentation-as-Context         0.08    0.12           0.14             0.14
Team Context Sharing             0.02    0.10           0.14             0.16
Context Window Optimization      0.10    0.08           0.10             0.10
Code Organization for AI         0.10    0.14           0.12             0.10
Tool Configuration               0.20    0.14           0.10             0.10
Measurement & Feedback Loops     0.10    0.10           0.16             0.22
Total                            1.00    1.00           1.00             1.00
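Because each column must sum to 1.00, the table can be sanity-checked mechanically. A hedged sketch (weights copied from the table above; variable and function names are illustrative):

```typescript
// Weights per dimension, one column per team size:
// [Indie, Small (2-10), Medium (10-50), Large (50+)].
const weights: Record<string, [number, number, number, number]> = {
  "Project Context Files":        [0.20, 0.18, 0.14, 0.10],
  "Memory & Persistence":         [0.20, 0.14, 0.10, 0.08],
  "Documentation-as-Context":     [0.08, 0.12, 0.14, 0.14],
  "Team Context Sharing":         [0.02, 0.10, 0.14, 0.16],
  "Context Window Optimization":  [0.10, 0.08, 0.10, 0.10],
  "Code Organization for AI":     [0.10, 0.14, 0.12, 0.10],
  "Tool Configuration":           [0.20, 0.14, 0.10, 0.10],
  "Measurement & Feedback Loops": [0.10, 0.10, 0.16, 0.22],
};

// Each team-size column must sum to 1.00 (rounded to absorb float error).
function columnSums(): number[] {
  const sums = [0, 0, 0, 0];
  for (const row of Object.values(weights)) {
    row.forEach((w, i) => (sums[i] += w));
  }
  return sums.map((s) => Math.round(s * 100) / 100);
}
```

All four columns of the published table pass this check.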

Detailed Criteria

Dimension 1

Project Context Files

CLAUDE.md, .cursorrules, AGENTS.md, copilot-instructions.md: existence, quality, freshness

None (0–20)

No AI context files in repo

Basic (21–40)

One file exists, minimal (<50 lines), no architecture info, stale (>90 days)

Developing (41–60)

Context files cover basics, updated within 90 days, include some coding guidelines

Advanced (61–80)

Multiple files (CLAUDE.md + .cursorrules or AGENTS.md), includes architecture + dev guidelines + testing commands, updated within 30 days

Elite (81–100)

Hierarchical context (project/subdirectory/personal), CI-checked for freshness, includes examples of correct/incorrect patterns, auto-maintained or reviewed weekly
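The Elite criterion's CI freshness check can be approximated with a pure helper. An assumed sketch, not part of the rubric itself; the function name and CI wiring are hypothetical:

```typescript
// Flag a context file as stale when its last commit is older than the
// allowed window (30 days for the Advanced criterion above).
const DAY_MS = 24 * 60 * 60 * 1000;

function isStale(lastModified: Date, now: Date, maxAgeDays: number): boolean {
  return now.getTime() - lastModified.getTime() > maxAgeDays * DAY_MS;
}

// In CI you might feed this from `git log -1 --format=%cI -- CLAUDE.md`
// and fail the build when isStale(...) returns true.
```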

Dimension 2

Memory & Persistence

Cross-session memory, hierarchical context (global → project → subdirectory), session management

None (0–20)

No persistent context between AI sessions; start from scratch every time

Basic (21–40)

Manually paste context at session start, or rely on single context file

Developing (41–60)

Use tool-native memory features (Claude memory, Cursor notepads), clear sessions between tasks

Advanced (61–80)

Hierarchical context (global → project → local), session naming conventions, compaction strategies documented

Elite (81–100)

Cross-IDE persistent memory, automated memory consolidation, project-aware context that follows developers across tools

Dimension 3

Documentation-as-Context

ARCHITECTURE.md, ADRs, API specs (OpenAPI), README depth: docs AI can consume

None (0–20)

No README or only a stub; no architecture docs

Basic (21–40)

README exists with setup instructions; no architecture or decision docs

Developing (41–60)

README + basic ARCHITECTURE.md; some inline code comments for complex logic

Advanced (61–80)

ARCHITECTURE.md with component relationships + data flow, 3+ ADRs, API specs (OpenAPI/Swagger), structured for AI consumption

Elite (81–100)

Living architecture docs updated with code changes, comprehensive ADR library, machine-readable API specs, docs treated as load-bearing infrastructure

Dimension 4

Team Context Sharing

Shared rules, org-level instructions, team prompts, shared skills/subagents, CLAUDE.local.md pattern

None (0–20)

Each developer maintains their own AI setup independently

Basic (21–40)

One shared context file checked into git

Developing (41–60)

Shared rules files + convention for personal overrides (CLAUDE.local.md pattern)

Advanced (61–80)

Shared skills/subagent definitions, team prompt templates, org-level instructions, onboarding includes AI context setup

Elite (81–100)

Cross-team context sharing, quarterly reviews to prune/consolidate, context updates triggered by agent failures, institutional knowledge encoded in durable artifacts

Dimension 5

Context Window Optimization

Token budget management, context ordering, compression, subagent delegation

None (0–20)

No awareness of token limits; paste everything and hope

Basic (21–40)

Aware of limits; manually trim context when hitting walls

Developing (41–60)

Follow basic ordering (important info first), use session clearing between tasks

Advanced (61–80)

Critical rules at beginning of context, current work at end, delegate investigations to subagents, use compaction with preservation directives

Elite (81–100)

Measured token budgets, automated context pruning, semantic chunking, subagent architecture for parallel investigation, compress-then-query patterns
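The ordering the Advanced and Elite criteria describe — critical rules first, current work last, background material trimmed first when over budget — can be sketched like this. All names are hypothetical, and a real implementation would use a model tokenizer rather than a character-count heuristic:

```typescript
// Rough token estimate for illustration only (~4 chars per token).
const approxTokens = (s: string) => Math.ceil(s.length / 4);

// Assemble a context string: rules at the start, current work at the end,
// background chunks in the middle dropped first when over budget.
function assembleContext(
  rules: string,
  background: string[],
  currentWork: string,
  budgetTokens: number
): string {
  const fixed = approxTokens(rules) + approxTokens(currentWork);
  let remaining = budgetTokens - fixed;
  const kept: string[] = [];
  for (const chunk of background) {
    const cost = approxTokens(chunk);
    if (cost > remaining) break; // trim lowest-priority middle content first
    kept.push(chunk);
    remaining -= cost;
  }
  return [rules, ...kept, currentWork].join("\n\n");
}
```

With a tight budget, only the rules and the current work survive; background chunks are dropped before either of the fixed sections is touched.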

Dimension 6

Code Organization for AI

Naming consistency, type safety, directory predictability, monorepo structure, co-located tests

None (0–20)

Inconsistent naming, no type system, unpredictable directory structure

Basic (21–40)

Some naming convention, basic directory structure, mix of typed/untyped code

Developing (41–60)

Consistent naming convention enforced by linter, TypeScript with partial strict mode, predictable directories

Advanced (61–80)

Full TypeScript strict mode, consistent naming across entire codebase, co-located tests, explicit dependency injection, monorepo with clear package boundaries

Elite (81–100)

Type-safe cross-package contracts, naming linted at CI, small focused files, predictable directory layouts documented for AI

Dimension 7

Tool Configuration

MCP servers, hooks, IDE-specific configs, plugins, skill/subagent definitions

None (0–20)

Default tool settings, no customization

Basic (21–40)

One IDE configured (e.g., basic .cursorrules or VS Code settings)

Developing (41–60)

Multiple tools configured, basic MCP server setup, some hooks

Advanced (61–80)

MCP servers for external integrations (GitHub, databases, monitoring), hooks for deterministic actions, IDE-specific configs per team member

Elite (81–100)

Full MCP ecosystem (5+ servers), custom skills/subagents for common workflows, plugins shared across team, hooks that enforce code quality gates, CI integration

Dimension 8

Measurement & Feedback Loops

AI code quality tracking, acceptance rates, turnover metrics, AI vs human code quality comparison

None (0–20)

No tracking of AI effectiveness

Basic (21–40)

Anecdotal sense that AI helps; no data

Developing (41–60)

Track AI-assisted PRs or time savings informally; occasional team retros on AI workflows

Advanced (61–80)

Measure AI code acceptance rate (target 25-45%), track AI vs human code turnover ratio (<1.3x), PR cycle time, quarterly reviews

Elite (81–100)

AI code quality dashboard, automated turnover tracking, context quality metrics across sessions, feedback loops that update context files based on measured failures, ROI tracking

Methodology

The Context Management Index evolved through 7 versions over 6 months (October 2025 → April 2026) based on external research from the IBM documentation study, Anthropic's attention papers, the AVRS framework, and the AI Harness Scorecard. Methodology and research credits maintained by the ByteRover research team.

Scoring is a pure function: (signals, rubric, teamSize) => score. Same inputs always produce the same output. The function is shared between the web app, CLI, and API.
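A minimal sketch of that pure-function shape, under assumed names (the actual signal and rubric types are not published here; only the `(signals, rubric, teamSize) => score` signature is from the source):

```typescript
// Pure scoring: same inputs always yield the same output, so the web app,
// CLI, and API can share one function and stay in agreement.
type TeamSize = "indie" | "small" | "medium" | "large";

// Per-dimension 0-100 scores, in the rubric's dimension order.
type DimensionScores = number[];

function score(
  dimensionScores: DimensionScores,
  weightsFor: (size: TeamSize) => number[], // columns from the weights table
  teamSize: TeamSize
): number {
  const w = weightsFor(teamSize);
  const composite = dimensionScores.reduce((sum, s, i) => sum + s * w[i], 0);
  return Math.round(composite); // weights sum to 1.00, so this stays 0-100
}
```

No I/O, no clocks, no randomness: determinism is what makes scores reproducible and auditable across the three frontends.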

Read the full development history in the rubric changelog.

Frequently asked questions

The questions CTOs and engineering leaders ask us most often.

What is the Context Management Index?
The Context Management Index is a public benchmark that scores engineering teams on AI context management maturity across 8 research-backed dimensions: project context files, memory & persistence, documentation-as-context, team context sharing, context window optimization, code organization for AI, tool configuration, and measurement & feedback loops.
How is the score calculated?
Each of the 8 dimensions is scored 0–100, then combined into a weighted composite (also 0–100). Weights vary by team size: indie weights skew toward what one person controls, while large-team weights shift toward organizational practices. The scoring engine is a pure function: same inputs always produce the same output.
Is the rubric tool-agnostic?
Yes. The rubric scores practices, never specific tools. A team using only open-source (Claude Code + CLAUDE.md + claude-mem) can achieve a perfect score. Every dimension passes the Tool Swap Test: if a team swaps their tool for an equivalent one, the score stays within ±5 points.
What are the 8 dimensions of AI context management?
The 8 dimensions are: (1) Project Context Files (CLAUDE.md, .cursorrules, AGENTS.md), (2) Memory & Persistence, (3) Documentation-as-Context (ARCHITECTURE.md, ADRs, OpenAPI), (4) Team Context Sharing, (5) Context Window Optimization, (6) Code Organization for AI, (7) Tool Configuration (MCP servers, hooks, skills), and (8) Measurement & Feedback Loops.
What are the maturity levels?
There are 5 maturity levels per dimension: None (0–20), Basic (21–40), Developing (41–60), Advanced (61–80), and Elite (81–100). Each level has specific, auditable criteria backed by research from sources like the IBM contextual documentation study, Anthropic attention papers, and the AVRS framework.
How often is the rubric updated?
The rubric is versioned. v1.0 is the first public release; v0.1 through v0.9 were internal research drafts that evolved based on external research. New versions are released as new research emerges, but old scores never change retroactively. A score under v1.0 stays valid under v1.0 forever.
Can I score against an older rubric version?
Yes. Every historical version (v0.1 through v1.0) remains scoreable for academic and comparison purposes. The full version history with diffs between versions is published on the rubric changelog page.
Does using ByteRover give my team a higher score?
No. The rubric scores practices, not tools. ByteRover, Mem0, claude-mem, Cursor, Claude Code, and every other tool appear in 'Tools that can help' recommendations, but never in scoring criteria. Top leaderboard positions are held by teams using a wide variety of tool stacks, including fully open-source ones.
How does my team get assessed?
Four ways: a 16-question quiz (5 minutes, no signup), a GitHub repo scan (read-only, metadata only), a CLI scan run locally, or a file upload (CLAUDE.md, .cursorrules, AGENTS.md). All four methods feed the same scoring function and produce a comparable score.
Who built the Context Management Index?
The Context Management Index is built by ByteRover and operated as an independent public benchmark. ByteRover is the publisher; the research methodology is maintained by the ByteRover research team. Scoring is independent of ByteRover and every other vendor. The rubric scores practices, not products.