TokenPoints: Re-defining Software Workload Estimation with Dollars

Tags: AI Development · Workload Estimation · Software Engineering · LLM Cost · Agile Development · Project Management
Published 2026/04/29 23:43 · Last activity 2026/04/29 23:52 · Estimated reading time: 9 minutes

Abstract: When AI agents become the main force in code creation, traditional time and story point estimation methods are outdated. The TokenPoints framework proposes using LLM reasoning cost (in dollars) as a new workload measurement standard, providing an honest and verifiable estimation method for AI-driven software development.

Core Insight: TokenPoints answers the question 'how much AI resource does a task need?' by using dollar-denominated LLM token costs as an objective, verifiable metric, replacing time and story points, which are subjective or irrelevant in the AI era.

Background: Obsolescence of Traditional Estimation Methods

Software engineering workload estimation has long been a challenge. Traditional methods use time as a proxy for effort, but this conflates duration with complexity, ignores individual differences, and is easily distorted. Agile story points try to fix these problems but are abstract, unfalsifiable, and easy to game.

By 2026, with AI agents as the primary code creators, the question 'how many hours will this task take?' loses its meaning. The key question becomes 'how much AI resource does it need?' TokenPoints was created to answer exactly that.

Core Idea: Dollars as an Honest Workload Measure

The core insight of TokenPoints: in AI-driven development, the cost of work is reflected in LLM token consumption, which translates into quantifiable dollars, an objective and verifiable metric that is comparable across teams. The framework rests on six pillars:

  1. Dollars are more honest than time: Time records human presence, not actual work. With AI doing most of the coding, model cost reflects the computational complexity of the task (see the conversion sketch after this list).
  2. Differences are information, not noise: Cost variations across teams and codebases signal the real distribution of complexity.
  3. Results over output: Prioritize business value over lines of code; spending fewer tokens on a key issue beats spending more on a marginal improvement.
  4. Local calibration is critical: Default scales are starting points; teams must calibrate against their own codebase, models, and tools.
  5. Multi-dimensional thinking: Dollar cost is one dimension; also consider technical debt, maintenance, and learning curves.
  6. Human time still matters: Humans remain irreplaceable in requirements understanding, architecture, review, and testing; track AI cost and human time separately.
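
To make pillar 1 concrete, here is a minimal sketch of the token-to-dollar conversion. The model names and per-million-token prices are hypothetical placeholders, not any provider's real rate card; substitute your own.

```python
# Hypothetical per-million-token prices in USD; real prices vary by
# model, provider, and date. Replace with your provider's rate card.
PRICE_PER_MTOK = {
    "frontier-large": {"input": 3.00, "output": 15.00},
    "frontier-small": {"input": 0.25, "output": 1.25},
}

def session_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Convert one AI dev session's token usage into dollars."""
    rates = PRICE_PER_MTOK[model]
    return (input_tokens / 1_000_000 * rates["input"]
            + output_tokens / 1_000_000 * rates["output"])

# A session with 1.2M input and 150K output tokens on the large model:
print(f"${session_cost('frontier-large', 1_200_000, 150_000):.2f}")  # $5.85
```

At $5.85, that session lands in the S band of the scale table in the next section.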

TokenPoints Scale System: From XS to XL

The framework defines scales from XS to XL, plus a '??' marker for unknowns; each comes with a cost range, typical scenarios, and an expected human-time budget:

| Scale | Cost Range | Typical Scenarios | Human Time |
|-------|------------|-------------------|------------|
| XS | < $1 | Precise edits, auto-completion | < 30 min |
| S | $1-$8 | Single-file feature/fix, 5-15 dialogues | 30 min - 2 h |
| M | $8-$40 | Multi-file feature, 15-40 dialogues | 2-8 h |
| L | $40-$160 | Refactoring, deep debugging, cross-module changes | 1-3 days |
| XL | $160-$400 | Architecture changes, multi-system coordination | 3+ days |
| ?? | Unknown | Needs exploration (spike) | Time-boxed |

Any task larger than XL must be split; if it cannot be, that signals insufficient understanding of the problem. The scales are calibrated for 2026 AI dev sessions, but teams should adjust them for their context (large codebases, complex dependencies, model choices).
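
For reference, the default boundaries above can be captured in a small lookup. This is only a sketch of the published 2026 defaults, which a team is expected to overwrite once it has local calibration data:

```python
# Default 2026 TokenPoints boundaries; overwrite after local calibration.
SCALE_BOUNDS = [  # (exclusive upper bound in USD, scale label)
    (1, "XS"), (8, "S"), (40, "M"), (160, "L"), (400, "XL"),
]

def token_points_scale(estimated_cost_usd: float) -> str:
    """Map an estimated dollar cost onto the XS-XL scale."""
    for upper, label in SCALE_BOUNDS:
        if estimated_cost_usd < upper:
            return label
    # Beyond XL: the framework says to split the task, not estimate it.
    raise ValueError("Larger than XL -- split the task first.")

print(token_points_scale(5.85))   # S
print(token_points_scale(120.0))  # L
```

The sketch treats boundaries as exclusive upper bounds, so an $8.00 estimate already counts as M; how to handle edge values is exactly the kind of rule a team fixes during calibration.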

Implementation Path: Gradual Adoption

TokenPoints advocates gradual adoption:

  1. Read Manifesto: Spend ~5 mins understanding the six pillars. If you disagree with core principles, it may not fit your team.
  2. Familiarize with Scales: ~10 mins to grasp XS-XL levels (no need to memorize).
  3. Trial Template: Use the estimation template on 10 tasks (2 iterations) to collect data, but don't change workflows yet (see the sketch after this list).
  4. Team Calibration: After 2 iterations, adjust scale boundaries using actual data to build team-specific benchmarks.
  5. Integrate into Workflows: Once calibrated, integrate into Scrum/Kanban using provided guides.
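
The framework ships its own template, which isn't reproduced here; the entry below is only a hypothetical illustration of the fields a trial record needs, with every field name invented for this sketch:

```python
# Hypothetical trial record; field names are illustrative, not the
# project's official template schema.
estimate_entry = {
    "task": "Add rate limiting to the public API",
    "task_type": "new feature",
    "estimated_scale": "M",           # from the XS-XL table above
    "estimated_cost_usd": 25.0,
    "estimated_human_time": "4h",     # tracked separately (pillar 6)
    # Filled in once the task is done:
    "actual_cost_usd": None,
    "actual_tokens": None,
    "models_used": [],
    "delivered_value_notes": "",
}
```

Whatever the real template looks like, the point of the trial is that every task gets an estimate beforehand and an actual cost afterward.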

Data Collection & Calibration

Effective calibration requires tracking:

  • Initial TokenPoints estimate
  • Actual token count and dollar cost
  • Model combinations (cost varies by model)
  • Codebase size/complexity
  • Task type (new feature, bug fix, refactoring)
  • Delivered business value

Analyze the data to identify estimation biases (e.g., which task types are consistently under- or overestimated) and adjust the scale definitions accordingly, as in the sketch below.
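
A minimal sketch of that analysis, assuming records shaped like the hypothetical template entry above: group completed tasks by type and compare actual to estimated cost.

```python
from collections import defaultdict

def estimation_bias(records: list[dict]) -> dict[str, float]:
    """Mean actual/estimated cost ratio per task type.

    Above 1.0 means the task type is systematically underestimated;
    below 1.0, overestimated. Uses the hypothetical field names from
    the template sketch above.
    """
    ratios = defaultdict(list)
    for r in records:
        if r["actual_cost_usd"] is not None and r["estimated_cost_usd"]:
            ratios[r["task_type"]].append(
                r["actual_cost_usd"] / r["estimated_cost_usd"])
    return {t: sum(v) / len(v) for t, v in ratios.items()}

# e.g. {"refactoring": 1.6} -> refactoring runs 60% over estimate, so
# widen the scale boundaries you assign to refactoring tasks.
```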

Common Misuses & Avoidance Strategies

  1. Over-optimizing for cost: Don't sacrifice code quality or maintainability to push costs down; prioritize results over savings.
  2. Ignoring human time: Track both AI cost and human time (requirements clarification, architecture, and review remain critical).
  3. Rigid scale application: Default scales are starting points; calibrate them for your team's context.
  4. Ignoring context: The same task may cost differently across codebases; consider codebase maturity, tech stack, and team familiarity.

Conclusion & Community Contribution

Conclusion

TokenPoints is an honest approach: it acknowledges estimation difficulties and uses AI-era tools for clearer metrics. It’s not a panacea but a data-driven starting point for planning.

For teams in the middle of AI transformation, TokenPoints offers a chance to rethink workload estimation: embrace honest, dollar-based metrics and focus on cost prediction and delivered value.

Community Contribution

TokenPoints is at v0.1; feedback on the name, the scales, and the principles is welcome. The most valuable contributions are anonymized calibration data (cost averages, models used, codebase context). The project is licensed under CC BY 4.0 (free to use and fork with attribution).