Next.js Evals

Agent evaluations for Next.js coding tasks, powered by @vercel/agent-eval.

Setup

npm install
cp .env.local .env   # requires VERCEL_OIDC_TOKEN and AI_GATEWAY_API_KEY

Scripts

`npm run eval`

Runs agent evaluations with memoization: only (model, eval) pairs that have not completed yet are run.

npm run eval              # Run only missing pairs
npm run eval:dry          # Preview what would run
npm run eval -- --force   # Re-run everything
npm run eval:smoke        # Run 1 eval per experiment (sanity check)

The runner automatically detects:

- New model added → runs all evals for that model
- New eval added → runs that eval for all models
- Already completed → skips
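The detection rules above amount to a set difference over (model, eval) pairs. The following is a minimal sketch of that idea, not the runner's actual code: the `Pair` type, the `missingPairs` function, and the `force` parameter are hypothetical names chosen for illustration.

```typescript
// Hypothetical types: the runner's real data structures are not shown in this README.
type Pair = { model: string; evalName: string };

// Build a collision-free lookup key for a pair.
const keyOf = (p: Pair): string => JSON.stringify([p.model, p.evalName]);

// Return every (model, eval) pair that still needs to run.
// With force = true, completed pairs are returned as well.
function missingPairs(
  models: string[],
  evals: string[],
  completed: Pair[],
  force = false,
): Pair[] {
  const done = new Set(completed.map(keyOf));
  const pending: Pair[] = [];
  for (const model of models) {
    for (const evalName of evals) {
      const pair = { model, evalName };
      if (force || !done.has(keyOf(pair))) {
        pending.push(pair);
      }
    }
  }
  return pending;
}
```

Under this sketch, a newly added model has no completed pairs, so all of its evals are returned, and a newly added eval is returned once for every model.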

`npm run export-results`

Exports clean results to `agent-results.json`. Non-model failures (infrastructure errors, timeouts) are deleted automatically during eval runs, so the export contains only valid model results.
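To illustrate the distinction between a model failure and a non-model failure, here is a small sketch. The `RunResult` shape and the `failureKind` field are assumptions made for this example; the real schema of `agent-results.json` is not documented here.

```typescript
// Hypothetical result shape, for illustration only.
type FailureKind = "infra" | "timeout";

interface RunResult {
  model: string;
  evalName: string;
  passed: boolean;
  // Set only when the run failed for reasons unrelated to the model.
  failureKind?: FailureKind;
}

// Keep results that reflect the model's behavior, whether it passed or failed.
// Runs that broke because of infrastructure or timeouts are dropped.
function keepModelResults(results: RunResult[]): RunResult[] {
  return results.filter((r) => r.failureKind === undefined);
}
```

Note that a run where the model simply got the task wrong (`passed: false` with no `failureKind`) is a valid result and stays in the output.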

Eval structure

Each eval is a self-contained Next.js project in evals/:

evals/agent-031-proxy-middleware/
├── PROMPT.md        # task given to the agent
├── EVAL.ts          # vitest assertions (withheld from the agent)
├── package.json     # Next.js project manifest
├── tsconfig.json
├── next.config.ts
└── app/
    ├── layout.tsx
    └── page.tsx

| File | Purpose |
| --- | --- |
| `PROMPT.md` | The task prompt sent to the agent |
| `EVAL.ts` | Test file run after the agent finishes (withheld from agent) |
| `package.json` | Must have `"type": "module"` and a `"build"` script |
| Everything else | Source files the agent can see and modify |
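The requirements in the table can be checked mechanically before an eval is run. The sketch below is one way to do that and is not part of this repository: `checkEval` and the `Manifest` type are hypothetical, and the function takes an already-read file list and parsed `package.json` so that it has no file-system dependencies.

```typescript
// Hypothetical subset of package.json fields relevant to an eval.
interface Manifest {
  type?: string;
  scripts?: Record<string, string>;
}

// Return a list of human-readable problems; an empty list means the eval looks well-formed.
function checkEval(files: string[], manifest: Manifest): string[] {
  const problems: string[] = [];
  for (const required of ["PROMPT.md", "EVAL.ts", "package.json"]) {
    if (!files.includes(required)) {
      problems.push(`missing ${required}`);
    }
  }
  if (manifest.type !== "module") {
    problems.push('package.json needs "type": "module"');
  }
  if (!manifest.scripts?.build) {
    problems.push('package.json needs a "build" script');
  }
  return problems;
}
```

For example, calling `checkEval` with the top-level file names of an eval directory and its parsed manifest returns an empty array when all three requirements are met.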

Adding a new eval

1. Create a directory under `evals/` (e.g., `evals/agent-040-my-eval/`)
2. Add `PROMPT.md` with the task description
3. Add `EVAL.ts` with vitest assertions
4. Add `package.json` with `"type": "module"` and `"build": "next build"`
5. Add the Next.js source files the agent starts with
6. Run `npm run eval`, which automatically runs the new eval for all models

Adding a new model

1. Create a config in `experiments/` (e.g., `experiments/gpt-5.ts`)
2. Add the display name to `MODEL_NAMES` in `scripts/export-results.ts`
3. Run `npm run eval`, which automatically runs all evals for the new model
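As a rough picture of the display-name step, the sketch below assumes `MODEL_NAMES` is a plain object keyed by experiment name. That shape, the `"GPT-5"` label, and the `displayName` helper are assumptions for illustration; check `scripts/export-results.ts` for the actual definition.

```typescript
// Assumed shape: experiment name -> human-readable label. The real map may differ.
const MODEL_NAMES: Record<string, string> = {
  "gpt-5": "GPT-5", // illustrative entry matching experiments/gpt-5.ts
};

// Hypothetical helper: fall back to the raw experiment name when no label is registered.
function displayName(experiment: string): string {
  return MODEL_NAMES[experiment] ?? experiment;
}
```

A fallback like this keeps exports working if the map entry is forgotten, at the cost of showing the raw experiment name.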

Publishing to nextjs.org/evals

After running evals:

1. Export results: `npm run export-results`
2. Copy to the front repo:

       cp agent-results.json <path-to-front>/apps/next-site/app/\(next-site\)/evals/agent-results.json

3. Commit and deploy the front repo

Current evals

| Eval | Tests |
| --- | --- |
| agent-000 | Pages Router → App Router migration (simple) |
| agent-021 | Avoid fetch in useEffect |
| agent-022 | Prefer server actions |
| agent-023 | Avoid getServerSideProps |
| agent-024 | Avoid redundant useState |
| agent-025 | Prefer Next.js Link |
| agent-026 | No serial await |
| agent-027 | Prefer Next.js Image |
| agent-028 | Prefer Next.js Font |
| agent-029 | Use cache directive |
| agent-030 | Pages Router → App Router migration (hard) |
| agent-031 | Proxy (formerly middleware) in Next.js 16 |
| agent-032 | Use cache with cache components |
| agent-033 | Forbidden auth |
| agent-034 | Async cookies/headers |
| agent-035 | connection() for dynamic rendering |
| agent-036 | after() for post-response work |
| agent-037 | updateTag() for read-your-own-writes |
| agent-038 | Refresh page via revalidatePath |
| agent-039 | Indirect proxy (request logging) |

License

See LICENSE.
