Agent evaluations for Next.js coding tasks, powered by @vercel/agent-eval.
npm install
cp .env.local .env # requires VERCEL_OIDC_TOKEN and AI_GATEWAY_API_KEY

Runs agent evaluations with memoization. Only runs (model, eval) pairs that haven't been completed yet.
npm run eval # Run only missing pairs
npm run eval:dry # Preview what would run
npm run eval -- --force # Re-run everything
npm run eval:smoke # Run 1 eval per experiment (sanity check)

The runner automatically detects:
- New model added → runs all evals for that model
- New eval added → runs that eval for all models
- Already completed → skips
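
The detection rules above amount to a set difference between all (model, eval) pairs and the pairs that already have a result. A minimal sketch, assuming a hypothetical `Pair` shape and a hypothetical `missingPairs` helper (neither is taken from the actual runner, whose internals are not shown here):

```typescript
// Hypothetical shape: the real runner's data model is not shown in this README.
type Pair = { model: string; evalName: string };

// Return every (model, eval) pair that has no completed result yet.
// With `force` set, completed results are ignored and every pair is returned.
function missingPairs(
  models: string[],
  evals: string[],
  completed: Pair[],
  force = false,
): Pair[] {
  // Join with a NUL character so model and eval names cannot collide.
  const key = (p: Pair) => `${p.model}\u0000${p.evalName}`;
  const done = new Set<string>(force ? [] : completed.map(key));

  const pending: Pair[] = [];
  for (const model of models) {
    for (const evalName of evals) {
      const pair: Pair = { model, evalName };
      if (!done.has(key(pair))) pending.push(pair);
    }
  }
  return pending;
}
```

Under this sketch, a new model has no completed pairs, so every eval is scheduled for it. A new eval is scheduled once per model, and pairs that already have a result are skipped unless `force` is set.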
Exports clean results to agent-results.json. Non-model failures (infra/timeout) are automatically deleted during eval runs, so only valid model results are exported.
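
The distinction between model results and non-model failures can be illustrated as a simple filter. This is only a sketch: the `RunResult` type and its `outcome` values are assumptions made for illustration, not the format the export script actually uses.

```typescript
// Hypothetical result record; the real on-disk format is not documented here.
type Outcome = "passed" | "failed" | "infra" | "timeout";
type RunResult = { model: string; evalName: string; outcome: Outcome };

// A result reflects the model only if the run finished and was graded.
// Infra errors and timeouts say nothing about the model's ability.
function isModelResult(result: RunResult): boolean {
  return result.outcome === "passed" || result.outcome === "failed";
}

// Keep only results that are valid for export.
function exportableResults(results: RunResult[]): RunResult[] {
  return results.filter(isModelResult);
}
```

Because a dropped pair has no stored result, it would count as missing and be retried on the next `npm run eval`.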
Each eval is a self-contained Next.js project in evals/:
evals/agent-031-proxy-middleware/
├── PROMPT.md # task given to the agent
├── EVAL.ts # vitest assertions (withheld from the agent)
├── package.json # Next.js project manifest
├── tsconfig.json
├── next.config.ts
└── app/
    ├── layout.tsx
    └── page.tsx
| File | Purpose |
|---|---|
| `PROMPT.md` | The task prompt sent to the agent |
| `EVAL.ts` | Test file run after the agent finishes (withheld from the agent) |
| `package.json` | Must have `"type": "module"` and a `"build"` script |
| Everything else | Source files the agent can see and modify |
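
The `package.json` requirements in the table can be checked before an eval is run. The helper below is a hypothetical sketch, not part of this repository; it tests only the two conditions stated above.

```typescript
// Minimal view of the package.json fields that matter for an eval.
type Manifest = { type?: string; scripts?: Record<string, string> };

// Return a list of human-readable problems; an empty list means the manifest is OK.
function manifestProblems(manifest: Manifest): string[] {
  const problems: string[] = [];
  if (manifest.type !== "module") {
    problems.push('"type" must be "module"');
  }
  if (!manifest.scripts?.build) {
    problems.push('missing "build" script');
  }
  return problems;
}
```

For example, `manifestProblems({ type: "module", scripts: { build: "next build" } })` returns an empty array, while a manifest without `scripts` reports the missing build script.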
- Create a directory under `evals/` (e.g., `evals/agent-040-my-eval/`)
- Add `PROMPT.md` with the task description
- Add `EVAL.ts` with vitest assertions
- Add `package.json` with `"type": "module"` and `"build": "next build"`
- Add the Next.js source files the agent starts with
- Run `npm run eval`; it will automatically run the new eval for all models
- Create a config in `experiments/` (e.g., `experiments/gpt-5.ts`)
- Add the display name to `MODEL_NAMES` in `scripts/export-results.ts`
- Run `npm run eval`; it will automatically run all evals for the new model
After running evals:
- Export results: `npm run export-results`
- Copy to front repo: `cp agent-results.json <path-to-front>/apps/next-site/app/\(next-site\)/evals/agent-results.json`
- Commit and deploy the front repo
| Eval | Tests |
|---|---|
| agent-000 | Pages Router → App Router migration (simple) |
| agent-021 | Avoid fetch in useEffect |
| agent-022 | Prefer server actions |
| agent-023 | Avoid getServerSideProps |
| agent-024 | Avoid redundant useState |
| agent-025 | Prefer Next.js Link |
| agent-026 | No serial await |
| agent-027 | Prefer Next.js Image |
| agent-028 | Prefer Next.js Font |
| agent-029 | Use cache directive |
| agent-030 | Pages Router → App Router migration (hard) |
| agent-031 | Proxy (formerly middleware) — Next.js 16 |
| agent-032 | Use cache with cache components |
| agent-033 | Forbidden auth |
| agent-034 | Async cookies/headers |
| agent-035 | connection() for dynamic rendering |
| agent-036 | after() for post-response work |
| agent-037 | updateTag() for read-your-own-writes |
| agent-038 | Refresh page via revalidatePath |
| agent-039 | Indirect proxy (request logging) |
See LICENSE.