Mar 23, 2026 · 13 min read · 150 Claude Code Agents Got the Same Data. They Produced Different Results. · #ai-agents #research #reproducibility #claude-code #llm-evaluation
Mar 23, 2026 · 12 min read · VIBEPASS: AI Models Can Write Code But Cannot Find Their Own Bugs · #ai-code-quality #research #code-review #vibepass #llm-evaluation