· 13 min read
#research
7 posts
· 3 min read
Human-Certified Module Repositories: Trust Infrastructure for AI-Assembled Code
· 3 min read
SWE-CI: Can AI Agents Actually Maintain a Codebase Over Time?
· 13 min read
Vibe Code Bench: Best AI Model Hits 58% on Real Web App Development
· 12 min read
VIBEPASS: AI Models Can Write Code But Cannot Find Their Own Bugs
· 8 min read
The Wrong Benchmark: Why "Human-Level" Misses What Actually Matters in AI Refactoring
· 10 min read