METR's time-horizon of coding tasks does not mean what you think it means (killerstorm.github.io)
1 point by killerstorm 51 days ago | 1 comment


tl;dr: If you calculate "the human time horizon using the same methodology as we do for models", it's only 1.5 hours @ 50% success rate for the baseline experts METR hired. o3 surpassed that in April 2025, 6 months ahead of METR's prediction.

METR considers this "raw baseline" largely irrelevant, since it might be affected by people getting bored, not being paid enough, etc. But they admit this introduces a bias which makes the reported numbers less relevant for human-vs-AI comparison.
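
For anyone unsure what a "time horizon @ 50% success rate" is, here is a minimal sketch of the idea: fit a logistic curve of success against log2(task length), then solve for the length where the curve crosses 50%. This is my own illustration, not METR's code or data. The success rates are synthetic (generated from a curve with a known 60-minute horizon so the fit can be checked), and the names `fit_logistic` and `time_horizon_minutes` are made up for this example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, iterations=50):
    """Fit p = sigmoid(a + b*x) by Newton's method.

    ys may be 0/1 outcomes or success rates in [0, 1].
    """
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(a + b * x)
            w = p * (1.0 - p)
            # Gradient of the log-likelihood
            g0 += y - p
            g1 += (y - p) * x
            # Negative Hessian of the log-likelihood
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            break
        # Newton step: solve the 2x2 system H * delta = g
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

def time_horizon_minutes(a, b, success_rate=0.5):
    """Task length (minutes) at which the fitted curve hits success_rate."""
    logit = math.log(success_rate / (1.0 - success_rate))
    return 2.0 ** ((logit - a) / b)

# SYNTHETIC data, not METR's: task lengths from 1 minute to 1024 minutes,
# with success rates drawn from a curve whose 50% horizon is 60 minutes.
TRUE_HORIZON_MIN = 60.0
TRUE_SLOPE = -0.8  # assumed: success falls as tasks get longer
true_intercept = -TRUE_SLOPE * math.log2(TRUE_HORIZON_MIN)

lengths_min = [2 ** k for k in range(11)]
xs = [math.log2(t) for t in lengths_min]
ys = [sigmoid(true_intercept + TRUE_SLOPE * x) for x in xs]

a, b = fit_logistic(xs, ys)
print(f"50% time horizon: {time_horizon_minutes(a, b):.1f} minutes")
```

The point of the post is that the same fit can be run on human baseline attempts instead of model attempts, and the horizon you get depends heavily on which attempts count as failures.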



