I use Claude Code daily to work on a large Python codebase and I have yet to see it hallucinate a variable or method (I always ask it to write and run unit tests, so that may be helping). Anyway, I don't think that's a problem at all. Most problems I face with AI-generated code are not solved by a borrow checker or a compiler: bad architecture, lack of forward thinking, hallucinations in the contract of external API calls, etc.
Unless we figure out how to make multimodal context windows of 1 billion+ tokens (in a commercially viable way) and connect them to Google Docs/Slack/Notion/Zoom meetings/etc, I don't think it will simplify that much. Most of the work is adjusting your mental model to the fact that the agent is a stateless machine that starts from scratch every single time and has little-to-no knowledge besides what's in the code, so you have to be very specific about the context of the task.
It's different from assigning a task to a co-worker who already knows the business rules and cross-implications of the code in the real world. The agent can't see the broader picture of the stuff it's making. If you don't add that to your prompt/context material, it can go from ignoring edge cases that would be obvious to a human who was present in the last planning meeting to coding defensively against hundreds of edge cases that will never occur.
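To make "be very specific about the context" a bit more concrete, here is a rough sketch of how that context could be restated on every run. Everything in it is hypothetical: `TaskContext`, its field names, and the billing example are made up for illustration and are not part of Claude Code or any other tool.

```python
from dataclasses import dataclass, field


@dataclass
class TaskContext:
    """What a stateless agent has to be told again on every run (hypothetical structure)."""

    task: str
    business_rules: list[str] = field(default_factory=list)
    edge_cases_in_scope: list[str] = field(default_factory=list)
    edge_cases_out_of_scope: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Build a plain-text prompt preamble, skipping empty sections."""
        sections = [
            ("Task", [self.task]),
            ("Business rules", self.business_rules),
            ("Edge cases to handle", self.edge_cases_in_scope),
            ("Do not code defensively against", self.edge_cases_out_of_scope),
        ]
        lines = []
        for title, items in sections:
            if not items:
                continue
            lines.append(f"{title}:")
            lines.extend(f"- {item}" for item in items)
        return "\n".join(lines)


# Made-up example values, only to show the shape of the output.
ctx = TaskContext(
    task="Add a refund endpoint to the billing service",
    business_rules=["Refunds are only allowed within 30 days of purchase"],
    edge_cases_in_scope=["Partial refunds on multi-item orders"],
    edge_cases_out_of_scope=["Negative prices (validated upstream)"],
)
prompt = ctx.render()
print(prompt)
```

The out-of-scope list is the part that is easy to forget: it is what stops the agent from guarding against cases that someone in the planning meeting would know can't happen.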
The wealthiest people in tech aren't spending tens of billions on this without the expectation of future profits. There's risk, but they absolutely expect the bets to be +EV overall.
I have an idea for a reverse Turing test where humans have to convince an LLM that they are an LLM. I suspect that most people would fail, proving that humans lack intelligence.
I used to watch archived episodes of Computer Chronicles on YouTube almost every night before going to bed back in 2016~2018. It was my bedtime entertainment, watching those recordings from another era of computing and observing the hosts' enthusiasm for things we take for granted today. As a late millennial, it helped me experience a bit of what the 80s and 90s were like in computing.