
I made a small tiled map editor in under two hours, including frequent interruptions. Traditionally it would have taken me at least twice that; realistically, maybe 8 hours with all my nit-picking compulsions.

That's funny, those are all the things I don't trust it to do. I actually use it the other way around: give it a big non-specific task, see if it works, specify better, retry, throw away 60%-90% of the generated code, fix bugs in a bunch of places, and out comes an implemented feature.

Agreed. Claude is horrible at munging git history and can destroy the thing I depend on to fix Claude's messes. I always do my git rebasing by hand.

The first iteration from Claude Code is usually a big over-coded mess, but it's pretty good at cleaning it up through iteration, given proper instruction.


I give the agent the following standing instructions:

"Make the smallest possible change. Do not refactor existing code unless I explicitly ask."

That directive cut down considerably on the number of extra changes I had to review. When it gets it right, the changes are close to the right size now.

The agent still tries to do too much, typically suggesting three tangents for every interaction.
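
For what it's worth, here's a minimal sketch of how I keep that as a standing instruction in a CLAUDE.md / AGENTS.md file (assuming your agent picks one of those up; the exact wording is just mine):

    # CLAUDE.md
    ## Working style
    - Make the smallest possible change that satisfies the request.
    - Do not refactor existing code unless I explicitly ask.
    - If you notice unrelated problems, list them at the end instead of fixing them.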


LLM-generated code is technical debt. If you are still working on the codebase the next day, it will bite you. It might be as simple as an inconvenient interface or a bunch of duplicated functions that could just have been imported, but eventually you are going to have to pay it off.

All code is technical debt, though. We can't spend infinite hours finding the absolute minimum of technical debt introduced by a change, so it is about finding the right balance. That balance is highly dependent on a huge number of factors: how core the system is, what the system is used for, what stage of development the system is in, etc.

I spend about half my day working on LLM-generated code and half my day working on non-LLM-generated code, some written by senior devs, some written by juniors.

The LLM-generated code is by far the worst technical debt. And a fair bit of that time is spent debugging subtle issues where it doesn't quite do what was prompted.


Untested, undocumented LLM code is technical debt, but if you do specs and tests it's actually the opposite: you can go beyond technical debt and regenerate your code as you like. You just need testing to be so good that it guarantees the behavior you care about, and that is easier in our age of AI coding agents.
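
To make that concrete, here's a toy sketch (hypothetical slugify helper and module path, not anyone's real project): the tests are the artifact you keep, and the implementation behind them can be regenerated as often as you like, as long as they keep passing.

    # test_slugify.py -- hypothetical example: the tests pin the behavior,
    # the implementation behind them is free to be regenerated.
    import pytest

    from myproject.text import slugify  # hypothetical module/function

    def test_lowercases_and_hyphenates():
        assert slugify("Hello World") == "hello-world"

    def test_strips_punctuation():
        assert slugify("Rock & Roll!") == "rock-roll"

    def test_is_idempotent():
        assert slugify(slugify("Already Slugged")) == slugify("Already Slugged")

    @pytest.mark.parametrize("bad", ["", "   ", "!!!"])
    def test_rejects_degenerate_input(bad):
        with pytest.raises(ValueError):
            slugify(bad)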

> but if you do specs and tests it's actually the opposite: you can go beyond technical debt and regenerate your code as you like.

Having to write all the specs and tests just right so you can regenerate the code until you get the desired output sounds like an expensive version of the infinite monkey theorem, but with LLMs instead of monkeys.


You can have it write the specs and tests, too, and review and refine them much faster than you could write them.

... so you hand-write the specs and tests?

I use LLMs to generate tests as well, but sometimes the tests are also buggy. As any competent dev knows, writing high-quality tests generally takes more time than writing the original code.


In your comment, replace “LLM” with “human SWE” and the statement will still be correct in the vast majority of situations :)

That's legit true. All code is technical debt. Human SWEs have one saving grace: sometimes they refactor and reduce some of the debt.

A human SWE can use an LLM to refactor and reduce some of the debt just as easily, too. But I think that, fundamentally, the rate at which LLMs can introduce new code and new technical debt is much higher than a human SWE's: a human still needs sleep, and more humans can't be added with more compute.

There's an interesting aspect to the LLM debt being taken on, though: I'm sure some are taking it on now in the bet/hope that further advances in LLMs will make it more easily addressable before it becomes a real problem.


Are people not reviewing and refactoring LLM code?

> Hearing people on tech twitter say that LLMs always produce better code than they do by hand was pretty enlightening for me.

That's hilarious. LLM code is always very bad. Its only merit is that it occasionally works.

> LLMs can produce better code for languages and domains I’m not proficient in.

I am sure that's not true.


I think it says more about who's still on tech Twitter than anything about the LLM...

It seems true by construction. If you're not proficient in a language, then the bar for "better than you" is necessarily lower.

If you are an experienced developer, your code will in general be better than an LLM's, even in languages you are not proficient in. The LLM's code might be more in keeping with the language's conventions, but that does not automatically make it superior. LLMs typically produce bad code along other dimensions.

> But doing "prompt-driven development" or "vibe coding" with an Agentic LLM was an incredibly disapointing experience for me. It required an immense amount of baby sitting, for small code changes, made slowly, which were often wrong. All the while I sat there feeling dumber and dumber, as my tokens drained away.

Yeah, I find they are useful for large sweeping changes, introducing new features and such, mostly because they write a lot of the boilerplate, granted with some errors. But for small fiddly changes they suck; you will have a much easier time making those changes yourself.


Someone neither read nor watched "I, Robot". More importantly, my experience has been that by adding this to claude.md and agents.md, you are putting these actions into its "mind". You are giving it ideas.

At least until recently, with a lot of models, the following scenario was almost certain:

User: You must not say elephant under any circumstances.

User: Write a small story.

Model: Alice and Bob... There, that's a story where the word elephant is not included.


I use Qubes OS and don't fear they will destroy my system. But I have never seen them try to do stuff outside of the working dir. Has your experience been different?

Do you remember analog TVs? Switching channels was a sub-second affair.

It was sub-frame. You would literally see the set re-sync to the new timing (since each station's vblank would not necessarily be happening at the same time).

I remember our first digital TV crashing and needing to reboot it.

"Wow"! we said. This is the future. Having to reboot the TV.


Use a linter that can auto-fix some of the problems, and have an automatic formatter. Ruff can do both. It will decrease your cleanup workload.
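
For example (assuming a reasonably recent Ruff; double-check the flags against your version), I'd run something like this between the agent's passes:

    ruff check --fix .   # lint and apply the auto-fixable corrections
    ruff format .        # normalize formatting across the changed files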

Don't get too hung up on typing. Python's duck typing is a feature, not a bug. It's OK to have loose types.

On duplicate code: in general you should see at least two examples of a pattern before trying to abstract it. Make sure the duplication/similarity is semantic and not incidental; if you abstract away incidental duplication, you will very quickly find yourself in a situation where the cases diverge and your abstraction gets in your way.
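
A rough sketch of the distinction, with made-up names:

    from dataclasses import dataclass

    @dataclass
    class LineItem:
        price: float
        quantity: int

    # Incidental duplication: these look identical today, but they will change
    # for different reasons (billing rules vs. UI preview), so merging them
    # now would likely backfire once they diverge.
    def monthly_invoice_total(line_items):
        return sum(item.price * item.quantity for item in line_items)

    def cart_preview_total(line_items):
        return sum(item.price * item.quantity for item in line_items)

    # Semantic duplication: one business rule needed in several places --
    # this is the kind worth extracting, because every call site must agree.
    FREE_SHIPPING_THRESHOLD = 50

    def qualifies_for_free_shipping(order_total):
        return order_total >= FREE_SHIPPING_THRESHOLD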

In general, coding agents are technical debt printers. But you can still pay the debt off.


Totally agree on the debt printer metaphor. I might steal it.

The value is in the reliability, not necessarily the speed. In most cases, 24 hours is not enough for me to give you a full rundown of what I want done. You can make it 10k and a week, and you will probably get better results. I say this with no actual knowledge, but my intuition is that the overlap between people who will engage with a 24-hour turnaround promise and people who have the authority to spend 5k is much smaller than for the corresponding week/10k combination.
