SatvikBeri's comments

I think that article was basically wrong. They asked the agent not to provide any commentary, then gave it an unsolvable task and expected it to state that the task was impossible. So they were really testing which of two conflicting instructions the agent would refuse to follow.

Purely anecdotally, I've found agents have gotten much better at asking clarifying questions, stating that two requirements are incompatible and asking which one to change, and so on.

https://spectrum.ieee.org/ai-coding-degrades


Sure, here are my own examples:

* I came up with a list of 9 performance-improvement ideas for an expensive pipeline. Most of them were really boring and tedious to implement (basically a lot of special cases) and I wasn't sure which would work, so I had Claude try them all. It made prototypes with poor code quality that still tested the core ideas. One approach cut the runtime by 50%; I rewrote it with better code, and it's saved my company about $6,000/month.

* My wife and I had a really complicated spreadsheet for tracking how much we owed our babysitter – it was just complex enough to not really fit into a spreadsheet easily. I vibecoded a command line tool that's made it a lot easier.

* When AWS RDS costs spiked one month, I set Claude Code to investigate, and it found the cause was a misconfigured backup setting.

* I'll use Claude to throw together a bunch of visualizations for some data to help me investigate.

* I'll often give Claude the type signature for a function and ask it to write the function. It generally gets this about 85% right.
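For concreteness, here's a hypothetical example of the kind of stub I mean – just a signature and a docstring, with the body left for Claude to fill in (an illustration, not an actual function from my codebase):

    # Hypothetical stub handed to Claude: signature + docstring only.
    """
        rolling_mean(xs::Vector{Float64}, window::Int) -> Vector{Float64}

    Trailing mean over each `window`-length slice of `xs`; the first
    `window - 1` entries are `NaN`.
    """
    function rolling_mean(xs::Vector{Float64}, window::Int)::Vector{Float64}
        error("not implemented")  # Claude is asked to write this body
    end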


>My wife and I had a really complicated spreadsheet for tracking how much we owed our babysitter – it was just complex enough to not really fit into a spreadsheet easily. I vibecoded a command line tool that's made it a lot easier.

Ok, please help me understand how a babysitter bill gets that complicated. Or is this more of a nanny?


Not technically a nanny, but not dissimilar. In this case, they do several types of work (house cleaning, watching 1-3 kids, daytime and overnight shifts, taking the kids out). They are very competent – by far the best we've found in 3 years – and charge different rates for the different types of work. We also need to track mileage etc. for reimbursement.

They had a spreadsheet for tracking but I found it moderately annoying – it was taking 5-10 minutes a week, so normally I wouldn't have bothered to write a different tool, but with vibe coding it was fairly trivial.


How did you give Claude access to AWS?

It does OK using the AWS CLI.

Just awscli

Why is your babysitting bill so complicated?

There are several different types of work they can do, each one of which has a different hourly rate. The time of day affects the rate as well, and so can things like overtime.

It's definitely a bit of an unusual situation. It's not extremely complicated, but it was enough to be annoying.


Jesus, are you ok? Can’t you just, like, give em a 20 when you get home?

I find it quite funny you’ve invented this overly complex payment structure for your babysitter and then find it annoying. Now you’ve got a CLI tool for it.


why assume the billing model is being imposed by the customer rather than the service provider?

GP has provided an anecdote with no supporting evidence, nor any code examples. So it's as fair to assume the story is a fabrication as it is to assume there's any truth to it.

I am really shocked at the response this trivial anecdote has gotten.

I could state it much more generically: we had an annoying Excel sheet that took ~10 minutes a week, I vibe coded a command line tool that brought it down to ~1 minute a week. I don't think this is unusual or hard to believe in any way.


Yes! You should absolutely always assume a random stranger on HN is outright lying about a trivial anecdote to farm meaningless karma.

Or instigating conflict?

What...what conflict do you think I'm instigating, exactly? Whether the command line is a better interface than Excel?

I didn't choose the payment structure, and the point is that a CLI is not a high bar. Something that we used to spend ~10 minutes a week on with spreadsheets is now ~1 minute/week.

Why didn't you work out a more manageable billing structure with them?! Or to put it another way: if it took you 10 minutes a week with spreadsheets just to figure out what their bill was, how on earth did they verify your invoices were correct? And if they couldn't (or if it took them more than 10 minutes a week), why wouldn't they prefer a billing system they could use to verify they were being paid correctly?

Jesus! Is this HN or a personal finance forum? Who cares why they do it a certain way? Did they ask for your advice?

If you work like this in a company, you'll end up with an overcomplicated mess.

Now people with Claude Code are ready to produce a big pile of shit in a short time.


I’m not trying to give advice, I’m just curious about their arrangement. When I did consulting, I hated billing, and would have wanted a system that was as easy as possible.

Are you serious?

“Most of these were really boring and tedious to implement (basically a lot of special cases) and I wasn't sure which would work, so I had Claude try them all.”

I doubt you verified the boring edge cases.


I mean, as I said, I literally had Claude prototype them, and then I rewrote the working one from scratch. I didn't commit any of the code written by Claude.

I usually use one instance, sometimes two. But this is a reasonable account of Chris Rackauckas using 32 instances at a time to do boilerplate maintenance across a bunch of open source repositories: https://www.stochasticlifestyle.com/claude-code-in-scientifi...

> I have had to spend like 4am-10am every morning Sunday through Saturday for the last 10 years on this stuff before the day gets started just to keep up on the “simple stuff” for the hundreds of repos I maintain. And this neverending chunk of “meh” stuff is exactly what it seems fit to do. So now I just let the 32 bots run wild on it and get straight to the real work, and it’s a gamechanger.


> do wonder how hard it'd be to mask the default "marketing copywriter" tone of the LLM by asking it to assume some other tone in your prompt.

Fairly easy, in my wife's experience. She repeatedly got accused of using ChatGPT in her original writing (she's not a native English speaker, and was taught to use many of the same idioms that LLMs use), until she started actually using ChatGPT with about two pages of tone instructions to "humanize" her writing. The irony is staggering.


Setting up a new dev instance took 2+ hours with pip at my work. Switching to uv dropped the Python portion down to <1 minute, and the overall setup to 20 minutes.

A similar, but less drastic speedup applied to docker images.


I handle our company's RDS instances, and probably spend closer to 2 hours a year than 2 hours a month over the last 8 years.

It's definitely expensive, but it's not time-consuming.


Of course. But people also have high uptime servers with long-running processes they barely touch.


That's why I give the LLM a read-only connection.


This is much better than MCP, which also stuffs every session's precious context with potentially irrelevant instructions.


They could just make MCPs dynamically loaded in the same way, no?


It's still worse, since it consumes more context giving instructions for the custom tooling, whereas the LLM already understands how to connect to and query a read-only SQL service with standard tools.


> Python is sometimes slower (hot loops), but for that you have Numba

This is a huge understatement. At the hedge fund where I work, I learned Julia by porting a heavily optimized Python pipeline. Hundreds of hours had gone into the Python version – it was essentially entirely glue code over C.

In about two weeks of learning Julia, I ported the pipeline and got it 14x faster. This was worth multiple senior FTE salaries. With the same amount of effort, my coworkers – who are much better engineers than I am – had not managed to get any significant part of the pipeline onto Numba.
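To give a sense of what I mean by a hot loop (a toy sketch, not our actual pipeline): anything with a loop-carried dependency, like an exponentially weighted moving average, is painful to vectorize in NumPy and usually ends up in Numba or C, while plain Julia compiles the obvious loop to fast native code:

    # Toy example: exponentially weighted moving average.
    # The dependence of y[i] on y[i-1] blocks vectorized NumPy tricks,
    # but Julia compiles this straightforward loop to tight machine code.
    function ewma(x::Vector{Float64}, a::Float64)
        y = similar(x)
        y[1] = x[1]
        @inbounds for i in 2:length(x)
            y[i] = a * x[i] + (1 - a) * y[i-1]
        end
        return y
    end

    ewma(randn(10^7), 0.05)  # C-like speed after the first (compiling) call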

> And if something is truly performance critical, it should be written or rewritten in C++ anyway.

Part of our interview process is a take-home where we ask candidates to build the fastest version of a pipeline they possibly can. People usually use C++ or Julia. All of the fastest answers are in Julia.


> People usually use C++ or Julia. All of the fastest answers are in Julia

That's surprising to me and piques my interest. What sort of pipeline is this that's faster in Julia than C++? Does Julia automatically use something like SIMD or other array magic that C++ doesn't?


I use Rust instead of C++, but I also see my Julia code being faster than my Rust code.

In my view, it's not that Julia itself is faster than Rust - on the contrary, Rust as a language is faster than Julia. However, Julia's prototyping, iteration speed, benchmarking, profiling, and observability are better. By the time I'd have a first working Rust version, I'd already have written the Julia version, profiled it, maybe changed part of the algorithm, and optimised it. Also, Julia makes heavier use of generics than Rust, which often leads to better code specialization.

There are some ways in which Julia produces better machine code than Rust, but they're usually not decisive, and there are more ways in which Rust produces better machine code than Julia. Also, the performance ceiling for Rust is higher, because Rust allows you to do more advanced, low-level optimisations than Julia.


This is pretty much it – when we had follow-up interviews with the C++ devs, they had usually only had time to try one or two high-level approaches and then do a bit of profiling & iteration. The Julia devs had time to try several approaches and do much more detailed profiling.


The main thing is just that Julia has a standard library that works with you rather than against you. The built-in sort will use radix sort where appropriate and a highly optimized quicksort otherwise. You get built-in matrices and higher-dimensional arrays with optimized BLAS/LAPACK configured for you (plus CSC and structured sparse matrices). You get complex and rational numbers, and a calling convention (pass by sharing) that is the fast one by default 90% of the time, instead of being slow (copying) 90% of the time. You have a built-in package manager that doesn't require special configuration, and that also lets you install GPU libraries that make it trivial to run generic code on all sorts of accelerators.

Everything you can do in Julia you can do in C++, but lots of projects that would take a week in C++ can be done in an hour in Julia.
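A quick sketch of what "the standard library works with you" looks like in practice (generic examples, nothing from any particular project):

    using LinearAlgebra, SparseArrays   # both ship with Julia

    v = rand(10^6)
    sort(v)                        # specialized sorting algorithms picked for you
    A = rand(500, 500)
    A * A'                         # dense matmul backed by the bundled BLAS
    S = sprand(10^4, 10^4, 1e-4)   # CSC sparse matrix out of the box
    1//3 + 1//6                    # exact rational arithmetic, gives 1//2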


To be clear, the fastest theoretically possible C++ is probably faster than the fastest theoretically possible Julia. But the fastest C++ that Senior Data Engineer candidates would write in ~2 hours was slower than the fastest Julia (though still pretty fast! The benchmark for this problem was 10ms; the fastest C++ answer was 3ms, and the top two Julia answers were 2.3ms and 0.21ms).

The pipeline was pretty heavily focused on mathematical calculations – something like, given a large set of trading signals, calculate a bunch of stats for those signals. All the best Julia and C++ answers used SIMD.
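For anyone curious what "used SIMD" means concretely in Julia, it's often as simple as annotating a reduction so LLVM can vectorize it – a generic sketch, not the actual take-home:

    # Hypothetical stat: sum of squared signal values.
    # @inbounds drops bounds checks; @simd allows reordering the reduction,
    # which lets LLVM emit vectorized (SIMD) instructions.
    function sumsq(xs::Vector{Float64})
        s = 0.0
        @inbounds @simd for i in eachindex(xs)
            s += xs[i] * xs[i]
        end
        return s
    end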


> Part of our interview process is a take-home where we ask candidates to build the fastest version of a pipeline they possibly can. People usually use C++ or Julia. All of the fastest answers are in Julia.

It would be fun if you could share a similar pipeline problem to your take-home (I know you can't share what's in your interview). I started off in scientific Python in 2003 and like noodling around with new programming languages, and it's great to have challenges like this to work through. I enjoyed the 1BRC problem in 2024.


The closest publicly available problem I can think of is the 1 billion rows challenge. It's got a bigger dataset, but with somewhat simpler statistics – though the core engineering challenges are very similar.

https://github.com/gunnarmorling/1brc
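If you want a starting point in Julia, a deliberately naive baseline fits in a few lines – the fun is in beating it with mmap, threads, and custom parsing (this sketch assumes the challenge's "name;value" line format):

    # Naive 1BRC-style baseline: per-station min / mean / max.
    function baseline(path::AbstractString)
        # per station: (min, sum, max, count)
        stats = Dict{String,NTuple{4,Float64}}()
        for line in eachline(path)
            name, val = split(line, ';')
            t = parse(Float64, val)
            lo, s, hi, n = get(stats, name, (Inf, 0.0, -Inf, 0.0))
            stats[name] = (min(lo, t), s + t, max(hi, t), n + 1)
        end
        return Dict(k => (lo, s / n, hi) for (k, (lo, s, hi, n)) in stats)
    end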


The C++ devs at your firm must be absolutely terrible if a newcomer using a scripting language can write faster software, or you are not telling the whole story. All of NumPy, Julia, MATLAB, R, and similar domain-specific, user-friendly libraries and platforms use BLAS and LAPACK for numerical calculations under the hood with some overhead depending on the implementation, so a reasonably optimized native implementation should always be faster. By the looks of it the C++ code wasn't compiled with -O3 if it can be trivially beaten by Julia.


Are you aware that Julia is a compiled language with a heavy focus on performance? It is not in the same category as NumPy/MATLAB/R.


Julia is not a scripting language and can match C performance on many tasks.


I'm pretty enthusiastic about LLMs and use them on my 8-year-old codebase of ~500k LOC. I work at a hedge fund and can trace most of my work to dollars.


For what it's worth, I've been using a fairly minimal setup (24 lines of CLAUDE.md, no MCPs, skills, or custom slash commands) since 3.7 and I've only noticed Claude Code getting significantly better on each model release.


Share it!


I posted my CLAUDE.md here, there's really not much to it: https://news.ycombinator.com/item?id=46262674

