SatvikBeri's comments

I think that article was basically wrong. They asked the agent not to provide any commentary, then gave it an unsolvable task and expected it to state that the task was impossible. So they were really testing which of two conflicting instructions the agent would refuse to follow.

Purely anecdotally, I've found agents have gotten much better at asking clarifying questions, stating that two requirements are incompatible and asking which one to change, and so on.

https://spectrum.ieee.org/ai-coding-degrades


Sure, here are my own examples:

* I came up with a list of 9 performance-improvement ideas for an expensive pipeline. Most of them were really boring and tedious to implement (basically a lot of special cases) and I wasn't sure which would work, so I had Claude try them all. It made prototypes with poor code quality that still tested the core ideas. One approach cut the runtime by 50%; I rewrote it with better code, and it's saved my company about $6,000/month.

* My wife and I had a really complicated spreadsheet for tracking how much we owed our babysitter – it was just complex enough to not really fit into a spreadsheet easily. I vibecoded a command line tool that's made it a lot easier.

* When AWS RDS costs spiked one month, I set Claude Code to investigate, and it found the cause was a misconfigured backup setting.

* I'll use Claude to throw together a bunch of visualizations for some data to help me investigate.

* I'll often give Claude the type signature for a function and ask it to write the function. It generally gets this about 85% right.
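For concreteness, here's a hypothetical example of the kind of stub I mean – just a signature and a docstring, with the body left for Claude to fill in (an illustration, not an actual function from my codebase):

    # Hypothetical stub handed to Claude: signature + docstring only.
    """
        rolling_mean(xs::Vector{Float64}, window::Int) -> Vector{Float64}

    Trailing mean over each `window`-length slice of `xs`; the first
    `window - 1` entries are `NaN`.
    """
    function rolling_mean(xs::Vector{Float64}, window::Int)::Vector{Float64}
        error("not implemented")  # Claude is asked to write this body
    end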


>My wife and I had a really complicated spreadsheet for tracking how much we owed our babysitter – it was just complex enough to not really fit into a spreadsheet easily. I vibecoded a command line tool that's made it a lot easier.

Ok, please help me understand how a babysitter bill gets that complicated. Or is this more of a nanny?


Not technically a nanny, but not dissimilar. In this case, they do several types of work (house cleaning, watching 1-3 kids, daytime and overnight shifts, taking the kids out). They are very competent – by far the best we've found in 3 years – and charge different rates for the different types of work. We also need to track mileage etc. for reimbursement.

They had a spreadsheet for tracking but I found it moderately annoying – it was taking 5-10 minutes a week, so normally I wouldn't have bothered to write a different tool, but with vibe coding it was fairly trivial.


How did you give Claude access to AWS?

It does OK using the AWS CLI.

Just awscli

Why is your babysitting bill so complicated?

There are several different types of work they can do, each one of which has a different hourly rate. The time of day affects the rate as well, and so can things like overtime.

It's definitely a bit of an unusual situation. It's not extremely complicated, but it was enough to be annoying.


Jesus, are you ok? Can’t you just, like, give em a 20 when you get home?

I find it quite funny you’ve invented this overly complex payment structure for your babysitter and then find it annoying. Now you’ve got a CLI tool for it.


why assume the billing model is being imposed by the customer rather than the service provider?

GP has provided an anecdote with no supporting evidence, nor any code examples. So it's as fair to assume the story is a fabrication as it is to assume there's any truth to it.

I am really shocked at the response this trivial anecdote has gotten.

I could state it much more generically: we had an annoying Excel sheet that took ~10 minutes a week, I vibe coded a command line tool that brought it down to ~1 minute a week. I don't think this is unusual or hard to believe in any way.


Yes! You should absolutely always assume a random stranger on HN is outright lying about a trivial anecdote to farm meaningless karma.

Or instigating conflict?

What...what conflict do you think I'm instigating, exactly? Whether the command line is a better interface than Excel?

I didn't choose the payment structure, and the point is that a CLI is not a high bar. Something that we used to spend ~10 minutes a week on with spreadsheets is now ~1 minute/week.

Why didn't you work out a more manageable billing structure with them?! Or to put it another way: if it took you 10 minutes a week with spreadsheets just to figure out what their bill was, how on earth did they verify your invoices were correct? And if they couldn't (or if it took them more than 10 minutes a week), why wouldn't they prefer a billing system they could use to verify they were being paid correctly?

Jesus! Is this HN or a personal finance forum? Who cares why they do it a certain way? Did they ask for your advice?

If you work like this in a company, you'll end up with an overcomplicated mess.

Now people with Claude Code are ready to produce a big pile of shit in a short time.


I’m not trying to give advice, I’m just curious about their arrangement. When I did consulting, I hated billing, and would have wanted a system that was as easy as possible.

Are you serious?

“Most of these were really boring and tedious to implement (basically a lot of special cases) and I wasn't sure which would work, so I had Claude try them all.”

I doubt you verified the boring edge cases.


I mean, as I said, I literally had Claude prototype them, and then I rewrote the working one from scratch. I didn't commit any of the code written by Claude.

I usually use one instance, sometimes two. But this is a reasonable account of Chris Rackauckas using 32 instances at a time to do boilerplate maintenance across a bunch of open source repositories: https://www.stochasticlifestyle.com/claude-code-in-scientifi...

> I have had to spend like 4am-10am every morning Sunday through Saturday for the last 10 years on this stuff before the day gets started just to keep up on the “simple stuff” for the hundreds of repos I maintain. And this neverending chunk of “meh” stuff is exactly what it seems fit to do. So now I just let the 32 bots run wild on it and get straight to the real work, and it’s a gamechanger.


> do wonder how hard it'd be to mask the default "marketing copywriter" tone of the LLM by asking it to assume some other tone in your prompt.

Fairly easy, in my wife's experience. She repeatedly got accused of using ChatGPT in her original writing (she's not a native English speaker, and was taught to use many of the same idioms that LLMs use), until she started actually using ChatGPT with about two pages of tone instructions to "humanize" her writing. The irony is staggering.


Setting up a new dev instance took 2+ hours with pip at my work. Switching to uv dropped the Python portion down to <1 minute, and the overall setup to 20 minutes.

A similar, but less drastic speedup applied to docker images.


I handle our company's RDS instances, and probably spend closer to 2 hours a year than 2 hours a month over the last 8 years.

It's definitely expensive, but it's not time-consuming.


Of course. But people also have high uptime servers with long-running processes they barely touch.


That's why I give the LLM a read-only connection.


This is much better than MCP, which also stuffs every session's precious context with potentially irrelevant instructions.


They could just make MCPs dynamically loaded in the same way, no?


It's still worse, since it consumes more context giving instructions for the custom tooling, whereas the LLM already understands how to connect to and query a read-only SQL service with standard tools.


> Python is sometimes slower (hot loops), but for that you have Numba

This is a huge understatement. At the hedge fund where I work, I learned Julia by porting a heavily optimized Python pipeline. Hundreds of hours had gone into the Python version – it was essentially entirely glue code over C.

In about two weeks of learning Julia, I ported the pipeline and got it 14x faster. This was worth multiple senior FTE salaries. With the same amount of effort, my coworkers – who are much better engineers than I am – had not managed to get any significant part of the pipeline onto Numba.
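To give a sense of what I mean by a hot loop (a toy sketch, not our actual pipeline): anything with a loop-carried dependency, like an exponentially weighted moving average, is painful to vectorize in NumPy and usually ends up in Numba or C, while plain Julia compiles the obvious loop to fast native code:

    # Toy example: exponentially weighted moving average.
    # The dependence of y[i] on y[i-1] blocks vectorized NumPy tricks,
    # but Julia compiles this straightforward loop to tight machine code.
    function ewma(x::Vector{Float64}, a::Float64)
        y = similar(x)
        y[1] = x[1]
        @inbounds for i in 2:length(x)
            y[i] = a * x[i] + (1 - a) * y[i-1]
        end
        return y
    end

    ewma(randn(10^7), 0.05)  # C-like speed after the first (compiling) call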

> And if something is truly performance critical, it should be written or rewritten in C++ anyway.

Part of our interview process is a take-home where we ask candidates to build the fastest version of a pipeline they possibly can. People usually use C++ or Julia. All of the fastest answers are in Julia.


> People usually use C++ or Julia. All of the fastest answers are in Julia

That's surprising to me and piques my interest. What sort of pipeline is this that's faster in Julia than C++? Does Julia automatically use something like SIMD or other array magic that C++ doesn't?


I use Rust instead of C++, but I also see my Julia code being faster than my Rust code.

In my view, it's not that Julia itself is faster than Rust - on the contrary, Rust as a language is faster than Julia. However, Julia's prototyping, iteration speed, benchmarking, profiling, and observability are better. By the time I'd have a first working Rust version, I'd already have written the Julia version, profiled it, maybe changed part of the algorithm, and optimised it. Also, Julia makes heavier use of generics than Rust, which often leads to better code specialization.

There are some ways in which Julia produces better machine code than Rust, but they're usually not decisive, and there are more ways in which Rust produces better machine code than Julia. Also, the performance ceiling for Rust is higher, because Rust allows you to do more advanced, low-level optimisations than Julia.


This is pretty much it – when we had follow-up interviews with the C++ devs, they had usually only had time to try one or two high-level approaches and then do a bit of profiling & iteration. The Julia devs had time to try several approaches and do much more detailed profiling.


The main thing is just that Julia has a standard library that works with you rather than against you. The built-in sort will use radix sort where appropriate and a highly optimized quicksort otherwise. You get built-in matrices and higher-dimensional arrays with optimized BLAS/LAPACK configured for you (plus CSC and structured sparse matrices). You get complex and rational numbers, and a calling convention (pass by sharing) that is the fast one by default 90% of the time, instead of being slow (copying) 90% of the time. You have a built-in package manager that doesn't require special configuration, and that also lets you install GPU libraries that make it trivial to run generic code on all sorts of accelerators.

Everything you can do in Julia you can do in C++, but lots of projects that would take a week in C++ can be done in an hour in Julia.
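A quick sketch of what "the standard library works with you" looks like in practice (generic examples, nothing from any particular project):

    using LinearAlgebra, SparseArrays   # both ship with Julia

    v = rand(10^6)
    sort(v)                        # specialized sorting algorithms picked for you
    A = rand(500, 500)
    A * A'                         # dense matmul backed by the bundled BLAS
    S = sprand(10^4, 10^4, 1e-4)   # CSC sparse matrix out of the box
    1//3 + 1//6                    # exact rational arithmetic, gives 1//2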


To be clear, the fastest theoretically possible C++ is probably faster than the fastest theoretically possible Julia. But the fastest C++ that Senior Data Engineer candidates would write in ~2 hours was slower than the fastest Julia (though still pretty fast! The benchmark for this problem was 10ms; the fastest C++ answer was 3ms, and the top two Julia answers were 2.3ms and 0.21ms).

The pipeline was pretty heavily focused on mathematical calculations – something like, given a large set of trading signals, calculate a bunch of stats for those signals. All the best Julia and C++ answers used SIMD.
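For anyone curious what "used SIMD" means concretely in Julia, it's often as simple as annotating a reduction so LLVM can vectorize it – a generic sketch, not the actual take-home:

    # Hypothetical stat: sum of squared signal values.
    # @inbounds drops bounds checks; @simd allows reordering the reduction,
    # which lets LLVM emit vectorized (SIMD) instructions.
    function sumsq(xs::Vector{Float64})
        s = 0.0
        @inbounds @simd for i in eachindex(xs)
            s += xs[i] * xs[i]
        end
        return s
    end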


> Part of our interview process is a take-home where we ask candidates to build the fastest version of a pipeline they possibly can. People usually use C++ or Julia. All of the fastest answers are in Julia.

It would be fun if you could share a similar pipeline problem to your take-home (I know you can't share what's in your interview). I started off in scientific Python in 2003 and like noodling around with new programming languages, and it's great to have challenges like this to work through. I enjoyed the 1BRC problem in 2024.


The closest publicly available problem I can think of is the 1 billion rows challenge. It's got a bigger dataset, but with somewhat simpler statistics – though the core engineering challenges are very similar.

https://github.com/gunnarmorling/1brc
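If you want a starting point in Julia, a deliberately naive baseline fits in a few lines – the fun is in beating it with mmap, threads, and custom parsing (this sketch assumes the challenge's "name;value" line format):

    # Naive 1BRC-style baseline: per-station min / mean / max.
    function baseline(path::AbstractString)
        # per station: (min, sum, max, count)
        stats = Dict{String,NTuple{4,Float64}}()
        for line in eachline(path)
            name, val = split(line, ';')
            t = parse(Float64, val)
            lo, s, hi, n = get(stats, name, (Inf, 0.0, -Inf, 0.0))
            stats[name] = (min(lo, t), s + t, max(hi, t), n + 1)
        end
        return Dict(k => (lo, s / n, hi) for (k, (lo, s, hi, n)) in stats)
    end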


The C++ devs at your firm must be absolutely terrible if a newcomer using a scripting language can write faster software, or you are not telling the whole story. All of NumPy, Julia, MATLAB, R, and similar domain-specific, user-friendly libraries and platforms use BLAS and LAPACK for numerical calculations under the hood with some overhead depending on the implementation, so a reasonably optimized native implementation should always be faster. By the looks of it the C++ code wasn't compiled with -O3 if it can be trivially beaten by Julia.


Are you aware that Julia is a compiled language with a heavy focus on performance? It is not in the same category as NumPy/MATLAB/R.


Julia is not a scripting language and can match C performance on many tasks.


I'm pretty enthusiastic about LLMs and use them on my 8-year-old codebase of ~500k LOC. I work at a hedge fund and can trace most of my work to dollars.


For what it's worth, I've been using a fairly minimal setup (24 lines of CLAUDE.md, no MCPs, skills, or custom slash commands) since 3.7 and I've only noticed Claude Code getting significantly better on each model release.


Share it!


I posted my CLAUDE.md here, there's really not much to it: https://news.ycombinator.com/item?id=46262674

