aoeusnth1 · 2026-01-01T04:00:12 1767240012

We're very clearly seeing exponential progress - even above trend, on METR, whose slope keeps getting revised to a higher and higher estimate each time. Explain your perspective on the objective evidence against exponential progress?

llmslave2 · 2026-01-01T04:40:17 1767242417

Pretty neat how this exponential progress hasn't resulted in exponential productivity. Perhaps you could explain your perspective on that?

viraptor · 2026-01-01T04:55:46 1767243346

Writing the code itself was never the main bottleneck. Designing the bigger solution, figuring out tradeoffs, talking to affected teams, etc. takes as much time as it used to. But still, there's definitely a significant improvement in the code production part in many areas.

mgfist · 2026-01-01T05:31:27 1767245487

Because that requires adoption. Devs on hackernews are already the most up to date folks in the industry and even here adoption of LLMs is incredibly slow. And a lot of the adoption that does happen is still with older tech like ChatGPT or Cursor.

belmont_sup · 2026-01-01T08:28:59 1767256139

What’s the newer tech?

TeodorDyakov · 2026-01-01T10:50:34 1767264634

Claude Code With Opus 4.5

HPMOR · 2026-01-01T05:11:46 1767244306

I think this is an open question still and very interesting. Ilya discussed this on the Dwarkesh podcast. But the capabilities of LLMs is clearly exponential and perhaps super exponential. We went from something that could string together incoherent text in 2022 to general models helping people like Terence Tao and Scott Aaronson write new research papers. LLMs also beat IMO and the ICPC. We have entered the John Henry era for intellectual tasks...

tsimionescu · 2026-01-01T09:25:13 1767259513

> LLMs also beat IMO and the ICPC

Very spurious claims, given that there was no effort made to check whether the IMO or ICPC problems were in the training set or not, or to quantify how far problems in the training set were from the contest problems. IMO problems are supposed to be unique, but since it's not at the frontier of math research, there is no guarantee that the same problem, or something very similar, was not solved in some obscure manual.

llmslave2 · 2026-01-01T05:30:00 1767245400

> But the capabilities of LLMs is clearly exponential and perhaps super exponential

By what metric?

utopiah · 2026-01-01T08:22:39 1767255759

BS metric... /s

barrenko · 2026-01-01T12:12:25 1767269545

Sir, we're in a modern economy, we don't ever ever look at productivity graphs (this is not to disparage LLMs, just a comment on productivity in general)

aoeusnth1 · 2026-01-01T05:09:52 1767244192

It has! CLs/engineer increased by 10% this year.

LLMs from late 2024 were nearly worthless as coding agents, so given they have quadrupled in capability since then (exponential growth, btw), it's not surprising to see a modestly positive impact on SWE work.

Also, I'm noticing you're not explaining yourself :)

surajrmal · 2026-01-01T08:17:49 1767255469

I think this is happening by raising the floor for job roles which are largely boilerplate work. If you are on the more skilled side or work in more original/niche areas, AI doesn't really help too much. I've only been able to use AI effectively for scaling refactors, not really much in feature development. It often just slows me down when I try to use it. I don't see this changing any time soon.

llmslave2 · 2026-01-01T05:33:11 1767245591

Hey, I'm not the OG commentator, why do I have to explain myself! :)

When Fernando Alonso (best rookie btw) goes from 0-60 in 2.4 seconds in his Aston Martin, is it reasonable to assume he will near the speed of light in 20 seconds?

lopatin · 2026-01-01T07:17:53 1767251873

> Hey, I'm not the OG commentator, why do I have to explain myself! :)

The issue is that you're not acknowledging or replying to people's explanations for _why_ they see this as exponential growth. It's almost as if you skimmed through the meat of the comment and then just re-phrased your original idea.

> When Fernando Alonso (best rookie btw) goes from 0-60 in 2.4 seconds in his Aston Martin, is it reasonable to assume he will near the speed of light in 20 seconds?

This comparison doesn't make sense because we know the limits of cars but we don't yet know the limits of LLMs. It's an open question. Whether or not an F1 engine can make it to the speed of light in 20 seconds is not an open question.

llmslave2 · 2026-01-01T08:26:58 1767256018

It's not on me to somehow disprove claims of exponential growth when there isn't even evidence provided of it.

My point with the F1 comparison is to say that a short period of rapid improvement doesn't imply exponential growth and it's about as weird to expect that as it is for an f1 car to reach the speed of light. It's possible you know, the regulations are changing for next season - if Leclerc sets a new lap record in Australia by .1 ms we can just assume exponential improvements and surely Ferrari will be lapping the rest of the field by the summer right?

Madmallard · 2026-01-01T06:51:18 1767250278

LLMs a year ago were more able to do a complex project I've repeatedly tried to do than they are now.

scotty79 · 2026-01-01T07:36:43 1767253003

Try Antigravity with Gemini 3 Pro. Seems very capable to me.

scotty79 · 2026-01-01T07:34:17 1767252857

How long before introduction of computers led to increases in average productivity? How long for the internet? Business is just slow to figure out how to use anything for its benefit, but it eventually gets there.

spectralista · 2026-01-01T10:56:26 1767264986

The best example is that even ATM machines didn't reduce bank teller jobs.

Why? Because even the bank teller is doing more than taking and depositing money.

IMO there is an ontological bias that pervades our modern society that confuses the map for the territory and has a highly distorted view of human existence through the lens of engineering.

We don't see anything in this time series, because this time series itself is meaningless nonsense that reflects exactly this special kind of ontological stupidity:

https://fred.stlouisfed.org/series/PRS85006092

As if the sum of human interaction in an economy is some kind of machine that we just need to engineer better parts for and then sum the outputs.

Any non-careerist, thinking person that studies economics would conclude we don't and will probably not have the tools to properly study this subject in our lifetimes. The high dimensional interaction of biology, entropy and time. We have nothing. The career economist is essentially forced to sing for their supper in a type of time series theater. Then there is the method acting of pretending to be surprised when some meaningless reductionist aspect of human interaction isn't reflected in the fake time series.

fmbb · 2026-01-01T08:44:03 1767257043

> How long before introduction of computers led to increases in average productivity?

I think it never did. Still has not.

https://en.wikipedia.org/wiki/Productivity_paradox

aoeusnth1 · 2026-01-01T01:44:05 1767231845

Is there zero skill in managing agents?

aoeusnth1 · 2025-12-30T00:02:37 1767052957

Is that really what he's saying here?

He's not against the technology, I think he's just feeling like there's a lot of potential that he's not quite grasping yet.

BearOso · 2025-12-30T00:10:19 1767053419

This guy is one of the top names in AI. This is pure propaganda written to instill "fear of missing out" and encouraging people to buy into his platform, lest they become "obsolete."

PaulHoule · 2025-12-30T00:46:54 1767055614

It’s a little shocking to me that this sentiment hasn’t floated higher in the discussion. Regardless of how he feels, this is the way he wants you to feel.

Big picture it’s about emotional intelligence and if you are losing your shit you’re going to flail around. I think you should pick up some near-frontier tools and use them to improve your usual process, always keeping your feet on the ground. “Vibe coding” was always about getting you and keeping you over your head. Resist it!

gsf_emergency_6 · 2025-12-30T01:24:17 1767057857

vive vibe live or it doesnt matter?

Maybe Devs should handle copilots as Swiss prana-bindu their shots

(Therefore gun laws at a longer timescale)

Of course we have to ask aeb if he has ever run into someone who trips (only, of course) while hunting ;) have you?

aebtebeten · 2025-12-30T10:44:48 1767091488

the french on the good hunter^W vibe coder vs the bad vibe coder: https://www.youtube.com/watch?v=QuGcoOJKXT8

given that the 3 hares seem to currently lack a signification, I'd be up for squatting? Or would Paul prefer 3 fennecs? Should anyone wish to oppose us, as Bigwig said: "silflay hraka, u embleer rah"

a slightly more pragmatic story for shunya as better mousetrap: just as we now routinely have our calculations done for us in binary, but record results in decimal (in PDF invoices, say), ancient romans (among other cultures) would have someone do their calculations on a counting https://en.wikipedia.org/wiki/Counting_board board, but recorded (only the non-zero) results in roman numerals.

(these days we can spot the algebraists via a sibboleth: they start their papers and books with section/chapter 0)

> « Les hommes sont comme les chiffres : ils n'acquièrent de valeur que par leur position. » —NB

gsf_emergency_6 · 2025-12-31T03:56:07 1767153367

How we seem to be doing:

https://www.neatorama.com/2012/05/18/10-facts-you-might-not-...

Re boney quote, that's one heuristic for HN mods

TIL Mozilla would have done better channelling the Finnic fennec (Vs rebranding "pinko"). Globe-wrappin Oxygen Auroras it wasn't.

Haploid fox

https://en.wikipedia.org/wiki/Inari_%C5%8Ckami#:~:text=The%2...

weregiraffe · 2025-12-30T06:30:06 1767076206

Are you grok, or having a stroke?

MarcelOlsz · 2025-12-30T07:20:00 1767079200

I understood perfectly what he's saying, but then again schizo is a language I speak fluently. Are you having a stroke?

PaulHoule · 2025-12-30T18:18:24 1767118704

To be fair I did have just a touch of thought disorder which led me to write "vive" instead of "vibe" and I did correct it when it was pointed out without explaining it which made that comment seem even weirder than it originally was.

Izkata · 2025-12-30T20:09:11 1767125351

I actually read their comment as "vibe vibe live" which combined with the unknown terms in the next line (a reference to Dune combined with something else, I guess?) made GGP's question fit quite well.

8note · 2025-12-30T04:44:41 1767069881

on the other hand, it does currently feel like when angular and react were starting to come out, and there was a billion different javascript libraries to learn with a new one coming out every couple weeks, and you arent quite sure what you should spend your time on and how much, vs now where you just learn react, and maybe extend to next.js

LLM forward development has a lot of things going on, and it really isn't clear yet what is going to be the common standard in a few years' time in terms of dev ux, async tools, ci/cd tools, in production and offline workflows, etc.

it's an easy time to hop down a wrong path picking subpar tools or not experimenting further, but if you just wait, the people who try the right tools are going to be way ahead on making products for their customers.

sailingparrot · 2025-12-30T02:27:46 1767061666

Uncharitable take. His last public stance on this a few months ago when he released nanochat was that he didn’t use coding LLMs for it, even though he tried, because they were not good enough and he was just losing time, so he coded everything manually. Andrej is already set for life, and has moved into education where most of what he does is released for free.

neilv · 2025-12-30T03:23:17 1767064997

Exactly. I think some of the commenters were unaware of some of the context, and got an entirely different read on the piece.

robotresearcher · 2025-12-30T07:48:55 1767080935

> Is that really what he's saying here?

No it’s absolutely not. But I thought it’d be fun to offer Adams’ brilliant hyperbole for an affectionate ribbing of Karpathy. Both of them are great communicators of ideas.

aoeusnth1 · 2025-12-30T00:00:00 1767052800

This is also what I did. Actually, Claude did it.

aoeusnth1 · 2025-12-29T04:19:11 1766981951

It lets Claude directly type into Codex as if it were the user, or vice versa

zingar · 2025-12-29T12:48:05 1767012485

And you’re finding that it can’t do that without tmux?

aoeusnth1 · 2025-12-28T20:40:43 1766954443

It did come from Claude, though, not Anthropic.

xp84 · 2025-12-29T18:05:45 1767031545

If I were Anthropic I would have some kind of TOS restriction saying that you can't use their trademark to represent what you use their API to enable. It's just inappropriate. Even if you are a full anti-AI activist, it seems clear that the blame for specific things 'Claude' does in response to a deliberate prompt should fall on the person(s) operating it, and as such they shouldn't be allowed to make it appear that this is what Anthropic designed Claude to do.

aoeusnth1 · 2025-12-22T20:02:06 1766433726

You're right, sqrt(E[d^2]) > E[|d|] because of Jensen's inequality. The latter is about 0.8 sqrt(N).
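
For anyone who wants to check the 0.8 figure, here is a rough sketch. It assumes d is the displacement of a simple 1D random walk after N independent +1/-1 steps, which is my guess at the setup since the parent comment isn't shown here. Under that assumption sqrt(E[d^2]) is exactly sqrt(N), and E[|d|] should approach sqrt(2N/pi), roughly 0.798 sqrt(N). The function names are made up for illustration.

```python
import math

def rms_distance(n: int) -> float:
    """sqrt(E[d^2]) after n independent +1/-1 steps (assumed setup); exactly sqrt(n)."""
    return math.sqrt(n)

def mean_abs_distance(n: int) -> float:
    """Exact E[|d|] after n +1/-1 steps, summed over the binomial distribution."""
    total = 0
    for k in range(n + 1):  # k = number of +1 steps, so d = 2k - n
        total += abs(2 * k - n) * math.comb(n, k)
    return total / 2 ** n

# Ratio E[|d|] / sqrt(N) next to the asymptotic constant sqrt(2/pi)
for n in (10, 100, 1000):
    ratio = mean_abs_distance(n) / rms_distance(n)
    print(n, round(ratio, 3), round(math.sqrt(2 / math.pi), 3))
```

The sum is exact rather than a Monte Carlo estimate, so there is no sampling noise to argue about; the Jensen gap shows up as the ratio staying below 1 for every N > 1.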

aoeusnth1 · 2025-12-19T02:35:39 1766111739

Can you give an example of some ridiculous comments?

aoeusnth1 · 2025-12-18T18:03:49 1766081029

tokens/s don't match, so unlikely

aoeusnth1 · 2025-12-17T16:55:57 1765990557

I think it's good, they're raising the size (and price) of flash a bit and trying to position Flash as an actually useful coding / reasoning model. There's always lite for people who want dirt cheap prices and don't care about quality at all.