Hacker News | Davidzheng's comments

Robotics is coming FAST. Faster than LLM progress in my opinion.

Curious if you have any links about the rapid progression of robotics (as someone who is not educated on the topic).

My feeling about robotics is that the more challenging aspect will be making robots economically viable, rather than the technical difficulty of the tasks themselves.


I mentioned the military in my reply to the sibling comment - that is the readiest example. What Anduril and others are doing today may be sloppy, but it's moving very quickly.

The question is how rapid the adoption is. The price of failure in the real world is much higher ($$$, environmental, physical risks) vs just "rebuild/regenerate" in the digital realm.

Military adoption is probably a decent proxy indicator - and they are ready to hand the kill switch to autonomous robots.

Maybe. There, again, the cost of failure is low. It's easier to destroy than to create. Economic disruption to workers will take a bit longer, I think.

Don't get me wrong; I hope that we do see it in physical work as well. There is more value to society there, and it consists of work that is risky and/or hard to do - and is usually needed (food, shelter, etc.). It also means that the disruption is an "everyone" problem rather than something that just affects those "intellectual" types.


It's irrational to genuinely hold false beliefs about the capabilities of LLMs. But at this point I assume around half of the skeptics are emotionally motivated anyway.

As opposed to those who have skin in the game for LLMs and are blind to their flaws???

I'd assume that around half of the optimists are emotionally motivated this way.


There are other avenues of income. You can invade other industries which are slow on AI uptake and build an AI-from-the-ground-up competitor with large advantages over peers. There are hints of this (not AI-from-the-ground-up, but with more AI) in DeepMind's drug research labs. This can be a huge source of income: you can kill entire industries which inevitably cannot incorporate AI as fast as AI companies can internally.

Um, Meta didn't achieve the same results yet. And does it matter if they can all achieve the same results, as long as they all manage high enough payoffs? I think subscription-based income is only the beginning. The next stage is AI-based subcompanies encroaching on other industries (e.g. DeepMind's drug company).

Wait.

Big error bars, and METR people are saying the longer end of the benchmark is less accurate right now. I think they mean this is a lower bound!

It's complicated. Opus 4.5 is actually not that good at the 80% completion threshold, but is above the others at the 50% threshold. I read there's a single task of around 16h that the model completed, and the broad CI comes from that.

METR currently simply runs out of tasks at 10-20h, and as a result you have a small N and lots of uncertainty there. (They fit a logistic to the discrete 0/1 results to get the thresholds you see in the graph.) They need new tasks, then we'll know better.
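A minimal sketch of that fitting idea, with entirely made-up data (METR's actual pipeline differs in the details, e.g. task weighting and a proper maximum-likelihood fit; this is just a least-squares illustration of logistic-to-0/1 thresholding):

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(log_t, a, b):
    # P(success) as a decreasing function of log task duration
    return 1.0 / (1.0 + np.exp(a * (log_t - b)))

# Hypothetical (duration_minutes, succeeded) results for one model.
# Note there are few long tasks, which is where the uncertainty lives.
durations = np.array([1, 2, 4, 8, 15, 30, 60, 120, 240, 480, 960])
successes = np.array([1, 1, 1, 1, 1,  1,  0,  1,   0,   0,   0])

params, _ = curve_fit(logistic, np.log(durations), successes,
                      p0=[1.0, np.log(60.0)])
a, b = params

def horizon(p):
    # Invert the logistic: the duration at which P(success) = p
    return np.exp(b + np.log((1 - p) / p) / a)

print(f"50% horizon: {horizon(0.5):.0f} min, "
      f"80% horizon: {horizon(0.8):.0f} min")
```

The 80% horizon always comes out shorter than the 50% horizon, and with so few long tasks, removing or flipping a single data point near the tail moves both numbers a lot - which is exactly the small-N problem described above.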


Thanks for this comment. I've been trying to find anything about the huge error bars. Do you have any sources you can share for further reading?

The text continues "with current AI tools", which is not clearly defined to me (does it mean the current generation plus scaffolding? Anything that is an LLM reasoning model? Anything built with a large LLM inside?). In any case, the title is misleading for omitting the end of the sentence. Please can we fix the title?

Also, I think the main source of interest is that it was said by Terry, so that should be in the title too.

I think there are two separate things. One is that slowness of progress in research is good because it signals high value/difficulty; with this I wholeheartedly agree. The other is that slowness in solving a given problem is good, which is less clear.

I think intelligence should indubitably be linked to speed: if you can do everything faster, I think "smarter" is a correct label. What I also think is true is that slowness can be a virtue in solving problems, both for a person and as a strategy. But this is usually because fast strategies rely on priors/assumptions and ideas which generalize poorly; often more general and asymptotically faster algorithms are slower when tested on a limited set, or at a difficulty level which is too low.
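A toy illustration of that last point, using hypothetical cost models rather than real algorithms: a "general" method costing roughly 10·n·log₂(n) operations beats a naive n² one asymptotically, yet loses on small inputs because of its larger constant factor.

```python
import math

# Hypothetical cost models (illustration only): a "general" algorithm
# with a large constant factor vs. a "naive" quadratic one.
general = lambda n: 10 * n * math.log2(n)
naive = lambda n: n * n

# The naive method is cheaper on small inputs; the general one
# eventually wins as n grows past the crossover point.
for n in (4, 16, 64, 1024):
    winner = "general" if general(n) < naive(n) else "naive"
    print(f"n={n}: {winner} is cheaper")
```

So a benchmark capped at small n would rank the "dumber" strategy as faster, which is the generalization trap described above.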


I think part of the message is that speed isn't a free lunch. If an intelligence can solve "legible" problems quickly, that's symptomatic of a specific adaptation for identifying short paths.

So when you factor speed into tests, you're systematically filtering for intelligences that are biased to avoid novelty. Then if someone is slow to solve the same problems, it's actually a signal that they have the opposite bias, to consider more paths.

IMO the thing being measured by intelligence tests is something closer to "power" or "competitive advantage".


> Then if someone is slow to solve the same problems, it's actually a signal that they have the opposite bias, to consider more paths.

No, this isn't true; most of the time they just don't consider any paths at all and are simply dumb.

And a bias towards novelty doesn't make you slow; ADHD is biased towards novelty, and people wouldn't call them slow.


What I meant is, assuming that they do find solutions. If they're not doing anything of course that's different.

In the article, "speed" is about reaching specific answers in a specific window of time, the bane of ADHD.


I haven’t looked into the source study, so who knows if it’s good, but I recall this article about smart people taking longer to provide answers to hard problems because they take more into consideration, but are much more likely to be correct.

https://bigthink.com/neuropsych/intelligent-people-slower-so...


On AI Studio the free-tier limits on all models are decent.


I turned on API billing in AI Studio in the hope of getting the best possible service. As long as you are not using the Gemini thinking and research APIs for long-running computations, the APIs are very inexpensive to use.


What if the lie is a logical-deduction error, not a fact-retrieval error?


The error rate would still be improved overall, which might make it a viable tool for the price, depending on the use case.

