Curious if you have any links about the rapid progression of robotics (as someone who is not educated on the topic).
My feeling about robotics was that the more challenging aspect would be making robots economically viable, rather than the difficulty of the tasks themselves.
I mentioned the military in my reply to the sibling comment - that is the most ready example. What Anduril and others are doing today may be sloppy, but it's moving very quickly.
The question is how rapid the adoption is. The price of failure in the real world is much higher ($$$, environmental, physical risks) vs just "rebuild/regenerate" in the digital realm.
Maybe. There, the cost of failure is again low. It's easier to destroy than to create. Economic disruption to workers will take a bit longer, I think.
Don't get me wrong; I hope that we see it in physical work as well. There is more value to society there; it consists of work that is risky and/or hard to do, and is usually needed (food, shelter, etc.). It also means that the disruption is an "everyone" problem rather than something that just affects those "intellectual" types.
It's irrational to genuinely hold false beliefs about the capabilities of LLMs. But at this point I assume around half of the skeptics are emotionally motivated anyway.
There are other avenues of income. You can invade other industries that are slow on AI uptake and build an AI-from-the-ground-up competitor with large advantages over peers. There are hints of this (not AI from the ground up, but with more AI) in DeepMind's drug research labs. This can be a huge source of income: you can kill entire industries that inevitably cannot incorporate AI as fast as AI companies can internally.
Um, Meta didn't achieve the same results yet. And does it matter if they can all achieve the same results, if they all manage high enough payoffs? I think subscription-based income is only the beginning. The next stage is AI-based subcompanies encroaching on other industries (e.g. DeepMind's drug company).
It's complicated. Opus 4.5 is actually not that good at the 80% completion threshold but is above the others at the 50% threshold. I read there's a single task of around 16h that the model completed, and the broad CI comes from that.
METR currently simply runs out of tasks at 10-20h, and as a result you have a small N and lots of uncertainty there. (They fit a logistic to the discrete 0/1 results to get the thresholds you see in the graph.) They need new tasks, then we'll know better.
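To make the fitting step concrete, here is a minimal sketch of how a logistic fit to 0/1 task results yields the time-horizon thresholds. The data below is synthetic and the 2h "true" horizon is an assumption purely for illustration; this is not METR's actual pipeline or data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic (made-up) task outcomes: pass/fail vs task length in hours.
# Real benchmarks have few tasks above ~10h, which is why the long end is noisy.
rng = np.random.default_rng(0)
lengths_h = np.exp(rng.uniform(np.log(1 / 60), np.log(16), size=200))
true_h50 = 2.0  # assumed "true" 50% horizon, for generating the fake data
p_success = 1 / (1 + (lengths_h / true_h50) ** 1.2)
passed = rng.random(200) < p_success

# Fit a logistic model of success probability against log2(task length).
X = np.log2(lengths_h).reshape(-1, 1)
model = LogisticRegression().fit(X, passed)

def horizon(p):
    """Invert the fitted logistic: the task length at which the model
    succeeds with probability p.  p = sigmoid(w*x + b)  =>  x = (logit(p) - b) / w."""
    logit = np.log(p / (1 - p))
    x = (logit - model.intercept_[0]) / model.coef_[0][0]
    return 2 ** x  # back from log2(hours) to hours

print(f"50% horizon: {horizon(0.5):.2f}h, 80% horizon: {horizon(0.8):.2f}h")
```

Since success falls with task length, the fitted slope is negative, so the 80% horizon always comes out shorter than the 50% horizon; with only a handful of long tasks, the confidence interval on those thresholds gets wide.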
The text continues "with current AI tools", which is not clearly defined to me (does it mean current gen + scaffold? Anything that is an LLM reasoning model? Anything built with a large LLM inside?). In any case, the title is misleading for not containing the end of the sentence. Can we please fix the title?
I think there are two separate things. Slowness of progress in research is good because it signals high value/difficulty; with this I wholeheartedly agree. The other claim is that slowness in solving a given problem is good, which is less clear.
I think intelligence should indubitably be linked to speed. If you can do everything faster, I think smarter is a correct label. What I also think is true is that slowness can be a virtue in problem solving, both for a person and as a strategy. But this is usually because fast strategies rely on priors/assumptions and ideas which generalize poorly; more general and asymptotically faster algorithms are often slower when tested on a limited set, or at a difficulty level which is too low.
I think part of the message is that speed isn't a free lunch. If an intelligence can solve "legible" problems quickly, that's symptomatic of a specific adaptation for identifying short paths.
So when you factor speed into tests, you're systematically filtering for intelligences that are biased to avoid novelty. Then if someone is slow to solve the same problems, it's actually a signal that they have the opposite bias, to consider more paths.
IMO the thing being measured by intelligence tests is something closer to "power" or "competitive advantage".
I haven’t looked into the source study, so who knows if it’s good, but I recall this article about smart people taking longer to provide answers to hard problems because they take more into consideration, but are much more likely to be correct.
I turned on API billing in AI Studio in the hope of getting the best possible service. As long as you are not using the Gemini thinking and research APIs for long-running computations, the APIs are very inexpensive to use.