Agree! And this is why it is a bad idea IMHO for agents to sit at the abstraction layer of browser or below (OS). Even at the browser-addon level it's dangerous. It runs with the user’s authority across contexts and erodes zero-trust by becoming a confused deputy: https://en.wikipedia.org/wiki/Confused_deputy_problem
Yeh I don't think there's much value in a credo if it celebrates Altman. He's a terrible idol to have. He compared Trump to Hitler in 2016, then donated $1M to his inauguration and tweeted about being in an "NPC trap" when he criticized him. Took about six weeks after the election to flip. Testified to Congress that AI regulation is "essential," then lobbied against California's safety bill when it actually showed up. His own board fired him for lying to them for years. His safety team leads quit in protest saying safety took a backseat to shiny products. Multiple former colleagues, including the people who left to start Anthropic, describe psychological abuse and manipulation. Claims a $65k salary while sitting on a billion-dollar fortune built through conflicts of interest. He's not a good guy. He's a guy who says whatever serves him in the moment and has left a trail of people warning us about exactly that.
The incumbents Goodreads and their owner Amazon have indeed done such a poor job at this. Seven years ago I tried creating a basic graph using collaborative filtering (effectively using our actual reading patterns as the embedding space instead of semantics [human X likes book Y, so likers of Y might like other things that human X has enjoyed]). It works well to this day (ablf.io) but the codebase is so ugly I've not had the bravery to update its data in a couple of years.
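For what it's worth, the co-occurrence idea at the core is simple enough to sketch. This is just a toy illustration with made-up data and names, not the actual ablf.io code:

```python
from collections import Counter

# Hypothetical data: reader -> set of books they liked.
likes = {
    "alice": {"Dune", "Hyperion", "Blindsight"},
    "bob": {"Dune", "Blindsight", "The Dispossessed"},
    "carol": {"Hyperion", "The Dispossessed"},
}

def recommend(seed_book, likes, top_n=5):
    """Score books by how often they co-occur with seed_book in other readers' likes."""
    scores = Counter()
    for reader, books in likes.items():
        if seed_book in books:
            # Everything else this reader liked counts as evidence of affinity.
            for other in books - {seed_book}:
                scores[other] += 1
    return [book for book, _ in scores.most_common(top_n)]

print(recommend("Dune", likes))  # e.g. ['Blindsight', 'Hyperion', 'The Dispossessed']
```

A real system would presumably also need to downweight hugely popular books and normalise by reader activity, but the embedding-free, co-occurrence core is roughly this.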
Yes, imo this is very useful, but there's not a clear industry standard on how to do it yet, which I imagine will change? Tell me if I'm missing something.
I think it's become a bit of a cliché/cliquey thing amongst a certain population. I don't know its origins (tumblr emo crowd??) but I first encountered it in Silicon Valley. The Collison brothers used to love doing it, as did Altman. I feel it projects a kind of stream-of-thought aloofness, like "i dont care enough for correct form. language bends to my unique thoughts. read this if you like, i dont care lol".
All-lowercase comes across as the text equivalent of a hoodie and jeans: comfortable, a bit defensive against being seen as trying too hard, and now so common it barely reads as rebellion.
As I understand it, the root was people using the iPhone with autocorrect turned off. That’s how someone from the tumblr emo crowd (where it was definitely prevalent!) explained it to me, and the reason was that there was a lot of culture-specific terminology (including deliberate misspellings of words) that was hard to type with autocorrect switched on.
By extension you can see how that could also apply to tech.
The prompts aren't the key to the attack, though. They were able to get around guardrails with task decomposition.
There is no way for the AI system to verify whether you are a white hat or a black hat when the only task it sees is pen-testing. Since this is not presented as part of a "broader attack" (in that context), there is no visible "threat".
I don't see how this can be avoided, given that there are legitimate uses for every step of this in creating defenses to novel attacks.
Yes, all of this can be done with code and humans as well - but it is the scale and the speed that becomes problematic. It can adjust in real-time to individual targets and does not need as much human intervention / tailoring.
Is this obvious? Yes - but it seems they are trying to raise awareness of an actual use of this in the wild and get people discussing it.
I agree that there will be no single call or inference that presents malice. But I feel like they could still share general patterns of orchestration (latencies, concurrencies, general cadences and parallelization of attacks, prompts used to granularize work, whether the prompts themselves were generated in previous calls to Claude). There are a bunch of more specific telltales they could have alluded to. I think it's likely they're being obscure because they don't want to empower bad actors, but that's not really how the cybersecurity industry likes to operate. Maybe Anthropic believes this entire AI thing is a brand-new security regime and so thinks existing resiliences are moot. That we should all follow blindly as they lead the fight. Their narrative is confusing. Are they being actually transparent or just transparency-"coded"?
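To make that concrete, here's a rough sketch of the kind of per-session signals I mean. All field names and logic are invented for illustration; nothing here is from Anthropic's report:

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional

# Hypothetical request-log entry; the fields are assumptions for the sketch.
@dataclass
class Request:
    session_id: str
    timestamp: float                  # seconds since epoch
    prompt: str
    parent_prompt_id: Optional[str]   # set if this prompt was produced by an earlier model call

def orchestration_signals(requests: list[Request]) -> dict:
    """Crude per-session telltales: cadence, fan-out, and prompt chaining."""
    times = sorted(r.timestamp for r in requests)
    gaps = [b - a for a, b in zip(times, times[1:])]
    chained = sum(1 for r in requests if r.parent_prompt_id is not None)
    return {
        "median_gap_s": median(gaps) if gaps else None,            # machine cadence tends to be tight and regular
        "burst_size": len(requests),                                # fan-out of decomposed subtasks
        "chained_fraction": chained / len(requests) if requests else 0.0,  # prompts that were themselves model-generated
    }
```

None of these alone proves orchestration, but publishing distributions for signals like these would give defenders something to calibrate against, which is what the industry usually shares.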
I agree so much with this. And I am so sick of AI labs, who genuinely do have access to some really great engineers, putting stuff out that just doesn't pass the smell test. GPT-5's system card was pathetic: big talk of Microsoft doing red-teaming in ill-specified ways, entirely unreproducible. All the labs are "pro-research", but they again and again release whitepapers and pump headlines without producing the code and data alongside their claims. This just feeds into the shill cycle of journalists doing 'research', finding 'shocking thing AI told me today', and somehow being immune to the normal expectations of burden of proof.
Microsoft’s quantum lab also made ridiculous claims this year, with no updates or retractions after the community mocked them and some even alleged fraud.
Yeh I still don't think there's a fixed definition of what a world model is or in what modality it will emerge. I'm unconvinced it will emerge as a satisfying 3d game-like first-person walkthrough.