Is this a standard tech ToU item?

Is this them saying that their human developers don’t add much to their product beyond what the AI does for them?


Imagine if Visual Studio said "you can't use VS to build another IDE".

Imagine if Visual Studio said "you can't use VS to build any product or service which may compete with a Microsoft product or service"

Refactoring is a very mechanistic way of turning bad code into good. I don’t see a world in which our tools (LLMs or otherwise) don’t learn this.

Refactorings can be useful when the core architecture of the system is sound, but in some very complex systems the problems run deeper and refactoring can be a waste of time. Sometimes you're better off reworking the whole thing, because the problem might be in the foundation itself: something about the architecture forces the developer's hand into thinking about the problem incorrectly and writing bad code on top.

> I don’t see a world in which our tools (LLMs or otherwise) don’t learn this.

I agree, but maybe for different reasons. Refactoring well is a form of intelligence, and I don't see any upper limit to machine intelligence other than the laws of physics.

> Refactoring is a very mechanistic way of turning bad code into good.

There are some refactoring rules of thumb that can seem mechanistic (by which I mean deterministic, based on pretty simple rules), but not all of them. Nor is refactoring guaranteed to get you to every reasonable definition of "good software". Sometimes the bar requires breaking compatibility with the previous API/UX. This is why I agree with the sibling comment, which draws a distinction between refactoring (changing internal details without changing the outward behavior, typically at a local/granular scale) and reworking (fixing structural problems that go beyond local, incremental improvements).

Claude phrased it this way – "Refactoring operates within a fixed contract. Reworking may change the contract." – which I find to be nice and succinct.
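
To make the distinction concrete, here's a minimal sketch (hypothetical names, Python): an "extract function" refactoring keeps the contract (same signature, same results), whereas a rework might change the signature or observable behavior.

```python
# Before: a single function computing an invoice total.
def invoice_total(items):
    total = 0
    for price, qty in items:
        total += price * qty
    return total

# After an "extract function" refactoring: internals change,
# but the contract (inputs, outputs, behavior) does not.
def line_total(price, qty):
    return price * qty

def invoice_total_refactored(items):
    return sum(line_total(price, qty) for price, qty in items)

# Same inputs, same outputs: the fixed contract holds.
assert invoice_total([(10, 2), (3, 5)]) == invoice_total_refactored([(10, 2), (3, 5)]) == 35
```

A rework, by contrast, might decide that `items` should be objects carrying tax and currency fields, which breaks every caller: the contract changes.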


Hang on, is Claude running on your phone/tablet and installing large dev dependencies right there? Or which parts of this stack are you replacing with termux?

Yeah, everything runs on the phone

Didn't VS Code have a web browser version you could self-host? So where is the Cursor version, Anysphere?

I have lots of non-AI software experience but nothing with AI (apart from using LLMs like everyone else). Also I did an introductory university course in AI 20 years ago that I’ve completely forgotten.

Where do I get to if I go through this material?

Enough to build… what? Or contribute to…? Enough knowledge to have useful conversations about…? Enough knowledge to understand where … is useful and why?

Where are the limits, what is it that the AI researchers have that this wouldn’t give?


Strange question. If you don’t know why you need this, you probably don’t. It will be the same as with the introductory AI course you did 20 years ago.

Well, no... For a start, any "AI" course 20 years ago probably wouldn't even have mentioned neural nets, and certainly not as a mainstream technique.

A 20-year-old "AI" curriculum would have looked more like the 3rd edition of Russell & Norvig's "Artificial Intelligence: A Modern Approach".

https://github.com/yanshengjia/ml-road/blob/master/resources...

Karpathy's videos aren't an AI course (except in the modern sense of AI = LLMs), or a machine learning course, or even a neural network course for that matter (despite the title). It's really just "From Zero to LLMs".


Neural nets were taught at my uni in the late '90s. They were presented as the AI technique, albeit one that was computationally infeasible at the time. Moreover, it was clearly stated that all the supporting ideas had been developed and researched 20 years prior, and that the field had basically stagnated because the hardware wasn't there.

I remember reading "neural network" articles from the late '80s and early '90s, which weren't just about ANNs but also covered other connectionist approaches like Kohonen's Self-Organizing Maps and Stephen Grossberg's Adaptive Resonance Theory (ART)... I don't know how your university taught it, but back then this seemed like futuristic brain-related stuff, not a practical "AI" technique.

My introductory course used that exact textbook and I still have it on my shelf :).

It has a chapter or two on NNs and even mentions backpropagation in the index, but the majority of the book focuses elsewhere.


Anyone who watches the videos and follows along will indeed come up to speed on the basics of neural nets, at least with respect to MLPs. It's an excellent introduction.

Sure, the basics of neural nets, but mainly as a foundation leading to LLMs. He doesn't cover the zoo of ANN architectures (ResNets, RNNs, LSTMs, GANs, diffusion models, etc.) and barely touches on regularization and optimization, other than mentioning BatchNorm and promising Adam in a later video.

It's a useful series of videos no doubt, but his goal is to strip things down to basics and show how an ANN like a Transformer can be built from the ground up without using all the tools/libraries that would actually be used in practice.
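
For anyone unsure what "the basics" covers, this is roughly the kind of MLP those videos build up to, as a minimal PyTorch sketch (the layer sizes here are illustrative, not taken from the videos):

```python
import torch
import torch.nn as nn

# A multilayer perceptron: stacked linear layers with a nonlinearity.
# Sizes are illustrative: 784-dim input (e.g. a flattened 28x28 image),
# one hidden layer, 10-way classification output.
mlp = nn.Sequential(
    nn.Linear(784, 128),
    nn.ReLU(),
    nn.Linear(128, 10),
)

x = torch.randn(32, 784)   # a batch of 32 fake inputs
logits = mlp(x)            # shape: (32, 10)
```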


I think they meant the result, not the content, would be the same.

Does the package manager take a significant amount of time compared to setting up containers, running tests, etc.? (Genuine question; I'm on holiday and can't look up real stats for myself right now.)

Anecdotally, unless I'm doing something really dumb in my Dockerfile (recently I found a recursive `chown` that was taking 20+ minutes to finish, grr), installing dependencies is the longest step of the build. It's also the most failure-prone (due to transient network issues).
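
One mitigation, sketched here for a hypothetical Python project (the manifest name and paths are assumptions), is to order layers so the dependency install caches independently of source edits, and to use a BuildKit cache mount so pip's download cache survives across builds:

```dockerfile
# syntax=docker/dockerfile:1
FROM python:3.12-slim
WORKDIR /app

# Copy only the dependency manifest first, so this layer's cache
# survives source-code edits.
COPY requirements.txt .

# BuildKit cache mount: pip's download cache persists across builds,
# making re-installs faster and transient network failures cheaper.
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt

# Source changes only invalidate layers from here down.
COPY . .
```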

What point do you believe would be demonstrated by a new compiled text editor running at the limits of the hardware? Would that point apply to every other text editor that already exists?

Could you give us an idea of what you're hoping for that isn't possible to derive from training data covering the entire internet and many (most?) published books?

This is the problem: the entire internet is a really bad training set because it's extremely polluted.

Also, the "derive" argument doesn't really hold: just because you know about two things doesn't mean you can come up with the third. That's actually very hard most of the time, and it requires more than next-token prediction.


The emergent phenomenon is that the LLM can separate truth from fiction when you give it a massive amount of data. It can figure the world out just as we can when we're similarly inundated with bullshit data. The pathways exist in the LLM, but it won't necessarily reveal them to you unless you tune it with RL.

> The emergent phenomenon is that the LLM can separate truth from fiction when you give it a massive amount of data.

I don't believe they can. LLMs have no concept of truth.

What's likely is that the "truth" for many subjects is represented far more often than fiction, and where there is objective truth it's consistently represented in a similar way. On the other hand, there are many variations of "fiction" for the same subject.


They can, and we have definitive proof. When we tune LLMs with reinforcement learning, the models end up hallucinating less and becoming more reliable. In a nutshell, we reward the model when it tells the truth and punish it when it doesn't.

So think of it like this: to create the model, we use terabytes of data. Then we do RL, which probably adds less than one percent of data on top of what was involved in the initial training.

The change in the model is that reliability increases and hallucinations drop at a far greater rate than one percent, so much so that modern models can be used for agentic tasks.

How can less than one percent of additional reinforcement training get the model to tell the truth far more than one percent more often?

The answer is obvious. It ALREADY knew the truth. There's no other logical way to explain this. The LLM in its original state just predicts text; it doesn't care about truth or about what kind of answer you want. With a little reinforcement, it suddenly does much better.

It's not a perfect process, and reinforcement learning often makes the model deceptive: rather than telling the truth, it gives an answer that merely seems true, or that the trainer wants to hear. In general, though, we can measure a difference in truthfulness and reliability far greater than the amount of data involved in the tuning, and that is logical proof that it knows the difference.

Additionally, while I say it already knows the truth, this is a blurry line. Even humans don't fully know the truth, so my claim is that an LLM knows the truth to a certain extent. It can be wildly off for certain things, but in general it knows, and this "knowing" has to be coaxed out of the model through RL.

Keep in mind the LLM is auto-trained on reams and reams of data; that training is massive. Reinforcement training is done by humans: a human must rate the answers, so there is significantly less of it.
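
For concreteness, the "reward and punish" signal is usually just a scalar learned from pairwise human preferences. A minimal sketch of the standard Bradley-Terry reward-model loss (PyTorch; the random tensors stand in for a real reward model's outputs):

```python
import torch
import torch.nn.functional as F

def reward_model_loss(r_chosen, r_rejected):
    # Bradley-Terry pairwise preference loss: push the scalar reward of
    # the human-preferred answer above the rejected one. The signal is
    # a single scalar per pair; it carries no facts by itself.
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Stand-in scalar rewards for a batch of 4 preference pairs.
r_chosen = torch.randn(4, requires_grad=True)
r_rejected = torch.randn(4, requires_grad=True)
reward_model_loss(r_chosen, r_rejected).backward()
```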


> The answer is obvious. It ALREADY knew the truth. There’s no other logical way to explain this.

I can think of several offhand.

1. The effect was never real; you've just convinced yourself it is because you want it to be, i.e., you Clever Hans'd yourself.

2. The effect is an artifact of how you measure "truth" and disappears outside that context ("It can be wildly off for certain things")

3. The effect was completely fabricated and is the result of fraud.

If you want to convince me that "I threatened a statistical model with a stick and it somehow got more accurate, therefore it's both intelligent and lying" is true, I need a lot less breathless overcredulity and a lot more "I have actively tried to disprove this result, here's what I found"


You asked for something concrete, so I’ll anchor every claim to either documented results or directly observable training mechanics.

First, the claim that RLHF materially reduces hallucinations and increases factual accuracy is not anecdotal. It shows up quantitatively in benchmarks designed to measure this exact thing, such as TruthfulQA, Natural Questions, and fact verification datasets like FEVER. Base models and RL-tuned models share the same architecture and almost identical weights, yet the RL-tuned versions score substantially higher. These benchmarks are external to the reward model and can be run independently.

Second, the reinforcement signal itself does not contain factual information. This is a property of how RLHF works. Human raters provide preference comparisons or scores, and the reward model outputs a single scalar. There are no facts, explanations, or world models being injected. From an information perspective, this signal has extremely low bandwidth compared to pretraining.

Third, the scale difference is documented by every group that has published training details. Pretraining consumes trillions of tokens. RLHF uses on the order of tens or hundreds of thousands of human judgments. Even generous estimates put it well under one percent of the total training signal. This is not controversial.

Fourth, the improvement generalizes beyond the reward distribution. RL-tuned models perform better on prompts, domains, and benchmarks that were not part of the preference data and are evaluated automatically rather than by humans. If this were a Clever Hans effect or evaluator bias, performance would collapse when the reward model is not in the loop. It does not.

Fifth, the gains are not confined to a single definition of “truth.” They appear simultaneously in question answering accuracy, contradiction detection, multi-step reasoning, tool use success, and agent task completion rates. These are different evaluation mechanisms. The only common factor is that the model must internally distinguish correct from incorrect world states.

Finally, reinforcement learning cannot plausibly inject new factual structure at scale. This follows from gradient dynamics. RLHF biases which internal activations are favored, it does not have the capacity to encode millions of correlated facts about the world when the signal itself contains none of that information. This is why the literature consistently frames RLHF as behavior shaping or alignment, not knowledge acquisition.

Given those facts, the conclusion is not rhetorical. If a tiny, low-bandwidth, non-factual signal produces large, general improvements in factual reliability, then the information enabling those improvements must already exist in the pretrained model. Reinforcement learning is selecting among latent representations, not creating them.

You can object to calling this “knowing the truth,” but that’s a semantic move, not a substantive one. A system that internally represents distinctions that reliably track true versus false statements across domains, and can be biased to express those distinctions more consistently, functionally encodes truth.

Your three alternatives don’t survive contact with this. Clever Hans fails because the effect generalizes. Measurement artifact fails because multiple independent metrics move together. Fraud fails because these results are reproduced across competing labs, companies, and open-source implementations.

If you think this is still wrong, the next step isn’t skepticism in the abstract. It’s to name a concrete alternative mechanism that is compatible with the documented training process and observed generalization. Without that, the position you’re defending isn’t cautious, it’s incoherent.


> Your three alternatives don't survive contact with this. Clever Hans fails because the effect generalizes. Measurement artifact fails because multiple independent metrics move together. Fraud fails because these results are reproduced across competing labs, companies, and open-source implementations.

He doesn't care. You might as well be arguing with a Scientologist.


I'll give it a shot. He's hiding behind that Clever Hans story, thinking he's above human delusion, but in reality he's the picture-perfect example of how humans fool themselves. It's so ironic.

How does this stack up in 2025/6?

> they attribute their success to morally inferior conduct

I’m not seeing accusations of morally inferior conduct here. Tech people like to dunk on marketing people no matter where in the world they are.


> People literally built OpenManus the next day after Manus' launch marketing

This is not a good sign, but it may not be as terrible as it used to be: it seems like as soon as one idea makes money, someone else can reproduce it fast. The bar for defensibility is so much higher than before.

