
I would love to ask the author: are you sure that large language models are only modeling language?


Whatever gets predicted by tokens gets summarized by symbols, which are artifacts of language. This gets to the illusory aspects of binary as well, the rabbit hole goes deep.


I haven't shouted into the void for a while. Today is as good a day as any other to do so.

I feel extremely disempowered that these coding sessions are effectively black box, and non-reproducible. It feels like I am coding with nothing but hopes and dreams, and the connection between my will and the patterns of energy is so tenuous I almost don't feel like touching a computer again.

A lack of determinism comes from many places, but primarily: 1) The models change 2) The models are not deterministic 3) The history of tool use and chat input is not available as a first-class artifact for use.

I would love to see a tool that logs the full history of all agents that sculpt a codebase, including the inputs to tools, tool versions and any other sources of entropy. Logging the seeds fed into the RNGs that drive LLM output would be the final piece that would give me confidence to consider using these tools seriously.
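A minimal sketch of what such a breadcrumb log could look like. Everything here is illustrative, not any real framework's API: `log_agent_step` and its fields are hypothetical hook points a real agent runtime would have to expose.

```python
import hashlib
import json
import time


def log_agent_step(log_path, model_name, model_version, prompt,
                   tool_calls, sampler_seed, temperature):
    """Append one reproducibility record for a single agent step.

    Hypothetical sketch: captures every source of entropy named above
    (model version, tool inputs/versions, RNG seed) as one JSON line.
    """
    record = {
        "timestamp": time.time(),
        "model": {"name": model_name, "version": model_version},
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "prompt": prompt,
        "tool_calls": tool_calls,  # each entry: tool name, version, inputs
        "sampler": {"seed": sampler_seed, "temperature": temperature},
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
```

With records like these, replaying a session is a matter of feeding the same prompts, tool versions, and seeds back in, assuming the model weights themselves are pinned.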

I write this now after what I am calling "AI disillusionment", a state where I feel so disconnected from my codebase I'd rather just delete it than continue.

Having a set of breadcrumbs would give me at least a modicum of confidence that the work was reproducible and not the product of some modern ghost, completely detached from my will.

Of course this would require actually owning the full LLM.


> A lack of determinism comes from many places, but primarily: 1) The models change 2) The models are not deterministic...

models themselves are deterministic, this is a huge pet peeve of mine, so excuse the tangent, but the appearance of nondeterminism comes from a few sources, and imho can be largely attributed to the probabilistic methods used to get appropriate context and enable timely responses. here's an example of what I mean: a 52-card deck. The deck order is fixed once you shuffle it. Drawing "at random" is a probabilistic procedure on top of that fixed state. We do not call the deck probabilistic. We call the draw probabilistic. Another example: a pot of water heating on a stove. Its temperature follows deterministic physics. A cheap thermometer adds noisy, random error to each reading. We do not call the water probabilistic. We call the measurement probabilistic.

Theoretical physicists run into such problems, albeit far more complicated ones, and the concept they use to deal with them is called ergodicity. The models at the root of LLMs do exhibit ergodic behavior; the time average and the ensemble average of an observable are identical, i.e. the average response of a single model over a long duration and the average of many similar models at a fixed moment are equivalent.
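The time-average/ensemble-average equivalence can be illustrated with a toy ergodic process. This is a two-state Markov chain, purely illustrative and nothing to do with actual LLM internals: one long run and many short independent runs converge to the same average.

```python
import random


def step(state, p_flip=0.3, rng=random):
    # Ergodic two-state chain: flip state with probability p_flip.
    return 1 - state if rng.random() < p_flip else state


# Time average: one chain observed over a long duration.
rng = random.Random(0)
s, total = 0, 0
for _ in range(100_000):
    s = step(s, rng=rng)
    total += s
time_avg = total / 100_000

# Ensemble average: many independent chains observed at one late moment.
hits = 0
for i in range(10_000):
    r = random.Random(i + 1)
    s = 0
    for _ in range(40):
        s = step(s, rng=r)
    hits += s
ensemble_avg = hits / 10_000

# Both converge to the stationary value 0.5 for this symmetric chain.
```

The per-draw randomness is real, but the long-run statistics are fixed by the deterministic transition rule, which is the distinction the deck-of-cards example is making.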


The previous poster is correct for a very slightly different definition of the word "model". In context, I would even say their definition is the more correct one.

They are including the random sampler at the end of the LLM that chooses the next token. You are talking about up to, but not including, that point. But that just gives you a list of possible output tokens with values ("probabilities"), not a single choice. You can always just choose the best one, or you could add some randomness that does a weighted sample of the next token based on those values. From the user's perspective, that final sampling step is part of the overall black box that is running to give an output, and it's fair to define "the model" to include that final random step.
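A toy sketch of that final step, assuming a plain logits-to-token sampler with no real inference library involved: the deterministic forward pass hands back fixed scores, and all of the "randomness" lives in this one function.

```python
import math
import random


def sample_next_token(logits, temperature=1.0, seed=None):
    """Pick the next token from a dict of {token: logit}.

    The network's forward pass is deterministic: same input, same
    logits. Nondeterminism enters only in this sampling step.
    """
    if temperature == 0:
        # Greedy decoding: always the highest-scoring token.
        return max(logits, key=logits.get)
    rng = random.Random(seed)  # seeding makes the draw reproducible
    scaled = {t: l / temperature for t, l in logits.items()}
    m = max(scaled.values())  # subtract max for numerical stability
    weights = {t: math.exp(l - m) for t, l in scaled.items()}
    r = rng.random() * sum(weights.values())
    for token, w in weights.items():
        r -= w
        if r <= 0:
            return token
    return token  # fallback for floating-point edge cases
```

With `temperature=0` the whole pipeline is deterministic end to end; with a fixed seed it is still reproducible, which is exactly the breadcrumb the grandparent comment is asking providers to expose.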


but, to be fair, simply calling the sampler random is what gives people impressions like the one OP is complaining about, which isn't entirely accurate; it's actually fairly bounded.

this plays back into my original comment: you have to understand that the sampler, for all its "randomness", should only be seeing and picking from a variety of correct answers, i.e. the sample pool should only contain acceptable answers to "randomly" pick from. so when there are bad or nonsensical answers that are different every time, it's not because the models are too random, it's because they're dumb and need more training. tweaking your architecture isn't going to fully prevent that.


The User:

The stove keeps burning me because I can't tell how hot it is; it feels random and the indicator light is broken.

You:

The most rigorous definition of temperature is that it is equal to the inverse of the rate of change of entropy with respect to internal energy, with volume V and particle number N held constant. All accessible microstates are equiprobable over a long period of time; this is the very definition of ergodicity! Yet, because of the flow of entropy, the observed macrostates will remain stable. Thus, we can say that the responses of a given LLM are...
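The definition being quoted here, written out:

```latex
\frac{1}{T} = \left( \frac{\partial S}{\partial U} \right)_{N,V}
```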

The User:

I'm calling the doctor, and getting a new stove with an indicator light.


Well really, the reason why I gripe about it, to use your example, is that then they believe the indicator light malfunctioning is an intrinsic feature of stoves, so they throw their stove out and start cooking over campfires instead, tried and true, predictable, whatever that means.

I think my deck of cards example still holds.

You could argue I'm being uselessly pedantic, that could totally be the case, but personally I think that's cope to avoid having to think very hard.


Here is a rigorous scientific diagnosis of, and solution for, non-determinism in LLM outputs (from Mira Murati's new outfit, though credit really belongs to the author):

https://bff531bb.connectionism.pages.dev/blog/defeating-nond...


Requires a login?



It's also a pet peeve of mine, enough that I actually wrote a blog about it

https://hi-mil.es/blog/human-slop-vs-ai-slop


I share the sentiment. I would add that the people I would like to see use LLMs for coding (and other technical purposes) tend to be jaded like you, and the people I personally wouldn't want to see use LLMs for that tend to be pretty enthusiastic.


I've been building something like this, a markdown that tracks your prompts, and the code generated.

https://github.com/sutt/innocuous/blob/master/docs/dev-summa...

Check it out, I'd be curious of your feedback.


Maybe just take a weekend and build something by writing the code yourself. It's the feeling of pure creative power, it sounds like you've just forgotten what it was like.


Yeah, tbh I used to be a bit agentic coding tool-pilled, but over the past four months I've come to realize that if this industry evolves in a direction where I don't actually get to write code anymore, I'm just going to quit.

Code is the only good thing about the tech industry. Everything else is capitalist hellscape shareholder dystopia. Thinking on it, it's hilarious that any self-respecting coder is excited about these tools, because what you're excited for is a world where, now, at best, your entire job is managing unpredictable AI agents while sitting in meetings all day to figure out what to tell your AI agents to build. You don't get to build the product you want. You don't get to build it how you want. You'll be a middle manager that gets to orchestrate the arguments between the middle manager you already had and the inflexible computer.

You don't have to participate in a future you aren't interested in. The other day my boss asked me if I could throw Cursor at some task we've had backlogged for a while. I said "for sure my dude" then I just did it myself. It took me like four hours, and my boss was very impressed with how fast Cursor was able to do it, and how high quality the code was. He loves the Cursor metrics dashboard for "lines accepted" or whatever, every time he screenshares he has that tab open, so sometimes I task it on complicated nonsense tasks then just throw away the results. Seeing the numbers go up makes him happy, which makes my life easier, so it's a win-win. Our CTO is really proud of "what percentage of our code is AI written" but I'm fairly certain that even the engineers who use it in earnest actually commit, like, 5% of what Cursor generates (and many do not use it in earnest).

The sentiment shift I've observed among friends and coworkers has been insane over the past two months. Literally no one cares about it anymore. The usage is still there, but it's a lot more either my situation or just a "spray and pray" situation that creates a ton of disillusioned water cooler conversations.


This pretty much sums up my experience.


If you care about this so much why don't you use one of the open source OpenAI models? They're pretty good and give you the guarantees you want.


None of the open weight models are really as good as SOTA stuff, whatever their evals say. Depending on the task at hand this might not actually manifest if the task is simple enough, but once you hit the threshold it's really obvious.


> where I feel so disconnected from my codebase I'd rather just delete it than continue.

If you allow your codebase to grow unfamiliar, even unrecognisable to you, that's on you, not the AI. Chasing some illusion of control via LLM output reproducibility won't fix the systemic problem of you integrating code that you do not understand.


Who cares about the blame, it would just be useful if the tools were better at this task in many particular ways.


It's not blame, it's useful feedback. For a large application you have to understand what different parts are doing and how everything is put together, otherwise no amount of tools will save you.


The process of writing the code, thinking all the while, is how most humans learn a codebase. Integrating alien code sequentially disrupts this process, even if you understand individual components. The solution is to methodically work through the codebase, reading, writing, and internalizing its structure, and comparing that to the known requirements. And yet, if this is always required of you as a professional, what value did the LLM add beyond speeding up your typing while delaying the required thinking?


I completely agree.


And now imagine you'd have to rely on humans to build your software instead


This is the question though isn't it?

With sufficient structure and supervision, will a "team" of agents out-perform a team of humans?

Military, automotive and other industries have developed rigorous standards consisting of among other things detailed processes for developing software.

Can there be an AI waterfall? With sufficiently unambiguous, testable requirements, and a nice scaffolding of process, is it possible to achieve the dream of managers, and eliminate software engineers? My intuition is evenly split.


What I've been doing is running an agent inside a locked-down k8s environment. Agents are spun up by an operator, and have access to a single namespace.

It's not perfect, as container escape is not entirely unlikely.

I am working on a future version where all agents run inside Firecracker VMs, with all actions logged externally.

With Kubernetes it's like having a bunch of virtual employees making git commits, firing up name-spaced ephemeral resources and collaborating like "remote" employees. It's certainly fun, but I haven't quite polished it to the point where I recommend this architecture to anyone.


I just spent a lot of yesterday tweaking a Docker image with xfce and VS Code so I can just let Codex go full access mode without too much worry in a throwaway sandbox. The agent runs similarly namespace-constrained and without sudo. I think it's a relatively safe middle ground. Do you really think container escape is still a big deal here?

Finally getting this setup also allowed me to very quickly troubleshoot what was breaking my build in the codex cloud hosted container which obviously has even less risk attached.

Now I'm juggling and strategizing branches like coding is an RTS game... and it feels like a super power. It's almost like unlocking an undiscovered tech tree.


I make projects following almost identical patterns. It's a little uncanny. Maybe the people in the Python developer ecosystem are converging on a pretty uniform way to do most things? I thought some of my choices were maybe "my own", but seeing such consistency makes me question my own free will.

It's like when people pick a "unique" name for their baby along with almost everyone else. What you thought was a unique name is the #2 most popular name.


This sort of architecture has been in favor with python for at least 10 years or so, but I think you're right — the structure just makes sense, so many reasonable engineers converge on using it.


as though subsurface pilot waves in every spectrum hold human egos as constituent particles - becoming-being lol


It is the side of ice cream that makes all the difference for the improbability drive.


This isn't getting much love.

I love this not because any individual tool is necessarily that impressive (though some are very cool), nor the sheer number of them, but rather because it shows me a hint of a very different software world, one where every tool is a response to an immediate need.

Each one of these looks almost like crystallized intention. This makes me think what's needed next is a meta-tool that indexes all of these in an embeddings DB.


So what would happen if you placed a giant dice-rolling machine in control of a weapon system, acting based on the outcome of rolling dice?

Seems like it's unknown, and anyone doing so with the expectation that it is anything but a gamble is missing the whole point.

If you don't want your weapons influenced by randomness, don't connect them to systems that rely on randomness for decision making.


Although I don't usually use React, for me there is a certain joy and also efficiency that comes from using some of the abstractions you get from the larger JavaScript/web ecosystem, while also having the ability to render all that magic to a folder with a couple of HTML, CSS and JavaScript files.

With LLMs to help wrestle the boilerplate, I've found I can whip up a fast static site using advanced ergonomic abstractions that make the whole process a joy! In the past, wrestling the Node and NPM ecosystem was a complete nightmare. Now it's a dream, with the occasional storm cloud.


I love it! I effectively achieve similar results by asking Cursor lots of questions!

Like at least one other person in the comments mentioned, I would like a slightly different tone.

Perhaps a good feature would be a "style template" that can be chosen to match your preferred writing style.

I may submit a PR though not if it takes a lot of time.


Thanks—would really appreciate your PR!


Theoretically yes, but tricky from what I gather.

Reasons

1) Most DNA is the same, with sparse deltas, so the pathogen would need to match a large amount of DNA and ignore a lot. Challenges in biometrics in general mirror this.

2) Things tend to mutate, so even if this was possible with a single pathogen, it may mutate either targeting other genetics, or changing behavior.

3) Targeting someone specifically, and getting the "germs" into their system, is tricky. I haven't thought about this as much, but the high mutation rate means it would need to be introduced physically and socially close to the target.

Maybe targeting specific racial groups or some other evil thing would be easier. There are many existing diseases which affect different groups (not just racial) differently, with statistical significance.

That being said, I hadn't thought much about this before and it's certainly a "fun" scary thought.


How about a wireguard tunnel from an ingress box? You still pay for one VPS, but can run everything locally and just load balance at the ingress. I just manually add configs to nginx, but there are automated tools too.


Lol, kind of defeats the purpose


What defeats what purpose? I don't run k8s out of some love of ... managing external IPs?


Tunneling through a single external node defeats the purpose of hosting k8s on a home server.

Maybe the external ingress node can be a load balancer controlled by the k8s cluster. But then you still have to communicate with the home server, and it has no exposed IP address.


> Tunneling through a single external node defeats the purpose of hosting k8s in home server.

How so? You can just rent a cheap server to tunnel through, while having the benefits of your home machine(s) for compute.

> Maybe the external ingress node can be a load balancer controlled by the k8 cluster. But then you still have to communicate with the home server and it has no exposed ip address

Do you mean that you wouldn’t be able to access the K8s control plane endpoint then (which you could if configured properly)? Or something else?


>how so?

SPOF


And having a single IP address, with one ISP at home isn't a SPOF?


@TZubiri, Then if that is a risk you accept, you could have multiple VPS's and load balance back to your home network, eliminating the new SPOF.

(@'ing because we reached the maximum reply limit)


@Daviey

Or just get your very own static IP.

It's a ZPOF.

Routing happens automatically on nearby routers.

It's deep down a matter of taste: you have a home server in Arizona and you route users to a Hetzner server in Germany and then back?

Don't justify it, just recognize it's in bad taste. Seek to use IP addresses as geographical host identifiers. Do not hide origin or destination. Minimize.


You are adding a(nother) SPOF.


Are you even running a real homelab if you're not running MetalLB in BGP mode?!

Sarcasm obviously, but it's a fun exercise, especially if you get a real (v6, probably) net to announce.


It's on my bucket list.

My Unifi Cloud Gateway Max doesn't (yet) support BGP, but other Unifi devices do, so I do hope that it'll come to my device too. If not, I'll have to think of other ways to test it. But in the meantime, there's plenty of other stuff to learn.

I have a real, ISP-routed IPv6 net to play with.


Right?! Get some free IP space from he.net tunnel broker and build your own IPv6 AnyCast network using Quagga for BGP.

Kidding about building your own AnyCast network (although you really could…), but he.net tunnel broker is GOAT.


I wish Happy Eyeballs worked better or let me absolutely prefer IPv4, because every time I set up a free v6 tunnel I get banned about 2 days later for pushing a few terabytes over it, when all I wanted was to be able to SSH into every one of my containers on a cheap VPS separately. Or the tunnel is so slow, I'm degrading my entire internet connectivity.


>free v6 tunnel

There's your problem. You need some form of cost attached to some identity assets, anything under the IANA umbrella, IPs/domain names. This is in order to prevent Sybil attacks. This was all well studied in the hashcash/bitcoin era as PoW.

So yeah, you actually need to spend some money, not in exchange for something here, but as the very thing you need: you need to distinguish yourself from those that spend $0, not because they are cheap, but because they may do it 1000 times and ruin your pooled reputation.


Unfortunately there's quite a market gap if you need bandwidth. I can't just pay HE 10 bucks a month, their service doesn't work that way. Hetzner would work, but their IP space is very often randomly blocked from lots of things.


> because every time I set up a free v6 tunnel I get banned about 2 days later for pushing a few terabytes over it, when all I wanted was to be able to SSH into every one of my containers

can you describe a bit more? I cannot connect the dots here on how terabytes are tied to free v6 tunnel - likely I'm missing some details. Thank you in advance.


If someone else provides IPv6 connectivity to you, you use their bandwidth. Some apps like Steam see IPv6 connectivity and use it regardless of what you'd prefer it to use, hinting mechanisms and all. So while I just wanted to use the tunnel for things that IPv4 does not provide, I always end up tunneling half my traffic over it, which free services don't like.
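When an app does respect your own socket code, one workaround is to resolve IPv4 addresses only, so the tunnel is never a candidate. A minimal sketch using only the standard library (the function name is mine, not from any library):

```python
import socket


def connect_ipv4_only(host, port, timeout=5):
    """Open a TCP connection restricted to IPv4.

    Passing AF_INET to getaddrinfo filters out every AAAA result,
    so a slow or metered v6 tunnel is never used for this connection.
    """
    infos = socket.getaddrinfo(host, port, socket.AF_INET,
                               socket.SOCK_STREAM)
    family, socktype, proto, _, addr = infos[0]
    sock = socket.socket(family, socktype, proto)
    sock.settimeout(timeout)
    sock.connect(addr)
    return sock
```

This only helps for software you control, of course; apps like Steam that do their own resolution will still grab the v6 route, which is exactly the complaint above.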


I do this for email after I got a new IP address from the shit KPN pool instead of the clean XS4ALL pool. Outgoing email proxies through an IP address at Hetzner. It's not pointless because

- I get specs from an old laptop (that I had laying around anyway) that would probably cost like 50€/month to rent elsewhere. Power costs are much lower (iirc some 2€/month) and it just uses the internet subscription I already had anyway

- When I do hardware upgrades, I buy to own. No profit margin, dedicated hardware, exactly the parts I want

- All data is stored at home so I'm in control of access controls

- Gigabit transfer speeds at home, which is most of the time that I want to do bulk transfers

I see various advantages that still exist when you need to tunnel for IP-layer issues

Edit: got curious. At Hetzner, a shared CPU with 16GB RAM is 30€/month, but storage is way too little, so an additional 53€/month just for a 1TB drive needs to be added (at that price, you could buy one of these drives new every month and still have enough money left over to pay for the operating electricity; you'd have a fleet of 60 drives at the expected lifetime of 5 years, or even at triple redundancy you'd get 20TB for that price). I'll admit the uplink would be significantly better for this system, but my downlink isn't faster than my uplink, so at home I wouldn't even notice. Not sure how much of a difference a dedicated CPU would make.

At AWS, I have to guess things like IOPS (I put in 0 to have a best-case comparison) and bandwidth (I guessed 1TB outbound; probably some months are five times more, some months half). It says the cheapest EC2 instance with these specs, shared/virtual again mind you so no performance guarantees, is t4g.xlarge. With storage and traffic, this costs $301/month, which I guess is nearly the same in euros after conversion fees. If I generously pay 3 years up front, it's only $190 monthly + $2,156 up front, so across 3 years that's $250/month (and I'm out over 2 grand, which has an expected average return of 270€ at the typical 4% in an all-world ETF; I could nearly fund the electricity costs of the old laptop just from the money I lose in interest while paying for AWS! Probably 100% if I bought a battery and solar panel to the value of 2,150€).

I actually have more than 1TB storage but don't currently use all of it, so figured this is a fair comparison

The proxy I currently have at Hetzner costs me 4€/month, so I save many multiples of my current total cost (including the at-home costs) by self hosting.
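The arithmetic above, collected in one place. All figures are the parent comment's own numbers, not current price quotes:

```python
# Cost comparison from the parent comment (EUR and USD treated as
# roughly equal after conversion, as the commenter does).
home_power = 2.0            # EUR/month electricity for the old laptop
home_proxy = 4.0            # EUR/month Hetzner proxy actually used
hetzner_vps = 30.0 + 53.0   # shared CPU + 1TB volume, EUR/month

aws_monthly = 190.0         # USD/month with 3-year partial upfront
aws_upfront = 2156.0        # USD paid up front
aws_effective = aws_monthly + aws_upfront / 36  # spread over 36 months

self_hosting_total = home_power + home_proxy
```

Under these figures the AWS effective rate works out to roughly $250/month, and self-hosting (about 6€/month all-in) comes to a small fraction of even the cheaper Hetzner VPS option, matching the comment's conclusion.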


For cheap storage at Hetzner you could add a Storage Box (not fast, but fine), or now even object storage.

