More

swyx · 2026-01-14T15:43:56 1768405436

fyi i maintain a repo that accidentally tracks github actions cron reliability (https://www.swyx.io/github-scraping) - just runs a small script every hour.

i just checked and in 2025 there was at least 2 outages a month every month https://x.com/swyx/status/2011463717683118449?s=20 . not quite 3 nines.

swyx · 2026-01-14T07:19:43 1768375183

as a first step, let's tax these things. this is such an immense waste of electronics.

swyx · 2026-01-14T07:03:19 1768374199

hi will! super nicely written, nice look under the hood of your processing. as an orchestration guy i always wondered why everyone seems to converge on using Ray, and as a secondary thought, how well is Anyscale capturing the Ray market.

if i were doing what you do i might set up a lot of rate limits/anomaly detection in case some weird unintended invalidation causes a weird spike in your dependency graphs. is there good practice there for anomaly detection other than "setup a bunhc of dashboards and be on call"?

twyxy · 2026-01-14T20:31:12 1768422672

Ray is the future

swyx · 2026-01-11T21:04:04 1768165444

interesting exercise and well written. my followon questions/work would be:

1a. temperature=100000 is interesting too. obviously "ideal" temperature lies somewhere between 0 and 100000. has anyone ablated temperature vs intelligence? surely i'm not the first person to this idea. commonly people try to set temp=0 to get "deterministic" or "most factual" output but we all know that is just Skinner pigeon pecking.

1b. can we use "avg temperature" as a measure in the way that we use perplexity as a measure? if we see temperature as inverted perplexity with some randomness thrown in, are they basically the same thing inverted? or subtly different?

1c. what's the "avg temperature" of most human communication? whats the "avg temperature" of a subset of "good writers"? whats the "avg temperature" of a subset of "smart writers"?

2a. rerun this negative exercise with constrained vocab to english

2b. RL a model to dynamically adjust its own temperature when it is feeling 1) less confident 2) in brainstorm mode

2c. dynamically inject negative temperature every X tokens in a decode, then judge/verify the outcome, to create high variance synthetic data?

its hard for me to follow the train of thought on 2 because negative temp is essentially not that different from ultrahigh temp in practice.

embedding-shape · 2026-01-11T21:42:45 1768167765

> commonly people try to set temp=0 to get "deterministic" or "most factual" output but we all know that is just Skinner pigeon pecking.

Hmm? Given the same runtime, the same weights, and with the model actually giving deterministic output with temp=0, are you saying this isn't actually deterministic? Most FOSS/downloadable models tend to work as expected with temp=0 in my experience. Obviously that won't give you "most factual" output, because that's something completely else, but with most models it should give you deterministic output.

swyx · 2026-01-11T23:09:15 1768172955

"What might be more surprising is that even when we adjust the temperature down to 0This means that the LLM always chooses the highest probability token, which is called greedy sampling. (thus making the sampling theoretically deterministic), LLM APIs are still not deterministic in practice (see past discussions here, here, or here)"

https://thinkingmachines.ai/blog/defeating-nondeterminism-in...

wongarsu · 2026-01-11T23:25:55 1768173955

Also from the article:

"Note that this is “run-to-run deterministic.” If you run the script multiple times, it will deterministically return the same result. However, when a non-batch-invariant kernel is used as part of a larger inference system, the system can become nondeterministic. When you make a query to an inference endpoint, the amount of load the server is under is effectively “nondeterministic” from the user’s perspective"

Which is a factor you can control when running your own local inference, and in many simple inference engines simply doesn't happen. In those cases you do get deterministic output at temperature=0 (provided they got everything else mentioned in the article right)

dnautics · 2026-01-12T00:02:39 1768176159

Having implemented LLM APIs, if you selected 0.0 as the temperature, my interface would drop the existing picking algorithm and select argmax(Logits)

omneity · 2026-01-12T08:18:10 1768205890

That's just an implementation artifact and not a fundamental fact of life.

https://docs.vllm.ai/en/latest/features/batch_invariance/

remexre · 2026-01-11T22:42:19 1768171339

There's usually an if(temp == 0) to change sampling methods to "highest probability" -- if you remove that conditional but otherwise keep the same math, that's not deterministic either.

Majromax · 2026-01-11T22:49:00 1768171740

If you remove the conditional and keep the same math, you divide by zero and get nans. In the limit as temperature goes to zero, you do in fact get maximum likelihood sampling.

dnautics · 2026-01-12T00:03:40 1768176220

if (t==0) argmax(logits) else pick(logits)

eru · 2026-01-12T08:16:05 1768205765

You are ignoring the limit taking.

dnautics · 2026-01-12T15:39:03 1768232343

If t < 0.1 return {error, 400}

eru · 2026-01-13T01:28:22 1768267702

Do you know that a mathematical limit is?

TomatoCo · 2026-01-11T22:47:35 1768171655

I'd assume that's just an optimization? Why bother sorting the entire list if you're just gonna pick the top token, linear time versus whatever your sort time is.

Having said that, of course it's only as deterministic as the hardware itself is.

dnautics · 2026-01-12T00:05:08 1768176308

The likelihood that top-two is close enough to be hardware dependent is pretty low. IIUC It's more of an issue when you are using other picking methods.

embedding-shape · 2026-01-11T22:47:19 1768171639

In for example llama.cpp? Specific to the architecture or in general? Could you point out where this is happening? Not that I don't believe you, but I haven't seen that myself, and would appreciate learning deeper how it works.

vlovich123 · 2026-01-12T01:32:31 1768181551

Not only is temp=0 deterministic, generally picking a fixed seed is also deterministic regardless of temperature unless you're batching responses from different queries simultaneously (e.g. OpenAI).

-_- · 2026-01-11T23:22:45 1768173765

Author here! 1a. LLMs fundamentally model probability distributions of token sequences—those are the (normalized) logits from the last linear layer of a transformer. The closest thing to ablating temperature is T=0 or T=1 sampling. 1b. Yes, you can do something like this, for instance by picking the temperature where perplexity is minimized. Perplexity is the exponential of entropy, to continue the thermodynamic analogy. 1c. Higher than for most AI written text, around 1.7. I've experimented with this as a metric for distinguishing whether text is written by AI. Human-written text doesn't follow a constant-temperature softmax distribution, either.

2b. Giving an LLM control over its own sampling parameters sounds like it would be a fun experiment! It could have dynamic control to write more creatively or avoid making simple mistakes. 2c. This would produce nonsense. The tokens you get with negative temperature sampling are "worse than random"

swyx · 2026-01-13T01:14:35 1768266875

> . I've experimented with this as a metric for distinguishing whether text is written by AI. Human-written text doesn't follow a constant-temperature softmax distribution, either.

oo that sounds like a cool insight. like just do a trailing 20-30 token average of estimated temperature and look for variance like one might do a VO2 max

swyx · 2026-01-09T22:41:55 1767998515

sharing my list of mistakes in markdown that Gruber endorsed :) https://news.ycombinator.com/item?id=22776108

anildash · 2026-01-10T01:05:20 1768007120

I was texting with John the other night while working on this piece, and reminiscing about my initial quibbles about the format, and I think I had been frustrated by just about everything on your list. I just need you to travel back in time to tell me to fuss more!

swyx · 2026-01-09T20:25:58 1767990358

no, they were not made to do it. they listened to feedback and did the work. this is better than we get in 99% of cases. try to be nicer and meet them half way instead of living in your ideal world.

bloomingeek · 2026-01-10T12:13:41 1768047221

In my ideal world, corporate responsibility is a must. Making junk products or killing product updates because they can't sell you the updated version is irresponsible. They listened to feedback because they know their products are overpriced for the market, so they decided to do the right thing, but only after they were called out. That's backwards. Corporations don't know the meaning of nice, only money.

swyx · 2026-01-09T20:02:49 1767988969

the frontend is beautiful. i find it inspiring that you have 10 years of data science and are no longer limited by your lack of frontend or design knowledge. this is a better site than i couldve done

swyx · 2026-01-08T20:40:54 1767904854

> Jeff & Sanjay did everything together,

my nonexpert impression is jeff keeps much more of a public profile. hence the natural celebrity goes to him. was this not true way back in the day?

kentonv · 2026-01-08T21:30:15 1767907815

I feel like Jeff's public profile has grown quite a bit since then. Note that in 2008 he wasn't doing anything related to AI yet -- none of that had even started. That has since given him somewhat of a more public role, whereas Sanjay has stayed on infrastructure which is more internal-facing. I do think Jeff Dean Facts in itself has played some part in enhancing his celebrity status, too.

With that said, I suppose it's hard for me to say what the public perception of the two was in 2008 as I only knew of either of them from working there.

swyx · 2026-01-08T03:27:41 1767842861

i asked them this in my interview. tldr they subsidize all inference on their platform https://www.youtube.com/watch?v=NBnOk0Uy9ig&t=70s

swyx · 2026-01-06T18:27:59 1767724079

> The results I use in my dashboard are from DistilBERT because it runs efficiently in my Cloudflare-based pipeline.

interesting - why use cloudflare vs say hf inference or modal? and is this replicate-cloudflare or normal cloudflare?