Hacker News | anonymoushn's comments

Hello, the part about canonical filtering in https://openreview.net/pdf?id=DFybOGeGDS doesn't seem to try to account for pretokenization. For example, if you receive " 天天中彩票APP" in o200k, it means there has to be a lowercase letter within the span of letters, and while a token like "    " (4 spaces) may be pairwise compatible with a token like "123" according to the BPE merge rules, the pretokenizer would split the span of spaces to give "   " (3 spaces), " ", "123" instead. Are you aware of any work that does actual canonical generation for models with this kind of pretokenization regex?
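To make the splitting behavior concrete, here's a sketch with a simplified stand-in pattern — NOT the real o200k regex, just the pieces relevant to this example (letter runs, 1-3 digit runs with no leading space, whitespace-not-followed-by-non-space, then remaining whitespace):

```python
import re

# Simplified stand-in for an o200k-style pretokenization pattern (not the
# actual o200k regex). Digit runs take no leading space, so the trailing
# space in a run of spaces is split off on its own.
pat = re.compile(r"[A-Za-z]+|\d{1,3}|\s+(?!\S)|\s+")

print(pat.findall("    123"))  # ['   ', ' ', '123']
```

Because the pretokenizer emits these three spans separately, a BPE token spanning all four spaces plus "123" can never be produced, even if every adjacent pair of tokens would be merge-compatible.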

use claude code if you want to use opus

what does "logprobs look off" mean

If the immediate next-token probabilities are flat, that would mean the LLM is not able to predict the next token with any certainty. This might happen if an LLM is thrown off by out-of-distribution data, though I haven't personally seen it happen with modern models, so it was mostly a sanity check. Examples from the past that would cause this have been simple things like not normalizing token boundaries in your input, trailing whitespace, etc., and sometimes using very rare tokens, AKA "glitch tokens" (https://en.wikipedia.org/wiki/Glitch_token).
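One cheap way to quantify "flat" is the entropy of the next-token distribution: a near-uniform distribution sits close to log(V), a confident one near zero. A minimal sketch (the probabilities below are made up for illustration, not from any real model):

```python
import math

def next_token_entropy(logprobs):
    """Shannon entropy (in nats) of a next-token distribution,
    given the log-probabilities of each candidate token."""
    return -sum(math.exp(lp) * lp for lp in logprobs)

# Hypothetical distributions over 4 candidate tokens:
confident = [math.log(p) for p in (0.97, 0.01, 0.01, 0.01)]
flat = [math.log(0.25)] * 4

print(next_token_entropy(confident))  # low, ~0.17
print(next_token_entropy(flat))       # maximal for 4 tokens, log(4) ~ 1.386
```

In practice you'd only get the top-k logprobs from an API, so this is a lower bound on the true entropy, but it's usually enough for a sanity check.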

Hello, a couple years ago I participated in a contest to count word frequencies and generate a sorted histogram. There's a cool post about it featuring a video discussing the tricks used by some participants. https://easyperf.net/blog/2022/05/28/Performance-analysis-an...

Some other participants said that they measured zero difference in runtime between pshufb+eq and eqx3+orx2, but I think your problem has more classes of whitespace, and for the histogram problem, considerations about how to hash all the words in a chunk of the input dominate considerations about how to obtain the bitmasks of word-start or word-end positions.
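For readers who haven't seen the two approaches: below is a scalar Python model of both (the three-byte whitespace set and the 16-byte chunk are assumptions for illustration; real implementations do this 16+ bytes at a time with SSE/AVX intrinsics):

```python
# "eq x3 + or x2": one equality test per whitespace byte value, OR'd together.
def ws_mask_eq(chunk: bytes) -> int:
    mask = 0
    for i, b in enumerate(chunk):
        if b == 0x20 or b == 0x09 or b == 0x0A:  # ' ', '\t', '\n'
            mask |= 1 << i
    return mask

# "pshufb + eq": each byte's low nibble indexes a 16-entry table; a byte is
# whitespace iff the table entry equals the byte itself. 0x80 is a filler no
# ASCII byte can equal, and pshufb zeroes the lane when the input byte's
# high bit is set, which the scalar model reproduces below.
TABLE = [0x80] * 16
TABLE[0x0] = 0x20  # ' '  (low nibble 0)
TABLE[0x9] = 0x09  # '\t' (low nibble 9)
TABLE[0xA] = 0x0A  # '\n' (low nibble 10)

def ws_mask_shuf(chunk: bytes) -> int:
    mask = 0
    for i, b in enumerate(chunk):
        looked_up = 0 if b & 0x80 else TABLE[b & 0x0F]
        if looked_up == b:
            mask |= 1 << i
    return mask

data = b"a b\tc\nd efg hij"
print(hex(ws_mask_eq(data)), hex(ws_mask_shuf(data)))  # 0x8aa 0x8aa
```

The appeal of the pshufb version is that the instruction count stays flat as the byte class grows (up to one byte per distinct low nibble), whereas the compare-and-OR version costs one compare per class member.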


Awesome! The slides with roofline analysis are great! https://docs.google.com/presentation/d/16M90It8nOK-Oiy7j9Kw2...


This requires fully deterministic inference, which turns out to be unusual, but for this sort of thing it's probably fine if you do really slow inference on CPU. Cool idea.


please write your own posts from now on


i love stemming, i love searching for "anime" and getting "animal"
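For what it's worth, a toy suffix-stripper (not a real stemmer, just an illustration) shows how this kind of conflation happens — and if I recall correctly the classic Porter stemmer also maps both of these words to "anim":

```python
def toy_stem(word: str) -> str:
    """A deliberately crude suffix-stripper (hypothetical, for illustration)
    that shows how aggressive stemming conflates unrelated words."""
    for suffix in ("al", "e", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

print(toy_stem("anime"), toy_stem("animal"))  # anim anim
```

Since both queries reduce to the same stem, an index keyed on stems can't tell the searches apart.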


so sad to hear that about Streaming SIMD Extensions


This is true economically, but in reality, if you have much larger cost savings than that for sale, these companies mostly say "we would be happy to buy that for $0, while we pay you a million a year to move to the United States."


Not being sarcastic here: a million a year is not target compensation for an engineer like him. 5-7 is probably where it starts, and it goes to the stars from there.


His bio says he was an Intel Fellow, which is like a VP-level individual-contributor role, and yes, that's what I expected too… but apparently not? These numbers are kinda low.

https://www.levels.fyi/companies/intel/salaries/software-eng...


I'd expect his comp even before Intel to be way above that (he came from Netflix). Perhaps levels.fyi's info is not entirely correct for Intel, or doesn't apply to exceptional hires; Fellow-level compensation at FAANG seems to be more accurate there, though.


If they had, they would know that it involves many weeks of arguing with support, of course.


Justifications upon justifications, man, so glad I no longer run infra.


