You can do a rough distill through the APIs. You don't need the weights.
It was much easier when companies exposed their models through /completion-style APIs, because you could actually get the logits for each generation step and use them as a dataset to fit your model to.
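To make the "fit your model to the logits" part concrete, here is a minimal sketch of a per-step distillation loss. It assumes the API returns only the top-k token log-probabilities for each generation step (not the full logit vector), and that every one of those tokens exists in the student's vocabulary. The names `distill_loss`, `teacher_top_logprobs`, and `student_logits` are hypothetical, made up for illustration.

```python
import math


def log_softmax(logits):
    """Turn a {token: logit} dict into {token: log-probability}."""
    m = max(logits.values())  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(v - m) for v in logits.values()))
    return {tok: v - log_z for tok, v in logits.items()}


def distill_loss(teacher_top_logprobs, student_logits):
    """Cross-entropy of the student against the teacher for one generation step.

    teacher_top_logprobs: {token: logprob} for the top-k tokens from the API
                          (assumed format, not any specific vendor's schema).
    student_logits:       {token: logit} over the student's vocabulary.
    """
    # The top-k probabilities do not sum to 1, so renormalize them.
    probs = {tok: math.exp(lp) for tok, lp in teacher_top_logprobs.items()}
    total = sum(probs.values())
    student_logp = log_softmax(student_logits)
    return -sum((p / total) * student_logp[tok] for tok, p in probs.items())


# Toy step: the teacher splits its probability evenly between two tokens.
teacher = {"cat": math.log(0.5), "dog": math.log(0.5)}
matched = distill_loss(teacher, {"cat": 0.0, "dog": 0.0})
skewed = distill_loss(teacher, {"cat": 2.0, "dog": 0.0})
print(matched, skewed)
```

The loss is lowest when the student's distribution matches the teacher's, so averaging it over many collected steps gives you a training objective. Without per-step log-probabilities you only have sampled text, which is a much weaker signal.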
That isn't to diminish the efforts of the Chinese developers though; they are great.
They just lack performant hardware. They have enough knowledge. And so they choose a more effective strategy instead of wasting resources on training from scratch.
I am giving my 6-year-old girl an old Acer netbook that boots directly to PICO-8.
This will be her first computing experience. She never had access to phones or tablets.
I must be holding it wrong then, because I do use Claude Code all the time and I do think it's quite impressive… Still, I can't see where the productivity gains go, nor am I even sure they exist (they might, I just can't tell for sure!).
Sure. But am I supposed to still understand that code at some point? Am I supposed to ask other team members to review and approve that code as if I had written it?
I'm still trying to ship quality work by the same standards I had 3 or 5 years ago.
No, not worse code. Wrong code. Code filled with bugs. Code filled with lawsuits too.
Code that makes you look productive this month while you prepare to leave the company, and turns out to be absolute pooopoo the day after you leave.
I think there might be something here! A core of truth about what the future might hold. I can't take this approach right now though. It's not a good approach today.
I’ve always had this weird intuition that Zeno’s Arrow Paradox is some indication that there must be some discreteness. Somehow, somewhere, there must be a ‘tick’.