Hacker News
GPT4 Has 1T Parameters (the-decoder.com)
32 points by mr-ai on March 25, 2023 | hide | past | favorite | 15 comments


I don't trust this. The article cites Semafor (https://www.semafor.com/article/03/24/2023/the-secret-histor...), but Semafor states the 1T parameter count without citing any source.


Yeah, seems spotty. Especially considering the recent "Chinchilla scaling" laws suggesting training-set size is generally the current bottleneck, the mileage LLaMA/Alpaca gets out of 7B/13B, the huge inference cost of 1T parameters, etc.


Yeah, I'm highly suspicious too. Even the arXiv article from the MS researchers doesn't have specifics about the number of parameters in GPT-4.


Sam Altman said it had 1T parameters on the Lex Fridman podcast.


Has anyone seen a massive delta between GPT3.5 & 4?

For my use cases (writing code), I can't seem to detect any difference in performance. Certainly not 6x or whatever the actual figure is.


I've had significant improvements with getting more complex boilerplate code out of it; it practically wrote a Flask webapp with an HTML/JS UI for me with minimal corrections. I was pasting the entire codebase into each new context, giving my instructions, and getting complete and working modifications back. Anecdotally, GPT-3 was more error-prone on simple coding tasks, though GPT-4 has its slip-ups.

Entirely anecdotal: The more I use it, the more it seems somewhat... sporadically drunk? My present use-case is character-acting, I can 'feel' when the conversation has hit up against some sort of filter or barrier because characters lose track of what is happening, fail to follow basic instructions, and the logic of the conversation breaks down (who knows what, what has been exchanged, etc.), even in fairly short and well-structured conversations. The effect is greatly heightened if the conversation broaches uncomfortable topics, or if the character GPT is playing has a personality other than 'helpful assistant'.

Honestly, the content is worse than early GPT-3.5. As they built protections against jailbreaking and imitation, they also necessarily protected against role-playing. The characters were initially wilful and human; now they are very docile and bot-like, and stop making sense if their personality contradicts the chatbot's.

I'll be using open models for content generation in the future, and ChatGPT for other semantic layers.


>because characters lose track of what is happening

Just checking: you do know there's a token context window, right? Pretty sure it's 4,000 tokens in the UI, and once you exceed that, tokens get dropped, which can lead to some weird forgetting.

And, yes, they've made it far more bot-like, but I do think that's deliberate: they're building for line-of-business applications rather than for 'lower-paying' individual user tasks.
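The forgetting behavior described above can be pictured as a drop-oldest truncation over the conversation. A toy sketch, assuming a simple drop-oldest-first policy (the actual truncation logic ChatGPT uses isn't public, and `truncate_context` is a hypothetical name, not an OpenAI API):

```python
def truncate_context(messages, max_tokens=4000):
    """Keep the most recent messages that fit within max_tokens.

    messages: list of (role, token_count) tuples, oldest first.
    Assumes a hypothetical drop-oldest policy for illustration.
    """
    kept = []
    total = 0
    # Walk backwards from the newest message, keeping what fits.
    for role, n_tokens in reversed(messages):
        if total + n_tokens > max_tokens:
            break  # everything older than this is dropped
        kept.append((role, n_tokens))
        total += n_tokens
    return list(reversed(kept))  # restore oldest-first order

history = [("system", 50), ("user", 1500), ("assistant", 1800), ("user", 1200)]
print(truncate_context(history))
# The system prompt and the earliest user message fall out of the window,
# which is exactly the "characters lose track of what is happening" effect.
```

Under this policy, early setup messages (e.g. the character description) are the first casualties once the conversation grows, which would explain characters losing their personality in longer chats.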


6x bigger than GPT-3.5, as per multiple anonymous sources.

And 100x smaller than the 100T meme that went around before release, which, it was speculated, would have cost too much to run and been too slow.


They need to make GPT-4 capable of gathering new information and updating itself further. Each time you make the model bigger, it needs more information, but they're limiting themselves to whatever data they've currently got from 2021. It should be reading the entire internet in real time and swallowing it up.


I think they learned from Microsoft Tay what happens if you train AI online without a curation process.


I think it shouldn't be indiscriminate, but I don't see any issue with letting it have up to date sources on Wikipedia and GitHub for instance. It's nice that I can ask it some questions about random open source projects but the info it has is 2 years out of date.


This article looks like it was written by ChatGPT...


Is this the first model to reach 1 trillion params?



Cool!



