Yeah, seems spotty. Especially considering the recent "Chinchilla scaling" results suggesting training-set size, not parameter count, is generally the current bottleneck, the mileage LLaMA/Alpaca gets out of 7B/13B parameters, the huge inference cost a 1T-parameter model would carry, etc.
I've had significant improvements getting more complex boilerplate code out of it; it practically wrote a Flask webapp with an HTML/JS UI for me with minimal corrections. I was pasting the entire codebase into each new context, giving my instructions, and getting complete, working modifications back. Anecdotally, GPT-3 was more error-prone on simple coding tasks, though GPT-4 has its slip-ups.
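For a sense of scale, the boilerplate in question was roughly this shape: a Flask backend serving a small HTML/JS page that talks to a JSON endpoint. This is a minimal sketch, not the actual code from the thread; the route names and the in-memory `notes` store are made up for illustration.

```python
# Hypothetical sketch of the kind of Flask + HTML/JS boilerplate described
# above. The /notes endpoint and in-memory list are illustrative only.
from flask import Flask, jsonify, request

app = Flask(__name__)
notes = []  # in-memory store; a real app would use a database

PAGE = """<!doctype html>
<html><body>
<input id="note"><button onclick="add()">Add</button>
<ul id="list"></ul>
<script>
async function add() {
  const text = document.getElementById('note').value;
  await fetch('/notes', {method: 'POST',
    headers: {'Content-Type': 'application/json'},
    body: JSON.stringify({text})});
  refresh();
}
async function refresh() {
  const res = await fetch('/notes');
  const items = await res.json();
  document.getElementById('list').innerHTML =
    items.map(t => `<li>${t}</li>`).join('');
}
refresh();
</script>
</body></html>"""

@app.route("/")
def index():
    # Serve the static page; the JS above does the rest client-side.
    return PAGE

@app.route("/notes", methods=["GET", "POST"])
def notes_api():
    # POST appends a note, GET returns the full list as JSON.
    if request.method == "POST":
        notes.append(request.get_json()["text"])
        return jsonify(ok=True)
    return jsonify(notes)

if __name__ == "__main__":
    app.run(debug=True)
```

Trivial as it looks, this is exactly the sort of glue code that is tedious to type and easy for the model to get right when you hand it the whole codebase each turn.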
Entirely anecdotal: the more I use it, the more it seems somewhat... sporadically drunk? My present use case is character-acting, and I can 'feel' when the conversation has hit some sort of filter or barrier, because characters lose track of what is happening, fail to follow basic instructions, and the logic of the conversation breaks down (who knows what, what has been exchanged, etc.), even in fairly short and well-structured conversations. The effect is greatly heightened if the conversation broaches uncomfortable topics, or if the character GPT is playing has a personality other than 'helpful assistant'.
Honestly, the output is worse than early GPT-3.5: as they built protections against jailbreaking and imitation, they also necessarily protected against role-playing. The characters were initially wilful and human; now they are very docile and bot-like, and stop making sense if their personality contradicts the chatbot's default persona.
I'll be using open models for content generation in the future, and ChatGPT for other semantic layers.
>because characters lose track of what is happening
Just checking, you do know there is a token context window, right? Pretty sure it's 4000 tokens on the UI, and once you exceed that, the oldest tokens get dropped, which can lead to some weird forgetting.
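In other words, the "forgetting" can be purely mechanical: the model only ever sees the most recent turns that fit in the window. A toy sketch of that truncation, using whitespace splitting as a stand-in for the real BPE tokenizer (the 4000-token figure is just the limit mentioned above, not an official number):

```python
# Toy illustration of context-window truncation: keep only the most
# recent messages whose combined "token" count fits in the window.
# Whitespace splitting stands in for the real tokenizer here.
def truncate_to_window(messages, max_tokens=4000):
    """Drop oldest messages until the remainder fits in the window."""
    kept, total = [], 0
    for msg in reversed(messages):      # walk newest-first
        n = len(msg.split())            # crude per-message token count
        if total + n > max_tokens:
            break                       # everything older is dropped
        kept.append(msg)
        total += n
    return list(reversed(kept))         # restore chat order

# 100 turns of ~102 "tokens" each; with a 500-token window only the
# last few turns survive, so the character "forgets" everything else.
chat = ["turn %d: %s" % (i, "word " * 100) for i in range(100)]
window = truncate_to_window(chat, max_tokens=500)
```

This also explains why the breakdown feels sudden: nothing degrades gradually, the early turns simply vanish from the prompt.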
And, yes, they've made it far more bot-like, but I do think that's deliberate: they're building a line-of-business application rather than serving 'lower-paying' individual users.
They need to make GPT-4 capable of gathering new information and updating itself further. Each time you make the model bigger, it needs more training data, but they're limiting themselves to whatever data they've got from 2021. It should be reading the entire internet in real time and swallowing it up.
I think it shouldn't be indiscriminate, but I don't see any issue with letting it have up-to-date sources like Wikipedia and GitHub, for instance. It's nice that I can ask it questions about random open source projects, but the info it has is two years out of date.