GPT-4 is more than 10x the size of GPT-3. We believe it has a total of ~1.8 trillion parameters across 120 layers.
Mixture Of Experts - Confirmed.
OpenAI was able to keep costs reasonable by utilizing a mixture of experts (MoE) model.
They utilize 16 experts within the model, each with ~111B parameters for the MLP; 2 of these experts are routed to per forward pass. (16 × ~111B already accounts for the bulk of the ~1.8 trillion total.)
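To make the routing pattern concrete, here is a minimal sketch of a top-2 gated MoE MLP layer, assuming a standard softmax router; the module name, dimensions, and expert structure are illustrative assumptions, not OpenAI's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoE(nn.Module):
    """Illustrative top-2 mixture-of-experts MLP layer (hypothetical, not OpenAI's code)."""
    def __init__(self, d_model=512, d_hidden=2048, n_experts=16):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)  # gating network scores every expert
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.GELU(),
                nn.Linear(d_hidden, d_model),
            )
            for _ in range(n_experts)
        )

    def forward(self, x):
        # x: (num_tokens, d_model)
        gate_logits = self.router(x)
        # Keep only the 2 highest-scoring experts per token.
        top2_vals, top2_idx = gate_logits.topk(2, dim=-1)
        top2_weights = F.softmax(top2_vals, dim=-1)

        out = torch.zeros_like(x)
        for slot in range(2):
            idx = top2_idx[:, slot]
            w = top2_weights[:, slot].unsqueeze(-1)
            # Dispatch each token only to its selected expert, weighted by the gate.
            for e, expert in enumerate(self.experts):
                mask = idx == e
                if mask.any():
                    out[mask] += w[mask] * expert(x[mask])
        return out

# Usage: only 2 of the 16 expert MLPs run for any given token,
# so active parameters per forward pass are far below the total parameter count.
layer = Top2MoE()
tokens = torch.randn(4, 512)
print(layer(tokens).shape)  # torch.Size([4, 512])
```

This is the basic cost trick of MoE: total parameters scale with the number of experts, but per-token compute scales only with the number of experts routed to (here, 2).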