GPT-4 vs GPT-3: How much better is GPT-4?

The field of artificial intelligence is moving so rapidly that before we have fully got to grips with one model, the next one arrives. This is the case with GPT. It was only in November 2022 that GPT-3.5, on which ChatGPT is based, was released. And now, here we are, four months later, with GPT-4.

ChatGPT was, to use the word loosely, revolutionary. It threatened to make the age-old web search obsolete. At the very least, it, like the fabled mouse, disturbed the lion which is Google enough to wake it up from its cosy slumber.

But the analogy ends there. While Google has yet to release to the public Bard, the chatbot it announced after the launch of ChatGPT, OpenAI has already released GPT-4, the successor to GPT-3.5.

What is the difference between GPT-3.5 and GPT-4? How much better is GPT-4 than GPT-3.5? How much more powerful is GPT-4 compared to GPT-3.5? Is it just a piddling upgrade or is the improvement substantial? These are some of the questions we’ll consider in this article.

What makes GPT-4 different from GPT-3.5?

In several respects, GPT-4 is not very different from its predecessor, GPT-3.5. OpenAI acknowledged that, in a casual conversation, “the distinction between GPT-3.5 and GPT-4 can be subtle”. And that is in fact a good thing, for GPT-3.5 is quite good for a number of ordinary use cases. The difference begins to show when the complexity of the task reaches a certain threshold.

There are, however, a number of things that set GPT-4 apart from GPT-3.5. GPT-4 is more reliable and more creative, and it is better at handling nuanced instructions. It is more factually accurate than GPT-3.5 and less prone to “hallucination”, i.e. fabricating facts.

The greatest difference, however, is its ability to accept images as input in addition to text. Besides this expanded modality, it has a significantly increased context length. This means that GPT-4 can take a lot more text as a prompt than GPT-3.5, stay in the conversation for longer without losing context, and give lengthier answers while remaining coherent.

Rather than merely scratching the surface, let’s dig a little deeper and examine the differences between GPT-3.5 and GPT-4.

GPT-4 vs GPT-3.5: Parameters

Let’s start with the one that everyone seems most excited about, and that most people get wrong: the number of parameters in GPT-3.5 and GPT-4.

Parameters are simply variables that machine learning models use to make predictions or decisions based on input data. These variables can be adjusted during the training process to improve the accuracy of the model’s output. Think of it like a recipe for baking a cake—the ingredients and measurements are the parameters; adjusting them can result in a better or worse cake.
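To make the idea concrete, here is a minimal sketch in Python of a model with just two parameters, a weight and a bias, that are adjusted step by step during training. The toy data, the learning rate and the function name are assumptions chosen purely for illustration; a model like GPT works on the same principle of nudging parameters to reduce error, only with billions of them.

```python
def train_linear_model(xs, ys, learning_rate=0.01, steps=5000):
    """Fit y = weight * x + bias by plain gradient descent on mean squared error."""
    weight, bias = 0.0, 0.0  # the model's two parameters, before training
    n = len(xs)
    for _ in range(steps):
        # Gradient of the mean squared error with respect to each parameter.
        grad_w = sum(2 * (weight * x + bias - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (weight * x + bias - y) for x, y in zip(xs, ys)) / n
        # Nudge each parameter in the direction that reduces the error.
        weight -= learning_rate * grad_w
        bias -= learning_rate * grad_b
    return weight, bias


# Toy data generated from y = 2x + 1 (hypothetical, for illustration only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
weight, bias = train_linear_model(xs, ys)
print(f"weight={weight:.2f}, bias={bias:.2f}")
```

After training, the two parameters end up close to the values used to generate the data, which is the cake recipe being tuned until the result comes out right.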

So what is the number of parameters in GPT-4 vs GPT-3.5?

GPT-3.5 is generally reported to have 175 billion parameters, the figure OpenAI published for GPT-3. GPT-4, some claim, has 100 trillion parameters; others are more conservative, saying it could be close to one trillion.

But the answer is that we don’t know, at least not yet; perhaps we never will. OpenAI has not made any statement about the number of parameters in GPT-4 nor about the size of the data it was trained on. The numbers circulating on the internet are purely speculative and could be wide of the mark.

Wired reported in 2021, quoting Andrew Feldman, CEO of Cerebras, that GPT-4 could have about 100 trillion parameters. The actual figure could be a lot lower, however. According to Semafor, GPT-4 has one trillion parameters, which may be closer to reality than the outrageously high number circulating in the rumours.

Whether the number of parameters of GPT-4 is 1 trillion or 100 trillion or anywhere in between, it is likely to be substantially more than the 175 billion parameters attributed to GPT-3.5. And it is, without a doubt, smarter and more capable than GPT-3.5.
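For a sense of scale, the short Python sketch below works out how many times larger GPT-4 would be than GPT-3.5 under each rumoured figure. All of the numbers are the unconfirmed estimates quoted above, not official ones, and the variable and function names are made up for this example.

```python
GPT_3_5_PARAMS = 175e9  # figure quoted in this article

# Unconfirmed estimates quoted in this article, not official numbers.
rumoured_gpt4_params = {
    "Semafor (1 trillion)": 1e12,
    "Feldman via Wired (100 trillion)": 100e12,
}


def size_ratio(candidate, baseline=GPT_3_5_PARAMS):
    """How many times larger the candidate parameter count is than the baseline."""
    return candidate / baseline


for label, params in rumoured_gpt4_params.items():
    print(f"{label}: about {size_ratio(params):.1f}x the size of GPT-3.5")
```

The two rumours differ by a factor of 100, which is a good reminder of how loose these estimates are.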

GPT-4 vs GPT-3.5: Differences in input method

GPT-4 differs from GPT-3.5 in one crucial aspect: the input method. While GPT-3.5 accepts only textual inputs as prompts, GPT-4 will allow visual input alongside text. This significantly enhances the capabilities and use cases of GPT-4 compared to GPT-3.5 and its predecessors and opens the door to virtually endless possibilities.

We would be able to ask ChatGPT with GPT-4 to give us recipes by simply uploading a picture of the content of the fridge, for example. Or we can input a picture with a chart and let it analyse and calculate the data.

We would also be able to use it for more complex tasks, such as generating code for websites and applications by just providing it with sketches of the website or app.

Chat GPT-4 will be able to convert a hand-drawn sketch into a fully-functioning website. The future is here! 🤯 pic.twitter.com/PYIY9BQaQq
— Howard Pinsky (@Pinsky) March 15, 2023

We could also upload a screenshot of a financial statement and ask ChatGPT to provide analysis, budget tips and advice based on it. The multimodal capabilities of GPT-4 make it a lot more user-friendly than GPT-3.5 and expand its utility.

GPT-4 vs GPT-3.5: Factual accuracy

GPT-4 and GPT-3.5 share the same training data cutoff, September 2021, and an AI model will make up facts (or rather fake facts, oxymoronically speaking) about things it has not been trained on. According to OpenAI, however, GPT-4 hallucinates significantly less often than GPT-3.5.

“GPT-4 scores 40% higher than our latest GPT-3.5 on our internal adversarial factuality evaluations”, claimed OpenAI in its release report.

GPT-4 performed better than GPT-3.5 and older models across a range of categories, such as history, maths, code and science.

These improvements in factual accuracy should make it a better tool for research and writing. And its reduced tendency to hallucinate means we can trust it more with critical information such as medical advice.

GPT-4 vs GPT-3.5: Context length

One of the big limitations of ChatGPT running on GPT-3.5 is its tendency to lose track of the conversation as the dialogue goes on, and its inability to handle a massive amount of text. GPT-3 had a context length of 2,049 tokens, which was increased to about 4,096 tokens (or about 3,000 words of English text) in GPT-3.5.

By comparison, the standard GPT-4 variant, called GPT-4-8K, has a context length of 8,192 tokens. Another variant called GPT-4-32K has a context length of 32,768 tokens (equivalent to about 50 pages of text).

This increased context length of GPT-4 compared with GPT-3.5 means that it has a longer “memory” and will be able to follow long conversations without losing the train of thought.
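To translate these token counts into something more tangible, the Python sketch below applies the rule of thumb quoted above (4,096 tokens is about 3,000 English words) to each context length. This is only a rough heuristic: real token counts depend on the tokenizer and on the text itself, and the function names here are made up for the example.

```python
# Rule of thumb taken from this article: 4,096 tokens is about 3,000 English words.
WORDS_PER_TOKEN = 3000 / 4096

# Context lengths as quoted in this article.
CONTEXT_LENGTHS = {
    "GPT-3": 2049,
    "GPT-3.5": 4096,
    "GPT-4-8K": 8192,
    "GPT-4-32K": 32768,
}


def approx_words(tokens):
    """Rough number of English words that fit in a given number of tokens."""
    return round(tokens * WORDS_PER_TOKEN)


def fits_in_context(text, model):
    """Estimate whether a text fits, counting whitespace-separated words."""
    estimated_tokens = len(text.split()) / WORDS_PER_TOKEN
    return estimated_tokens <= CONTEXT_LENGTHS[model]


for model, tokens in CONTEXT_LENGTHS.items():
    print(f"{model}: {tokens} tokens, roughly {approx_words(tokens)} words")
```

By this estimate, a 5,000-word document would be too long for GPT-3.5 but would fit comfortably in GPT-4-8K.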
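Why a fixed context length behaves like a limited “memory” can be sketched in a few lines of Python. The example assumes a simple strategy in which the oldest messages are dropped once the token budget is exceeded; how ChatGPT actually manages its history is not documented here, and the one-token-per-word counter is a hypothetical stand-in for a real tokenizer.

```python
def trim_history(messages, max_tokens, count_tokens):
    """Keep the most recent messages whose combined token count fits the budget."""
    kept = []
    used = 0
    # Walk backwards so the newest messages are kept first.
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > max_tokens:
            break  # everything older than this point is "forgotten"
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


def rough_token_count(message):
    """Hypothetical stand-in for a real tokenizer: one token per word."""
    return len(message.split())


conversation = [
    "My name is Ada and I am planning a trip to Lisbon.",
    "Can you suggest three museums?",
    "Which of those is closest to the river?",
]
print(trim_history(conversation, max_tokens=14, count_tokens=rough_token_count))
print(trim_history(conversation, max_tokens=40, count_tokens=rough_token_count))
```

With the small budget, the opening message about Lisbon no longer fits and is dropped; with the larger budget, the whole conversation is retained. A longer context length simply pushes back the point at which earlier messages fall out of view.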

Multilingual capability of GPT-4 vs GPT-3.5

ChatGPT, being fundamentally a language model, was already very proficient in language, especially English. GPT-4 is markedly better than GPT-3.5 in this respect.

GPT-3.5 has an accuracy of 70.1% whereas GPT-4 has an accuracy of 85.5% on the 3-shot Massive Multitask Language Understanding (MMLU) benchmark. GPT-4 also performed well in languages other than English when the MMLU questions, which are available mostly in English, were translated into other languages.
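“3-shot” means that the model is shown three solved examples before the question it has to answer. The Python sketch below assembles such a prompt for a multiple-choice question. The sample questions and the exact layout are made up for illustration; they are not actual MMLU items, nor necessarily the format OpenAI used in its evaluation.

```python
def build_few_shot_prompt(examples, question, choices):
    """Assemble a k-shot multiple-choice prompt: k solved examples, then the test question."""
    letters = "ABCD"
    blocks = []
    for ex in examples:
        options = "\n".join(f"{letters[i]}. {c}" for i, c in enumerate(ex["choices"]))
        blocks.append(f"Question: {ex['question']}\n{options}\nAnswer: {ex['answer']}")
    # The final question is left unanswered for the model to complete.
    options = "\n".join(f"{letters[i]}. {c}" for i, c in enumerate(choices))
    blocks.append(f"Question: {question}\n{options}\nAnswer:")
    return "\n\n".join(blocks)


# Three hypothetical solved examples, hence "3-shot".
examples = [
    {"question": "What is 2 + 2?", "choices": ["3", "4", "5", "6"], "answer": "B"},
    {"question": "Which planet is closest to the Sun?",
     "choices": ["Venus", "Earth", "Mercury", "Mars"], "answer": "C"},
    {"question": "What is the chemical formula for water?",
     "choices": ["H2O", "CO2", "O2", "NaCl"], "answer": "A"},
]
prompt = build_few_shot_prompt(examples, "How many days are in a week?", ["5", "6", "7", "8"])
print(prompt)
```

The model's accuracy is then simply the share of test questions for which the letter it produces after the final “Answer:” matches the correct one.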

What’s significant is that GPT-4 performed better in some of these languages than GPT-3.5 in English. This will make chatbots based on GPT-4 much more accessible and language translation and learning a lot more fluid.

How much more powerful is GPT-4 than GPT-3.5?

From the above analysis, it’s clear that GPT-4 is more advanced than GPT-3.5. How much better GPT-4 is than GPT-3.5 is hard to pin down precisely. In any case, it depends on who is asking the question.

Some claim that GPT-4 is 10 times more advanced than GPT-3.5; others that it’s 100 times more powerful. Whatever the figure, it is less important than the model’s everyday performance and how much the upgrade improves its utility.

For ordinary users, the difference between GPT-4 and GPT-3.5 may not amount to much. For those who use ChatGPT for more complex tasks such as financial analysis and programming, and those who use the API to integrate GPT-4 into their own products, the increased context length could be a game changer.

Limitations of GPT-4

GPT-4, though better than GPT-3.5 in every aspect, is still fraught with problems and shortcomings. Its knowledge of the world is still limited to September 2021. It still hallucinates, though less so. It sometimes makes simple reasoning errors, and it can be gullible when users give it false information.

And like its predecessors, it can be confidently wrong, making it hard to distinguish when it is giving accurate information and when it is not. Moreover, since the data it has been trained on is disproportionately sourced from the English-speaking world, its answers will reflect that bias, making it less culturally sensitive. The same can be said of gender biases.

Be that as it may, there’s no denying that GPT-4 has numerous abilities and potentialities. And with them come various implications of AI that we’ll analyse in our future articles.

Image sources: OpenAI