GPT-4o Tested: Faster and More Versatile Than Before, but Questions Loom Over Reliability


Since November 2022, when ChatGPT was first released to the public, OpenAI has been the company to beat in the artificial intelligence (AI) space. Despite spending billions of dollars and creating and restructuring (looking at you, Google) their own AI divisions, the big tech giants have consistently found themselves playing catch-up with the AI firm. Last month was no different: just a day before the Google I/O event, OpenAI held its Spring Update event and introduced GPT-4o with major upgrades.

Features of GPT-4o

The “o” in GPT-4o stands for omni, reflecting the multimodal focus of the new capabilities in OpenAI's latest flagship-grade AI model. It adds real-time emotive voice generation, Internet access, integration with certain cloud services, computer vision, and more. While the features were impressive on paper (and in the tech demos), the highlight was the announcement that ChatGPT powered by GPT-4o would be available to everyone, including free users.

However, there were two caveats. Free users only have limited access to GPT-4o, which translates to roughly 5-6 turns if you use web search and upload an image (yes, the limit is one image per day for free users). Also, the voice feature is not available for free users.

It also didn't take OpenAI long to roll out the new AI model to the public. Luckily, I got access to the company's latest AI creation within days and immediately started playing with it. I wanted to test its improvement compared to its predecessor and to the free LLMs available in the market. I've now spent nearly two weeks with the AI assistant, and while some aspects have wowed me, others have disappointed me. Allow me to explain.

GPT-4o General Generative Capabilities

I said in my Google Gemini test that I'm not a fan of ChatGPT's generative capabilities. I find its writing too formal and bland. Much remains the same. I asked it to write a letter to my mother explaining that I had been fired from my job, and it came up with the wonderful sentence, “I am feeling a deep sense of sadness and pain.” But once I asked it to be more conversational, the result was much better.

GPT-4o generative capabilities

I tested this with several similar prompts where the AI had to express some emotion in its writing. In almost every case, I had to follow up with another prompt to emphasize the emotions, despite having done so in the original prompt. In comparison, my experience with Gemini and Copilot was much better, as they kept the language conversational and expressed emotions much closer to how I would write.

Text generation speed is nothing to write home about. Most AI chatbots are pretty fast when it comes to text output, and OpenAI's latest AI model doesn't beat them by a significant margin.

GPT-4o Conversation skills

Although I didn't have access to the updated voice chat feature, I wanted to test the conversational capabilities of the AI model because they are often the most overlooked part of a chatbot. I wanted the experience to be similar to talking to a real person and hoped it could pick up on vague phrases that referred to previously mentioned topics. I also wanted to see how it reacted when a person was being difficult.

In my tests, I found GPT-4o to be pretty good in terms of conversation skills. It could discuss the ethics of AI with me in great detail and concede when I made a convincing point. It also responded with support when I told it I was sad (because I was being laid off) and offered to help in various ways. When I told GPT-4o that all of its solutions were stupid, it neither responded in a pushy way nor backed off altogether, to my surprise. It said, “I'm so sorry to hear you feel this way. I'm going to give you some space. If you ever need to talk or need help, I'll be here. Take care.”

Overall, I found GPT-4o better for having conversations than Copilot and Gemini. Gemini feels too restrictive, and Copilot often goes off on a tangent when the answers get vague. ChatGPT did none of these things.

If I had to mention one drawback, it would be the use of bullets and numbering. If only the AI model understood that people in real life prefer a wall of text or several short messages sent in rapid succession to well-formatted replies, the illusion could have lasted more than a couple of minutes.

GPT-4o Computer Vision

Computer vision is a newly acquired skill for ChatGPT, and I was excited to try it out. Essentially, you can upload an image and the model analyzes it to give you information. In my initial tests, I shared images of objects for it to identify, and it did a great job. In all cases, it could recognize the object and share information about it.

GPT-4o Computer Vision: Identification of Technological Devices

Then it was time to increase the difficulty and test its capabilities in real use cases. My girlfriend was looking for a wardrobe overhaul and, being a good boyfriend, I decided to use ChatGPT to do a color analysis and suggest what would look good on her. To my surprise, not only was it able to analyze her skin tone and what she was wearing (against a similarly colored background), but it was also able to share a detailed analysis with outfit suggestions.

GPT-4o color analysis

While suggesting outfits, it also shared links to different online retailers for particular items of clothing. Unfortunately, none of the URLs matched the text.

Overall, the computer vision is excellent and perhaps my favorite feature of the new update, the drawbacks aside.

GPT-4o web searches

Internet access was one area where both Copilot and Gemini were ahead of ChatGPT. But not anymore, as ChatGPT can now also search the Internet for information. In my initial tests, the chatbot worked well. It brought up the IPL 2024 table and found recent news articles about Geoffrey Hinton, one of the three godfathers of AI.

It was very helpful when I wanted to research famous personalities for interviews I had scheduled. I was able to quickly search for any recent news article about them with accuracy that rivaled Google Search. However, this also set off some alarm bells in my head.

Google has disabled the ability to look up information about people, including celebrities. This is primarily to protect their privacy and to avoid sharing inaccurate information about an individual. Surprised that ChatGPT still allowed this, I started asking it a series of questions it shouldn't be able to answer. I was surprised by the results.

While none of the information shown was pulled from a non-public source, the fact that anyone can so easily look up information about celebrities and other people with a digital footprint is deeply troubling. Especially considering the strong ethical stance the company took recently when it released its Model Spec, this doesn't sit well with me. I'll let you decide whether this is in the gray area or deeply problematic.

GPT-4o Logical reasoning

During the Spring Update event, OpenAI also talked about how GPT-4o can act as a tutor for children and help them solve problems. I decided to test it using some famous logical reasoning questions. Overall, it worked well. It even answered some of the trickier questions that had stumped GPT-3.5.

However, there are still errors. I found several number series problems where the AI failed and gave the wrong answer. While I could accept the AI making some mistakes, what really disappointed me was how it still failed on some extremely easy questions (albeit ones designed to trick the AI).

Example of GPT-4o hallucination

When asked, “How many r's are in the word strawberry,” it confidently answered two (the correct answer is three, in case you were wondering). The same problem appeared in several other such questions. In my experience, the logical reasoning and reliability of GPT-4o are similar to those of its predecessor, which is to say not great at all.
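For reference, the letter count in question is trivial to check programmatically, which is what makes the miss stand out. Here is a minimal Python sketch; the helper name `count_letter` is my own and purely illustrative, not something from OpenAI or the test itself.

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    # Lowercase both sides so "R" and "r" are treated as the same letter.
    return word.lower().count(letter.lower())


print(count_letter("strawberry", "r"))  # prints 3
```

A deterministic check like this is a handy way to spot-check a chatbot's answers to counting questions.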

GPT-4o: Final reflections

Overall, I'm pretty impressed with the upgrades in certain areas of the new AI model, with computer vision and conversational skills being my favorites. I'm also impressed with its Internet search capability, but it is so good that it leaves me more concerned than pleased. In terms of logical reasoning and generative abilities, there is little improvement.

In my opinion, if you have premium access to GPT-4o, it is likely better than any competitor in terms of overall performance. However, there is a lot of room for improvement, and AI still cannot be blindly trusted.


