Elon Musk’s xAI Unveils Grok 1.5 Vision AI Model in Preview, To Compete With GPT-4 Vision and Gemini Pro 1.5

Elon Musk's artificial intelligence (AI) firm xAI has unveiled a new AI model called Grok 1.5 Vision. This large language model (LLM) is an upgraded version of the recently released Grok 1.5 model. With this update, the AI model gains computer vision capabilities, which means it can accept visual media as input, process images, and answer questions about them. Notably, the announcement came just days after OpenAI unveiled its own vision-capable GPT-4 model.

xAI announced the model through its official account on X (formerly known as Twitter), sharing a blog post that details the new AI model along with some of its benchmark scores. Since the vision capabilities were added to the newly introduced Grok 1.5 model, most of the details remain unchanged. It has the same context window of 128,000 tokens, and its overall benchmark scores are also likely to remain the same.

xAI also shared the scores Grok 1.5 Vision achieved on a benchmark developed by the company itself. The AI firm calls it the RealWorldQA benchmark, and says it measures "real-world spatial understanding." The company also tested the model on other benchmarks such as MMMU, MathVista, ChartQA, and more. While Grok outperformed OpenAI's GPT-4 with Vision and Gemini 1.5 Pro on RealWorldQA, it scored lower on MMMU and ChartQA.

For the uninitiated, computer vision is a branch of computer science that deals with equipping computers (and AI models) with the ability to identify and understand real-world objects in images and videos. The goal is to help computers see and process visual signals the way humans do. With the rise of multimodal AI models, many companies are now focusing on developing vision-centric models. Google's Gemini 1.5 Pro and OpenAI's GPT-4 with Vision both have this capability.

This technology also has a wide range of applications. Indian calorie tracking and nutrition feedback platform Healthify recently added a feature called Snap, where users can take a picture of a food item or dish, and an AI chatbot powered by GPT-4 with Vision suggests how to make the recipe healthier and how much exercise is needed to burn off the extra calories. In the future, AI models with computer vision may help diagnose diseases, build self-driving cars, and more.
