
How does Google’s Gemini-powered Bard compare with OpenAI’s GPT-4? – The Hindu


December 07, 2023 05:09 pm | Updated 05:09 pm IST
FILE PHOTO: Google released a new version of their chatbot, Bard, that may match if not outshine OpenAI’s ChatGPT. | Photo Credit: Reuters
On Wednesday, Google released a new version of its chatbot, Bard, that may match, if not outshine, OpenAI’s ChatGPT. Powered by Google’s new large language model, Gemini, the chatbot will be available in English in more than 170 countries and territories, including India.
While the demo clip released by Google shows Gemini’s multimodal features, such as analysing images and audio, these capabilities will be rolled out in Bard only at a later date.
Demis Hassabis, who heads Google DeepMind, singled out non-text interactions as the area where Gemini really stands out. Google’s demos included parents uploading their children’s homework so the model could spot mistakes, and YouTuber Mark Rober using Bard to perfect a paper airplane by uploading pictures of his designs for AI feedback.
Google also made a point of sticking to promotional videos rather than a live demo, as it had done while launching Bard; that earlier demo drew a divided reaction owing to a hallucination by the chatbot during CEO Sundar Pichai’s presentation.
Gemini is available in three versions, and Bard is based on Gemini Pro, the mid-tier offering. The current version of Bard is comparable to OpenAI’s popular chatbot ChatGPT, which is built on the GPT-3.5 model, a smaller and less capable model than GPT-4. The more advanced version of ChatGPT, called ChatGPT Plus, is based on GPT-4.
The blog post by Google charted the comparison between Gemini’s models and other prominent AI models, such as Google’s previous flagship PaLM 2, xAI’s Grok and Meta’s Llama 2, besides the GPT models.
Gemini Pro scored 79.13% versus GPT-3.5’s 70% on MMLU, the Massive Multitask Language Understanding benchmark. On the GSM8K benchmark, which judges arithmetic reasoning, Gemini Pro’s 86.5% comfortably beat GPT-3.5’s 57.1%. On HumanEval, which tests code generation, Gemini Pro beat GPT-3.5’s 48.1% with 67.7%. The only benchmark where GPT-3.5 fared better was MATH, where Gemini Pro scored 32.6% against GPT-3.5’s 34.1%.
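The percentages on benchmarks like MMLU, GSM8K and HumanEval are simple accuracy scores: the fraction of benchmark questions a model answers correctly. A minimal sketch of that calculation, using made-up answers rather than any actual benchmark data:

```python
# Sketch of how a multiple-choice benchmark accuracy (MMLU-style)
# is computed: the share of questions the model answers correctly.
# The predictions and answer key below are hypothetical stand-ins.

def accuracy(predictions, answers):
    """Return the fraction of predictions matching the reference answers."""
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)

model_predictions = ["B", "C", "A", "D", "B"]   # hypothetical model outputs
reference_answers = ["B", "C", "A", "A", "B"]   # hypothetical answer key

score = accuracy(model_predictions, reference_answers)
print(f"{score:.2%}")  # 4 of 5 correct -> 80.00%
```

Real leaderboard numbers come from evaluation harnesses that also handle prompting and answer extraction, but the headline figure reduces to this ratio.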
Meanwhile, Google also claims that the yet-to-be-released Gemini Ultra outperformed state-of-the-art models like GPT-4 on 30 out of 32 benchmark tests including reasoning and image recognition.
Once Ultra is out, Bard will evolve further into what Google calls “Bard Advanced.”
Gemini Ultra received a remarkable 90% on Massive Multitask Language Understanding (MMLU), which tests comprehension across 57 subjects spanning STEM, the humanities and more. GPT-4V, on the other hand, reported 86.4%.
On reasoning, Gemini Ultra scored 83.6% on the Big-Bench Hard benchmark of diverse, multi-step reasoning tasks, against GPT-4V’s 83.1%. Gemini Ultra also posted an 82.4 F1 score on the DROP reading comprehension benchmark, while GPT-4V achieved 80.9 (3-shot) on the same test.
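Unlike the accuracy figures above, the DROP number is an F1 score: the harmonic mean of precision and recall computed over the tokens of the predicted answer versus the reference answer. A simplified sketch of token-level F1 (not the official DROP scoring script, which adds normalisation and multi-span handling):

```python
# Token-level F1, the style of metric DROP reports: harmonic mean of
# precision and recall between predicted and reference answer tokens.
# Simplified sketch; the official DROP scorer is more elaborate.
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    common = Counter(pred_tokens) & Counter(ref_tokens)  # multiset overlap
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

# Partial credit: one of two predicted tokens matches the reference.
print(round(token_f1("four touchdowns", "four"), 2))  # 0.67
```

F1 rewards partially correct answers, which suits reading comprehension questions where answers are short free-form spans rather than multiple-choice letters.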
On GSM8K, which covers basic arithmetic manipulations, Gemini Ultra scored 94.4% to GPT-4V’s 92.0%.
Some users put Bard to the test in real time. Bojan Tunguz, a data scientist at AI chipmaker NVIDIA, posted on the microblogging platform X about his experience with the new version of the bot. When asked for recent updates on Israel and Gaza, Bard told Tunguz to use Google Search for the latest updates. Tunguz then posted screenshots of the responses from Grok and ChatGPT, which were detailed and, in ChatGPT’s case, even divided into separate subheads.
Another X user, Ethan Mollick, an associate professor at Wharton, posted about his experience using the latest Bard. Mollick asked it to explain the meaning of entropy as if speaking to third graders. Bard’s response, however, contained factual errors. Mollick then asked both Bard and ChatGPT to fact-check the draft. While ChatGPT caught the hallucinations, Bard instead “corrected” a portion of the draft that was accurate to begin with.

