This week, Google introduced Gemini, a formidable competitor to OpenAI’s GPT-4, in a lineup of three models of varying size and capability: Ultra, Pro, and Nano. Among these, Gemini Ultra, the most advanced model, tailored for “highly complex tasks,” outperforms GPT-4 in multiple domains, including knowledge of subjects such as history and law, Python code generation, and tasks requiring multi-step reasoning, according to Google’s official announcement.
Gemini’s prowess was demonstrated through its superior performance on the Massive Multitask Language Understanding test (MMLU), often likened to the “SATs for AI models.” Covering 57 subjects, including math, physics, history, law, medicine, and ethics, MMLU serves as a comprehensive measure of both world knowledge and problem-solving skills. Gemini Ultra scored an impressive 90%, surpassing GPT-4, which scored 86.4%.
Notably, Gemini Ultra achieved a milestone by becoming the first model to surpass human experts on MMLU, with its 90% edging out the human-expert score of 89.8%, a result presented as a significant step toward artificial general intelligence (AGI), systems capable of matching complex human capabilities.
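For readers wondering what a score like 90% on MMLU means mechanically, the sketch below shows how accuracy on a multiple-choice benchmark of this kind is typically computed. It is an illustrative Python example only: the MMLUItem structure, the sample question, and the model_answer callback are hypothetical stand-ins, not Google’s or OpenAI’s actual evaluation harness.

```python
# Minimal sketch of MMLU-style scoring: each item is a multiple-choice
# question, and accuracy is the fraction of items where the model picks
# the correct option. All names and data here are illustrative.
from dataclasses import dataclass


@dataclass
class MMLUItem:
    subject: str        # one of the benchmark's 57 subjects, e.g. "high_school_physics"
    question: str
    choices: list[str]  # answer options
    answer: int         # index of the correct choice


def score(items: list[MMLUItem], model_answer) -> float:
    """Return the fraction of questions the model answers correctly."""
    correct = sum(1 for item in items if model_answer(item) == item.answer)
    return correct / len(items)


# Toy usage: one made-up physics question and a "model" that always picks option 0.
items = [
    MMLUItem(
        subject="high_school_physics",
        question="Which quantity is conserved in an elastic collision?",
        choices=["Kinetic energy", "Temperature", "Charge density", "Entropy"],
        answer=0,
    ),
]
print(f"accuracy: {score(items, lambda item: 0):.1%}")  # -> 100.0%
```

Reported MMLU results are this kind of accuracy averaged over thousands of questions across all 57 subjects, which is why small percentage differences, such as 90% versus 86.4%, represent many questions.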
Despite GPT-4’s advantage in common-sense reasoning for everyday tasks, Gemini stands out for its natively multimodal design, built from the ground up to process text, audio, code, images, and video. Google touts this design choice, which sets Gemini apart from other multimodal models, as giving it a more seamless understanding of its inputs.
There is also anticipation, notably from the researchers behind the SemiAnalysis blog, that the sheer computing power behind Gemini could allow it to outpace GPT-4. Early responses to Gemini Pro, the mid-tier model now accessible through Google’s chatbot Bard, have been largely positive, but reports of accuracy problems and hallucinations raise questions about how the trio of Gemini models will ultimately compare to OpenAI’s established presence in consumer awareness.