Beyond ChatGPT: The Next Wave of AI Can See, Hear, and Create Worlds
OpenAI released GPT-4o in May 2024 as a natively multimodal model that accepts text, audio, image, and video input and can generate text, audio, or images, delivering GPT-4-level performance at roughly 50% lower cost and faster speed. Google DeepMind Gemini, unveiled in December 2023, ships as Gemini 1.0 Ultra, Pro, and Nano and has exceeded state-of-the-art on 30 of 32 academic tasks, including 90% on the MMLU benchmark. Meta’s AudioCraft, announced in 2023, includes MusicGen for music and AudioGen for sound effects and was open-sourced, while MovieGen, released by late 2024, can generate short videos with audio up to 16