Beyond ChatGPT: The Next Wave of AI Can See, Hear, and Create Worlds
OpenAI launched GPT-4o in May 2024, offering GPT-4-level multimodal performance at half the API price of GPT-4 Turbo and roughly twice the speed. Google DeepMind’s Gemini and Genie 2, Meta’s AudioCraft and Movie Gen, OpenAI’s Sora, and Runway’s Gen-2 have extended generative AI to video, audio, and 3D environments. Apple’s Vision Pro went on sale in 2024, bringing real-time spatial computing to a consumer headset. Training these models relies on massive multimodal datasets and supercomputing partnerships.