Google's Gemini Omni Cracks AI Video's Text Problem—But at a Cost
Key Takeaways
- ▸Gemini Omni appears to solve AI video's historically difficult text-rendering problem, demonstrated by a viral chalkboard proof-writing clip
- ▸The model is an extension of Google's Veo technology, showing incremental rather than revolutionary advancement
- ▸While text handling is a breakthrough, Omni still shows inconsistencies in complex physical interactions like eating and movement
- ▸High quota consumption suggests video generation will be an expensive premium feature, limiting accessibility for casual users
Summary
A leaked pop-up in Google's Gemini app has revealed Gemini Omni, an unreleased video generation model that appears to crack AI video's historically intractable text-rendering problem. A viral demo shows the model generating a realistic video of a professor writing mathematical proofs on a chalkboard with legible text, natural narration, and convincing physics, a feat that has eluded AI video generators until now. The model appears to be an evolution of Google's existing Veo technology rather than a completely new system, and is expected to be officially announced at Google I/O next week.
While the chalkboard result is impressive, tests reveal Omni still has significant limitations. A prompt for two people eating spaghetti at a restaurant produced obvious physical inconsistencies: spaghetti appeared from nowhere, and the food being eaten did not line up with the bite motions. ByteDance's Seedance 2 delivered more consistent results on the same prompt. Beyond the technology itself, the quota economics are concerning: two video generations consumed 86% of a Google AI Pro subscriber's daily limit, signaling that video generation will come with steep usage costs when officially launched.
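The quota figure reported above lends itself to a quick back-of-the-envelope check. The sketch below is illustrative only: it assumes each generation consumes an equal share of the daily limit (the source gives only the combined 86% for two generations), and the function name `generations_per_day` is hypothetical.

```python
import math


def generations_per_day(quota_used_pct: float, generations: int) -> tuple[float, int]:
    """Estimate per-generation cost and how many generations fit in a daily quota.

    Assumption: every generation consumes an equal share of the quota,
    which the reported figure does not confirm.
    """
    per_generation = quota_used_pct / generations
    # Count only whole generations that fit within 100% of the daily limit.
    max_generations = math.floor(100 / per_generation)
    return per_generation, max_generations


# Reported figure: two generations used 86% of the daily limit.
per_gen, max_gen = generations_per_day(86, 2)
print(f"~{per_gen:.0f}% per generation, {max_gen} full generations per day")
```

Under the equal-cost assumption, each generation costs roughly 43% of the daily limit, so a third generation would not fit within the same day.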
Editorial Opinion
The chalkboard video is genuinely impressive and signals that Google has made real progress on a problem that has limited AI video adoption. However, the gap between the polished text rendering and the spaghetti-test failures shows that Omni is strong in specific areas but not universally better, which positions it as an incremental win rather than a paradigm shift. Most significantly, the quota cost is the elephant in the room. If Google keeps Omni's video generation at this consumption level, early adopters will exhaust their daily limit in about two prompts, making this a tool for enterprise and professional use cases rather than a consumer product. The deciding factor is no longer competition from open-source alternatives; it is how aggressively Google monetizes its video capabilities.



