Sora (OpenAI)
Sora is the most recently announced of these tools but has caused the biggest stir, partly because it comes from OpenAI, the developer of ChatGPT, and mainly because of the quality of the videos it creates from text prompts alone.
OpenAI’s experience with ChatGPT also gives the model a strong grasp of language. Clips demonstrating Sora’s capabilities show characters moving and expressing emotion in a way that looks close to live-action footage.
"Surrealistic" video created by Sora from text commands
For safety reasons, however, Sora isn’t available to the public yet. OpenAI will take careful safety measures before a general release, especially given the growing number of people using AI for malicious purposes, impersonation, or illegal activity.
Lumiere (Google)
Lumiere is a Google model that also generates videos from text input. It is built on a diffusion model with a Space-Time U-Net (STUNet) architecture. Rather than stitching still frames together, Lumiere identifies the details in the video (the spatial part) and at the same time tracks how they move and change (the temporal part), which helps it produce smooth motion.
Like Sora, Lumiere has yet to be released to the public. Google only introduced it in late January 2024, following the release of Gemini, a large language model that has recently been integrated into Bard.
VideoPoet (Google)
VideoPoet is a large language model (LLM) developed by Google Research in 2023 and trained on a huge collection of videos, images, audio, and text. It accepts inputs such as text, images, and videos, and can use them to create videos, highlight content, convert video to audio, and turn still images into animations.
The original idea for VideoPoet stemmed from the goal of turning any autoregressive language model into a video generation system. Current autoregressive language models handle text and programming code well, but they struggle with video. VideoPoet addresses this by using tokenization to convert input of any format into discrete tokens, a form the model can process.
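The tokenization idea can be illustrated with a toy sketch. This is not VideoPoet's actual tokenizer: the vocabulary sizes, the byte-level text encoding, the pixel quantization, and every function name below are assumptions made up for illustration. It only shows the general principle that text and image data can both be mapped into one shared sequence of integer tokens.

```python
# Toy illustration only: these tokenizers are hypothetical stand-ins,
# not VideoPoet's actual tokenizers or vocabulary.

TEXT_VOCAB_SIZE = 256   # assumption: one token per byte of UTF-8 text
PIXEL_LEVELS = 16       # assumption: each pixel is quantized into 16 buckets

# Special tokens marking where each modality starts,
# placed after the text and image id ranges.
BOS_TEXT = TEXT_VOCAB_SIZE + PIXEL_LEVELS
BOS_IMAGE = BOS_TEXT + 1


def tokenize_text(text: str) -> list[int]:
    """Map text to token ids in [0, 256) using its UTF-8 bytes."""
    return list(text.encode("utf-8"))


def tokenize_image(pixels: list[list[int]]) -> list[int]:
    """Map 8-bit grayscale pixels to ids in [256, 272), row by row."""
    bucket = 256 // PIXEL_LEVELS
    return [TEXT_VOCAB_SIZE + value // bucket for row in pixels for value in row]


def build_sequence(text: str, pixels: list[list[int]]) -> list[int]:
    """Concatenate both modalities into one token sequence."""
    return [BOS_TEXT, *tokenize_text(text), BOS_IMAGE, *tokenize_image(pixels)]


sequence = build_sequence("cat", [[0, 255], [128, 64]])
print(sequence)
```

Because every input ends up as integers from one shared vocabulary, a single autoregressive model could in principle be trained to predict the next token regardless of which modality it came from.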
Tools for creating videos from text are mostly still in limited testing
Emu Video (Meta)
In addition to Google and OpenAI, Meta is another Big Tech company active in AI development. The owner of Facebook has developed a video-generation AI called Emu Video, which first generates an image from the text prompt and then uses both the text and the image to create a clip.
Emu Video has received positive reviews from testers: 81% preferred it over Imagen Video (Google), over 90% preferred it over PYOCO (Nvidia), and 96% preferred it over Meta’s own Make-A-Video.
CogVideo (Tsinghua University, China)
Unlike the models above, which are all products of the world's leading technology companies, CogVideo was developed by a research team at Tsinghua University, one of the leading universities in China and Asia. The model builds on CogView2, a pre-trained text-to-image model.
Computer art expert Glenn Marshall, who tested CogVideo, said that "directors could lose their jobs." His clip The Crow, created with the help of CogVideo, received high praise and was nominated for a British Academy Film Award (BAFTA).