Text-to-video AIs like Sora

Báo Thanh niên, 20/02/2024


Sora (OpenAI)

Sora is the most recently announced of these tools but has caused the biggest stir, partly because it comes from OpenAI, the developer of ChatGPT, but mainly because of the quality of the videos it creates from text prompts alone.

OpenAI's work on ChatGPT also gives Sora a deep understanding of language. Demonstration clips show characters moving and expressing emotion as realistically as in live-action footage.

"Hyper-realistic" video created by Sora from text prompts

But Sora is not yet available to the public, for safety reasons. OpenAI will take careful safety measures before a general release, especially given the growing number of people using AI for malicious purposes such as impersonation or illegal activity.

Lumiere (Google)

Lumiere is Google's text-to-video model, built on a diffusion model with a Space-Time U-Net (STUNet) architecture. Rather than stitching still frames together, Lumiere identifies where details are in the video (the spatial part) and tracks how they move and change over time (the temporal part), which helps it produce smooth motion.
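To make the spatial and temporal parts concrete, here is a minimal sketch of shrinking a clip along time as well as along height and width. It is a toy illustration only: Lumiere's real network uses learned layers, and the function name `spacetime_pool` and its block sizes are assumptions made for this example.

```python
def spacetime_pool(video, kt=2, kh=2, kw=2):
    """Average non-overlapping kt x kh x kw blocks of a T x H x W video.

    `video` is a nested list indexed as video[t][y][x] (grayscale values).
    This is a hypothetical helper, not part of any Lumiere code.
    """
    T, H, W = len(video), len(video[0]), len(video[0][0])
    pooled = []
    for t in range(0, T - kt + 1, kt):          # step through time
        plane = []
        for y in range(0, H - kh + 1, kh):      # step through rows
            row = []
            for x in range(0, W - kw + 1, kw):  # step through columns
                block = [
                    video[t + dt][y + dy][x + dx]
                    for dt in range(kt)
                    for dy in range(kh)
                    for dx in range(kw)
                ]
                row.append(sum(block) / len(block))
            plane.append(row)
        pooled.append(plane)
    return pooled


# Four 2x2 grayscale frames whose brightness rises over time.
video = [[[t, t], [t, t]] for t in range(4)]
compact = spacetime_pool(video)
print(compact)
```

Each output value summarizes a small region across two consecutive frames, so motion and appearance are compressed together instead of frame by frame.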

Like Sora, Lumiere has yet to be released to the public. Google only introduced it in late January 2024, following the release of Gemini, the large language model that has since been integrated with Bard.

VideoPoet (Google)

This large language model (LLM), developed by Google Research in 2023, is trained on a huge repository of videos, images, audio and text. VideoPoet can take inputs such as text, images and videos and use them to create videos, highlight content, convert videos to audio, turn still images into animations, and more.

The original idea for VideoPoet stemmed from the goal of turning any autoregressive language model into a video generation system. Current autoregressive language models handle text and programming code fluently, but they struggle with video. VideoPoet addresses this by using tokenization to convert input of any format into discrete tokens that the model can process.
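The tokenization idea can be sketched with a toy example that places text and image data in one shared sequence of integer tokens. This is an assumption-laden illustration, not VideoPoet's method: real systems use learned tokenizers, and every name and constant below (`tokenize_text`, `tokenize_frame`, `build_sequence`, the vocabulary sizes) is hypothetical.

```python
# Toy illustration only: real systems use learned tokenizers, not these rules.
TEXT_VOCAB = 256                      # one token id per UTF-8 byte (0..255)
PIXEL_BINS = 16                       # grayscale intensities quantized to 16 levels
PIXEL_OFFSET = TEXT_VOCAB             # pixel tokens occupy ids 256..271
BOS_TEXT = PIXEL_OFFSET + PIXEL_BINS  # 272: marks the start of a text segment
BOS_IMAGE = BOS_TEXT + 1              # 273: marks the start of an image segment


def tokenize_text(text: str) -> list[int]:
    """Map text to byte-level token ids, prefixed with a text marker."""
    return [BOS_TEXT] + list(text.encode("utf-8"))


def tokenize_frame(pixels: list[int]) -> list[int]:
    """Quantize 8-bit grayscale pixels and shift them past the text ids."""
    return [BOS_IMAGE] + [PIXEL_OFFSET + p * PIXEL_BINS // 256 for p in pixels]


def build_sequence(prompt: str, frames: list[list[int]]) -> list[int]:
    """Concatenate all modalities into one flat token sequence."""
    sequence = tokenize_text(prompt)
    for frame in frames:
        sequence += tokenize_frame(frame)
    return sequence


tokens = build_sequence("cat", [[0, 128, 255], [16, 32, 64]])
print(tokens)
```

Because every modality ends up as integers from a single vocabulary, a next-token predictor can in principle be trained on the combined sequence without knowing which ids came from text and which from pixels.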

Most text-to-video tools are still in limited testing

Emu Video (Meta)

Besides Google and OpenAI, Meta is another Big Tech company active in AI development. Facebook's parent company has built a video-generation AI called Emu Video, which first generates an image from the text prompt and then uses both the text and that image to create a clip.

Emu Video has received positive reviews from human evaluators, with 81% preferring it over Imagen Video (Google). Over 90% chose Meta's model over PYOCO (Nvidia), and 96% preferred it to Meta's own earlier Make-A-Video.

CogVideo (Tsinghua University, China)

Unlike the models above, which all come from the world's leading technology companies, CogVideo was developed by a research team at Tsinghua University, one of the most prestigious universities in China and in Asia. The program is based on CogView2, a pre-trained text-to-image model.

Computer art expert Glenn Marshall, who tested CogVideo, said that "directors could lose their jobs." His clip The Crow, which he created with the help of CogVideo, received high praise and was nominated for a British Academy Film Award (BAFTA).


