Vietnam.vn - Platform for promoting Vietnam

Text-to-video AIs like Sora

Báo Thanh niên - 20/02/2024


Sora (OpenAI)

Sora is the most recently announced of these tools, yet it has caused the biggest stir, partly because it comes from OpenAI, the well-known developer of ChatGPT, but mainly because of the quality of the videos it creates from text prompts alone.

OpenAI's success with ChatGPT also gives Sora a deep understanding of language. Clips demonstrating Sora's capabilities show characters moving and expressing themselves as realistically as in a film made by people.

"Hyper-realistic" video created by Sora from text prompts

For safety reasons, however, Sora is not yet available to the public. OpenAI will take careful precautions before a general release, especially given the growing number of people who use AI for malicious purposes, impersonation, or illegal activity.

Lumiere (Google)

Lumiere is a Google product that also generates videos from text input. It is built on a diffusion model with a Space-Time U-Net (STUNet) architecture. Rather than stitching still frames together, Lumiere identifies the details in a video (the spatial part) and tracks how they move and change over time (the temporal part), which helps the result come out smoothly.

Like Sora, Lumiere has not yet been released to the public. Google only introduced it in late January 2024, after launching Gemini, the large language model that was recently merged with Bard.
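The idea of handling space and time together can be illustrated with a toy example. The sketch below is not Lumiere's network; it only shows, under simplifying assumptions, what it means to compress a clip along the time axis and both spatial axes in one step instead of frame by frame. The function names (`make_video`, `downsample_space_time`, `shape`) and the block-averaging scheme are hypothetical and chosen for illustration.

```python
from statistics import mean

def make_video(frames, height, width):
    """Build a toy grayscale video as a nested list indexed [t][y][x]."""
    return [[[float(t + y + x) for x in range(width)]
             for y in range(height)]
            for t in range(frames)]

def downsample_space_time(video, t_factor=2, s_factor=2):
    """Average non-overlapping blocks across time AND space at once.

    Illustrative assumption: the clip is compressed along all three
    axes together, rather than each frame being shrunk on its own.
    """
    frames, height, width = len(video), len(video[0]), len(video[0][0])
    out = []
    for t0 in range(0, frames - t_factor + 1, t_factor):
        plane = []
        for y0 in range(0, height - s_factor + 1, s_factor):
            row = []
            for x0 in range(0, width - s_factor + 1, s_factor):
                # One block spans several frames and several pixels.
                block = [video[t][y][x]
                         for t in range(t0, t0 + t_factor)
                         for y in range(y0, y0 + s_factor)
                         for x in range(x0, x0 + s_factor)]
                row.append(mean(block))
            plane.append(row)
        out.append(plane)
    return out

def shape(video):
    """Return (frames, height, width) of a nested-list video."""
    return (len(video), len(video[0]), len(video[0][0]))

video = make_video(frames=8, height=4, width=4)
small = downsample_space_time(video)
print(shape(video), "->", shape(small))
```

Each value in the smaller clip summarizes a small patch of pixels over two consecutive frames, so information about motion is kept inside the compressed representation rather than being reconstructed afterwards.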

VideoPoet (Google)

VideoPoet is a large language model (LLM) developed by Google Research in 2023 and trained on a huge repository of videos, images, audio, and text. It accepts inputs such as text, images, and videos, and can perform a range of tasks: creating videos, highlighting content, converting video to audio, and turning still images into animations.

The original idea for VideoPoet came from the need to turn any autoregressive language model into a video generation system. Current autoregressive language models can process text and programming code much as humans do, but they struggle with video. VideoPoet addresses this by using tokenization to convert input of any format into a sequence of tokens the model can process.
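A minimal sketch can make the tokenization idea concrete. VideoPoet's actual tokenizers are not described in this article, so everything below is a hypothetical stand-in: a tiny hand-written text vocabulary (`TEXT_VOCAB`), a crude pixel quantizer (`PIXEL_BINS`), and made-up marker tokens. The only point is that text and video both end up as integers in one shared sequence, which is the kind of input an autoregressive model consumes.

```python
# Hypothetical toy vocabulary; real systems learn far larger ones.
TEXT_VOCAB = {"<text>": 0, "<video>": 1, "a": 2, "cat": 3, "runs": 4}
PIXEL_BINS = 4                      # assumed number of visual codes
VISUAL_OFFSET = len(TEXT_VOCAB)     # visual ids start after the text ids

def tokenize_text(sentence):
    """Map known words to integer ids (toy word-level tokenizer)."""
    return [TEXT_VOCAB[word] for word in sentence.lower().split()]

def tokenize_frames(frames):
    """Quantize pixel intensities in [0, 1] into a few discrete codes."""
    tokens = []
    for frame in frames:
        for pixel in frame:
            code = min(int(pixel * PIXEL_BINS), PIXEL_BINS - 1)
            tokens.append(VISUAL_OFFSET + code)
    return tokens

def build_sequence(sentence, frames):
    """Concatenate both modalities into one sequence of integer tokens."""
    return ([TEXT_VOCAB["<text>"]] + tokenize_text(sentence)
            + [TEXT_VOCAB["<video>"]] + tokenize_frames(frames))

frames = [[0.0, 0.3], [0.6, 1.0]]   # two tiny 2-pixel "frames"
sequence = build_sequence("a cat runs", frames)
print(sequence)
```

Because the visual ids are offset past the text ids, the two modalities never collide, and a single next-token predictor could in principle be trained over the combined sequence.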

Most text-to-video tools are still in limited testing

Emu Video (Meta)

Alongside Google and OpenAI, Meta is another Big Tech company active in generative AI. The owner of Facebook has developed a video-making AI called Emu Video, which first turns the text prompt into an image and then uses that image, together with the text, to create a clip.

Emu Video has received positive reviews from testers: 81% preferred it over Imagen Video (Google), more than 90% chose it over PYOCO (Nvidia), and 96% preferred it over Meta's own Make-A-Video.

CogVideo (Tsinghua University, China)

Unlike the models above, which all come from the world's leading technology companies, CogVideo was developed by a research team at Tsinghua University, one of the most prestigious universities in China and in Asia. The program is based on CogView2, a pre-trained text-to-image model.

Computer art expert Glenn Marshall, who tested CogVideo, said that "directors could lose their jobs." The Crow, a clip he created with the help of CogVideo, received high praise and was nominated for a British Academy Film Award (BAFTA).


