Vietnam.vn - Platform for promoting Vietnam

New AI tool creates high-quality photos, 9 times faster

Scientists from MIT and NVIDIA have successfully developed HART - a tool that creates high-quality images at an exceptionally fast speed, while consuming so few resources that it can run directly on a laptop or smartphone.

VietNamNet, 26/03/2025

This image of an astronaut riding a horse was created using two types of generative AI models. Photo: MIT News


When speed and quality are no longer trade-offs

In the field of AI imaging, there are currently two main approaches:

Diffusion models allow for sharp, detailed images. However, they are slow and computationally intensive, requiring dozens of processing steps to remove noise from each pixel.

Autoregressive models are much faster because they predict small parts of an image sequentially. But they often produce images with less detail and are prone to errors.

HART (hybrid autoregressive transformer) combines both, providing the “best of both worlds”. It first uses an autoregressive model to capture the overall picture as a sequence of discrete tokens. Then, a lightweight diffusion model takes over to predict residual tokens, which restore the fine detail lost during encoding.

The resulting images match or exceed the quality of those from state-of-the-art diffusion models, yet HART generates them about nine times faster and uses about 31% less computation.

New approach to creating quality images at high speed

One of the notable innovations of HART is how it solves the problem of information loss when using autoregressive models. Converting images into discrete tokens speeds up the process, but also loses important details such as object edges, facial features, hair, eyes, mouths, etc.
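The information loss described above can be illustrated with a toy example. The sketch below is not HART's tokenizer: it assumes a four-value "image" and a hypothetical codebook of four evenly spaced levels, and the helper names `quantize` and `dequantize` are made up for illustration. It only shows that rounding to discrete tokens discards a small remainder per value, and that storing this remainder (the residual) allows the original to be restored.

```python
def quantize(pixels, levels):
    """Map each value in [0, 1] to the index of the nearest of `levels` evenly spaced codes."""
    return [round(p * (levels - 1)) for p in pixels]


def dequantize(tokens, levels):
    """Map token indices back to their code values in [0, 1]."""
    return [t / (levels - 1) for t in tokens]


LEVELS = 4                          # hypothetical codebook size
pixels = [0.03, 0.48, 0.52, 0.97]   # toy "image" values

tokens = quantize(pixels, LEVELS)                     # discrete tokens (coarse picture)
coarse = dequantize(tokens, LEVELS)                   # what the tokens alone can express
residual = [p - c for p, c in zip(pixels, coarse)]    # detail lost by the tokens
restored = [c + r for c, r in zip(coarse, residual)]  # coarse picture plus residual

print("tokens:  ", tokens)
print("coarse:  ", [round(c, 3) for c in coarse])
print("residual:", [round(r, 3) for r in residual])
print("restored:", [round(v, 3) for v in restored])
```

In this toy, 0.48 and 0.52 land on different tokens and both end up noticeably off from their true values, while the residual never exceeds half the spacing between codes. In HART the residual is predicted by the diffusion model rather than stored, so this is only an analogy for what the residual tokens represent.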

HART's solution is to have the diffusion model focus only on "patching up" these details through residual tokens. And since the autoregressive model has already done most of the work, the diffusion model needs only 8 processing steps instead of the 30 or more that a standard diffusion model typically requires to generate a whole image.

“The diffusion model has an easier job to do, which leads to more efficiency,” explains co-author Haotian Tang.

Specifically, by combining an autoregressive transformer model with 700 million parameters and a lightweight diffusion model with 37 million parameters, HART matches the image quality of a diffusion model with 2 billion parameters while running about nine times faster.
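To put those figures side by side, here is a small back-of-the-envelope calculation using only numbers quoted in this article. It treats "30+" steps as exactly 30, which is a lower bound, and the variable names are illustrative. Parameter and step counts are not a measure of runtime or computation, so the reported ninefold speedup and 31% saving are measurements that cannot be derived from this arithmetic.

```python
# Figures quoted in the article
AR_PARAMS = 700_000_000          # autoregressive transformer
DIFFUSION_PARAMS = 37_000_000    # lightweight diffusion model
BASELINE_PARAMS = 2_000_000_000  # diffusion model used for comparison
HART_STEPS = 8                   # diffusion steps in HART
BASELINE_STEPS = 30              # "30+" in the article; 30 is a lower bound

hart_total = AR_PARAMS + DIFFUSION_PARAMS
param_ratio = hart_total / BASELINE_PARAMS        # HART size relative to baseline
diffusion_share = DIFFUSION_PARAMS / hart_total   # share of HART that is diffusion
step_reduction = BASELINE_STEPS / HART_STEPS      # how many times fewer steps

print(f"HART total parameters:       {hart_total:,}")
print(f"Relative to 2B baseline:     {param_ratio:.1%}")
print(f"Diffusion share of HART:     {diffusion_share:.1%}")
print(f"Diffusion steps reduced by:  at least {step_reduction:.2f}x")
```

By this count HART has 737 million parameters in total, roughly 37% of the 2-billion-parameter baseline, and the diffusion component is only about 5% of the combined model.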

Initially, the team also tried integrating the diffusion model into the early stages of the image generation process, but this led to an accumulation of errors. The most effective approach was to let the diffusion model handle the final step and focus only on the “missing” parts of the image.

Unlocking the future of multimodal AI

The team’s next step is to build vision-language models on top of the HART architecture. Because HART is scalable and can generalize to multiple types of data (modalities), they also expect to apply it to video generation, audio prediction, and other tasks.

This research was funded in part by the MIT-IBM Watson AI Lab, the MIT and Amazon Science Hub, the MIT AI Hardware Program, and the US National Science Foundation. NVIDIA donated the GPU infrastructure used to train the model.

(According to MIT News)


Source: https://vietnamnet.vn/cong-cu-ai-moi-tao-anh-chat-luong-cao-nhanh-gap-9-lan-2384719.html

