Machine translation is one of the most successful applications of artificial intelligence in natural language processing. High-quality machine translation systems such as Google Translate or Microsoft Bing Translator need large-scale bilingual datasets, up to millions of sentence pairs, to train their models.

However, many of the world's languages do not have enough resources. Building an effective machine translation model for resource-poor languages, including those of Southeast Asia, is therefore both urgent and challenging.

Recently, the Institute of Information Technology (Vietnam Academy of Science and Technology) has researched and mastered today's most advanced machine translation technology. The institute has also successfully built a multilingual text translation system between Vietnamese and regional languages including Lao, Khmer, Thai, Malay and Indonesian.

According to the developer, languages such as Lao, Thai, and Khmer pose huge challenges when building machine translation models. The difficulty comes not only from the scarcity of bilingual data, but also from the fact that these languages are morphologically rich, lack explicit word and sentence segmentation, and are highly polysemous.

The AI model developed by the Institute of Information Technology has "learned" how to "adapt" to all of these special features. As a result, the software allows other languages to be added quickly when needed, with translation quality equivalent to that of advanced foreign products.

Notably, this multilingual translation software runs standalone, stores data locally, and does not use the APIs of other service providers. This helps ensure security and safety and prevents information leakage.

Some scientific and technological products of the Vietnam Academy of Science and Technology are displayed at the Vietnam International Innovation Exhibition 2023. Photo: Trong Dat

One problem with translation systems like Google Translate or Bing Translator is their limited ability to adapt to specific domains. They translate well in general, everyday language domains that serve the mass public, but their translation quality is poor in specialized domains such as medicine, law, and security.

To overcome these shortcomings, the research team at the Institute of Information Technology has developed a Vietnamese-centric translation system capable of good-quality two-way translation between Vietnamese and resource-poor languages.

Specifically, the software's translation quality is equal to or higher than that of Google Translate on the same text. In addition, the software places no limit on the length of the text.

In the 2022-2023 period, the system focuses on applying large language model (LLM) techniques, prioritizing the following language pairs: Vietnamese - Khmer, Vietnamese - Lao, Vietnamese - Thai, Vietnamese - Malay and Vietnamese - Indonesian.

For English (a language with very abundant data resources and a priority strength of Google), the software of the Institute of Information Technology delivers quality almost equivalent to Google Translate. In particular, the system can be fine-tuned to adapt to specialized language domains such as medicine and law, according to the specific requirements of partners.

The system was developed in-house by the research team, on a technical infrastructure that supports large-scale language data storage and offers the strongest artificial intelligence/machine learning (AI/ML) supercomputing capacity in Vietnam.

The Institute of Information Technology has complete mastery of the relevant technologies. It can therefore readily extend the application, when needed, to new target languages, including ethnic minority languages in Vietnam (which are often very poor in data resources) such as Muong and Thai, and popular foreign languages such as Chinese, French, and Russian.

This Made-in-Vietnam multilingual translation software is expected to help solve the problem of information access for ethnic minorities.
