The product quickly created a buzz in the Vietnamese science and technology community.

Choosing the difficult path to solve Vietnamese problems

At the end of 2022, ChatGPT set off a "big bang", starting a race among countries and technology giants to conquer generative AI. The Vietnamese technology community was also eager to develop its own products in order to become technologically self-sufficient and reduce its dependence on international products. However, not every organization has the capability and determination to realize that ambition as VinBigdata has.

"Generative AI is a difficult problem. Big companies like OpenAI or Google have also had to spend a great deal of resources and time on research to create the products we see today. These products are very good, but in fact scientists still do not fully understand how they work. Few can predict when they will make errors or what those errors will be. Developing a ChatGPT-like product for Vietnamese people in less than a year poses many challenges. But we chose to take the 'risk', because if a Vietnamese version of ChatGPT is not made by Vietnamese people, then who will make it?" said Professor Vu Ha Van, Director of Science at VinBigdata.

In fact, very few companies choose to build their own large language models from scratch. For example, OpenAI's GPT-3 has 175 billion parameters, was trained on a 45-terabyte dataset, and cost an estimated $4.6 million to train. By some estimates, developing GPT-4 could cost as much as $100 million. "With such huge numbers, it is very difficult to find a company that can afford to invest in this technology," said Dr. Nguyen Kim Anh, Product Director of VinBigdata.

To give Vietnamese businesses access to next-generation AI technology at optimal cost and with modest infrastructure, VinBigdata took a completely different direction: building a language model with only 1.6 billion parameters but with capabilities comparable to much larger language models. "The results show that with the architecture VinBigdata developed itself, it is entirely possible to optimize and accelerate language model training and reduce infrastructure costs (both training and operating costs) while still ensuring model quality," Dr. Nguyen Kim Anh added.

Having addressed the problem of model size, the VinBigdata team, while studying foreign models during the process of "conceiving" ViGPT, identified another challenge: "hallucination", which stems from the inherent nature of statistical probability models.

Specifically, the world's largest language models are usually trained mainly on English data. As a result, these models do not truly understand or respond correctly to the context and culture of Vietnamese people. This leads to hallucination, in which a large language model "fabricates" incorrect answers.

To find the best solution as quickly as possible, VinBigdata's Natural Language Processing (NLP) team split into small groups that analyzed and debated different ideas before settling on the most suitable direction.

"In the end, we decided to develop an architecture different from most current large language models and to train it on a refined 600 GB Vietnamese dataset, creating an 'intelligent virtual assistant' capable of understanding and answering in the context of Vietnamese people," Dr. Nguyen Kim Anh added.

Aspiration for a Vietnamese technology ecosystem

On the VMLU benchmark, which assesses the Vietnamese language proficiency of language models, ViGPT achieved an average score of 42.24%, second only to ChatGPT (48.54%). The result indicates that ViGPT can quickly find information and answer questions on topics specific to Vietnam.

Beyond its capabilities as a virtual assistant, the development team wants to integrate ViGPT into familiar, everyday products in order to bring change to the lives of Vietnamese people. That goal is what drives the VinBigdata team to build an ecosystem of language and voice products powered by ViGPT. This "Vi" ecosystem includes ViChat, ViVoice, and the ViVi virtual assistant. These products can be applied across many industries, from automotive, banking and finance, and insurance to transportation and beyond.

"When working with technology, especially AI, we don't just want to conquer interesting, complex systems that people never see. We want to create tangible, highly practical products in which AI directly drives change in everyday life," the VinBigdata Product Director affirmed.

The successful development of ViGPT is therefore only the first step in the journey to bring "purely Vietnamese" technology and data to serve the lives of millions of Vietnamese people. A VinBigdata representative said the company aims to integrate ViGPT into VinBase 2.0, its multi-cognitive artificial intelligence platform, in order to provide superior solutions for organizations and businesses of all sizes and industries.

Before ViGPT, VinBigdata's team of language and speech processing experts and engineers made its mark by launching ViVi, the first comprehensive Vietnamese virtual assistant (deployed in VinFast electric cars, the Vinhomes Resident app, and the Vinhomes Online e-commerce platform). The team has also fully mastered some of the world's most advanced technologies, such as Voice Biometrics and Voice Cloning.

All of these technologies are built on a 3,500-terabyte database focused mainly on Vietnamese-specific data, which VinBigdata collected, analyzed, and refined. The ultimate goal is to bring world-class technology into Vietnamese life using Vietnamese data and knowledge systems.

ViGPT is the first "Vietnamese version of ChatGPT" for end users, built on the Vietnamese large language model (LLM) developed by VinBigdata. ViGPT offers outstanding features designed to suit the needs of Vietnamese users, such as content creation, information search, and answering common questions typical of Vietnam. Register to try ViGPT at: vigpt.vinbigdata.com

Thanh Ha