'Mastering Vietnamese data is the first step in developing and mastering Vietnamese technology'

Báo Thanh niên, 27/05/2024



Having worked for a large artificial intelligence organization in the US, why did you decide to return home to join VinBigdata?

While working in the US, although I participated in many large government projects, the results I produced were often just a few steps in a large process. Many times, due to the strict confidentiality of the projects, I did not even know how the solutions I developed were being used.

I returned in 2017, when Vietnam was in a stage of development and had many big data and artificial intelligence problems that needed solving. I accepted Professor Vu Ha Van's invitation to work together on developing Vietnamese technology solutions that serve the lives of Vietnamese people. Returning felt far more meaningful to me because I would be able to work on problems with greater impact.

Photo: Dr. Dao Duc Minh in a workshop

In the strategy of developing artificial intelligence, what role and influence does big data play, sir?

Data plays a huge and valuable role in training artificial intelligence. Training a high-quality artificial intelligence model usually starts with a large dataset. Therefore, to have quality artificial intelligence, we first need good data.

Good data requires quantity and scale, quality, variety, and universality. Collecting and processing thousands of hours of data, starting from raw data cleaning, to produce the highest-quality input for artificial intelligence models is very expensive and complex. Conversely, analyzing big data requires artificial intelligence to process it accurately at scale, which in turn leads to better decisions and predictions.

For example, in the process of developing a virtual assistant product for Vietnamese people (ViVi), we had to collect and process tens of thousands of hours of high-quality audio data, from hundreds of thousands of voices from different regions, diverse ages and genders, with content spanning hundreds of fields...

Most recently, we launched ViGPT, "the first Vietnamese version of ChatGPT for end users", built on a large language model fully owned by VinBigdata. The model was trained on 600 GB of refined Vietnamese data from many different fields. Our understanding of Vietnamese data and language led us to a new approach that let us launch ViGPT only 9 months after ChatGPT was released.

This is the resonance between big data and artificial intelligence.


What is your view on linking research with practical value to serve the community?

I believe that technology research is only truly successful when it actually enters life, solves social problems and improves people's lives.

To create practical commercial products that solve business and social problems, we must always pay attention and ask the question: what value will data bring to life?

So far, we have researched and developed a range of products and solutions for various industries and fields, notably ViGPT, VinDr (AI solutions for medical imaging diagnosis), VinBase (a bio-artificial intelligence platform), and Vizone (a set of smart image analysis solutions).

Photo: With key personnel of VinBigdata at an event of Vingroup Corporation

The 4th industrial revolution has been taking place strongly on a global scale. What advantages do you think Vietnam has?

Compared with previous revolutions, I believe Vietnam now has many advantages that could let it break through in the fourth industrial revolution and improve the country's position on the world map. The two keys to achieving this goal are data and people.

Vietnam currently has nearly 100 million people, a large share of whom are young people who use phones and personal computers. In addition, we have reputable experts in artificial intelligence and capable young information technology personnel with a very good foundation in mathematics.

So what are the limitations?

The first obvious limitation is that, despite having a large population, we still have difficulty mastering data, specifically standardizing and synchronizing data across facilities, business units and administrative agencies.

In addition, we also face other constraints such as limited investment resources, especially investment in high-performance computing infrastructure.


In your opinion, how important is the role of Vietnamese data mastery in the journey of creating and mastering technology to serve the lives of Vietnamese people?

There are currently many leading artificial intelligence products from around the world, notably generative AI applications based on large language models, such as ChatGPT by OpenAI or Bard by Google. However, Vietnamese is not among the main languages these products are developed for.

As a result, the quality of the Vietnamese-specific content returned to users suffers to some degree and is prone to errors, including, more dangerously, errors in basic knowledge.

As Vietnamese, we have the advantage of accessing our own data sources. Only we have the ability to understand the characteristics of Vietnamese data, the needs and characteristics of Vietnamese people. Therefore, mastering Vietnamese data is really the key to mastering core technologies, which are the technologies that will serve Vietnamese people.

Photo: Internal training for VinBigdata members

How can specific data sources be accessed, especially when most Vietnamese people today use foreign social networking sites?

The reality is that the largest source of human data today (not just Vietnamese data) is the internet and social networks. However, we can still access and collect data from other sources, guided by our understanding of the characteristics of Vietnamese data and by the requirements of each project.

For example, OpenAI's GPT models have hundreds of billions, even trillions, of parameters, are trained on huge amounts of data, and cost billions of dollars. We have chosen a completely different path based on our research, capabilities, and resources: a Vietnamese language model with an architecture of only a few billion parameters, trained on a 600 GB Vietnamese dataset that we collected and refined ourselves, yet with comparable Vietnamese-language processing capability. The results show that our self-developed architecture can self-optimize, shorten language model training time and reduce costs while still ensuring model quality.

What are the challenges that you and your team have encountered in the process of researching and developing artificial intelligence products?

The first challenge is certainly time. The wave of artificial intelligence technology is arriving very quickly and is in an explosive phase. Around the world, leading technology companies have quickly launched highly polished products that are constantly updated and improved. If we are slow and do not launch products in time, we will certainly fall behind.

On the other hand, if we want to create products that can be applied and solve practical social problems, we must also consider finding and developing the outstanding, special and unique features of the product.

Photo: Presentation at Vietnam Artificial Intelligence Day (AI4VN 2023)

In fact, many individuals and organizations in Vietnam and around the world have suffered great losses in data leaks. How do you view the issue of data security?

It can be said that any application today comes from data. When working with data, on the one hand, we must ensure the goal of applying data to create the best technology for life, and on the other hand, we must ensure data security for individuals and organizations.

The human factor is a very important link in ensuring data security. It includes developers and product users. For developers, awareness of data security must be present from the very beginning of data collection and processing.

Often, when nothing has gone wrong, we are not aware of how important data security is. But if a data leak occurs, the damage can be huge. Leaks can result from technical problems or from intentional attacks to steal data. When data leaks, individuals and organizations can have their information used by bad actors for illegal purposes, and businesses can suffer financial losses from fixing the resulting problems, and even damage to their brand.

Photo: Dr. Dao Duc Minh and VinBigdata team at an event

Beyond the aspiration to master technology to serve Vietnamese people, will there be steps to expand to the world?

Any organization or business that wants to bring its products to the international market must comply with international standards. VinBigdata has strengths in solutions and technology, so setting a vision to conquer the world is natural.

Of course, deploying many different products and applications requires the support of international partners with many years of experience and an understanding of users around the world.

Thank you!



Source: https://thanhnien.vn/ts-dao-duc-minh-lam-chu-du-lieu-viet-la-buoc-dau-phat-trien-va-nam-giu-cong-nghe-viet-18524052710263732.htm
