Pho GPT, short for "Pho - Generative Pre-trained Transformer", is a large language model project dedicated to Vietnamese, developed by the engineering team at VinAI, a member of Vingroup Corporation.
“Catch up” with world technology
Unlike proprietary software such as OpenAI's ChatGPT, Pho GPT is open source. Its source code is publicly available, and users can contribute to its development through custom applications.
According to VinAI, Pho GPT has 7.5 billion parameters and is built on a Transformer decoder architecture. The model is trained from scratch using the most advanced techniques available, such as the FlashAttention mechanism and ALiBi for context length extrapolation.
These techniques not only help Pho GPT understand context more deeply, but also improve its ability to hold a dialogue and interact naturally during use. This makes the model a versatile, multi-tasking tool capable of meeting users' diverse language needs.
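To give a rough idea of what ALiBi does, the sketch below builds the linear attention biases described in the ALiBi paper: each attention head penalizes distant tokens in proportion to their distance, with a head-specific slope. This is an illustrative sketch only, not Pho GPT's actual code. The function names are hypothetical, and the slope formula is the one the paper gives for head counts that are powers of two.

```python
def alibi_slopes(num_heads):
    """Head-specific slopes: a geometric sequence starting at 2^(-8/num_heads).

    Assumes num_heads is a power of two, as in the basic ALiBi formulation.
    """
    if num_heads <= 0 or num_heads & (num_heads - 1) != 0:
        raise ValueError("this sketch only handles power-of-two head counts")
    start = 2 ** (-8 / num_heads)
    return [start ** (h + 1) for h in range(num_heads)]


def alibi_bias(num_heads, seq_len):
    """Return bias[h][i][j], the value added to the attention score of
    query position i and key position j for head h.

    Past keys (j <= i) get a penalty that grows linearly with distance;
    future keys (j > i) are masked with -inf, as in a causal decoder.
    """
    bias = []
    for slope in alibi_slopes(num_heads):
        head = [
            [-slope * (i - j) if j <= i else float("-inf") for j in range(seq_len)]
            for i in range(seq_len)
        ]
        bias.append(head)
    return bias


if __name__ == "__main__":
    # Print the bias rows of the first head for a 4-token sequence.
    for row in alibi_bias(num_heads=8, seq_len=4)[0]:
        print(row)
```

Because the penalty depends only on the distance between tokens and not on learned position embeddings, the same rule can be applied to sequences longer than those seen in training, which is what "context length extrapolation" refers to.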
Speaking about the significance of Pho GPT's launch, Mr. Bui Hai Hung, General Director of VinAI, said that the goal of the project is to develop models similar to ChatGPT for the Vietnamese language and Vietnamese culture. Pho GPT can understand and write in a Vietnamese style better than previous-generation language technologies. The model is also trained from scratch on a Vietnamese dataset, without depending on any other model in the world, which ensures that Vietnam masters the advanced core technology.
Notably, Pho GPT appeared in Vietnam just one year after the launch of ChatGPT set the world buzzing. According to Mr. Bui Hai Hung, VinAI was the first in Southeast Asia to launch an open-source large language model. A few weeks later, a similar product was launched in Singapore.
Elevating Vietnamese AI
A comparison of the Pho GPT-7B5-Instruct version with the closed-source ChatGPT (GPT-3.5-turbo) and other open-source models shows that Pho GPT ranks second, behind only ChatGPT, in most evaluation categories.
Pho GPT differs from other language models, especially ChatGPT, in many ways. It is designed to understand and write Vietnamese naturally, reflecting the context, grammar, vocabulary, and expressions of Vietnamese people. It can interact with users on topics related to Vietnamese culture, history, geography, society, entertainment, sports, and more.
Furthermore, Pho GPT is open source and flexible. Users can develop customized, unique applications, especially applications that require high security, without depending on proprietary software.
Pho GPT also prioritizes high performance and cost savings. It is trained with the latest optimization techniques, which help reduce the size of the application and increase its speed. Pho GPT can also run on a smaller computing platform, helping to reduce costs and save resources.
Pho GPT is trained on a Vietnamese corpus of up to 41GB, consisting of 1GB of Wikipedia text and a 40GB deduplicated variant of a news dataset, using MosaicML's llm-foundry library. It can generate text on request, such as articles, poems, songs, essays, speeches, and introductions. It can also produce creative, humorous content, such as short stories, comments, proverbs, messages, tweets, and memes. It can converse with users on topics such as current events, education, health, travel, cuisine, sports, and entertainment. Besides, Pho GPT can answer users' questions and provide information, advice, and support.
In addition, Pho GPT can translate texts of various kinds, such as official, commercial, academic, and literary documents, from Vietnamese to other languages and vice versa.
Another outstanding feature is that Pho GPT can analyze and process text passages, for example by summarizing, classifying, labeling, extracting information, detecting emotions, detecting errors, and improving writing style.
The Pho GPT development team said it will continue to improve the model and expand the project to other languages, especially in the Southeast Asian region.
“The birth of Pho GPT marks the first time Vietnam has had the opportunity to ‘catch up’ with the world in this advanced technology field and to own an open-source large language model built specifically for Vietnamese people, optimized for the Vietnamese language, and independent of the rest of the world. This is a source of pride for VinAI in particular and the Vietnamese people in general,” added a representative of VinAI.
Commenting on Pho GPT's potential and development opportunities in the coming period, Mr. Tran Duy Dong, Deputy Minister of Planning and Investment, emphasized: “AI will be one of the fields in which Vietnam has a lot of potential to develop strongly and soon catch up with the world level. The Ministry will always support and accompany the AI community in particular, and the science and technology community in general, to develop a comprehensive and dynamic innovation ecosystem, contributing to the overall development of the country.”