Vbee is the first service in Việt Nam to use artificial intelligence to turn Vietnamese text into human speech. — Photo Vbee.vn
HÀ NỘI — When she was a student, Nguyễn Thị Thu Trang was part of a volunteer group that made audio books for visually impaired people.
Having met many people who could not see, Trang said they had a strong desire to access knowledge.
“They want to be able to read books, newspapers and search the Internet. But few tools were available to support their needs, while books in Braille are costly,” she said.
While volunteering, she realised that merely recording books was not the best solution because it required too much time and effort.
As a student with a background in technology at the Hanoi University of Science and Technology (HUST), Trang believed technology could make a big difference for people with disabilities.
In 2009, when she was a professor at HUST, Trang herself organised a project to record audiobooks for visually impaired people.
The wishes of the blind people she met were her motivation to start a 10-year journey working on a Vietnamese reader that features text-to-speech technology. She and her two colleagues later set up a company in 2018 and named it Vbee: Vietnamese – BE your Eyes.
Text-to-speech technology has actually been researched and used around the world for many years, and is widely applied in services such as consulting, customer interaction, smart homes and smart traffic.
What makes Vbee stand out is that it is the first such technology developed in Việt Nam for Vietnamese people, with Vietnamese as the output language, Trang said.
“The characteristics of the Vietnamese language are complex with different accents and dialects, so it is much harder to apply the technology,” Trang said.
What the Vbee team cares about most is creating an artificial voice with the same emotion and tone as a Vietnamese human voice.
“It took us a lot of time to create a voice with an intonation that is attractive and close to the user, other than the regular reading tools that Google and Microsoft provide for the Vietnamese market,” she said.
Vbee's text-to-speech engine also features male and female voices with Northern and Southern accents, and can be trained to learn a new language in four hours.
It was not an easy path for Trang and her co-workers, because the process of Vietnamese speech synthesis is a complex one.
“We have to analyse components such as sentences, words, languages and phonemes, and identify the contextual and tonal characteristics of these components. Then we have to create a model for the duration, prosody and other acoustic parameters to generate the corresponding speech,” Trang said.
Therefore, in addition to her knowledge of computer science, Trang also had to learn about speech processing and linguistics.
Khúc Hải Vân, a Vbee user who was born blind, said he loves the app.
“I felt like a real person with a Vietnamese voice reading information and newspapers to me,” he said.
“I hope Vbee will develop more useful applications that can support people like me,” he said.
During the development of the text-to-speech reader for the blind and visually impaired, Vbee’s leaders realised the potential of its text-to-speech engine (TTS engine) in other fields, said Hồ Minh Đức, co-founder of Vbee.
One of these applications is VADI, a "virtual assistant" for drivers.
VADI's first function is directions and traffic warnings. Information is collected and updated by a software development team, while people with the app can send in traffic updates themselves.
The second function featured on VADI is audio coverage of news and directions. Compared to similar applications, the virtual assistant provides information in a natural, human-like voice.
“VADI has received very positive feedback from mobile users,” Đức said.
“We look forward to contributing Vbee solutions using artificial intelligence to the market, helping businesses and users to have new solutions to better serve customers while saving more,” Đức said.
Trang said that although there are many text-to-speech products on the market, she believed Vbee has a chance to stand out with products built on its TTS engine and tailor-made for Vietnamese people.
Turning research into a product takes a lot of time and effort, and she was proud that her team had persevered to bring its product to market.
With the advantage of engineers experienced in application research, product development, marketing and sales, Vbee is constantly updating its core products and adapting them to market needs.
“Although there are large corporations providing the same service, the advantage Vbee has is that it can respond quickly and customise well to quickly fill the niche market,” Trang said.
Vbee will focus on completing its core solution and its smart call centre product. Other potential projects relating to text-to-speech technology in the future include an automated movie dubber, virtual MC and digitalised lectures. — VNS