Google has announced an ambitious new project to develop a single AI language model that will support the "1,000 most spoken languages" in the world.
As a first step toward that goal, the company is unveiling an AI model trained on more than 400 languages, which it describes as "the largest language coverage seen in a speech model today."
Google's "1.000 Languages Initiative" will not focus on any specific functionality, but on creating a single system with a huge range of knowledge in all the world's languages.
Speaking to The Verge, Zoubin Ghahramani, vice president of research at Google AI, said the company believes that building a model of this size will make it easier to bring various AI functions to languages that are underrepresented in online spaces and AI training datasets (also known as "low-resource languages").
"By having a single model that is exposed to and trained on many different languages, we get much better performance on low-resource languages," Ghahramani said.
"The way we get to 1,000 languages is not by building 1,000 different models. Languages are like organisms: they have evolved from one another and share certain similarities. And we can find some pretty spectacular advances in what we call zero-shot learning when we incorporate data from a new language into our 1,000-language model and gain the ability to translate [what it's learned] from a high-resource language to a low-resource language."
The company says it has no direct plans for where this model's functionality will be deployed, only that it expects the model will have a range of uses across Google's products, from Google Translate to YouTube captions and more.
"One of the really interesting things about large language models and language research in general is that they can do many different jobs," says Ghahramani.
"The same language model can turn commands for a robot into code; it can solve math problems; it can do translation. The really interesting thing about language models is that they're becoming repositories of a lot of knowledge, and by probing them in different ways you can get to different bits of useful functionality."