Google has announced an ambitious new project to develop a single artificial intelligence language model that will support the "1,000 most spoken languages" in the world.
As a first step toward that goal, the company is unveiling an AI model trained on more than 400 languages, which it describes as "the largest language coverage available in a speech model today."
Google's "1,000 Languages Initiative" will not focus on any specific functionality, but on creating a single system with broad knowledge across the world's languages.
Speaking to The Verge, Zoubin Ghahramani, vice president of research at Google AI, said the company believes a model of this size will make it easier to bring various AI capabilities to languages that are underrepresented in online spaces and in AI training datasets (also known as "low-resource languages").
"By having a single model that is exposed to and trained on many different languages, we will get much better performance on low-resource languages," Ghahramani said.
“The way we're going to get to 1,000 languages is not by building 1,000 different models. Languages are like organisms: they have evolved from one another and share certain similarities. We can make some pretty spectacular advances in what we call zero-shot learning, where we incorporate data from a new language into our 1,000-language model and are able to transfer [what it learned] from a high-resource language to a low-resource one.”
The company says it has no firm plans yet for where this model's capabilities will appear, only that it expects them to find a range of uses across Google products, from Google Translate to YouTube captions and more.
"One of the really interesting things about large language models and language research in general is that they can do many different jobs," says Ghahramani.
“The same language model can turn commands for a robot into code, it can solve math problems, and it can do translation. The really interesting thing about language models is that they become repositories of a lot of knowledge, and by probing them in different ways you can get to different functionalities.”