Describing an image is remarkably easy for humans, but not for computers.
However, this seems to be starting to change, as shown by the work of Google's researchers, who have developed a machine-learning system capable of automatically producing captions that describe images the first time it sees them.
As the company's scientists Oriol Vinyals, Alexander Toshev, Samy Bengio and Dumitru Erhan write on Google's research blog, in the long run such a system could help visually impaired people understand images, provide alternative text for images in parts of the world where network connections are poor, and make it easier to search for images on Google.
Recent research has led to significant improvements in the detection, classification and labeling of objects. However, accurately describing a complex scene requires a deeper representation of what is going on: "capturing" how the various objects relate to one another and then "translating" all of these "conclusions" into natural language.
"Many attempts to construct computer-generated natural descriptions of images propose combining current state-of-the-art techniques in both computer vision and natural language processing to form a complete image description approach. But what if we instead combined recent computer vision and language models into a single, jointly 'trained' system, taking an image and directly producing a human-readable sequence of words to describe it?" the researchers ask.
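The idea the researchers describe can be illustrated with a toy sketch: an "encoder" turns the image into a feature vector, which initialises the state of a "decoder" that emits one word at a time. Everything below is illustrative only — the vocabulary, weight shapes and random parameters are assumptions for the sketch, not Google's actual model, which uses a trained convolutional network and recurrent language model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary with special start/end tokens (an assumption of this sketch).
VOCAB = ["<start>", "a", "dog", "on", "grass", "<end>"]
HIDDEN = 8

def encode_image(image):
    # In the real system, a convolutional network maps the image to a
    # feature vector; here we fake it with a random vector.
    return rng.standard_normal(HIDDEN)

# Untrained decoder parameters (random, for illustration only).
W_h = rng.standard_normal((HIDDEN, HIDDEN)) * 0.1   # recurrent weights
W_x = rng.standard_normal((len(VOCAB), HIDDEN)) * 0.1  # word embeddings
W_out = rng.standard_normal((HIDDEN, len(VOCAB))) * 0.1  # output projection

def caption(image, max_len=10):
    """Greedily decode a word sequence from the image features."""
    h = encode_image(image)            # image features initialise the state
    word = VOCAB.index("<start>")
    out = []
    for _ in range(max_len):
        h = np.tanh(h @ W_h + W_x[word])  # recurrent state update
        logits = h @ W_out
        word = int(np.argmax(logits))     # greedy choice of next word
        if VOCAB[word] == "<end>":
            break
        out.append(VOCAB[word])
    return " ".join(out)

print(caption(None))
```

With random weights the output is meaningless; the point is the single pipeline from image to word sequence, which in the real system is trained end to end so that the decoded words describe the input image.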
Source: naftemporiki.gr