Describing an image comes easily to humans, but not to computers.
However, this seems to be starting to change, as shown by the work of Google researchers who have developed a machine-learning system capable of automatically producing captions that describe images it sees for the first time.
As the company's scientists Oriol Vinyals, Alexander Toshev, Samy Bengio and Dumitru Erhan write on Google's research blog, such a system could in the long run help visually impaired people understand images, provide alternative text for images in parts of the world where network connections are poor, and make it easier to search for images on Google.
Recent research has led to significant improvements in object detection, classification and labeling. However, accurately describing a complex scene requires a deeper representation of what is going on: "capturing" how the various objects relate to each other and then "translating" all of these "conclusions" into natural language.
"Many efforts to build computer-generated natural descriptions of images propose combining current state-of-the-art techniques in both computer vision and natural language processing to form a complete image-description approach. But what if we instead combined recent computer vision and language models into a single, jointly 'trained' system, taking an image and directly producing a human-readable sequence of words to describe it?" the researchers ask.
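The jointly trained system the researchers describe pairs a vision model (encoding the image) with a language model (generating the words). A toy sketch of that encoder-decoder shape, with untrained random weights and an invented vocabulary purely for illustration (real systems use a trained convolutional encoder and a recurrent decoder), might look like:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy vocabulary and dimensions, for illustration only.
VOCAB = ["<start>", "<end>", "a", "dog", "on", "grass"]
D_IMG, D_HID = 8, 16

# "Encoder": project an image feature vector into the decoder's state space.
W_enc = rng.normal(size=(D_HID, D_IMG))
# "Decoder": word embeddings plus one recurrent transition and an output layer.
W_emb = rng.normal(size=(len(VOCAB), D_HID))
W_rec = rng.normal(size=(D_HID, D_HID))
W_out = rng.normal(size=(len(VOCAB), D_HID))

def caption(image_features, max_len=10):
    """Greedily decode a word sequence conditioned on image features."""
    h = np.tanh(W_enc @ image_features)       # condition the state on the image
    word = VOCAB.index("<start>")
    out = []
    for _ in range(max_len):
        h = np.tanh(W_rec @ h + W_emb[word])  # one recurrent step
        word = int(np.argmax(W_out @ h))      # pick the most likely next word
        if VOCAB[word] == "<end>":
            break
        out.append(VOCAB[word])
    return out

print(caption(rng.normal(size=D_IMG)))
```

With untrained weights the words are arbitrary, but the flow is the one the quote describes: an image goes in, and a sequence of words comes out of a single end-to-end model rather than separate vision and language pipelines.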
Source: naftemporiki.gr