Millions of files pass from the Internet Archive to Flickr Commons
Previously unused images from 600 million pages of old books digitized by the nonprofit Internet Archive are gradually being uploaded to Flickr, thanks to academic Kalev Leetaru. It is estimated that Yahoo's photo-hosting site will be flooded with 12 million historical images dating from 1500 to 1922, which have passed into the public domain and can be used without any restrictions.
The images come from public-library books that the Internet Archive has been digitizing for years. However, the books end up as PDF or plain-text files in which the images cannot be searched.
Unlike optical character recognition (OCR) software, Kalev Leetaru's software does not skip over the photographs. In fact, it exploits a weakness of OCR: it assumes that whatever the OCR skips is a photograph and saves it as a JPEG image file. It also tries to give each image file an explanatory caption, using the text the OCR read just before and just after the photograph on the scanned page.
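The approach described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Leetaru's actual code: it assumes a hypothetical OCR output in which each page is a list of blocks in reading order, each marked either as recognized text or as a region the OCR skipped. Every skipped region is treated as a probable image, and the nearest text read before and after it is kept as caption material. Cropping the region and saving it as a JPEG is left out, since that step would need an imaging library.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """One region of a scanned page, in reading order (hypothetical OCR output format)."""
    kind: str       # "text" if OCR recognized characters, "skipped" if it did not
    text: str = ""  # recognized text; empty for skipped regions


@dataclass
class ExtractedImage:
    index: int        # position of the skipped region in the page's block list
    text_before: str  # tail of the nearest recognized text before the region
    text_after: str   # head of the nearest recognized text after the region


def _tail(s: str, n: int) -> str:
    # Last n characters; s[-0:] would return the whole string, so compute the start index.
    return s[max(0, len(s) - n):]


def extract_images(blocks: List[Block], context_chars: int = 200) -> List[ExtractedImage]:
    """Treat every region the OCR skipped as a probable image and attach caption context."""
    images = []
    for i, block in enumerate(blocks):
        if block.kind != "skipped":
            continue
        # Nearest recognized text before and after the skipped region (empty if none).
        before = next((b.text for b in reversed(blocks[:i]) if b.kind == "text"), "")
        after = next((b.text for b in blocks[i + 1:] if b.kind == "text"), "")
        images.append(ExtractedImage(i, _tail(before, context_chars), after[:context_chars]))
    return images


# Hypothetical page: a heading, an unrecognized region, then a caption-like line.
page = [
    Block("text", "Chapter 1. The river crossing"),
    Block("skipped"),
    Block("text", "Fig. 1. The old stone bridge"),
]
for img in extract_images(page):
    print(img.index, repr(img.text_before), repr(img.text_after))
```

Limiting the context to a fixed number of characters is an illustrative choice; a real pipeline would likely use layout information to decide which nearby lines actually describe the image.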
The universality of the Internet
Professor Leetaru hopes that Wikipedia's authors will use these images (2.6 million of which have already been uploaded to Flickr) to enrich its content, especially in entries about historical events. According to the BBC, he also appears willing to share his code with libraries around the world so they can extract images from the books they are digitizing.
However, Flickr users complain that since July, when the Internet Archive joined the service, its images have flooded the site and appear very often in search results, with no way for users to exclude them.
Source: tovima.gr