Few things are more frustrating than searching for a specific phrase in PDF files that were "scanned" as images. Dropbox now aims to solve this problem by adding an automatic image recognition tool to its cloud storage service. The tool analyzes the text in photos and PDFs and includes it in users' search results. According to Dropbox, the service currently holds more than 20 billion PDFs and photos.
To find a specific phrase, the user simply types it into the Dropbox file search box, just as in any search engine. Dropbox then returns all PDF files containing that word or phrase. The company said this is the most demanding project its machine learning team has attempted to date.
The team ran into quite a few problems. One of them was that PDF files with many pages tied up a lot of system resources for a long time. The team therefore set a limit: each PDF is indexed only for its first 10 pages.
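The effect of the 10-page limit on search results can be illustrated with a minimal sketch. It is not Dropbox's implementation: the names `build_index` and `search` are hypothetical, and the per-page text is assumed to have already been produced by some recognition step. The only detail taken from the article is the cap of 10 pages.

```python
from collections import defaultdict

MAX_INDEXED_PAGES = 10  # the page cap mentioned in the article


def build_index(documents, max_pages=MAX_INDEXED_PAGES):
    """Map each lowercase word to the set of document names containing it.

    `documents` maps a document name to a list of per-page text strings
    (a hypothetical stand-in for the output of a text recognition step).
    Only the first `max_pages` pages of each document are indexed.
    """
    index = defaultdict(set)
    for name, pages in documents.items():
        for page_text in pages[:max_pages]:  # pages beyond the cap are skipped
            for word in page_text.lower().split():
                index[word].add(name)
    return index


def search(index, word):
    """Return the sorted names of documents whose indexed pages contain `word`."""
    return sorted(index.get(word.lower(), set()))
```

Under these assumptions, a word that appears only on page 11 or later of a long PDF would not be found by search, while words on the first 10 pages would be.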
The automatic image recognition will only recognize English text, in JPEG, GIF, PNG, TIFF, and PDF files that have been uploaded to Dropbox cloud storage. The new feature is expected to be released soon for business plans, but it is not known whether it will ever be made available to ordinary users. Finally, older files that were uploaded to the service before the release of the new feature will also be indexed.
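As a rough illustration of the file type restriction, the sketch below checks whether a file name has one of the listed formats. The function name `is_eligible` and the exact extension spellings (for example `.jpg` alongside `.jpeg`) are assumptions. The article names only the formats, not how Dropbox detects them.

```python
from pathlib import PurePosixPath

# Formats named in the article; the extension spellings are an assumption.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".gif", ".png", ".tif", ".tiff", ".pdf"}


def is_eligible(path):
    """Return True if the file extension matches one of the listed formats."""
    return PurePosixPath(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

A real service would more likely inspect the file contents than trust the extension, so this is only a simplified stand-in.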