Phillip R Farber
Posts by Phillip R Farber
A recent blog post pointed out that search is hard when there are many indexes to search, because the results must be combined. Search is hard for us in DLPS for a different reason: the size of the data. The Library has been receiving page images and OCR from Google for a while now, and the number of OCR'd volumes has passed the 2 million mark. This raises the question of whether it is possible to provide a useful full-text search of the OCR for 2 million volumes. Or more. We are trying to find out.