Suzanne E Chapman
Library Blogs
in Blog: Library Tech Talk
for Date: October 2009
•
For the past two years, the University of Michigan Library has been making many of our digitized texts (including in-copyright items) available to persons with print disabilities through the HathiTrust Digital Library. Our Dean, Paul Courant, recently posted about this project on his blog, so I thought it might be nice to offer more background and some technical information about it.
•
We have been making improvements to our OAI provider (UMProvider). We host the metadata for HathiTrust public domain texts through the provider, as well as all the metadata for text and image collections in the UM Digital Library.
Our first improvement was to make harvesting faster. Our provider uses MySQL tables to store, sort, and provide access to the metadata, and our method for sorting the data was one of the causes of the slow harvesting.
Our second improvement comes from our investigation into the increasing number of deleted HathiTrust records that were showing up in the provider, and a discrepancy between the number of records in the provider and the number of records in our HathiTrust databases. We have not fully determined the cause of this, but we have been able to restore over 30,000 HathiTrust records that were marked as deleted in the provider.
Consequently, we recommend you harvest the provider from scratch, whether the entire metadata set or a particular set. It will be quick, and you'll get those missed records. We will keep you posted on further improvements.
(The UMProvider can be accessed via http://quod.lib.umich.edu/cgi/o/oai/oai?verb=ListRecords&metadataPrefix=oai_dc. There is useful information about the HathiTrust records in the provider at http://www.hathitrust.org/data.)
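For anyone re-harvesting from scratch, here is a minimal sketch of a full harvest loop in Python. It assumes only standard OAI-PMH 2.0 behavior: paging with resumptionToken, and deleted records flagged with status="deleted" on the record header. The function names (build_url, parse_page, harvest) and the optional fetch parameter are made up for this illustration and are not part of UMProvider. The base URL is the one given above and may not respond indefinitely.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
# Base URL taken from the post; treat its availability as an assumption.
BASE_URL = "http://quod.lib.umich.edu/cgi/o/oai/oai"


def build_url(base_url, metadata_prefix="oai_dc", set_spec=None, token=None):
    """Build a ListRecords URL with no 'from' date, i.e. a from-scratch harvest."""
    if token:
        # In OAI-PMH, resumptionToken is exclusive: send it with the verb only.
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec  # harvest one set instead of everything
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_page(xml_text):
    """Return ([(identifier, is_deleted), ...], next_token_or_None) for one page."""
    root = ET.fromstring(xml_text)
    records = []
    for header in root.iter(OAI_NS + "header"):
        identifier = header.findtext(OAI_NS + "identifier")
        records.append((identifier, header.get("status") == "deleted"))
    token_el = root.find(".//" + OAI_NS + "resumptionToken")
    token = None
    if token_el is not None and token_el.text:
        token = token_el.text.strip() or None
    return records, token


def harvest(base_url=BASE_URL, set_spec=None, fetch=None):
    """Yield (identifier, is_deleted) for every record, following resumption tokens."""
    if fetch is None:
        # Default fetcher hits the network; pass your own for testing or retries.
        fetch = lambda url: urllib.request.urlopen(url).read()
    token = None
    while True:
        page = fetch(build_url(base_url, set_spec=set_spec, token=token))
        records, token = parse_page(page)
        yield from records
        if not token:
            break  # an empty or missing resumptionToken marks the last page
```

Counting the records where is_deleted is true, and comparing that against what you already hold locally, is one way to see which records a from-scratch harvest brings back.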