A Born-Digital Laboratory is Born

Part of the challenge/fun of digital preservation is developing procedures for different types of content. One area we are currently focusing on is born-digital manuscript material acquired by the Library as part of a personal archive. This type of content poses unique challenges, as digital material from a personal archive may go untouched for decades, saved in now-obsolete formats and stored on now-obsolete media such as floppy disks. A recent risk assessment of one such archive found “the highest risks mostly reflect the inherent issues of the media formats, namely obsolescence and degradation of content.” In other words, we need to get the stuff off the scary old floppy disks and onto nice, warm, and safe preservation-focused storage.

A picture of 3.5-inch floppy disks from the Robert Altman collection
Good stuff on bad media. Altman Disks by lancestuch via Flickr (CC BY-NC-SA 2.0)

We are currently developing a Born-Digital Lab that can handle this type of work. The Lab essentially acts as an airlock between the computing environments used by content creators and the Library’s preservation systems. The work centers on transferring files from older media using modern or vintage equipment and running tests, such as virus scans, on the transferred content. What enters the Lab as born-digital content on external media exits as digital packages containing the original files and the resulting metadata. OAIS nerds like me call these digital packages Submission Information Packages, or SIPs. The SIPs are then moved from the Lab to Library-managed pre-ingest storage, where basic preservation services like fixity checks and backups occur. This dark storage acts as a safe area (like those special waiting rooms in airports I am always getting kicked out of) until further preservation and archival actions can take place. Separating the transfer and archival functions while providing midstream preservation storage allows us to deal with the biggest threat to the content (the old media) faster than a workflow dependent on archival appraisal, arrangement, and description would allow.
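
For readers who have not run into fixity checks before, here is a minimal sketch of the idea in Python: record a checksum for every file when it lands in storage, then recompute and compare later. This is an illustration only, not a description of the Library's actual storage services. The function names (sha256_of, record_fixity, check_fixity) and the choice of SHA-256 are my own assumptions for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_fixity(root: Path) -> dict[str, str]:
    """Map each file under root (by relative path) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def check_fixity(root: Path, recorded: dict[str, str]) -> list[str]:
    """Return relative paths that are missing or whose checksum changed."""
    problems = []
    for rel, expected in recorded.items():
        target = root / rel
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel)
    return problems
```

An empty list from check_fixity means every recorded file is still present and unchanged. Note that this sketch only looks at files that were recorded, so it will not notice new files added to the directory afterward.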

Our BitCurator Workstation
Our BitCurator Workstation by lancestuch via Flickr (CC BY-NC-SA 2.0)

As mentioned, a particularly important function of the Born-Digital Lab is the creation of metadata. Metadata creation tools like those contained in the BitCurator environment identify the location of Personally Identifiable Information (PII) and record the file formats used by the creator. This information will help us down the road when making decisions around appraisal, access, and format migration. In addition to running automated tools, we also create preservation metadata documenting important events like virus checks and the creation of forensic images of external media. This preservation metadata is stored in the resulting package, which is created using the Library of Congress-developed BagIt Specification. The BagIt metadata profile generalized from work done by The Indiana Archives and Records Administration formed the starting point for our own metadata development.
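
To make the packaging step a little more concrete, here is a rough sketch of what a bare-bones bag looks like on disk, written with only the Python standard library. It follows the basic layout in the BagIt specification (a data/ payload directory, a bagit.txt declaration, a payload manifest, and an optional bag-info.txt), but it is a simplified illustration rather than a complete implementation, and it is not the tooling we use in the Lab. The function name make_simple_bag is hypothetical, and the bag-info.txt values in the usage note are placeholders, not fields from our metadata profile.

```python
import hashlib
import shutil
from pathlib import Path


def make_simple_bag(source: Path, bag: Path, info: dict[str, str]) -> None:
    """Copy source into bag/data and write minimal BagIt tag files."""
    data = bag / "data"
    # copytree creates bag/ and bag/data/; it raises if bag/data already exists.
    shutil.copytree(source, data)

    # Bag declaration required by the specification.
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )

    # Payload manifest: one "<checksum>  <path relative to bag>" line per file.
    # Files are read whole here for brevity; large files should be chunked.
    lines = []
    for path in sorted(data.rglob("*")):
        if path.is_file():
            checksum = hashlib.sha256(path.read_bytes()).hexdigest()
            lines.append(f"{checksum}  {path.relative_to(bag).as_posix()}\n")
    (bag / "manifest-sha256.txt").write_text("".join(lines), encoding="utf-8")

    # Optional descriptive metadata as "Label: value" lines.
    (bag / "bag-info.txt").write_text(
        "".join(f"{label}: {value}\n" for label, value in info.items()),
        encoding="utf-8",
    )
```

Calling make_simple_bag(Path("transfer"), Path("sip-0001"), {"Source-Organization": "Example Library", "Bagging-Date": "2014-01-01"}) would copy the transferred files into sip-0001/data and write the three tag files beside it. A metadata profile like the one mentioned above would define which bag-info.txt labels are required and what values they may take.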

Work on getting the Lab up and running is ongoing. Stay tuned for more details on the equipment, workflow, and metadata!  

Lance Stuchell is the Digital Preservation Librarian at the University of Michigan Library.