Brief Survey of Digital Library Software Systems

DLPS is currently (July 2010) exploring possible avenues for the future development of DLXS. DLXS is a mature and robust digital library information retrieval and repository system that has been in use here at the University of Michigan Library and at several other institutions for roughly a decade, with deeper roots going back to the early 1990s. DLXS has four classes, or components, supporting text (e.g., books), images (e.g., photos), finding aids (e.g., EAD) and bibliographic databases. At Michigan we host over 250 collections with DLXS, which you may wish to visit to get a better sense of its capabilities and of the baseline for this survey.

I compiled this brief survey of existing digital library software systems to gain a better understanding of where DLXS fits in the current landscape, and what other systems have to offer. I have included systems that are designed for libraries and are either comparable or potentially complementary to DLXS. I thought I might encounter many systems I had not heard of, but other than Veridian, and more recently SimpleDL, I did not. I intentionally excluded content management systems, such as Drupal, which have the potential to be implemented as digital library systems but were not designed specifically for that purpose. I also excluded viable production systems that are not packaged and distributed as a product, such as the system behind HathiTrust.

I mention HathiTrust in particular because Michigan has played a major role in its development, and we have gained valuable experience that will be useful as we continue with DLXS. If you are interested, there is an emerging HathiTrust Collaborative Development Environment.

You will find minimal, broad-stroke commentary, expressed in terms relative to DLXS. If it seems I've mischaracterized a system, or overlooked a key strength, feel free to comment. If you have other solutions to share, please do. We'll start with DLXS because it was the basis for our exploration. This survey is neither comprehensive nor in-depth, but I hope you find it useful, and maybe you can help fill in some of the blanks.

DLXS: Summary, Features, Technical Details

Examples:

Examples from multiple institutions.
University of Michigan Library: Text Collections, Finding Aids (EAD) Collections, Image Collections, Scholarly Publishing Office Collections

Notes:

Developed by the University of Michigan Digital Library (that's us!).
Strong support for search and display of highly structured XML (which is a rare and powerful feature).
The XPAT search engine has a one-time license fee. Image collections use MySQL, not XPAT.
Scales reasonably well.
Strong as an access system with very good support for collections of content, and searching across multiple collections within a class (text, image, finding aid).
Similar to DLXS in that it IS DLXS.

XTF: Summary, Features, Technical Details

Examples:

http://www.marktwainproject.org/
http://www.calisphere.universityofcalifornia.edu/
http://www.oac.cdlib.org/ (Finding Aids from numerous institutions)

Notes:

Developed by CDL (California Digital Library, University of California).
Replaced DLXS, Greenstone, Dynaweb for CDL.
Used for text, finding aids, image collections and more.
Uses Lucene.
Similar to DLXS in that it is strong as an access system and there is an affinity for presenting content as collections.

Greenstone: Summary, Features (same as Summary)

Technical Details:

GreenstoneWiki: documentation for users/content managers.
Developer's Guide
Greenstone FAQ
Collection Size Limitations
Overview from the developer's point of view: Witten, I.H. and Bainbridge, D. (2007) "A retrospective look at Greenstone: Lessons from the first decade." Proc Joint Conference of Digital Libraries, Vancouver, Canada, pp. 147-156, June.
Architecture and DTD (see last 2 pages for DTD and internal document format): Witten, I., Bainbridge, D., Paynter, G., & Boddie, S. (2002). "Importing Documents and Metadata into Digital Libraries: Requirements Analysis and an Extensible Architecture." In Research and Advanced Technology for Digital Libraries (pp. 219-229).

Examples:

General
The Greenstone discussion list archive is a Greenstone collection.
Examples of Practical Digital Libraries: Collections Built Internationally Using Greenstone, Witten, Ian H., D-Lib Magazine, March 2003.

Notes:

Developed at University of Waikato, Hamilton, New Zealand in cooperation with UNESCO and the Human Info NGO in Belgium.
Provides GUI desktop applications for building and distributing digital library collections on the Internet and CD-ROM. Also has command line support. Runs in Windows and Mac OS X.
A great deal of effort was made to support easy installation and configuration.
Apparently strong support for multiple languages in the system (documentation, application interface, etc.).
Similar to DLXS in that it is strong as an access system and there is an affinity for presenting content as collections. Different from DLXS (and pretty much everything else) in that it provides a desktop application for building collections.
Supports searching across collections.

ContentDM (commercial): Summary, Features (same as Summary), Technical Details

Examples:

Notes:

Owned by OCLC.
Same search engine as WorldCat.
Supports images, newspapers, EAD Finding Aids, audio, video and any other web format.
Supports cross-collection searching and cross-server searching.
Option to include metadata in WorldCat for increased visibility.
Many licensing, hosting, and functionality options.
Strong as an access system, and better with metadata than full-text.

DSpace: Summary, Features, Technical Details

Examples:

Notes:

Fedora and DSpace receive stewardship from the not-for-profit DuraSpace.
We use DSpace for the University of Michigan institutional repository, Deep Blue.
Primarily a repository system, but used in many ways (see examples).
Provision of functionality for end-user interaction with objects is a weakness.
Different from DLXS in that it is primarily a repository system.

Fedora Commons: Summary (General, Structure of DuraSpace), Features, Technical Details (General, Fedora Create Community)

Examples:

Notes:

Fedora and DSpace receive stewardship from the not-for-profit DuraSpace.
Fedora is first and foremost core repository functionality.
Fedora has a large development community (see Fedora Create Community) working on additional services, frameworks, content models, and more.
Different from DLXS in that it is primarily a repository system.
There is some energy currently around Islandora and Hydra as applications for managing and providing access to content in Fedora.
Can plug in different search systems: MySQL, Solr, Mulgara.

Misc Systems

Veridian (commercial)
Cumulus digital asset management - mostly for images (commercial)
Luna Insight - specific for images (commercial)
JSTOR - all journals
ArtSTOR - all images
EPrints - institutional repository deposit, similar to DSpace
bepress - ditto
OJS - journals, but no customization
SPO has looked at Drupal, OJS, WordPress (the latter promising)
SimpleDL (commercial)
raven.scholarslab.org: Interesting example of how Solr and XSLT can be used to achieve the desired level of search granularity. XML is split into different types of Solr documents as needed, and client XML/XSLT libraries are used to provide more granular search results on a per-page basis. From the TEI List.
Acumen (Deserves a closer look.)
Omeka (Deserves a closer look.)
Blacklight (Deserves a closer look.)
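
To make the raven.scholarslab.org entry above a little more concrete, here is a minimal sketch of the general idea of splitting one XML file into several index records of different types (one for the whole work, one per page) so that search hits can be reported per page. This is not Raven's actual code or schema: the element names (text, title, page), the field names (doc_type, work_id, and so on), and the helper functions are all hypothetical, and a plain Python list stands in for a real Solr index.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; real TEI is far richer than this.
SAMPLE = """<text id="letter1">
  <title>Letter to a friend</title>
  <page n="1">Dear friend, the library opened today.</page>
  <page n="2">The catalog is searchable by page.</page>
</text>"""


def split_into_docs(xml_string):
    """Return one work-level record plus one record per page."""
    root = ET.fromstring(xml_string)
    work_id = root.get("id")
    title = root.findtext("title", default="")
    pages = root.findall("page")

    # Work-level record: the full text, for whole-document search.
    docs = [{
        "id": work_id,
        "doc_type": "work",
        "title": title,
        "text": " ".join("".join(p.itertext()).strip() for p in pages),
    }]
    # Page-level records: each carries a pointer back to its work.
    for page in pages:
        docs.append({
            "id": f"{work_id}_p{page.get('n')}",
            "doc_type": "page",
            "work_id": work_id,
            "title": title,
            "page": page.get("n"),
            "text": "".join(page.itertext()).strip(),
        })
    return docs


def search_pages(docs, term):
    """Naive stand-in for a search engine query limited to page records."""
    term = term.lower()
    return [(d["work_id"], d["page"]) for d in docs
            if d["doc_type"] == "page" and term in d["text"].lower()]


if __name__ == "__main__":
    docs = split_into_docs(SAMPLE)
    print(search_pages(docs, "catalog"))
```

In a real deployment the records would be sent to the search engine rather than kept in memory, and a filter on something like the doc_type field would decide whether a query returns whole works or individual pages.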

Log of updates to this posting

Added placeholders for Omeka and Acumen and Blacklight. (7/9/2010)
I originally wrote that XTF uses Lucene/Solr, but it uses Lucene, not Solr. Corrected above. (7/9/2010)
