Why Summary-Level Analytics Aren’t Always Sufficient

Imagine if a library had just one metric to evaluate its services: the number of people who walk in the front door of the building. While this basic piece of information is helpful, it does not really answer questions such as: Of those who walked in the front door, how many asked a question at the reference desk, or used one or more books, journals, newspapers, or other materials? How long did they stay in the library? How many asked an actual reference question (as opposed to just asking for building directions)? How many used a library resource? There are many more such questions a library might want to answer. This sort of high-level “gate count” approach omits a wealth of valuable and actionable information about sequences of events:

  • Did the patron get a book or two, and then ask a reference question?

  • Did the patron use any of the resources suggested by the library worker at the reference desk, or turn to other resources instead?

These kinds of questions cannot be answered easily, even if the “gate count” concept is extended from the front door to subsequent “interior gates” such as the number of people who asked a reference question (simple or complex), the number of people who entered the non-fiction area on the third floor, and the number of people sitting at tables in the “quiet study” area. Counts alone don’t tell the complete story of service, resource, or space use.

Unfortunately, the tools libraries most often use to build their understanding of online behavior, tools like Google Analytics or Apache web-log analyzers, are built to provide counts, even if they are sequential counts (of the people who started here, 53% went there, and of those, 5% clicked this link to perform some action). These tools say little about what kind of user the person is (e.g., student, community member, faculty), how their online activities mesh with their physical activities, or whether today’s usage is a continuation of yesterday’s or something new. We all recognize that Google, and other providers of such tools, reap the benefit of knowing vast amounts about each user and their idiosyncratic online behavior; this is the real reason Google Analytics exists. The free (and even the premium) analysis tools made available to consumers only skim the surface of the deep data pools Google keeps for itself.
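To make the contrast concrete, here is a minimal TypeScript sketch of the difference between the aggregate funnel counts such tools report and the event-level sequence records needed to answer the questions above. All of the type and field names are hypothetical, invented for this example rather than taken from Google Analytics or any real log analyzer. The asymmetry it illustrates: a funnel can always be computed from event sequences, but individual paths can never be recovered from the counts.

```typescript
// Hypothetical shapes only; illustrative types, not a real analytics API.

// What a funnel report from a count-oriented tool boils down to:
// each step is a count and a conversion rate; individual paths are discarded.
interface FunnelStep {
  label: string;          // e.g. "apply_filter"
  visitors: number;       // how many sessions reached this step
  conversionRate: number; // fraction of the previous step that continued
}

// What answering sequence-of-events questions would require:
// an ordered record of actions tied (pseudonymously) to a single session.
interface InteractionEvent {
  sessionId: string;      // pseudonymous session identifier, not a name
  timestamp: string;      // ISO 8601
  action: "search" | "apply_filter" | "view_record" | "ask_reference_question";
  detail?: string;        // e.g. filter name or record id
}

// A funnel can be derived from event sequences...
function toFunnel(
  events: InteractionEvent[],
  steps: InteractionEvent["action"][],
): FunnelStep[] {
  let previous = new Set(events.map(e => e.sessionId));
  return steps.map(label => {
    const reached = new Set(
      events
        .filter(e => e.action === label && previous.has(e.sessionId))
        .map(e => e.sessionId),
    );
    const step: FunnelStep = {
      label,
      visitors: reached.size,
      conversionRate: previous.size > 0 ? reached.size / previous.size : 0,
    };
    previous = reached;
    return step;
  });
}
// ...but the reverse is impossible: counts alone cannot tell us who did what, in what order.
```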

With these high-level count-gathering methods, libraries are dramatically limited in the kinds of questions we can sensibly ask of our data. Wouldn’t it be more reasonable to start with solid research questions, and build backwards from there to gather the information needed to provide meaningful answers? The U-M Library collects and analyzes data to make service improvements, to enhance the tools we build and buy, to create physical spaces appropriate to the activities that happen in them, and so on. The high-level data we can obtain through gate-counting measures are simply not up to the task we have set for ourselves. At the U-M Library, we are creating tools to meet these intensive data needs, both for purely internal, within-the-library use and for our participation in the IMLS-funded Library Learning Analytics Project underway in partnership with the University of Michigan’s Institute for Social Research (ISR). To answer higher-level research questions, we need to apply analytics-gathering methodologies specific to those questions, and to do so in ways that are consistent with the library’s privacy statement.

Building such a purpose-driven analytics-gathering system is challenging, and we are just getting started on the data flows that capture the interactions we want from web interfaces and store them in ISR’s data enclave for eventual analysis by authorized researchers. In addition to the still-important “gate count” equivalents, we are planning to add “instrumentation” to specific interfaces so that we can capture the details of user interactions that will help us answer specific questions about how our own systems are used and what difference that use makes. For example, in our Library Search discovery interface, we hope to learn not just how users change search categories, apply filters, and access physical and online resources, but also how those interaction and usage patterns compare across fields of study or roles at the University and, importantly, how they connect with academic success. Confidentiality of user data is of course a prime concern; making sure that we can collect meaningful data while respecting the confidentiality of individual users is foundational to how we will collect, store, and manage access to the data. In this regard, ISR is an excellent partner: they have been storing and managing research access to sensitive and private survey data for decades, and have tremendous expertise in this area.
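As a rough illustration of what such instrumentation might look like, the TypeScript sketch below records a few kinds of Library Search interactions and posts them to a hypothetical collection endpoint. The event names, fields, and the /analytics/events URL are assumptions made for the sake of the example, not the library’s actual implementation; the design point it is meant to show is that each event carries only a pseudonymous session identifier, consistent with the confidentiality concerns described above.

```typescript
// Illustrative sketch only: the endpoint, event names, and fields are hypothetical,
// not the U-M Library's actual instrumentation.

type SearchEvent =
  | { kind: "change_category"; category: string }
  | { kind: "apply_filter"; filter: string; value: string }
  | { kind: "access_resource"; resourceId: string; format: "physical" | "online" };

interface LoggedEvent {
  sessionId: string; // random, pseudonymous identifier; never a name or account id
  timestamp: string; // ISO 8601
  event: SearchEvent;
}

// One random identifier per browser session; no identifying account data is attached here.
const sessionId = crypto.randomUUID();

async function logEvent(event: SearchEvent): Promise<void> {
  const payload: LoggedEvent = {
    sessionId,
    timestamp: new Date().toISOString(),
    event,
  };
  // Hypothetical collection endpoint; in practice this would feed the pipeline
  // that ultimately lands in a restricted research data store.
  await fetch("/analytics/events", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
    keepalive: true, // allow the request to complete even if the user navigates away
  });
}

// Example usage: record that the user narrowed a search to online journal articles.
logEvent({ kind: "apply_filter", filter: "format", value: "online journal articles" });
```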

Moving beyond “gate counts” in our physical spaces and particularly our virtual environments will deeply enrich our understanding of user behavior.