The Silicon Valley hype machine has been in full gear for several years now, touting the miracles and revelations of its latest invention: Generative Artificial Intelligence (AI). With the publicity surrounding AI holding at a sustained, deafening pitch, and a plethora of new AI tools and technologies readily available, I decided it was well past time to experiment with AI and see whether I could find a legitimate use case for it on the Library Blogs site.
I assessed four separate, though mostly related, AI technologies as candidates for bringing AI into the Library Blogs sphere. The initial goal of the experiment was to ask an AI chatbot whether there were any blog posts by a particular author on the site and receive an accurate answer in natural, humanlike language. Three of the experiments implemented an AI chatbot to augment the site's search features, as initially intended, with varying degrees of success; a fourth offered a more whimsical overview of the site's content.
Although these experiments were all generally feasible to implement in production, finding a justifiable use for AI on a medium-sized site like Library Blogs proved debatable. The overhead of running even a minimal AI implementation, whether in the cloud or self-hosted, includes ongoing maintenance of several systems and high processing and storage costs, not to mention the increased electricity and water usage that AI brings. Estimates indicate that a single ChatGPT query takes five to fifteen times as much energy as a traditional Google algorithmic search, and these enormous resource requirements cause significant environmental degradation that falls primarily, though hardly exclusively, on vulnerable communities. Many therefore argue that forgoing AI for non-essential tasks is the more ethical choice. Given these concerns and implications, the site managers ultimately agreed that the negligible value AI functionality added to the Library Blogs site failed to warrant the investment.
The site base - Drupal
Library Blogs is a relatively small site, containing slightly over 3,500 blog posts across 13 separate blogs on various topics. As the site runs on Drupal, and the Drupal project has added numerous AI features and integrations to its stockpile of modules, I started with an experiment using existing Drupal-based technologies. Drupal has nearly two dozen possible AI provider modules, so choosing which ones to experiment with came down to which providers aligned with our preference for open source software whenever possible.
Additionally, part of the experiment was to run the AI without incurring any monetary cost, as there was no budget for these experiments; likewise, part of our mission as developers in an academic library is to provide services and solutions at low-to-no cost for our patrons both inside and outside the university. Using these criteria, I selected two AI providers with Drupal integrations to start with, before turning to currently no-cost solutions provided by the university.
Experiment 1 - Hugging Face
The first Drupal AI provider module I implemented was Hugging Face, as I had seen a talk on open source AI at DrupalCon Portland where Hugging Face was presented as an alternative to proprietary, closed-source offerings like OpenAI's. After setting up a staging instance of the Library Blogs site with the Hugging Face provider module, I signed up for a Hugging Face account using my umich.edu email address. Because I used that address, I was given the option of joining the institutional U-M account. However, part of my experiment was to implement the AI using tools that any layperson could access at little or no cost, and it was also unclear what costs my department might incur by connecting to the institutional account, so I elected to sign up for a personal account instead.
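For context, what the provider module does on a free account presumably boils down to hosted-inference calls like the one below. This is a minimal sketch using the huggingface_hub Python client; the token and model name are placeholders, not the exact model the Drupal module used.

```python
# A minimal sketch of the kind of hosted-inference call the Drupal
# Hugging Face provider makes behind the scenes. The token and model
# name are placeholders; free accounts are limited to smaller models.
from huggingface_hub import InferenceClient

client = InferenceClient(token="hf_xxx")  # a personal access token

response = client.chat_completion(
    model="HuggingFaceH4/zephyr-7b-beta",  # an illustrative small model
    messages=[{"role": "user", "content": "Are there any posts by eliot?"}],
    max_tokens=200,
)
print(response.choices[0].message.content)
```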
Unfortunately, this decision severely limited what I could run on Hugging Face. I was able to connect Hugging Face to Drupal, but I could not do much with the chatbot beyond getting it loaded. Although the Large Language Models (LLMs) hosted on Hugging Face are largely open source and freely available, what you can run on Hugging Face's servers with a free account is quite limited, and I was only able to try some minimal LLMs that did not produce desirable results for the chatbot. While this experiment was mostly unsuccessful, in that I was never able to ask the chatbot about content on Library Blogs, I felt reasonably confident I could return to it after joining the U-M institutional account, depending on its exact terms. So I decided to work with an AI provider I could set up on my own.
Experiment 2 - Ollama
Since the limitations and potentially prohibitive costs of using an outside AI service had become clear, it seemed we might be better served running our own provider on servers we fully control. Following a colleague's suggestion, I took a look at Ollama, which would give me full control over the provider using software that is open source licensed and not owned by a major corporation, in line with the majority of our other applications and our values.
Ollama has a Docker image available, so I decided to set it up locally alongside the Library Blogs site on an underpowered HP i3 laptop without a dedicated GPU, knowing that if I could run it there, I could run it anywhere later, and almost certainly much faster with more processing power.
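Ollama serves a small HTTP API on port 11434, which is what the Drupal provider module talks to. A quick sketch in Python, assuming the container maps the default port, to confirm the container is up and see which models it has pulled:

```python
# Confirm a local Ollama container is reachable on its default port and
# list the models it currently has pulled. GET /api/tags is part of
# Ollama's documented REST API.
import requests

resp = requests.get("http://localhost:11434/api/tags", timeout=10)
resp.raise_for_status()

for model in resp.json().get("models", []):
    size_gb = model["size"] / 1e9
    print(f"{model['name']}: {size_gb:.1f} GB")
```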
Experiment 2a - Llama3 LLM
Once I had Docker containers set up for Drupal and Ollama, I imported the popular Llama3 LLM into Ollama and connected it to the site through the Drupal provider module. Oddly, despite the name, Llama3 is an open source model created not by Ollama but by Meta.
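The Drupal provider module handles the connection itself, but the two underlying steps can be reproduced directly against Ollama's REST API. A sketch, with the long timeout reflecting how slow generation was on CPU-only hardware:

```python
# Pull the Llama3 model into a local Ollama instance, then send it a
# first chat message. Both endpoints are part of Ollama's REST API.
import json
import requests

OLLAMA = "http://localhost:11434"

# /api/pull streams progress as newline-delimited JSON while downloading.
with requests.post(f"{OLLAMA}/api/pull", json={"model": "llama3"}, stream=True) as resp:
    for line in resp.iter_lines():
        if line:
            print(json.loads(line).get("status"))

# /api/chat with stream=False returns one JSON object with the reply.
chat = requests.post(
    f"{OLLAMA}/api/chat",
    json={
        "model": "llama3",
        "messages": [{"role": "user", "content": "hi"}],
        "stream": False,
    },
    timeout=1200,  # generation was extremely slow on a CPU-only laptop
)
print(chat.json()["message"]["content"])
```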
After adding the Drupal AI module's integrated DeepChat chatbot block to the site, I typed “hi” and the chatbot responded with the actions I could take, formatted as JSON. As my intention was to supplement search with a friendly, humanlike chatbot that would help end users find content, I asked why it was answering in JSON, and it switched to the conversational format I intended.

As this was the behavior I expected from the chatbot, but I was unfamiliar with how the AI used the LLM to learn about the site content, I decided to ask the chatbot itself how I could get it to incorporate that content into its answers.

I suspected the chatbot's suggestion that it could simply index the content itself was far too easy, but I let it attempt the indexing to see the results. It responded that it had asked the RAG (Retrieval Augmented Generation) database, which I had not set up, to create an index, and it gave me fictitious answers.

Experiment 2b - Milvus VDB
To implement RAG, I either needed to add vectors to the existing MariaDB database or add a new, separate vector database, so I researched which would be more performant as well as easier to implement. Most of the information I found suggested that storing vectors in a SQL-based database was less efficient than a dedicated Vector Database (VDB), though improving, and I also did not want to alter the current blogs database or store AI-related information alongside the existing content. So I elected to try a dedicated VDB, and although our department has much more experience with Apache Solr, which offers vector search, Milvus appeared to be the better supported and more widely utilized open source integration for Drupal. I therefore added Milvus to my Docker stack.
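Drupal's Search API module does the actual indexing, as described below, but conceptually each blog post is embedded into a vector and stored in Milvus. Here is a hedged sketch using pymilvus and Ollama's embeddings endpoint; the collection name, fields, URL, and the choice of Llama3 as the embedder are all illustrative, not what the Drupal module actually configures.

```python
# A conceptual sketch of the RAG indexing step: turn a blog post into an
# embedding vector via Ollama, then store it in Milvus. Collection and
# field names (and the URL) are illustrative.
import requests
from pymilvus import MilvusClient

OLLAMA = "http://localhost:11434"
milvus = MilvusClient(uri="http://localhost:19530")

def embed(text: str) -> list[float]:
    # Ollama's /api/embeddings returns a vector for the given prompt.
    resp = requests.post(
        f"{OLLAMA}/api/embeddings",
        json={"model": "llama3", "prompt": text},
    )
    return resp.json()["embedding"]

# Llama3 (8B) produces 4096-dimensional embeddings; other models differ.
if not milvus.has_collection("blog_posts"):
    milvus.create_collection(collection_name="blog_posts", dimension=4096)

milvus.insert(
    collection_name="blog_posts",
    data=[{
        "id": 1,
        "vector": embed("Software contributions reduce our debt ..."),
        "url": "/blogs/library-tech-talk/software-contributions-reduce-our-debt",
    }],
)
```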
Once Milvus was added, I indexed some content into it using Drupal's Search API module and enabled RAG, which seemed to work. Initially, at least, the Llama3 chatbot also seemed to know about the content. I asked if someone named "eliot" had written a blog post and it claimed to have found at least one. As I had only one blog post on the site at the time, I was encouraged by this response and hoped that it had indeed indexed the site and would give accurate responses, so I took its suggestion and asked for a link.

After waiting fifteen minutes for the response, a regular occurrence on my underpowered laptop, the chatbot gave me an answer. Unfortunately, the answer was for a post I knew I hadn't written and confirmed did not exist.

I wanted to confirm that this was indeed an error, as the chatbot indicated, so I decided to ask the bot about my name again.

The bot gave me a list of non-existent users, including an author with the username "eliota". As there was no one on the site with that username, I asked the bot whether that person existed, and again the bot admitted its error.

So I decided to ask about the specific post I had written, “Software contributions reduce our debt”, with the username included.

This gave more promising results, as the bot correctly identified the URL of the post. When I followed up and asked which blog the post was in, it correctly answered that it was in the Library Tech Talk blog.

Unfortunately, it made up my name based on the username I had already given it rather than finding my display name, incorrectly referring to me as “Eliot WSC”. So I asked for the name of the user at that URL and it misidentified me again, which suggested that the indexing of user fields, at least, had not worked accurately. When I gave it the correct last name, it agreed with me.

After several more queries, each taking around 15 minutes to answer, it became apparent that the Llama3 LLM was simply making up answers as it went along, based on data I had given it in the chat or URLs it may have indexed, and that it could not be trusted to provide accurate answers to user queries. Upon further investigation, I also found that RAG could not be implemented with the Llama3 model at all, as the model does not support tools.
Error invoking model response: registry.ollama.ai/library/llama3:latest does not support tools
Although the lack of tool support likely contributed to the misleading or falsified answers it so confidently gave me, it was still disturbing that the model would present incorrect information as fact when it clearly was not retrieving any information. Either way, it was time to try something other than Llama3 from Meta, a company that has recently suffered an exodus of AI talent to companies in France and elsewhere.
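For any model already pulled into Ollama, tool support can be probed directly: send a chat request with a trivial tool attached, and a model without support returns exactly the error above. A sketch:

```python
# Probe whether a pulled model supports tool calling by attaching a
# trivial tool to a chat request. Models lacking support return the
# "does not support tools" error shown above.
import requests

def supports_tools(model: str) -> bool:
    resp = requests.post(
        "http://localhost:11434/api/chat",
        json={
            "model": model,
            "messages": [{"role": "user", "content": "hi"}],
            "stream": False,
            "tools": [{
                "type": "function",
                "function": {
                    "name": "noop",  # placeholder tool just for the probe
                    "description": "Does nothing.",
                    "parameters": {"type": "object", "properties": {}},
                },
            }],
        },
    )
    return "does not support tools" not in resp.text

print(supports_tools("llama3"))  # False for this model
```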
Experiment 2c - Qwen3 LLM
I needed to import a new LLM into Ollama that supported tools in order to implement RAG and run a test against the Milvus VDB index. As hard drive space was quickly becoming a concern, I had to look for an LLM that both supported tools and was a reasonable size; I also wanted a model that was more truthful in its responses. I found the Qwen3 model, which supported tools and was 5.2GB, roughly the same size as Llama3 at 4.7GB. The Qwen3 model is also open source, created by the China-based tech company Alibaba Cloud.
After adding Qwen3, I got a pleasant surprise: the Qwen3 LLM was far more transparent than Llama3, revealing its entire “thought” process immediately after I said “hi”.

In the image you can see that the bot not only debates how it should respond when I say “hi”, it eventually adds a smiley face emoji to its response, which was a nice touch. I hoped that a model demonstrating its logic this openly would be less prone to confidently asserting information that simply was not true, as Llama3 had.
Unfortunately, the Drupal AI DeepChat chatbot I had been using was timing out once RAG was enabled. Although disappointing for an accurate comparison, after switching to the Drupal AI Generic Chatbot, and changing my query to an author name I hadn't yet tried so that I knew I was getting real, newly indexed data, I finally achieved full success: the bot gave accurate results.

Despite the transparency and utility of the Qwen3 model, it took well over two hours to index only 100 of the more than three thousand blog posts, and the stack took up an enormous amount of space on my hard drive. With the test successful, I now knew that we could indeed set up an open source stack, an existing Drupal site running a MariaDB database for the content, plus a new Ollama provider running a Qwen3 LLM and a new Milvus VDB, to create an AI chatbot that performed a function similar to our basic Drupal keyword search.
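Reduced to its essentials, the query side of that stack looks something like the sketch below: embed the user's question with the same model used at indexing time, retrieve the nearest posts from Milvus, and hand them to Qwen3 as context. Names match the earlier indexing sketch and are equally illustrative.

```python
# The query side of the RAG stack, reduced to its essentials: embed the
# question, retrieve nearby posts from Milvus, and let Qwen3 answer from
# that context. Names match the earlier indexing sketch (illustrative).
import requests
from pymilvus import MilvusClient

OLLAMA = "http://localhost:11434"
milvus = MilvusClient(uri="http://localhost:19530")

question = "Are there any blog posts by eliot?"

# Embed the question with the same model used at indexing time.
emb = requests.post(
    f"{OLLAMA}/api/embeddings",
    json={"model": "llama3", "prompt": question},
).json()["embedding"]

# Retrieve the three nearest posts, pulling back the stored URL field.
hits = milvus.search(
    collection_name="blog_posts",
    data=[emb],
    limit=3,
    output_fields=["url"],
)
context = "\n".join(hit["entity"]["url"] for hit in hits[0])

# Ask Qwen3 to answer using only the retrieved posts as context.
answer = requests.post(
    f"{OLLAMA}/api/chat",
    json={
        "model": "qwen3",
        "messages": [
            {"role": "system",
             "content": f"Answer using only these blog posts:\n{context}"},
            {"role": "user", "content": question},
        ],
        "stream": False,
    },
    timeout=1200,  # generation remained slow on CPU-only hardware
).json()["message"]["content"]
print(answer)
```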
However, this new functionality would come at a significant financial and environmental cost, as we would need to substantially scale up both our storage and our processing power to essentially recreate search with a humanlike veneer. It is also noteworthy that this humanlike search companion would generate considerable technical debt, tripling the number of systems we run and adding bloat to the already hefty Drupal application behind the site. Additionally, despite the better results in this experiment, there may be concerns and possible legal repercussions in using a non-US-based open source LLM as geopolitical tensions escalate.
Experiment 3 - Maizey
Since running our own AI provider and vector database seemed an extreme solution to a nonexistent problem, the next AI tools I investigated for a chatbot were those offered through U-M ITS. U-M ITS offers U-M GPT, a standalone chatbot that connects to Microsoft's Azure, as well as the U-M GPT Toolkit, which connects to Azure through an API. Neither is open source, the standalone GPT is not publicly accessible and does not allow custom datasets, and the toolkit would incur financial costs, violating several of the experiment's initial criteria. So I turned to U-M ITS's more customizable version of U-M GPT: Maizey.
Unfortunately, Maizey is not open source software either, and it is unclear what the underlying provider architecture is; it does, however, utilize OpenAI's GPT-4 Turbo LLM, which suggests yet another connection to OpenAI and Microsoft (the same model is currently available free to Microsoft Copilot users). Maizey likewise conflicted with more of my criteria for these experiments, as it is not available to the general public, but it is more customizable than U-M GPT in that it allows custom datasets. And it is currently provided to U-M employees and students free of charge, though I suspect there are significant financial and environmental costs for U-M's main IT department to run it, so it earned its place in this experiment.
The implementation of Maizey was quite straightforward. After creating an account, I created a project and added all the URLs within the Library Tech Talk blog, using a Drupal view to output them as plain text. They were added to the project index in a few minutes, and I was off and running through what Maizey calls a “Public URL”, which I later discovered was not really "public", even to U-M staff, since I had created a private project.

Regardless, I knew I could easily recreate the project as a public one, which would hopefully make the public URL accessible at least to U-M affiliated users. I then tested the site by asking about an author who I knew had several blog posts, and it answered correctly, helpfully returning the URLs for several of them.

The next step was to add the public URL for my Maizey project to the Library Blogs site. I first tried embedding it as an iFrame in a custom Drupal block; however, as with many iFrames these days, I immediately ran into a dreaded error.
The loading of "https://umgpt.umich.edu/maizey/staging-blogs-lib" in a frame is denied by "X-Frame-Options" directive set to "sameorigin".
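A page's framing policy can be checked ahead of time by inspecting its response headers; a quick sketch:

```python
# Check a page's framing policy before attempting an iFrame embed by
# inspecting its response headers.
import requests

resp = requests.head(
    "https://umgpt.umich.edu/maizey/staging-blogs-lib",
    allow_redirects=True,
)
print(resp.headers.get("X-Frame-Options"))          # "sameorigin" here
# Newer sites may use Content-Security-Policy's frame-ancestors instead.
print(resp.headers.get("Content-Security-Policy"))
```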
Unless I requested that U-M ITS add the Library Blogs URL to their framing policy, an iFrame integrated on the site was a non-starter. So I elected to do what I had seen on some airline and entertainment platforms and open the chat in a popup window. This resulted in the following interface, with the Maizey bot's window overlaying the right side of the screen, the usual convention I had encountered at other sites.

Unfortunately, this implementation was far from ideal, as links provided by the Maizey chatbot opened a new tab in the parent window when clicked, creating a less than seamless user experience.
Maizey does provide API access, and I was somewhat curious whether Drupal's OpenAI or Azure provider modules would integrate with it, confirming my suspicion that it might be OpenAI or Azure with a U-M wrapper. But the API failed yet another of my initial criteria for the experiment: the feature is only available if my department, or I, chose to pay for it.

Likewise, it is conceivable that even using Maizey without the API may incur costs in the future. As you can see in the image below of the project's total cost, the cost is nominal, but I had only used the site for a single query, which generated 289 completion tokens, while indexing roughly 30 URLs generated over 10,500 indexing tokens.

Maizey is currently offered free to U-M affiliated people only until August 31, 2025; however, that date has been pushed back several times for unclear reasons, as charges were supposed to begin over a year ago. Although the free deadline may be extended again, that is by no means guaranteed.
Experiment 4 - NotebookLM
Finally, it was suggested that I try using Google's NotebookLM to generate an AI-created podcast based on the site content. Although I personally found this rather silly, I also acknowledge that I thought both Twitter and Bitcoin were ridiculous when they were released. Unfortunately for my wallet and my sense of sanity, those technologies proved both popular (at least for a time) and quite lucrative.
Although NotebookLM also violated my initial open source software criterion, it was provided free via U-M's Google account and therefore cost me nothing to test on a site I manage for the university. Furthermore, generating a podcast is a one-time AI process that does not require ongoing user interaction; even if the processing cost is higher, once it's done, it's done.
Setting up NotebookLM was, like Maizey, a breeze. Like Maizey, it did not crawl the site, so each URL to be indexed had to be added by hand; unlike Maizey, however, NotebookLM's form accepted only one URL at a time. Fortunately, someone had created a free Chrome extension to harvest URLs from a site and feed them into NotebookLM for indexing. Once I had done the harvesting, NotebookLM generated a podcast in .wav format based on randomly sampled pages from the site.
After sharing the audio with my department, the positive responses ranged from noting that the podcasters cracked a joke about one of the blog posts to wishing the voices were Snoop Dogg and Gwyneth Paltrow, as Speechify offers. Most observations, however, were decidedly negative, describing the “breathing noises” as “disturbing”, “creepy”, and “eerie”, and prompting a flat-out “noooo” accompanied by the following meme from The Office. Significantly, colleagues who wanted to find the blog posts mentioned in the podcast used the traditional search feature, undercutting the case for an AI chatbot to augment search.
These responses suggested that though I may have been wrong about the potential of Bitcoin and Twitter, my instincts about the silliness of an AI-generated podcast may not be completely unfounded, at least among my department colleagues.
Conclusions
Ultimately, this was largely a frivolous exercise to engage with AI and find a use case for it on the web application I manage that seemed the most promising and feasible candidate. It is certainly possible to deploy any of these experiments, as all four AI implementations were technically achievable, with varying degrees of associated cost and technical debt.
The hype surrounding AI is enormous, but so are its technical overhead, financial costs, and environmental implications. Yet the pull, as well as the pressure, for tech workers to engage with it is difficult to resist, even when, as in this case, it yields largely fanciful features and solutions in search of a problem.
There is little doubt that applying AI to large datasets has the potential to be extremely valuable for large academic and corporate institutions, as sifting through the vast amount of content contained within such organizations requires significant effort. But the improvement gained by adding AI to a small-to-medium-size site such as Library Blogs is far less clear, and may ultimately be negligible: there is simply not enough content to justify the significant hardware and cost overhead required to run even a relatively small LLM on an open source provider.
Similarly, this amount of content may be better navigated in traditional fashion, via keyword search and user browsing, than with a feature that mimics human behavior to provide similar results. Content managers and user experience specialists agreed that the value added to the overall experience of site visitors was dubious at best, and we determined that the benefits of adding these AI experiments to the site were minimal, as evidenced by colleagues reaching for site search, rather than an AI chatbot, to locate a blog post mentioned in the AI podcast. We have therefore put adding any of these experiments on hold, as they merely represented anthropomorphic embodiments of features we already offer on the site.