Documenting a Mixed Language: Interview with Dr. Wilkinson Daniel Wong Gonzales

International Love Data Week 2023 is February 13-17, 2023. The theme this year is Data: Agent of Change. We believe that the scholars whose data are hosted in Deep Blue Data are truly agents of change for their fields and their communities. Their data supports groundbreaking research in climate change, urban sustainability, behavioral science, policy change, and more.

Deep Blue Data is a repository offered by the University of Michigan Library that provides access and preservation services for digital research data that were developed or used in the support of research activities at U-M.

In honor of Love Data Week, we reached out to some recent Deep Blue Data depositors to ask about the history of their work, unique discoveries they made along the way, and how they see their data being useful to their research communities and beyond. 

We hope you enjoy learning more about the scholars behind the data sets. As a reminder, all data sets in Deep Blue Data are openly accessible for anyone to download and use, because we love data.


Dr. Wilkinson Daniel Wong Gonzales graduated from the University of Michigan in 2022 with a PhD in Linguistics. His dissertation is entitled "“Truly a Language of Our Own” A Corpus-Based, Experimental, and Variationist Account of Lánnang-uè in Manila." He deposited the data set "The Lannang Corpus (LanCorp): A POS-tagged, sociolinguistic corpus containing recordings and transcriptions of Lannang speech collected from the metropolitan Manila Lannangs between 2016 and 2020," which underlies his dissertation research, in Deep Blue Data. In this interview, he describes his research and why he decided to share his data set publicly.

What prompted you to conduct your research in this area?

As a member of the Lannang community, I wanted to give back by conducting research on our unique and complex language which has not been given much attention. I saw an opportunity to create valuable resources for the community by documenting the language, as there were no resources available. My goal was to contribute to the preservation of our cultural heritage.

For those not familiar with your field, what is the one thing you think is most important, interesting, or unique about your work or your findings?

My work focuses on an undocumented mixed language and how its features interact with other languages in its linguistic ecology, such as Tagalog, English, and Chinese (Hokkien). I used machine learning methods to perform part-of-speech tagging for this under-researched language, which is an important step in understanding and preserving it.

Dr. Wilkinson Daniel Wong Gonzales with two Lannang speakers in Manila

What impact do you hope making your data public will have in the world? How are you hoping it might be encountered, reused, or built upon? 

By making my data public, I hope to allow people to get a snapshot of the language and culture of the Lannangs. It may be used to train language models such as the ones found in Alexa or Siri, and also to create a dictionary of the language which will help in the preservation and promotion of the language. I hope that it will encourage others to build upon the work and contribute to the continued research and understanding of the language.

What is one thing you learned during the process of preparing your data for deposit or sharing? 

I learned that it pays to be organized and that resources such as Deep Blue Repository are available to aid in the process.

Why do you think sharing data is important?

Sharing data is important because it allows for greater collaboration and transparency in research. By making data publicly available, other researchers can build upon previous work, leading to new discoveries and a deeper understanding of the subject at hand. Additionally, sharing data can help to ensure that research is reproducible and can increase the transparency and trust in the scientific process.