Following a data citation reference through the publishing process:
In my role as the Data Workflows Specialist at the University of Michigan - Library, I review large datasets and code deposits. I also support various aspects of our research data repository, Deep Blue Data, https://deepblue.lib.umich.edu/data, which is based on Samvera Hyrax. My colleagues and I have been working to improve the connections between our system and other systems so that we can gather metrics for our datasets.
In the spring of 2021, a researcher I regularly work with informed me that he had included the citation to his dataset in the References section of a paper he had just submitted to AGU JGR Planets, https://agupubs.onlinelibrary.wiley.com/doi/10.1029/2021JE006875. I thought it was an excellent opportunity to follow one of our datasets through the whole process, from a mention in the References section all the way to the DataCite Data Metrics badge, https://support.datacite.org/docs/displaying-usage-and-citations-in-your-repository, in the Deep Blue Data repository indicating that the dataset has been cited.
This is the rough process (Figure 1):
Figure 1 Process a data citation follows
Citation for the dataset, https://doi.org/10.7302/zck4-0058, as displayed in Deep Blue Data (Figure 2):
Figure 2 Data citation for dataset 10.7302/zck4-0058 as it appears in Deep Blue Data
Once the article has been officially published, the citation is fully marked up and hyperlinked in the AGU article references, https://agupubs.onlinelibrary.wiley.com/doi/10.1029/2021JE006875, and the DOI resolves back to the Deep Blue Data deposit (Figure 3):
Figure 3 Citation in AGU article https://doi.org/10.1029/2021JE006875 References section to dataset in Deep Blue Data
Interestingly, the Google Scholar link (circled in Figure 3), https://scholar.google.com/scholar?hl=en&q=Arbic%2C+B.+K.%2C+%26+Schindelegger%2C+M.+%282021%29.+Long%E2%80%90term+Earth%E2%80%90Moon+evolution+with+high%E2%80%90level+orbit+and+ocean+tide+models+%5BDataset%5D.+University+of+Michigan%E2%80%93Deep+Blue+Data.+https%3A%2F%2Fdoi.org%2F10.7302%2FZCK4-0058, resolves to the AGU article itself in a circular manner, NOT to the dataset itself (Figure 4):
Figure 4 Google Scholar links to article in AGU not the dataset itself
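To see why the Google Scholar link behaves this way, it helps to decode it. The sketch below uses only the Python standard library to pull the `q` parameter out of the link quoted above. The helper name `scholar_query` is my own, not part of any API. My reading of the result is that the link is a free-text search on the citation string rather than a DOI lookup, which would explain why it lands on the article that contains that string.

```python
from urllib.parse import urlsplit, parse_qs

# The Google Scholar link from the article's reference entry (copied from this post).
SCHOLAR_URL = (
    "https://scholar.google.com/scholar?hl=en&q=Arbic%2C+B.+K.%2C+%26+Schindelegger%2C+M.+"
    "%282021%29.+Long%E2%80%90term+Earth%E2%80%90Moon+evolution+with+high%E2%80%90level+"
    "orbit+and+ocean+tide+models+%5BDataset%5D.+University+of+Michigan%E2%80%93Deep+Blue+Data."
    "+https%3A%2F%2Fdoi.org%2F10.7302%2FZCK4-0058"
)


def scholar_query(url: str) -> str:
    """Return the decoded free-text query (the `q` parameter) of a Scholar link."""
    params = parse_qs(urlsplit(url).query)
    return params["q"][0]


if __name__ == "__main__":
    # Prints the full citation text that Google Scholar is asked to search for.
    print(scholar_query(SCHOLAR_URL))
```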
Rather than pointing to Google Scholar, AGU could point to DataCite Commons, https://commons.datacite.org/doi.org/10.7302/zck4-0058 (they already link to Crossref for other citations), or even to Google Dataset Search, https://datasetsearch.research.google.com/search?query=%22Long-term%20Earth-Moon%20evolution%20with%20high-level%20orbit%20and%20ocean%20tide%20models%22&docid=L2cvMTFwdmgyMjEyZA%3D%3D (Figure 5).
Figure 5 Screen shot of Google Dataset Search results for dataset title "Long-term Earth-Moon evolution with high-level orbit and ocean tide models"
The publisher, AGU/Wiley, makes the article metadata available to Crossref as XML. Figure 6 shows it in a somewhat more readable format, as returned by the Crossref API, https://api.crossref.org/v1/works/10.1029/2021JE006875. Citation #7 in Figure 6 below is how the dataset citation looked prior to the Nov 2021 “fix” from AGU/Wiley. (See the upcoming article from Shelley Stall and others from Force 11.) Citation #8 is included for reference to show how a regular article citation is displayed. Note that the DOI is called out on #8 (Figure 6):
Figure 6 Dataset citation from AGU/Wiley - pre Nov 2021 "fix"
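The same check can be scripted. This is a minimal sketch, assuming the Crossref API response keeps the reference list under `message` → `reference` and that a recognized DOI appears under the key `DOI`. The function names and `SAMPLE_WORK` are my own, and the sample record is hypothetical and abbreviated, not the real metadata for this article.

```python
import json
from urllib.request import urlopen


def fetch_crossref_work(doi: str) -> dict:
    """Download one work record from the Crossref REST API (needs network access)."""
    with urlopen(f"https://api.crossref.org/v1/works/{doi}") as resp:
        return json.load(resp)["message"]


def references_with_and_without_doi(work: dict):
    """Split a work's reference list into entries that carry a DOI field and entries that do not."""
    refs = work.get("reference", [])
    with_doi = [r for r in refs if r.get("DOI")]
    without_doi = [r for r in refs if not r.get("DOI")]
    return with_doi, without_doi


# Hypothetical, abbreviated record for illustration only (not the real Crossref metadata).
SAMPLE_WORK = {
    "reference": [
        {"key": "ref-7", "unstructured": "A dataset citation with no DOI field."},
        {"key": "ref-8", "DOI": "10.1234/example-article", "article-title": "A regular article"},
    ]
}

if __name__ == "__main__":
    with_doi, without_doi = references_with_and_without_doi(SAMPLE_WORK)
    print("References with a DOI field:   ", [r["key"] for r in with_doi])
    print("References without a DOI field:", [r["key"] for r in without_doi])
```

To run it against the live record, pass the result of `fetch_crossref_work("10.1029/2021JE006875")` to `references_with_and_without_doi` instead of the sample.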
Here is the vendor XML feed from the publisher, https://api.crossref.org/works/10.1029/2021JE006875/transform/application/vnd.crossref.unixsd+xml (Figure 7):
Figure 7 XML feed from publisher, AGU/Wiley, pre-Nov 2021 "fix"
After the fix from AGU/Wiley, the dataset reference in Crossref in December 2021 is listed as an “unstructured citation” that includes the DOI (Figure 8):
Figure 8 Dataset citation in Crossref after the AGU/Wiley fix shows full text of citation
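Because the DOI now sits inside free text, any system that wants to use it has to find it there first. Below is a rough sketch of that step using a regular expression. The pattern is a common heuristic for DOIs, not an official specification, and `find_dois` is a hypothetical helper. The citation text is the one decoded from the Google Scholar link earlier in this post, and I am assuming the unstructured citation in Crossref reads much the same.

```python
import re

# Heuristic DOI pattern: "10.", a 4-9 digit registrant prefix, a slash, then the suffix.
DOI_PATTERN = re.compile(r'10\.\d{4,9}/[^\s"<>]+')

# Citation text as decoded from the Google Scholar link (assumed to resemble the Crossref text).
CITATION = (
    "Arbic, B. K., & Schindelegger, M. (2021). Long-term Earth-Moon evolution with "
    "high-level orbit and ocean tide models [Dataset]. University of Michigan - "
    "Deep Blue Data. https://doi.org/10.7302/ZCK4-0058"
)


def find_dois(text: str) -> list:
    """Return the DOIs found in free text, lowercased, with trailing punctuation removed."""
    return [m.rstrip(".,;)").lower() for m in DOI_PATTERN.findall(text)]


if __name__ == "__main__":
    print(find_dois(CITATION))
```

DOIs are case-insensitive, so the sketch lowercases matches before comparing 10.7302/ZCK4-0058 with 10.7302/zck4-0058.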
Here is the official XML feed from the publisher to Crossref after the AGU/Wiley fix (Figure 9):
Figure 9 XML of dataset citation after AGU/Wiley fix labels full text of citation as "unstructured citation"
Unfortunately, the AGU/Wiley fix does not help DataCite recognize this reference as a citation to the dataset, as the DataCite Commons JSON shows (Figure 10):
Figure 11 shows how the citation appears in the DataCite Commons, https://commons.datacite.org/doi.org/10.7302/zck4-0058, frontend:
Figure 11 DataCite Commons frontend with 0 citations indicated
Figure 12 shows how the citation appears in the DataCite Search, https://search.datacite.org/works/10.7302/zck4-0058, frontend:
Figure 12 DataCite Search with 0 citations indicated
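The count shown in these frontends can also be read programmatically. This is a minimal sketch, assuming the DataCite REST API endpoint `https://api.datacite.org/dois/{doi}` and its `citationCount` attribute. The function names are my own, and `SAMPLE_RECORD` is a hypothetical, heavily trimmed response rather than the real record.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def fetch_datacite_record(doi: str) -> dict:
    """Download one DOI record from the DataCite REST API (needs network access)."""
    url = "https://api.datacite.org/dois/" + quote(doi, safe="/")
    with urlopen(url) as resp:
        return json.load(resp)


def citation_count(record: dict) -> int:
    """Read data.attributes.citationCount, treating a missing or null value as 0."""
    attributes = record.get("data", {}).get("attributes", {})
    return attributes.get("citationCount") or 0


# Hypothetical, trimmed response for illustration only.
SAMPLE_RECORD = {
    "data": {
        "id": "10.7302/zck4-0058",
        "attributes": {"doi": "10.7302/zck4-0058", "citationCount": 0},
    }
}

if __name__ == "__main__":
    print("Citations recorded:", citation_count(SAMPLE_RECORD))
```

Calling `citation_count(fetch_datacite_record("10.7302/zck4-0058"))` would query the live record, which makes it easy to re-check periodically whether a citation has finally been registered.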
Finally, Figure 13 displays the backend of the DataCite Data Metrics badge, https://support.datacite.org/docs/displaying-usage-and-citations-in-your-repository (HTML underlying the graphic):
Figure 13 HTML for the DataCite Data Metrics badge for DOI 10.7302/zck4-0058
The DataCite Data Metrics badge displays no indication of citations (Figure 14):
Figure 14 DataCite Data Metrics badge frontend with no indication of citations
I also tracked a dataset DOI, https://doi.org/10.7302/pa6y-fb55, mentioned in a “Data Availability” statement, https://www.nature.com/articles/s41467-021-27827-y#data-availability, but not in the “References” section in an article in Nature (Figure 15):
Figure 15 Data availability statement in Nature article 10.1038/s41467-021-27827-y indicating dataset DOI https://doi.org/10.7302/pa6y-fb55
Unfortunately, there is no indication of the existence of a data availability statement in the metadata shared with Crossref, https://api.crossref.org/v1/works/10.1038/s41467-021-27827-y.
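One way to double-check this is to search the whole metadata record for the dataset DOI rather than looking only at the reference list. The sketch below walks any parsed JSON record and reports where a string occurs. `find_paths` is a hypothetical helper, and both sample records are made up for illustration. An empty result for the real Crossref record would be consistent with the observation above.

```python
def find_paths(node, needle: str, path: str = "") -> list:
    """Return slash-separated paths of every string value containing `needle` (case-insensitive)."""
    needle = needle.lower()
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            hits += find_paths(value, needle, f"{path}/{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            hits += find_paths(value, needle, f"{path}/{index}")
    elif isinstance(node, str) and needle in node.lower():
        hits.append(path)
    return hits


# Hypothetical records for illustration only (not the real Crossref metadata).
RECORD_WITH_MENTION = {
    "title": ["An example article"],
    "reference": [{"key": "ref-1", "unstructured": "Dataset. https://doi.org/10.7302/PA6Y-FB55"}],
}
RECORD_WITHOUT_MENTION = {"title": ["An example article"], "reference": []}

if __name__ == "__main__":
    print(find_paths(RECORD_WITH_MENTION, "10.7302/pa6y-fb55"))
    print(find_paths(RECORD_WITHOUT_MENTION, "10.7302/pa6y-fb55"))
```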
After this research, I’m not sure where or how our researchers are supposed to cite their datasets for them to be counted by the systems that “count.” Any advice on how this should be done would be greatly appreciated. I would also be happy to discuss any of this further or do testing!
For more citation fun: https://apps.lib.umich.edu/blogs/bits-and-pieces/contributing-citation-datacite-iscitedby