Next in our series is metadata, the lifeblood of infrastructure. It is used by all of the systems, services and organizations covered in this infrastructure series, including digital preservation, discussed in our first post. Metadata is ubiquitous in and foundational to scholarly communications, so before we get too far into other topics, we want to share an interview with Laura Paglione, Project Upholder of Metadata 2020, which advocates richer, connected and reusable, open metadata for all research outputs.
If you’re unsure how critical metadata is to the scholarly record, where it fits into infrastructure issues or why you should care about it, please read on.
Interview with Laura Paglione
Interview by Jennifer Kemp
What does ‘infrastructure’ mean to you, in the context of research communications?
Infrastructure may be thought of as a series of networks – technical, social and organizational. Metadata is a key component of research infrastructure because it reflects the scholarly record, enables discovery of research results, and credits the individuals and organizations that made contributions to these items, by providing resources or knowledge. It enables connections between ideas and outputs.
How do you describe what you do/how scholarly metadata works to people unfamiliar with it?
Most people may think they are unfamiliar with metadata but have probably told someone about a book they read without reading from the text itself, mentioning information like author name, when it was published, by which publisher, etc. For those with some familiarity, metadata for books and journals may seem obvious, but it accompanies most scholarly outputs, including data that are collected as part of research, and associated tools and services.
Metadata is attached to research outputs by those who create them, such as authors; those who publish or promote them, such as publishers; and those who care for, store and/or make available these outputs, such as librarians. In the context of scholarly infrastructure, metadata is provided in standardized ways so that it can be shared and processed by and across a variety of systems, including, for example, general and specialized search engines that make these outputs findable. When the metadata of scholarly outputs is robust and distributed throughout scholarly communications, the outputs can be used to their fullest potential, contributing to research efforts to eradicate poverty and cure significant diseases, to offer two ambitious examples.
It’s an oversimplification to say that machines read metadata the way humans read publications but it’s true enough to indicate that it’s all interconnected and to hint at its importance.
What is the one thing you wish ‘Silicon Valley’ would do or do differently to better support scholarly metadata?
Despite the importance of metadata to the process of understanding and using scholarly outputs, this information is often only provided on a limited basis, if at all, unless individuals or organizations have paid for it. In some ways, these charges are understandable. Curating and tagging outputs with metadata in a way that is consistent and useful can be an expensive endeavor and the value of that work is important to recognize. However, at times organizations see metadata as an important strategic or financial resource, which can lead to walled-off access and prevent connections between these siloed information stores. Metadata 2020 advocates for open metadata that is supported by financial models that enable open use and reuse of metadata connected to scholarly outputs.
What is the one thing you wish non-technical people understood better about the challenges of scholarly metadata?
High-quality metadata for all scholarly outputs is a difficult goal – one that requires the efforts of many players, including you! By advocating for richer metadata, we help to ensure that organizations commit resources to its development, and pledge to keep metadata open for its many uses, such as organizing resources, enabling interoperability, and facilitating digital identification. Library catalogs and discovery systems are a familiar use case of metadata to many of us, but so are university research projects, AI tools and lots of startup scholarly services, to name a few examples. It’s not uncommon for these services to use multiple sources of metadata to ensure they have all that is available.
Certainly, it takes expertise to deliver accurate metadata, but everyone has a role to play and understanding its importance is a great first step.
How, if at all, does metadata differ when it comes to text vs data or journals, books, etc.?
Some metadata fields are similar for any type of output format, for example, the title or name of the output, the date of its creation or discovery, information about the creators and contributors, etc. But other metadata fields are very much specific to the type of output. Data, for example, have different and more varied file types than text content.
There are a large number of metadata schemas in use for different types of outputs, each with its own guidelines for what quality metadata includes. This is one of the challenges we’re working to support. As part of the Metadata 2020 project, we have cataloged several of these best practices, and have included them on our website.
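To make the distinction between shared and type-specific fields concrete, here is a minimal sketch in Python. It is an illustration only and does not follow any particular metadata schema: the class name, field names and sample values are all hypothetical, chosen to mirror the examples mentioned above (title, creation date, creators, and file types for data).

```python
from dataclasses import dataclass, field


@dataclass
class OutputMetadata:
    """Hypothetical record: core fields are shared by every output type."""
    title: str
    created: str            # date as an ISO 8601 string, e.g. "2020-01-15"
    creators: list[str]
    output_type: str        # illustrative labels such as "article" or "dataset"
    # Type-specific fields; the keys used below are assumptions, not a standard.
    extra: dict[str, str] = field(default_factory=dict)


def shared_fields(record: OutputMetadata) -> dict:
    """Return only the fields that apply to any kind of output."""
    return {
        "title": record.title,
        "created": record.created,
        "creators": record.creators,
    }


# Sample records with made-up values.
article = OutputMetadata(
    title="An Example Article",
    created="2020-01-15",
    creators=["A. Author"],
    output_type="article",
    extra={"journal": "Example Journal", "volume": "12"},
)

dataset = OutputMetadata(
    title="An Example Dataset",
    created="2019-11-02",
    creators=["A. Author", "B. Contributor"],
    output_type="dataset",
    extra={"file_format": "text/csv", "size_bytes": "20480"},
)

# Both records expose the same core fields, while their extras do not overlap.
same_core = shared_fields(article).keys() == shared_fields(dataset).keys()
overlapping_extras = set(article.extra) & set(dataset.extra)
```

In this sketch a discovery system could rely on the core fields for any output, and consult the extra fields only when it knows how to interpret a given output type.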
What other areas of infrastructure do you work most closely with/are most dependent on (& how)?
Nearly every system, workflow and process within research and scholarly communications has some connection to metadata. Metadata is an essential component of cataloging, workflow management, and scholarly discovery, to name a few examples. It is difficult to consider how these could function without some form of metadata.
Metadata often involves individuals who create the outputs, who are generally pressed for time and untrained in metadata; librarians, who are often resource constrained; and organizations that use the metadata for their own purposes but may not be inclined to share the results of this expensive activity with others without a fee. Each area is dependent on the others in some ways and all have limitations that collaboration can help to mitigate.
Explain in some detail the issue you think is the most vexing/ interesting/ consequential/ etc.
Improving the quality of metadata for research is such a significant undertaking that it can be easy to feel overwhelmed and have the impression that no one individual’s actions can make an impact. The difficult task of quantifying our collective impact is essential for significant progress to be made in providing richer metadata.
In a perfect world, how would metadata infrastructure be funded and governed?
The library community can perhaps offer the most insight on this topic. In the United States the Library of Congress worked with library associations to develop standardized cataloging rules; however, the cost of applying those rules has been left to individual libraries. Broadly speaking, in a perfect world, metadata would consistently be considered a core component in the creation of every output. As a result, its creation would be actively funded by the party that is funding the creation of the output itself. The amount to be funded would be sufficient to ensure rich — and well-maintained — metadata, either by supporting metadata experts to classify and enrich information, or by supporting general metadata training and tools.
What are your favorite blogs, conferences, Twitter accounts, etc. to keep up on scholarly metadata?
There are so many good resources. We’ve collected a few but what is most interesting to anyone may depend in part on what role they play in metadata. Digital library practitioners may be interested in the Metadata Support Group. The PID Forum is useful for all things related to Persistent Identifiers (PIDs). A favorite way to stay up to date is conference sessions, which are often recorded to be shared beyond attendees and may offer multiple perspectives. A good practical approach is to follow the organizations involved in your work, whether it’s their newsletters or on Twitter. With a topic like metadata, it’s often nice to have an individual perspective so maybe try following personal staff accounts of relevant organizations. Sometimes the best resources aren’t metadata-specific and my bookmarks may look a lot like any other FORCE11 member’s. Finally, @metadata_haikus isn’t for keeping up to date but it may broaden your vocabulary a bit!
Favorite little-known fact or unsung hero?
Technically, metadata is the biggest unsung hero despite its prevalence in scholarly communication. Metadata has already enabled us to make connections and discoveries that wouldn’t be possible otherwise. Things as seemingly basic as acknowledging funders in the research outputs they support and linking a journal article with the clinical trial that was the basis for the publication are greatly facilitated through standardized metadata. And as the amount of information that is processed throughout the world continues to increase, metadata helps to make this processing manageable by helping people and systems better understand what information is relevant for different needs. Richer metadata can do even more.
What question do you wish we asked but didn’t and why?
What does it take to have richer scholarly metadata? We have provided some practical guidance to help answer this question.
Metadata 2020 advocates richer, connected and reusable, open metadata for all research outputs. To make this idea a reality we engage hundreds of volunteers worldwide who are passionate about metadata. These individuals have been articulating the challenges of achieving richer metadata both from the point of view of specific communities and from a systemic point of view. Recognizing that the challenges do not rest with any single community or area, we are developing an advocacy campaign that illustrates the actions that each one of us can take to ensure richer metadata: metadata that can help fuel the discoveries, connections, and capabilities that will help address some of the globe’s most significant and pervasive challenges. Because Connecting Research Matters.
More Information: Laura Paglione and METADATA 2020
Metadata 2020 (www.metadata2020.org/) is a collaboration that advocates richer, connected and reusable, open metadata for all research outputs, which will advance scholarly pursuits for the benefit of society. We aim to create awareness and resources for all who have a stake in creating and using scholarly metadata. We will demonstrate why richer metadata should be the scholarly community’s top priority, how we can all evaluate ourselves and improve, and what can be achieved when we work harder, and work together. Richer metadata fuels discovery and innovation. Connected metadata bridges the gaps between systems and communities. Reusable, open metadata eliminates duplication of effort. When we settle for inadequate metadata, none of this is possible and everyone suffers as a consequence.
In her role as Project Upholder, Laura Paglione coordinates the activities of the group of volunteers who work on the Metadata 2020 project. The answers given in this interview represent a compilation of the thoughts and insights from the many individuals who participate in the project.