Tom Parsons and I attended the 2nd DataCite Workshop at the British Library Conference Centre on July 6th, which proved to be an excellent opportunity to compare notes with other institutions working on incorporating the DataCite metadata schema into their workflows.
Caroline Wilkinson has already written a report on the Workshop, and the slides from the Workshop are available. So rather than repeat that information, here are the notes I made on points raised during the day which seemed particularly relevant to our current work at the University of Nottingham – hopefully there will be something here that’s helpful to others as well.
DataCite Mandatory Metadata
- Many metadata schemas exist; it’s advisable to choose or define one that meets your specific needs
- “Title” should always be different from the article title: it’s the title of the dataset
- When listing “Creators” (authors) in DataCite, it’s important to also define their roles and IDs
- “PublicationYear” should be the date of public availability
- “Publisher” should be the data center or archive making the data available
- “ResourceType” is currently being considered as a mandatory, rather than an optional, field
- Citation suggestion: Creator (Year): Title. Publisher. Identifier.
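As a rough illustration of the citation suggestion above, here is a minimal Python sketch that assembles the string from the mandatory fields. The class and function names, the "; " separator between multiple creators, and the sample values are all my own assumptions for illustration, not part of the DataCite schema. The closing full stop is left off so that it cannot be mistaken for part of the identifier.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetRecord:
    """Hypothetical container for the mandatory fields noted above."""
    creators: List[str]      # creator names, e.g. "Family, Given"
    publication_year: int    # year the data became publicly available
    title: str               # title of the dataset, not of the article
    publisher: str           # data center or archive hosting the data
    identifier: str          # persistent identifier, e.g. a DOI


def format_citation(record: DatasetRecord) -> str:
    """Render 'Creator (Year): Title. Publisher. Identifier'."""
    if not record.creators:
        raise ValueError("at least one creator is required")
    # Assumption: multiple creators are joined with "; ".
    creators = "; ".join(record.creators)
    return (
        f"{creators} ({record.publication_year}): {record.title}. "
        f"{record.publisher}. {record.identifier}"
    )


# Made-up sample values, purely for demonstration.
example = DatasetRecord(
    creators=["Smith, J.", "Jones, A."],
    publication_year=2011,
    title="Example Survey Dataset",
    publisher="Example Data Archive",
    identifier="doi:10.1234/example",
)
print(format_citation(example))
```

Keeping the fields in one record makes it easy to reject a deposit that is missing a mandatory value before any citation is generated.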
Subject-Specific Metadata
- There are a large number of additional subject-specific metadata schemas in use
- e.g. Data Documentation Initiative – Standard for statistical and social science data (v 3.1 released in 2009)
- Some datasets have huge numbers of contributors (e.g. genetics) where the list of contributors is itself a large dataset
- For geospatial data, geographical extent is a crucial metadata item, which can be surfaced in landing pages as an embedded Google Map
Protocols and Standards
- Bristol are providing serialisation using RDF/XML, and using SWORD as the repository deposit protocol
- DC2AP – A DataCite Dublin Core Application Profile is in development
- DataCite2RDF – Maps DataCite metadata to RDF
- OAI-PMH – Open Archives Initiative Protocol for Metadata Harvesting
- OGC-CSW – Open Geospatial Consortium Catalog Services for the Web
- ISO 19101 – Deals with subsets of data
- XForms – “XML format for the specification of a data processing model for XML data and user interface(s) for the XML data, such as web forms”
- WAF – Web Accessible Folder
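To give a flavour of how OAI-PMH harvesting works in practice: a harvester sends plain HTTP requests whose query string carries a "verb" and its arguments. The sketch below only builds such a request URL using the Python standard library. The base URL is a hypothetical placeholder and the function name is my own; "ListRecords", "metadataPrefix", "set" and the "oai_dc" prefix come from the protocol itself.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(
    base_url: str,
    metadata_prefix: str = "oai_dc",
    set_spec: Optional[str] = None,
) -> str:
    """Build an OAI-PMH ListRecords request URL (no request is sent)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        # Optional selective harvesting by set.
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Hypothetical repository endpoint, used only for illustration.
print(list_records_url("https://repository.example.org/oai"))
```

A real harvester would then fetch this URL, parse the XML response, and follow any resumption token the repository returns for large result sets.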
Useful Software
- Bristol have used Apache Tika to extract metadata from data files
- Orbeon Forms – XForms-compliant web form builder available in a free open source Community Edition
- Ex Libris Rosetta – “highly scalable, secure, and easily managed digital preservation system”
- Ex Libris Primo – “one-stop solution for the discovery and delivery of local and remote resources, such as books, journal articles, and digital objects”
Miscellaneous
- Schematron rules can validate a document’s content, going beyond the structural conformance checked by an XML schema
- See the Dryad Wiki re: Versioning of Datasets
- David Shotton’s blog: opencitations.wordpress.com