Category Archives: Workflows

Notes from the 2nd DataCite Workshop

Tom Parsons and I attended the 2nd DataCite Workshop at the British Library Conference Centre on July 6th. The Workshop proved to be an excellent opportunity to compare notes with other institutions working on incorporating the DataCite metadata schema into their workflows.

Caroline Wilkinson has already written a report on the Workshop, and the slides from the day are available. So rather than repeat that information, here are my notes on points raised during the day that seemed particularly relevant to our current work at the University of Nottingham. Hopefully some of them will be helpful to others as well.

DataCite Mandatory Metadata

  • Many metadata schemas exist; it’s advisable to choose or define one that meets your specific needs
  • “Title” should always be different from the article title: it’s the title of the dataset
  • When listing “Creators” (authors) in DataCite, it’s important to also define their roles and IDs
  • “PublicationYear” should be the year in which the data became publicly available
  • “Publisher” should be the data center or archive making the data available
  • “ResourceType” is currently being considered as a mandatory, rather than an optional, field
  • Citation suggestion: Creator (Year): Title. Publisher. Identifier.
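To make the suggested citation format concrete, here is a minimal Python sketch that builds the citation string from the mandatory fields listed above. The `DataCiteRecord` class, its field names, and the example values are hypothetical and chosen for illustration; they are not part of any official DataCite tooling.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataCiteRecord:
    """Hypothetical container for the DataCite mandatory properties."""
    identifier: str            # e.g. a DOI string such as "doi:10.5072/..."
    creators: List[str]        # creator names, in citation order
    title: str                 # title of the dataset, not of the article
    publisher: str             # data centre or archive making the data available
    publication_year: int      # year the data became publicly available
    resource_type: Optional[str] = None  # not used in the suggested citation


def format_citation(record: DataCiteRecord) -> str:
    """Build 'Creator (Year): Title. Publisher. Identifier.' from a record."""
    # Refuse to build a citation if any mandatory text field is empty.
    missing = [name for name, value in (
        ("identifier", record.identifier),
        ("creators", record.creators),
        ("title", record.title),
        ("publisher", record.publisher),
    ) if not value]
    if missing:
        raise ValueError(f"missing mandatory fields: {', '.join(missing)}")
    creators = "; ".join(record.creators)
    return (f"{creators} ({record.publication_year}): {record.title}. "
            f"{record.publisher}. {record.identifier}.")


record = DataCiteRecord(
    identifier="doi:10.5072/example-dataset",  # hypothetical test DOI
    creators=["Smith, J", "Jones, A"],
    title="Example survey dataset",
    publisher="Example Data Centre",
    publication_year=2012,
)
print(format_citation(record))
# Smith, J; Jones, A (2012): Example survey dataset. Example Data Centre. doi:10.5072/example-dataset.
```

Raising an error on empty fields reflects the point that these properties are mandatory: a record missing any of them should not produce a citation at all.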

Subject-Specific Metadata

  • There are a large number of additional subject-specific metadata schemas in use
  • e.g. Data Documentation Initiative (DDI) – standard for statistical and social science data (v3.1 released in 2009)
  • Some datasets have huge numbers of contributors (e.g. genetics), where the list of contributors is itself a large dataset
  • For geospatial data, geographical extent is a crucial metadata item, which can be surfaced in landing pages as an embedded Google Map

Protocols and Standards

  • Bristol are providing RDF/XML serialisation and using SWORD as the repository deposit protocol
  • DC2AP – A DataCite Dublin Core Application Profile is in development
  • DataCite2RDF – Maps DataCite metadata to RDF
  • ISO 19101 – Deals with subsets of data
  • XForms – “XML format for the specification of a data processing model for XML data and user interface(s) for the XML data, such as web forms”
  • WAF – Web Accessible Folder
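As a rough illustration of what mapping DataCite metadata to RDF involves, here is a minimal Python sketch that emits N-Triples using Dublin Core terms. The property choices (`dcterms:title`, `dcterms:creator`, `dcterms:publisher`, `dcterms:issued`) are our own assumption for illustration; they are not the mapping used by DataCite2RDF or DC2AP, and the DOI and values are hypothetical. A real system would more likely use an RDF library than hand-built strings.

```python
DCTERMS = "http://purl.org/dc/terms/"
XSD_GYEAR = "http://www.w3.org/2001/XMLSchema#gYear"

# Assumed (illustrative) mapping from DataCite properties to Dublin Core terms.
PROPERTY_MAP = {
    "title": "title",
    "creator": "creator",
    "publisher": "publisher",
    "publicationYear": "issued",
}


def escape_literal(text: str) -> str:
    """Escape a string for use inside an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def to_ntriples(doi: str, metadata: dict) -> list:
    """Return one N-Triples line per mapped property value."""
    subject = f"<https://doi.org/{doi}>"
    triples = []
    for key, term in PROPERTY_MAP.items():
        values = metadata.get(key)
        if values is None:
            continue  # property absent: emit nothing for it
        if not isinstance(values, list):
            values = [values]
        for value in values:
            # Creators are plain literals here for brevity; dcterms:creator
            # expects an agent, so a fuller mapping would use URIs instead.
            literal = f'"{escape_literal(str(value))}"'
            if key == "publicationYear":
                literal += f"^^<{XSD_GYEAR}>"
            triples.append(f"{subject} <{DCTERMS}{term}> {literal} .")
    return triples


for line in to_ntriples("10.5072/example", {
    "title": "Example dataset",
    "creator": ["Smith, J", "Jones, A"],
    "publisher": "Example Data Centre",
    "publicationYear": 2012,
}):
    print(line)
```

N-Triples was chosen only because it can be written with the standard library; the same triples could equally be serialised as RDF/XML, which is the format Bristol are using.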

Useful Software

  • Bristol have used Apache Tika to extract metadata from data files
  • Orbeon Forms – XForms-compliant web form builder available in a free open source Community Edition
  • Ex Libris Rosetta – “highly scalable, secure, and easily managed digital preservation system”
  • Ex Libris Primo – “one-stop solution for the discovery and delivery of local and remote resources, such as books, journal articles, and digital objects”

Miscellaneous

  • Schematron validates document content, going beyond simple conformance to an XML schema

RLUK RDM discussion day, 16.4.12

Here are some interesting references picked up at yesterday's event.

Work in the USA towards data archiving

An interesting post on the JISC Repositories list on 21st March highlights a survey being undertaken by the University of Michigan on the relationship between data archives and institutional repositories. Their 10-minute survey is available here for anyone who wishes to contribute.

The poster also mentions a draft 12-page guide (available on Scribd, but easier to read as a Google Doc) on building links between social science data archives and institutional repositories, which we found both interesting and useful. It is described as a guide “. . . that provides guidelines and decision rules for institutional repositories at each stage of the archiving process: from appraisal to acquisition to curation to dissemination.”

Bill

ADMIRe Benefits

Along with other projects, we have been thinking about the possible benefits of the ADMIRe project and of the wider framework of RDM development within the University into which ADMIRe fits.

Many of the benefits are qualitative in nature, although we do expect solid returns in terms of research exposure, management and re-use.

Our current thinking – very much in draft form for now – can be found here.

We would welcome comments and reflections from others who are going down similar paths in identifying institutional and other benefits for their RDM programmes.

Bill