Monthly Archives: August 2012

Notes from the 2nd DataCite Workshop

Tom Parsons and I attended the 2nd DataCite Workshop at the British Library Conference Centre on July 6th, which proved to be an excellent opportunity to compare notes with other institutions working on incorporating the DataCite metadata schema into their workflows.

Caroline Wilkinson has already written a report on the Workshop, and the slides from the Workshop are available. So rather than repeat that information, here are the notes I made on points raised during the day which seemed particularly relevant to our current work at the University of Nottingham – hopefully there will be something here that’s helpful to others as well.

DataCite Mandatory Metadata

  • Many metadata schemas exist; it’s advisable to choose or define one that meets your specific needs
  • “Title” should always be different from the article title: it’s the title of the dataset
  • When listing “Creators” (authors) in DataCite, it’s important to also define their roles and IDs
  • “PublicationYear” should be the year in which the data became publicly available
  • “Publisher” should be the data center or archive making the data available.
  • “ResourceType” is currently being considered as a mandatory, rather than optional, field
  • Citation suggestion: Creator (Year): Title. Publisher. Identifier.
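As a rough illustration of the mandatory properties and the suggested citation pattern above, here is a minimal Python sketch. The `DataCiteRecord` class, its field names and the `mandatory_problems` check are hypothetical and not part of any DataCite tool. Joining several creators with "; " is also an assumption, since the citation suggestion only shows a single Creator.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataCiteRecord:
    """Hypothetical record holding the DataCite mandatory properties."""
    identifier: str                      # e.g. a DOI string (placeholder values below)
    creators: List[str]                  # creator names, in citation order
    title: str                           # title of the dataset, not of the article
    publisher: str                       # data centre or archive making the data available
    publication_year: int                # year the data became publicly available
    resource_type: Optional[str] = None  # treated as optional in this sketch

    def mandatory_problems(self) -> List[str]:
        """Return the names of mandatory properties that are empty or implausible."""
        problems = []
        if not self.identifier.strip():
            problems.append("Identifier")
        if not any(name.strip() for name in self.creators):
            problems.append("Creator")
        if not self.title.strip():
            problems.append("Title")
        if not self.publisher.strip():
            problems.append("Publisher")
        if not 1000 <= self.publication_year <= 9999:  # expect a four-digit year
            problems.append("PublicationYear")
        return problems

    def citation(self) -> str:
        """Format the record as: Creator (Year): Title. Publisher. Identifier."""
        creators = "; ".join(name for name in self.creators if name.strip())
        return (f"{creators} ({self.publication_year}): {self.title}. "
                f"{self.publisher}. {self.identifier}")
```

For example, `DataCiteRecord("10.1234/example", ["Smith, J.", "Jones, A."], "Example dataset", "Example Data Archive", 2012).citation()` gives `Smith, J.; Jones, A. (2012): Example dataset. Example Data Archive. 10.1234/example`, where the DOI, names and archive are placeholders.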

Subject-Specific Metadata

  • There are a large number of additional subject-specific metadata schemas in use
  • e.g. the Data Documentation Initiative (DDI), a standard for statistical and social science data (v3.1 released in 2009)
  • Some datasets (e.g. in genetics) have such huge numbers of contributors that the list of contributors is itself a large dataset
  • For geospatial data, geographical extent is a crucial metadata item, which can be surfaced in landing pages as an embedded Google Map
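Geographical extent is often recorded as a bounding box. A minimal Python sketch, assuming point coordinates in decimal degrees given as (latitude, longitude) pairs, could derive it as follows. The `BoundingBox` and `bounding_box` names are hypothetical, and the sketch does not handle datasets that cross the 180th meridian.

```python
from typing import Iterable, NamedTuple, Tuple


class BoundingBox(NamedTuple):
    west: float   # minimum longitude
    south: float  # minimum latitude
    east: float   # maximum longitude
    north: float  # maximum latitude


def bounding_box(points: Iterable[Tuple[float, float]]) -> BoundingBox:
    """Compute the extent of (latitude, longitude) points in decimal degrees.

    Assumes the points do not straddle the 180th meridian; such data would
    need a wrap-around box, which this sketch does not attempt.
    """
    pts = list(points)
    if not pts:
        raise ValueError("at least one point is required")
    for lat, lon in pts:
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            raise ValueError(f"invalid coordinate: ({lat}, {lon})")
    lats = [lat for lat, _ in pts]
    lons = [lon for _, lon in pts]
    return BoundingBox(west=min(lons), south=min(lats),
                       east=max(lons), north=max(lats))
```

The resulting west/south/east/north values are the kind of figures a landing page could pass to an embedded map to frame the data's coverage.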

Protocols and Standards

  • Bristol are providing serialisation using RDF/XML, and using SWORD as the repository deposit protocol
  • DC2AP – A DataCite Dublin Core Application Profile is in development
  • DataCite2RDF – Maps DataCite metadata to RDF
  • ISO 19101 – Deals with subsets of data
  • XForms – “XML format for the specification of a data processing model for XML data and user interface(s) for the XML data, such as web forms”
  • WAF – Web Accessible Folder
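To give a flavour of what serialising DataCite-style metadata as RDF/XML involves, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The property-to-Dublin-Core mapping in `DATACITE_TO_DCTERMS` is a hypothetical simplification for illustration only; it is not the DataCite2RDF or DC2AP mapping, and it treats every value, including creators, as a plain literal.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCT_NS = "http://purl.org/dc/terms/"

# Use readable prefixes in the serialised output.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dcterms", DCT_NS)

# Hypothetical, simplified mapping from DataCite property names to Dublin Core
# terms. Not the DataCite2RDF or DC2AP mapping; illustration only.
DATACITE_TO_DCTERMS = {
    "Title": "title",
    "Creator": "creator",
    "Publisher": "publisher",
    "PublicationYear": "issued",
    "Identifier": "identifier",
}


def to_rdf_xml(subject_uri: str, metadata: Dict[str, List[str]]) -> str:
    """Serialise literal-valued metadata as a single rdf:Description."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF_NS}}}Description",
                         {f"{{{RDF_NS}}}about": subject_uri})
    for prop, values in metadata.items():
        term = DATACITE_TO_DCTERMS.get(prop)
        if term is None:
            continue  # properties without a mapping are skipped in this sketch
        for value in values:
            ET.SubElement(desc, f"{{{DCT_NS}}}{term}").text = value
    return ET.tostring(root, encoding="unicode")
```

A fuller mapping would model creators as resources (for example, with their own identifiers) rather than as plain strings.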

Useful Software

  • Bristol have used Apache Tika to extract metadata from data files
  • Orbeon Forms – XForms-compliant web form builder available in a free open source Community Edition
  • Ex Libris Rosetta – “highly scalable, secure, and easily managed digital preservation system”
  • Ex Libris Primo – “one-stop solution for the discovery and delivery of local and remote resources, such as books, journal articles, and digital objects”

Miscellaneous

  • Schematron validates the content of an XML document, not just its conformance to an XML schema
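The difference can be sketched in plain Python: the rules below pair an element path with a test on its content, loosely mimicking Schematron's rule/assert structure, and catch values that a purely structural check would accept. This is not a Schematron processor (libraries such as lxml provide `lxml.isoschematron` for that). The rules, the `check` function and the unnamespaced DataCite-like element names are assumptions for illustration, since real DataCite XML uses a namespace.

```python
import re
import xml.etree.ElementTree as ET
from typing import Callable, List, Tuple

# Each rule: (ElementTree path to the nodes checked, test on one node, message).
# Loosely mimics Schematron's rule context / assert test pairing.
Rule = Tuple[str, Callable[[ET.Element], bool], str]

RULES: List[Rule] = [
    ("./publicationYear",
     lambda e: bool(re.fullmatch(r"\d{4}", (e.text or "").strip())),
     "publicationYear must be a four-digit year"),
    ("./creators/creator",
     lambda e: bool((e.findtext("creatorName") or "").strip()),
     "every creator needs a non-empty creatorName"),
    ("./titles/title",
     lambda e: bool((e.text or "").strip()),
     "titles must not be empty"),
]


def check(xml_text: str, rules: List[Rule] = RULES) -> List[str]:
    """Return one message per failed assertion; an empty list means no failures."""
    root = ET.fromstring(xml_text)
    failures = []
    for path, test, message in rules:
        for node in root.findall(path):
            if not test(node):
                failures.append(message)
    return failures
```

Note that the rules only test elements that are present: a record with no `publicationYear` at all passes here, because checking that required elements exist is the job of the XML schema.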

ADMIRe RDM survey at the University of Nottingham

The more embedded we become in all things research data management (RDM) at the University of Nottingham, the less time we seem to have to update this blog with our ADMIRe JISCMRD activities. I have found the other JISCMRD blog postings very useful, especially those from projects at a more advanced stage than ours, so I hope this posting gives you some idea of the work we have been doing.

July was a really busy month, so this is the first in a planned series of updates on some of the key activities the ADMIRe team has been focusing on recently.

Research Data Management Survey

As Tom outlined in his blog posting earlier this month, our research data management survey (using the Bristol Online Survey tool) has been launched and will be open until mid-September. We currently have 196 responses from researchers across all faculties. UoN is a research-intensive university with more than 2,500 career researchers (excluding PhD researchers).

Our survey is aimed at all UoN researchers (including PhD researchers), and we wanted to discover how data is used and managed across the University. Requirements gathering on RDM is a key activity for us, as we aim to deliver a sustainable RDM service that will facilitate and embed good RDM practice at UoN.

We will publish the (anonymised) survey results once they have been analysed, sometime during the autumn. Some interim results are as follows:

  1. 85% of respondents are creating or working with documents (TXT, PDF, Word, etc.)
  2. 32% back up their data daily
  3. 59% do not record or document any metadata about their data
  4. 66% work on externally funded projects
  5. 26% developed an RDM plan for their project
  6. 92% had not received any RDM training
  7. 129 of the 196 respondents (about 66%) wanted to receive training in developing an RDM plan
  8. 49% said their research data was confidential to their research group
  9. 30% said they were unsure whether they were required to make their data publicly discoverable and accessible after the project closed
  10. 40% said they would not deposit their data in a subject- or discipline-specific repository, and 48% weren’t sure

Plenty of interesting responses so far for us to mull over. Tomorrow I will provide an update on our work with the Data Asset Framework (DAF) and sensitive data, our planned RDM website, and other training and RDM awareness activities.