Tom Parsons and I attended the 2nd DataCite Workshop at the British Library Conference Centre on July 6th, which proved to be an excellent opportunity to compare notes with other institutions working on incorporating the DataCite metadata schema into their workflows.
Caroline Wilkinson has already written a report on the Workshop, and the slides from the Workshop are available. So rather than repeat that information, here are the notes I made on points raised during the day which seemed particularly relevant to our current work at the University of Nottingham – hopefully there will be something here that’s helpful to others as well.
DataCite Mandatory Metadata
- Many metadata schemas exist; it’s advisable to choose or define one that meets your specific needs
- “Title” should always be different from the article title: it’s the title of the dataset
- When listing “Creators” (authors) in DataCite, it’s important to also define their roles and IDs
- “PublicationYear” should be the date of public availability
- “Publisher” should be the data center or archive making the data available
- “ResourceType” is currently being considered as a mandatory, rather than an optional field
- Citation suggestion: Creator (Year): Title. Publisher. Identifier.
- There are a large number of additional subject-specific metadata schemas in use
- e.g. the Data Documentation Initiative – a standard for statistical and social science data (v3.1 released in 2009)
- Some datasets have huge numbers of contributors (e.g. genetics), where the list of contributors is itself a large dataset
- For geospatial data, geographical extent is a crucial metadata item, which can be surfaced in landing pages as an embedded Google Map
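To make the mandatory fields above concrete, here is a rough sketch of a minimal record and the suggested citation format, using only Python's standard library. The DOI, names and values are invented, and this is a simplified stand-in for the real DataCite schema (which has its own XML namespace and many further properties):

```python
import xml.etree.ElementTree as ET

def minimal_datacite_record(doi, creators, title, publisher, year):
    """Build a minimal XML record covering DataCite's five mandatory
    properties: Identifier, Creator, Title, Publisher, PublicationYear."""
    resource = ET.Element("resource")
    ident = ET.SubElement(resource, "identifier", identifierType="DOI")
    ident.text = doi
    creators_el = ET.SubElement(resource, "creators")
    for name in creators:
        c = ET.SubElement(creators_el, "creator")
        ET.SubElement(c, "creatorName").text = name
    titles = ET.SubElement(resource, "titles")
    # The dataset's own title, not the title of any associated article.
    ET.SubElement(titles, "title").text = title
    # The data centre or archive making the data available.
    ET.SubElement(resource, "publisher").text = publisher
    # The year the data became publicly available.
    ET.SubElement(resource, "publicationYear").text = str(year)
    return resource

def citation(creators, year, title, publisher, identifier):
    """Format the suggested citation: Creator (Year): Title. Publisher. Identifier."""
    return f"{'; '.join(creators)} ({year}): {title}. {publisher}. {identifier}"

record = minimal_datacite_record(
    "10.0000/example.1", ["Smith, J."], "Example survey dataset",
    "Example Data Centre", 2012)
print(ET.tostring(record, encoding="unicode"))
print(citation(["Smith, J."], 2012, "Example survey dataset",
               "Example Data Centre", "doi:10.0000/example.1"))
```

In a real workflow the record would be validated against the official DataCite schema before registration.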
Protocols and Standards
- Bristol are providing serialisation using RDF/XML, and using SWORD as the repository deposit protocol
- DC2AP – A DataCite Dublin Core Application Profile is in development
- DataCite2RDF – Maps DataCite metadata to RDF
- ISO 19101 – Deals with subsets of data
- XForms – “XML format for the specification of a data processing model for XML data and user interface(s) for the XML data, such as web forms”
- WAF – Web Accessible Folder
- Bristol have used Apache Tika to extract metadata from data files
- Orbeon Forms – XForms-compliant web form builder available in a free open source Community Edition
- Ex Libris Rosetta – “highly scalable, secure, and easily managed digital preservation system”
- Ex Libris Primo – “one-stop solution for the discovery and delivery of local and remote resources, such as books, journal articles, and digital objects”
- A “Schematron” validates content (e.g. business rules) as well as conformance to an XML schema
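As an illustration of the SWORD deposit route Bristol mention, a SWORD v2 binary deposit boils down to an HTTP POST of a package to a collection URI with a few protocol-specific headers. This sketch builds (but does not send) such a request using only Python's standard library; the endpoint URL and credentials are invented:

```python
import base64
import urllib.request

# Hypothetical SWORD v2 collection endpoint -- replace with a real one.
COLLECTION_URI = "https://repository.example.ac.uk/sword2/collection/datasets"

def build_sword_deposit(zip_bytes, filename, username, password):
    """Construct (but do not send) a SWORD v2 binary deposit request:
    a POST of a zip package with the standard SWORD packaging headers."""
    request = urllib.request.Request(COLLECTION_URI, data=zip_bytes, method="POST")
    request.add_header("Content-Type", "application/zip")
    request.add_header("Content-Disposition", f"filename={filename}")
    # SimpleZip is the basic SWORD v2 packaging format identifier.
    request.add_header("Packaging", "http://purl.org/net/sword/package/SimpleZip")
    # In-Progress: true signals that more content may follow before completion.
    request.add_header("In-Progress", "true")
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    return request

req = build_sword_deposit(b"...zip bytes...", "dataset.zip", "depositor", "secret")
print(req.get_method(), req.full_url)
# Actually sending it would be: urllib.request.urlopen(req) -- omitted here.
```

In practice a SWORD client library would normally handle this, but the request above shows how little the protocol itself demands.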
Two key publications have been made available this week, both of which are of interest to the ADMIRe project team. First, we had the eagerly awaited publication of the Finch Report: “Accessibility, sustainability, excellence: how to expand access to research publications”. This 140-page publication presents the findings of the Working Group on Expanding Access to Published Research Findings, chaired by Dame Janet Finch. The report recommends a programme of action to enable more people to read and use the publications arising from research. It makes ten recommendations and outlines the key actions necessary to implement them. An executive summary is available, and the report has had some interesting media coverage this week, including in the Guardian and on the BBC.
The Royal Society today published their substantial report “Science as an open enterprise: open data for open science” which:
“highlights the need to grapple with the huge deluge of data created by modern technologies in order to preserve the principle of openness and to exploit data in ways that have the potential to create a second open science revolution.”
The report highlights six key areas for action, and these include:
- Scientists need to be more open among themselves and with the public and media
- Greater recognition is needed for the value of data gathering, analysis and communication
- Common standards for sharing information are needed in order to make data widely usable
- Publishing data in a reusable form to support findings must be mandatory
- More experts in managing and supporting the use of digital data are required
- New software tools need to be developed to analyse the growing amount of data being gathered
The report includes some interesting case studies of data use and the costs of digital repositories.
It will be interesting to see the impact that both these publications have on academic scholarly communications and opening up access to research outputs (both publications and data).
OpenAIREplus is a large-scale EU project bringing together 41 pan-European partners, including three cross-disciplinary research communities. OpenAIREplus aims to:
“…create a robust, participatory service for the cross-linking of peer-reviewed scientific publications and associated datasets.”
The 30-month project launched in December 2011 (see Bill’s post on this launch), and on 11th June they will be presenting an OpenAIREplus workshop in conjunction with the Nordbib Conference 2012 in Copenhagen (11-13 June 2012). The OpenAIREplus workshop, “Linking Open Access publications to data – policy development and implementation”, looks really interesting, with a very exciting programme, and I am hoping they will make the workshop presentations and outputs available after the event.
The workshop is aimed at anyone with an interest in this topic, including library managers, researchers, research funders, repository managers, journal editors and publishers, and research administrators. Topics covered include:
- Preparing and writing institutional data management policies
- An overview of funders’ responsibilities and requirements towards data availability and management
- An overview of linking research publications and data
- The research data landscape
Follow developments and news items on the EU Open Access infrastructure on Twitter: @OpenAIRE_eu
Links of interest
OpenAIREplus press release
International conference: Structural frameworks for open, digital research – strategy, policy & infrastructure
The world of ‘open data’ is not just being hotly debated on the international stage – the debate also seems to be accelerating.
In just the last month there have been:
- a report from Rome that libraries have a clear role to play in e-infrastructures for open science: http://www.libereurope.eu/news/libraries-have-a-role-to-play-is-a-clear-message-from-the-e-infrastructures-for-open-science-wo
- a presentation from South Africa on ‘open everything’ in higher education: http://www.slideshare.net/laura_Cz/open-everything-exploring-open-in-higher-education
- an open data and open research discussion at ICTD 2012 in the US: http://www.globalimpactstudy.org/2012/04/open-data-open-research-discussion-at-ictd-2012/
- new research on open data from Finland: http://blog.finnish-institute.org.uk/2012/03/our-new-research-on-open-data-argues.html
as well as the major news about the Wellcome Trust: http://www.guardian.co.uk/science/2012/apr/10/government-backs-research-results-public?newsfeed=true
What does all this mean?
Perhaps it’s too early to say that there is a definite pattern, but I would contend that the debate is maturing rapidly, and it’s certainly an exciting time to be involved in this area – especially when you don’t have to be a large organisation to have a voice, as demonstrated by the always entertaining (and informative) http://cameronneylon.net/
An interesting post on the JISC Repositories list on 21st March highlights a survey being undertaken by the University of Michigan looking at the relationship between data archives and institutional repositories. Their 10-minute survey is available here for anyone who wishes to contribute.
The poster also mentions an interesting and useful draft 12-page guide (available on Scribd, but easier to read as a Google doc) on building links between social science data archives and institutional repositories, “…that provides guidelines and decision rules for institutional repositories at each stage of the archiving process: from appraisal to acquisition to curation to dissemination.”
We’ve been talking today about the move towards open data and how we can draw upon our experiences of trying to deliver open access publishing. Experience from the open access work we have done at the University of Nottingham tells us that we need to take a long-term view of this. The open access work has been ongoing for nearly ten years, and even now there is resistance to publishing in this way. Achieving similar results for open data may well take just as long, and potentially has bigger hurdles to overcome, since researchers build careers upon their IPR and the data they generate and hold.
Is ownership of data much more ingrained in a researcher’s personal USP – the value they bring to an organisation – than publications are? Inherent in publishing is a certain “letting go” that is accepted as part of being an academic researcher. Is it the case that this does not exist in the same way for datasets?
So in light of this we’re seeking to identify ways in which researchers are already “open” with their data – for example, depositing in national archives at the end of a project. In the current mindset this might be a tick box towards “sustainability” in the funding bid, but can we re-purpose that thinking and turn it into “being open with data”?
Does that then simplify the process of creating the “local repository” (and supporting metadata) such that the entry describes the dataset and where it is held, linking off to the national repository? Perhaps that is a small additional step that is achievable beyond what the researcher is already doing and can be a catalyst towards change and more openness? If so, then does that local repository become part of the framework we are striving for in ADMIRe for us to build a process around? A quick retrospective trawl might help us to get a quick win and build such a repository to show its potential.
Open access started on a “build it and they will come” approach, and perhaps we need to do the same for open data?
We have been thinking, along with other projects, of the possible benefits of the ADMIRe project and the larger framework of RDM development within the University, in which ADMIRe fits.
Many of the benefits are qualitative in nature, although we do expect solid returns in terms of research exposure, management and re-use.
Our current thinking – very much in draft form for now – can be found here.
We would welcome comments and reflections from others that are going down similar paths in identifying institutional and other benefits for their RDM programmes.
A nice, clean and fairly concise guide to research data management aimed at academics is available from MIT Libraries: http://libraries.mit.edu/guides/subjects/data-management/
Part of the work of ADMIRe is to get institutional approval and adoption of a Research Data Management Policy: this is going well. As part of this we have had a look at some other institutional RDM policies; it might be useful to list them here.
Edinburgh University Research Data Management Policy
University of Hertfordshire Research Data Management Policy
University of Northampton Research Data Policy
University of Oxford Research Data Management and background
Following on from the previous two posts, which touch on governmental-level responses to open access to publication outputs and public data, here is a development that will try to tie them together.
The European OpenAIREplus project has put out a press release which reviews its aim to link peer-reviewed literature to associated data.
From the press release:
“The 30 month project, funded by the EC 7th Framework Programme, will work in tandem with OpenAIRE, extending the mission further to facilitate access to the entire Open Access scientific production of the European Research Area, providing cross-links from publications to data and funding schemes. This large-scale project brings together 41 pan-European partners, including three cross-disciplinary research communities.”
Nottingham is proud to be a partner in this work, acting as the UK’s National Open Access Desk for both this project and its continuing precursor, the OpenAIRE project.