JISC, EPSRC and DCC RDM awareness event at Nottingham

As the ADMIRe project reaches its final stages, we were pleased to host a large Research Data Management awareness event on The University of Nottingham’s main campus. The event was the culmination of extensive planning by Laurian Williamson and Research Graduate Services. It was designed so that Heads of Schools and senior Professional Services managers could learn about RDM and the impact this may have on their respective roles.

The event started with a buffet lunch, before continuing with a selection of enlightening talks from both external and internal speakers. Attendance was limited to 60 people and we are pleased to say that we had an almost full house, with only one or two seats empty in the room. The agenda of the event is here: ADMIRe RDM Event Briefing and Programme

The speakers and their presentations are listed below:

Dr Simon Hodson, Jisc Managing Research Data Programme Manager: Hodson MRD Overview – Nottingham
Ben Ryan, EPSRC Senior Manager, Research Outcomes: EPSRC RDM (Nottingham June 2013)
Joy Davidson, Digital Curation Centre (DCC) Associate Director: Introduction to RDM DCC
Caroline Williams, Director of Libraries and Research and Learning Resources, University of Nottingham: ADMIRe RDM Event June 2013
Paul Kennedy, Group Leader, Security Group, IT Services, University of Nottingham: RDM-Launch-Data-Security
Dr Steven Bamford, Senior Research Fellow, School of Physics & Astronomy, University of Nottingham: RDM meeting Steve Bamford Galaxy Zoo

The talks were followed by Q&A sessions and a panel discussion at the end of the afternoon. As would be expected, discussions were lively and we gave researchers the chance to ask the RDM experts and learn how other institutions are faring. Questions from the floor focused upon the issues around:

  1. Long-term funding of data retention and storage
  2. Sharing sensitive and commercial data
  3. What to store and what to delete (is it cheaper to re-run an experiment for example)
  4. Obsolescence of software/data
  5. Quality of the research being impaired by RDM policy requirements
  6. Subject repositories versus an institutional policy
  7. National and international efforts on RDM
  8. Lodging patent applications and the timely release of data
  9. Costs of data management after the grant ends
  10. The area of PhD and data ownership and long-term responsibility for that data
  11. Metadata and contextual data (e.g. from email trails)
  12. Anonymous data and data fusion (identifying individuals by fusing disparate data sets)

One poignant comment noted that the EPSRC deadline of 2015 is only two years away, so significant progress must be made in all of these areas if RDM is to succeed – both at Nottingham and in the wider research community.

Although this represents the final researcher engagement session for ADMIRe, it is not the end of RDM at Nottingham. Plans are in place for sessions such as these to continue throughout the coming years at Nottingham and explore and answer the questions that were raised today.

University of Nottingham Research Data Management Website

We recently launched the University of Nottingham Research Data Management (RDM) website, which provides a single location for authoritative RDM information and resources for our research community at UoN.

This first phase of the website’s development provides both generic and UoN-specific information. Phase two (2013–2014) will add subject-specific RDM information and more content for the ‘research data showcase’, and the site content will be refined and enhanced based on further feedback and input from the research community and key stakeholders.

From the outset we wanted a site that would sit within the UoN research domain and adhere to the UoN brand look and feel. The collaboration with the UoN Web manager was crucial: he was very keen on the idea of creating an RDM website for academics at UoN and also on using the site to showcase UoN research data.

There are two main audiences for the site:

  • Researchers – both University researchers and interested external researchers (site content will be instructional and used as a tool by researchers)
  • General Public – community active people (site content will promote UoN research and data sets) and our JiscMRD programme partners

Creating the site content was a collaborative effort and it took a while to identify key stakeholders and assign responsibility for authoring and ownership of individual pages. Bringing it all together was quite a challenge and we had to delay the launch until the UoN RDM policy was approved.

There are 50+ pages on the site, and our choice of site hierarchy was heavily influenced by other RDM sites, specifically the University of Glasgow data management site for researchers, which we thought was an excellent RDM site.

In the UoN RDM survey we asked the respondents (366) to select areas where they would like to receive help with RDM, and having a UoN website was one of the tools that they indicated would be useful:

 

In the next few weeks the team will be raising awareness of the RDM site using a variety of internal communication channels, and we welcome any feedback from both the UoN research community and our JiscMRD programme partners.

Adapting, using, and re-using RDM training materials

It was quite timely that, on returning to work today, I saw the JiscMRD Evidence blog posting ‘Jisc MRD project materials: use and reuse for RDM training’, which outlines how outputs from the programme are being used and re-used in DCC training events.

Here at ADMIRe we have adapted, used and re-used the excellent Research Data MANTRA and the Training for Data Management (TraD) supportDM for two different UoN audiences: postgraduate students/early career researchers, and support staff (library and IT support). In both instances we have embedded these training resources in Moodle, using valuable outputs from the wider Jisc MRD Programme.

University of Nottingham short course on research data management

We collaborated with the Graduate School during 2012/2013 and adapted and embedded the University of Edinburgh Research Data Management MANTRA online course in Moodle. Christine from the Graduate School did all the technical work in Moodle and I adapted the content of MANTRA for the UoN audience. This standalone self-study course is delivered entirely online via Moodle and is aimed specifically at postgraduate research students and early career researchers. It was made available in April 2013 and now forms part of the UoN short course portfolio; postgraduate students can gain training points by completing an optional assessment questionnaire (only two questions).

The collaboration with the Graduate School worked really well and it is hoped that this ‘RDM’ collaboration will improve RDM capacity and capability at UoN.

supportDM course for research data management support services

Last week I embedded the first module of the University of East London (UeL) Training for Data Management (TraD) supportDM course in Moodle, aimed specifically at those involved in research data management support services (at UoN this is currently library staff and IT support).

The SupportDM course presumes no prior knowledge of data management or digital curation and is designed for use in a blended learning environment with group meetings and individual tasks to complement the Xerte online elements. It is also suitable for standalone self-directed learning using the Xerte modules.

It has been really useful having these high quality training materials available for adaptation and re-use. Many thanks to EDINA and Data Library, University of Edinburgh, and the University of East London for making their project outputs available for re-use and adaptation.

I recently circulated a brief paper on RDM Training to the head of professional development at UoN – providing an overview of what is currently available nationally and what has been done by the ADMIRe project in the area of online RDM training.

Jisc Managing Research Data Programme Workshop

The ADMIRe team attended the excellent Jisc Managing Research Data Programme Workshop this week. We presented on our progress and on some of the challenges we have faced so far, around two themes: business cases and plans for sustainability; and data repositories, portals and institutional systems.

The workshop provided a platform for the JiscMRD projects to consider and reflect on the progress made, highlight successes, and reflect on some of the challenges that still remain when considering RDM, especially within a very complex UK HEI context.

Tom presented on Data catalogues and data repository and I presented on our work around ADMIRe RDM service models.

There was plenty of time to share experiences and in particular how challenging it is trying to deliver and build institutional RDM capacity and capability.

The keynote from Professor Geoffrey S. Boulton, University of Edinburgh, really made me think about the broader ‘data’ context. In particular, RDM isn’t just about compliance with the data expectations of the funding bodies; we need to remember that researchers want to exploit the growing data resources that are available.

University of Nottingham Research Data Management Survey results

The results and analysis of the University of Nottingham Research Data Management survey are now available; the full-text report is here: ADMIRe Survey Results and Analysis 2013

The survey covered several key components of research data management (RDM) practice and provides a benchmark to measure progress against the RCUK principles on data. We do hope that the research community and all our Jisc Managing Research Data Programme partners will find something of interest in these results.

The survey was disseminated (using a variety of internal communication channels) to researchers across the University, and was an important part of the requirements gathering phase of the ADMIRe project. This served multiple purposes:

1. To baseline current RDM practices

2. To gather the researchers’ requirements for RDM

3. To raise awareness of the prospective service and gauge interest levels in it

4. To identify areas where support, training, and advocacy were required

We had 366 respondents, which was a very positive response and allows some valid conclusions to be drawn. Some interesting observations are:

  • The diversity of data types and the strong presence of non-digital data such as lab notebooks
  • Multiple locations for the data and, therefore, ad hoc back-up strategies
  • RDM training is high on the agenda
  • Low awareness of the expectations from research funders
  • Low awareness of funding requirements regarding data sharing

We welcome any comments on the survey. If you are interested in having access to the anonymized raw survey data, please do contact us at <researchdata@nottingham.ac.uk>.

 

JISC Managing Research Data Benefits & Evidence Workshop

In late November I attended the JISC Managing Research Data Benefits & Evidence Workshop in Bristol. The two-day event was a good chance to review progress and devise KPIs and metrics with which to measure the success of both our project and the implementation of our service. As you would expect, there’s a huge amount of reading to be done around policies, funding requirements and work coming out of the other JISC MRD projects. Luckily I’ve taken this speed reading course…

I have managed to produce a workable benefits and evidence template, which is available here: Benefits Management Plan – ADMIRe

As you will see, a lot of the metrics require a sufficient level of maturity and are mainly forward looking – our project is expected to hand over a fledgling RDM service, with minimal metrics collected, and to provide a baseline for what already exists.

Data sharing, what are the incentives?

Data sharing is a hot topic amongst the scientific community and in some instances sharing research data is a requirement/stipulation of your funding body.

In our research data management survey (results to be released shortly) we asked our researchers who could access their research data and the majority of respondents shared their data with their collaborators, with minimal sharing of data outside of the University. See chart below:

Guest blog on data sharing

This guest blog post is from Dr Marianne Bamkin, Research communications assistant and JoRD Project Officer, from the Centre for Research Communications, University of Nottingham. She explains what JoRD is and describes some of the feedback they have had from researchers on the issue of data sharing.

The Journal Research Data Policy Bank (JoRD) project is a JISC-funded initiative looking into the feasibility of a service that will collate and summarise journal policies on Research Data in order to provide researchers, managers of research data and other stakeholders with an easy source of reference to understand and comply with these policies. The information held in JoRD would be freely accessible to researchers, publishers and any other interested parties who may want to know whether a journal insists on the inclusion of data in the article, or as supplementary materials to the article, or if the data should be in a certain format or stored in a certain repository. The feasibility study is researching a number of aspects of such a service, for instance, various business models for funding the service, what publishers and researchers would want from such a service, and most importantly, whether the service would be actively used.

From feedback gained through a combination of a focus group, workshop, online questionnaire and interviews it appears that researchers would be very interested in using the resource to choose where to publish and to understand the requirements of journals. The online questionnaire was answered by researchers from all over the globe, representing each academic discipline and 36 different subjects. The predominant opinion that shone through was that all researchers shared their data with someone, although it may only be a research partner, and the vast majority of researchers believed that in today’s internet society data should be freely shared and openly accessed and they were prepared to share their data. That opinion was also reflected by the participants of a focus group.

There are qualifications to sharing, the most important to researchers being attribution and intellectual property. If they had spent many years gathering the data, they wanted that effort recognised, though not necessarily rewarded: money was not a personal concern, but acknowledgement of their hard work was. Another caveat expressed was that truly raw data are not shareable: quantitative data may have errors, qualitative data may be indecipherable, and data may be confidential and sensitive. Data would therefore need a certain level of processing before sharing. Researchers also felt that there were certain optimum times when they would be willing to share data. For example, doctoral research is required to be unique, so any data shared before the thesis is submitted may be used by another researcher to reach the same conclusions, preventing the first researcher’s work from being unique. Publishing the data after the doctoral award would be no problem.

However, the researchers’ list of the benefits of sharing data outweighed the problems. They felt that sharing data was expected in current society, leading to scientific openness and accountability. The researchers benefit by having increased access to data, by finding storage for data that would make it future-proof and would also allow greater opportunity for collaboration. Science benefits because shared data increases research efficiency, promotes knowledge, allows data to be verified and studies to be replicated, which in turn increases the quality of Science. Looking at it from that point of view, sharing data is a win-win situation. I am just going to go and upload some data…

For more information on the JoRD project and our findings so far please visit our blog on: http://jordproject.wordpress.com/

Event Report: 4th DataCite Workshop


The 4th JISC-British Library DataCite Workshop, on December 3rd at the British Library Centre for Conservation, looked at the challenges of citing data that has various versions, granularities or other structural facets that may make citation difficult. Once again, it proved to be a fascinating and well-organised day, and an excellent opportunity to compare notes with practitioners from all over the country who are wrestling with the same problems we have been pondering.

When Should I Mint a New DOI?

To start us thinking about issues around changes to datasets, we all took part in an exercise answering the question of whether a new DOI should be issued in various scenarios such as changes in access conditions, migration of formats for preservation purposes, and the re-issuing of data in anonymised form following legal action.

Although there was a good level of broad consensus in our answers, there was a significant difference of perspective from our first speaker, Roy Lowrie on “Mapping the data publication paradigm onto the operations of the British Oceanographic Data Centre”. From Roy’s data centre perspective, any change that affects the metadata of a dataset will change the checksum in the BODC’s system, and for him, that means that a new DOI should be issued. Although Roy was often in the minority when answering “Yes” to some of the questions as to whether a new DOI was required, by the end of the day I was left feeling that Roy had highlighted a very important issue around the governance of datasets using DOIs.

If a DOI represents a reference to an archived and curated resource, then if any of the properties of that resource change, surely the object referred to is no longer the same object? I find that I remain uncertain as to whether such updates to metadata fields should ideally be reflected in a change to the version number of the DOI rather than a change to the DOI itself, but I do suspect that the only consistent general solution must somehow involve an archiving of the old version of the packaged object (of which the metadata forms a vital part), otherwise important information may be lost, and therefore there will potentially be multiple archived versions of the packaged object. Are these not then to be thought of as ‘different datasets’ requiring different DOIs? If not, then what does that imply for our confidence in the persistence of a dataset whose properties may be subject to change? Thought-provoking questions indeed…
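To make the checksum argument concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that a packaged object is just the data bytes plus a metadata dictionary; the function name `package_checksum` and the sample fields are hypothetical and are not a description of the BODC’s actual system.

```python
import hashlib
import json


def package_checksum(data: bytes, metadata: dict) -> str:
    """Return a SHA-256 digest over the data and its metadata together."""
    # Serialise the metadata deterministically so key order cannot alter the digest.
    meta_bytes = json.dumps(metadata, sort_keys=True, ensure_ascii=False).encode("utf-8")
    digest = hashlib.sha256()
    # A length prefix keeps the boundary between data and metadata unambiguous.
    digest.update(len(data).to_bytes(8, "big"))
    digest.update(data)
    digest.update(meta_bytes)
    return digest.hexdigest()


# Illustrative (made-up) dataset and metadata records.
data = b"depth_m,temp_c\n10,12.4\n20,11.9\n"
original = {"title": "Example cast", "access": "embargoed"}
updated = {"title": "Example cast", "access": "open"}

# Identical package: same checksum. Changed access condition: different checksum.
same_package = package_checksum(data, original) == package_checksum(data, dict(original))
changed_package = package_checksum(data, original) != package_checksum(data, updated)
```

Under this assumption, a change to the access conditions alone yields a different checksum even though the data bytes are untouched, which is the situation that leads Roy to answer “Yes” to a new DOI.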

“There’s no point in assigning DOIs to digital garbage”

This was the quote of the day, for me: Another theme of Roy’s talk was his fear that DOIs would be handed out without adequate scrutiny of datasets. He feels that obtaining a DOI should represent a mark of quality, indicating the approval of the dataset from the institution managing the DOI. This seems to imply that standards, institutional policies and discipline-specific sign-off procedures are key to managing the assignment of DOIs appropriately.

Roy introduced a number of other themes which were developed by other speakers throughout the day. Standardised dictionaries are necessary for nearly every metadata field – otherwise the metadata is often uninterpretable. Larger datasets at the BODC are constantly changing and constantly being refined, and this constant flux means that snapshots of data are in one sense missing the point – but on the other hand, versioning and snapshotting datasets becomes increasingly important when they are referenced by researchers. In the data centre paradigm, the dataset is a dynamic entity – so it needs to be pinned down in order to map to its static equivalent in the publication paradigm.

Dublin Core is fine for basic metadata, but discipline-specific enhancements to the metadata, using standards like ISO 19115, DIF, FGDC and Darwin Core, are often necessary if any sense is to be made of the dataset. The extended metadata can be filtered to Dublin Core using XSLT. The BODC’s approach to granularity uses the concept of the ‘discovery dataset’: systematic groupings of data atoms.
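The idea of filtering extended metadata down to Dublin Core can be sketched as a simple crosswalk. The example below uses a plain Python dictionary mapping rather than XSLT, and the extended field names on the left of `CROSSWALK` are invented for illustration; they are not taken from ISO 19115 or any other real schema.

```python
# Hypothetical crosswalk: extended field name -> Dublin Core element.
# The field names on the left are illustrative only.
CROSSWALK = {
    "dataset_title": "title",
    "originator": "creator",
    "abstract": "description",
    "publication_date": "date",
    "parameter_keywords": "subject",
}


def to_dublin_core(record: dict) -> dict:
    """Keep only fields with a Dublin Core equivalent, renamed to that element."""
    dc = {}
    for field_name, value in record.items():
        element = CROSSWALK.get(field_name)
        if element is None:
            continue  # discipline-specific detail with no Dublin Core slot is dropped
        # Dublin Core elements are repeatable, so values are collected in lists.
        dc.setdefault(element, []).append(value)
    return dc


# Made-up extended record with one field that has no Dublin Core equivalent.
extended = {
    "dataset_title": "North Sea temperature profiles",
    "originator": "Example Data Centre",
    "sensor_calibration": "2012-03-01",
    "parameter_keywords": "sea temperature",
}
simple = to_dublin_core(extended)
```

The filtering is lossy by design: the calibration detail disappears from the simple record, which is why the richer discipline-specific record still needs to be kept alongside it.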

Last but not least, Roy noted that based on his experience over the years, he would never consider minting a DOI without a verified dataset physically in his possession – promises count for nothing…

“It Depends…”

Next, Neil Jefferies of the Bodleian Library, University of Oxford, speaking on “DOI Implementation issues for institutions”, introduced another theme which was echoed throughout the day: the right answer to many or even most of the questions we are all wrestling with, it turns out, is “it depends…”

From Neil’s experience curating datasets for University of Oxford researchers from a wide range of disciplines, he’s learned that questions such as how to define the appropriate level of granularity, when to version, and how to interpret each metadata field, are very often determined by technical details that are specific to the discipline – and indeed the right answers even vary within the discipline, depending on the research scenario. There just aren’t universal answers to these questions, which implies that a team of experts – librarians and other data curators – have to work together with researchers in order to work out how to define and curate datasets and their metadata. Machine rules to answer these questions are not feasible, in Neil’s view.

How, then, should one manage this heterogeneous situation? Neil explained that the philosophy of Bodleian’s approach is to first obtain sufficient metadata to identify and find an object; then archive it; and then continue to work on the metadata.

Other interesting points from Neil: Bodleian systems use a key concept of an ‘aggregation’, a collection of versions of datasets; they issue their own UUIDs for everything they hold; the Data Catalogue has almost identical structure to the Data Repository; and increasingly they are finding datasets which actually started out as ‘metadata’ – rich and structured metadata can effectively be a dataset itself, and thus the lines between data and metadata are perhaps becoming blurred in some disciplines.
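As a rough illustration of the ‘aggregation’ idea, the sketch below models a collection of dataset versions in which everything receives its own locally issued UUID. The class and attribute names (`Aggregation`, `DatasetVersion`, `add_version`) are hypothetical and are not the Bodleian’s actual data model.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class DatasetVersion:
    """One archived version of a dataset, with its own identifier."""
    label: str
    metadata: dict
    identifier: uuid.UUID = field(default_factory=uuid.uuid4)


@dataclass
class Aggregation:
    """A collection of versions of a dataset, itself carrying an identifier."""
    title: str
    identifier: uuid.UUID = field(default_factory=uuid.uuid4)
    versions: list = field(default_factory=list)

    def add_version(self, label: str, metadata: dict) -> DatasetVersion:
        # Copy the metadata so later edits create a new version, not a silent change.
        version = DatasetVersion(label=label, metadata=dict(metadata))
        self.versions.append(version)
        return version

    def latest(self) -> DatasetVersion:
        return self.versions[-1]


# Archive first with minimal metadata, then add a richer version later.
agg = Aggregation("Survey transcripts")
v1 = agg.add_version("v1", {"title": "Survey transcripts"})
v2 = agg.add_version("v2", {"title": "Survey transcripts", "note": "anonymised"})
```

Keeping earlier versions inside the aggregation, rather than overwriting them, is one way to continue working on the metadata after archiving without losing what was originally deposited.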

“Research is never finished…”

Next up, Rebecca Lawrence from the Faculty of 1000 on “The Publisher’s perspective and the F1000 approach to versioning”.  Rebecca introduced the forward-looking policies of the soon-to-be-launched “F1000 Research” peer-review and publication service for biology and medicine. She described the radical new publication model of this new research journal:  Immediate publication on submission (within one week, following a very basic check that the article really is scientific); Transparent Peer Review post-publication; and Full Deposition and Sharing of data.

Re-iterating a key theme of the day, Rebecca noted that it has generally been assumed that the publisher keeps “the version of record” of a publication – but in reality science moves on in a more continuous way. Some publishers are therefore now exploring versioned articles, where authors can amend their articles post-publication. F1000’s approach has versioning at the heart of its publication and peer-review, using CrossMark as a tool to help with the management of errors and corrections. The review status has even been added to their citation notation (in square brackets as part of the title).

This was an inspiring model, for me, addressing some vitally important issues around transparency of peer-review and the speeding-up of the process of open publication. The referencing and versioning structure and the process that Rebecca described looked clear and sensible, and it will be fascinating to see whether this model is taken up more widely in the future.

The fluidity of the approach is perhaps best summed up by noting that, in this model, there is never a definitive, finished and final version of a publication: potentially an article could very well receive review comments many years after it was written, and could be amended in response.

“Academics are starting to feel herded”

For our final speaker, a perspective from an actual academic: Simon Coles of the University of Southampton and National Crystallography Service on “DIY DOI: a researcher’s perspective on registering and citing data”.

Simon explained from the outset that he wanted to present a challenging and combative perspective, illustrating how many academics feel about the movement towards ‘open data’, and explaining how these issues relate to the actual motivations of academics. Academics, he said, are just about beginning to feel ‘herded’ to open access to ‘their’ data – and most are reacting reluctantly and experiencing it as ‘another stick to be beaten with’. To explain this reaction, Simon pointed out what traditionally motivates academics: promoting oneself, climbing the ladder in terms of research recognition and recognition in the field, beating the competition and coming out on top of one’s peers.

Simon noted that journal articles are actually a fairly small proportion of his productivity – he estimates that about 5% of the work he does goes into journal articles. Much of the rest of academic work is often effectively lost to posterity – posters, theses, talks, lectures, reports, etc. Most career academics have huge racks of material in their office, and they are very interested in self-publishing their legacy material before retirement in order to pass on their accumulated knowledge to the next generation. Certainly thought-provoking observations, raising the question of whether a focus on the archival and dissemination of publication-related data may be rather missing the point. Indeed, Simon asserted that there is a general feeling that the vast (and exponentially growing) quantity of unstructured supporting electronic data should not be part of the peer review and publication process.

Simon then showed us the reality of publication data in his field of Chemistry, demonstrating a description of information gained from chemical experiments using simple Dublin Core as a base but augmented with chemical information via Qualified Dublin Core. Of course, this practical demonstration of the existing discipline-specific approaches of the Chemistry community to managing and sharing data illustrated again that existing discipline-specific realities determine what makes sense in terms of research metadata.

Despite the initial challenging perspective, Simon’s talk became more positive as he demonstrated current practice, saying that chemists are slowly coming round to embedding DOIs into publications, pointing at datasets in institutional repositories. He and others are now starting to aggregate and combine repositories of chemical data, and using mash-ups to combine content. Simon finished by showing off the Labtrove system to enable the archiving and sharing of ‘lab notebook’ experimental metadata. Labtrove is now being introduced into Southampton’s ePrints repository, and they are now able to cite their lab notebooks.

Finally, in a good summary of several themes from the day, Simon noted that the policy for the obtaining of DOIs requires an institutional plan, discipline-level decision-making, and a sign off process.

Take-Home Themes

All the themes above were, of course, developed further in discussions during and after lunch, and in a final session we split into groups to think about some specific problems around data versioning and citation. There’s no substitute for attendance at a DataCite workshop, but hopefully the following summary of key themes from the day will be useful both for those who attended and those who couldn’t:

  • Research datasets and publications are generally, in reality, fluid and evolving – they are increasingly being seen as versioned objects, in various contexts.
  • Diversity of material and standards means that librarians have to work closely with academics in order to define appropriate practices and appropriate metadata, as well as to enable appropriate curation of datasets.
  • Discipline-specific standards and extensions to Dublin Core are essential to making datasets re-usable.
  • Institutional policies and discipline-based sign-off are key to managing the assignment of DOIs. There’s no point assigning DOIs to ‘digital garbage’.
  • Question: Should any change to a dataset or to its metadata require a new DOI?

 

Event report: JISC Research Data Management Training Workshop

I attended the JISC Research Data Management Training Workshop, which was held on the 26th October 2012. The aim of the workshop was to provide an opportunity for the new JISCMRD Training projects to introduce what they have been doing and outline their progress in developing research data management training materials. This strand of projects is producing RDM training materials for the sciences and/or librarians.

Here at ADMIRe we have already delivered RDM training sessions for library and IT support staff and are very interested in finding out what others are doing when planning sustainable RDM training for their research community. We have developed an RDM training plan, much of which will need to be sustained beyond the timescale and lifetime of the ADMIRe project.

As well as the excellent presentations, the workshop provided plenty of opportunities to discuss challenges, opportunities, benchmarks, and how to make RDM training outputs easy to find and re-usable. A really useful aspect of the day was the involvement of some of the projects from the JISC digital preservation programme, who shared their experiences around developing training resources. In the afternoon we had the opportunity to provide feedback on the Research Information and Digital Literacies Coalition (RIDLs) proposed criteria for describing, reviewing and assessing practice in information literacy training. I found this session really useful, especially when considering how important it is to plan and evaluate courses and resources. The draft criteria are available from here.

The JISCMRD training projects which presented on their activities thus far, included:

  • DaMSSI-ABC – they aim to deliver work that will provide benchmarks on how to best describe training materials and align them with the Vitae Research Development Framework and digital curation
  • RDMRose – led by the University of Sheffield this project aims to develop learning materials on RDM for all LIS students
  • RDMTPA – this project (led by the University of Hertfordshire) is delivering RDM training for physics and astronomy. They have produced a really useful mindmap for RDM training and linked it to the research data lifecycle
  • SoDaMaT – a project led by QMUL which aims to develop discipline-specific research data management training materials for postgraduate research students, researchers and academics working in the area of digital music and audio research
  • TraD – led by UEL this project aims to produce an adapted data management course for PhD students in psychology and new data management materials for postgraduate students in computer science

There was much to reflect on and many ideas to take away from this event, some of which will inform how we move forward with our RDM training and awareness raising. For example:

  1. The possibility of creating a central hub for RDM training resources
  2. DCC will be developing a career profile for librarians involved with RDM
  3. Big challenges – storage, big data, capacity, preservation, which data will be archived with publication, who will re-use the training material?
  4. Discipline-specific RDM resources vs. generic RDM ones
  5. Develop resources around the research process and research data lifecycle
  6. Map your RDM training to the Vitae RDF
  7. ‘Tiered training’ approach
  8. ‘Slogan based’ RDM training – this worked well for some institutions
  9. Embedding RDM training within the CPD culture of an institution (this is the gold standard)
  10. We need to gather evidence for the benefits of RDM training – benefits from RDM training are difficult to quantify
  11. Must fit training around the needs of your researchers
  12. Advocacy, advocacy, advocacy – try and find RDM champions and ‘enablers’ at your institution

A really valuable day and ADMIRe are looking forward to seeing and possibly utilising the project outputs once they are made available.
