Data that you create during the course of your fellowship constitutes a valuable resource alongside the findings that you publish in journal articles or elsewhere. Providing access to that data allows other researchers to contextualise and validate your findings, and to undertake secondary analyses. Hence in order to comply with the Open and Enhanced Access Policy of the UK Department for International Development, you must make your data freely accessible via a repository or a data centre. You must do this within 12 months of completing your fellowship or when you publish your findings, whichever is the sooner.
For the purposes of this guide, research data is defined as the evidence that underpins and can be used to validate research findings. Data can take many forms (print, digital or physical), and may consist of quantitative or qualitative information created or collected in the course of your work by experimentation, observation, interview, or other methods. Data may thus include statistics, collections of digital images, sound recordings, transcripts of interviews, survey data and fieldwork observations, archive material, and found objects. Data may be raw or primary (for example direct from a measurement or collection), or derived from primary data or existing data sources for subsequent analysis or interpretation.
This guide outlines the steps you must take in order to meet the terms of your award, and includes a decision tree which will help you identify and decide what you need to do at key stages in your research.
B. Before you begin your fellowship
You should complete an Access and Data Management Plan as part of your research project design and proposal, outlining your strategy for maximising access to your publications, data and other outputs.
Your Data Management Plan (DMP) should set out:
- The types of data you expect to create or collect
- How you will record this data
- Where you will store your data and keep it secure
- What data you will preserve after the end of your project (see section D below).
- How you will make your data accessible to others, and any likely restrictions on access.
There are many sources of help in drafting a DMP. The Digital Curation Centre provides a range of resources including a ‘how to’ guide, a checklist of issues and questions to consider, guidance on a series of frequently asked questions, and a freely available web-based tool to help you create your plan. Your plan should be rigorous in identifying the kinds of data you will collect or create, how it will be managed, and which data is likely to be of long-term value beyond the end of the project. You should keep all these matters under review during the course of your research.
C. During your fellowship
Any data you collect or create must be organised, stored securely, and maintained so that it can be re-used in the future. For some detailed guidance on how and why to do this, see the Guide published in the UK by Jisc in January 2016. You will need to think about:
Organising your data in a structured way. This includes:
- File names that help you keep track of the content and status of your data files, including version numbers and dates
- File structure and hierarchy, and whether files are organised by type of data (text, dataset, images etc); research activities (interviews, surveys, measurements etc); or type of material (documentation, publications, data etc)
- Version control so that different versions can be found as needed. If you store files in various places you may need to synchronise them regularly, but make sure you maintain a single master version of each file, in a suitable format and stored in a single location.
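The points above about file names and version control can be illustrated with a short sketch. It shows one possible convention only, not a requirement of the award: the pattern project_description_date_vNN.ext and the function names build_filename, parse_filename and latest_versions are assumptions made for this example.

```python
import re
from datetime import date

# Hypothetical convention: <project>_<description>_<YYYY-MM-DD>_v<NN>.<ext>
FILENAME_PATTERN = re.compile(
    r"^(?P<project>[a-z0-9-]+)_(?P<description>[a-z0-9-]+)"
    r"_(?P<date>\d{4}-\d{2}-\d{2})_v(?P<version>\d{2,})\.(?P<ext>[a-z0-9]+)$"
)


def _slug(text):
    """Lower-case the text and replace runs of other characters with '-'."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_filename(project, description, version, on, ext):
    """Compose a file name that records content, date and version."""
    return (
        f"{_slug(project)}_{_slug(description)}_{on.isoformat()}"
        f"_v{version:02d}.{ext.lstrip('.').lower()}"
    )


def parse_filename(name):
    """Split a conforming file name into its parts; raise ValueError otherwise."""
    match = FILENAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"File name does not follow the convention: {name!r}")
    parts = match.groupdict()
    parts["version"] = int(parts["version"])
    return parts


def latest_versions(names):
    """Return the highest-versioned name for each (project, description, ext)."""
    latest = {}
    for name in names:
        parts = parse_filename(name)
        key = (parts["project"], parts["description"], parts["ext"])
        current = latest.get(key)
        if current is None or parts["version"] > parse_filename(current)["version"]:
            latest[key] = name
    return sorted(latest.values())
```

For example, build_filename("CIRCLE", "Interview transcripts", 2, date(2016, 3, 1), "CSV") gives "circle_interview-transcripts_2016-03-01_v02.csv". Putting the date in year-month-day order means that an alphabetical listing of files is also a chronological one.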
Looking after your non-digital data such as laboratory notebooks, found objects, journals or consent forms. You might consider digitising them by scanning or taking a digital photograph. If not, you must protect them in other ways, for instance by keeping them in a secure, fireproof filing cabinet.
Managing access to your data, which may be particularly important with data that relates to individuals, or is commercially sensitive. In such cases you must consider carefully how to prevent unauthorised access.
Keeping your data safe, by regularly backing up your work. This is particularly important, of course, for data that you have indicated in your DMP will be preserved after the end of your project. It is important also to treat software code with similar levels of care as data, especially if specific software is needed in order to use or interpret your data. Data and code can be lost or corrupted through human error, software or hardware faults, hacking or virus infection. Where files contain personal or other sensitive information, you should restrict the number of copies (for example, a master and a backup copy); and you may wish to consider encrypting your files.
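One way to check that a backup really matches your master files is to compare checksums. The sketch below is a minimal illustration using only the Python standard library. The function names sha256_of, build_manifest and compare_manifests are assumptions made for this example, and it does not replace a proper backup tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 checksum of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_manifests(master, backup):
    """Report files that are missing from, extra in, or different in the backup."""
    return {
        "missing_from_backup": sorted(set(master) - set(backup)),
        "extra_in_backup": sorted(set(backup) - set(master)),
        "changed": sorted(
            name for name in set(master) & set(backup) if master[name] != backup[name]
        ),
    }
```

Running compare_manifests(build_manifest(master_folder), build_manifest(backup_folder)) after each backup, where the two folder paths are your own, would show whether any file has been lost or altered. A saved manifest can also be kept alongside the data as part of its documentation.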
Documenting your data, to explain how it was created or collected, what it means, its content and structure, and any manipulations you have undertaken. This is essential to ensure that the data can be understood both during your project and in the longer term, and that it can subsequently be interpreted by others. So you must maintain good documentation about your project and its progress, as well as about your data, including such matters as data sources, collection methods and protocols, data validation and checking, data confidentiality, descriptions of variables, explanations of codes and classification schemes, and so on. Sub-sets of this documentation may later be converted into metadata, usually structured according to international standards, so that your data can be discovered and accessed by others.
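Descriptions of variables and explanations of codes are often kept in a simple codebook (sometimes called a data dictionary) stored next to the data. The sketch below is one hedged illustration: the variable names, the column layout of the codebook, and the helper names undocumented_columns and codebook_to_csv are all hypothetical and are not taken from any standard.

```python
import csv
import io

CODEBOOK_FIELDS = ["variable", "description", "type", "units", "codes"]

# Hypothetical codebook entries for an imaginary household survey.
CODEBOOK = [
    {"variable": "hh_id", "description": "Household identifier",
     "type": "string", "units": "", "codes": ""},
    {"variable": "rain_mm", "description": "Monthly rainfall at nearest station",
     "type": "number", "units": "mm", "codes": ""},
    {"variable": "irrigated", "description": "Whether the plot is irrigated",
     "type": "category", "units": "", "codes": "1=yes; 2=no; 9=not recorded"},
]


def undocumented_columns(data_csv, codebook):
    """Return the columns in the data's header row that have no codebook entry."""
    header = next(csv.reader(io.StringIO(data_csv)))
    documented = {entry["variable"] for entry in codebook}
    return [column for column in header if column not in documented]


def codebook_to_csv(codebook):
    """Serialise the codebook as CSV text so it can be saved next to the data."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=CODEBOOK_FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(codebook)
    return buffer.getvalue()
```

Checking a data file's header against the codebook each time a file is updated is a simple way to notice when a new variable has been added without being described.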
Reviewing the data that you have created or collected, and how it is being managed, in the light of your DMP; and re-assessing which categories of data should be preserved after the end of the project.
D. After your fellowship comes to an end
You should undertake a final review of the data that should be preserved in a recognised repository or data centre. Under the terms of your award, you must ensure your data is retained for a minimum of five years after the end of the award; and you must make it accessible to others, without charge, any time beyond twelve months after you have completed the creation or collection of your data. That does not mean you must preserve everything. The guiding principle is that data should be preserved and made accessible if it will be of value and use to others, erring on the side of inclusion over exclusion. In order to help you decide, the Digital Curation Centre outlines five steps in determining what data should be kept for the long term:
Identifying the data that must be kept for legal, regulatory or other reasons, which means that you must consider whether there are any laws or regulations which require you to retain the data; or whether there are contractual reasons why you must retain it. If you have collected or created personal data about individuals, you will also have to consider the legal or ethical requirements relating to the preservation of the data, as well as the forms in which it can be preserved, and the terms on which it may be made accessible – or not – to others. Similar issues may arise with data you have collected from or which has been created by third parties, such as government agencies, voluntary organisations or commercial companies; in many cases, such data may have confidentiality agreements attached to it.
Identifying the purposes the data could fulfil, which might include factors such as enabling others to check and verify your findings; providing opportunities for further analysis of the data, or integration with other data sources; providing evidence for further publications; or providing a resource for teaching and learning about your research.
Identifying the data likely to be of long-term value, which means that you must consider:
- The quality of the data, in terms of completeness, sample size, accuracy, validity, reliability, and any other relevant criteria; and whether you have good quality documentation about how it was collected or created
- The likely demand for the data: are other researchers or users likely to find the data valuable; does it relate to matters of wider research or public interest; does it have potential for integration with other data sets?
- Replicability: how easy would it be to re-produce the data (remembering that data arising from observations or surveys is likely to be unique)?
- Data formats: are the software and the hardware for re-use of the data likely to be widely available?
Assessing the resources needed, in time, equipment, software, service charges and so on, for preparing the data for archiving (including any necessary conversion to open file formats), and for preparing appropriate metadata.
Finalising your appraisal, and creating a schedule of the data that should be preserved, and what can be disposed of. It is good practice to keep a record of the data you discard.
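A schedule of this kind can be kept as a simple table. The sketch below is only one possible layout. The AppraisalDecision record, its field names, and the two permitted decision values are assumptions made for this example, and the datasets named in the test data are invented.

```python
import csv
import io
from dataclasses import asdict, dataclass

ALLOWED_DECISIONS = ("preserve", "dispose")


@dataclass
class AppraisalDecision:
    """One line of the schedule: what the data is, what happens to it, and why."""
    dataset: str
    decision: str
    reason: str

    def __post_init__(self):
        if self.decision not in ALLOWED_DECISIONS:
            raise ValueError(f"decision must be one of {ALLOWED_DECISIONS}")


def split_schedule(decisions):
    """Return (datasets to preserve, datasets to dispose of)."""
    keep = [d.dataset for d in decisions if d.decision == "preserve"]
    discard = [d.dataset for d in decisions if d.decision == "dispose"]
    return keep, discard


def schedule_to_csv(decisions):
    """Serialise the schedule, including discarded data, as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["dataset", "decision", "reason"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(asdict(d) for d in decisions)
    return buffer.getvalue()
```

Because the discarded items stay in the same table as the preserved ones, the schedule also serves as the record of what was disposed of and why.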
E. Choosing a digital repository, archive or data centre
You should deposit the data you have decided to preserve for the long term in a repository, archive or data centre that can provide secure, stable storage and access to the data (subject to any restrictions necessary in relation to personal data about individuals, or data that is sensitive on any other grounds). Some universities – though relatively few in Africa – run their own institutional repositories. But repositories recognised and run by members of the research community in different disciplines carry the advantage that they have expertise relating to the data in their subject or disciplinary domain. The UK Department for International Development’s R4D repository will accept simple datasets where no other repository is available. In all cases, you should provide to R4D metadata relating to your data, as well as your publications.
There are several guides to repositories, their characteristics, the subjects and content types they cover, and the terms on which they operate. Some repositories restrict deposit to members of a particular institution or group of institutions; others to data created with the support of a particular funder. Many others are more open. One of the most comprehensive sources to consult for such information is the Registry of Research Data Repositories (re3data), which covers all subject areas and institutions across the world. It is managed under the auspices of DataCite, an international organisation that issues Digital Object Identifiers (DOIs) for data, to provide a persistent link to its location on the internet. In the biosciences, the biosharing.org catalogue includes bioscience databases described according to domain guidelines and standards. Among publishers, Nature Publishing Group (now part of Springer Nature) maintains a list of approved repositories that meet its specific requirements for data access, preservation and stability.
As a general rule, subject or disciplinary domain repositories are likely to offer the best location for data that is to be made openly accessible. But wherever you decide to deposit your data, you should consider a number of factors:
- Is the repository reputable? Re3data undertakes reviews of repositories before it includes them in the registry, and indicates whether or not each repository has been certified by another organisation, such as a relevant funder or learned society.
- Will the repository accept your data? There are some general-purpose repositories such as Figshare, Dryad and Zenodo that accept data of any kind. But most repositories accept data that relates to a particular institution, funding body, type of study, research topic or domain. You may also have to meet requirements relating to data types and file formats.
- Legal issues. You will need to consider issues such as whether the repository can accept personal data; any requirements relating to confidentiality; and whether deposit will breach copyright relating to third-party data. Each repository will provide advice on this. You will also need to consider the licence under which the data is deposited and made available to others, and the access control systems used by the repository.
- Repository data services. What steps does the repository take to make the data findable and accessible, by providing high-quality metadata records, stable identifiers such as DOIs, and effective browse and search facilities? Does it promote interoperability by exposing metadata as linked open data, harvesting of metadata, and interlinking with publications? What steps does it take to facilitate re-use of the data, including data mining and extraction, and visualisations?
Further guidance on all these points, including the pros and cons of different kinds of repositories, is provided by the Digital Curation Centre. Most data repositories also provide guidance on the issues you will need to address before depositing your data, and on how to prepare your data for deposit. See, for example, the guidance offered by the UK Data Archive and by Dryad.
Once you have decided on a repository, you will usually have to complete a data deposit form, on which you will provide information about your project, file descriptions, format and size, geographical coverage (where relevant), any relevant confidentiality or consent issues, licensing terms and so on. You will typically be asked either to provide sufficient documentation so that the repository staff can facilitate discovery of your data by creating catalogue metadata structured according to an international standard, or to provide such metadata yourself.
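It can help to assemble the information a deposit form asks for before you start filling it in. The sketch below collects the items listed above into one record and reports which are still empty. The field names and the choice of which fields are required are assumptions made for this example. They do not follow any particular repository's form or any metadata standard, so check your chosen repository's own requirements.

```python
import json

# Hypothetical field names based on the items listed in the text above.
# Geographical coverage is left optional because it applies only where relevant.
REQUIRED_FIELDS = (
    "project_title",
    "project_description",
    "files",
    "confidentiality_and_consent",
    "licence",
)


def missing_fields(record):
    """Return the required fields that are absent or empty in the record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


def to_json(record):
    """Serialise the record as indented JSON for saving or sharing."""
    return json.dumps(record, indent=2, sort_keys=True, ensure_ascii=False)


# Illustrative values only.
example_record = {
    "project_title": "Example fellowship project",
    "project_description": "Short summary of aims and methods.",
    "files": [
        {"name": "survey_v03.csv", "description": "Cleaned survey responses",
         "format": "CSV", "size_bytes": 20480},
    ],
    "geographical_coverage": "",
    "confidentiality_and_consent": "Responses anonymised; consent forms held by author.",
    "licence": "",
}
```

Here missing_fields(example_record) returns ["licence"], a reminder that licensing terms still need to be decided before deposit.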
F. Publishing your data
Read the “CIRCLE Open Access Publication Guidance” for information on how to decide on the journals in which you publish your findings, and how to ensure that you meet the terms of your award with regard to Open Access. Any journal article you publish will typically include tables or figures derived from your data. Under the terms of your award you must include in any publication details of how readers can gain access to your original datasets. Some publishers and journals encourage authors to submit data files as supplementary material to accompany the online version of their articles. But a growing number of journals and publishers also now require that the data underlying the findings reported in the articles they publish – including computational or curated data, and data produced by an experimental or observational procedure – should be submitted to, and made accessible via, an appropriate external repository. The policies differ in detail, often in relation to specific subject domains, and you should check the policies of the particular journal in which you decide to publish your article. You may wish to examine the policies of publishers such as Nature Publishing Group, PLOS, and the Royal Society. But in general terms, there is often a requirement to provide data in the ‘rawest’ form that will permit substantial reuse.
You may also wish to consider publishing some or all of your data in a specialist data journal. Such journals provide opportunities for you to publish your data formally, and to gain formal acknowledgement for it, in the form of citations. The key purpose of data journals is thus to encourage researchers to provide access to their data by promoting both accreditation (through peer review before publication and citations after publication) and re-use of data; to improve research transparency; and to provide an easily accessible, permanent and resolvable route to research datasets. Examples of such journals include Scientific Data, published by Nature Publishing Group, GigaScience, published by Springer, and Geoscience Data Journal, published by Wiley.
Data Management Checklist: Key questions and decisions
Before you begin your fellowship
Identify the kinds of data, digital and non-digital, that you are going to collect or create, and create a data management plan
a. How are you going to organise, store and manage it, and what resources – equipment, expertise, storage – do you need?
b. Are there particular kinds of data that will require special measures because they include personal information, or are sensitive or confidential on other grounds?
- Do you have the necessary regulatory or ethical approvals for handling such data?
- Are the facilities available to enable you to protect such data (encryption, special storage etc)?
During your fellowship
Collect or create data (experiments, observations, interviews etc), and take systematic steps to organise it.
a. Develop and implement a protocol for file names, and a structural hierarchy for your files
b. Keep to a restricted number of file formats (preferably open formats)
c. Maintain a strict system of version control as you amend or update your files
Store your files, and any relevant code, as securely as possible, and back them up regularly. If you are using more than one device, or store files in different places, synchronise them regularly, making sure that you preserve a single master set of files.
a. Make sure that you restrict who has access to your files, and that they can amend or manipulate them only with your knowledge and consent.
b. Make sure that special protection is given to files containing personal data or other sensitive or confidential information.
Document the progress of your project and contextual information relating to your data.
a. Keep records of your data sources, collection methods and protocols, data validation and checking, data confidentiality, descriptions of variables, explanations of codes and classification schemes, etc, so that your data remains understandable and usable.
Review the data you have created or collected, and your data management plan.
a. Do any changes need to be made to the plan – and how you are organising, storing and documenting your data – in the light of the progress of your project and the data you have created or collected?
After your fellowship
Determine the data that should be preserved in a recognised data repository.
a. What data must be kept for legal or regulatory reasons?
b. What purposes might your data fulfil beyond the life of your fellowship, and what might its value be?
c. What do you need to do to prepare the data for archiving, and what resources do you need to do so?
Choose a repository, archive or data centre
a. Are there reputable repositories available to you and will they accept your data?
b. Are there any legal or other reasons why you may not deposit your data in such a repository?
c. What kinds of services does the repository provide that will sustain the value of your data and ensure that it is widely accessible and used by others?
d. What steps do you need to take to deposit your data?
Publish your findings and your data
a. Make sure that you provide in any publication details of where your data is preserved and how it can be accessed; and that you meet the conditions set by your chosen journal on access to your data.
b. Examine the options for publishing your data in a specialist data journal.