This document provides an intuitive introduction and guide to the Neuroimaging Data Model (NIDM) for software interoperability. NIDM defines a core vocabulary that extends the PROV Data Model for provenance with terms that capture information about neuroimaging research, from data acquisition to analysis and results. This primer explains the fundamental NIDM concepts and provides examples of its use. The primer is intended as a starting point for neuroimaging scientists or developers interested in using or creating apps with NIDM.
This document is part of the NIDM Family of Documents, a set of documents defining various aspects of neuroimaging research that are necessary to achieve the vision of inter-operable interchange of information in heterogeneous environments such as the Web, research consortia, and laboratories. A list of current NIDM documents and the latest revision of this specification can be found in the NIDM specification index. These documents are listed below.
The NIDM Working Group encourages implementation of the specifications overviewed in this document. Work on this document by the NIDM Working Group is ongoing; errors and suggestions may be reported in the issue tracker and may be addressed in future revisions.
This document was published by the NIDM Working Group as a Working Draft. If you wish to make comments regarding this document, please report them using the NIDM issue tracker. You can also ask questions at Neurostars Q&A. All comments are welcome.
This primer document provides an accessible introduction to the Neuroimaging Data Model (NIDM) for data sharing at Web scale. NIDM is an extensible framework for community-driven development of metadata standards that capture a broad spectrum of research information, particularly data provenance. The provenance of a given piece of neuroimaging data describes its origin, which can facilitate reproducible research by capturing a description of how the data was processed. The NIDM Family of Documents includes specifications that define recommendations for how to model neuroimaging research information across several stages of the research process, referred to as NIDM Components.
As a specification for neuroimaging data exchange, NIDM Components address specific information needs for the neuroimaging community. Different NIDM users or developers may have different perspectives on the types of neuroimaging data they would like to exchange.
This primer document aims to ease the adoption of the NIDM specifications by providing:
This section provides an explanation of the main concepts in NIDM, which are extensions of PROV. As this document is meant as a starting point, refer to PROV and the NIDM specification for a given Component for detailed recommendations.
A central theme in NIDM is to view all the information produced during the course of an investigation in the context of provenance, which is accomplished by building NIDM as an extension of the W3C PROV recommendations (Figure 1). PROV defines a core set of high-level structures for capturing provenance information that bolsters trust in how a given piece of information was generated. The PROV specification details three core objects that are used to describe provenance, which are Entities, Agents, and Activities. These core objects are related to each other with a set of defined relations, as highlighted in the figure below.
Entities are used to capture information that tends to persist over time, for example a dataset or spreadsheet file, which can be modified or derived from other entities. For an Entity, like a file, to be created, an Activity is needed that describes how the Entity came into existence, and an Agent is needed to describe who (e.g., a person) or what (e.g., an organization or software) is responsible for generating the Entity. Together these three objects provide the structure needed to trust how, and by whom, a given file, dataset, or analysis was created.
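This pattern of three core objects and their relations can be sketched in Turtle. The identifiers below (ex:report, ex:analysis, ex:alice) and the ex: prefix are hypothetical, chosen only to illustrate the structure; only the prov: terms come from the PROV recommendations:

```turtle
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix ex:   <http://example.org/> .

# An Entity: a persistent piece of information, here a results file.
ex:report a prov:Entity ;
    prov:wasGeneratedBy ex:analysis ;    # how it came into existence
    prov:wasAttributedTo ex:alice .      # who is responsible for it

# The Activity that produced the Entity.
ex:analysis a prov:Activity ;
    prov:wasAssociatedWith ex:alice .    # the Agent that carried it out

# The Agent: a person, organization, or piece of software.
ex:alice a prov:Agent, prov:Person .
```

Note that prov:wasAttributedTo links an Entity to an Agent, while prov:wasAssociatedWith links an Activity to an Agent.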
In addition to PROV, NIDM adopts a number of metadata vocabularies that are actively being used in the broader Web community (e.g., Dublin Core, FOAF, VoID, and DCAT). By adopting existing vocabularies and harmonizing our efforts with similar groups developing biomedical metadata standards (e.g., the W3C Health Care and Life Sciences Interest Group, ISA Commons), NIDM is able to maintain a common description of high-level concepts that are shared across biomedical domains (e.g., project descriptions and participant demographics) while providing additional specificity for the neuroimaging domain. This additional specificity is captured in layers by building a NIDM Core vocabulary, based on PROV, that is then extended to model specific types of information, referred to as NIDM Components, and then linked together using a high-level dataset descriptor (Figure 2).
The NIDM Dataset Descriptor Component is modeled after the efforts of the Health Care and Life Sciences (HCLS) Interest Group at the W3C. HCLS aims to provide a generic dataset descriptor recommendation that captures a general set of metadata that is applicable across domains. NIDM adopts the HCLS approach and extends it by providing an additional set of recommendations that are specific to neuroimaging. The HCLS Dataset Descriptors are organized into three levels – Summary, Version, and Distribution (Figure 3). The Summary-level is focused on information that does not change over the course of a project, while the Version-level captures changes that are specific to a given data release. The Distribution-level is intended to inform the consumer with links to where specific datasets can be accessed, which may come in one or more forms (e.g., relational database, tarball, or RDF). In the case of RDF, datasets can be described using the VoID vocabulary (www.w3.org/TR/void/), which is the recommended approach with NIDM, as it allows each NIDM Component for a given project to be described independently and linked appropriately using URIs.
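A minimal sketch of the three descriptor levels, using Dublin Core, DCAT, and VoID terms, might look as follows. The identifiers and the ex: prefix are hypothetical, and the exact property choices should be checked against the HCLS and NIDM recommendations:

```turtle
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix ex:   <http://example.org/> .

# Summary level: information that does not change across releases.
ex:myproject a dcat:Dataset ;
    dct:title "Example neuroimaging project" ;
    dct:publisher ex:mylab .

# Version level: a specific data release of the project.
ex:myproject-v1 a dcat:Dataset ;
    dct:isVersionOf ex:myproject ;
    dcat:distribution ex:myproject-v1-rdf .

# Distribution level: where this release can be obtained; an RDF
# distribution can also be described as a void:Dataset.
ex:myproject-v1-rdf a dcat:Distribution, void:Dataset ;
    dcat:downloadURL <http://example.org/data/myproject-v1.ttl> .
```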
The NIDM Workflow Component focus group is just beginning to work on an object model for capturing workflow provenance. We encourage those interested in participating to contact the NIDM Google group for further information. There has been work on two preliminary implementations of workflow provenance using NIDM, one using the Nipype workflow system and the other the SPM batch processing system.
An example of a Nipype workflow for extracting the brain from a structural MRI scan can be found at BET workflow.
A preliminary example encoding a batch execution of slice timing correction using the Statistical Parametric Mapping (SPM) software is shown in Figure 6.
In this section we give a brief tutorial on modeling your data with NIDM. This will serve as a good starting point and contains further information about choosing vocabularies and documenting your object models.
The first step in modeling your data is to create a conceptual graph of the objects needed to represent the data you will instantiate into a NIDM object model. Here we will use a particularly simple example. We will start with an Excel spreadsheet containing a time series of maternal heart rate measurements sampled each second for 180 seconds during pregnancy. This data was collected at the University of California, Irvine (UCI) as part of the UCI Conte Center. First, let's have a look at a subset of the Excel spreadsheet. Columns A and B are the subject identifiers for the mother and fetus respectively. Columns C-H are the heart rate measurements over time. Note, the columns have been truncated to only show the last 6 seconds of the 180 second acquisition.
Now we can create a conceptual model of this spreadsheet. In this step we decide how to represent the data as a graph using the PROV-DM core objects (agents, entities, and activities) and which relationships are needed to connect the information. We create an entity for the heart rate measurements, an activity for the process of measuring the data, and two agents: one for the mother, to whom the heart rate measurements are linked via prov:wasAttributedTo, and one for the fetus, which is linked to the measurement activity via prov:wasAssociatedWith.
Next, we decide on the attributes needed to describe the entities, agents, and activities we created in our conceptual model. Here we can create as many attributes as necessary, but we should be parsimonious whenever possible: too many attributes can make your NIDM object model overly complex. In this example we have chosen to add the attributes ncit:heartRate, ncit:timepoint, ncit:heartRateAvg, and mhr:heartRateStd to the heart rate entity. The prov:type and prov:label attributes are standard attributes in PROV-DM which are typically added to most entities and agents. The format of the attributes is described in the Vocabularies and Terms Selection section. For the agents we have added the ncit:subjectID attribute to capture the data in Columns A and B of the spreadsheet. Lastly, we have added associations in the activity to link it to the two agents.
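Following this conceptual model, the two agents and the measurement activity might be encoded in Turtle as sketched below. The identifiers are hypothetical, the ncit: and mhr: prefix URIs are placeholders, and the fetus identifier is invented for illustration; only the structure of the pattern is meant to be taken from this example:

```turtle
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ncit: <http://example.org/ncit#> .   # placeholder; use the real NCIt namespace
@prefix mhr:  <http://example.org/mhr#> .    # hypothetical project namespace

# Agent for the mother, carrying the subject identifier from Column A.
mhr:agent_mother a prov:Agent, prov:Person ;
    rdfs:label "Mother" ;
    ncit:subjectID "30564" .

# Agent for the fetus, carrying the subject identifier from Column B.
mhr:agent_fetus a prov:Agent, prov:Person ;
    rdfs:label "Fetus" ;
    ncit:subjectID "00000" .   # hypothetical identifier

# Activity for the measurement process, associated with both agents.
mhr:activity_measurement a prov:Activity ;
    rdfs:label "Maternal heart rate measurement" ;
    prov:wasAssociatedWith mhr:agent_mother, mhr:agent_fetus .
```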
Once you've defined your object model by specifying the objects, relationships, and attributes, it's time to apply it to your data. Typically one writes a parser to reformat the input data into a NIDM serialisation. Typical serialisation formats include PROV-N (the Provenance Notation), Turtle (Terse RDF Triple Language), JSON-LD (JavaScript Object Notation for Linked Data), and RDF/XML (Resource Description Framework XML Syntax). In Example 1 we show the Turtle syntax of the heart rate data entity for ncit:subjectID 30564.
```turtle
mhr:entity_71759c9cf0c811e3a3c23c07541223b4
    a prov:DataItem, prov:Entity ;
    rdfs:label "Heart Rate data for subjectID 30564" ;
    ncit:heartRate "[79.3, 75.3, 78.3, 80.6, 78.2, 80.5, 80.3, ...]" ;
    ncit:heartRateAvg 8.69e+01 ;
    ncit:timepoint "[2, 3, 4, 5, 6, 7, 8, ...]"^^ncit:sec ;
    mhr:heartRateStd 3.9e+00 ;
    prov:wasAttributedTo mhr:agent_71758f21f0c811e3ab373c07541223b4 ;
    prov:wasGeneratedBy mhr:activity_71759561f0c811e39f583c07541223b4 .
```

To make writing NIDM files easier, the working group has been using the ProvToolbox Java library from Luc Moreau or the PROV Python library from Trung Dong Huynh. RDFLib has also been used by the group for efficiently working with NIDM documents in Python.
We would also like to refer the interested reader to complementary sources of information:
| Audience | Link | Description |
| --- | --- | --- |
| Users | NIDASH page | Main INCF page for the NIDASH Task Force, a child of the INCF Program on Standards for Data Sharing page. |
| Developers | NIDASH on GitHub | Organization hosting various code repositories of NIDASH projects, NIDM in particular. |
| Developers | NIDASH Google group | Wide distribution list for NIDASH activities. |
| Modellers | NIDASH wiki | Overview of NIDASH activities; intended for both internal and external users. |
| Modellers | NIDM working group website | Site that hosts current information on the NIDM data model. |
| Modellers | NIDASH Google Drive | Shared area for the INCF Neuroimaging task force (NIDASH) activities and NIDM development, including meeting agendas and minutes, specifications, etc. |
| All | firstname.lastname@example.org | NIDM mailing list |
This document has been produced by the NIDM Working Group, and its contents reflect extensive discussion within the Working Group as a whole.
Members of the NIDM Working Group at the time of publication of this document were: