MXADataModel
Data Organization
The goal of the MXA project is to allow the researcher to gather
all the information and data that pertain to a particular
experiment and store them in a single location in an open format,
so that the researcher can easily share data with their peers for
further analysis.
Main MXADataModel web site: http://mxa.web.cmu.edu/
Published Papers
Jackson, Michael, Jeff P. Simmons, and M. De Graef. MXA: a
customizable HDF5-based data format for multi-dimensional data
sets. Modelling and Simulation in Materials Science and
Engineering, Volume 18, Number 6, September 2010.
Too Many Files
Research usually produces many data files. Typically these files
are stored in a mish-mash of folders scattered about a user's
computer system. Important information about the experiment may or
may not follow the data. Losing this metadata may render the
actual data useless to the researcher. Each of these files could
also be stored in a different format, which may be either
proprietary or open source.
Data Storage and HDF5
HDF5 is a scientific file format under ongoing development at UIUC.
The HDF format was originally designed to store data being produced
on High Performance Computing (HPC) systems. After a number of
revisions to the HDF specification, a complete rewrite was
undertaken to solve some of the limiting issues that had arisen
during production use. The latest revision is version 5, or HDF5.
This file format organizes data in a "Directed Graph" which in its
simplest form can be used in a hierarchical fashion. Any type of
data can be stored in the file in any organization that the
researcher desires. This makes HDF5 a highly flexible format for
storing vast amounts of data.
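To make the directed-graph idea concrete, here is a minimal sketch
in plain Python. It does not use the HDF5 library. The Group class,
the link and resolve helpers, and every name in the example are
hypothetical and exist only for illustration. The sketch shows that
a graph of named links behaves like an ordinary folder hierarchy
when followed one path segment at a time, and that the same object
can also be reached through more than one path.

```python
class Group:
    """A node that holds named links to other groups or to data (hypothetical)."""

    def __init__(self):
        self.links = {}

    def link(self, name, target):
        # Attach an existing object under this group; return it for chaining.
        self.links[name] = target
        return target


def resolve(root, path):
    """Follow a '/'-separated path from root, one link at a time."""
    node = root
    for name in path.strip("/").split("/"):
        node = node.links[name]
    return node


root = Group()
experiment = root.link("experiment", Group())
image = experiment.link("slice_0", [[0, 1], [2, 3]])  # stand-in for image data

# A second link to the same object: this is what makes the layout a graph
# rather than a strict tree.
shortcuts = root.link("shortcuts", Group())
shortcuts.link("first_slice", image)

same = resolve(root, "/experiment/slice_0") is resolve(root, "/shortcuts/first_slice")
```

Here both paths resolve to the very same object, so `same` is True.
When every object has exactly one link pointing to it, the graph
reduces to a simple hierarchy.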
Data Model
The concept of a "Data Model" is introduced in order to bring some
consistency to the layout of the data within an HDF5 file. The data
model consists of three main parts and tries to follow generally
accepted experimental methods.
1. Data Dimensions are the independent variables of a particular
data set or experiment.
2. Data Records are the dependent variables of the
experiment.
3. Data Root is used to describe the location within the HDF5 file
where the actual data resides.
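The three parts listed above can be pictured as a small data
structure. The following is a sketch only, written in Python for
brevity. The class and field names (DataDimension, DataRecord,
DataModel, count, data_root), the example values, and the rule of
one stored data set per record per combination of dimension values
are all assumptions made for illustration. They are not taken from
the MXA specification or its API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataDimension:
    """An independent variable, e.g. the slice number in a serial section."""
    name: str
    count: int  # number of values this dimension takes (assumed field)


@dataclass
class DataRecord:
    """A dependent variable measured at each combination of dimension values."""
    name: str


@dataclass
class DataModel:
    data_root: str  # path inside the file under which the data is stored
    dimensions: List[DataDimension] = field(default_factory=list)
    records: List[DataRecord] = field(default_factory=list)

    def expected_datasets(self) -> int:
        # Assumption: one stored data set per record per combination
        # of dimension values.
        total = len(self.records)
        for dim in self.dimensions:
            total *= dim.count
        return total


# Hypothetical experiment: 12 slices, 20 tiles per slice, one image record.
model = DataModel(
    data_root="/Experiment/Data",
    dimensions=[DataDimension("Slice", 12), DataDimension("Tile", 20)],
    records=[DataRecord("Image")],
)
print(model.expected_datasets())  # 12 * 20 * 1 = 240
```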
The data model is stored in the HDF5 file in a strict location and
follows a specification that states how the information is stored.
By using this type of system, exchange of MXA files between
algorithms becomes easier and more straightforward. Note that the
values that are written to the "Data Model" part of the HDF5 file
only describe the experiment and the layout of the data within the
HDF5 file. The actual data is stored in another location inside the
HDF5 data file.
The "Data Root" is a value that holds the 'path' to the actual
experimental data as it is stored within the HDF5 data file. The
data is stored according to the hierarchy that is laid out in the
Data Model. Using this approach, any piece of data can be quickly
found either by a human being or by automated processes.
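As an illustration of how a Data Root combined with the hierarchy
in the Data Model could locate a piece of data, the sketch below
builds a path from a data root, one index per Data Dimension, and a
Data Record name. The function name data_path, the path layout
(root, then dimension indices in order, then record name), and the
example values are assumptions for illustration. The MXA
specification defines the actual layout.

```python
def data_path(data_root, dimension_indices, record_name):
    """Join the data root, one index per dimension, and the record name."""
    parts = [data_root.strip("/")]
    parts.extend(str(index) for index in dimension_indices)
    parts.append(record_name)
    # Skip empty parts so a bare "/" root does not produce "//".
    return "/" + "/".join(part for part in parts if part)


# Hypothetical example: slice 7, tile 3 of a record named "Image".
path = data_path("/Experiment/Data", [7, 3], "Image")
print(path)  # /Experiment/Data/7/3/Image
```

Because the path is computed from the model rather than remembered,
a person browsing the file and a program reading it arrive at the
same location.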
Data sets available for download
IN100 - This data set consists of a series of sections
through an IN100 sample, provided by Jack Schirra of Pratt and
Whitney in support of the DARPA AIM program. It was sectioned with
a slice-and-view method with the Focused Ion Beam by Mike Uchic of
AFRL. The area of each image is approximately 30 x 30 microns, and
the spacing between consecutive images is 100 nm.
IN100 Tile Series 7000 Slice 0
Pearlite - Digitized versions of legacy pearlitic steel
micrographs.
Pearlite_slice_0084.tif
RoboMet - A beta-processed titanium alloy, courtesy of Dr. H.
Fraser at The Ohio State University, acquired with a RoboMet
automated serial sectioning instrument.
RoboMet Slice 68 Tile 04
MNML-3 - RoboMet.3D image stack of 3-phase Ni-Al-Epoxy on a
nanoscale, courtesy of US Air Force Research Labs. Data collected
under the supervision of Dr. J. Spowart. Data released under US Air
Force Public Affairs case number 88ABW-2009-5021. The complete data
stack is 4 tiles by 5 tiles by 926 slices and totals about 24 GB in
size. The data set linked here is a subset consisting of slices 769
to 780.
MNML-3_200x Slice 50 Tile 01
MNML-5 - RoboMet.3D image stack of 2-phase Ni-Epoxy on a
nanoscale, courtesy of US Air Force Research Labs. Data collected
under the supervision of Dr. J. Spowart. Data released under US Air
Force Public Affairs case number 88ABW-2009-5022. The complete data
stack is 9 tiles by 6 tiles by 821 slices and totals about 50 GB in
size. The data set linked here is a subset consisting of slices 100
to 110.
MNML-5_500x Slice 136 Tile 13