2 - Getting started with SampleData: Exploring dataset contents

This second User Guide tutorial will introduce you to:

  1. how to create and open datasets with the SampleData class

  2. the SampleData Naming System

  3. how to get information on a dataset's content interactively

  4. how to use the external software Vitables to visualize the content and organization of a dataset

  5. how to use the Paraview software to visualize the spatially organized data stored in datasets

  6. how to use generic HDF5 command line tools to print the content of your dataset

You will find a short summary of all methods reviewed in this tutorial at the end of this page.

Note

Throughout this notebook, it will be assumed that the reader is familiar with the overview of the SampleData file format and data model presented in the previous notebook of this User Guide.

Warning

This notebook reviews the methods to get information on the content of SampleData HDF5 datasets. Some of the methods detailed here produce very long outputs, which have been preserved in the documentation version. Reading this output in full is absolutely not necessary to learn what is presented on this page; it is only provided as an example. So do not be afraid, and feel free to scroll down quickly when you see large prints!

I - Create and Open datasets with the SampleData class

In this first section, we will see how to create SampleData datasets, or open pre-existing ones. These two operations are performed by instantiating a SampleData class object.

Before that, you will need to import the SampleData class. We will import it with the alias name SD, by executing:

Import SampleData and get help

[1]:
from pymicro.core.samples import SampleData as SD

Before starting to create our datasets, we will take a look at the SampleData class documentation to discover the arguments of the class constructor. You can read it on the pymicro.core package API doc page, or print it interactively by executing:

>>> help(SD)

or, if you are working with a Jupyter notebook, by executing the magic command:

>>> ?SD

Do not hesitate to systematically use the ``help`` function or the ``"?"`` magic command to get information on a method when you encounter a new one. All SampleData methods are documented with explanatory docstrings that detail their arguments and return values.

Dataset creation

The class docstring is divided into multiple rubrics, one of them giving the list of the class constructor arguments. Let us review them one by one.

  • filename: basename of the HDF5/XDMF pair of file of the dataset

This is the first and only mandatory argument of the class constructor. If this string corresponds to an existing file, the SampleData class will open this file and create an instance to interact with the already existing dataset. If the filename does not correspond to an existing file, the class will create a new dataset, which is what we want to do here.

Let us create a SampleData dataset:

[2]:
data = SD(filename='my_first_dataset')

That is it. The class has created a new HDF5/XDMF pair of files, and associated the interface to this dataset with the variable data. No message has been returned by the code, so how can we know that the dataset has been created?

When the name of the file is not an absolute path, the default behavior of the class is to create the dataset in the current working directory. Let us print the content of this directory then!
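This filename resolution can be sketched with the standard library (an illustration of the behavior only, not SampleData's internal code; the appended `.h5`/`.xdmf` extensions are inferred from the files created in this tutorial):

```python
import os

# A relative basename resolves against the current working directory;
# the dataset is stored as a <basename>.h5 / <basename>.xdmf file pair.
filename = 'my_first_dataset'
h5_path = os.path.join(os.getcwd(), filename + '.h5')
xdmf_path = os.path.join(os.getcwd(), filename + '.xdmf')
print(h5_path)
```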

[3]:
import os # load python module to interact with operating system
cwd = os.getcwd() # get current directory
file_list = os.listdir(cwd) # get content of current work directory
print(file_list,'\n')

# now print only files that start with our dataset basename
print('Our dataset files:')
for file in file_list:
    if file.startswith('my_first_dataset'):
        print(file)
['3_SampleData_Image_groups.ipynb', 'test_dump.txt', '5_SampleData_data_compression.ipynb', '6_SampleData_inheritance_and_Microstructure_class.ipynb', 'Introduction_backup.md', 'test_crop_data.h5', 'my_first_dataset.h5', '4_SampleData_Mesh_groups.ipynb', 'SampleData_Introduction.ipynb', 'dataset_information.txt', 'SampleDataUserGuide.rst', '1_Getting_Information_from_SampleData_datasets.ipynb', '2_SampleData_basic_data_items.ipynb', 'my_first_dataset.xdmf', 'Images', '.ipynb_checkpoints', 'test_crop_data.xdmf', 'SampleData_Quick_Reference_Sheet.ipynb']

Our dataset files:
my_first_dataset.h5
my_first_dataset.xdmf

The two files my_first_dataset.h5 and my_first_dataset.xdmf have indeed been created.

If you want interactive prints about the dataset creation, you can set the verbose argument to True. This will activate the verbose mode of the class. When it is active, the class instance prints a lot of information about what it is doing. This flag can be set using the set_verbosity method:

[4]:
data.set_verbosity(True)

Let us now close our dataset, and see if the class instance prints information about it:

[5]:
del data

Deleting DataSample object
.... Storing content index in my_first_dataset.h5:/Index attributes
.... writing xdmf file : my_first_dataset.xdmf
.... flushing data in file my_first_dataset.h5
File my_first_dataset.h5 synchronized with in memory data tree

Dataset and Datafiles closed

Note

It is good practice to always delete your SampleData instances once you are done working with a dataset, or if you want to re-open it. As the class instance keeps files open as long as it exists, deleting it ensures that the files are properly closed. Otherwise, files may close at random times or stay open, and you may encounter undesired behavior of your datasets.

The class indeed prints some information during the instance destruction. As you can see, the class instance writes data into the pair of files, and then closes the dataset instance and the files.

Dataset opening and verbose mode

Let us now try to create a new SD instance for the same dataset file "my_first_dataset", with the verbose mode on. As the dataset files (HDF5, XDMF) already exist, this new SampleData instance will open the dataset files and synchronize with them. When the verbose mode is activated, SampleData class instances display messages about the actions performed by the class (creating or deleting data items, for instance).

[6]:
data = SD(filename='my_first_dataset', verbose=True)
-- Opening file "my_first_dataset.h5"

Minimal data model initialization....

Minimal data model initialization done


**** FILE CONTENT ****

Dataset Content Index :
------------------------:
index printed with max depth `3` and under local root `/`


Printing dataset content with max depth 3

.... Storing content index in my_first_dataset.h5:/Index attributes
.... writing xdmf file : my_first_dataset.xdmf
.... flushing data in file my_first_dataset.h5
File my_first_dataset.h5 synchronized with in memory data tree

You can see that the printed information states that the dataset file my_first_dataset.h5 has been opened, and not created. This second instantiation of the class has not created a new dataset but has instead opened the one that we just closed, since we provided a filename that already existed.

Some information about the dataset content is also printed by the class. This information can be retrieved with specific methods that will be detailed in the next section of this notebook. Let us focus for now on one part of it.

The printed information reveals that our dataset content is composed of only one object: a Group data object named /. This Group is the Root Group of the dataset. Each dataset necessarily has a Root Group, automatically created along with the dataset. You can see that this Group already has a child named Index. This particular data object will be presented in the third section of this notebook. You can also observe that the Root Group already has attributes (recall from the introduction notebook that they are Name/Value pairs used to store metadata in datasets). Two of those attributes match arguments of the SampleData class constructor:

  • the description attribute

  • the sample_name attribute

The description and sample_name attributes are not modified when reading a dataset; these SD constructor arguments are only used when creating one. They are string metadata whose role is to give a general name/title and a general description to the dataset. However, they can be set or changed after the dataset creation with the set_sample_name and set_description methods, used a little further on in this notebook.

Now we know how to open a dataset previously created with SampleData. We may also want to create a new dataset with the name of an already existing one, overwriting it. The SampleData constructor allows that, as we will see in the next subsection. But first, we will close our dataset again:

[7]:
del data

Deleting DataSample object
.... Storing content index in my_first_dataset.h5:/Index attributes
.... writing xdmf file : my_first_dataset.xdmf
.... flushing data in file my_first_dataset.h5
File my_first_dataset.h5 synchronized with in memory data tree

Dataset and Datafiles closed

Overwriting datasets

If set to True, the overwrite_hdf5 argument of the class constructor will remove the filename dataset and create a new, empty one in its place, if this dataset already exists:

[8]:
data = SD(filename='my_first_dataset',  verbose=True, overwrite_hdf5=True)

-- File "/home/amarano/Codes/pymicro/examples/SampleDataUserGuide/my_first_dataset.h5" exists  and will be overwritten

-- File "my_first_dataset.h5" not found : file created
-- File "my_first_dataset.xdmf" not found : file created
.... writing xdmf file : my_first_dataset.xdmf

Minimal data model initialization....

Minimal data model initialization done

.... Storing content index in my_first_dataset.h5:/Index attributes
.... writing xdmf file : my_first_dataset.xdmf
.... flushing data in file my_first_dataset.h5
File my_first_dataset.h5 synchronized with in memory data tree

As you can see, the dataset files have been overwritten, as requested. We will now close our dataset again and continue to explore the possibilities offered by the class constructor.

[9]:
del data

Deleting DataSample object
.... Storing content index in my_first_dataset.h5:/Index attributes
.... writing xdmf file : my_first_dataset.xdmf
.... flushing data in file my_first_dataset.h5
File my_first_dataset.h5 synchronized with in memory data tree

Dataset and Datafiles closed

Copying dataset

One last thing that may be interesting to do with already existing dataset files is to create a new dataset that is a copy of them, associated with a new class instance. This is useful, for instance, when you want to try new processing on a set of valuable data without risking damage to it.

To do this, you may use the copy_sample method of the SampleData class. Its main arguments are:

  • src_sample_file: basename of the dataset files to copy (source file)

  • dst_sample_file: basename of the dataset to create as a copy of the source (destination file)

  • get_object: if False, the method will just create the new dataset files and close them. If True, the method will leave the files open and return a SampleData instance that you may use to interact with your new dataset.

Let us try to create a copy of our first dataset:

[10]:
data2 = SD.copy_sample(src_sample_file='my_first_dataset', dst_sample_file='dataset_copy', get_object=True)
[11]:
cwd = os.getcwd() # get current directory
file_list = os.listdir(cwd) # get content of current work directory
print(file_list,'\n')

# now print only files that start with our dataset basename
print('Our dataset files:')
for file in file_list:
    if file.startswith('dataset_copy'):
        print(file)
['3_SampleData_Image_groups.ipynb', 'dataset_copy.xdmf', 'test_dump.txt', '5_SampleData_data_compression.ipynb', '6_SampleData_inheritance_and_Microstructure_class.ipynb', 'Introduction_backup.md', 'test_crop_data.h5', 'my_first_dataset.h5', '4_SampleData_Mesh_groups.ipynb', 'SampleData_Introduction.ipynb', 'dataset_information.txt', 'SampleDataUserGuide.rst', '1_Getting_Information_from_SampleData_datasets.ipynb', '2_SampleData_basic_data_items.ipynb', 'my_first_dataset.xdmf', 'Images', '.ipynb_checkpoints', 'dataset_copy.h5', 'test_crop_data.xdmf', 'SampleData_Quick_Reference_Sheet.ipynb']

Our dataset files:
dataset_copy.xdmf
dataset_copy.h5

The dataset_copy HDF5 and XDMF files have indeed been created, and are a copy of the my_first_dataset HDF5 and XDMF files.

Note that copy_sample is a static method that can be called without a SampleData instance. Note also that it has an overwrite argument, which allows overwriting an already existing dst_sample_file. Like the class constructor, it also has an autodelete argument, which we will discover in the next subsection.
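Conceptually, copying a dataset amounts to duplicating its HDF5/XDMF file pair. The sketch below illustrates this idea with the standard library on empty placeholder files; it is not the actual copy_sample implementation, which additionally manages the dataset metadata and can return a class instance:

```python
import os
import shutil
import tempfile

def copy_file_pair(src_base, dst_base):
    # a SampleData dataset is stored as an HDF5/XDMF file pair: copy both
    for ext in ('.h5', '.xdmf'):
        shutil.copy(src_base + ext, dst_base + ext)

# demonstrate on empty placeholder files in a temporary directory
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, 'my_first_dataset')
dst = os.path.join(tmp, 'dataset_copy')
for ext in ('.h5', '.xdmf'):
    open(src + ext, 'w').close()
copy_file_pair(src, dst)
print(sorted(os.listdir(tmp)))
```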

Automatically removing dataset files

On some occasions, we may want to remove our dataset files after using our SampleData class instance. This can be the case, for instance, if you are trying out some new data processing, or using the class for visualization purposes, and are not interested in keeping your test data.

The class has an autodelete attribute for this purpose. If it is set to True, the class destructor will remove the dataset file pair in addition to deleting the class instance. The class constructor and the copy_sample method also have an autodelete argument which, if True, automatically sets the class instance's autodelete attribute to True.

To illustrate this feature, we will try to change the autodelete attribute of our copied dataset to True, and remove it.

[12]:
# set the autodelete argument to True
data2.autodelete = True
# Set the verbose mode on for copied dataset
data2.set_verbosity(True)
[13]:
# Close copied dataset
del data2

Deleting DataSample object
.... Storing content index in dataset_copy.h5:/Index attributes
.... writing xdmf file : dataset_copy.xdmf
.... flushing data in file dataset_copy.h5
File dataset_copy.h5 synchronized with in memory data tree

Dataset and Datafiles closed
SampleData Autodelete:
 Removing hdf5 file dataset_copy.h5 and xdmf file dataset_copy.xdmf

In verbose mode, the class destructor ends by printing a confirmation message of the dataset files' removal, as you can see in the cell above. Let us verify that they have been effectively deleted:

[14]:
file_list = os.listdir(cwd) # get content of current work directory
print(file_list,'\n')

# now print only files that start with our dataset basename
print('Our copied dataset files:')
for file in file_list:
    if file.startswith('dataset_copy'):
        print(file)
['3_SampleData_Image_groups.ipynb', 'test_dump.txt', '5_SampleData_data_compression.ipynb', '6_SampleData_inheritance_and_Microstructure_class.ipynb', 'Introduction_backup.md', 'test_crop_data.h5', 'my_first_dataset.h5', '4_SampleData_Mesh_groups.ipynb', 'SampleData_Introduction.ipynb', 'dataset_information.txt', 'SampleDataUserGuide.rst', '1_Getting_Information_from_SampleData_datasets.ipynb', '2_SampleData_basic_data_items.ipynb', 'my_first_dataset.xdmf', 'Images', '.ipynb_checkpoints', 'test_crop_data.xdmf', 'SampleData_Quick_Reference_Sheet.ipynb']

Our copied dataset files:

As you can see, the dataset files have been removed. Now we can also open and remove our first created dataset using the class constructor's autodelete option:

[15]:
data = SD(filename='my_first_dataset',  verbose=True, autodelete=True)

print(f'Is autodelete mode on ? {data.autodelete}')

del data
-- Opening file "my_first_dataset.h5"

Minimal data model initialization....

Minimal data model initialization done


**** FILE CONTENT ****

Dataset Content Index :
------------------------:
index printed with max depth `3` and under local root `/`


Printing dataset content with max depth 3

.... Storing content index in my_first_dataset.h5:/Index attributes
.... writing xdmf file : my_first_dataset.xdmf
.... flushing data in file my_first_dataset.h5
File my_first_dataset.h5 synchronized with in memory data tree
Is autodelete mode on ? True

Deleting DataSample object
.... Storing content index in my_first_dataset.h5:/Index attributes
.... writing xdmf file : my_first_dataset.xdmf
.... flushing data in file my_first_dataset.h5
File my_first_dataset.h5 synchronized with in memory data tree

Dataset and Datafiles closed
SampleData Autodelete:
 Removing hdf5 file my_first_dataset.h5 and xdmf file my_first_dataset.xdmf
[16]:
file_list = os.listdir(cwd) # get content of current work directory
print(file_list,'\n')

# now print only files that start with our dataset basename
print('Our dataset files:')
for file in file_list:
    if file.startswith('my_first_dataset'):
        print(file)
['3_SampleData_Image_groups.ipynb', 'test_dump.txt', '5_SampleData_data_compression.ipynb', '6_SampleData_inheritance_and_Microstructure_class.ipynb', 'Introduction_backup.md', 'test_crop_data.h5', '4_SampleData_Mesh_groups.ipynb', 'SampleData_Introduction.ipynb', 'dataset_information.txt', 'SampleDataUserGuide.rst', '1_Getting_Information_from_SampleData_datasets.ipynb', '2_SampleData_basic_data_items.ipynb', 'Images', '.ipynb_checkpoints', 'test_crop_data.xdmf', 'SampleData_Quick_Reference_Sheet.ipynb']

Our dataset files:

Now you know how to create or open SampleData datasets. Before starting to explore their content in detail, a last feature of the SampleData class must be introduced: the naming system and conventions used to create or access data items in datasets.

Note

Using the autodelete option is useful when you are using the class for trials or tests and do not want to keep the dataset files on your computer. It is also a proper way to remove a SampleData dataset, as it removes both files in one operation.

II - The SampleData Naming system

SampleData datasets are composed of a set of organized data items. When handling datasets, you will need to specify which item you want to interact with or create. The SampleData class provides four different ways to refer to data items. The first type of data item identifier is:

  1. the Path of the data item in the HDF5 file.

Like a file within a filesystem, an HDF5 data item has a Path within the dataset. Each data item is the child of an HDF5 Group (analogous to a file contained in a directory), and each Group may itself be the child of another Group (analogous to a directory contained in a directory). The origin Group is called the root group and has the path '/'. The Path offers a completely non-ambiguous way to designate a data item within the dataset, as it is unique. A typical data item path looks like this: /Parent_Group1/Parent_Group2/ItemName. However, paths can become very long strings and are usually not a convenient way to name data items. For that reason, you can also refer to data items in SampleData methods using:

  2. the Name of the data item.

It is the last element of its Path: the part that comes after the last / character. For a data item that has the path /Parent_Group1/Parent_Group2/ItemName, the item Name is ItemName. It allows you to refer quickly to the data item without writing its whole Path.

However, note that two different data items may have the same Name (but different paths), so additional names may be necessary to refer to them unambiguously without writing their full paths. In addition, it may be convenient to use, besides its storage name, one or more additional, meaningful names to designate a data item. For these reasons, two additional identifiers can be used:

  3. the Indexname of the data item

  4. the Alias or aliases of the data item

These two types of identifiers are strings that can be used as additional data item Names; they play completely similar roles. The Indexname is also used in the dataset Index (see below), which gathers the data item indexnames together with their paths within the dataset. All data items must have an Indexname, which can be identical to their Name. If additional names are given to a data item, they are stored as Aliases.
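To fix ideas, the four identifier types can be emulated with plain dictionaries mirroring the index and alias structures described above (a toy sketch with hypothetical entries, not the class's actual resolution code):

```python
# toy index and alias dictionaries, mimicking the structures described above
content_index = {'image': '/test_image',
                 'image_field': '/test_image/test_image_field',
                 'mesh': '/test_mesh'}
aliases = {'image': ['micrograph']}  # hypothetical alias, for illustration

def resolve(identifier):
    """Return the HDF5 path matching any of the four identifier types."""
    if identifier.startswith('/'):                    # 1. the full Path
        return identifier
    if identifier in content_index:                   # 3. an Indexname
        return content_index[identifier]
    for indexname, extra_names in aliases.items():    # 4. an Alias
        if identifier in extra_names:
            return content_index[indexname]
    for path in content_index.values():               # 2. a Name (last path element)
        if path.rsplit('/', 1)[-1] == identifier:
            return path
    return None
```

For instance, resolve('image'), resolve('micrograph') and resolve('/test_image') all return the path '/test_image'.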

Many SampleData methods have a nodename or name argument. Every time you encounter one, you may use any of the four identifiers presented in this section to provide the name of the data item you want to create or interact with. Many examples will follow in the rest of this notebook and of this User Guide.

Let us now move on to discover the methods that allow exploring the content of datasets.

III- Interactively get information on datasets content

The goal of this section is to review the various ways to get interactive information on your SampleData dataset (interactive in the sense that you can get it by executing SampleData method calls in a Python interpreter console).

For this purpose, we will use a pre-existing dataset that already has some data stored, and look into its content. This dataset is a reference SampleData dataset used for the core package unit tests.

[17]:
from config import PYMICRO_EXAMPLES_DATA_DIR # import file directory path
import os
dataset_file = os.path.join(PYMICRO_EXAMPLES_DATA_DIR, 'test_sampledata_ref') # test dataset file path
data = SD(filename=dataset_file)

1- The Dataset Index

As explained in the previous section, all data items have a Path and an Indexname. The collection of Indexname/Path pairs forms the Index of the dataset. For each SampleData dataset, an Index Group is stored in the root Group, and the collection of those pairs is stored as attributes of this Index Group. Additionally, a class attribute content_index stores them as a dictionary in the class instance, allowing easy access. The dictionary and the Index Group attributes are automatically synchronized by the class.

Let us look at the dictionary content:

[18]:
data.content_index
[18]:
{'array': '/test_group/test_array',
 'group': '/test_group',
 'image': '/test_image',
 'image_Field_index': '/test_image/Field_index',
 'image_test_image_field': '/test_image/test_image_field',
 'mesh': '/test_mesh',
 'mesh_ElTagsList': '/test_mesh/Geometry/Elem_tags_list',
 'mesh_ElTagsTypeList': '/test_mesh/Geometry/Elem_tag_type_list',
 'mesh_ElemTags': '/test_mesh/Geometry/ElementsTags',
 'mesh_Elements': '/test_mesh/Geometry/Elements',
 'mesh_Field_index': '/test_mesh/Field_index',
 'mesh_Geometry': '/test_mesh/Geometry',
 'mesh_NodeTags': '/test_mesh/Geometry/NodeTags',
 'mesh_NodeTagsList': '/test_mesh/Geometry/Node_tags_list',
 'mesh_Nodes': '/test_mesh/Geometry/Nodes',
 'mesh_Nodes_ID': '/test_mesh/Geometry/Nodes_ID',
 'mesh_Test_field1': '/test_mesh/Test_field1',
 'mesh_Test_field2': '/test_mesh/Test_field2',
 'mesh_Test_field3': '/test_mesh/Test_field3',
 'mesh_Test_field4': '/test_mesh/Test_field4'}

You should see the dictionary keys, which are indexnames of data items, and the associated values, which are HDF5 paths. You can also see the data item Names at the end of their Paths. The data item aliases are stored in another dictionary, a class attribute named aliases:

[19]:
data.aliases
[19]:
{}

This dictionary contains keys only for data items that have additional names (here, none), and those keys are the data item indexnames.

The dataset Index can be printed together with the aliases, in a prettier form, by calling the print_index method:

[20]:
data.print_index()
Dataset Content Index :
------------------------:
index printed with max depth `3` and under local root `/`

         Name : array                                     H5_Path : /test_group/test_array
         Name : group                                     H5_Path : /test_group
         Name : image                                     H5_Path : /test_image
         Name : image_Field_index                         H5_Path : /test_image/Field_index
         Name : image_test_image_field                    H5_Path : /test_image/test_image_field
         Name : mesh                                      H5_Path : /test_mesh
         Name : mesh_ElTagsList                           H5_Path : /test_mesh/Geometry/Elem_tags_list
         Name : mesh_ElTagsTypeList                       H5_Path : /test_mesh/Geometry/Elem_tag_type_list
         Name : mesh_ElemTags                             H5_Path : /test_mesh/Geometry/ElementsTags
         Name : mesh_Elements                             H5_Path : /test_mesh/Geometry/Elements
         Name : mesh_Field_index                          H5_Path : /test_mesh/Field_index
         Name : mesh_Geometry                             H5_Path : /test_mesh/Geometry
         Name : mesh_NodeTags                             H5_Path : /test_mesh/Geometry/NodeTags
         Name : mesh_NodeTagsList                         H5_Path : /test_mesh/Geometry/Node_tags_list
         Name : mesh_Nodes                                H5_Path : /test_mesh/Geometry/Nodes
         Name : mesh_Nodes_ID                             H5_Path : /test_mesh/Geometry/Nodes_ID
         Name : mesh_Test_field1                          H5_Path : /test_mesh/Test_field1
         Name : mesh_Test_field2                          H5_Path : /test_mesh/Test_field2
         Name : mesh_Test_field3                          H5_Path : /test_mesh/Test_field3
         Name : mesh_Test_field4                          H5_Path : /test_mesh/Test_field4

This method prints the content of the dataset Index, down to a given depth and from a specific root. The depth of a data item is its number of parents: the root Group has a depth of 0, its children a depth of 1, the children of its children a depth of 2, and so on. The local root argument can be changed to print the Index only for data items that are children of a specific Group. When used without arguments, print_index uses a depth of 3 and the dataset root as default settings.
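The depth rule can be written as a one-line helper on HDF5 paths (a sketch of the convention; SampleData computes this internally):

```python
def item_depth(path):
    # the root '/' has depth 0; each further path component adds one parent
    return 0 if path == '/' else path.rstrip('/').count('/')

print(item_depth('/'), item_depth('/test_mesh'),
      item_depth('/test_mesh/Geometry/Nodes'))  # prints: 0 1 3
```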

As you can see, our dataset already contains some data items. We can already identify at least 3 HDF5 Groups (test_group, test_image, test_mesh), as they have children, as well as many other data items.

Let us try to print the Index with different parameters. To start, let us print the Index from a different local root, for instance the Group with the path /test_image. To do so, we use the local_root argument and give it the value of the /test_image path.

[21]:
data.print_index(local_root="/test_image")
Dataset Content Index :
------------------------:
index printed with max depth `3` and under local root `/test_image`

         Name : image_Field_index                         H5_Path : /test_image/Field_index
         Name : image_test_image_field                    H5_Path : /test_image/test_image_field

The print_index method's local_root argument needs the name of the Group whose children's Index must be printed. As explained in section II, you may use identifiers other than its Path for this. Let us try its Name (the last part of its path), which is test_image, and its Indexname, which is image:

[22]:
data.print_index(local_root="test_image")
data.print_index(local_root="image")
Dataset Content Index :
------------------------:
index printed with max depth `3` and under local root `test_image`

         Name : image_Field_index                         H5_Path : /test_image/Field_index
         Name : image_test_image_field                    H5_Path : /test_image/test_image_field

Dataset Content Index :
------------------------:
index printed with max depth `3` and under local root `image`

         Name : image_Field_index                         H5_Path : /test_image/Field_index
         Name : image_test_image_field                    H5_Path : /test_image/test_image_field

As you can see, the result is the same in all three cases.

Let us now try to print the dataset Index with a maximal data item depth of 2, using the max_depth argument:

[23]:
data.print_index(max_depth=2)
Dataset Content Index :
------------------------:
index printed with max depth `2` and under local root `/`

         Name : array                                     H5_Path : /test_group/test_array
         Name : group                                     H5_Path : /test_group
         Name : image                                     H5_Path : /test_image
         Name : image_Field_index                         H5_Path : /test_image/Field_index
         Name : image_test_image_field                    H5_Path : /test_image/test_image_field
         Name : mesh                                      H5_Path : /test_mesh
         Name : mesh_Field_index                          H5_Path : /test_mesh/Field_index
         Name : mesh_Geometry                             H5_Path : /test_mesh/Geometry
         Name : mesh_Test_field1                          H5_Path : /test_mesh/Test_field1
         Name : mesh_Test_field2                          H5_Path : /test_mesh/Test_field2
         Name : mesh_Test_field3                          H5_Path : /test_mesh/Test_field3
         Name : mesh_Test_field4                          H5_Path : /test_mesh/Test_field4

Of course, you can combine those two arguments:

[24]:
data.print_index(max_depth=2, local_root='mesh')
Dataset Content Index :
------------------------:
index printed with max depth `2` and under local root `mesh`

         Name : mesh_Field_index                          H5_Path : /test_mesh/Field_index
         Name : mesh_Geometry                             H5_Path : /test_mesh/Geometry
         Name : mesh_Test_field1                          H5_Path : /test_mesh/Test_field1
         Name : mesh_Test_field2                          H5_Path : /test_mesh/Test_field2
         Name : mesh_Test_field3                          H5_Path : /test_mesh/Test_field3
         Name : mesh_Test_field4                          H5_Path : /test_mesh/Test_field4

The print_index method is useful to get a glimpse of the content and organization of the whole dataset, or some part of it, and to quickly see the short indexnames or aliases that you can use to refer to data items.

To add aliases to data items or Groups, you can use the add_alias method.

The Index allows you to quickly see the internal structure of your dataset; however, it does not provide detailed information on the data items. We will now see how to retrieve that information with the SampleData class.

2- The Dataset content

The SampleData class provides a method to print an organized and detailed overview of the data items in the dataset: the print_dataset_content method. Let us see what the method prints when called with no arguments:

[25]:
data.print_dataset_content()
Printing dataset content with max depth 3

****** DATA SET CONTENT ******
 -- File: test_sampledata_ref.h5
 -- Size:     1.271 Mb
 -- Data Model Class: SampleData

 GROUP /
=====================
 -- Parent Group : /
 -- Group attributes :
         * description :
        This is a test dataset created by the SampleData class unit tests.

         * sample_name : test_sample
 -- Childrens : Index, test_group, test_image, test_mesh,
----------------
************************************************


 GROUP test_group
=====================
 -- Parent Group : /
 -- Group attributes :
         * group_type : Group
 -- Childrens : test_array,
----------------
****** Group /test_group CONTENT ******

 NODE: /test_group/test_array
====================
 -- Parent Group : test_group
 -- Node name : test_array
 -- test_array attributes :
         * empty : False
         * node_type : data_array

 -- content : /test_group/test_array (CArray(51,)) 'array'
 -- Compression options for node `/test_group/test_array`:
        complevel=0, shuffle=False, bitshuffle=False, fletcher32=False, least_significant_digit=None
 --- Chunkshape: (8192,)
 -- Node memory size :    64.000 Kb
----------------


************************************************


 GROUP test_image
=====================
 -- Parent Group : /
 -- Group attributes :
         * description :
         * dimension : [9 9 9]
         * empty : False
         * group_type : 3DImage
         * nodes_dimension : [10 10 10]
         * nodes_dimension_xdmf : [10 10 10]
         * origin : [-1. -1. -1.]
         * spacing : [0.2 0.2 0.2]
         * xdmf_gridname : test_image
 -- Childrens : Field_index, test_image_field,
----------------
****** Group /test_image CONTENT ******

 NODE: /test_image/Field_index
====================
 -- Parent Group : test_image
 -- Node name : Field_index
 -- Field_index attributes :
         * node_type : string array

 -- content : /test_image/Field_index (EArray(1,)) ''
 -- Compression options for node `/test_image/Field_index`:
        complevel=0, shuffle=False, bitshuffle=False, fletcher32=False, least_significant_digit=None
 --- Chunkshape: (257,)
 -- Node memory size :    63.999 Kb
----------------

 NODE: /test_image/test_image_field
====================
 -- Parent Group : test_image
 -- Node name : test_image_field
 -- test_image_field attributes :
         * empty : False
         * field_dimensionality : Scalar
         * field_type : Nodal_field
         * node_type : field_array
         * padding : None
         * parent_grid_path : /test_image
         * transpose_indices : [2, 1, 0]
         * xdmf_fieldname : test_image_field
         * xdmf_gridname : test_image

 -- content : /test_image/test_image_field (CArray(10, 10, 10)) 'image_test_image_field'
 -- Compression options for node `/test_image/test_image_field`:
        complevel=0, shuffle=False, bitshuffle=False, fletcher32=False, least_significant_digit=None
 --- Chunkshape: (327, 10, 10)
 -- Node memory size :    63.867 Kb
----------------


************************************************


 GROUP test_mesh
=====================
 -- Parent Group : /
 -- Group attributes :
         * Elements_offset : [0]
         * Number_of_boundary_elements : [0]
         * Number_of_bulk_elements : [8]
         * Number_of_elements : [8]
         * Topology : Uniform
         * Xdmf_elements_code : ['Triangle']
         * description :
         * element_type : ['tri3']
         * elements_path : /test_mesh/Geometry/Elements
         * empty : False
         * group_type : 3DMesh
         * nodesID_path : /test_mesh/Geometry/Nodes_ID
         * nodes_path : /test_mesh/Geometry/Nodes
         * number_of_nodes : 6
         * xdmf_gridname : test_mesh
 -- Childrens : Geometry, Field_index, Test_field1, Test_field2, Test_field3, Test_field4,
----------------
****** Group /test_mesh CONTENT ******

 NODE: /test_mesh/Field_index
====================
 -- Parent Group : test_mesh
 -- Node name : Field_index
 -- Field_index attributes :
         * node_type : string array

 -- content : /test_mesh/Field_index (EArray(9,)) ''
 -- Compression options for node `/test_mesh/Field_index`:
        complevel=0, shuffle=False, bitshuffle=False, fletcher32=False, least_significant_digit=None
 --- Chunkshape: (257,)
 -- Node memory size :    63.999 Kb
----------------

 GROUP Geometry
=====================
 -- Parent Group : test_mesh
 -- Group attributes :
         * group_type : Group
 -- Childrens : ElementsTags, NodeTags, Elem_tag_type_list, Elem_tags_list, Elements, Node_tags_list, Nodes, Nodes_ID,
----------------
****** Group /test_mesh/Geometry CONTENT ******

 NODE: /test_mesh/Geometry/Elem_tag_type_list
====================
 -- Parent Group : Geometry
 -- Node name : Elem_tag_type_list
 -- Elem_tag_type_list attributes :
         * node_type : string array

 -- content : /test_mesh/Geometry/Elem_tag_type_list (EArray(3,)) ''
 -- Compression options for node `/test_mesh/Geometry/Elem_tag_type_list`:
        complevel=0, shuffle=False, bitshuffle=False, fletcher32=False, least_significant_digit=None
 --- Chunkshape: (257,)
 -- Node memory size :    63.999 Kb
----------------

 NODE: /test_mesh/Geometry/Elem_tags_list
====================
 -- Parent Group : Geometry
 -- Node name : Elem_tags_list
 -- Elem_tags_list attributes :
         * node_type : string array

 -- content : /test_mesh/Geometry/Elem_tags_list (EArray(3,)) ''
 -- Compression options for node `/test_mesh/Geometry/Elem_tags_list`:
        complevel=0, shuffle=False, bitshuffle=False, fletcher32=False, least_significant_digit=None
 --- Chunkshape: (257,)
 -- Node memory size :    63.999 Kb
----------------

 NODE: /test_mesh/Geometry/Elements
====================
 -- Parent Group : Geometry
 -- Node name : Elements
 -- Elements attributes :
         * empty : False
         * node_type : data_array

 -- content : /test_mesh/Geometry/Elements (CArray(24,)) 'mesh_Elements'
 -- Compression options for node `/test_mesh/Geometry/Elements`:
        complevel=0, shuffle=False, bitshuffle=False, fletcher32=False, least_significant_digit=None
 --- Chunkshape: (8192,)
 -- Node memory size :    64.000 Kb
----------------

 GROUP ElementsTags
=====================
 -- Parent Group : Geometry
 -- Group attributes :
         * group_type : Group
 -- Childrens : ET_2D, ET_Bottom, ET_Top, field_2D, field_Bottom, field_Top,
----------------
 GROUP NodeTags
=====================
 -- Parent Group : Geometry
 -- Group attributes :
         * group_type : Group
 -- Childrens : NT_Z0_plane, NT_out_of_plane, field_Z0_plane, field_out_of_plane,
----------------
 NODE: /test_mesh/Geometry/Node_tags_list
====================
 -- Parent Group : Geometry
 -- Node name : Node_tags_list
 -- Node_tags_list attributes :
         * node_type : string array

 -- content : /test_mesh/Geometry/Node_tags_list (EArray(2,)) ''
 -- Compression options for node `/test_mesh/Geometry/Node_tags_list`:
        complevel=0, shuffle=False, bitshuffle=False, fletcher32=False, least_significant_digit=None
 --- Chunkshape: (257,)
 -- Node memory size :    63.999 Kb
----------------

 NODE: /test_mesh/Geometry/Nodes
====================
 -- Parent Group : Geometry
 -- Node name : Nodes
 -- Nodes attributes :
         * empty : False
         * node_type : data_array

 -- content : /test_mesh/Geometry/Nodes (CArray(6, 3)) 'mesh_Nodes'
 -- Compression options for node `/test_mesh/Geometry/Nodes`:
        complevel=0, shuffle=False, bitshuffle=False, fletcher32=False, least_significant_digit=None
 --- Chunkshape: (2730, 3)
 -- Node memory size :    63.984 Kb
----------------

 NODE: /test_mesh/Geometry/Nodes_ID
====================
 -- Parent Group : Geometry
 -- Node name : Nodes_ID
 -- Nodes_ID attributes :
         * empty : False
         * node_type : data_array

 -- content : /test_mesh/Geometry/Nodes_ID (CArray(6,)) 'mesh_Nodes_ID'
 -- Compression options for node `/test_mesh/Geometry/Nodes_ID`:
        complevel=0, shuffle=False, bitshuffle=False, fletcher32=False, least_significant_digit=None
 --- Chunkshape: (8192,)
 -- Node memory size :    64.000 Kb
----------------


 NODE: /test_mesh/Test_field1
====================
 -- Parent Group : test_mesh
 -- Node name : Test_field1
 -- Test_field1 attributes :
         * empty : False
         * field_dimensionality : Scalar
         * field_type : Nodal_field
         * node_type : field_array
         * padding : None
         * parent_grid_path : /test_mesh
         * xdmf_fieldname : Test_field1
         * xdmf_gridname : test_mesh

 -- content : /test_mesh/Test_field1 (CArray(6, 1)) 'mesh_Test_field1'
 -- Compression options for node `/test_mesh/Test_field1`:
        complevel=0, shuffle=False, bitshuffle=False, fletcher32=False, least_significant_digit=None
 --- Chunkshape: (8192, 1)
 -- Node memory size :    64.000 Kb
----------------

 NODE: /test_mesh/Test_field2
====================
 -- Parent Group : test_mesh
 -- Node name : Test_field2
 -- Test_field2 attributes :
         * empty : False
         * field_dimensionality : Scalar
         * field_type : Nodal_field
         * node_type : field_array
         * padding : None
         * parent_grid_path : /test_mesh
         * xdmf_fieldname : Test_field2
         * xdmf_gridname : test_mesh

 -- content : /test_mesh/Test_field2 (CArray(6, 1)) 'mesh_Test_field2'
 -- Compression options for node `/test_mesh/Test_field2`:
        complevel=0, shuffle=False, bitshuffle=False, fletcher32=False, least_significant_digit=None
 --- Chunkshape: (8192, 1)
 -- Node memory size :    64.000 Kb
----------------

 NODE: /test_mesh/Test_field3
====================
 -- Parent Group : test_mesh
 -- Node name : Test_field3
 -- Test_field3 attributes :
         * empty : False
         * field_dimensionality : Scalar
         * field_type : Element_field
         * node_type : field_array
         * padding : bulk
         * parent_grid_path : /test_mesh
         * xdmf_fieldname : Test_field3
         * xdmf_gridname : test_mesh

 -- content : /test_mesh/Test_field3 (CArray(8, 1)) 'mesh_Test_field3'
 -- Compression options for node `/test_mesh/Test_field3`:
        complevel=0, shuffle=False, bitshuffle=False, fletcher32=False, least_significant_digit=None
 --- Chunkshape: (8192, 1)
 -- Node memory size :    64.000 Kb
----------------

 NODE: /test_mesh/Test_field4
====================
 -- Parent Group : test_mesh
 -- Node name : Test_field4
 -- Test_field4 attributes :
         * empty : False
         * field_dimensionality : Scalar
         * field_type : Element_field
         * node_type : field_array
         * padding : bulk
         * parent_grid_path : /test_mesh
         * xdmf_fieldname : Test_field4
         * xdmf_gridname : test_mesh

 -- content : /test_mesh/Test_field4 (CArray(8, 1)) 'mesh_Test_field4'
 -- Compression options for node `/test_mesh/Test_field4`:
        complevel=0, shuffle=False, bitshuffle=False, fletcher32=False, least_significant_digit=None
 --- Chunkshape: (8192, 1)
 -- Node memory size :    64.000 Kb
----------------


************************************************


As you can see, this method prints, by increasing depth, detailed information on each Group and each data item of the dataset, up to a maximum depth that can be specified with the max_depth argument (which, as for the print_index method, has a default value of 3). The printed output is structured by Groups: each Group that has children is described by a first set of information, followed by a Group CONTENT block that describes all of its children.

For each data item or Group, the method prints its name, path, type, attributes and content, plus the compression settings and memory size if it is an array, or the children names if it is a Group. Hence, when calling this method, you can see the content and organization of the dataset, all the metadata attached to the data items, and the disk space occupied by each of them. As you progress through this tutorial, you will learn the meaning of this information for all types of SampleData data items.

The print_dataset_content method has a boolean short argument, which prints a condensed string representation of the dataset:

[26]:
data.print_dataset_content(short=True)
Printing dataset content with max depth 3
  |--GROUP test_group: /test_group (Group)
     --NODE test_array: /test_group/test_array (data_array) (   64.000 Kb)

  |--GROUP test_image: /test_image (3DImage)
     --NODE Field_index: /test_image/Field_index (string array) (   63.999 Kb)
     --NODE test_image_field: /test_image/test_image_field (field_array) (   63.867 Kb)

  |--GROUP test_mesh: /test_mesh (3DMesh)
     --NODE Field_index: /test_mesh/Field_index (string array) (   63.999 Kb)
    |--GROUP Geometry: /test_mesh/Geometry (Group)
       --NODE Elem_tag_type_list: /test_mesh/Geometry/Elem_tag_type_list (string array) (   63.999 Kb)
       --NODE Elem_tags_list: /test_mesh/Geometry/Elem_tags_list (string array) (   63.999 Kb)
       --NODE Elements: /test_mesh/Geometry/Elements (data_array) (   64.000 Kb)
      |--GROUP ElementsTags: /test_mesh/Geometry/ElementsTags (Group)
      |--GROUP NodeTags: /test_mesh/Geometry/NodeTags (Group)
       --NODE Node_tags_list: /test_mesh/Geometry/Node_tags_list (string array) (   63.999 Kb)
       --NODE Nodes: /test_mesh/Geometry/Nodes (data_array) (   63.984 Kb)
       --NODE Nodes_ID: /test_mesh/Geometry/Nodes_ID (data_array) (   64.000 Kb)

     --NODE Test_field1: /test_mesh/Test_field1 (field_array) (   64.000 Kb)
     --NODE Test_field2: /test_mesh/Test_field2 (field_array) (   64.000 Kb)
     --NODE Test_field3: /test_mesh/Test_field3 (field_array) (   64.000 Kb)
     --NODE Test_field4: /test_mesh/Test_field4 (field_array) (   64.000 Kb)


This shorter print is easy to read, provides a complete and visual overview of the dataset organization, and indicates the memory size and type of each data item or Group in the dataset. The printed output distinguishes Group data items from Node data items; the latter gather all types of arrays that may be stored in the HDF5 file.
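The logic behind this condensed view can be sketched as a small recursive walk over a tree, here with a nested dictionary standing in for the HDF5 file (an illustration only, not the SampleData implementation):

```python
# Illustrative sketch of a condensed tree print; a nested dict
# stands in for the HDF5 file (not the SampleData implementation).
tree = {'test_group': {'test_array': 'data_array'},
        'test_image': {'Field_index': 'string array',
                       'test_image_field': 'field_array'}}

def short_content(node, path='', depth=0):
    """Return the condensed lines for `node`, indented by depth."""
    lines = []
    for name, child in node.items():
        child_path = f'{path}/{name}'
        if isinstance(child, dict):      # a Group: recurse into it
            lines.append(f"{'  ' * depth}|--GROUP {name}: {child_path}")
            lines.extend(short_content(child, child_path, depth + 1))
        else:                            # a Node: print its type
            lines.append(f"{'  ' * depth} --NODE {name}: {child_path} ({child})")
    return lines

print('\n'.join(short_content(tree)))
```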

Both the short and long versions of the print_dataset_content output can be written into a text file, if a filename is provided as value of the to_file method argument:

[27]:
data.print_dataset_content(short=True, to_file='dataset_information.txt')
[28]:
# Let us open the content of the created file, to see if the dataset information has been written in it:
%cat dataset_information.txt
Printing dataset content with max depth 3
  |--GROUP test_group: /test_group (Group)
     --NODE test_array: /test_group/test_array (data_array) (   64.000 Kb)

  |--GROUP test_image: /test_image (3DImage)
     --NODE Field_index: /test_image/Field_index (string array) (   63.999 Kb)
     --NODE test_image_field: /test_image/test_image_field (field_array) (   63.867 Kb)

  |--GROUP test_mesh: /test_mesh (3DMesh)
     --NODE Field_index: /test_mesh/Field_index (string array) (   63.999 Kb)
    |--GROUP Geometry: /test_mesh/Geometry (Group)
       --NODE Elem_tag_type_list: /test_mesh/Geometry/Elem_tag_type_list (string array) (   63.999 Kb)
       --NODE Elem_tags_list: /test_mesh/Geometry/Elem_tags_list (string array) (   63.999 Kb)
       --NODE Elements: /test_mesh/Geometry/Elements (data_array) (   64.000 Kb)
      |--GROUP ElementsTags: /test_mesh/Geometry/ElementsTags (Group)
      |--GROUP NodeTags: /test_mesh/Geometry/NodeTags (Group)
       --NODE Node_tags_list: /test_mesh/Geometry/Node_tags_list (string array) (   63.999 Kb)
       --NODE Nodes: /test_mesh/Geometry/Nodes (data_array) (   63.984 Kb)
       --NODE Nodes_ID: /test_mesh/Geometry/Nodes_ID (data_array) (   64.000 Kb)

     --NODE Test_field1: /test_mesh/Test_field1 (field_array) (   64.000 Kb)
     --NODE Test_field2: /test_mesh/Test_field2 (field_array) (   64.000 Kb)
     --NODE Test_field3: /test_mesh/Test_field3 (field_array) (   64.000 Kb)
     --NODE Test_field4: /test_mesh/Test_field4 (field_array) (   64.000 Kb)

Note

The string representation of the SampleData class is composed of a first part, which is the output of the print_index method, and a second part, which is the output of the print_dataset_content method (short output).

[29]:
# SampleData string representation :
print(data)
Dataset Content Index :
------------------------:
index printed with max depth `3` and under local root `/`

         Name : array                                     H5_Path : /test_group/test_array
         Name : group                                     H5_Path : /test_group
         Name : image                                     H5_Path : /test_image
         Name : image_Field_index                         H5_Path : /test_image/Field_index
         Name : image_test_image_field                    H5_Path : /test_image/test_image_field
         Name : mesh                                      H5_Path : /test_mesh
         Name : mesh_ElTagsList                           H5_Path : /test_mesh/Geometry/Elem_tags_list
         Name : mesh_ElTagsTypeList                       H5_Path : /test_mesh/Geometry/Elem_tag_type_list
         Name : mesh_ElemTags                             H5_Path : /test_mesh/Geometry/ElementsTags
         Name : mesh_Elements                             H5_Path : /test_mesh/Geometry/Elements
         Name : mesh_Field_index                          H5_Path : /test_mesh/Field_index
         Name : mesh_Geometry                             H5_Path : /test_mesh/Geometry
         Name : mesh_NodeTags                             H5_Path : /test_mesh/Geometry/NodeTags
         Name : mesh_NodeTagsList                         H5_Path : /test_mesh/Geometry/Node_tags_list
         Name : mesh_Nodes                                H5_Path : /test_mesh/Geometry/Nodes
         Name : mesh_Nodes_ID                             H5_Path : /test_mesh/Geometry/Nodes_ID
         Name : mesh_Test_field1                          H5_Path : /test_mesh/Test_field1
         Name : mesh_Test_field2                          H5_Path : /test_mesh/Test_field2
         Name : mesh_Test_field3                          H5_Path : /test_mesh/Test_field3
         Name : mesh_Test_field4                          H5_Path : /test_mesh/Test_field4

Printing dataset content with max depth 3
  |--GROUP test_group: /test_group (Group)
     --NODE test_array: /test_group/test_array (data_array) (   64.000 Kb)

  |--GROUP test_image: /test_image (3DImage)
     --NODE Field_index: /test_image/Field_index (string array) (   63.999 Kb)
     --NODE test_image_field: /test_image/test_image_field (field_array) (   63.867 Kb)

  |--GROUP test_mesh: /test_mesh (3DMesh)
     --NODE Field_index: /test_mesh/Field_index (string array) (   63.999 Kb)
    |--GROUP Geometry: /test_mesh/Geometry (Group)
       --NODE Elem_tag_type_list: /test_mesh/Geometry/Elem_tag_type_list (string array) (   63.999 Kb)
       --NODE Elem_tags_list: /test_mesh/Geometry/Elem_tags_list (string array) (   63.999 Kb)
       --NODE Elements: /test_mesh/Geometry/Elements (data_array) (   64.000 Kb)
      |--GROUP ElementsTags: /test_mesh/Geometry/ElementsTags (Group)
      |--GROUP NodeTags: /test_mesh/Geometry/NodeTags (Group)
       --NODE Node_tags_list: /test_mesh/Geometry/Node_tags_list (string array) (   63.999 Kb)
       --NODE Nodes: /test_mesh/Geometry/Nodes (data_array) (   63.984 Kb)
       --NODE Nodes_ID: /test_mesh/Geometry/Nodes_ID (data_array) (   64.000 Kb)

     --NODE Test_field1: /test_mesh/Test_field1 (field_array) (   64.000 Kb)
     --NODE Test_field2: /test_mesh/Test_field2 (field_array) (   64.000 Kb)
     --NODE Test_field3: /test_mesh/Test_field3 (field_array) (   64.000 Kb)
     --NODE Test_field4: /test_mesh/Test_field4 (field_array) (   64.000 Kb)


Now you know how to get a detailed overview of the dataset content. However, with large datasets that may have a complex internal organization (many Groups, lots of data items and metadata…), the string returned by print_dataset_content can become very long. In this case, it becomes cumbersome to look for specific information on a Group or on a particular data item. For this reason, the SampleData class provides methods to print information only on one or several data items of the dataset. They are presented in the next subsections.
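The max_depth cut-off used by these printing methods can be sketched as a depth-limited recursion. Again, a nested dictionary stands in for the dataset below; this is an illustration of the idea, not the real implementation:

```python
# Illustrative sketch of a depth-limited dataset walk
# (a nested dict stands in for the HDF5 file).
def walk(node, max_depth=3, depth=0, path=''):
    """List item paths, descending at most `max_depth` levels."""
    if depth >= max_depth:   # stop descending past max_depth
        return []
    items = []
    for name, child in node.items():
        child_path = f'{path}/{name}'
        items.append(child_path)
        if isinstance(child, dict):
            items.extend(walk(child, max_depth, depth + 1, child_path))
    return items

tree = {'test_mesh': {'Geometry': {'Nodes': None, 'Elements': None}}}
print(walk(tree, max_depth=2))   # stops before the Geometry group content
```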

3- Get information on data items

To get information on a specific data item (including Groups), you may use the print_node_info method. This method has two arguments: the name argument and the short argument. As explained in section II, the name argument can be any of the 4 possible identifiers of the target node (name, path, indexname or alias). The short argument has the same effect on the printed output as for the print_dataset_content method; its default value is False, i.e. the detailed output. Let us look at some examples.

First, suppose we want information on the Image Group stored in the dataset. The print_index and short print_dataset_content outputs showed us that this Group has the name test_image, the indexname image, and the path /test_image. We will call the method with two of those identifiers, and with the two possible values of the short argument.

[30]:
# Method called with data item indexname, and short output
data.print_node_info(nodename='image', short=True)
  |--GROUP test_image: /test_image (3DImage)

[31]:
# Method called with data item Path and long output
data.print_node_info(nodename='/test_image', short=False)

 GROUP test_image
=====================
 -- Parent Group : /
 -- Group attributes :
         * description :
         * dimension : [9 9 9]
         * empty : False
         * group_type : 3DImage
         * nodes_dimension : [10 10 10]
         * nodes_dimension_xdmf : [10 10 10]
         * origin : [-1. -1. -1.]
         * spacing : [0.2 0.2 0.2]
         * xdmf_gridname : test_image
 -- Childrens : Field_index, test_image_field,
----------------

You can observe that this method prints the same block of information as the one that appeared in the print_dataset_content output for the description of the test_image Group. From this block, we learn that this Group is a child of the root Group (‘/’), and that it has two children, the data items named Field_index and test_image_field. We can also see its attribute names and values. Here they provide information on the nature of the Group, a 3D image group, and on the topology of this image (for instance, that it is a 9x9x9 voxel image with a voxel size of 0.2).
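The topology attributes printed above are tied together by simple relations, which can be checked with plain Python (the values below are taken from the output; the variable names are illustrative):

```python
# Illustrative check of the relations between the test_image Group
# attributes printed above (values taken from the output).
dimension = [9, 9, 9]          # number of voxels along each axis
spacing = [0.2, 0.2, 0.2]      # voxel size along each axis
origin = [-1.0, -1.0, -1.0]    # coordinates of the first node

# an axis of 9 voxels is bounded by 10 node planes
nodes_dimension = [d + 1 for d in dimension]

# physical coordinate of the last node along each axis
extent = [o + d * s for o, d, s in zip(origin, dimension, spacing)]

print(nodes_dimension)  # [10, 10, 10]
print(extent)           # approximately [0.8, 0.8, 0.8]
```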

Let us now apply this method to a data item that is not a Group, the test_array data item. The print_index method showed us that this node has an alias name, test_alias. We will use it here to get information on this node, to illustrate the use of the only type of node identifier that has not been used so far in this notebook:

[32]:
data.print_node_info('test_alias')
No group named test_alias

Here, we can learn which Group is the Node's parent and what the Node name is, see its attributes, see that it is an array of shape (51,), that it is stored without data compression (compression level set to 0), and that it occupies a disk space of 64 Kb.

The print_node_info method is useful to get information on a specific target, and to avoid dealing with the sometimes very large output returned by the print_dataset_content method.

4- Get information on Groups content

The previous subsection showed that the print_node_info method applied to Groups returns only information about the Group name, metadata and children names. The SampleData class offers a method that prints this information together with the detailed content of each child of the target Group: the print_group_content method.

Let us try it on the Mesh group of our test dataset:

[33]:
data.print_group_content(groupname='test_mesh')

****** Group test_mesh CONTENT ******


****** Group test_mesh CONTENT ******

 NODE: /test_mesh/Field_index
====================
 -- Parent Group : test_mesh
 -- Node name : Field_index
 -- Field_index attributes :
         * node_type : string array

 -- content : /test_mesh/Field_index (EArray(9,)) ''
 -- Compression options for node `/test_mesh/Field_index`:
        complevel=0, shuffle=False, bitshuffle=False, fletcher32=False, least_significant_digit=None
 --- Chunkshape: (257,)
 -- Node memory size :    63.999 Kb
----------------

 GROUP Geometry
=====================
 -- Parent Group : test_mesh
 -- Group attributes :
         * group_type : Group
 -- Childrens : ElementsTags, NodeTags, Elem_tag_type_list, Elem_tags_list, Elements, Node_tags_list, Nodes, Nodes_ID,
----------------
 NODE: /test_mesh/Test_field1
====================
 -- Parent Group : test_mesh
 -- Node name : Test_field1
 -- Test_field1 attributes :
         * empty : False
         * field_dimensionality : Scalar
         * field_type : Nodal_field
         * node_type : field_array
         * padding : None
         * parent_grid_path : /test_mesh
         * xdmf_fieldname : Test_field1
         * xdmf_gridname : test_mesh

 -- content : /test_mesh/Test_field1 (CArray(6, 1)) 'mesh_Test_field1'
 -- Compression options for node `/test_mesh/Test_field1`:
        complevel=0, shuffle=False, bitshuffle=False, fletcher32=False, least_significant_digit=None
 --- Chunkshape: (8192, 1)
 -- Node memory size :    64.000 Kb
----------------

 NODE: /test_mesh/Test_field2
====================
 -- Parent Group : test_mesh
 -- Node name : Test_field2
 -- Test_field2 attributes :
         * empty : False
         * field_dimensionality : Scalar
         * field_type : Nodal_field
         * node_type : field_array
         * padding : None
         * parent_grid_path : /test_mesh
         * xdmf_fieldname : Test_field2
         * xdmf_gridname : test_mesh

 -- content : /test_mesh/Test_field2 (CArray(6, 1)) 'mesh_Test_field2'
 -- Compression options for node `/test_mesh/Test_field2`:
        complevel=0, shuffle=False, bitshuffle=False, fletcher32=False, least_significant_digit=None
 --- Chunkshape: (8192, 1)
 -- Node memory size :    64.000 Kb
----------------

 NODE: /test_mesh/Test_field3
====================
 -- Parent Group : test_mesh
 -- Node name : Test_field3
 -- Test_field3 attributes :
         * empty : False
         * field_dimensionality : Scalar
         * field_type : Element_field
         * node_type : field_array
         * padding : bulk
         * parent_grid_path : /test_mesh
         * xdmf_fieldname : Test_field3
         * xdmf_gridname : test_mesh

 -- content : /test_mesh/Test_field3 (CArray(8, 1)) 'mesh_Test_field3'
 -- Compression options for node `/test_mesh/Test_field3`:
        complevel=0, shuffle=False, bitshuffle=False, fletcher32=False, least_significant_digit=None
 --- Chunkshape: (8192, 1)
 -- Node memory size :    64.000 Kb
----------------

 NODE: /test_mesh/Test_field4
====================
 -- Parent Group : test_mesh
 -- Node name : Test_field4
 -- Test_field4 attributes :
         * empty : False
         * field_dimensionality : Scalar
         * field_type : Element_field
         * node_type : field_array
         * padding : bulk
         * parent_grid_path : /test_mesh
         * xdmf_fieldname : Test_field4
         * xdmf_gridname : test_mesh

 -- content : /test_mesh/Test_field4 (CArray(8, 1)) 'mesh_Test_field4'
 -- Compression options for node `/test_mesh/Test_field4`:
        complevel=0, shuffle=False, bitshuffle=False, fletcher32=False, least_significant_digit=None
 --- Chunkshape: (8192, 1)
 -- Node memory size :    64.000 Kb
----------------

Obviously, this method is identical to the print_dataset_content method, but restricted to one Group. Like the latter, it has to_file, short and max_depth arguments. These arguments work just as for the print_dataset_content method, hence their use is not detailed here. However, you may notice one difference. In the output printed above, we see that the test_mesh Group has a Geometry child which is a Group, but whose content is not printed. The print_group_content method has indeed, by default, a non-recursive behavior. To get a recursive print of the Group content, you must set the recursive argument to True:

[34]:
data.print_group_content('test_mesh', recursive=True)

****** Group test_mesh CONTENT ******


****** Group test_mesh CONTENT ******

 NODE: /test_mesh/Field_index
====================
 -- Parent Group : test_mesh
 -- Node name : Field_index
 -- Field_index attributes :
         * node_type : string array

 -- content : /test_mesh/Field_index (EArray(9,)) ''
 -- Compression options for node `/test_mesh/Field_index`:
        complevel=0, shuffle=False, bitshuffle=False, fletcher32=False, least_significant_digit=None
 --- Chunkshape: (257,)
 -- Node memory size :    63.999 Kb
----------------

 GROUP Geometry
=====================
 -- Parent Group : test_mesh
 -- Group attributes :
         * group_type : Group
 -- Childrens : ElementsTags, NodeTags, Elem_tag_type_list, Elem_tags_list, Elements, Node_tags_list, Nodes, Nodes_ID,
----------------
****** Group /test_mesh/Geometry CONTENT ******

 NODE: /test_mesh/Geometry/Elem_tag_type_list
====================
 -- Parent Group : Geometry
 -- Node name : Elem_tag_type_list
 -- Elem_tag_type_list attributes :
         * node_type : string array

 -- content : /test_mesh/Geometry/Elem_tag_type_list (EArray(3,)) ''
 -- Compression options for node `/test_mesh/Geometry/Elem_tag_type_list`:
        complevel=0, shuffle=False, bitshuffle=False, fletcher32=False, least_significant_digit=None
 --- Chunkshape: (257,)
 -- Node memory size :    63.999 Kb
----------------

 NODE: /test_mesh/Geometry/Elem_tags_list
====================
 -- Parent Group : Geometry
 -- Node name : Elem_tags_list
 -- Elem_tags_list attributes :
         * node_type : string array

 -- content : /test_mesh/Geometry/Elem_tags_list (EArray(3,)) ''
 -- Compression options for node `/test_mesh/Geometry/Elem_tags_list`:
        complevel=0, shuffle=False, bitshuffle=False, fletcher32=False, least_significant_digit=None
 --- Chunkshape: (257,)
 -- Node memory size :    63.999 Kb
----------------

 NODE: /test_mesh/Geometry/Elements
====================
 -- Parent Group : Geometry
 -- Node name : Elements
 -- Elements attributes :
         * empty : False
         * node_type : data_array

 -- content : /test_mesh/Geometry/Elements (CArray(24,)) 'mesh_Elements'
 -- Compression options for node `/test_mesh/Geometry/Elements`:
        complevel=0, shuffle=False, bitshuffle=False, fletcher32=False, least_significant_digit=None
 --- Chunkshape: (8192,)
 -- Node memory size :    64.000 Kb
----------------

 GROUP ElementsTags
=====================
 -- Parent Group : Geometry
 -- Group attributes :
         * group_type : Group
 -- Childrens : ET_2D, ET_Bottom, ET_Top, field_2D, field_Bottom, field_Top,
----------------
 GROUP NodeTags
=====================
 -- Parent Group : Geometry
 -- Group attributes :
         * group_type : Group
 -- Childrens : NT_Z0_plane, NT_out_of_plane, field_Z0_plane, field_out_of_plane,
----------------
 NODE: /test_mesh/Geometry/Node_tags_list
====================
 -- Parent Group : Geometry
 -- Node name : Node_tags_list
 -- Node_tags_list attributes :
         * node_type : string array

 -- content : /test_mesh/Geometry/Node_tags_list (EArray(2,)) ''
 -- Compression options for node `/test_mesh/Geometry/Node_tags_list`:
        complevel=0, shuffle=False, bitshuffle=False, fletcher32=False, least_significant_digit=None
 --- Chunkshape: (257,)
 -- Node memory size :    63.999 Kb
----------------

 NODE: /test_mesh/Geometry/Nodes
====================
 -- Parent Group : Geometry
 -- Node name : Nodes
 -- Nodes attributes :
         * empty : False
         * node_type : data_array

 -- content : /test_mesh/Geometry/Nodes (CArray(6, 3)) 'mesh_Nodes'
 -- Compression options for node `/test_mesh/Geometry/Nodes`:
        complevel=0, shuffle=False, bitshuffle=False, fletcher32=False, least_significant_digit=None
 --- Chunkshape: (2730, 3)
 -- Node memory size :    63.984 Kb
----------------

 NODE: /test_mesh/Geometry/Nodes_ID
====================
 -- Parent Group : Geometry
 -- Node name : Nodes_ID
 -- Nodes_ID attributes :
         * empty : False
         * node_type : data_array

 -- content : /test_mesh/Geometry/Nodes_ID (CArray(6,)) 'mesh_Nodes_ID'
 -- Compression options for node `/test_mesh/Geometry/Nodes_ID`:
        complevel=0, shuffle=False, bitshuffle=False, fletcher32=False, least_significant_digit=None
 --- Chunkshape: (8192,)
 -- Node memory size :    64.000 Kb
----------------


 NODE: /test_mesh/Test_field1
====================
 -- Parent Group : test_mesh
 -- Node name : Test_field1
 -- Test_field1 attributes :
         * empty : False
         * field_dimensionality : Scalar
         * field_type : Nodal_field
         * node_type : field_array
         * padding : None
         * parent_grid_path : /test_mesh
         * xdmf_fieldname : Test_field1
         * xdmf_gridname : test_mesh

 -- content : /test_mesh/Test_field1 (CArray(6, 1)) 'mesh_Test_field1'
 -- Compression options for node `/test_mesh/Test_field1`:
        complevel=0, shuffle=False, bitshuffle=False, fletcher32=False, least_significant_digit=None
 --- Chunkshape: (8192, 1)
 -- Node memory size :    64.000 Kb
----------------

 NODE: /test_mesh/Test_field2
====================
 -- Parent Group : test_mesh
 -- Node name : Test_field2
 -- Test_field2 attributes :
         * empty : False
         * field_dimensionality : Scalar
         * field_type : Nodal_field
         * node_type : field_array
         * padding : None
         * parent_grid_path : /test_mesh
         * xdmf_fieldname : Test_field2
         * xdmf_gridname : test_mesh

 -- content : /test_mesh/Test_field2 (CArray(6, 1)) 'mesh_Test_field2'
 -- Compression options for node `/test_mesh/Test_field2`:
        complevel=0, shuffle=False, bitshuffle=False, fletcher32=False, least_significant_digit=None
 --- Chunkshape: (8192, 1)
 -- Node memory size :    64.000 Kb
----------------

 NODE: /test_mesh/Test_field3
====================
 -- Parent Group : test_mesh
 -- Node name : Test_field3
 -- Test_field3 attributes :
         * empty : False
         * field_dimensionality : Scalar
         * field_type : Element_field
         * node_type : field_array
         * padding : bulk
         * parent_grid_path : /test_mesh
         * xdmf_fieldname : Test_field3
         * xdmf_gridname : test_mesh

 -- content : /test_mesh/Test_field3 (CArray(8, 1)) 'mesh_Test_field3'
 -- Compression options for node `/test_mesh/Test_field3`:
        complevel=0, shuffle=False, bitshuffle=False, fletcher32=False, least_significant_digit=None
 --- Chunkshape: (8192, 1)
 -- Node memory size :    64.000 Kb
----------------

 NODE: /test_mesh/Test_field4
====================
 -- Parent Group : test_mesh
 -- Node name : Test_field4
 -- Test_field4 attributes :
         * empty : False
         * field_dimensionality : Scalar
         * field_type : Element_field
         * node_type : field_array
         * padding : bulk
         * parent_grid_path : /test_mesh
         * xdmf_fieldname : Test_field4
         * xdmf_gridname : test_mesh

 -- content : /test_mesh/Test_field4 (CArray(8, 1)) 'mesh_Test_field4'
 -- Compression options for node `/test_mesh/Test_field4`:
        complevel=0, shuffle=False, bitshuffle=False, fletcher32=False, least_significant_digit=None
 --- Chunkshape: (8192, 1)
 -- Node memory size :    64.000 Kb
----------------

As you can see, the information on the children of the Geometry group has been printed. Note that the max_depth argument is interpreted by this method as an absolute depth, meaning that you must specify a depth at least equal to the depth of the target group to see any output printed for the group content. The default maximum depth for this method is set to a very high value of 1000; hence, print_group_content prints group contents recursively by default. Note also that print_group_content with the recursive option is equivalent to print_dataset_content, but prints the dataset content as if the target group were the root.
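The absolute-depth rule can be illustrated with a small, self-contained sketch, with plain Python dictionaries standing in for the HDF5 group tree (this is a conceptual illustration, not the pymicro implementation):

```python
# Stand-in for a SampleData HDF5 hierarchy: dicts are groups, None marks leaves.
# Names mirror the tutorial dataset.
tree = {
    '/test_mesh': {
        '/test_mesh/Geometry': {
            '/test_mesh/Geometry/Nodes': None,
            '/test_mesh/Geometry/Elements': None,
        },
        '/test_mesh/Test_field1': None,
    },
}

def visible_paths(tree, max_depth=1000):
    """Collect paths whose absolute depth (counted from the root '/')
    does not exceed max_depth -- mimicking the absolute-depth rule."""
    out = []
    for path, children in tree.items():
        depth = path.count('/')          # '/test_mesh/Geometry' has depth 2
        if depth <= max_depth:
            out.append(path)
        if isinstance(children, dict):   # recurse into subgroups
            out.extend(visible_paths(children, max_depth))
    return out

print(visible_paths(tree, max_depth=2))
# ['/test_mesh', '/test_mesh/Geometry', '/test_mesh/Test_field1']
```

With the default max_depth of 1000, every path is returned, which corresponds to the fully recursive behavior described above.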

5- Get information on grids

One of the main functionalities of the SampleData class is the manipulation and storage of spatially organized data, which is handled by Grid groups in the data model. Because they are usually key data for mechanical sample datasets, the SampleData class provides a method to print Group information for Grid groups only, the print_grids_info method:

[35]:
data.print_grids_info()

 GROUP test_image
=====================
 -- Parent Group : /
 -- Group attributes :
         * description :
         * dimension : [9 9 9]
         * empty : False
         * group_type : 3DImage
         * nodes_dimension : [10 10 10]
         * nodes_dimension_xdmf : [10 10 10]
         * origin : [-1. -1. -1.]
         * spacing : [0.2 0.2 0.2]
         * xdmf_gridname : test_image
 -- Childrens : Field_index, test_image_field,
----------------
 GROUP test_mesh
=====================
 -- Parent Group : /
 -- Group attributes :
         * Elements_offset : [0]
         * Number_of_boundary_elements : [0]
         * Number_of_bulk_elements : [8]
         * Number_of_elements : [8]
         * Topology : Uniform
         * Xdmf_elements_code : ['Triangle']
         * description :
         * element_type : ['tri3']
         * elements_path : /test_mesh/Geometry/Elements
         * empty : False
         * group_type : 3DMesh
         * nodesID_path : /test_mesh/Geometry/Nodes_ID
         * nodes_path : /test_mesh/Geometry/Nodes
         * number_of_nodes : 6
         * xdmf_gridname : test_mesh
 -- Childrens : Geometry, Field_index, Test_field1, Test_field2, Test_field3, Test_field4,
----------------

This method also has the to_file and short arguments of the print_dataset_content method:

[36]:
data.print_grids_info(short=True, to_file='dataset_information.txt')
%cat dataset_information.txt
  |--GROUP test_image: /test_image (3DImage)
  |--GROUP test_mesh: /test_mesh (3DMesh)

6- Get xdmf tree content

As explained in the first Notebook of this User Guide, these grid Groups and their associated data are stored by the SampleData class in a dual format, composed of the HDF5 dataset file and an associated XDMF file containing metadata that describes Grid group topologies, data types and fields.

The XDMF file is handled in the SampleData class by the xdmf_tree attribute, which is an instance of the ElementTree class from the lxml.etree module of the lxml package:

[37]:
data.xdmf_tree
[37]:
<lxml.etree._ElementTree at 0x7f4d3917c640>

The XDMF file is synchronized with the in-memory xdmf_tree attribute when calling the sync method, or when deleting the SampleData instance. However, you may want to look at the content of the XDMF tree while you are interactively using your SampleData instance. In this case, you can use the print_xdmf method:

[38]:
data.print_xdmf()
<!DOCTYPE Xdmf SYSTEM "Xdmf.dtd">
<Xdmf xmlns:xi="http://www.w3.org/2003/XInclude" Version="2.2">
  <Domain>
    <Grid Name="test_mesh" GridType="Uniform">
      <Geometry Type="XYZ">
        <DataItem Format="HDF" Dimensions="6  3" NumberType="Float" Precision="64">test_sampledata_ref.h5:/test_mesh/Geometry/Nodes</DataItem>
      </Geometry>
      <Topology TopologyType="Triangle" NumberOfElements="8">
        <DataItem Format="HDF" Dimensions="24 " NumberType="Int" Precision="64">test_sampledata_ref.h5:/test_mesh/Geometry/Elements</DataItem>
      </Topology>
      <Attribute Name="field_Z0_plane" AttributeType="Scalar" Center="Node">
        <DataItem Format="HDF" Dimensions="6  1" NumberType="Int" Precision="8">test_sampledata_ref.h5:/test_mesh/Geometry/NodeTags/field_Z0_plane</DataItem>
      </Attribute>
      <Attribute Name="field_out_of_plane" AttributeType="Scalar" Center="Node">
        <DataItem Format="HDF" Dimensions="6  1" NumberType="Int" Precision="8">test_sampledata_ref.h5:/test_mesh/Geometry/NodeTags/field_out_of_plane</DataItem>
      </Attribute>
      <Attribute Name="field_2D" AttributeType="Scalar" Center="Cell">
        <DataItem Format="HDF" Dimensions="8  1" NumberType="Float" Precision="64">test_sampledata_ref.h5:/test_mesh/Geometry/ElementsTags/field_2D</DataItem>
      </Attribute>
      <Attribute Name="field_Top" AttributeType="Scalar" Center="Cell">
        <DataItem Format="HDF" Dimensions="8  1" NumberType="Float" Precision="64">test_sampledata_ref.h5:/test_mesh/Geometry/ElementsTags/field_Top</DataItem>
      </Attribute>
      <Attribute Name="field_Bottom" AttributeType="Scalar" Center="Cell">
        <DataItem Format="HDF" Dimensions="8  1" NumberType="Float" Precision="64">test_sampledata_ref.h5:/test_mesh/Geometry/ElementsTags/field_Bottom</DataItem>
      </Attribute>
      <Attribute Name="Test_field1" AttributeType="Scalar" Center="Node">
        <DataItem Format="HDF" Dimensions="6  1" NumberType="Float" Precision="64">test_sampledata_ref.h5:/test_mesh/Test_field1</DataItem>
      </Attribute>
      <Attribute Name="Test_field2" AttributeType="Scalar" Center="Node">
        <DataItem Format="HDF" Dimensions="6  1" NumberType="Float" Precision="64">test_sampledata_ref.h5:/test_mesh/Test_field2</DataItem>
      </Attribute>
      <Attribute Name="Test_field3" AttributeType="Scalar" Center="Cell">
        <DataItem Format="HDF" Dimensions="8  1" NumberType="Float" Precision="64">test_sampledata_ref.h5:/test_mesh/Test_field3</DataItem>
      </Attribute>
      <Attribute Name="Test_field4" AttributeType="Scalar" Center="Cell">
        <DataItem Format="HDF" Dimensions="8  1" NumberType="Float" Precision="64">test_sampledata_ref.h5:/test_mesh/Test_field4</DataItem>
      </Attribute>
    </Grid>
    <Grid Name="test_image" GridType="Uniform">
      <Topology TopologyType="3DCoRectMesh" Dimensions="10 10 10"/>
      <Geometry Type="ORIGIN_DXDYDZ">
        <DataItem Format="XML" Dimensions="3">-1. -1. -1.</DataItem>
        <DataItem Format="XML" Dimensions="3">0.2 0.2 0.2</DataItem>
      </Geometry>
      <Attribute Name="test_image_field" AttributeType="Scalar" Center="Node">
        <DataItem Format="HDF" Dimensions="10  10  10" NumberType="Int" Precision="16">test_sampledata_ref.h5:/test_image/test_image_field</DataItem>
      </Attribute>
    </Grid>
  </Domain>
</Xdmf>

As you can observe, this prints the content of the XDMF file as it would be written if you closed the dataset right now. The XDMF file provides information on the grids that matches the Group and Node attributes printed above with the previously studied methods: the test image is a regular grid of 10x10x10 nodes, i.e. a 9x9x9 voxel grid. Only one field, test_image_field, is defined on test_image, whereas nine are defined on test_mesh (the four test fields plus five node and element tag fields).

This XDMF file can be opened directly in Paraview, provided both files are closed. If Paraview encounters any syntax or formatting issue when reading the XDMF file, it returns an error message and the data visualization is not rendered. The print_xdmf method allows you to verify your XDMF data and syntax, to make sure that the data formatting is correct.
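Such a check can also be done programmatically. The sketch below, using only the standard library's xml.etree.ElementTree (not the lxml tree held by SampleData), parses a reduced copy of the XDMF printed above and lists each grid with its fields:

```python
# Parse a reduced copy of the XDMF content printed above and verify that
# it is well-formed XML, listing each grid and its field attributes.
import xml.etree.ElementTree as ET

xdmf = """<Xdmf Version="2.2"><Domain>
  <Grid Name="test_image" GridType="Uniform">
    <Topology TopologyType="3DCoRectMesh" Dimensions="10 10 10"/>
    <Attribute Name="test_image_field" AttributeType="Scalar" Center="Node"/>
  </Grid>
</Domain></Xdmf>"""

root = ET.fromstring(xdmf)   # raises ParseError if the XML is malformed
for grid in root.iter('Grid'):
    fields = [a.get('Name') for a in grid.iter('Attribute')]
    print(grid.get('Name'), '->', fields)
# test_image -> ['test_image_field']
```

A ParseError raised here would point at the same kind of syntax issue that would make Paraview reject the file.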

7- Get memory size of file and data items

SampleData is designed to create large datasets, with data items that can represent tens of Gb of data or more. Being able to easily see and identify which data items use the most disk space is a crucial aspect of data management. Until now, with the methods we have reviewed, we have only been able to print the Node disk sizes together with a lot of other information. To speed up this process, the SampleData class has a method that directly queries and prints only the memory size of a Node, the get_node_disk_size method:

[39]:
data.get_node_disk_size(nodename='test_array')
Node test_array size on disk is    64.000 Kb
[39]:
(64.0, 'Kb')

As you can see, the default behavior of this method is to print a message indicating the Node disk size, and also to return a tuple containing the disk size value and its unit. If you want the size in bytes, you may call this method with the convert argument set to False:

[40]:
data.get_node_disk_size(nodename='test_array', convert=False)
Node test_array size on disk is 65536.000 bytes
[40]:
(65536, 'bytes')

If you want to use this method to get a numerical value within a script, but do not want the class to print anything, you can use the print_flag argument:

[41]:
size, unit = data.get_node_disk_size(nodename='test_array', print_flag=False)
print(f'Printed by script: node size is {size} {unit}')

size, unit = data.get_node_disk_size(nodename='test_array', print_flag=False, convert=False)
print(f'Printed by script: node size is {size} {unit}')
Printed by script: node size is 64.0 Kb
Printed by script: node size is 65536 bytes
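The conversion behind the convert argument can be sketched with a hypothetical helper (assuming 1 Kb = 1024 bytes, which matches the printed values above; this is an illustration, not the pymicro code):

```python
# Hypothetical helper reproducing the unit conversion seen above:
# 65536 bytes -> (64.0, 'Kb'), mirroring get_node_disk_size's return tuple.
def convert_size(nbytes):
    units = ['bytes', 'Kb', 'Mb', 'Gb', 'Tb']
    size = float(nbytes)
    for unit in units:
        # stop when the value fits in the current unit, or no larger unit exists
        if size < 1024 or unit == units[-1]:
            return round(size, 3), unit
        size /= 1024

print(convert_size(65536))    # (64.0, 'Kb')
print(convert_size(1333013))  # (1.271, 'Mb') -- the file size printed below
```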

The disk size of the whole HDF5 file can also be printed/returned, using the get_file_disk_size method, that has the same print_flag and convert arguments:

[42]:
data.get_file_disk_size()

size, unit = data.get_file_disk_size(convert=False, print_flag=False)
print(f'\nPrinted by script: file size is {size} {unit}')
File size is     1.271 Mb for file
 test_sampledata_ref.h5

Printed by script: file size is 1333013 bytes

8- Get nodes/groups attributes (metadata)

Another central aspect of the SampleData class is the management of metadata, which can be attached to every Group or Node of the dataset. Metadata comes in the form of HDF5 attributes, i.e. Name/Value pairs, which we already encountered when exploring the outputs of methods like print_dataset_content or print_node_info.

Those methods print the Group/Node attributes together with other information. To only print the attributes of a given data item, you can use the print_node_attributes method:

[43]:
data.print_node_attributes(nodename='test_mesh')
 -- test_mesh attributes :
         * Elements_offset : [0]
         * Number_of_boundary_elements : [0]
         * Number_of_bulk_elements : [8]
         * Number_of_elements : [8]
         * Topology : Uniform
         * Xdmf_elements_code : ['Triangle']
         * description :
         * element_type : ['tri3']
         * elements_path : /test_mesh/Geometry/Elements
         * empty : False
         * group_type : 3DMesh
         * nodesID_path : /test_mesh/Geometry/Nodes_ID
         * nodes_path : /test_mesh/Geometry/Nodes
         * number_of_nodes : 6
         * xdmf_gridname : test_mesh

As you can see, this method prints a list of all the data item attributes, one per line, with the format * Name : Value. It allows you to quickly see which attributes are stored with a given data item, and their values.

If you want to get the value of a specific attribute, you can use the get_attribute method. It takes two arguments, the name of the attribute you want to retrieve, and the name of the data item where it is stored:

[44]:
Nnodes = data.get_attribute(attrname='number_of_nodes', nodename='test_mesh')
print(f'The mesh test_mesh has {Nnodes} nodes')
The mesh test_mesh has 6 nodes

You can also get all attributes of a data item as a dictionary. In this case, you just need to specify the name of the data item from which you want attributes, and use the get_dic_from_attributes method:

[45]:
mesh_attrs = data.get_dic_from_attributes(nodename='test_mesh')

for name, value in mesh_attrs.items():
    print(f' Attribute {name} is {value}')
 Attribute Elements_offset is [0]
 Attribute Number_of_boundary_elements is [0]
 Attribute Number_of_bulk_elements is [8]
 Attribute Number_of_elements is [8]
 Attribute Topology is Uniform
 Attribute Xdmf_elements_code is ['Triangle']
 Attribute description is
 Attribute element_type is ['tri3']
 Attribute elements_path is /test_mesh/Geometry/Elements
 Attribute empty is False
 Attribute group_type is 3DMesh
 Attribute nodesID_path is /test_mesh/Geometry/Nodes_ID
 Attribute nodes_path is /test_mesh/Geometry/Nodes
 Attribute number_of_nodes is 6
 Attribute xdmf_gridname is test_mesh
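Such a dictionary is convenient for programmatic consistency checks. In the sketch below, the values are hard-coded copies of the output above; in a real session they would come from data.get_dic_from_attributes(nodename='test_mesh'):

```python
# Attribute values copied from the output above (in a real session, use
# mesh_attrs = data.get_dic_from_attributes(nodename='test_mesh')).
mesh_attrs = {
    'Number_of_bulk_elements': [8],
    'Number_of_boundary_elements': [0],
    'Number_of_elements': [8],
    'number_of_nodes': 6,
    'group_type': '3DMesh',
}

# Bulk and boundary elements should add up to the element total.
total = (mesh_attrs['Number_of_bulk_elements'][0]
         + mesh_attrs['Number_of_boundary_elements'][0])
assert total == mesh_attrs['Number_of_elements'][0]
print(f"{mesh_attrs['group_type']}: {mesh_attrs['number_of_nodes']} nodes, "
      f"{total} elements")
# 3DMesh: 6 nodes, 8 elements
```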

We have now seen how to explore all the types of information that a SampleData dataset may contain, individually or all together, interactively from a Python console. Let us now review how to explore the content of SampleData datasets with external software.

IV - Visualize dataset contents with Vitables

All the information that you can get with the methods presented in the previous section can also be accessed externally, by opening the HDF5 dataset file with the Vitables software. This software is usually part of the Pytables package, which is a dependency of pymicro, so you should be able to use it in any Python environment compatible with pymicro. If needed, you may refer to the Vitables website for download and installation instructions for PyPI or conda: https://vitables.org/.

Vitables provides a graphical interface that allows you to browse through all your dataset data items, and to access or modify their stored data and metadata values. You may either open Vitables and then open your HDF5 dataset file from the Vitables interface, or directly open Vitables on a specific file from the command line, by running: vitables my_dataset_path.h5.

This command only works if your dataset file is closed (if the SampleData instance still exists in your Python console, it will not work; you first need to delete the instance to close the files).

However, the SampleData class has a specific method allowing you to open your dataset with Vitables interactively, directly from your Python console: the pause_for_visualization method. As explained just above, this method closes the XDMF and HDF5 files and runs the command vitables my_dataset_path.h5 in your shell. It then freezes the interactive Python console and keeps the dataset files closed for as long as the Vitables software is running. When Vitables is shut down, the SampleData class reopens the HDF5 and XDMF files, synchronizes with them and resumes the interactive Python console.

Warning

When calling the pause_for_visualization method from a Python console (ipython, Jupyter…), you may face environment issues leading to your shell not finding the proper Vitables executable. To ensure that the right Vitables is found, the method can take an optional argument Vitables_path, which must be the path of the Vitables executable. If this argument is passed, the method will run, after closing the HDF5 and XDMF files, the command Vitables_path my_dataset_path.hdf5

Note

The method is not called here, to allow automatic execution of the Notebook when building the documentation on platforms that do not have Vitables available.

[46]:
# uncomment to test
# data.pause_for_visualization(Vitables=True, Vitables_path='Path_to_Vitables_executable')

Please refer to the Vitables documentation, which can be downloaded here: https://sourceforge.net/projects/vitables/files/ViTables-3.0.0/, to learn how to browse through your HDF5 file. The Vitables software is very intuitive; you will see that it provides a useful and convenient tool to explore your SampleData datasets outside of your interactive Python consoles.

V - Visualize datasets grids and fields with Paraview

As with Vitables, the pause_for_visualization method allows you to open your dataset with Paraview, interactively from a Python console.

Paraview provides a very powerful visualization tool to render the spatially organized data (grids) stored in your datasets. Unlike Vitables, Paraview can read the XDMF format. Hence, if you want to open your dataset with Paraview outside of a Python console, make sure that the HDF5 and XDMF files are not opened by another program, and run in your shell the command: paraview my_dataset_path.xdmf.

As you may have guessed, the pause_for_visualization method, when called interactively with the Paraview argument set to True, closes both files and runs this command, just like for the Vitables option. The dataset files will remain closed and the Python console frozen for as long as you keep the Paraview software running. When you shut down Paraview, the SampleData class reopens the HDF5 and XDMF files, synchronizes with them and resumes the interactive Python console.

Warning

When calling the pause_for_visualization method from a Python console (ipython, Jupyter…), you may face environment issues leading to your shell not finding the proper Paraview executable. To ensure that the right Paraview is found, the method can take an optional argument Paraview_path, which must be the path of the Paraview executable. If this argument is passed, the method will run, after closing the HDF5 and XDMF files, the command Paraview_path my_dataset_path.xdmf

[47]:
# Like for Vitables --> uncomment to test
# data.pause_for_visualization(Paraview=True, Paraview_path='Path_to_Paraview_executable')

Note

It is recommended to use a recent version of the Paraview software (>= 5.0) to visualize SampleData datasets. When opening the XDMF file, Paraview may ask you to choose a specific file reader. It is recommended to choose the XDMF_reader, and not the Xdmf3ReaderT or Xdmf3ReaderS.

VI - Using command line tools

You can also examine the content of your HDF5 datasets with generic HDF5 command line tools, such as h5ls or h5dump:

Warning

In the following, executable programs that come with the HDF5 library and the Pytables package are used. If you are executing this notebook with Jupyter, those executables may not be in your path if your environment is not suitably set. A workaround consists in finding the absolute path of the executable, and replacing the executable name in the following cells with its full path. For instance, replace

`ptdump file.h5`

with

`/full/path/to/ptdump file.h5`

To find this full path, you can run the command which ptdump in your shell. The same applies, of course, to h5ls and h5dump.
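If you prefer to resolve these paths from Python, the standard library's shutil.which performs the same lookup as the shell's which command. A minimal sketch:

```python
# Locate the HDF5/Pytables command line tools in the current PATH,
# using only the standard library (shutil.which mirrors `which`).
import shutil

for tool in ('ptdump', 'h5ls', 'h5dump'):
    path = shutil.which(tool)
    if path is None:
        print(f'{tool}: not found in PATH -- use its full path instead')
    else:
        print(f'{tool}: {path}')
```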

Note

Most code lines below are commented out, as they produce very large outputs that would otherwise pollute the documentation if included in the automatic build process. Uncomment them to test them if you are using these notebooks interactively!

For that, you must first close your dataset. If you don’t, these tools will not be able to open the HDF5 file, as it is opened by the SampleData class in the Python interpreter.

[48]:
del data
[49]:
# raw output of H5ls --> prints the childrens of the file root group
!h5ls ../data/test_sampledata_ref.h5
Index                    Group
test_group               Group
test_image               Group
test_mesh                Group
[50]:
# recursive output of h5ls (-r option) -->  prints all data items
!h5ls -r ../data/test_sampledata_ref.h5
/                        Group
/Index                   Group
/Index/Aliases           Group
/test_group              Group
/test_group/test_array   Dataset {51/8192}
/test_image              Group
/test_image/Field_index  Dataset {1/Inf}
/test_image/test_image_field Dataset {10/327, 10, 10}
/test_mesh               Group
/test_mesh/Field_index   Dataset {9/Inf}
/test_mesh/Geometry      Group
/test_mesh/Geometry/Elem_tag_type_list Dataset {3/Inf}
/test_mesh/Geometry/Elem_tags_list Dataset {3/Inf}
/test_mesh/Geometry/Elements Dataset {24/8192}
/test_mesh/Geometry/ElementsTags Group
/test_mesh/Geometry/ElementsTags/ET_2D Dataset {8/8192}
/test_mesh/Geometry/ElementsTags/ET_Bottom Dataset {4/8192}
/test_mesh/Geometry/ElementsTags/ET_Top Dataset {4/8192}
/test_mesh/Geometry/ElementsTags/field_2D Dataset {8/8192, 1}
/test_mesh/Geometry/ElementsTags/field_Bottom Dataset {8/8192, 1}
/test_mesh/Geometry/ElementsTags/field_Top Dataset {8/8192, 1}
/test_mesh/Geometry/NodeTags Group
/test_mesh/Geometry/NodeTags/NT_Z0_plane Dataset {4/8192}
/test_mesh/Geometry/NodeTags/NT_out_of_plane Dataset {2/8192}
/test_mesh/Geometry/NodeTags/field_Z0_plane Dataset {6/65536, 1}
/test_mesh/Geometry/NodeTags/field_out_of_plane Dataset {6/65536, 1}
/test_mesh/Geometry/Node_tags_list Dataset {2/Inf}
/test_mesh/Geometry/Nodes Dataset {6/2730, 3}
/test_mesh/Geometry/Nodes_ID Dataset {6/8192}
/test_mesh/Test_field1   Dataset {6/8192, 1}
/test_mesh/Test_field2   Dataset {6/8192, 1}
/test_mesh/Test_field3   Dataset {8/8192, 1}
/test_mesh/Test_field4   Dataset {8/8192, 1}
[51]:
# recursive (-r) and detailed (-d) output of h5ls --> also print the content of the data arrays
# !h5ls -rd ../data/test_sampledata_ref.h5
[52]:
# output of h5dump:
# !h5dump ../data/test_sampledata_ref.h5

As you can see if you uncommented and executed this cell, h5dump prints a fully detailed description of your dataset: organization, data types, item names and paths, and item content (values stored in arrays). As it produces a very large output, it may be convenient to write it to a file:

[53]:
# !h5dump ../data/test_sampledata_ref.h5 > test_dump.txt
[54]:
# !cat test_dump.txt

You can also use ptdump, the command line tool of the Pytables package, which also takes the HDF5 file as argument and has two options: the verbose mode -v and the detailed mode -d:

[55]:
# uncomment to test !
# !ptdump ../data/test_sampledata_ref.h5
[56]:
# uncomment to test!
# !ptdump -v ../data/test_sampledata_ref.h5
[57]:
# uncomment to test !
# !ptdump -d ../data/test_sampledata_ref.h5

This second tutorial of the SampleData User Guide is now finished. You should now be able to easily find all the information you are interested in within a SampleData dataset!