3 - Creating and retrieving basic data items

This second Notebook will introduce you to:

  1. interactively getting data stored in a data item

  2. creating and getting Group data items

  3. creating and getting HDF5 attributes (metadata)

  4. creating and getting data arrays

  5. creating and getting string arrays

  6. creating and getting structured arrays

  7. removing data items from datasets

Note

Throughout this notebook, it will be assumed that the reader is familiar with the overview of the SampleData file format and data model presented in the first notebook of this User Guide.

I - Interactively get data from data items

This first section presents the generic ways to get the data contained in a data item of a SampleData dataset. As in the previous tutorial, we will use the reference dataset of the pymicro.core package unit tests:

[1]:
from pymicro.core.samples import SampleData as SD
[2]:
from config import PYMICRO_EXAMPLES_DATA_DIR # import file directory path
import os
dataset_file = os.path.join(PYMICRO_EXAMPLES_DATA_DIR, 'test_sampledata_ref') # test dataset file path
data = SD(filename=dataset_file)

We will start by printing the content of the dataset and its Index (see previous tutorial, section III), to see which data we could load from the dataset:

[3]:
data.print_dataset_content(short=True)
data.print_index()
Printing dataset content with max depth 3
  |--GROUP test_group: /test_group (Group)
     --NODE test_array: /test_group/test_array (data_array) (   64.000 Kb)

  |--GROUP test_image: /test_image (3DImage)
     --NODE Field_index: /test_image/Field_index (string array) (   63.999 Kb)
     --NODE test_image_field: /test_image/test_image_field (field_array) (   63.867 Kb)

  |--GROUP test_mesh: /test_mesh (3DMesh)
     --NODE Field_index: /test_mesh/Field_index (string array) (   63.999 Kb)
    |--GROUP Geometry: /test_mesh/Geometry (Group)
       --NODE Elem_tag_type_list: /test_mesh/Geometry/Elem_tag_type_list (string array) (   63.999 Kb)
       --NODE Elem_tags_list: /test_mesh/Geometry/Elem_tags_list (string array) (   63.999 Kb)
       --NODE Elements: /test_mesh/Geometry/Elements (data_array) (   64.000 Kb)
      |--GROUP ElementsTags: /test_mesh/Geometry/ElementsTags (Group)
      |--GROUP NodeTags: /test_mesh/Geometry/NodeTags (Group)
       --NODE Node_tags_list: /test_mesh/Geometry/Node_tags_list (string array) (   63.999 Kb)
       --NODE Nodes: /test_mesh/Geometry/Nodes (data_array) (   63.984 Kb)
       --NODE Nodes_ID: /test_mesh/Geometry/Nodes_ID (data_array) (   64.000 Kb)

     --NODE Test_field1: /test_mesh/Test_field1 (field_array) (   64.000 Kb)
     --NODE Test_field2: /test_mesh/Test_field2 (field_array) (   64.000 Kb)
     --NODE Test_field3: /test_mesh/Test_field3 (field_array) (   64.000 Kb)
     --NODE Test_field4: /test_mesh/Test_field4 (field_array) (   64.000 Kb)


Dataset Content Index :
------------------------:
index printed with max depth `3` and under local root `/`

         Name : array                                     H5_Path : /test_group/test_array
         Name : group                                     H5_Path : /test_group
         Name : image                                     H5_Path : /test_image
         Name : image_Field_index                         H5_Path : /test_image/Field_index
         Name : image_test_image_field                    H5_Path : /test_image/test_image_field
         Name : mesh                                      H5_Path : /test_mesh
         Name : mesh_ElTagsList                           H5_Path : /test_mesh/Geometry/Elem_tags_list
         Name : mesh_ElTagsTypeList                       H5_Path : /test_mesh/Geometry/Elem_tag_type_list
         Name : mesh_ElemTags                             H5_Path : /test_mesh/Geometry/ElementsTags
         Name : mesh_Elements                             H5_Path : /test_mesh/Geometry/Elements
         Name : mesh_Field_index                          H5_Path : /test_mesh/Field_index
         Name : mesh_Geometry                             H5_Path : /test_mesh/Geometry
         Name : mesh_NodeTags                             H5_Path : /test_mesh/Geometry/NodeTags
         Name : mesh_NodeTagsList                         H5_Path : /test_mesh/Geometry/Node_tags_list
         Name : mesh_Nodes                                H5_Path : /test_mesh/Geometry/Nodes
         Name : mesh_Nodes_ID                             H5_Path : /test_mesh/Geometry/Nodes_ID
         Name : mesh_Test_field1                          H5_Path : /test_mesh/Test_field1
         Name : mesh_Test_field2                          H5_Path : /test_mesh/Test_field2
         Name : mesh_Test_field3                          H5_Path : /test_mesh/Test_field3
         Name : mesh_Test_field4                          H5_Path : /test_mesh/Test_field4

SampleData datasets can contain many types of data items, with different formats, shapes and contents. For this reason, the class provides specific methods to get each type of data item. They will be presented, for each type of data, in the next sections of this Notebook.

In addition, the SampleData class provides two generic mechanisms to retrieve data that only require the name of the targeted data item. They automatically try to identify which type of data matches the provided name, and call the appropriate specific “get” method. They are useful to get data quickly and easily, but do not give access to all the options offered by the specific methods. We will start by reviewing these generic data access mechanisms.

Dictionary like access to data

The first way to get a data item is to use the SampleData class instance as if it were a dictionary whose keys and values are respectively the data item Names and contents. As a dictionary key for a given data item, you can use any of its 4 possible identifiers: Name, Path, Indexname or Alias (see tutorial 1, section II).

Let us see an example, by trying to get the array test_array of our dataset:

[4]:
# get array in a variable, using the data item Name
array = data['test_array']
print(array.shape,'\n', array)
print(type(array))

# directly print array, getting it with its Indexname
print('\n',data['array'])
(51,)
 [-1.00000000e+00 -9.39062506e-01 -8.81618592e-01 -8.27271946e-01
 -7.75679511e-01 -7.26542528e-01 -6.79599298e-01 -6.34619298e-01
 -5.91398351e-01 -5.49754652e-01 -5.09525449e-01 -4.70564281e-01
 -4.32738642e-01 -3.95928009e-01 -3.60022153e-01 -3.24919696e-01
 -2.90526857e-01 -2.56756360e-01 -2.23526483e-01 -1.90760202e-01
 -1.58384440e-01 -1.26329378e-01 -9.45278312e-02 -6.29146673e-02
 -3.14262660e-02  1.11022302e-16  3.14262660e-02  6.29146673e-02
  9.45278312e-02  1.26329378e-01  1.58384440e-01  1.90760202e-01
  2.23526483e-01  2.56756360e-01  2.90526857e-01  3.24919696e-01
  3.60022153e-01  3.95928009e-01  4.32738642e-01  4.70564281e-01
  5.09525449e-01  5.49754652e-01  5.91398351e-01  6.34619298e-01
  6.79599298e-01  7.26542528e-01  7.75679511e-01  8.27271946e-01
  8.81618592e-01  9.39062506e-01  1.00000000e+00]
<class 'numpy.ndarray'>

 [-1.00000000e+00 -9.39062506e-01 -8.81618592e-01 -8.27271946e-01
 -7.75679511e-01 -7.26542528e-01 -6.79599298e-01 -6.34619298e-01
 -5.91398351e-01 -5.49754652e-01 -5.09525449e-01 -4.70564281e-01
 -4.32738642e-01 -3.95928009e-01 -3.60022153e-01 -3.24919696e-01
 -2.90526857e-01 -2.56756360e-01 -2.23526483e-01 -1.90760202e-01
 -1.58384440e-01 -1.26329378e-01 -9.45278312e-02 -6.29146673e-02
 -3.14262660e-02  1.11022302e-16  3.14262660e-02  6.29146673e-02
  9.45278312e-02  1.26329378e-01  1.58384440e-01  1.90760202e-01
  2.23526483e-01  2.56756360e-01  2.90526857e-01  3.24919696e-01
  3.60022153e-01  3.95928009e-01  4.32738642e-01  4.70564281e-01
  5.09525449e-01  5.49754652e-01  5.91398351e-01  6.34619298e-01
  6.79599298e-01  7.26542528e-01  7.75679511e-01  8.27271946e-01
  8.81618592e-01  9.39062506e-01  1.00000000e+00]

As you can see, when used as a dictionary, the class returned the content of the test_array data item as a numpy array.

Attribute like access to data

In addition to the dictionary like access, you can also get data items as if they were attributes of the class, using their Name, Indexname or Alias:

[5]:
print(data.array)
[-1.00000000e+00 -9.39062506e-01 -8.81618592e-01 -8.27271946e-01
 -7.75679511e-01 -7.26542528e-01 -6.79599298e-01 -6.34619298e-01
 -5.91398351e-01 -5.49754652e-01 -5.09525449e-01 -4.70564281e-01
 -4.32738642e-01 -3.95928009e-01 -3.60022153e-01 -3.24919696e-01
 -2.90526857e-01 -2.56756360e-01 -2.23526483e-01 -1.90760202e-01
 -1.58384440e-01 -1.26329378e-01 -9.45278312e-02 -6.29146673e-02
 -3.14262660e-02  1.11022302e-16  3.14262660e-02  6.29146673e-02
  9.45278312e-02  1.26329378e-01  1.58384440e-01  1.90760202e-01
  2.23526483e-01  2.56756360e-01  2.90526857e-01  3.24919696e-01
  3.60022153e-01  3.95928009e-01  4.32738642e-01  4.70564281e-01
  5.09525449e-01  5.49754652e-01  5.91398351e-01  6.34619298e-01
  6.79599298e-01  7.26542528e-01  7.75679511e-01  8.27271946e-01
  8.81618592e-01  9.39062506e-01  1.00000000e+00]
[6]:
# get the test array in a variable with the attribute like access
array2 = data.test_array

# Test if both arrays are equal
import numpy as np
np.all(array == array2)
[6]:
True
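To picture how both mechanisms can coexist, here is a minimal toy class (a sketch only, not the real SampleData implementation) that routes the dictionary like and attribute like access to a single generic getter:

```python
import numpy as np

class MiniDataset:
    """Toy stand-in for a dataset class, routing both access styles
    to a single generic getter."""

    def __init__(self):
        # hypothetical storage: indexname -> numpy array
        self._items = {'array': np.linspace(-1.0, 1.0, 51)}

    def get_node(self, name):
        # generic getter: look up the name and return the stored data
        return self._items[name]

    def __getitem__(self, name):
        # enables the dictionary like access: data['array']
        return self.get_node(name)

    def __getattr__(self, name):
        # enables the attribute like access: data.array
        # (only called when normal attribute lookup fails)
        return self.get_node(name)

data = MiniDataset()
print(np.all(data['array'] == data.array))  # True: both routes reach the same data
```

SampleData implements the same idea with its get_node method, with the added logic of resolving any of the 4 identifiers and dispatching to the adapted specific “get” method.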

Now that these two generic mechanisms have been presented, we will review the basic data item types that can compose your datasets, and how to create or retrieve them with specific class methods. The more complex data types, representing grids and fields, will not be presented here. Dedicated tutorials follow this one to introduce you to these more advanced features of the class.

To do that, we will create our own dataset. So first, we have to close the test dataset:

[7]:
data.set_verbosity(True)
del data

Deleting DataSample object
.... Storing content index in test_sampledata_ref.h5:/Index attributes
.... writing xdmf file : test_sampledata_ref.xdmf
.... flushing data in file test_sampledata_ref.h5
File test_sampledata_ref.h5 synchronized with in memory data tree

Dataset and Datafiles closed

II - HDF5 Groups

We will start with HDF5 Groups, as they are the simplest type of data item in the data model. Groups have 2 functions within SampleData datasets:

  1. organizing data, by containing other data items

  2. organizing metadata, by containing attributes

We will start by creating a dataset with the verbose and autodelete options, so that we get information on the actions performed by the class, and so that our dataset is removed once we end this tutorial. We also set the overwrite_hdf5 option to True, in case tutorial_dataset.h5/xdmf already exist in the current working directory (created and not removed by another tutorial file, for instance).

[8]:
data = SD(filename='tutorial_dataset', sample_name='test_sample', verbose = True, autodelete=True, overwrite_hdf5=True)

-- File "tutorial_dataset.h5" not found : file created
-- File "tutorial_dataset.xdmf" not found : file created
.... writing xdmf file : tutorial_dataset.xdmf

Minimal data model initialization....

Minimal data model initialization done

.... Storing content index in tutorial_dataset.h5:/Index attributes
.... writing xdmf file : tutorial_dataset.xdmf
.... flushing data in file tutorial_dataset.h5
File tutorial_dataset.h5 synchronized with in memory data tree

Note that the verbose mode of the class informed us that no dataset files with the requested name existed, and that new files were hence created.

For now, our dataset is empty and thus contains only the Root group '/'. To create a group, you must use the add_group method. It has 4 arguments:

  • groupname: the group to create will have this Name

  • location: indicates the parent group that will contain the created group. By default, its value is '/', the root group

  • indexname: the group to create will have this Indexname. If none is provided, the indexname will be duplicated from the Name

  • replace: if a group with the same Name exists, the SampleData class will remove it to create the new one only if this argument is set to True. If not, the group will not be created. By default, it is set to False

Let us create a test group, from the root group. We will call it test_group, and give it a short indexname, for instance testG:

[9]:
data.add_group(groupname='test_group', location='/', indexname='testG')

Creating Group group `test_group` in file tutorial_dataset.h5 at /
[9]:
/test_group (Group) 'test_group'
  children := []

As you can see, the verbose mode prints a confirmation that the group has been created. You can also observe that the method returns a Group object, an instance of a class from the Pytables package. In practice, you do not need to use Pytables objects when working with the SampleData class. However, if you want to use them, you can find the documentation of the group class here.

Let us now look at the content of our dataset:

[10]:
print(data)
Dataset Content Index :
------------------------:
index printed with max depth `3` and under local root `/`

         Name : testG                                     H5_Path : /test_group

Printing dataset content with max depth 3
  |--GROUP test_group: /test_group (Group)

The group has indeed been created, with the right Path, Name and Indexname.

Let us try to create a new group, with the same path and name, but a different indexname:

[11]:
import tables

# We run the command in a try structure as it will raise an exception
try:
    data.add_group(groupname='test_group', location='/', indexname='Gtest')
except tables.NodeError as NodeError:
    print(NodeError)
Group test_group already exists. Set arg. `replace=True` to replace it by a new Group.

We got an error, more specifically a NodeError linked to the HDF5 dataset structure, as the group already exists. As explained earlier, if the replace argument is set to False (the default value), the class protects the pre-existing data and does not create the new data item. As we are sure that we want to overwrite the Group, we must set the appropriate argument value:

[12]:
group = data.add_group(groupname='test_group', location='/', indexname='Gtest', replace=True)

Removing group test_group to replace it by new one.

Removing  node /test_group in content index....

item testG : /test_group removed from context index dictionary
.... Storing content index in tutorial_dataset.h5:/Index attributes
.... writing xdmf file : tutorial_dataset.xdmf
.... flushing data in file tutorial_dataset.h5
File tutorial_dataset.h5 synchronized with in memory data tree

Node /test_group sucessfully removed

Creating Group group `test_group` in file tutorial_dataset.h5 at /

This time we got no error: the previously created group has been deleted, and the new one created. We also assigned the returned Group object to the variable group. Let us verify:

[13]:
print(data)
print(group)
Dataset Content Index :
------------------------:
index printed with max depth `3` and under local root `/`

         Name : Gtest                                     H5_Path : /test_group

Printing dataset content with max depth 3
  |--GROUP test_group: /test_group (Group)

/test_group (Group) 'test_group'

As explained in the first section, to get this Group data item from the dataset, you can use the dictionary or attribute like data item access. Both mechanisms call the get_node SampleData method.

This method is meant to return simple data items, either as a numpy array or as a Pytables node/group object. It takes one of the 4 possible data item identifiers (name, indexname, path or alias) as argument. In this case, it should return a Group object identical to the one returned by the add_group method.

Let us verify it:

[14]:
# get the Group object
group2 = data.get_node('test_group')
print(group2)

# Group objects can be compared:
print(f' Does the two Group instances represent the same group ? {group == group2}')

# get again with dictionary like access
group2 = data['test_group']
print(group2)

# get again with attribute like access
group2 = data.Gtest
print(group2)
/test_group (Group) 'test_group'
 Does the two Group instances represent the same group ? True
/test_group (Group) 'test_group'
/test_group (Group) 'test_group'

As you can see, the get_node method and the attribute/dictionary like access return the same Group instance that was returned by the add_group method.

You now know how to create and retrieve Groups.

III - HDF5 attributes

As explained in the previous section, one of the uses of Groups is to hold attributes, to organize metadata. In this section, we will see how to create attributes. It is actually very simple to add attributes to a data item with SampleData.

Let us see an example. Suppose that we want to add to our new group test_group the name of the tutorial notebook file that created it, and the tutorial section where it is created. We will start by creating a dictionary gathering this metadata:

[15]:
metadata = {'tutorial_file':'2_SampleData_basic_data_items.ipynb',
            'tutorial_section':'Section II'}

Then, we simply add this metadata to test_group with the add_attributes method:

[16]:
data.add_attributes(metadata, nodename='Gtest')

Let us look at the content of our group to verify that the attributes have been added:

[17]:
data.print_node_attributes('Gtest')
 -- test_group attributes :
         * group_type : Group
         * tutorial_file : 2_SampleData_basic_data_items.ipynb
         * tutorial_section : Section II

As you can see, the metadata that we just added to the Group is printed. You can also observe that the Group already had some metadata: the group_type attribute. Here its value is Group, which indicates that it is a standard HDF5 group, and not a Grid group (image or mesh, see the data model here).

The methods to get attributes have been presented in tutorial 1 (sec. II-8). We will reuse them here:

[18]:
tutorial_file = data.get_attribute('tutorial_file','Gtest')
tutorial_sec = data.get_attribute('tutorial_section', 'Gtest')
print(f'The group Gtest has been created with the notebook {tutorial_file}, at the section {tutorial_sec}')
The group Gtest has been created with the notebook 2_SampleData_basic_data_items.ipynb, at the section Section II
[19]:
Gtest_attrs = data.get_dic_from_attributes('Gtest')
print(f'The group Gtest has been created with the notebook {Gtest_attrs["tutorial_file"]},'
      f' at the section {Gtest_attrs["tutorial_section"]}')
The group Gtest has been created with the notebook 2_SampleData_basic_data_items.ipynb, at the section Section II

To conclude this section on attributes, we will introduce the set_description and get_description methods. These methods are shortcuts to create or get the content of a specific data item attribute, description. This attribute is intended to be a string of one or a few sentences explaining the content, origin and purpose of the data item:

[20]:
data.set_description(description="Just a short example to see how to use descriptions. This is a description.",
                     node='Gtest')
data.print_node_attributes('Gtest')
 -- test_group attributes :
         * description : Just a short example to see how to use descriptions. This is a description.
         * group_type : Group
         * tutorial_file : 2_SampleData_basic_data_items.ipynb
         * tutorial_section : Section II

[21]:
print(data.get_description('Gtest'))
Just a short example to see how to use descriptions. This is a description.

IV - Data arrays

Now that we can create Groups and organize our datasets, we will want to add actual data to them.

The most common form of scientific data is an array of numbers. The most common and powerful Python package used to manipulate large numeric arrays is the Numpy package. Through its implementation, and the support of the Pytables package, the SampleData class can directly load and return Numpy arrays for the storage of numerical arrays in the datasets.

The method that you will need to use to add a numeric array is add_data_array. It accepts the following arguments:

  • location: the parent group that will contain the created array. Mandatory argument

  • name: the data array to create will have this Name

  • array: a numpy.ndarray, the numeric array to be stored in the dataset

  • indexname: the data array to create will have this Indexname. If none is provided, the indexname will be duplicated from the Name

  • replace: if an array with the same Name exists, the SampleData class will remove it to create the new one only if this argument is set to True. By default, it is set to False

In addition, it accepts two arguments linked to data compression, the chunkshape and compression_options arguments, that will not be discussed here, but rather in the tutorial dedicated to data compression. The name, indexname, location and replace arguments work exactly as for the add_group method presented in the previous section, and the array argument is self-explanatory.

It is possible to create an empty data array item in the dataset. The main purpose of this option will be highlighted in the tutorial on SampleData derived classes. For now, note that it allows you to create the internal organization of your dataset without having to add any data to it: you can for instance reserve some data item names and indexnames, and already add metadata. We will see below how to create an empty data array item and, later, add actual data to it.

Let us start by creating a random array of data with the numpy.random package, and store it into a data array in the group test_group.

[22]:
# we start by importing the numpy package
import numpy as np
# we create a random array of 20 elements
A = np.random.rand(20)
print(A)
[0.51535012 0.56508785 0.53913262 0.64040211 0.55415212 0.23042938
 0.67191761 0.12123758 0.88332728 0.32097958 0.36914473 0.244528
 0.74187258 0.280634   0.68578653 0.6472718  0.16596613 0.39701755
 0.24730563 0.61975144]

Now we add the array A to our dataset with name test_array, indexname Tarray, to the group test_group:

[23]:
data.add_data_array(location='Gtest', name='test_array', indexname='Tarray', array=A)

Adding array `test_array` into Group `Gtest`
[23]:
/test_group/test_array (CArray(20,)) 'Tarray'
  atom := Float64Atom(shape=(), dflt=0.0)
  maindim := 0
  flavor := 'numpy'
  byteorder := 'little'
  chunkshape := (8192,)

As for the group creation in the previous section, the verbose mode of the class informed us that the array has been added to the dataset, in the desired group. Once again, the method has returned a Pytables object; here it is a tables.Node object and not a tables.Group object.

You can see that this object is a CArray, i.e. a chunked array. It is a specific type of HDF5 array whose data is split into multiple chunks that are all stored separately in the file. This enables a strong optimization of the reading speed when dealing with multidimensional arrays. We will not detail this feature of the HDF5 library in this tutorial, but rather in the tutorial dedicated to data compression. However, it is strongly advised to study the concept of the HDF5 chunked layout to optimally use SampleData datasets (see the Pytables optimization tips or the HDF5 group dedicated page).
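To give a concrete feel for these numbers, a few lines of chunk arithmetic (using the figures printed above, a 20 element float64 array with chunkshape (8192,), as assumptions) show why the node occupies exactly one 64 Kb chunk:

```python
import math

# Figures taken from the node printed above (assumptions of this sketch):
n_elements = 20    # length of the stored array
chunk_len = 8192   # chunkshape chosen by Pytables
itemsize = 8       # bytes per float64 element

# The array is split into ceil(20 / 8192) = 1 chunk of 8192 * 8 bytes
n_chunks = math.ceil(n_elements / chunk_len)
chunk_kb = chunk_len * itemsize / 1024

print(f'number of chunks : {n_chunks}')         # 1
print(f'size of one chunk: {chunk_kb:.3f} Kb')  # 64.000 Kb
```

This matches the "Node memory size : 64.000 Kb" values reported by print_dataset_content: even a tiny array occupies at least one full chunk on disk.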

Let us look at the content of our dataset:

[24]:
print(data)
Dataset Content Index :
------------------------:
index printed with max depth `3` and under local root `/`

         Name : Gtest                                     H5_Path : /test_group
         Name : Tarray                                    H5_Path : /test_group/test_array

Printing dataset content with max depth 3
  |--GROUP test_group: /test_group (Group)
     --NODE test_array: /test_group/test_array (data_array) (   64.000 Kb)


We now see that our array has been added as a child of the test_group Group, as requested. As for the Group created in section II, we can add metadata to this data item:

[25]:
metadata = {'tutorial_file':'2_SampleData_basic_data_items.ipynb',
            'tutorial_section':'Section IV'}
data.add_attributes(metadata, 'Tarray')
data.print_node_attributes('Tarray')
 -- test_array attributes :
         * empty : False
         * node_type : data_array
         * tutorial_file : 2_SampleData_basic_data_items.ipynb
         * tutorial_section : Section IV

As you can observe, array data items have an empty attribute, which indicates whether or not the data item is associated with an empty data array (see a few cells above). They also have a node_type attribute, indicating their data item nature, here a Data Array.

Let us now try to create an empty array. In this case, you just have to omit the array argument from the method call:

[26]:
data.add_data_array(location='Gtest', name='empty_array', indexname='emptyA')

Adding array `empty_array` into Group `Gtest`
[26]:
/test_group/empty_array (CArray(1,)) 'emptyA'
  atom := Int64Atom(shape=(), dflt=0)
  maindim := 0
  flavor := 'numpy'
  byteorder := 'little'
  chunkshape := (8192,)

The verbose mode of the class indeed informs us that we created an empty data array item. Let us print the content of our dataset again, this time in the detailed format, to also see the attributes of our data items:

[27]:
data.print_dataset_content(short=False)
Printing dataset content with max depth 3

****** DATA SET CONTENT ******
 -- File: tutorial_dataset.h5
 -- Size:    71.492 Kb
 -- Data Model Class: SampleData

 GROUP /
=====================
 -- Parent Group : /
 -- Group attributes :
         * description :
         * sample_name : test_sample
 -- Childrens : Index, test_group,
----------------
************************************************


 GROUP test_group
=====================
 -- Parent Group : /
 -- Group attributes :
         * description : Just a short example to see how to use descriptions. This is a description.
         * group_type : Group
         * tutorial_file : 2_SampleData_basic_data_items.ipynb
         * tutorial_section : Section II
 -- Childrens : test_array, empty_array,
----------------
****** Group /test_group CONTENT ******

 NODE: /test_group/empty_array
====================
 -- Parent Group : test_group
 -- Node name : empty_array
 -- empty_array attributes :
         * empty : True
         * node_type : data_array

 -- content : /test_group/empty_array (CArray(1,)) 'emptyA'
 -- Compression options for node `/test_group/empty_array`:
        complevel=0, shuffle=False, bitshuffle=False, fletcher32=False, least_significant_digit=None
 --- Chunkshape: (8192,)
 -- Node memory size :    64.000 Kb
----------------

 NODE: /test_group/test_array
====================
 -- Parent Group : test_group
 -- Node name : test_array
 -- test_array attributes :
         * empty : False
         * node_type : data_array
         * tutorial_file : 2_SampleData_basic_data_items.ipynb
         * tutorial_section : Section IV

 -- content : /test_group/test_array (CArray(20,)) 'Tarray'
 -- Compression options for node `/test_group/test_array`:
        complevel=0, shuffle=False, bitshuffle=False, fletcher32=False, least_significant_digit=None
 --- Chunkshape: (8192,)
 -- Node memory size :    64.000 Kb
----------------


************************************************


As requested, we have our array with its 20 elements, and the empty array. You can see that SampleData stores a one element array in empty arrays, as it is not possible to create a node with a 0d array. That is why the empty attribute is attached to array data items.

We will now try to get our newly added data items as numpy arrays. As for Groups, we can use the get_node method for this. We can now introduce its second argument, the as_numpy option. If this argument is set to True, get_node returns the data item as a numpy.ndarray. If it is set to False (its default value), the method returns a tables.Node object from the Pytables package, identical to the one returned by the add_data_array method when the data item was created.

Let us see some examples:

[28]:
array_node = data.get_node('test_array')
array = data.get_node('test_array', as_numpy=True)
print('array_node returned with "as_numpy=False":\n',array_node,'\n')
print('array returned with "as_numpy=True":\n', array, type(array))
array_node returned with "as_numpy=False":
 /test_group/test_array (CArray(20,)) 'Tarray'

array returned with "as_numpy=True":
 [0.51535012 0.56508785 0.53913262 0.64040211 0.55415212 0.23042938
 0.67191761 0.12123758 0.88332728 0.32097958 0.36914473 0.244528
 0.74187258 0.280634   0.68578653 0.6472718  0.16596613 0.39701755
 0.24730563 0.61975144] <class 'numpy.ndarray'>

What happens if we try to get our empty array?

[29]:
empty_array_node = data.get_node('empty_array')
empty_array = data.get_node('empty_array', as_numpy=True)

print(f'The empty array node {empty_array_node} is not truly empty in the dataset','\n')
print(f'It actually contains ... {empty_array} ... a one element array with value 0')
The empty array node /test_group/empty_array (CArray(1,)) 'emptyA' is not truly empty in the dataset

It actually contains ... [0] ... a one element array with value 0

As mentioned earlier, the empty array is not truly empty.

We have seen that the get_node method can have two different behaviors with data arrays. So what happens if we try to get a data array from our dataset using one of the two generic getter mechanisms explained in section I?

[30]:
array = data['test_array']
print(f'What we got with the dictionary like access is a {type(array)}')

array = data.test_array
print(f'What we got with the attribute like access is a {type(array)}')
What we got with the dictionary like access is a <class 'numpy.ndarray'>
What we got with the attribute like access is a <class 'numpy.ndarray'>

As you can see, the generic mechanisms call the get_node method with the as_numpy=True option.

The last thing we have to discuss about data arrays is how to add actual data to an empty data array item. When calling the add_data_array method with the name/location of an empty array, the method behaves as if it had been called with replace=True. However, in this case, all the metadata that was attached to the empty node is preserved and reattached to the data item created with the input array.

Let us add some metadata to the empty array, to test this feature:

[31]:
metadata = {'tutorial_file':'2_SampleData_basic_data_items.ipynb',
            'tutorial_section':'Section IV'}
data.add_attributes(metadata, 'empty_array')
data.print_node_attributes('empty_array')
 -- empty_array attributes :
         * empty : True
         * node_type : data_array
         * tutorial_file : 2_SampleData_basic_data_items.ipynb
         * tutorial_section : Section IV

Let us create a data array and try to add it to the empty array.

[32]:
A = np.arange(20)
data.add_data_array(location='Gtest', name='empty_array', indexname='emptyA', array=A)

Adding array `empty_array` into Group `Gtest`

Removing  node /test_group/empty_array in content index....


item emptyA : /test_group/empty_array removed from context index dictionary
.... Storing content index in tutorial_dataset.h5:/Index attributes
.... writing xdmf file : tutorial_dataset.xdmf
.... flushing data in file tutorial_dataset.h5
File tutorial_dataset.h5 synchronized with in memory data tree

Node /test_group/empty_array sucessfully removed
[32]:
/test_group/empty_array (CArray(20,)) 'emptyA'
  atom := Int64Atom(shape=(), dflt=0)
  maindim := 0
  flavor := 'numpy'
  byteorder := 'little'
  chunkshape := (8192,)
[33]:
print(data['emptyA'])
data.print_node_attributes('empty_array')
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
 -- empty_array attributes :
         * empty : False
         * node_type : data_array
         * tutorial_file : 2_SampleData_basic_data_items.ipynb
         * tutorial_section : Section IV

As you can see thanks to the verbose mode, the old empty data array node has been removed and replaced with a new one containing our data, together with the metadata previously attached to the empty array. The empty attribute has additionally been updated to account for the data newly stored in the item.

Except for the control of data compression parameters, you now know all there is to know to create and retrieve data arrays with SampleData.

V - String arrays

We now move to another useful type of data item. In many cases, it may be useful to store long lists of strings. Data arrays are restricted to numerical arrays. Attributes are meant to store data of small size, and are thus not suited for it either.

To realize this task, you will need to rely on String arrays, which can be added with the add_string_array method. It is basically a mapping of a Python string list to an HDF5 Pytables data item. Hence, this method has arguments that you now know well: name, location, indexname, replace. In addition, it has a data argument that must be the Python list of strings that you want to store in the dataset.

Let us see an example:

[34]:
List = ['this','is','a','not so long','list','of strings','for','the tutorial !!']
data.add_string_array(name='string_array', location='test_group', indexname='Sarray', data=List)

Adding String Array `string_array` into Group `test_group`
[34]:
/test_group/string_array (EArray(8,)) ''
  atom := StringAtom(itemsize=255, shape=(), dflt=b'')
  maindim := 0
  flavor := 'numpy'
  byteorder := 'irrelevant'
  chunkshape := (257,)

Like the previous ones, this method's verbose mode informs you that the string array has been created, and the method returns the Pytables node object associated with the created data item.

Let us look at the dataset content:

[35]:
data.print_dataset_content(short=False)
Printing dataset content with max depth 3

****** DATA SET CONTENT ******
 -- File: tutorial_dataset.h5
 -- Size:   138.141 Kb
 -- Data Model Class: SampleData

 GROUP /
=====================
 -- Parent Group : /
 -- Group attributes :
         * description :
         * sample_name : test_sample
 -- Childrens : Index, test_group,
----------------
************************************************


 GROUP test_group
=====================
 -- Parent Group : /
 -- Group attributes :
         * description : Just a short example to see how to use descriptions. This is a description.
         * group_type : Group
         * tutorial_file : 2_SampleData_basic_data_items.ipynb
         * tutorial_section : Section II
 -- Childrens : test_array, empty_array, string_array,
----------------
****** Group /test_group CONTENT ******

 NODE: /test_group/empty_array
====================
 -- Parent Group : test_group
 -- Node name : empty_array
 -- empty_array attributes :
         * empty : False
         * node_type : data_array
         * tutorial_file : 2_SampleData_basic_data_items.ipynb
         * tutorial_section : Section IV

 -- content : /test_group/empty_array (CArray(20,)) 'emptyA'
 -- Compression options for node `/test_group/empty_array`:
        complevel=0, shuffle=False, bitshuffle=False, fletcher32=False, least_significant_digit=None
 --- Chunkshape: (8192,)
 -- Node memory size :    64.000 Kb
----------------

 NODE: /test_group/string_array
====================
 -- Parent Group : test_group
 -- Node name : string_array
 -- string_array attributes :
         * empty : False
         * node_type : string_array

 -- content : /test_group/string_array (EArray(8,)) ''
 -- Compression options for node `/test_group/string_array`:
        complevel=0, shuffle=False, bitshuffle=False, fletcher32=False, least_significant_digit=None
 --- Chunkshape: (257,)
 -- Node memory size :    63.999 Kb
----------------

 NODE: /test_group/test_array
====================
 -- Parent Group : test_group
 -- Node name : test_array
 -- test_array attributes :
         * empty : False
         * node_type : data_array
         * tutorial_file : 2_SampleData_basic_data_items.ipynb
         * tutorial_section : Section IV

 -- content : /test_group/test_array (CArray(20,)) 'Tarray'
 -- Compression options for node `/test_group/test_array`:
        complevel=0, shuffle=False, bitshuffle=False, fletcher32=False, least_significant_digit=None
 --- Chunkshape: (8192,)
 -- Node memory size :    64.000 Kb
----------------


************************************************


You can see in the information printed about the string array we just created that it has a node_type attribute, indicating that it is a String Array.

To manipulate a string array, use the ‘get_node’ method to get the array, and then manipulate it as a list of binary strings. Indeed, strings are automatically converted to bytes when creating this type of data item. You will hence need to use the str.decode() method to get the elements of the string_array as UTF-8 or ASCII formatted strings:

[36]:
sarray = data['Sarray']
S1 = sarray[0] # here we get a Python bytes string
S2 = sarray[0].decode('utf-8') # here we get a utf-8 string
print(S1)
print(S2,'\n')

# Let us print all strings contained in the string array:
for string in sarray:
    print(string.decode('utf-8'), end=' ')
b'this'
this

this is a not so long list of strings for the tutorial !!

A particularity of String arrays is that they are enlargeable. To add additional elements to them, you may use the append_string_array method, which takes as arguments the name of the string array and a list of strings:

[37]:
# we add 3 new elements to the array
data.append_string_array(name='Sarray', data=['We can make','it','bigger !'])
# now let us print the enlarged list of strings:
for string in data['Sarray']:
    print(string.decode('utf-8'), end=' ')
this is a not so long list of strings for the tutorial !! We can make it bigger !

And that is all there is to know about String arrays!

VI - Structured arrays

The last data item type that we will review in this tutorial is analogous to the data array item type (section IV), but is meant to store Numpy structured arrays (you are strongly encouraged to consult the Numpy documentation on structured arrays). Those are ndarrays whose datatype is a composition of simpler datatypes organized as a sequence of named fields; in other words, heterogeneous arrays.

The Pytables package, which handles the HDF5 dataset within the SampleData class, uses a class called Table to store structured arrays. This terminology is reused within the SampleData class. Hence, to add a structured array, you may use the add_table method. This method accepts the same arguments as add_data_array, plus a description argument, which may be an instance of the tables.IsDescription class (see here), or a numpy.dtype object. It is an object whose role is to describe the structure of the array (name and type of array Fields). The data argument value must be a numpy.ndarray whose dtype is consistent with the description.

Let us see an example. Imagine that we want to create a structured array to store data describing material particles, containing for each particle its nature, an identity number, its dimensions, and a boolean value indicating the presence of damage at the particle.

To do this, we have to create a suitable numpy.dtype and numpy structured array:

[38]:
# creation of a numpy dtype --> takes as input a list of tuples ('field_name', 'field_type')
# Numpy dtype reminder: S25: binary strings with 25 characters, ?: boolean values
#
sample_type = np.dtype([('Nature','S25'), ('Id_number',np.int16), ('Dimensions',np.double,(3,)), ('Damaged','?')])

# now we create an empty array of 2 elements of this type
sample_array = np.empty(shape=(2,), dtype=sample_type)

# now we create data to represent 2 samples
sample_array['Nature'] = ['Intermetallic', 'Carbide']
sample_array['Id_number'] = [1,2]
sample_array['Dimensions'] = [[20,20,50],[2,2,3]]
sample_array['Damaged'] = [True,False]

print(sample_array)
[(b'Intermetallic', 1, [20., 20., 50.],  True)
 (b'Carbide', 2, [ 2.,  2.,  3.], False)]

Now that we have our numpy.dtype, and our structured array, we can create the table:

[39]:
# create the structured array data item
tab = data.add_table(name='test_table', location='test_group', indexname='tableT', description=sample_type,
                     data=sample_array)

# adding one attribute to the table
data.add_attributes({'tutorial_section':'VI'},'tableT')

# printing information on the table
data.print_node_info('tableT')

Adding table `test_table` into Group `test_group`

-- Compression Options for dataset test_table

 NODE: /test_group/test_table
====================
 -- Parent Group : test_group
 -- Node name : test_table
 -- test_table attributes :
         * node_type : structured_array
         * tutorial_section : VI

 -- content : /test_group/test_table (Table(2,)) ''
 -- table description :
{
  "Nature": StringCol(itemsize=25, shape=(), dflt=b'', pos=0),
  "Id_number": Int16Col(shape=(), dflt=0, pos=1),
  "Dimensions": Float64Col(shape=(3,), dflt=0.0, pos=2),
  "Damaged": BoolCol(shape=(), dflt=False, pos=3)}
 -- Compression options for node `tableT`:
        complevel=0, shuffle=False, bitshuffle=False, fletcher32=False, least_significant_digit=None
 --- Chunkshape: (1260,)
 -- Node memory size :    63.984 Kb
----------------


You see above that the print_node_info method prints the description of the structured array stored in the dataset, which lets you know what its fields and associated data types are. You can observe, however, that these fields and their types are specific objects from the Pytables package: column objects (StringCol, Int16Col…). You can also see here a node_type attribute indicating that the node is a Structured Array data item.

As for the previously studied data items, the method returned the equivalent Pytables Node object. Interestingly, this table object has a description attribute and a numpy.dtype attribute:

[40]:
print(tab.description)
print(tab.dtype)
Description([('Nature', '()S25'), ('Id_number', '()i2'), ('Dimensions', '(3,)f8'), ('Damaged', '()b1')])
[('Nature', 'S25'), ('Id_number', '<i2'), ('Dimensions', '<f8', (3,)), ('Damaged', '?')]

Once the table is created, it is still possible to expand it in two ways: 1. adding new rows 2. adding new columns

To add new rows to the table, you will need to create a numpy.ndarray that is compatible with the table, i.e. meeting these 2 criteria: 1. having a dtype compatible with the table description (same fields associated with the same types) 2. having a compatible shape (all dimensions except the last must have identical shapes)
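These two criteria can be sketched with plain NumPy checks (an illustration only, not part of the SampleData API; the field names below mirror our example):

```python
import numpy as np

# dtype of the stored table (same as in our example)
table_dtype = np.dtype([('Nature', 'S25'), ('Id_number', np.int16),
                        ('Dimensions', np.double, (3,)), ('Damaged', '?')])
# candidate rows to append
new_rows = np.zeros(shape=(2,), dtype=table_dtype)

# criterion 1: the dtypes must match field by field
assert new_rows.dtype == table_dtype
# criterion 2: the rows form a 1D array of records, so only the number
# of rows (the last/only dimension) may differ from the table's
assert new_rows.ndim == 1
```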

Let us try to add two new sample rows to our table:

[41]:
sample_array['Nature'] = ['Intermetallic', 'Carbide']
sample_array['Id_number'] = [3,4]
sample_array['Dimensions'] = [[50,20,30],[3,2,3]]
sample_array['Damaged'] = [True,True]
[42]:
data.append_table(name='tableT', data=sample_array)

We should now look at our dataset to see if the data has been appended to our structured array. Let us see what happens if we use a generic getter mechanism on our structured array:

[43]:
print(type(data['tableT']),'\n')
print(data['tableT'].dtype, '\n')
print(data['tableT'],'\n')
# you can also directly get table columns:
print('Damaged particle ?',data['tableT']['Damaged'],'\n')
print('Particles Nature:',data['tableT']['Nature'])
<class 'numpy.ndarray'>

[('Nature', 'S25'), ('Id_number', '<i2'), ('Dimensions', '<f8', (3,)), ('Damaged', '?')]

[(b'Intermetallic', 1, [20., 20., 50.],  True)
 (b'Carbide', 2, [ 2.,  2.,  3.], False)
 (b'Intermetallic', 3, [50., 20., 30.],  True)
 (b'Carbide', 4, [ 3.,  2.,  3.],  True)]

Damaged particle ? [ True False  True  True]

Particles Nature: [b'Intermetallic' b'Carbide' b'Intermetallic' b'Carbide']

As for the data array item type, the generic mechanisms here return the data item as a numpy.ndarray with the dtype of the structured table. We can indeed read the right number of rows, and the right values. Note that strings are necessarily stored as bytes in the dataset, so you must decode them to print or use them in standard string format:

[44]:
print(data['tableT']['Nature'][0].decode('ascii'))
Intermetallic
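If you need to decode a whole column at once, NumPy can do it in a vectorized way (a plain NumPy aside, not a SampleData feature):

```python
import numpy as np

# a bytes column, as returned when getting a table field
natures = np.array([b'Intermetallic', b'Carbide'])
# casting from 'S' (bytes) to 'U' (unicode) decodes every element at once
decoded = natures.astype('U25')
print(decoded)  # ['Intermetallic' 'Carbide']
```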

We will now see how to add new columns to the table, using the add_tablecols method. It allows you to add a structured numpy.ndarray as additional columns to an already existing table. It takes 3 arguments: tablename (Name, Path, Indexname or Alias of the table to which you want to add columns), description and data. As for the add_table method, the data argument dtype must be consistent with the description argument.

Let us add two columns to our structured array, to store for instance the particle position and chemical composition:

[45]:
# we create a new dtype with the new fields
cols_dtype = np.dtype([('Position',np.double,(3,)), ('Composition','S25')])

# now we create an empty array of 4 elements of this type
new_cols = np.empty(shape=(4,), dtype=cols_dtype)

# now we create data to fill the new columns
new_cols['Position'] = [[100.,150.,300],[10,25,10],[520,300,450],[56,12,45]]
new_cols['Composition'] = ['Cr3Si','Fe3C','MgZn2','SiC']
[46]:
data.add_tablecols(tablename='tableT', description=cols_dtype, data=new_cols)

Updating `tableT` with fields {'Nature': (dtype('S25'), 0), 'Id_number': (dtype('int16'), 25), 'Dimensions': (dtype(('<f8', (3,))), 27), 'Damaged': (dtype('bool'), 51), 'Position': (dtype(('<f8', (3,))), 52), 'Composition': (dtype('S25'), 76)}

New table description is `{'Nature': (dtype('S25'), 0), 'Id_number': (dtype('int16'), 25), 'Dimensions': (dtype(('<f8', (3,))), 27), 'Damaged': (dtype('bool'), 51), 'Position': (dtype(('<f8', (3,))), 52), 'Composition': (dtype('S25'), 76)}`

data is: [(b'0.0', 0, [0., 0., 0.], False, [0., 0., 0.], b'0.0')
 (b'0.0', 0, [0., 0., 0.], False, [0., 0., 0.], b'0.0')
 (b'0.0', 0, [0., 0., 0.], False, [0., 0., 0.], b'0.0')
 (b'0.0', 0, [0., 0., 0.], False, [0., 0., 0.], b'0.0')]

(get_tablecol) Getting column Nature from : tutorial_dataset.h5:/test_group/test_table

(get_tablecol) Getting column Id_number from : tutorial_dataset.h5:/test_group/test_table

(get_tablecol) Getting column Dimensions from : tutorial_dataset.h5:/test_group/test_table

(get_tablecol) Getting column Damaged from : tutorial_dataset.h5:/test_group/test_table

Removing  node /test_group/test_table in content index....


item tableT : /test_group/test_table removed from context index dictionary
.... Storing content index in tutorial_dataset.h5:/Index attributes
.... writing xdmf file : tutorial_dataset.xdmf
.... flushing data in file tutorial_dataset.h5
File tutorial_dataset.h5 synchronized with in memory data tree

Node test_table sucessfully removed

Adding table `test_table` into Group `/test_group`

-- Compression Options for dataset test_table

(get_tablecol) Getting column Position from : tutorial_dataset.h5:/test_group/test_table

(get_tablecol) Getting column Composition from : tutorial_dataset.h5:/test_group/test_table
[47]:
data.print_node_info('tableT')
print(data['tableT'],'\n')

 NODE: /test_group/test_table
====================
 -- Parent Group : test_group
 -- Node name : test_table
 -- test_table attributes :
         * node_type : structured_array
         * tutorial_section : VI

 -- content : /test_group/test_table (Table(4,)) ''
 -- table description :
{
  "Nature": StringCol(itemsize=25, shape=(), dflt=b'', pos=0),
  "Id_number": Int16Col(shape=(), dflt=0, pos=1),
  "Dimensions": Float64Col(shape=(3,), dflt=0.0, pos=2),
  "Damaged": BoolCol(shape=(), dflt=False, pos=3),
  "Position": Float64Col(shape=(3,), dflt=0.0, pos=4),
  "Composition": StringCol(itemsize=25, shape=(), dflt=b'', pos=5)}
 -- Compression options for node `tableT`:
        complevel=0, shuffle=False, bitshuffle=False, fletcher32=False, least_significant_digit=None
 --- Chunkshape: (1260,)
 -- Node memory size :   124.277 Kb
----------------


[(b'Intermetallic', 1, [20., 20., 50.],  True, [100., 150., 300.], b'Cr3Si')
 (b'Carbide', 2, [ 2.,  2.,  3.], False, [ 10.,  25.,  10.], b'Fe3C')
 (b'Intermetallic', 3, [50., 20., 30.],  True, [520., 300., 450.], b'MgZn2')
 (b'Carbide', 4, [ 3.,  2.,  3.],  True, [ 56.,  12.,  45.], b'SiC')]

As you can see from the verbose mode prints, when adding new columns to a table, the SampleData class gets the data from the original table, and creates a new table including the additional columns. From the print_node_info output, you can verify that in the process, the metadata attached to the original table has been preserved. You can also observe that the table description has been correctly enriched with the Position and Composition fields.

Note that if you provide an additional columns array that does not match the shape of the stored table, you will get a mismatch error.
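Conceptually, this column addition amounts to a field-wise merge of two structured arrays of matching length. In plain NumPy, such a merge can be sketched with numpy.lib.recfunctions (an illustration, not the SampleData implementation):

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# a 2-row table and a 2-row array of extra columns (row counts must match)
table = np.array([(1,), (2,)], dtype=[('Id_number', np.int16)])
cols = np.array([(True,), (False,)], dtype=[('Damaged', '?')])

# merge the two structured arrays into one with the combined fields
merged = rfn.merge_arrays((table, cols), flatten=True)
print(merged.dtype.names)  # ('Id_number', 'Damaged')
```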

To conclude this section on structured arrays, we will see a method allowing you to set the values of a full column of a stored structured array: the set_tablecol method. You have to pass as arguments the name of the table, the name of the column field you want to modify, and a numpy array setting the new values of the column. Of course, the array type must be consistent with the type of the modified column.

To see an example, let us set all the values of the 'Damaged' column of our table to True:

[48]:
data.set_tablecol(tablename='tableT', colname='Damaged', column=np.array([True,True,True,True]))
print('\n',data['tableT'],'\n')
print(data['tableT']['Damaged'],'\n')

(get_tablecol) Getting column Damaged from : tutorial_dataset.h5:/test_group/test_table

 [(b'Intermetallic', 1, [20., 20., 50.],  True, [100., 150., 300.], b'Cr3Si')
 (b'Carbide', 2, [ 2.,  2.,  3.],  True, [ 10.,  25.,  10.], b'Fe3C')
 (b'Intermetallic', 3, [50., 20., 30.],  True, [520., 300., 450.], b'MgZn2')
 (b'Carbide', 4, [ 3.,  2.,  3.],  True, [ 56.,  12.,  45.], b'SiC')]

[ True  True  True  True]

You have now learned how to create and get values from all basic data item types that can be stored into SampleData datasets.

Before closing this tutorial, we will see how to remove data items from datasets.

VII - Removing data items from datasets

Removing data items is very easy: you just have to call the remove_node method and provide the name of the data item you want to remove. When removing non-empty Groups from the dataset, the optional recursive argument should be set to True, to allow the method to remove the group and all of its children. Otherwise, the method will not remove Groups that have children.

Let us try to remove our test_array from the dataset:

[49]:
data.remove_node('test_array')
data.print_dataset_content()

Removing  node /test_group/test_array in content index....


item Tarray : /test_group/test_array removed from context index dictionary
.... Storing content index in tutorial_dataset.h5:/Index attributes
.... writing xdmf file : tutorial_dataset.xdmf
.... flushing data in file tutorial_dataset.h5
File tutorial_dataset.h5 synchronized with in memory data tree

Node test_array sucessfully removed
Printing dataset content with max depth 3

****** DATA SET CONTENT ******
 -- File: tutorial_dataset.h5
 -- Size:   398.229 Kb
 -- Data Model Class: SampleData

 GROUP /
=====================
 -- Parent Group : /
 -- Group attributes :
         * description :
         * sample_name : test_sample
 -- Childrens : Index, test_group,
----------------
************************************************


 GROUP test_group
=====================
 -- Parent Group : /
 -- Group attributes :
         * description : Just a short example to see how to use descriptions. This is a description.
         * group_type : Group
         * tutorial_file : 2_SampleData_basic_data_items.ipynb
         * tutorial_section : Section II
 -- Childrens : empty_array, string_array, test_table,
----------------
****** Group /test_group CONTENT ******

 NODE: /test_group/empty_array
====================
 -- Parent Group : test_group
 -- Node name : empty_array
 -- empty_array attributes :
         * empty : False
         * node_type : data_array
         * tutorial_file : 2_SampleData_basic_data_items.ipynb
         * tutorial_section : Section IV

 -- content : /test_group/empty_array (CArray(20,)) 'emptyA'
 -- Compression options for node `/test_group/empty_array`:
        complevel=0, shuffle=False, bitshuffle=False, fletcher32=False, least_significant_digit=None
 --- Chunkshape: (8192,)
 -- Node memory size :    64.000 Kb
----------------

 NODE: /test_group/string_array
====================
 -- Parent Group : test_group
 -- Node name : string_array
 -- string_array attributes :
         * empty : False
         * node_type : string_array

 -- content : /test_group/string_array (EArray(11,)) ''
 -- Compression options for node `/test_group/string_array`:
        complevel=0, shuffle=False, bitshuffle=False, fletcher32=False, least_significant_digit=None
 --- Chunkshape: (257,)
 -- Node memory size :    63.999 Kb
----------------

 NODE: /test_group/test_table
====================
 -- Parent Group : test_group
 -- Node name : test_table
 -- test_table attributes :
         * node_type : structured_array
         * tutorial_section : VI

 -- content : /test_group/test_table (Table(4,)) ''
 -- table description :
{
  "Nature": StringCol(itemsize=25, shape=(), dflt=b'', pos=0),
  "Id_number": Int16Col(shape=(), dflt=0, pos=1),
  "Dimensions": Float64Col(shape=(3,), dflt=0.0, pos=2),
  "Damaged": BoolCol(shape=(), dflt=False, pos=3),
  "Position": Float64Col(shape=(3,), dflt=0.0, pos=4),
  "Composition": StringCol(itemsize=25, shape=(), dflt=b'', pos=5)}
 -- Compression options for node `/test_group/test_table`:
        complevel=0, shuffle=False, bitshuffle=False, fletcher32=False, least_significant_digit=None
 --- Chunkshape: (1260,)
 -- Node memory size :   124.277 Kb
----------------


************************************************


The array data item has indeed been removed from the dataset. Let us now try to remove our test group and all of its children. We should end up with an empty dataset at the end:

[50]:
data.remove_node('test_group', recursive=True)
data.print_dataset_content()

Removing  node /test_group in content index....

Removing  node /test_group/empty_array in content index....


item emptyA : /test_group/empty_array removed from context index dictionary
.... Storing content index in tutorial_dataset.h5:/Index attributes
.... writing xdmf file : tutorial_dataset.xdmf
.... flushing data in file tutorial_dataset.h5
File tutorial_dataset.h5 synchronized with in memory data tree

Node <closed tables.carray.CArray at 0x7f60c5d95d70> sucessfully removed

Removing  node /test_group/string_array in content index....


item Sarray : /test_group/string_array removed from context index dictionary
.... Storing content index in tutorial_dataset.h5:/Index attributes
.... writing xdmf file : tutorial_dataset.xdmf
.... flushing data in file tutorial_dataset.h5
File tutorial_dataset.h5 synchronized with in memory data tree

Node <closed tables.earray.EArray at 0x7f60c5d95cd0> sucessfully removed

Removing  node /test_group/test_table in content index....


item tableT : /test_group/test_table removed from context index dictionary
.... Storing content index in tutorial_dataset.h5:/Index attributes
.... writing xdmf file : tutorial_dataset.xdmf
.... flushing data in file tutorial_dataset.h5
File tutorial_dataset.h5 synchronized with in memory data tree

Node <closed tables.table.Table at 0x7f60c5d33440> sucessfully removed

item Gtest : /test_group removed from context index dictionary
.... Storing content index in tutorial_dataset.h5:/Index attributes
.... writing xdmf file : tutorial_dataset.xdmf
.... flushing data in file tutorial_dataset.h5
File tutorial_dataset.h5 synchronized with in memory data tree

Node test_group sucessfully removed
Printing dataset content with max depth 3

****** DATA SET CONTENT ******
 -- File: tutorial_dataset.h5
 -- Size:   271.905 Kb
 -- Data Model Class: SampleData

 GROUP /
=====================
 -- Parent Group : /
 -- Group attributes :
         * description :
         * sample_name : test_sample
 -- Childrens : Index,
----------------
************************************************


You can see from the verbose mode output that the method has indeed removed the group and all of its children. You may also remove node attributes using the remove_attribute and remove_attributes methods.

That’s it! You now know how to remove data items from your datasets. This tutorial is finished; we can now close our test dataset.

[51]:
del data

Deleting DataSample object
.... Storing content index in tutorial_dataset.h5:/Index attributes
.... writing xdmf file : tutorial_dataset.xdmf
.... flushing data in file tutorial_dataset.h5
File tutorial_dataset.h5 synchronized with in memory data tree

Dataset and Datafiles closed
SampleData Autodelete:
 Removing hdf5 file tutorial_dataset.h5 and xdmf file tutorial_dataset.xdmf