{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "38aa16f3",
   "metadata": {},
   "source": [
    "# Create Predefined Custom Data Models for your datasets"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "09512322",
   "metadata": {},
   "source": [
    "This tutorial will teach you how to defined derived classes from SampleData, in order to create datasets with an automatically generated data model that is tailored to a specific need. "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e384dcab",
   "metadata": {},
   "source": [
    "## I - SampleData derived classes "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3cce50e7",
   "metadata": {},
   "source": [
    "The `SampleData` class allows to create and interact with complex HDF5 datasets. New datasets are created empty, and can be constructed freely according to the needs of the user. When using the class to work with many datasets that should share the same type of internal organization and content, users will have to rebuild this internal data model for each new dataset. In addition, in order to defined scripts or classes that aim at batch processing some data items that are found in each of these datasest, they will have to make sure that the these item names and/or pathes are identical in all datasets.   \n",
    "\n",
    "These considerations show that the automatic generation of a non-empty and specific *data model* would be a usefull addition to the features of `SampleData`. For that purpose, the class implements two simple mechanisms through class inheritance, that are the subject of the present tutorial."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "112c09d4",
   "metadata": {},
   "source": [
    "### Custom Data Model"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "769b0711",
   "metadata": {},
   "source": [
    "The `SampleData` class defines a minimal data model for all the datasets that structures all created datasets. This data model is an organized collection of data item *indexnames, pathes* and *types*, provided via two dictionaries, that are:\n",
    "\n",
    "1. `minimal_content_index_dic`: the path of each data item in the data model\n",
    "2. `minimal_content_type_dic`: the type of each data item in the data model"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b384a8a6",
   "metadata": {},
   "source": [
    "#### The content index dictionary"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "abc4ccbf",
   "metadata": {},
   "source": [
    "Each item of this dictionary defines a data item of the data model. Its key will be the *indexname* given to the data item in the dataset, and the item value must be a string giving a valid path for the data item in the dataset. When a dataset is created, the class will automatically create a data item for each key of this dictionary, and set its path in the dataset with the associated value in the dictionary. \n",
    "\n",
    "For the `SampleData` class, this dictionary is empty, no data model is prescribed. Hence, datasets that are created with `SampleData` are empty (they just hase a Root Group, as explained in a previous [tutorial](./Datasets_Files.ipynb). To create datasets with a prescribed data model, the idea is to implement a class that is derived from `SampleData`,  with a non-empty `minimal_content_index_dic`, that implements the desired data model.\n",
    "\n",
    "This dictionary should hence look like this:\n",
    "\n",
    "```python\n",
    "       minimal_content_index_dic = {'item1': '/path_to_item1',\n",
    "                                    'item2': '/path_to_item1/path_to_item2',\n",
    "                                    'item3': '/path_to_item3',\n",
    "                                    'item4': '/path_to_item1/path_to_item4',\n",
    "                                     '...': '...',}\n",
    "```\n",
    "\n",
    "An item of the form `'wrongitem': '/undeclared_item/path_to_wrong_item'` would have been a non valid path.\n",
    "\n",
    "The dictionary example just above would lead to the creation of at least 4 data items, with names `item1`, `item2`, `item3` and `item4`, with items 1 and 3 being directly attached to the dataset *Root Group*, and the items 2 and 4 being childrens of item 1. "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1463f960",
   "metadata": {},
   "source": [
    "#### The content type dictionary"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "35fa03c3",
   "metadata": {},
   "source": [
    "The second dictionary must have the same keys as the `minimal_content_index_dic`. **Its values must be valid *SampleData* data item types**. The type of data item that are automatically created at the dataset creation with the names and pathes specified by `minimal_content_index_dic`, are prescribed by the `minimal_content_type_dic`  \n",
    "\n",
    "Possible values and associated data types are (see previous tutorials for description of these data types):\n",
    "\n",
    "* `Group`: creates a HDF5 group data item\n",
    "* `2DImage`, `3DImage`, or `Image`: creates an empty Image group\n",
    "* `2DMesh`, `3DMesh`, `Mesh`: creates an empty Mesh group\n",
    "* `data_array`: creates an empty Data Array\n",
    "* `field_array`: creates an empty Field Array (its path must be a children of a an Image or Mesh group)\n",
    "* `string_array`: creates an empty String Array \n",
    "* a `numpy.dtype` or a `tables.IsDescription` class ([see here](https://www.pytables.org/usersguide/libref/declarative_classes.html#the-isdescription-class) and [the tutorial on basic data items](./Data_Items.ipynb)):"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "abb029ab",
   "metadata": {},
   "source": [
    "This dictionary should share the same keys as the `minimal_content_index` dictionary, and should look like this:\n",
    "\n",
    "```python\n",
    "       minimal_content_type_dic = {'item1': '3DMesh',\n",
    "                                   'item2': 'field_array',\n",
    "                                   'item3': 'data_array',\n",
    "                                   'item4':  array_np.dtype,\n",
    "                                   '...': '...',}\n",
    "```\n",
    "\n",
    "In this case, the first item would be created as a *Mesh Group*, the second will be created as a field data item stored in this mesh, the third as a data array attached to the *Root Group*, and the last as a *Structured Table* attached to the Mesh Group."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "822fa77b",
   "metadata": {},
   "source": [
    "*****\n",
    "These two dictionaries are returned by the `minimal_data_model` method of the `SampleData` class. They are used during the dataset object initialization, to create the prescribed data model, and populate it with empty objects, with the right names and organization. This allows to prepend a set of names and pathes that form a particular data model that all objects created by the class should have. \n",
    "\n",
    "To create a dataset class with the above data model, its implementation should thus look like this at this stage:"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c5f3104d-4d60-4eb1-b4c5-12657b4f6833",
   "metadata": {},
   "source": [
    "```python\n",
    "class MyDatasets(SampleData):\n",
    "    \"\"\"Example of SampleData derived class.\n",
    "    \n",
    "       This is how to implement a class of datasets with a custom data model.\n",
    "    \"\"\"\n",
    "    \n",
    "    def minimal_data_model(self):\n",
    "        \n",
    "        minimal_content_index_dic = {'item1': '/path_to_item1',\n",
    "                                    'item2': '/path_to_item1/path_to_item2',\n",
    "                                    'item3': '/path_to_item3',\n",
    "                                    'item4': '/path_to_item1/path_to_item4'}\n",
    "        minimal_content_type_dic = {'item1': '3DMesh',\n",
    "                                    'item2': 'field_array',\n",
    "                                    'item3': 'data_array',\n",
    "                                    'item4':  array_np.dtype}\n",
    "        \n",
    "        return minimal_content_index_dic, minimal_content_type_dic\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "549614bc",
   "metadata": {},
   "source": [
    "This dictionaries are labeled as **minimal data model**, as they only prescribe the data items and organization that will be generated in each created dataset of the subclass. The user is free to enrich the datasets with any additional data item (see previous tutorial to learn how to do it).\n",
    "\n",
    "To sum up, creating a interface to create and interact with datasets with a prescribed data model, you have to:\n",
    "\n",
    "1. Implement a new class, inherited from SampleData\n",
    "2. Override the `minimal_data_model` method and write your data model in the two dictionaries returned by the class "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fd2f39c3",
   "metadata": {},
   "source": [
    "You will then get a class derived from *SampleData* (hence with all its methods and features), that creates datasets with this prescribed data model."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b139203f",
   "metadata": {},
   "source": [
    "### Custom initialization"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a722e521",
   "metadata": {},
   "source": [
    "The other mechanisms that is important to design subclasses of *SampleData*, is the specification of initialization commands that are runed each time at dataset opening. These operations can include, for instance, the definition of class attributes, prints, sanity checks etc..... The `_after_file_open` method of the `SampleData` class has been designed to this end. It is called by the class constructor after opening the HDF5 dataset and loading the dataset Index and data tree in the class instance. \n",
    "\n",
    "To create your custom dataset initialization routine, you can hence override the `_after_file_open` method in your derived class, and implement your initialization procedure. For instance, if you want your class to warn the user that a data item is empty in the dataset, you could implement your class as follows:\n",
    "\n",
    "```python\n",
    "class MyDatasets(SampleData):\n",
    "    \"\"\"Example of SampleData derived class.\n",
    "    \n",
    "       This is how to implement a class of datasets with a custom data model\n",
    "       and initialization procedure.\n",
    "    \"\"\"\n",
    "    \n",
    "    def minimal_data_model(self):\n",
    "        \"\"\"Define data model of MyDatasets class.\"\"\"\n",
    "        minimal_content_index_dic = {'item1': '/path_to_item1',\n",
    "                                    'item2': '/path_to_item1/path_to_item2',\n",
    "                                    'item3': '/path_to_item3',\n",
    "                                    'item4': '/path_to_item1/path_to_item4'}\n",
    "        minimal_content_type_dic = {'item1': '3DMesh',\n",
    "                                    'item2': 'field_array',\n",
    "                                    'item3': 'data_array',\n",
    "                                    'item4':  array_np.dtype}\n",
    "        \n",
    "        return minimal_content_index_dic, minimal_content_type_dic\n",
    "        \n",
    "    def _after_file_open(self):\n",
    "        \"\"\"Initialization procedure for MyDatasets.\"\"\"\n",
    "        \n",
    "        if self._is_empty('item3'):\n",
    "            print('Warning: data array \"item3\" is empty in the dataset !')\n",
    "        else:\n",
    "            print('\"item3\" is not empty !')\n",
    "        return\n",
    "        \n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0d2cf6b4",
   "metadata": {},
   "source": [
    "## II - A practical example : The Microstructure Class"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fabbdf77",
   "metadata": {},
   "source": [
    "The `Microstructure` class has been designed to build datasets representing polycrystalline material samples. The `Microstructure` class also offers many application specific methods to interact with polycrystalline materials datasets, that are detailed in dedicated pages of this User's guide. "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "afee4ee9",
   "metadata": {},
   "source": [
    "Following the principles detailed in the previous section, the `Microstructure` class is implemented as a subclass of the `SampleData` class:\n",
    "```python\n",
    "class Microstructure(SampleData):\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "270a469a",
   "metadata": {},
   "source": [
    "Let us review its prescribed data model and initialization procedure to use it as a practical example of custom data model creation. "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cf7f8848",
   "metadata": {},
   "source": [
    "### Class data model"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f08c5e5f",
   "metadata": {},
   "source": [
    "The code of the `minimal_data_model` method of the *Microstructure* class is replicated below:\n",
    "\n",
    "```python\n",
    "    def minimal_data_model(self):\n",
    "        \"\"\"Data model for a polycrystalline microstructure.\n",
    "\n",
    "        Specify the minimal contents of the hdf5 (Group names, paths and group\n",
    "        types) in the form of a dictionary {content: location}. This extends\n",
    "        `~pymicro.core.SampleData.minimal_data_model` method.\n",
    "\n",
    "        :return: a tuple containing the two dictionnaries.\n",
    "        \"\"\"\n",
    "        minimal_content_index_dic = {'Image_data': '/CellData',\n",
    "                                     'grain_map': '/CellData/grain_map',\n",
    "                                     'phase_map': '/CellData/phase_map',\n",
    "                                     'mask': '/CellData/mask',\n",
    "                                     'Mesh_data': '/MeshData',\n",
    "                                     'Grain_data': '/GrainData',\n",
    "                                     'GrainDataTable': ('/GrainData/'\n",
    "                                                        'GrainDataTable'),\n",
    "                                     'Phase_data': '/PhaseData'}\n",
    "        minimal_content_type_dic = {'Image_data': '3DImage',\n",
    "                                    'grain_map': 'field_array',\n",
    "                                    'phase_map': 'field_array',\n",
    "                                    'mask': 'field_array',\n",
    "                                    'Mesh_data': 'Mesh',\n",
    "                                    'Grain_data': 'Group',\n",
    "                                    'GrainDataTable': GrainData,\n",
    "                                    'Phase_data': 'Group'}\n",
    "        return minimal_content_index_dic, minimal_content_type_dic\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "99e5ef76",
   "metadata": {},
   "source": [
    "### Datasets initialization"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8c8e8b0b",
   "metadata": {},
   "source": [
    "The `_after_file_open` method of the `Microstructure` is composed of the following lines of code:\n",
    "```python\n",
    "    def _after_file_open(self):\n",
    "        \"\"\"Initialization code to run after opening a Sample Data file.\"\"\"\n",
    "        self.grains = self.get_node('GrainDataTable')\n",
    "        if self._file_exist:\n",
    "            self.active_grain_map = self.get_attribute('active_grain_map',\n",
    "                                                       'CellData')\n",
    "            if self.active_grain_map is None:\n",
    "                self.set_active_grain_map()\n",
    "            self._init_phase(phase)\n",
    "            if not hasattr(self, 'active_phase_id'):\n",
    "                self.active_phase_id = 1\n",
    "        else:\n",
    "            self.set_active_grain_map()\n",
    "            self._init_phase(phase)\n",
    "            self.active_phase_id = 1\n",
    "        return\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "238203eb-e8e4-4d7c-b63a-2171eddb17b7",
   "metadata": {},
   "source": [
    "When opening a dataset, a class attribute `grains` is associated with the *Structured Array* node `GrainDataTable`. This `grains` attribute is used by many of the class methods. Hence, the `_after_file_open` method is used here to ensure that this attribute is properly associated to the *GrainDataTable* data item, for each opening of the dataset. The class initialization also executes the `_init_phase` and `set_active_grain_map` methods, that serve a similar purpose for other data items. "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "61509282-59b5-4751-a461-89502b06e53c",
   "metadata": {},
   "source": [
    "### Creating a Microstructure dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0d315e86-b6a5-4527-8741-b4c99f6256e9",
   "metadata": {},
   "source": [
    "To conclude this tutorial, we will create a Microstructure object and look at its content. The class constructor arguments are similar to those of the `SampleData` class."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "5244220c-2e75-4c9d-9a7d-f445e1f42c28",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Adding empty field /CellData/grain_map to mesh group /CellData\n",
      "Adding empty field /CellData/phase_map to mesh group /CellData\n",
      "Adding empty field /CellData/mask to mesh group /CellData\n",
      "new phase added: unknown\n",
      "Microstructure\n",
      "* name: micro\n",
      "* lattice: Lattice (Symmetry.cubic) a=1.000, b=1.000, c=1.000 alpha=90.0, beta=90.0, gamma=90.0\n",
      "\n",
      "Dataset Content Index :\n",
      "------------------------:\n",
      "index printed with max depth `3` and under local root `/`\n",
      "\n",
      "\t Name : Image_data                                H5_Path : /CellData \t\n",
      "\t Name : Mesh_data                                 H5_Path : /MeshData \t\n",
      "\t Name : Grain_data                                H5_Path : /GrainData \t\n",
      "\t Name : Phase_data                                H5_Path : /PhaseData \t\n",
      "\t Name : grain_map                                 H5_Path : /CellData/grain_map \t\n",
      "\t Name : Image_data_Field_index                    H5_Path : /CellData/Field_index \t\n",
      "\t Name : phase_map                                 H5_Path : /CellData/phase_map \t\n",
      "\t Name : mask                                      H5_Path : /CellData/mask \t\n",
      "\t Name : GrainDataTable                            H5_Path : /GrainData/GrainDataTable \t\n",
      "\t Name : phase_01                                  H5_Path : /PhaseData/phase_01 \t\n",
      "\n",
      "Printing dataset content with max depth 3\n",
      "  |--GROUP CellData: /CellData (emptyImage) \n",
      "     --NODE Field_index: /CellData/Field_index (string_array - empty) (   63.999 Kb)\n",
      "     --NODE grain_map: /CellData/grain_map (field_array - empty) (   64.000 Kb)\n",
      "     --NODE mask: /CellData/mask (field_array - empty) (   64.000 Kb)\n",
      "     --NODE phase_map: /CellData/phase_map (field_array - empty) (   64.000 Kb)\n",
      "\n",
      "  |--GROUP GrainData: /GrainData (Group) \n",
      "     --NODE GrainDataTable: /GrainData/GrainDataTable (structured_array - empty) (    0.000 bytes)\n",
      "\n",
      "  |--GROUP MeshData: /MeshData (emptyMesh) \n",
      "  |--GROUP PhaseData: /PhaseData (Group) \n",
      "    |--GROUP phase_01: /PhaseData/phase_01 (Group) \n",
      "\n",
      "\n",
      "The Grain object has been initialized:\n",
      "/GrainData/GrainDataTable (Table(0,)) ''\n",
      "Microstructure Autodelete: \n",
      " Removing hdf5 file test_microstructure.h5\n"
     ]
    }
   ],
   "source": [
    "# import SampleData class\n",
    "from pymicro.crystal.microstructure import Microstructure \n",
    "\n",
    "# create a microstructure dataset\n",
    "micro = Microstructure(filename='test_microstructure', autodelete=True)\n",
    "\n",
    "# print the content of the microstructure dataset\n",
    "print(micro)\n",
    "\n",
    "# print class attributes that are initialized by the _after_file_open method\n",
    "print('The Grain object has been initialized:')\n",
    "print(micro.grains)\n",
    "\n",
    "# close the dataset\n",
    "del micro"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "29d4993c-4bfe-4450-97eb-0fb7ee8a7f6f",
   "metadata": {},
   "source": [
    "The dataset has indeed been created with a content that conforms to the data model prescribed by the `minimal_data_model` method of the `Microstructure` class. Each of these items corresponds to data used systematically to study a polycrystalline material sample. In this context, the implementation of the data model serves the following purposes:\n",
    "\n",
    "* it can be used as a standard data model for polycrystalline data sets, thus promoting data exchange and interoperability\n",
    "* it allows to implement a high level interface with reduced complexity to interact with these data items \n",
    "\n",
    "The interface is provided by the `Microstructure` class, that allows to perform data processings that are frequently used in material science on polycrystalline datasets. In addition, the pre-existing data model facilitates the implementation of new processing functionalities within the class. This is illustrated here by the `grains` class attribute, that has been associated to the `GrainDataTable` data item in the dataset, as shown by the printed information above. This attribute an accessible and explicit object to get information and apply processing on the data describing the grains of the microstructure represented by the dataset. "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "29252f72-a23f-434f-beb6-d9e284eab3e6",
   "metadata": {},
   "source": [
    "This conclude this short tutorial on creating custom data models with `SampleData`. The `Microstructure` class features and use is detailed in a dedicated part of this User Guide."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "env_dev",
   "language": "python",
   "name": "env_dev"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.17"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}