Databases ========= Introduction ------------ The python library integrates pre-defined modules for several well-known database used in the deep learning community, such as MNIST, GTSRB, CIFAR10 and so on. That way, no extra step is necessary to be able to directly build a network and learn it on these database. The library allow you to add pre-process data with built in Transformation. Database -------- The python library provide you with multiple object to manipulate common database. Loading hand made database can be done using :py:class:`n2d2.database.DIR`. Like in the following example : .. testcode:: # Creating the database object db = n2d2.database.DIR() provider = n2d2.provider.DataProvider(db, data_dims) # The zeroes represent the depth to seek the data. db.load(data_path, 0, label_path, 0) # With this line we put all the data in the learn partition: db.partition_stimuli(learn=1, validation=0, test=0) provider.set_partition("Learn") inputs_tensor = provider.read_random_batch() DIR ~~~ Loading a custom database ^^^^^^^^^^^^^^^^^^^^^^^^^ Hand made database stored in files directories are directly supported with the ``DIR_Database`` module. For example, suppose your database is organized as following : - ``GST/airplanes``: 800 images - ``GST/car_side``: 123 images - ``GST/Faces``: 435 images - ``GST/Motorbikes``: 798 images You can then instanciate this database as input of your neural network using the following line: .. code-block:: python database = n2d2.database.DIR("./GST", learn=0.4, validation=0.2) Each subdirectory will be treated as a different label, so there will be 4 different labels, named after the directory name. The stimuli are equi-partitioned for the learning set and the validation set, meaning that the same number of stimuli for each category is used. If the learn fraction is 0.4 and the validation fraction is 0.2, as in the example above, the partitioning will be the following: +-------------+------------------+-------------+------------------+------------+ | Label ID | Label name | Learn set | Validation set | Test set | +-------------+------------------+-------------+------------------+------------+ | [0.5ex] 0 | ``airplanes`` | 49 | 25 | 726 | +-------------+------------------+-------------+------------------+------------+ | 1 | ``car_side`` | 49 | 25 | 49 | +-------------+------------------+-------------+------------------+------------+ | 2 | ``Faces`` | 49 | 25 | 361 | +-------------+------------------+-------------+------------------+------------+ | 3 | ``Motorbikes`` | 49 | 25 | 724 | +-------------+------------------+-------------+------------------+------------+ | | Total: | 196 | 100 | 1860 | +-------------+------------------+-------------+------------------+------------+ .. Note:: If ``equiv_label_partitioning`` is 1 (default setting), the number of stimuli per label that will be partitioned in the learn and validation sets will correspond to the number of stimuli from the label with the fewest stimuli. To load and partition more than one ``DataPath``, one can use the :py:meth:`n2d2.database.Database.load` method. This method will load data in the partition ``Unpartitionned``, you can move the stimuli in the ``Learn``, ``Validation`` or ``Test`` partition using the :py:meth:`n2d2.database.Database.partition_stimuli` method. Handling labelization ^^^^^^^^^^^^^^^^^^^^^ By default, your labels will be ordered by alphabetical order. If you need your label to be in a specific order, you can specify it using an exterior file we will name it ``label.dat`` for this example : .. code-block:: airplanes 0 car_side 1 Motorbikes 3 Faces 2 Then to load the database we will use : .. code-block:: python database = n2d2.database.DIR("./GST", learn=0.4, validation=0.2, label_path="./label.dat", label_depth=0) .. warning:: It is important to specify ``label_depth=0`` if you are specifying ``label_path`` ! .. autoclass:: n2d2.database.DIR :members: :inherited-members: Numpy ~~~~~ The :py:class:`n2d2.database.Numpy` allows to create a database using Numpy array. This can be especially usefull if you already have a dataloader written in Python. .. note:: The labels are optional, this can be usefull if you have previously trained your model and only need data to calibrate you model using the :py:func:`n2d2.quantizer.PTQ` function. Usage example ^^^^^^^^^^^^^ .. code-block:: python import n2d2 import numpy as np db = n2d2.database.Numpy() db.load([ np.ones([1,2,3]), np.zeros([1,2,3]), np.ones([1,2,3]), np.zeros([1,2,3]), ], [ 0, 1, 0, 1 ]) db.partition_stimuli(1., 0., 0.) # Learn Validation Test provider = n2d2.provider.DataProvider(db, [3, 2, 1], batch_size=2) provider.set_partition("Learn") print("First stimuli :") print(next(provider)) .. autoclass:: n2d2.database.Numpy :members: :inherited-members: MNIST ~~~~~ .. autoclass:: n2d2.database.MNIST :members: :inherited-members: ILSVRC2012 ~~~~~~~~~~ .. autoclass:: n2d2.database.ILSVRC2012 :members: :inherited-members: CIFAR10 ~~~~~~~ .. autoclass:: n2d2.database.CIFAR10 :members: :inherited-members: CIFAR100 ~~~~~~~~ .. autoclass:: n2d2.database.CIFAR100 :members: :inherited-members: Cityscapes ~~~~~~~~~~ .. autoclass:: n2d2.database.Cityscapes :members: :inherited-members: GTSRB ~~~~~ .. autoclass:: n2d2.database.GTSRB :members: :inherited-members: Transformations --------------- .. autoclass:: n2d2.transform.Transformation :members: :inherited-members: Composite ~~~~~~~~~ .. autoclass:: n2d2.transform.Composite :members: :inherited-members: PadCrop ~~~~~~~ .. autoclass:: n2d2.transform.PadCrop :members: :inherited-members: Distortion ~~~~~~~~~~ .. autoclass:: n2d2.transform.Distortion :members: :inherited-members: Rescale ~~~~~~~ .. autoclass:: n2d2.transform.Rescale :members: :inherited-members: Reshape ~~~~~~~ .. autoclass:: n2d2.transform.Reshape :members: :inherited-members: ColorSpace ~~~~~~~~~~ .. autoclass:: n2d2.transform.ColorSpace :members: :inherited-members: Flip ~~~~~~~~~~ .. autoclass:: n2d2.transform.Flip :members: :inherited-members: RangeAffine ~~~~~~~~~~~ .. autoclass:: n2d2.transform.RangeAffine :members: :inherited-members: SliceExtraction ~~~~~~~~~~~~~~~ .. autoclass:: n2d2.transform.SliceExtraction :members: :inherited-members: RandomResizeCrop ~~~~~~~~~~~~~~~~ .. autoclass:: n2d2.transform.RandomResizeCrop :members: :inherited-members: ChannelExtraction ~~~~~~~~~~~~~~~~~ .. autoclass:: n2d2.transform.ChannelExtraction :members: :inherited-members: Sending data to the Neural Network ---------------------------------- Once a database is loaded, n2d2 use :py:class:`n2d2.provider.DataProvider` to provide data to the neural network. The :py:class:`n2d2.provider.DataProvider` will automatically apply the :py:class:`n2d2.transform.Transformation` to the dataset. To add a transformation to the provider, you should use the method :py:meth:`n2d2.transform.Transformation.add_transformation`. .. autoclass:: n2d2.provider.DataProvider :members: :inherited-members: Example ------- In this example, we will show you how to create a :py:class:`n2d2.database.Database`, :py:class:`n2d2.provider.Provider` and apply :py:class:`n2d2.transformation.Transformation` to the data. We will use the :py:class:`n2d2.database.MNIST` database driver, rescale the images to a 32x32 pixels size and then print the data used for the learning. .. testcode:: # Loading data database = n2d2.database.MNIST(data_path=path, validation=0.1) # Initializing DataProvider provider = n2d2.provider.DataProvider(database, [32, 32, 1], batch_size=batch_size) # Applying Transformation provider.add_transformation(n2d2.transform.Rescale(width=32, height=32)) # Setting the partition of data we will use provider.set_partition("Learn") # Iterating other the inputs for inputs in provider: print(inputs)