Databases¶

Introduction¶

The Python library integrates pre-defined modules for several well-known databases used in the deep learning community, such as MNIST, GTSRB and CIFAR10. No extra step is therefore needed to build a network and train it on these databases. The library also lets you pre-process data with built-in Transformation objects.

Database¶

The Python library provides multiple objects to manipulate common databases.

A hand-made database can be loaded with n2d2.database.DIR, as in the following example:

import n2d2

# data_path, label_path and data_dims are placeholders that you must define.

# Creating the database object
db = n2d2.database.DIR()

provider = n2d2.provider.DataProvider(db, data_dims)

# The zeros are the depths at which to seek the data and the labels.
db.load(data_path, 0, label_path, 0)

# With this line we put all the data in the learn partition:
db.partition_stimuli(learn=1, validation=0, test=0)
provider.set_partition("Learn")

# Reading one randomly drawn batch from the learn partition
inputs_tensor = provider.read_random_batch()

DIR¶

Loading a custom database¶

Hand-made databases stored as directories of files are directly supported by the DIR_Database module. For example, suppose your database is organized as follows:

GST/airplanes: 800 images
GST/car_side: 123 images
GST/Faces: 435 images
GST/Motorbikes: 798 images

You can then instantiate this database as the input of your neural network with the following line:

database = n2d2.database.DIR("./GST", learn=0.4, validation=0.2)

Each subdirectory will be treated as a different label, so there will be 4 different labels, named after the directory name.
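The directory-to-label convention can be illustrated without n2d2 installed. The sketch below is only an illustration of the idea, not the library's implementation: count_stimuli_per_label is a hypothetical helper, and the file counts in the throwaway tree are made up.

```python
import tempfile
from pathlib import Path


def count_stimuli_per_label(root):
    """Map each immediate subdirectory name (the label) to its file count."""
    counts = {}
    for entry in Path(root).iterdir():
        if entry.is_dir():
            counts[entry.name] = sum(1 for f in entry.iterdir() if f.is_file())
    return counts


# Build a tiny throwaway tree shaped like the GST example (made-up file counts).
made_up_counts = {"airplanes": 3, "car_side": 1, "Faces": 2, "Motorbikes": 2}
with tempfile.TemporaryDirectory() as tmp:
    for label, n_files in made_up_counts.items():
        label_dir = Path(tmp) / label
        label_dir.mkdir()
        for i in range(n_files):
            (label_dir / f"image_{i}.png").write_bytes(b"")
    label_counts = count_stimuli_per_label(tmp)

# One entry per subdirectory, keyed by the directory name.
print(label_counts)
```

Running the same helper on a real dataset root is a quick way to check how many labels and stimuli to expect before loading it.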

The stimuli are equi-partitioned for the learning set and the validation set, meaning that the same number of stimuli for each category is used. If the learn fraction is 0.4 and the validation fraction is 0.2, as in the example above, the partitioning will be the following:

Label ID	Label name	Learn set	Validation set	Test set
0	`airplanes`	49	25	726
1	`car_side`	49	25	49
2	`Faces`	49	25	361
3	`Motorbikes`	49	25	724
	Total:	196	100	1860

Note

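If equiv_label_partitioning is 1 (default setting), the number of stimuli per label placed in the learn and validation sets is computed from the label with the fewest stimuli. In the example above, that label is car_side with 123 stimuli, so each label contributes about 0.4 × 123 ≈ 49 stimuli to the learn set and 0.2 × 123 ≈ 25 to the validation set.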
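The arithmetic behind the table above can be reproduced with a few lines of plain Python. This is a sketch under stated assumptions, not the library's code: equi_partition is a hypothetical helper, and rounding to the nearest integer is assumed only because it matches the figures in the table.

```python
def equi_partition(label_sizes, learn, validation):
    """Return {label: (learn, validation, test)} stimulus counts.

    Assumption: the fractions are applied to the smallest label and rounded
    to the nearest integer; the remaining stimuli go to the test set.
    """
    smallest = min(label_sizes.values())
    n_learn = round(learn * smallest)
    n_validation = round(validation * smallest)
    return {
        label: (n_learn, n_validation, size - n_learn - n_validation)
        for label, size in label_sizes.items()
    }


sizes = {"airplanes": 800, "car_side": 123, "Faces": 435, "Motorbikes": 798}
partition = equi_partition(sizes, learn=0.4, validation=0.2)

# Column totals: (learn, validation, test)
totals = tuple(sum(column) for column in zip(*partition.values()))

for label, counts in partition.items():
    print(label, counts)
print("Total:", totals)
```

Because every label contributes the same number of learn and validation stimuli, the larger labels end up with most of their stimuli in the test set.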

To load and partition more than one DataPath, one can use the n2d2.database.Database.load() method.

This method loads the data into the Unpartitioned partition. You can then move the stimuli into the Learn, Validation or Test partition using the n2d2.database.Database.partition_stimuli() method.

Handling labelization¶

By default, your labels are sorted alphabetically. If you need them in a specific order, you can specify it in an external file, which we will call label.dat in this example:

airplanes 0
car_side 1
Motorbikes 3
Faces 2

Then, to load the database, use:

database = n2d2.database.DIR("./GST", learn=0.4, validation=0.2, label_path="./label.dat", label_depth=0)

Warning

It is important to specify label_depth=0 when you specify label_path!
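To check a label file before handing it to the library, a small parser can help. The sketch below assumes the format shown in the label.dat example (a label name followed by an integer ID, separated by whitespace); parse_label_file is a hypothetical helper, and the parser used by n2d2 itself may accept other variants.

```python
def parse_label_file(text):
    """Parse lines of the form '<label name> <label ID>' into a dict."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, label_id = line.rsplit(maxsplit=1)
        mapping[name] = int(label_id)
    return mapping


label_dat = """\
airplanes 0
car_side 1
Motorbikes 3
Faces 2
"""

label_ids = parse_label_file(label_dat)

# Label names listed by increasing ID, regardless of line order in the file.
names_by_id = [name for name, _ in sorted(label_ids.items(), key=lambda kv: kv[1])]
print(names_by_id)
```

The order of the lines in the file does not matter here; only the ID written next to each name does.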

Numpy¶

The n2d2.database.Numpy class lets you create a database from Numpy arrays. This is especially useful if you already have a data loader written in Python.

Note

The labels are optional. This is useful if you have already trained your model and only need data to calibrate it with the n2d2.quantizer.PTQ() function.

Usage example¶

import n2d2
import numpy as np

db = n2d2.database.Numpy()
db.load([
        np.ones([1,2,3]),
        np.zeros([1,2,3]),
        np.ones([1,2,3]),
        np.zeros([1,2,3]),
],
[
        0,
        1,
        0,
        1
])
db.partition_stimuli(1., 0., 0.) # Learn Validation Test

provider = n2d2.provider.DataProvider(db, [3, 2, 1], batch_size=2)
provider.set_partition("Learn")
print("First stimuli :")
print(next(provider))
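In the example above, each stimulus is an array of shape [1, 2, 3] while the provider is given the dimensions [3, 2, 1], which suggests that the provider expects the axes in reverse order. Assuming that reading is right, a hypothetical helper such as provider_dims below could derive the dimensions from the array shape instead of hard-coding them:

```python
def provider_dims(array_shape):
    """Reverse the axis order of an array shape (assumed provider convention)."""
    return list(reversed(array_shape))


stimulus_shape = (1, 2, 3)  # shape of each array passed to db.load() above
dims = provider_dims(stimulus_shape)
print(dims)
```

With a real array, the shape would typically come from its shape attribute rather than being written by hand.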

MNIST¶

ILSVRC2012¶

CIFAR10¶

CIFAR100¶

Cityscapes¶

GTSRB¶

Transformations¶

Composite¶

PadCrop¶

Distortion¶

Rescale¶

Reshape¶

ColorSpace¶

Flip¶

RangeAffine¶

SliceExtraction¶

RandomResizeCrop¶

ChannelExtraction¶

Sending data to the Neural Network¶

Once a database is loaded, n2d2 uses n2d2.provider.DataProvider to provide data to the neural network.

The n2d2.provider.DataProvider automatically applies its n2d2.transform.Transformation objects to the dataset. To add a transformation to the provider, use the provider's add_transformation() method.
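The idea of a provider that applies its registered transformations before handing out batches can be sketched in plain Python. ToyProvider below is a hypothetical stand-in written for illustration only; it is not the n2d2 implementation, and details such as how the last incomplete batch is handled are assumptions.

```python
class ToyProvider:
    """Toy stand-in for a data provider (not the n2d2 class)."""

    def __init__(self, stimuli, batch_size):
        self.stimuli = list(stimuli)
        self.batch_size = batch_size
        self.transformations = []

    def add_transformation(self, transformation):
        """Register a callable applied to every stimulus, in insertion order."""
        self.transformations.append(transformation)

    def _apply(self, stimulus):
        for transformation in self.transformations:
            stimulus = transformation(stimulus)
        return stimulus

    def __iter__(self):
        # Yield consecutive batches; the last one may be shorter (assumption).
        for start in range(0, len(self.stimuli), self.batch_size):
            batch = self.stimuli[start:start + self.batch_size]
            yield [self._apply(stimulus) for stimulus in batch]


provider = ToyProvider([1, 2, 3, 4, 5], batch_size=2)
provider.add_transformation(lambda x: x * 10)  # applied first
provider.add_transformation(lambda x: x + 1)   # applied second

batches = list(provider)
print(batches)
```

Transformations are applied lazily, at iteration time, so the stored stimuli stay untouched and the same data can be served with a different transformation chain.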

Example¶

In this example, we show how to create an n2d2.database.Database and an n2d2.provider.DataProvider, and how to apply an n2d2.transform.Transformation to the data.

We will use the n2d2.database.MNIST database driver, rescale the images to 32x32 pixels and then print the data used for learning.

import n2d2

# path (the MNIST data directory) and batch_size are placeholders
# that you must define beforehand.

# Loading data
database = n2d2.database.MNIST(data_path=path, validation=0.1)

# Initializing DataProvider
provider = n2d2.provider.DataProvider(database, [32, 32, 1], batch_size=batch_size)

# Applying Transformation
provider.add_transformation(n2d2.transform.Rescale(width=32, height=32))

# Setting the partition of data we will use
provider.set_partition("Learn")

# Iterating over the inputs
for inputs in provider:
        print(inputs)