Databases

Introduction

The Python library integrates pre-defined modules for several well-known databases used in the deep learning community, such as MNIST, GTSRB, CIFAR10 and so on. That way, no extra step is necessary to build a network and train it on these databases. The library also lets you pre-process data with built-in Transformations.

Database

The Python library provides multiple objects to manipulate common databases.

A hand-made database can be loaded with n2d2.database.DIR, as in the following example:

import n2d2

# Creating the database object
# (data_path, label_path and data_dims are placeholders to be defined by the user)
db = n2d2.database.DIR()

provider = n2d2.provider.DataProvider(db, data_dims)

# The zeros are the depths at which the data and the labels are searched.
db.load(data_path, 0, label_path, 0)

# With this line we put all the data in the learn partition:
db.partition_stimuli(learn=1, validation=0, test=0)
provider.set_partition("Learn")

inputs_tensor = provider.read_random_batch()

DIR

Loading a custom database

Hand-made databases stored as directories of files are directly supported by n2d2.database.DIR (the DIR_Database module). For example, suppose your database is organized as follows:

  • GST/airplanes: 800 images

  • GST/car_side: 123 images

  • GST/Faces: 435 images

  • GST/Motorbikes: 798 images

You can then instantiate this database as the input of your neural network with the following line:

database = n2d2.database.DIR("./GST", learn=0.4, validation=0.2)

Each subdirectory will be treated as a different label, so there will be 4 different labels, named after the directory name.
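As a rough illustration of this convention (not N2D2's actual implementation), the sketch below scans a root directory and treats each immediate subdirectory as one label. The helper name scan_label_dirs is hypothetical, and the case-insensitive ordering is an assumption chosen to match the label IDs shown in this section.

```python
import tempfile
from pathlib import Path


def scan_label_dirs(root):
    """Map each immediate subdirectory of `root` to its number of files.

    Hypothetical helper: it mimics the "one subdirectory = one label"
    convention, it is not part of n2d2.
    """
    counts = {}
    # Assumption: labels are ordered alphabetically, ignoring case.
    for entry in sorted(Path(root).iterdir(), key=lambda p: p.name.lower()):
        if entry.is_dir():
            counts[entry.name] = sum(1 for f in entry.iterdir() if f.is_file())
    return counts


# Build a tiny stand-in for the GST layout (2 files per class instead of hundreds).
with tempfile.TemporaryDirectory() as tmp:
    for name in ["airplanes", "car_side", "Faces", "Motorbikes"]:
        class_dir = Path(tmp) / name
        class_dir.mkdir()
        for i in range(2):
            (class_dir / f"img_{i}.png").touch()
    counts = scan_label_dirs(tmp)

# One entry per subdirectory, i.e. one entry per label.
print(counts)
```

Each key of the resulting dictionary plays the role of a label name, and its position plays the role of the label ID.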

The stimuli are equi-partitioned for the learning set and the validation set, meaning that the same number of stimuli for each category is used. If the learn fraction is 0.4 and the validation fraction is 0.2, as in the example above, the partitioning will be the following:

Label ID   Label name   Learn set   Validation set   Test set
0          airplanes    49          25               726
1          car_side     49          25               49
2          Faces        49          25               361
3          Motorbikes   49          25               724
Total:                  196         100              1860

Note

If equiv_label_partitioning is 1 (default setting), the number of stimuli per label placed in the learn and validation sets is computed from the label with the fewest stimuli (car_side, with 123 images, in the example above).
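The arithmetic behind the partitioning table can be sketched as follows. This is a minimal sketch, assuming the learn and validation fractions are applied to the smallest label and rounded to the nearest integer; that assumption reproduces the figures above, but the exact rounding rule used by N2D2 is not stated here. The function name equi_partition is hypothetical.

```python
def equi_partition(counts, learn, validation):
    """Return {label: (n_learn, n_validation, n_test)} for equi-partitioned sets."""
    smallest = min(counts.values())
    # Assumption: fractions apply to the smallest label, rounded to nearest.
    n_learn = round(learn * smallest)
    n_validation = round(validation * smallest)
    # Every label gets the same learn/validation counts; the rest goes to test.
    return {
        label: (n_learn, n_validation, total - n_learn - n_validation)
        for label, total in counts.items()
    }


counts = {"airplanes": 800, "car_side": 123, "Faces": 435, "Motorbikes": 798}
split = equi_partition(counts, learn=0.4, validation=0.2)

for label, (n_learn, n_val, n_test) in split.items():
    print(f"{label:<12}{n_learn:>6}{n_val:>6}{n_test:>6}")

# Column totals: (learn, validation, test)
totals = tuple(sum(column) for column in zip(*split.values()))
print(totals)
```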

To load and partition more than one DataPath, one can use the n2d2.database.Database.load() method.

This method loads the data into the Unpartitioned partition. You can then move the stimuli to the Learn, Validation or Test partition using the n2d2.database.Database.partition_stimuli() method.
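Conceptually, moving stimuli out of the unpartitioned pool amounts to splitting a list by fractions. The sketch below is a plain-Python illustration only: the function split_by_fraction is hypothetical, and the shuffling, the rounding and the choice to send the remainder to the test set are assumptions rather than documented N2D2 behaviour.

```python
import random


def split_by_fraction(unpartitioned, learn, validation, seed=0):
    """Split a list of stimuli into Learn / Validation / Test partitions.

    Hypothetical helper, not the n2d2 API. Whatever is not assigned to
    Learn or Validation ends up in Test.
    """
    if learn < 0 or validation < 0 or learn + validation > 1:
        raise ValueError("fractions must be non-negative and sum to at most 1")
    shuffled = list(unpartitioned)
    random.Random(seed).shuffle(shuffled)  # assumption: random assignment
    n_learn = round(learn * len(shuffled))
    n_validation = round(validation * len(shuffled))
    return {
        "Learn": shuffled[:n_learn],
        "Validation": shuffled[n_learn:n_learn + n_validation],
        "Test": shuffled[n_learn + n_validation:],
    }


stimuli = [f"img_{i}.png" for i in range(10)]
parts = split_by_fraction(stimuli, learn=0.6, validation=0.2)
sizes = {name: len(items) for name, items in parts.items()}
print(sizes)
```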

Handling labelization

By default, labels are sorted in alphabetical order. If you need your labels in a specific order, you can specify it in an external file, named label.dat in this example:

airplanes 0
car_side 1
Motorbikes 3
Faces 2

Then, to load the database, use:

database = n2d2.database.DIR("./GST", learn=0.4, validation=0.2, label_path="./label.dat", label_depth=0)

Warning

It is important to specify label_depth=0 when you specify label_path!
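To make the label file format concrete, here is a small sketch that parses the label.dat content shown above into a name-to-ID mapping. It assumes one whitespace-separated "name ID" pair per line, as in the example; N2D2's own parser may accept other variants. The function parse_label_file is hypothetical.

```python
def parse_label_file(text):
    """Parse lines of the form '<label name> <label id>' into a dict."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last run of whitespace: name on the left, ID on the right.
        name, label_id = line.rsplit(maxsplit=1)
        mapping[name] = int(label_id)
    return mapping


label_dat = """\
airplanes 0
car_side 1
Motorbikes 3
Faces 2
"""

mapping = parse_label_file(label_dat)
# Label names listed by increasing ID, independent of the order in the file.
ordered = sorted(mapping, key=mapping.get)
print(ordered)
```

The order of the lines in the file does not matter in this sketch; only the ID written next to each name does.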

Numpy

n2d2.database.Numpy lets you create a database from NumPy arrays. This can be especially useful if you already have a data loader written in Python.

Note

The labels are optional. This can be useful if you have already trained your model and only need data to calibrate it using the n2d2.quantizer.PTQ() function.

Usage example

import n2d2
import numpy as np

db = n2d2.database.Numpy()
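db.load(
    # Data: four stimuli, each a NumPy array of shape [1, 2, 3]
    [
        np.ones([1, 2, 3]),
        np.zeros([1, 2, 3]),
        np.ones([1, 2, 3]),
        np.zeros([1, 2, 3]),
    ],
    # Labels: one integer per stimulus (optional)
    [0, 1, 0, 1],
)
db.partition_stimuli(1., 0., 0.)  # Learn, Validation, Test fractions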

provider = n2d2.provider.DataProvider(db, [3, 2, 1], batch_size=2)
provider.set_partition("Learn")
print("First stimuli:")
print(next(provider))

MNIST

ILSVRC2012

CIFAR10

CIFAR100

Cityscapes

GTSRB

Transformations

Composite

PadCrop

Distortion

Rescale

Reshape

ColorSpace

Flip

RangeAffine

SliceExtraction

RandomResizeCrop

ChannelExtraction

Sending data to the Neural Network

Once a database is loaded, n2d2 uses n2d2.provider.DataProvider to provide data to the neural network.

The n2d2.provider.DataProvider automatically applies the registered n2d2.transform.Transformation objects to the dataset. To add a transformation to the provider, use the n2d2.provider.DataProvider.add_transformation() method.
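The idea of a provider that applies registered transformations while serving batches can be mimicked in plain Python. The ToyProvider class below is a hypothetical stand-in, not the n2d2 implementation; in particular, applying transformations in the order they were added is an assumption of this sketch.

```python
class ToyProvider:
    """Minimal stand-in for a data provider that applies transformations."""

    def __init__(self, samples, batch_size):
        self.samples = list(samples)
        self.batch_size = batch_size
        self.transformations = []

    def add_transformation(self, transformation):
        # Assumption: transformations run in the order they were added.
        self.transformations.append(transformation)

    def __iter__(self):
        # Serve the samples batch by batch, transforming each sample on the fly.
        for start in range(0, len(self.samples), self.batch_size):
            batch = self.samples[start:start + self.batch_size]
            for transformation in self.transformations:
                batch = [transformation(sample) for sample in batch]
            yield batch


provider = ToyProvider(samples=[0, 64, 128, 255], batch_size=2)
provider.add_transformation(lambda pixel: pixel / 255.0)    # scale to [0, 1]
provider.add_transformation(lambda pixel: round(pixel, 2))  # keep 2 decimals
batches = list(provider)
print(batches)
```

Here each "stimulus" is a single pixel value for brevity; a real provider works on tensors, but the register-then-iterate pattern is the same.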

Example

In this example, we show how to create a n2d2.database.Database and a n2d2.provider.DataProvider, and how to apply a n2d2.transform.Transformation to the data.

We will use the n2d2.database.MNIST database driver, rescale the images to 32x32 pixels and then print the data used for learning.

import n2d2

# Loading data (path and batch_size are placeholders to be defined by the user)
database = n2d2.database.MNIST(data_path=path, validation=0.1)

# Initializing DataProvider
provider = n2d2.provider.DataProvider(database, [32, 32, 1], batch_size=batch_size)

# Applying Transformation
provider.add_transformation(n2d2.transform.Rescale(width=32, height=32))

# Setting the partition of data we will use
provider.set_partition("Learn")

# Iterating over the inputs
for inputs in provider:
    print(inputs)