Export: other / legacy
======================

.. role:: raw-html(raw)
   :format: html

.. |check|  unicode:: U+02713 .. CHECK MARK
.. |cross|  unicode:: U+02717 .. BALLOT X

.. |ccheck| replace:: :raw-html:`<font color="green">` |check| :raw-html:`</font>`
.. |ccross| replace:: :raw-html:`<font color="red">` |cross| :raw-html:`</font>`


Generate an export by passing the export type to the ``-export`` option:

::

    n2d2 "mnist24_16c4s2_24c5s2_150_10.ini" -export CPP_OpenCL

Export types:

- ``C`` C export using OpenMP;

- ``C_HLS`` C export tailored for HLS with Vivado HLS;

- ``CPP_OpenCL`` C++ export using OpenCL;

- ``CPP_Cuda`` C++ export using CUDA;

- ``CPP_cuDNN`` C++ export using cuDNN;

- ``SC_Spike`` SystemC spike export.


Other program options related to the exports:

+--------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option [default value]   | Description                                                                                                                                                                                                                                                                                                   |
+==========================+===============================================================================================================================================================================================================================================================================================================+
| ``-nbbits`` [8]          | Number of bits for the weights and signals. Must be 8, 16, 32 or 64 for integer export, or -32, -64 for floating point export. The number of bits can be arbitrary for the ``C_HLS`` export (for example, 6 bits). It must be -32 for the ``CPP_TensorRT`` export; the precision is directly set at runtime   |
+--------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``-calib`` [0]           | Number of stimuli used for the calibration. 0 = no calibration (default), -1 = use the full test dataset for calibration                                                                                                                                                                                      |
+--------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``-calib-passes`` [2]    | Number of KL passes for determining the layer output values distribution truncation threshold (0 = use the max. value, no truncation)                                                                                                                                                                         |
+--------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``-no-unsigned``         | If present, disable the use of unsigned data type in integer exports                                                                                                                                                                                                                                          |
+--------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``-db-export`` [-1]      | Max. number of stimuli to export (0 = no dataset export, -1 = unlimited)                                                                                                                                                                                                                                      |
+--------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
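
The ``-nbbits`` rules and the option defaults listed above can be captured
in a small helper that assembles the command line. This is only a sketch:
the function names are hypothetical and not part of N2D2, and accepting any
positive bit width for ``C_HLS`` is an assumption drawn from the "arbitrary"
wording in the table.

.. code-block:: python

    INTEGER_BITS = {8, 16, 32, 64}   # integer exports
    FLOAT_BITS = {-32, -64}          # floating point exports


    def nbbits_is_valid(export_type, nbbits):
        """Check -nbbits against the rules in the options table (as read here)."""
        if export_type == "CPP_TensorRT":
            return nbbits == -32      # precision is set at runtime
        if nbbits in FLOAT_BITS:
            return True
        if export_type == "C_HLS":
            return nbbits > 0         # assumption: any positive width, e.g. 6
        return nbbits in INTEGER_BITS


    def export_command(ini_file, export_type, nbbits=8, calib=0,
                       calib_passes=2, db_export=-1, no_unsigned=False):
        """Build the n2d2 argument list; the defaults mirror the table."""
        if not nbbits_is_valid(export_type, nbbits):
            raise ValueError("unsupported -nbbits %d for %s" % (nbbits, export_type))
        args = ["n2d2", ini_file, "-export", export_type,
                "-nbbits", str(nbbits),
                "-calib", str(calib),
                "-calib-passes", str(calib_passes),
                "-db-export", str(db_export)]
        if no_unsigned:
            args.append("-no-unsigned")
        return args


    # 8-bit integer C export, calibrated on the full test dataset.
    print(" ".join(export_command("mnist24_16c4s2_24c5s2_150_10.ini", "C", calib=-1)))

The helper only builds the argument list; passing it to ``subprocess.run``
would launch the actual export.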

C export
~~~~~~~~

Test the exported network:

::

    cd export_C_int8
    make
    ./bin/n2d2_test

The result should look like:

::

    ...
    1652.00/1762    (avg = 93.757094%)
    1653.00/1763    (avg = 93.760635%)
    1654.00/1764    (avg = 93.764172%)
    Tested 1764 stimuli
    Success rate = 93.764172%
    Process time per stimulus = 187.548186 us (12 threads)

    Confusion matrix:
    -------------------------------------------------
    | T \ E |       0 |       1 |       2 |       3 |
    -------------------------------------------------
    |     0 |     329 |       1 |       5 |       2 |
    |       |  97.63% |   0.30% |   1.48% |   0.59% |
    |     1 |       0 |     692 |       2 |       6 |
    |       |   0.00% |  98.86% |   0.29% |   0.86% |
    |     2 |      11 |      27 |     609 |      55 |
    |       |   1.57% |   3.85% |  86.75% |   7.83% |
    |     3 |       0 |       0 |       1 |      24 |
    |       |   0.00% |   0.00% |   4.00% |  96.00% |
    -------------------------------------------------
    T: Target    E: Estimated
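
The percentages in this report follow directly from the confusion matrix
counts. The sketch below recomputes them from the sample output above; it
is an illustration of the arithmetic, not the code used by the generated
test program.

.. code-block:: python

    # Confusion matrix from the sample output: rows = target, columns = estimated.
    confusion = [
        [329, 1, 5, 2],
        [0, 692, 2, 6],
        [11, 27, 609, 55],
        [0, 0, 1, 24],
    ]


    def success_rate(matrix):
        """Fraction of stimuli on the diagonal (target == estimated)."""
        total = sum(sum(row) for row in matrix)
        correct = sum(matrix[i][i] for i in range(len(matrix)))
        return correct / total


    def per_class_rates(matrix):
        """Fraction of each target class that was estimated correctly."""
        return [row[i] / sum(row) for i, row in enumerate(matrix)]


    total = sum(sum(row) for row in confusion)
    print("Tested %d stimuli" % total)
    print("Success rate = %.6f%%" % (100.0 * success_rate(confusion)))
    for target, rate in enumerate(per_class_rates(confusion)):
        print("class %d: %.2f%%" % (target, 100.0 * rate))

The diagonal holds 1654 of the 1764 stimuli, which matches the reported
success rate, and each row's diagonal share matches the percentage printed
under it.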

CPP\_OpenCL export
~~~~~~~~~~~~~~~~~~

The OpenCL export produces a program that can run on GPU or CPU
architectures. Compilation options:

+----------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Preprocessor command [default value]   | Description                                                                                                                                                           |
+========================================+=======================================================================================================================================================================+
| ``PROFILING`` [0]                      | Compile the binary with a synchronization between each layer and return the mean execution time of each layer. This preprocessor option can decrease performance.     |
+----------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``GENERATE_KBIN`` [0]                  | Generate the binary output of the OpenCL kernel .cl file used. The binary is stored in the /bin folder.                                                               |
+----------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``LOAD_KBIN`` [0]                      | Indicate to the program to load an OpenCL kernel as a binary from the /bin folder instead of a .cl file.                                                              |
+----------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``CUDA`` [0]                           | Use the CUDA OpenCL SDK located at ``/usr/local/cuda``                                                                                                                |
+----------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``MALI`` [0]                           | Use the MALI OpenCL SDK located at ``/usr/Mali_OpenCL_SDK_vXXX``                                                                                                      |
+----------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``INTEL`` [0]                          | Use the INTEL OpenCL SDK located at ``/opt/intel/opencl``                                                                                                             |
+----------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``AMD`` [1]                            | Use the AMD OpenCL SDK located at ``/opt/AMDAPPSDK-XXX``                                                                                                              |
+----------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Program options related to the OpenCL export:

+--------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option [default value]   | Description                                                                                                                                                        |
+==========================+====================================================================================================================================================================+
| ``-cpu``                 | If present, force the program to run on a CPU architecture                                                                                                         |
+--------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``-gpu``                 | If present, force the program to run on a GPU architecture                                                                                                         |
+--------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``-batch`` [1]           | Size of the batch to use                                                                                                                                           |
+--------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``-stimulus`` [NULL]     | Path to a specific input stimulus to test. For example, ``-stimulus /stimulus/env0000.pgm`` tests the file env0000.pgm in the /stimulus folder.                    |
+--------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Test the exported network:

::

    cd export_CPP_OpenCL_float32
    make
    ./bin/n2d2_opencl_test -gpu


CPP\_cuDNN export
~~~~~~~~~~~~~~~~~

The cuDNN export produces a program that runs on NVIDIA GPU
architectures. It uses the CUDA and cuDNN libraries. Compilation options:

+----------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Preprocessor command [default value]   | Description                                                                                                                                                           |
+========================================+=======================================================================================================================================================================+
| ``PROFILING`` [0]                      | Compile the binary with a synchronization between each layer and return the mean execution time of each layer. This preprocessor option can decrease performance.     |
+----------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``ARCH32`` [0]                         | Compile the binary with 32-bit architecture compatibility.                                                                                                            |
+----------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Program options related to the cuDNN export:

+--------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option [default value]   | Description                                                                                                                                                        |
+==========================+====================================================================================================================================================================+
| ``-batch`` [1]           | Size of the batch to use                                                                                                                                           |
+--------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``-dev`` [0]             | CUDA Device ID selection                                                                                                                                           |
+--------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``-stimulus`` [NULL]     | Path to a specific input stimulus to test. For example, ``-stimulus /stimulus/env0000.pgm`` tests the file env0000.pgm in the /stimulus folder.                    |
+--------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Test the exported network:

::

    cd export_CPP_cuDNN_float32
    make
    ./bin/n2d2_cudnn_test

C\_HLS export
~~~~~~~~~~~~~

Test the exported network:

::

    cd export_C_HLS_int8
    make
    ./bin/n2d2_test

Run the High-Level Synthesis (HLS) with Xilinx Vivado HLS:

::

    vivado_hls -f run_hls.tcl
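
The test sequences shown for each export share the same three steps: enter
the export directory, build, and run the test binary. The sketch below
derives them from the export type and bit width. The directory naming
``export_<type>_<int|float><bits>`` is inferred from the examples in this
page and is an assumption, as are the helper names; the binary names are
the ones shown above.

.. code-block:: python

    # Test binary per export type, as shown in the sections above.
    TEST_BINARIES = {
        "C": "./bin/n2d2_test",
        "C_HLS": "./bin/n2d2_test",
        "CPP_OpenCL": "./bin/n2d2_opencl_test",
        "CPP_cuDNN": "./bin/n2d2_cudnn_test",
    }


    def export_dir(export_type, nbbits):
        """Assumed naming: negative -nbbits means float, positive means int."""
        kind = "float" if nbbits < 0 else "int"
        return "export_%s_%s%d" % (export_type, kind, abs(nbbits))


    def build_test_commands(export_type, nbbits, extra_args=()):
        """Return the shell commands that build and run the exported network."""
        binary = " ".join([TEST_BINARIES[export_type]] + list(extra_args))
        return ["cd " + export_dir(export_type, nbbits), "make", binary]


    for line in build_test_commands("CPP_OpenCL", -32, ["-gpu"]):
        print(line)
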

Layer compatibility table
~~~~~~~~~~~~~~~~~~~~~~~~~

Layer compatibility by export type:

+---------------+------------------------------------------------------+
| Layer         | Export type                                          |
| compatibility +----------+-------------+-------------+---------------+
| table         | C        | C\_HLS      | CPP\_OpenCL | CPP\_TensorRT |
+===============+==========+=============+=============+===============+
|Conv           | |ccheck| | |ccheck|    | |ccheck|    | |ccheck|      |
+---------------+----------+-------------+-------------+---------------+
|Pool           | |ccheck| | |ccheck|    | |ccheck|    | |ccheck|      |
+---------------+----------+-------------+-------------+---------------+
|Fc             | |ccheck| | |ccheck|    | |ccheck|    | |ccheck|      |
+---------------+----------+-------------+-------------+---------------+
|Softmax        | |ccheck| | |ccross|    | |ccheck|    | |ccheck|      |
+---------------+----------+-------------+-------------+---------------+
|FMP            | |ccheck| | |ccross|    | |ccheck|    | |ccross|      |
+---------------+----------+-------------+-------------+---------------+
|Deconv         | |ccross| | |ccross|    | |ccross|    | |ccheck|      |
+---------------+----------+-------------+-------------+---------------+
|ElemWise       | |ccross| | |ccross|    | |ccross|    | |ccheck|      |
+---------------+----------+-------------+-------------+---------------+
|Resize         | |ccheck| | |ccross|    | |ccross|    | |ccheck|      |
+---------------+----------+-------------+-------------+---------------+
|Padding        | |ccross| | |ccross|    | |ccross|    | |ccheck|      |
+---------------+----------+-------------+-------------+---------------+
|LRN            | |ccross| | |ccross|    | |ccross|    | |ccheck|      |
+---------------+----------+-------------+-------------+---------------+
|Anchor         | |ccross| | |ccross|    | |ccross|    | |ccheck|      |
+---------------+----------+-------------+-------------+---------------+
|ObjectDet      | |ccross| | |ccross|    | |ccross|    | |ccheck|      |
+---------------+----------+-------------+-------------+---------------+
|ROIPooling     | |ccross| | |ccross|    | |ccross|    | |ccheck|      |
+---------------+----------+-------------+-------------+---------------+
|RP             | |ccross| | |ccross|    | |ccross|    | |ccheck|      |
+---------------+----------+-------------+-------------+---------------+


BatchNorm is not listed because batch normalization parameters are
automatically fused with the convolution parameters by the ``-fuse``
option.
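
The compatibility table can also be used as a lookup to find which export
types support every layer of a given network. The sketch below transcribes
the table into a dictionary; the names ``SUPPORT`` and
``compatible_exports`` are hypothetical and not part of N2D2.

.. code-block:: python

    EXPORT_TYPES = ["C", "C_HLS", "CPP_OpenCL", "CPP_TensorRT"]

    # One row per layer, one flag per column of the compatibility table.
    SUPPORT = {
        "Conv":       (True,  True,  True,  True),
        "Pool":       (True,  True,  True,  True),
        "Fc":         (True,  True,  True,  True),
        "Softmax":    (True,  False, True,  True),
        "FMP":        (True,  False, True,  False),
        "Deconv":     (False, False, False, True),
        "ElemWise":   (False, False, False, True),
        "Resize":     (True,  False, False, True),
        "Padding":    (False, False, False, True),
        "LRN":        (False, False, False, True),
        "Anchor":     (False, False, False, True),
        "ObjectDet":  (False, False, False, True),
        "ROIPooling": (False, False, False, True),
        "RP":         (False, False, False, True),
    }


    def compatible_exports(layers):
        """Return the export types that support every layer in `layers`."""
        return [export for column, export in enumerate(EXPORT_TYPES)
                if all(SUPPORT[layer][column] for layer in layers)]


    print(compatible_exports(["Conv", "Pool", "Fc", "Softmax"]))

For a network made of Conv, Pool, Fc and Softmax layers, the lookup returns
``C``, ``CPP_OpenCL`` and ``CPP_TensorRT``, since ``C_HLS`` lacks Softmax.
Layers absent from the table, such as BatchNorm, raise a ``KeyError`` and
should be dropped from the list when they are fused.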