Export: DNeuro ============== **N2D2-IP only: available upon request.** .. role:: raw-html(raw) :format: html .. |check| unicode:: U+02713 .. CHECK MARK .. |cross| unicode:: U+02717 .. BALLOT X .. |ccheck| replace:: :raw-html:`` |check| :raw-html:`` .. |ccross| replace:: :raw-html:`` |cross| :raw-html:`` Export type: ``DNeuro_V2`` DNeuro RTL export for FPGA. :: n2d2 MobileNet_ONNX.ini -seed 1 -w /dev/null -export DNeuro_V2 Introduction ------------ DNeuro is a synthetizable dataflow architecture, optimized for deep convolutional neural networks (CNN). It allows a fine grain allocation control of the DSP and memory resources, for each layer in a network. Globally, the FPGA resource usage can be maximized for a given network topology in order to minimize its latency. The main features of the DNeuro are: - Data flow architecture requiring few memory (potentially **no DDR**); - Very high use rate of the DSP per cycle (> 90%); - Configurable precision (integers from 2 to 16 bits, typically 8 bits); - Up to 4 MAC/DSP operations per cycle. The DNeuro is composed of specialized computing blocs, corresponding to specific type and configuration of layers (convolution, max pooling...), that can be chained to form a full neural network. The bloc allocation and chaining is done automatically with N2D2. Interface ~~~~~~~~~ The DNeuro interface is extremely simple and behaves like a pipeline/FIFO. An example of the top-level DNeuro RTL entity is described below, for one input channel and 3 output channels: .. code:: vhdl -- Input size: 1*640*480 -- Output size: 3*80*60 entity network is generic ( constant G_BATCH_SIZE: positive := 1; constant G_FIFO_DEPTH: positive := 1; constant G_DATA_LENGTH: positive := 8; constant G_ACC_S_LENGTH: positive := 18; constant G_NB_OUTPUTS_INST_N_1_ENV: positive := 1; constant G_NB_OUTPUTS_MERG_N_1_ENV: positive := 1 ); port ( clk : in std_logic; rstn : in std_logic; i_data : in std_logic_vector ((G_DATA_LENGTH*G_BATCH_SIZE)-1 downto 0); i_valid_data : in std_logic; o_en : out std_logic; o_data : out std_logic_vector ((3*G_DATA_LENGTH*G_BATCH_SIZE)-1 downto 0); o_valid_data : out std_logic; i_en : in std_logic ); end network; Supported layers ~~~~~~~~~~~~~~~~ +-----------------------------------------------------------------+-----------+--------------------------------------------------------+ | Layer type | Support | Comments | +=================================================================+===========+========================================================+ | Dropout | n.a. | removed during export | +-----------------------------------------------------------------+-----------+--------------------------------------------------------+ | Fc | |ccheck| | implemented with Conv during export | +-----------------------------------------------------------------+-----------+--------------------------------------------------------+ | *InnerProduct* :math:`\rightarrow` see Fc | +-----------------------------------------------------------------+-----------+--------------------------------------------------------+ | Transformation | |cross| | | +-----------------------------------------------------------------+-----------+--------------------------------------------------------+ | BatchNorm | n.a. | merged with Conv during export with ``-fuse`` option | +-----------------------------------------------------------------+-----------+--------------------------------------------------------+ | Conv | |ccheck| | | +-----------------------------------------------------------------+-----------+--------------------------------------------------------+ | *Concat* :math:`\rightarrow` implicit for Conv/Deconv/Pool/Fc | +-----------------------------------------------------------------+-----------+--------------------------------------------------------+ | Deconv | |cross| | | +-----------------------------------------------------------------+-----------+--------------------------------------------------------+ | ElemWise | |ccheck| | *Sum* operation only | +-----------------------------------------------------------------+-----------+--------------------------------------------------------+ | *EltWise* :math:`\rightarrow` see ElemWise | +-----------------------------------------------------------------+-----------+--------------------------------------------------------+ | *Flatten* :math:`\rightarrow` implicit to Fc/Rbf | +-----------------------------------------------------------------+-----------+--------------------------------------------------------+ | LRN | |cross| | | +-----------------------------------------------------------------+-----------+--------------------------------------------------------+ | *Maxout* :math:`\rightarrow` see Pool | +-----------------------------------------------------------------+-----------+--------------------------------------------------------+ | Padding | |ccheck| | merged with Conv/Pool during export | +-----------------------------------------------------------------+-----------+--------------------------------------------------------+ | Pool | |ccheck| | *Max* operation only | +-----------------------------------------------------------------+-----------+--------------------------------------------------------+ | Resize | |ccheck| | *NearestNeighbor* mode only | +-----------------------------------------------------------------+-----------+--------------------------------------------------------+ | Softmax | |cross| | | +-----------------------------------------------------------------+-----------+--------------------------------------------------------+ | *SortLabel* :math:`\rightarrow` see .Target\* | +-----------------------------------------------------------------+-----------+--------------------------------------------------------+ | Unpool | |cross| | | +-----------------------------------------------------------------+-----------+--------------------------------------------------------+ | *Upscale* :math:`\rightarrow` see Resize | +-----------------------------------------------------------------+-----------+--------------------------------------------------------+ | .Target\* | |ccheck| | top-1 sorting | +-----------------------------------------------------------------+-----------+--------------------------------------------------------+ +---------------------------------------------+-----------+------------------------------------------------+ | Activation type | Support | Specificities | +=============================================+===========+================================================+ | Linear | |ccheck| | saturated arithmetic | +---------------------------------------------+-----------+------------------------------------------------+ | Logistic | |ccheck| | saturation approximation, configurable zero, | | | | up to two configurable thresholds | +---------------------------------------------+-----------+------------------------------------------------+ | *ReLU* :math:`\rightarrow` see Rectifier | +---------------------------------------------+-----------+------------------------------------------------+ | *bReLU* :math:`\rightarrow` see Rectifier | +---------------------------------------------+-----------+------------------------------------------------+ | Rectifier | |ccheck| | saturated arithmetic (positive values) | +---------------------------------------------+-----------+------------------------------------------------+ | Saturation | |ccheck| | | +---------------------------------------------+-----------+------------------------------------------------+ | Softplus | |cross| | | +---------------------------------------------+-----------+------------------------------------------------+ | Tanh | |cross| | | +---------------------------------------------+-----------+------------------------------------------------+ Usage ----- Simulation ~~~~~~~~~~ When a network is exported, test vectors are exported automatically too, if the ``-db-export`` command line option value is > 0 (by default, the full test set is exported). All the test vectors are exported for the C++ emulator, while only the first image is pre-loaded as a test vector for the RTL simulation in the ``RTL/NETWORK/TB/network_tb.vhd`` file. This testbench is configured with a clock frequency of 100MHz (regardless of the ``EstimationFrequency`` export parameter). The testbench reads 3 times the same (first) image and outputs the results in the ``out_file/out.txt`` file, located in ``RTL/NETWORK/simu/VsimTOOL`` for ModelSim. :: cd RTL/NETWORK/simu make vsim C++ emulation ~~~~~~~~~~~~~ The DNeuro export comes with a C++ bit-accurate emulator. By default, the provided emulator will use the same parameters as the ones defined in the export. For testing purposes it is possible to change the accumulation size by defining the ``ACC_NB_BITS`` variable. :: cd EMULATOR CXXFLAGS="-DACC_NB_BITS=18" make ./dneuro_v2_emulator When running the emulator, all the exported images are evaluated by default, and a global score is computed from individual images good or bad classifications. It is possible to evaluate a single image with the following command line argument: :: ./dneuro_v2_emulator -stimulus stimuli/env00.ppm The ``stimuli/env00.ppm`` is an already pre-processed image automatically exported by N2D2 and ready to be feed at the input of the neural network. The emulator generates for each network's layer an output file *layer_name_output.txt* containing the output tensor values of the layer, as expected for the DNeuro IP. Synthesis ~~~~~~~~~ To generate a project ready for synthesis in Vivado or Quartus, use the scripts provided in ``RTL/NETWORK/simu/PythonTOOL``. To generate a Vivado project, run: :: cd RTL/NETWORK/simu python PythonTOOL/vivadoGenerate.py This script creates a new project in ``RTL/NETWORK/simu/VivadoTOOL/project_export_DNeuro``. Do not forget to change the default project's part. .. Warning:: Do not create a project and add the sources manually, as the sources organization into libraries will not be setup properly: the sources in the directories ``CONV_COMMON``, ``CONV_Tn_Oy_CHy_K1_Sy_P1`` and ``CONV_Tn_Oy_CHy_K1_Sy_Pn`` must be placed in libraries of the same name! Export parameters ~~~~~~~~~~~~~~~~~ Extra parameters can be passed during export using the ``-export-parameters params.ini`` command line argument. The parameters must be saved in an INI-like file. List of general available parameters: +-----------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------+ | Argument [default value] | Description | +=================================================================+==========================================================================================================================+ | ``NbDSPs`` | Set the maximum number of DSPs that the network can use on the FPGA | +-----------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------+ | ``NbMemoryBytes`` | Set the maximum memory, in bytes, that the network can use on the FPGA | +-----------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------+ | ``Network`` [network] | Name of the top-level HDL entity | +-----------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------+ | ``EstimationFrequency`` [200] | Frequency used for the FPS estimation given by the export | +-----------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------+ | ``AccumulationNbBits`` [2.DATA_LENGTH+4] | Number of bits to use for the accumulation | +-----------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------+ Output map class conversion to RGB settings: +-----------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------+ | Argument [default value] | Description | +=================================================================+==========================================================================================================================+ | ``OutputMapToRGB`` [0] | If true (1), add an extra layer at the end of the network that converts the output of the network to an RGB output | +-----------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------+ | ``OutputMapToRGBBackgroundClass`` [] | When ``OutputMapToRGB`` is 1, set the class that is used for background objects. The overlay color for this class will | | | be transparent | +-----------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------+ | ``OutputMapToRGBColorMasks`` [] | When ``OutputMapToRGB`` is 1, list of colors to use for the classes | +-----------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------+ | ``OutputMapToRGBBinaryThresholdUpper`` [0] | Upper threshold for binary outputs | +-----------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------+ | ``OutputMapToRGBBinaryThresholdLower`` [0] | Lower threshold for binary outputs | +-----------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------+ Internal per layer settings (for debug purpose only!): +-----------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------+ | Argument [default value] | Description | +=================================================================+==========================================================================================================================+ | ``RTLType`` [] | Specific name of the RTL library module to use for this layer | +-----------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------+ | ``NbChannelsInstantiation`` [] | Specific number of channels to instantiate | +-----------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------+ | ``NbOutputsInstantiation`` [] | Specific number of outputs to instantiate | +-----------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------+ | ``KernelHeightInstantiation`` [] | Specific number of kernel height to instantiate | +-----------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------+ | ``KernelWidthInstantiation`` [] | Specific number of kernel width to instantiate | +-----------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------+ FPGA compatibility tables ~~~~~~~~~~~~~~~~~~~~~~~~~ .. |ok| replace:: :math:`\bullet` .. |ult| replace:: :raw-html:`` • :raw-html:`` .. |mem| replace:: :raw-html:`` • :raw-html:`` .. |equ| replace:: :raw-html:`` • :raw-html:`` .. |alt| replace:: :raw-html:`` ◦ :raw-html:`` Legend: | |ok| should be OK for the standard 224x224 input, but depends on the resolution; | |ult| should be OK for the standard 224x224 input using also the UltraRAM, but depends on the resolution (Xilinx FPGA only); | |mem| M20K memory may be insufficient depending on the resolution; | |equ| there is a better equivalent neural network (see on the same column); | |alt| using an alternative neural network is possible with a small accuracy loss. Arria 10 Neural networks compatibility table with DNeuro, in terms of memory requirement. +-----------------------+-------------+-------------+-------------------+-------------------+-------------+-------------+-------------------+-------------------+-------------------+ | **Arria 10** | **GX/SX** | **GX/SX** | **GX/SX** | **GX/SX** | **GX/SX** | **GX/SX** | **GX/SX** | **GX** | **GX** | | +-------------+-------------+-------------------+-------------------+-------------+-------------+-------------------+-------------------+-------------------+ | | **160** | **220** | **270** | **320** | **480** | **570** | **660** | **900** | **1150** | +-----------------------+-------------+-------------+-------------------+-------------------+-------------+-------------+-------------------+-------------------+-------------------+ | **M20K (MB)** | 1.12 | 1.37 | 1.87 | 2.12 | 3.5 | 4.37 | 5.25 | 5.87 | 6.62 | +-----------------------+-------------+-------------+-------------------+-------------------+-------------+-------------+-------------------+-------------------+-------------------+ | **DSP** | 156 | 191 | 830 | 985 | 1,368 | 1,523 | 1,688 | 1,518 | 1,518 | +-----------------------+-------------+-------------+-------------------+-------------------+-------------+-------------+-------------------+-------------------+-------------------+ | **Mult. (MAC/c.)** | 312 | 382 | 1,660 | 1,970 | 2,736 | 3,046 | 3,376 | 3,036 | 3,036 | +=======================+=============+=============+===================+===================+=============+=============+===================+===================+===================+ | MobileNet_v1_0.25 | |ok| | |ok| | |ok| | |ok| | |ok| | |ok| | |ok| | |ok| | |ok| | +-----------------------+-------------+-------------+-------------------+-------------------+-------------+-------------+-------------------+-------------------+-------------------+ | MobileNet_v1_0.5 | | | |mem| | |mem| | |ok| | |ok| | |ok| | |ok| | |ok| | +-----------------------+-------------+-------------+-------------------+-------------------+-------------+-------------+-------------------+-------------------+-------------------+ | MobileNet_v1_0.75 | | | | | |mem| | |mem| | |ok| | |ok| | |ok| | +-----------------------+-------------+-------------+-------------------+-------------------+-------------+-------------+-------------------+-------------------+-------------------+ | MobileNet_v1_1.0 | | | | | | | |mem| | |mem| | |mem| | +-----------------------+-------------+-------------+-------------------+-------------------+-------------+-------------+-------------------+-------------------+-------------------+ | SqueezeNet_v1.0 | | | |equ| |mem| | |equ| |mem| | |ok| | |ok| | |ok| | |ok| | |ok| | +-----------------------+-------------+-------------+-------------------+-------------------+-------------+-------------+-------------------+-------------------+-------------------+ | SqueezeNet_v1.1 | | | |equ| |mem| | |equ| |mem| | |ok| | |ok| | |ok| | |ok| | |ok| | +-----------------------+-------------+-------------+-------------------+-------------------+-------------+-------------+-------------------+-------------------+-------------------+ | MobileNet_v2_0.35 | | | | |mem| | |ok| | |ok| | |ok| | |ok| | |ok| | +-----------------------+-------------+-------------+-------------------+-------------------+-------------+-------------+-------------------+-------------------+-------------------+ | MobileNet_v2_0.5 | | | | | |mem| | |ok| | |ok| | |ok| | |ok| | +-----------------------+-------------+-------------+-------------------+-------------------+-------------+-------------+-------------------+-------------------+-------------------+ | MobileNet_v2_0.75 | | | | | |mem| | |mem| | |ok| | |ok| | |ok| | +-----------------------+-------------+-------------+-------------------+-------------------+-------------+-------------+-------------------+-------------------+-------------------+ | MobileNet_v2_1.0 | | | | | | |mem| | |mem| | |mem| | |mem| | +-----------------------+-------------+-------------+-------------------+-------------------+-------------+-------------+-------------------+-------------------+-------------------+ | MobileNet_v2_1.3 | | | | | | | | |mem| | |mem| | +-----------------------+-------------+-------------+-------------------+-------------------+-------------+-------------+-------------------+-------------------+-------------------+ | MobileNet_v2_1.4 | | | | | | | | | |mem| | +-----------------------+-------------+-------------+-------------------+-------------------+-------------+-------------+-------------------+-------------------+-------------------+ | AlexNet | | | |equ| | |equ| | |equ| | |equ| | |equ| | |equ| | |equ| | +-----------------------+-------------+-------------+-------------------+-------------------+-------------+-------------+-------------------+-------------------+-------------------+ | VGG-16 | | | | | | |equ| | |equ| |alt| | |equ| |alt| | |equ| |alt| | +-----------------------+-------------+-------------+-------------------+-------------------+-------------+-------------+-------------------+-------------------+-------------------+ | GoogLeNet | | | | | | | |equ| | |equ| | |equ| | +-----------------------+-------------+-------------+-------------------+-------------------+-------------+-------------+-------------------+-------------------+-------------------+ | ResNet-18 | | | | | | | |equ| | |equ| | |equ| | +-----------------------+-------------+-------------+-------------------+-------------------+-------------+-------------+-------------------+-------------------+-------------------+ | ResNet-34 | | | | | | | | |equ| | |equ| | +-----------------------+-------------+-------------+-------------------+-------------------+-------------+-------------+-------------------+-------------------+-------------------+ | ResNet-50 | | | | | | | | | |alt| | +-----------------------+-------------+-------------+-------------------+-------------------+-------------+-------------+-------------------+-------------------+-------------------+ Stratix 10 Neural networks compatibility table with DNeuro, in terms of memory requirement. +-----------------------+------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+ | **Stratix 10** | **GX/SX** | **GX/SX** | **GX/SX** | **GX/SX** | **GX/SX** | **GX/SX** | **GX/SX** | **GX/SX** | | +------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+ | | **400** | **650** | **850** | **1100** | **1650** | **2100** | **2500** | **2800** | +-----------------------+------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+ | **M20K (MB)** | 3.75 | 6.12 | 8.5 | 13.37 | 14.25 | 15.87 | 24.37 | 28.62 | +-----------------------+------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+ | **DSP** | 648 | 1,152 | 2,016 | 2,592 | 3,145 | 3,744 | 5,011 | 5,760 | +-----------------------+------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+ | **Mult. (MAC/c.)** | 1,296 | 2,304 | 4,032 | 5,184 | 6,290 | 7,488 | 10,022 | 11,520 | +=======================+==================+===================+===================+===================+===================+===================+===================+===================+ | MobileNet_v1_0.25 | |ok| | |ok| | |ok| | |ok| | |ok| | |ok| | |ok| | |ok| | +-----------------------+------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+ | MobileNet_v1_0.5 | |ok| | |ok| | |ok| | |ok| | |ok| | |ok| | |ok| | |ok| | +-----------------------+------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+ | MobileNet_v1_0.75 | |mem| | |ok| | |ok| | |ok| | |ok| | |ok| | |ok| | |ok| | +-----------------------+------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+ | MobileNet_v1_1.0 | | |mem| | |ok| | |ok| | |ok| | |ok| | |ok| | |ok| | +-----------------------+------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+ | SqueezeNet_v1.0 | |equ| |ok| | |ok| | |ok| | |ok| | |ok| | |ok| | |ok| | |ok| | +-----------------------+------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+ | SqueezeNet_v1.1 | |equ| |ok| | |ok| | |ok| | |ok| | |ok| | |ok| | |ok| | |ok| | +-----------------------+------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+ | MobileNet_v2_0.35 | |ok| | |ok| | |ok| | |ok| | |ok| | |ok| | |ok| | |ok| | +-----------------------+------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+ | MobileNet_v2_0.5 | |mem| | |ok| | |ok| | |ok| | |ok| | |ok| | |ok| | |ok| | +-----------------------+------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+ | MobileNet_v2_0.75 | |mem| | |ok| | |ok| | |ok| | |ok| | |ok| | |ok| | |ok| | +-----------------------+------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+ | MobileNet_v2_1.0 | | |mem| | |ok| | |ok| | |ok| | |ok| | |ok| | |ok| | +-----------------------+------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+ | MobileNet_v2_1.3 | | | |mem| | |ok| | |ok| | |ok| | |ok| | |ok| | +-----------------------+------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+ | MobileNet_v2_1.4 | | | |mem| | |ok| | |ok| | |ok| | |ok| | |ok| | +-----------------------+------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+ | AlexNet | |equ| | |equ| | |equ| | |equ| | |equ| | |equ| | |equ| | |equ| | +-----------------------+------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+ | VGG-16 | | |equ| |alt| | |equ| |alt| | |equ| |alt| | |equ| |alt| | |equ| |alt| | |equ| |alt| | |equ| |alt| | +-----------------------+------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+ | GoogLeNet | | |equ| | |equ| | |equ| |mem| | |ok| | |ok| | |ok| | |ok| | +-----------------------+------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+ | ResNet-18 | | |equ| | |equ| | |equ| | |equ| |mem| | |equ| |mem| | |ok| | |ok| | +-----------------------+------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+ | ResNet-34 | | | |equ| | |equ| | |equ| | |equ| | |equ| | |equ| |mem| | +-----------------------+------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+ | ResNet-50 | | | |alt| | |alt| | |alt| | |alt| | |alt| | |alt| | +-----------------------+------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+ Zynq UltraScale+ Neural networks compatibility table with DNeuro, in terms of memory requirement. +------------------------+-----------+-----------+---------------------------+-----------+-----------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+ | **Zynq UltraScale+** | **ZU2** | **ZU3** | **ZU4** | **ZU5** | **ZU6** | **ZU7** | **ZU9** | **ZU11** | **ZU15** | **ZU17** | **ZU19** | | +-----------+-----------+---------------------------+-----------+-----------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+ | | **EG** | **EG** | **EG** | **EG** | **EG** | **EG** | **EG** | **EG** | **EG** | **EG** | **EG** | +------------------------+-----------+-----------+---------------------------+-----------+-----------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+ | **BRAM (MB)** | 0.66 | 0.95 | 0.56 | 1.02 | 3.13 | 1.37 | 4.01 | 2.63 | 3.27 | 3.5 | 4.32 | +------------------------+-----------+-----------+---------------------------+-----------+-----------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+ | **UltraRAM (MB)** | | | 1.68 | 2.25 | | 3.37 | | 2.81 | 3.93 | 3.58 | 4.5 | +------------------------+-----------+-----------+---------------------------+-----------+-----------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+ | **Total RAM (MB)** | 0.66 | 0.95 | 2.24 | 3.27 | 3.12 | 4.74 | 4.01 | 5.44 | 7.2 | 7.08 | 8.82 | +------------------------+-----------+-----------+---------------------------+-----------+-----------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+ | **DSP** | 240 | 360 | 728 | 1,248 | 1,973 | 1,728 | 2,520 | 2,928 | 3,528 | 1,590 | 1,968 | +------------------------+-----------+-----------+---------------------------+-----------+-----------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+ | **Mult. (MAC/c.)** | 480 | 720 | 1,456 | 2,496 | 3,946 | 3,456 | 5,040 | 5,856 | 7,056 | 3,180 | 3,936 | +========================+===========+===========+===========================+===========+===========+===================+===================+===================+===================+===================+===================+ | MobileNet_v1_0.25 | |mem| | |ok| | |ult| | |ok| | |ok| | |ok| | |ok| | |ok| | |ok| | |ok| | |ok| | +------------------------+-----------+-----------+---------------------------+-----------+-----------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+ | MobileNet_v1_0.5 | | | |mem| | |ult| | |ok| | |ult| | |ok| | |ok| | |ok| | |ok| | |ok| | +------------------------+-----------+-----------+---------------------------+-----------+-----------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+ | MobileNet_v1_0.75 | | | | |mem| | |mem| | |mem| | |mem| | |ult| | |ult| | |ult| | |ult| | +------------------------+-----------+-----------+---------------------------+-----------+-----------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+ | MobileNet_v1_1.0 | | | | | | |mem| | | |mem| | |mem| | |mem| | |ult| | +------------------------+-----------+-----------+---------------------------+-----------+-----------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+ | SqueezeNet_v1.0 | | | |equ| |mem| |ult| | |ult| | |ult| | |ult| | |ok| | |ok| | |ok| | |ok| | |ok| | +------------------------+-----------+-----------+---------------------------+-----------+-----------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+ | SqueezeNet_v1.1 | | | |equ| |mem| |ult| | |ult| | |ult| | |ult| | |ok| | |ok| | |ok| | |ok| | |ok| | +------------------------+-----------+-----------+---------------------------+-----------+-----------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+ | MobileNet_v2_0.35 | | | |mem| | |ult| | |mem| | |ult| | |ok| | |ult| | |ok| | |ok| | |ok| | +------------------------+-----------+-----------+---------------------------+-----------+-----------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+ | MobileNet_v2_0.5 | | | | |mem| | |mem| | |ult| | |ok| | |ult| | |ult| | |ult| | |ok| | +------------------------+-----------+-----------+---------------------------+-----------+-----------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+ | MobileNet_v2_0.75 | | | | |mem| | |mem| | |mem| | |mem| | |ult| | |ult| | |ult| | |ult| | +------------------------+-----------+-----------+---------------------------+-----------+-----------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+ | MobileNet_v2_1.0 | | | | | | |mem| | |mem| | |mem| | |ult| | |ult| | |ult| | +------------------------+-----------+-----------+---------------------------+-----------+-----------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+ | MobileNet_v2_1.3 | | | | | | | | | |mem| | |mem| | |mem| | +------------------------+-----------+-----------+---------------------------+-----------+-----------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+ | MobileNet_v2_1.4 | | | | | | | | | |mem| | |mem| | |mem| | +------------------------+-----------+-----------+---------------------------+-----------+-----------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+ | AlexNet | | | |equ| | |equ| | |equ| | |equ| | |equ| | |equ| | |equ| | |equ| | |equ| | +------------------------+-----------+-----------+---------------------------+-----------+-----------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+ | VGG-16 | | | | | | |equ| |alt| | |equ| |alt| | |equ| |alt| | |equ| |alt| | |equ| |alt| | |equ| |alt| | +------------------------+-----------+-----------+---------------------------+-----------+-----------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+ | GoogLeNet | | | | | | |equ| | |equ| | |equ| | |equ| | |equ| | |equ| | +------------------------+-----------+-----------+---------------------------+-----------+-----------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+ | ResNet-18 | | | | | | |equ| | |equ| | |equ| | |equ| | |equ| | |equ| | +------------------------+-----------+-----------+---------------------------+-----------+-----------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+ | ResNet-34 | | | | | | | | | |equ| | |equ| | |equ| | +------------------------+-----------+-----------+---------------------------+-----------+-----------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+ | ResNet-50 | | | | | | | | | |alt| | |alt| | |alt| | +------------------------+-----------+-----------+---------------------------+-----------+-----------+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+ Kintex UltraScale+ Neural networks compatibility table with DNeuro, in terms of memory requirement. +--------------------------+-------------------+------------+------------+-------------------+-------------------+-------------------+ | **Kintex UltraScale+** | **KU3P** | **KU5P** | **KU9P** | **KU11P** | **KU13P** | **KU15P** | +--------------------------+-------------------+------------+------------+-------------------+-------------------+-------------------+ | **BRAM (MB)** | 1.58 | 2.11 | 4.01 | 2.63 | 3.27 | 4.32 | +--------------------------+-------------------+------------+------------+-------------------+-------------------+-------------------+ | **UltraRAM (MB)** | 1.68 | 2.25 | | 2.81 | 3.93 | 4.5 | +--------------------------+-------------------+------------+------------+-------------------+-------------------+-------------------+ | **Total RAM (MB)** | 3.26 | 4.36 | 4.01 | 5.44 | 7.2 | 8.82 | +--------------------------+-------------------+------------+------------+-------------------+-------------------+-------------------+ | **DSP** | 1,368 | 1,825 | 2,520 | 2,928 | 3,528 | 1,968 | +--------------------------+-------------------+------------+------------+-------------------+-------------------+-------------------+ | **Mult. (MAC/c.)** | 2,736 | 3,650 | 5,040 | 5,856 | 7,056 | 3,936 | +==========================+===================+============+============+===================+===================+===================+ | MobileNet_v1_0.25 | |ok| | |ok| | |ok| | |ok| | |ok| | |ok| | +--------------------------+-------------------+------------+------------+-------------------+-------------------+-------------------+ | MobileNet_v1_0.5 | |ult| | |ult| | |ok| | |ok| | |ok| | |ok| | +--------------------------+-------------------+------------+------------+-------------------+-------------------+-------------------+ | MobileNet_v1_0.75 | |mem| | |mem| | |mem| | |ult| | |ult| | |ult| | +--------------------------+-------------------+------------+------------+-------------------+-------------------+-------------------+ | MobileNet_v1_1.0 | | | | |mem| | |mem| | |ult| | +--------------------------+-------------------+------------+------------+-------------------+-------------------+-------------------+ | SqueezeNet_v1.0 | |equ| |ult| | |ult| | |ok| | |ok| | |ok| | |ok| | +--------------------------+-------------------+------------+------------+-------------------+-------------------+-------------------+ | SqueezeNet_v1.1 | |equ| |ult| | |ult| | |ok| | |ok| | |ok| | |ok| | +--------------------------+-------------------+------------+------------+-------------------+-------------------+-------------------+ | MobileNet_v2_0.35 | |mem| | |ult| | |ok| | |ult| | |ok| | |ok| | +--------------------------+-------------------+------------+------------+-------------------+-------------------+-------------------+ | MobileNet_v2_0.5 | |mem| | |ult| | |ok| | |ult| | |ult| | |ok| | +--------------------------+-------------------+------------+------------+-------------------+-------------------+-------------------+ | MobileNet_v2_0.75 | |mem| | |mem| | |mem| | |ult| | |ult| | |ult| | +--------------------------+-------------------+------------+------------+-------------------+-------------------+-------------------+ | MobileNet_v2_1.0 | | |mem| | |mem| | |mem| | |ult| | |ult| | +--------------------------+-------------------+------------+------------+-------------------+-------------------+-------------------+ | MobileNet_v2_1.3 | | | | | |mem| | |mem| | +--------------------------+-------------------+------------+------------+-------------------+-------------------+-------------------+ | MobileNet_v2_1.4 | | | | | |mem| | |mem| | +--------------------------+-------------------+------------+------------+-------------------+-------------------+-------------------+ | AlexNet | |equ| | |equ| | |equ| | |equ| | |equ| | |equ| | +--------------------------+-------------------+------------+------------+-------------------+-------------------+-------------------+ | VGG-16 | | |equ| | |equ| | |equ| |alt| | |equ| |alt| | |equ| |alt| | +--------------------------+-------------------+------------+------------+-------------------+-------------------+-------------------+ | GoogLeNet | | | | |equ| | |equ| | |equ| | +--------------------------+-------------------+------------+------------+-------------------+-------------------+-------------------+ | ResNet-18 | | | | |equ| | |equ| | |equ| | +--------------------------+-------------------+------------+------------+-------------------+-------------------+-------------------+ | ResNet-34 | | | | | |equ| | |equ| | +--------------------------+-------------------+------------+------------+-------------------+-------------------+-------------------+ | ResNet-50 | | | | | |alt| | |alt| | +--------------------------+-------------------+------------+------------+-------------------+-------------------+-------------------+ Aerial Imagery Segmentation DEMO -------------------------------- Specifications ~~~~~~~~~~~~~~ Specifications of the Aerial Imagery Segmentation DEMO: +---------------------+---------------------------+--------------------------------+--------------------------------------+ | Feature | DEMO | Max. | Description | +=====================+===========================+================================+======================================+ | Input resolution | VGA | 720p | | +---------------------+---------------------------+--------------------------------+--------------------------------------+ | | (640x480) | (1280x720) | | +---------------------+---------------------------+--------------------------------+--------------------------------------+ | Output resolution | 80x60 | 160x90 | Native resolution before upscaling | +---------------------+---------------------------+--------------------------------+--------------------------------------+ | Precision | INT8 | INT8 | | +---------------------+---------------------------+--------------------------------+--------------------------------------+ | Batch | 1 | 2 | | +---------------------+---------------------------+--------------------------------+--------------------------------------+ | NN Complexity | :math:`\sim`\ 1GMAC | :math:`\sim`\ 2.5GMAC | | +---------------------+---------------------------+--------------------------------+--------------------------------------+ | NN Parameters | :math:`\sim`\ 100k | | +---------------------+---------------------------+--------------------------------+--------------------------------------+ | Processing speed | :math:`\sim`\ 150 FPS | :math:`\sim`\ 120 FPS | | +---------------------+---------------------------+--------------------------------+--------------------------------------+ | Objects detected | 8 | Transport assets: | +---------------------+---------------------------+--------------------------------+--------------------------------------+ | | | | *aircraft*, *large vehicle* | +---------------------+---------------------------+--------------------------------+--------------------------------------+ | | | | *small vehicle*, *ship* | +---------------------+---------------------------+--------------------------------+--------------------------------------+ | | | | Ground assets: | +---------------------+---------------------------+--------------------------------+--------------------------------------+ | | | | *harbor*, *sport field* | +---------------------+---------------------------+--------------------------------+--------------------------------------+ | | | | *swimming pool*, *storage tank* | +---------------------+---------------------------+--------------------------------+--------------------------------------+ | FPGA model | Arria 10 SX 270 | | +---------------------+---------------------------+--------------------------------+--------------------------------------+ | FPGA DSP blocks | 830 | 2 MAC/DSP block with batch 2 | +---------------------+---------------------------+--------------------------------+--------------------------------------+ | FPGA memory | 2.17MB | | | +---------------------+---------------------------+--------------------------------+--------------------------------------+ | Mem. usage | 1MB | ? | | +---------------------+---------------------------+--------------------------------+--------------------------------------+ | FPGA frequency | 200 MHz | | +---------------------+---------------------------+--------------------------------+--------------------------------------+ | GMAC/s (th.) | 166GMAC/s | 332GMAC/s | | +---------------------+---------------------------+--------------------------------+--------------------------------------+ | GMAC/s (real) | :math:`\sim`\ 150GMAC/s | :math:`\sim`\ 300GMAC/s | | +---------------------+---------------------------+--------------------------------+--------------------------------------+ | MAC/DSP/cycle | 0.9 | 1.8 | DSP usage efficiency | +---------------------+---------------------------+--------------------------------+--------------------------------------+ .. figure:: ../_static/AerialSegNN.png :alt: Neural network used for the application. Neural network used for the application. Application preview ~~~~~~~~~~~~~~~~~~~ The application preview is a web-based interface allowing to freely navigate on a map and see the segmentation result in real time. Its main characteristics are: - Web interface combining the open source *OpenLayers* map visualization API and data from either *IGN-F/Géoportail* or *Microsoft Bing Maps*; - The neural network is run on a server and the segmentation result is updated and displayed automatically at the right of the aerial view, in real time; - The same interface is run on the tablet computer with the aerial view map in full screen, to be send via to the DNeuro via the HDMI interface. To generate the application preview, starting from the learned project in N2D2, create a TensorRT export with the following commands: :: n2d2 MobileNet_DEMO.ini -export CPP_TensorRT -nbbits -32 -db-export 0 cd export_CPP_TensorRT_float32 make WRAPPER_PYTHON=2.7 cp bin/n2d2_tensorRT_inference.so . python generate_model.py Copy the files ``n2d2_tensorRT_inference.so`` and ``n2d2_tensorRT_model.dat`` in the web server location. Start the Python web server: :: ./server.py Open the application preview in a navigator: :: http://127.0.0.1:8888/ .. figure:: ../_static/AerialSegApp.png :alt: Application preview in the navigator. Application preview in the navigator. DNeuro generation ~~~~~~~~~~~~~~~~~ Generate the DNeuro project: :: n2d2 MobileNet_DEMO.ini -export DNeuro_V2 -fuse -w weights_normalized -db-export 10 -export-parameters MobileNet_DEMO_DNeuro.ini -calib -1 -calib-reload cd export_DNeuro_V2_int8 If you do not have a CUDA-capable NVidia GPU installed, you can use instead of . If the calibration was already done once, it is possible to reload the calibration data with the ``-calib-reload`` option. Description of the arguments: +-------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Argument | Description | +===========================================================================================+============================================================================================================================================================================================================+ | ``MobileNet_DEMO.ini`` | INI model | +-------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | ``-export DNeuro_V2`` | Select the DNeuro export type | +-------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | ``-fuse`` | Fuse BatchNorm with Conv automatically | +-------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | ``-w weights_normalized`` | Use normalized weights for the export (the ``weights_normalized`` folder is created after the test). This argument is absolutely necessary to avoid weights saturation when converting to 8 bit integers | +-------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | ``-db-export 10`` | Specifies the number of stimuli to export for the testbench | +-------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | ``-export-parameters MobileNet_DEMO_DNeuro.ini`` | DNeuro parameter file for the export (see section [sec:DNeuroParams]) | +-------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | ``-calib -1`` | Use automatic calibration for the export. Use the full test dataset for the calibration (-1) | +-------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | ``-calib-reload`` | Reload previous calibration data, if it already exists | +-------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ Example of the output: .. code-block:: console ... Generating DNeuro_V2 export to "export_DNeuro_V2_int8": -> Generating network Using automatic configuration for the network. -> Generating emulator network -> Generating cell conv1 -> Generating cell conv1_3x3_dw -> Generating cell conv1_1x1 -> Generating cell conv2_3x3_dw -> Generating cell conv2_1x1 -> Generating cell conv3_3x3_dw -> Generating cell conv3_1x1 -> Generating cell conv4_3x3_dw -> Generating cell conv4_1x1 -> Generating cell conv5_3x3_dw -> Generating cell conv5_1x1 -> Generating cell conv6_3x3_dw -> Generating cell conv6_1x1 -> Generating cell conv7_1_3x3_dw -> Generating cell conv7_1_1x1 -> Generating cell conv7_2_3x3_dw -> Generating cell conv7_2_1x1 -> Generating cell conv7_3_3x3_dw -> Generating cell conv9_1x1 -> Generating cell resize Estimated usage per layer: --conv1-- ... --conv9_1x1-- RTL type: CONV_Tn_Oy_CHy_K1_Sy_Pn Number of MACs: 5529600 Number of affected DSPs: 6 Number of MACs/DSPs: 921600 Memory for weights (bytes): 1152 Memory used for calculations (bytes): 1536 --resize-- RTL type: RESIZE_NEAREST_NEIGHBOUR Memory for weights (bytes): 0 Memory used for calculations (bytes): 0 Total number of MACs: 855187968 Total number of used DSPs: 794 Total memory required for weights: 74.72 KiB Total memory required for calculations: 937.50 KiB Total memory required: 1012.22 KiB Available DSPs on FPGA: 830 Available memory on FPGA: 1953.12 KiB Estimated FPS at 200 Mhz: 162.76 FPS Slowest cell: conv7_2_1x1 Done! Run the network on the emulator: :: cd EMULATOR make Face Detection DEMO ------------------- This demo uses the open-source *AppFaceDetection* application that comes with N2D2. .. figure:: ../_static/FaceDetectionDEMO.jpg :alt: Face detection DEMO preview on IMDB-WIKI images. Face detection DEMO preview on IMDB-WIKI images. The generate the DNeuro, one must change the *IMDBWIKI.ini* file as follows: - Uncomment the ``[database]`` section, in order to be able to perform a calibration on the dataset (the IMDB-WIKI dataset must be present); - Remove the ``[post.Transformation-*]`` sections, which are currently not exportable; - Remove the ``[fc3.gender]`` and ``[fc3.gender.Target]``, as only single-branch networks are currently supported; - Add a resize block after ``[fc3.face]`` and use it as target instead of ``[fc3.face.Target]`` in order to obtain an output of the same size as the input. The end of the *IMDBWIKI.ini* file should look like: .. code-block:: ini [fc3.face] ... [resize] Input=fc3.face Type=Resize NbOutputs=[fc3.face]NbOutputs Mode=NearestNeighbor OutputWidth=[sp]SizeX OutputHeight=[sp]SizeY ConfigSection=resize.config [resize.config] AlignCorners=1 [resize.Target] LabelsMapping=IMDBWIKI_target_face.dat NoDisplayLabel=0 [common.config] ... The DNeuro project can now be generated (it is possible to re-use the export parameter file from the Aerial Imagery Segmentation DEMO): :: n2d2 IMDBWIKI.ini -export DNeuro_V2 -fuse -w weights_normalized -db-export 10 -export-parameters MobileNet_DEMO_DNeuro.ini -calib -1 -calib-reload Example of the output for the *IMDBWIKI.ini* network with 640x480 input resolution and a 1,000 DSP maximum constraint: .. code-block:: console Estimated usage per layer: --conv1.1-- ... --fc3.face-- RTL type: CONV_Tn_Oy_CHy_K1_Sy_Pn Number of MACs: 614400 Number of affected DSPs: 1 Number of MACs/DSPs: 614400 Memory for weights (bytes): 128 Memory used for calculations (bytes): 256 --to_rgb-- RTL type: VALUE_TO_RGB Memory for weights (bytes): 0 Memory used for calculations (bytes): 0 --resize-- RTL type: RESIZE_NEAREST_NEIGHBOUR Memory for weights (bytes): 0 Memory used for calculations (bytes): 0 Total number of MACs: 10102864576 Total number of used DSPs: 937 Total memory required for weights: 404.30 KiB Total memory required for calculations: 1024.66 KiB Total memory required: 1428.95 KiB Available DSPs on FPGA: 1000 Available memory on FPGA: 1953.12 KiB Estimated FPS at 200 Mhz: 18.17 FPS Slowest cell: conv2.2