Export: DNeuro

N2D2-IP only: available upon request.

Export type: DNeuro_V2

DNeuro RTL export for FPGA.

n2d2 MobileNet_ONNX.ini -seed 1 -w /dev/null -export DNeuro_V2

Introduction

DNeuro is a synthesizable dataflow architecture, optimized for deep convolutional neural networks (CNN). It allows fine-grained control over the allocation of the DSP and memory resources for each layer in a network. Globally, the FPGA resource usage can be maximized for a given network topology in order to minimize its latency.

The main features of the DNeuro are:

  • Dataflow architecture requiring little memory (potentially no DDR);

  • Very high DSP usage rate per cycle (> 90%);

  • Configurable precision (integers from 2 to 16 bits, typically 8 bits);

  • Up to 4 MAC/DSP operations per cycle.

The DNeuro is composed of specialized computing blocks, each corresponding to a specific type and configuration of layer (convolution, max pooling…), that can be chained to form a full neural network. The block allocation and chaining are done automatically by N2D2.

Interface

The DNeuro interface is extremely simple and behaves like a pipeline/FIFO: input samples are presented on i_data and validated with i_valid_data, while o_en signals that the network can accept new data; symmetrically, results are presented on o_data with o_valid_data asserted, and the downstream logic enables the output with i_en.

An example of the top-level DNeuro RTL entity is shown below, for one input channel and 3 output channels:

-- Input size: 1*640*480
-- Output size: 3*80*60
library ieee;
use ieee.std_logic_1164.all;

entity network is
generic (
    constant G_BATCH_SIZE: positive := 1;
    constant G_FIFO_DEPTH: positive := 1;
    constant G_DATA_LENGTH: positive := 8;
    constant G_ACC_S_LENGTH: positive := 18;
    constant G_NB_OUTPUTS_INST_N_1_ENV: positive := 1;
    constant G_NB_OUTPUTS_MERG_N_1_ENV: positive := 1
);
port (
    clk             : in std_logic;
    rstn            : in std_logic;
    i_data          : in std_logic_vector ((G_DATA_LENGTH*G_BATCH_SIZE)-1 downto 0);
    i_valid_data    : in std_logic;
    o_en            : out std_logic;
    o_data          : out std_logic_vector ((3*G_DATA_LENGTH*G_BATCH_SIZE)-1 downto 0);
    o_valid_data    : out std_logic;
    i_en            : in std_logic
);
end network;

Supported layers

Supported layer types and their specificities:

  • Dropout: n.a., removed during export
  • Fc: implemented with Conv during export
  • InnerProduct → see Fc
  • Transformation
  • BatchNorm: n.a., merged with Conv during export with the -fuse option
  • Conv
  • Concat → implicit for Conv/Deconv/Pool/Fc
  • Deconv
  • ElemWise: Sum operation only
  • EltWise → see ElemWise
  • Flatten → implicit to Fc/Rbf
  • LRN
  • Maxout → see Pool
  • Padding: merged with Conv/Pool during export
  • Pool: Max operation only
  • Resize: NearestNeighbor mode only
  • Softmax
  • SortLabel → see .Target*
  • Unpool
  • Upscale → see Resize
  • .Target*: top-1 sorting

Supported activation types and their specificities:

  • Linear: saturated arithmetic
  • Logistic: saturation approximation, configurable zero, up to two configurable thresholds
  • ReLU → see Rectifier
  • bReLU → see Rectifier
  • Rectifier: saturated arithmetic (positive values)
  • Saturation
  • Softplus
  • Tanh

Usage

Simulation

When a network is exported, test vectors are automatically exported as well, provided the -db-export command line option value is > 0 (by default, the full test set is exported). All the test vectors are exported for the C++ emulator, whereas only the first image is pre-loaded as a test vector for the RTL simulation, in the RTL/NETWORK/TB/network_tb.vhd file. This testbench is configured with a clock frequency of 100 MHz (regardless of the EstimationFrequency export parameter). The testbench reads the same (first) image 3 times and writes the results to the out_file/out.txt file, located in RTL/NETWORK/simu/VsimTOOL for ModelSim.

cd RTL/NETWORK/simu
make vsim

C++ emulation

The DNeuro export comes with a C++ bit-accurate emulator.

By default, the provided emulator uses the same parameters as those defined in the export. For testing purposes, it is possible to change the accumulation size by defining the ACC_NB_BITS variable:

cd EMULATOR
CXXFLAGS="-DACC_NB_BITS=18" make
./dneuro_v2_emulator

When running the emulator, all the exported images are evaluated by default and a global score is computed from the good or bad classification of each image. It is possible to evaluate a single image with the following command line argument:

./dneuro_v2_emulator -stimulus stimuli/env00.ppm

The stimuli/env00.ppm file is an already pre-processed image, automatically exported by N2D2 and ready to be fed to the input of the neural network. For each layer of the network, the emulator generates an output file layer_name_output.txt containing the output tensor values of the layer, as expected for the DNeuro IP.

Synthesis

To generate a project ready for synthesis in Vivado or Quartus, use the scripts provided in RTL/NETWORK/simu/PythonTOOL. To generate a Vivado project, run:

cd RTL/NETWORK/simu
python PythonTOOL/vivadoGenerate.py

This script creates a new project in RTL/NETWORK/simu/VivadoTOOL/project_export_DNeuro. Do not forget to change the project's default part.

Warning

Do not create a project and add the sources manually, as the organization of the sources into libraries would not be set up properly: the sources in the directories CONV_COMMON, CONV_Tn_Oy_CHy_K1_Sy_P1 and CONV_Tn_Oy_CHy_K1_Sy_Pn must be placed in libraries of the same name!

Export parameters

Extra parameters can be passed during export using the -export-parameters params.ini command line argument. The parameters must be saved in an INI-like file.

List of available general parameters (default value in brackets):

  • NbDSPs: Set the maximum number of DSPs that the network can use on the FPGA
  • NbMemoryBytes: Set the maximum memory, in bytes, that the network can use on the FPGA
  • Network [network]: Name of the top-level HDL entity
  • EstimationFrequency [200]: Frequency used for the FPS estimation given by the export
  • AccumulationNbBits [2×DATA_LENGTH+4]: Number of bits to use for the accumulation
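For example, a minimal parameter file targeting an Arria 10 SX 270 could look like the following sketch, assuming the usual N2D2 Key=Value INI syntax (the values are illustrative, taken from the demo figures later in this section):

; params.ini -- illustrative values, to be adapted to the target FPGA
NbDSPs=830
NbMemoryBytes=2000000
Network=network
EstimationFrequency=200
AccumulationNbBits=18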

Output map class conversion to RGB settings (default value in brackets):

  • OutputMapToRGB [0]: If true (1), add an extra layer at the end of the network that converts the output of the network to an RGB output
  • OutputMapToRGBBackgroundClass []: When OutputMapToRGB is 1, set the class that is used for background objects. The overlay color for this class will be transparent
  • OutputMapToRGBColorMasks []: When OutputMapToRGB is 1, list of colors to use for the classes
  • OutputMapToRGBBinaryThresholdUpper [0]: Upper threshold for binary outputs
  • OutputMapToRGBBinaryThresholdLower [0]: Lower threshold for binary outputs
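For example, enabling the conversion layer might look as follows (hedged sketch: whether OutputMapToRGBBackgroundClass expects a class index or a class name is not specified here, a class index is assumed):

; Enable the RGB conversion layer (illustrative values)
OutputMapToRGB=1
OutputMapToRGBBackgroundClass=0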

Internal per-layer settings (for debug purposes only; default value in brackets):

  • RTLType []: Specific name of the RTL library module to use for this layer
  • NbChannelsInstantiation []: Specific number of channels to instantiate
  • NbOutputsInstantiation []: Specific number of outputs to instantiate
  • KernelHeightInstantiation []: Specific kernel height to instantiate
  • KernelWidthInstantiation []: Specific kernel width to instantiate
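Since these settings are per-layer, they presumably go in a section named after the cell in the parameter file; the sketch below assumes that convention (the section name and values are hypothetical, the RTL type name is taken from the export log shown later in this section):

; Hypothetical per-layer override for a cell named conv1
[conv1]
RTLType=CONV_Tn_Oy_CHy_K1_Sy_Pn
NbOutputsInstantiation=8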

FPGA compatibility tables

Legend: a • mark means that the network should be OK for the standard 224x224 input, but this depends on the resolution; it may require using also the UltraRAM (Xilinx FPGAs only), and the M20K memory may be insufficient depending on the resolution. For some entries, there is a better equivalent neural network (see the same column) or an alternative neural network is possible with a small accuracy loss.
Arria 10

Neural networks compatibility table with DNeuro, in terms of memory requirement:

Arria 10            GX/SX  GX/SX  GX/SX  GX/SX  GX/SX  GX/SX  GX/SX  GX     GX
                    160    220    270    320    480    570    660    900    1150
M20K (MB)           1.12   1.37   1.87   2.12   3.5    4.37   5.25   5.87   6.62
DSP                 156    191    830    985    1,368  1,523  1,688  1,518  1,518
Mult. (MAC/c.)      312    382    1,660  1,970  2,736  3,046  3,376  3,036  3,036
MobileNet_v1_0.25   •      •      •      •      •      •      •      •      •
MobileNet_v1_0.5                                •      •      •      •      •
MobileNet_v1_0.75                                             •      •      •
MobileNet_v1_1.0
SqueezeNet_v1.0                                 •      •      •      •      •
SqueezeNet_v1.1                                 •      •      •      •      •
MobileNet_v2_0.35                               •      •      •      •      •
MobileNet_v2_0.5                                       •      •      •      •
MobileNet_v2_0.75                                             •      •      •
MobileNet_v2_1.0
MobileNet_v2_1.3
MobileNet_v2_1.4
AlexNet
VGG-16
GoogLeNet
ResNet-18
ResNet-34
ResNet-50

Stratix 10

Neural networks compatibility table with DNeuro, in terms of memory requirement:

Stratix 10          GX/SX  GX/SX  GX/SX  GX/SX  GX/SX  GX/SX  GX/SX  GX/SX
                    400    650    850    1100   1650   2100   2500   2800
M20K (MB)           3.75   6.12   8.5    13.37  14.25  15.87  24.37  28.62
DSP                 648    1,152  2,016  2,592  3,145  3,744  5,011  5,760
Mult. (MAC/c.)      1,296  2,304  4,032  5,184  6,290  7,488  10,022 11,520
MobileNet_v1_0.25   •      •      •      •      •      •      •      •
MobileNet_v1_0.5    •      •      •      •      •      •      •      •
MobileNet_v1_0.75          •      •      •      •      •      •      •
MobileNet_v1_1.0                  •      •      •      •      •      •
SqueezeNet_v1.0     •      •      •      •      •      •      •      •
SqueezeNet_v1.1     •      •      •      •      •      •      •      •
MobileNet_v2_0.35   •      •      •      •      •      •      •      •
MobileNet_v2_0.5           •      •      •      •      •      •      •
MobileNet_v2_0.75          •      •      •      •      •      •      •
MobileNet_v2_1.0                  •      •      •      •      •      •
MobileNet_v2_1.3                         •      •      •      •      •
MobileNet_v2_1.4                         •      •      •      •      •
AlexNet
VGG-16
GoogLeNet                                •      •      •      •
ResNet-18                                              •      •
ResNet-34
ResNet-50

Zynq UltraScale+

Neural networks compatibility table with DNeuro, in terms of memory requirement:

Zynq UltraScale+    ZU2    ZU3    ZU4    ZU5    ZU6    ZU7    ZU9    ZU11   ZU15   ZU17   ZU19
                    EG     EG     EG     EG     EG     EG     EG     EG     EG     EG     EG
BRAM (MB)           0.66   0.95   0.56   1.02   3.13   1.37   4.01   2.63   3.27   3.5    4.32
UltraRAM (MB)                     1.68   2.25          3.37          2.81   3.93   3.58   4.5
Total RAM (MB)      0.66   0.95   2.24   3.27   3.12   4.74   4.01   5.44   7.2    7.08   8.82
DSP                 240    360    728    1,248  1,973  1,728  2,520  2,928  3,528  1,590  1,968
Mult. (MAC/c.)      480    720    1,456  2,496  3,946  3,456  5,040  5,856  7,056  3,180  3,936
MobileNet_v1_0.25                 •      •      •      •      •      •      •      •      •
MobileNet_v1_0.5                                       •      •      •      •      •      •
MobileNet_v1_0.75
MobileNet_v1_1.0
SqueezeNet_v1.0                                               •      •      •      •      •
SqueezeNet_v1.1                                               •      •      •      •      •
MobileNet_v2_0.35                                                    •      •      •      •
MobileNet_v2_0.5                                                                   •      •
MobileNet_v2_0.75
MobileNet_v2_1.0
MobileNet_v2_1.3
MobileNet_v2_1.4
AlexNet
VGG-16
GoogLeNet
ResNet-18
ResNet-34
ResNet-50

Kintex UltraScale+

Neural networks compatibility table with DNeuro, in terms of memory requirement:

Kintex UltraScale+  KU3P   KU5P   KU9P   KU11P  KU13P  KU15P
BRAM (MB)           1.58   2.11   4.01   2.63   3.27   4.32
UltraRAM (MB)       1.68   2.25          2.81   3.93   4.5
Total RAM (MB)      3.26   4.36   4.01   5.44   7.2    8.82
DSP                 1,368  1,825  2,520  2,928  3,528  1,968
Mult. (MAC/c.)      2,736  3,650  5,040  5,856  7,056  3,936
MobileNet_v1_0.25   •      •      •      •      •      •
MobileNet_v1_0.5                  •      •      •      •
MobileNet_v1_0.75
MobileNet_v1_1.0
SqueezeNet_v1.0                   •      •      •      •
SqueezeNet_v1.1                   •      •      •      •
MobileNet_v2_0.35                        •      •      •
MobileNet_v2_0.5                                •      •
MobileNet_v2_0.75
MobileNet_v2_1.0
MobileNet_v2_1.3
MobileNet_v2_1.4
AlexNet
VGG-16
GoogLeNet
ResNet-18
ResNet-34
ResNet-50

Aerial Imagery Segmentation DEMO

Specifications

Specifications of the Aerial Imagery Segmentation DEMO:

Feature             DEMO             Max.             Description
Input resolution    VGA (640x480)    720p (1280x720)
Output resolution   80x60            160x90           Native resolution before upscaling
Precision           INT8             INT8
Batch               1                2
NN Complexity       ~1 GMAC          ~2.5 GMAC
NN Parameters       ~100k
Processing speed    ~150 FPS         ~120 FPS
Objects detected    8                                 Transport assets: aircraft, large vehicle,
                                                      small vehicle, ship. Ground assets: harbor,
                                                      sport field, swimming pool, storage tank
FPGA model          Arria 10 SX 270
FPGA DSP blocks     830                               2 MAC/DSP block with batch 2
FPGA memory         2.17 MB
Mem. usage          1 MB             ?
FPGA frequency      200 MHz
GMAC/s (th.)        166 GMAC/s       332 GMAC/s
GMAC/s (real)       ~150 GMAC/s      ~300 GMAC/s
MAC/DSP/cycle       0.9              1.8              DSP usage efficiency

Neural network used for the application.

Application preview

The application preview is a web-based interface that allows the user to freely navigate on a map and see the segmentation result in real time. Its main characteristics are:

  • Web interface combining the open-source OpenLayers map visualization API with data from either IGN-F/Géoportail or Microsoft Bing Maps;

  • The neural network runs on a server and the segmentation result is updated and displayed automatically to the right of the aerial view, in real time;

  • The same interface runs on the tablet computer with the aerial view map in full screen, to be sent to the DNeuro via the HDMI interface.

To generate the application preview, starting from the learned project in N2D2, create a TensorRT export with the following commands:

n2d2 MobileNet_DEMO.ini -export CPP_TensorRT -nbbits -32 -db-export 0
cd export_CPP_TensorRT_float32
make WRAPPER_PYTHON=2.7
cp bin/n2d2_tensorRT_inference.so .
python generate_model.py

Copy the files n2d2_tensorRT_inference.so and n2d2_tensorRT_model.dat to the web server location.

Start the Python web server:

./server.py

Open the application preview in a browser:

http://127.0.0.1:8888/

Application preview in the browser.

DNeuro generation

Generate the DNeuro project:

n2d2 MobileNet_DEMO.ini -export DNeuro_V2 -fuse -w weights_normalized -db-export 10 -export-parameters MobileNet_DEMO_DNeuro.ini -calib -1 -calib-reload
cd export_DNeuro_V2_int8

If you do not have a CUDA-capable NVidia GPU installed, you can use instead of .

If the calibration was already done once, it is possible to reload the calibration data with the -calib-reload option.

Description of the arguments:

  • MobileNet_DEMO.ini: INI model
  • -export DNeuro_V2: select the DNeuro export type
  • -fuse: fuse BatchNorm with Conv automatically
  • -w weights_normalized: use normalized weights for the export (the weights_normalized folder is created after the test). This argument is absolutely necessary to avoid weight saturation when converting to 8-bit integers
  • -db-export 10: number of stimuli to export for the testbench
  • -export-parameters MobileNet_DEMO_DNeuro.ini: DNeuro parameter file for the export (see section [sec:DNeuroParams])
  • -calib -1: use automatic calibration for the export, with the full test dataset (-1)
  • -calib-reload: reload previous calibration data, if it already exists

Example of the output:

...

Generating DNeuro_V2 export to "export_DNeuro_V2_int8":
-> Generating network
Using automatic configuration for the network.
-> Generating emulator network
-> Generating cell conv1
-> Generating cell conv1_3x3_dw
-> Generating cell conv1_1x1
-> Generating cell conv2_3x3_dw
-> Generating cell conv2_1x1
-> Generating cell conv3_3x3_dw
-> Generating cell conv3_1x1
-> Generating cell conv4_3x3_dw
-> Generating cell conv4_1x1
-> Generating cell conv5_3x3_dw
-> Generating cell conv5_1x1
-> Generating cell conv6_3x3_dw
-> Generating cell conv6_1x1
-> Generating cell conv7_1_3x3_dw
-> Generating cell conv7_1_1x1
-> Generating cell conv7_2_3x3_dw
-> Generating cell conv7_2_1x1
-> Generating cell conv7_3_3x3_dw
-> Generating cell conv9_1x1
-> Generating cell resize

Estimated usage per layer:
--conv1--

...

--conv9_1x1--
RTL type: CONV_Tn_Oy_CHy_K1_Sy_Pn
Number of MACs: 5529600
Number of affected DSPs: 6
Number of MACs/DSPs: 921600
Memory for weights (bytes): 1152
Memory used for calculations (bytes): 1536

--resize--
RTL type: RESIZE_NEAREST_NEIGHBOUR
Memory for weights (bytes): 0
Memory used for calculations (bytes): 0


Total number of MACs: 855187968
Total number of used DSPs: 794
Total memory required for weights: 74.72 KiB
Total memory required for calculations: 937.50 KiB
Total memory required: 1012.22 KiB

Available DSPs on FPGA: 830
Available memory on FPGA: 1953.12 KiB

Estimated FPS at 200 Mhz: 162.76 FPS
Slowest cell: conv7_2_1x1


Done!
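Since the DNeuro is a dataflow architecture, all cells process the stream concurrently and the frame rate is set by the slowest cell. As a rough sanity check of the figures above, assuming one MAC per DSP per cycle, each cell needs about:

    cycles per image = (number of MACs) / (number of DSPs)
    FPS = clock frequency / cycles per image

For conv9_1x1, 5,529,600 MACs on 6 DSPs give 921,600 cycles per image, i.e. about 217 FPS at 200 MHz; the reported 162.76 FPS corresponds to the slowest cell, conv7_2_1x1, which therefore needs about 200e6 / 162.76 ≈ 1,228,800 cycles per image.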

Run the network on the emulator:

cd EMULATOR
make

Face Detection DEMO

This demo uses the open-source AppFaceDetection application that comes with N2D2.

Face detection DEMO preview on IMDB-WIKI images.

To generate the DNeuro, one must change the IMDBWIKI.ini file as follows:

  • Uncomment the [database] section, in order to be able to perform a calibration on the dataset (the IMDB-WIKI dataset must be present);

  • Remove the [post.Transformation-*] sections, which are currently not exportable;

  • Remove the [fc3.gender] and [fc3.gender.Target] sections, as only single-branch networks are currently supported;

  • Add a resize block after [fc3.face] and use it as target instead of [fc3.face.Target] in order to obtain an output of the same size as the input.

The end of the IMDBWIKI.ini file should look like:

[fc3.face]
...

[resize]
Input=fc3.face
Type=Resize
NbOutputs=[fc3.face]NbOutputs
Mode=NearestNeighbor
OutputWidth=[sp]SizeX
OutputHeight=[sp]SizeY
ConfigSection=resize.config
[resize.config]
AlignCorners=1

[resize.Target]
LabelsMapping=IMDBWIKI_target_face.dat
NoDisplayLabel=0

[common.config]
...

The DNeuro project can now be generated (it is possible to re-use the export parameter file from the Aerial Imagery Segmentation DEMO):

n2d2 IMDBWIKI.ini -export DNeuro_V2 -fuse -w weights_normalized -db-export 10 -export-parameters MobileNet_DEMO_DNeuro.ini -calib -1 -calib-reload

Example of the output for the IMDBWIKI.ini network with 640x480 input resolution and a 1,000 DSP maximum constraint:

Estimated usage per layer:
--conv1.1--

...

--fc3.face--
RTL type: CONV_Tn_Oy_CHy_K1_Sy_Pn
Number of MACs: 614400
Number of affected DSPs: 1
Number of MACs/DSPs: 614400
Memory for weights (bytes): 128
Memory used for calculations (bytes): 256

--to_rgb--
RTL type: VALUE_TO_RGB
Memory for weights (bytes): 0
Memory used for calculations (bytes): 0

--resize--
RTL type: RESIZE_NEAREST_NEIGHBOUR
Memory for weights (bytes): 0
Memory used for calculations (bytes): 0


Total number of MACs: 10102864576
Total number of used DSPs: 937
Total memory required for weights: 404.30 KiB
Total memory required for calculations: 1024.66 KiB
Total memory required: 1428.95 KiB

Available DSPs on FPGA: 1000
Available memory on FPGA: 1953.12 KiB

Estimated FPS at 200 Mhz: 18.17 FPS
Slowest cell: conv2.2