Neural Network Accelerator (NPU/NNE)

CPU core	Clock	core	fp32	fp16	bfloat16	int16	int8	int4
Qualcomm Snapdragon 835 CPU Kryo 280	2.45 + 1.9GHz	4+4	0.139 TFLOPS
Qualcomm Snapdragon 845 CPU Kryo 385	2.8 + 1.77GHz	4+4	0.146 TFLOPS	0.29 TFLOPS
Intel Core i9-9900K	3.6GHz	8	0.922 TFLOPS
AMD Ryzen 9 3950X	3.5GHz	16	1.792 TFLOPS
Mobile NPU/NNE	Clock	core	fp32	fp16	bfloat16	int16	int8	int4
Apple iPhone X Apple A11 Bionic NE							0.6 TOPS
Apple iPhone XS Apple A12 Bionic NE							5 TOPS
Google Pixel 2 VisualCore						3 TOPS ?
Google Pixel 3 VisualCore						3 TOPS ?
Huawei P20 Pro Kirin 970 NPU				1.92 TFLOPS ?
Huawei P30 Pro Kirin 980 NPU				4.22 TFLOPS ?
Samsung Exynos 9820 NPU
Google Edge TPU						4 TOPS
Intel Movidius Compute Stick Myriad 2 VPU
Intel Neural Compute Stick 2 Myriad X VPU							4 TOPS ?
GPU core	Clock	core	fp32	fp16	bfloat16	int16	int8	int4
Google Coral Dev Board (NXP i.MX 8M) Vivante GC7000Lite	1.0GHz?	16sp?	0.032 TFLOPS	0.064 TFLOPS
NVIDIA Jetson Nano Tegra X1 Maxwell	0.92GHz	128sp	0.236 TFLOPS	0.472 TFLOPS
AMD RADEON Vega 56	1.47GHz	3584sp	10.54 TFLOPS	21.09 TFLOPS			42.18 TOPS	84.35 TOPS
AMD RADEON Vega 64	1.55GHz	4096sp	12.67 TFLOPS	25.33 TFLOPS			50.66 TOPS	101.32 TOPS
AMD RADEON VII	1.75GHz	3840sp	13.8 TFLOPS	27.7 TFLOPS			55.3 TOPS	110.7 TOPS
NVIDIA GeForce RTX 2060 (Turing)	1.68GHz	240tc 1920sp	6.45 TFLOPS	51.6 (12.90) TFLOPS			103.2 TOPS	206.4 TOPS
NVIDIA GeForce RTX 2060 Super (Turing)	1.65GHz	272tc 2176sp	7.18 TFLOPS	57.4 (14.36) TFLOPS			114.9 TOPS	229.8 TOPS
NVIDIA GeForce RTX 2070 (Turing)	1.62GHz	288tc 2304sp	7.46 TFLOPS	59.7 (14.93) TFLOPS			119.4 TOPS	238.9 TOPS
NVIDIA GeForce RTX 2070 Super (Turing)	1.77GHz	320tc 2560sp	9.06 TFLOPS	72.5 (18.12) TFLOPS			145.0 TOPS	290.0 TOPS
NVIDIA GeForce RTX 2080 (Turing)	1.71GHz	368tc 2944sp	10.07 TFLOPS	80.5 (20.14) TFLOPS			161.1 TOPS	322.2 TOPS
NVIDIA GeForce RTX 2080 Super (Turing)	1.81GHz	384tc 3072sp	11.15 TFLOPS	89.2 (22.3) TFLOPS			178.4 TOPS	356.8 TOPS
NVIDIA GeForce RTX 2080 Ti (Turing)	1.55GHz	544tc 4352sp	13.45 TFLOPS	107.6 (26.9) TFLOPS			215.2 TOPS	430.3 TOPS
NVIDIA Quadro RTX 4000 (Turing)	1.55GHz	288tc 2304sp	7.12 TFLOPS	57.0 (14.2) TFLOPS			113.9 TOPS	227.8 TOPS
NVIDIA Quadro RTX 5000 (Turing)	1.81GHz	384tc 3072sp	11.15 TFLOPS	89.2 (22.3) TFLOPS			178.4 TOPS	356.8 TOPS
NVIDIA Quadro RTX 6000/8000 (Turing)	1.77GHz	576tc 4608sp	16.31 TFLOPS	130.5 (32.6) TFLOPS			261.0 TOPS	522.0 TOPS
NVIDIA TITAN V (Volta)	1.46GHz	640tc 5120sp	14.90 TFLOPS	119.2 (29.8) TFLOPS
NVIDIA Tesla V100 (Volta)	1.53GHz	640tc 5120sp	15.67 TFLOPS	125.3 (31.3) TFLOPS
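
The fp32 figures in the table above appear to follow the usual theoretical-peak rule: units * FLOPs per unit per clock * clock. The sketch below reproduces a few rows under that assumption. The per-clock factors (2 for one FMA per shader processor, 32 for a desktop core with two 256-bit FMA units) are not stated on this page; they are assumptions that happen to match the listed values. The helper name `peak_tflops` is made up for illustration.

```python
def peak_tflops(units: int, flops_per_clock: int, clock_ghz: float) -> float:
    """Theoretical peak in TFLOPS: units * FLOPs per unit per clock * clock (GHz)."""
    return units * flops_per_clock * clock_ghz / 1000.0


# GPU: assume each shader processor (sp) issues one fp32 FMA = 2 FLOPs per clock.
rtx2060_fp32 = peak_tflops(1920, 2, 1.68)

# CPU: assume two 256-bit FMA units per core = 2 units * 8 lanes * 2 FLOPs = 32 per clock.
i9_9900k_fp32 = peak_tflops(8, 32, 3.6)
ryzen_3950x_fp32 = peak_tflops(16, 32, 3.5)

print(f"RTX 2060      {rtx2060_fp32:.2f} TFLOPS")
print(f"Core i9-9900K {i9_9900k_fp32:.3f} TFLOPS")
print(f"Ryzen 9 3950X {ryzen_3950x_fp32:.3f} TFLOPS")
```

These are peak numbers at the listed clock only; sustained throughput depends on memory bandwidth, thermals, and the actual instruction mix.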

NVIDIA TensorCore

Volta
Turing

1 TensorCore = 64 FMA (mad) per clock = 128 FLOPs per clock, so GFLOPS = TensorCore count * 128 * clock (GHz)
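
A minimal sketch of the formula above, checked against a few rows of the GPU table. The function name and the precision multipliers are illustrative assumptions: the int8 (x2) and int4 (x4) factors are simply read off the Turing rows of the table, and the Volta rows list no int8/int4 figures.

```python
FMA_PER_TENSOR_CORE = 64  # from the text: 1 TensorCore = 64 mad per clock
FLOPS_PER_FMA = 2         # one multiply + one add

# Throughput relative to fp16, as read off the Turing rows of the table (assumption).
PRECISION_FACTOR = {"fp16": 1, "int8": 2, "int4": 4}


def tensor_core_tops(tensor_cores: int, clock_ghz: float, precision: str = "fp16") -> float:
    """Theoretical tensor-core peak in TFLOPS (fp16) or TOPS (int8/int4)."""
    per_clock = tensor_cores * FMA_PER_TENSOR_CORE * FLOPS_PER_FMA
    return per_clock * clock_ghz * PRECISION_FACTOR[precision] / 1000.0


print(f"RTX 2060   fp16 {tensor_core_tops(240, 1.68):.1f} TFLOPS")
print(f"RTX 2060   int8 {tensor_core_tops(240, 1.68, 'int8'):.1f} TOPS")
print(f"Tesla V100 fp16 {tensor_core_tops(640, 1.53):.1f} TFLOPS")
```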

SBC

SBC	SoC	CPU core	core	CPU clock	GPU	sp	GPU clock	GPU fp32	GPU fp16	NPU	NPU int16	RAM	MEM B/W	ROM
Coral Dev Board	NXP i.MX 8M	Cortex-A53	4	1.5 GHz	Vivante GC7000 Lite	16 sp	1.0 GHz	32 GFLOPS	64 GFLOPS	Edge TPU	4 TOPS	LPDDR4-3200 1GB	32bit 12.8 GB/s	eMMC 8GB
NVIDIA Jetson Nano	Tegra X1	Cortex-A57	4	1.4 GHz	Maxwell	128 sp	0.92 GHz	236 GFLOPS	472 GFLOPS	–	–	LPDDR4-3200 4GB	64bit 25.6 GB/s	eMMC 16GB
Raspberry Pi 4	BCM2711	Cortex-A72	4	1.5 GHz	VideoCore VI	? sp	0.5 GHz	? GFLOPS		–	–	LPDDR4-2400 4GB	?bit ? GB/s	–
Raspberry Pi 3+	BCM2837B0	Cortex-A53	4	1.4 GHz	VideoCore IV	48 sp	0.3 GHz	28.8 GFLOPS		–	–	LPDDR2-900 1GB	32bit 3.6 GB/s	–
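
The MEM B/W column is consistent with bus width (bytes) * transfer rate (MT/s). A small sketch, assuming the number in the memory name (e.g. LPDDR4-3200) is the transfer rate in MT/s; the helper name `mem_bandwidth_gbs` is made up for illustration.

```python
def mem_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: float) -> float:
    """Theoretical peak memory bandwidth in GB/s.

    bus_width_bits / 8 gives bytes per transfer; transfer_rate_mts is in MT/s.
    """
    return bus_width_bits / 8 * transfer_rate_mts / 1000.0


coral_dev_board = mem_bandwidth_gbs(32, 3200)   # LPDDR4-3200, 32bit
jetson_nano = mem_bandwidth_gbs(64, 3200)       # LPDDR4-3200, 64bit
raspberry_pi_3p = mem_bandwidth_gbs(32, 900)    # LPDDR2-900, 32bit

print(f"Coral Dev Board  {coral_dev_board:.1f} GB/s")
print(f"Jetson Nano      {jetson_nano:.1f} GB/s")
print(f"Raspberry Pi 3+  {raspberry_pi_3p:.1f} GB/s")
```

The Raspberry Pi 4 row is left out because its bus width is unknown in the table.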
