Neural Network Accelerator (NPU/NNE)
CPU core | Clock | core | fp32 | fp16 | bfloat16 | int16 | int8 | int4 |
---|---|---|---|---|---|---|---|---|
Qualcomm Snapdragon 835 CPU Kryo 280 | 2.45 + 1.9GHz | 4+4 | 0.139 TFLOPS | |||||
Qualcomm Snapdragon 845 CPU Kryo 385 | 2.8 + 1.77GHz | 4+4 | 0.146 TFLOPS | 0.29 TFLOPS | ||||
Intel Core i9-9900K | 3.6GHz | 8 | 0.922 TFLOPS | |||||
AMD Ryzen 9 3950X | 3.5GHz | 16 | 1.792 TFLOPS | |||||
Mobile NPU/NNE | Clock | core | fp32 | fp16 | bfloat16 | int16 | int8 | int4 |
Apple iPhone X Apple A11 Bionic NE | 0.6 TOPS | |||||||
Apple iPhone XS Apple A12 Bionic NE | 5 TOPS | |||||||
Google Pixel 2 VisualCore | 3 TOPS ? | |||||||
Google Pixel 3 VisualCore | 3 TOPS ? | |||||||
Huawei P20 Pro Kirin 970 NPU | 1.92 TFLOPS ? | |||||||
Huawei P30 Pro Kirin 980 NPU | 4.22 TFLOPS ? | |||||||
Samsung Exynos 9820 NPU | ||||||||
Google Edge TPU | 4 TOPS | |||||||
Intel Movidius Compute Stick Myriad 2 VPU | ||||||||
Intel Neural Compute Stick 2 Myriad X VPU | 4 TOPS ? | |||||||
GPU core | Clock | core | fp32 | fp16 | bfloat16 | int16 | int8 | int4 |
Google Edge TPU Vivante GC7000Lite | 1.0GHz? | 16sp? | 0.032 TFLOPS | 0.064 TFLOPS | ||||
NVIDIA Jetson Nano Tegra X1 Maxwell | 0.92GHz | 128sp | 0.236 TFLOPS | 0.472 TFLOPS | ||||
AMD RADEON Vega 56 | 1.47GHz | 3584sp | 10.54 TFLOPS | 21.09 TFLOPS | | | 42.18 TOPS | 84.35 TOPS |
AMD RADEON Vega 64 | 1.55GHz | 4096sp | 12.67 TFLOPS | 25.33 TFLOPS | | | 50.66 TOPS | 101.32 TOPS |
AMD RADEON VII | 1.75GHz | 3840sp | 13.8 TFLOPS | 27.7 TFLOPS | | | 55.3 TOPS | 110.7 TOPS |
NVIDIA GeForce RTX 2060 (Turing) | 1.68GHz | 240tc 1920sp | 6.45 TFLOPS | 51.6 (12.90) TFLOPS | | | 103.2 TOPS | 206.4 TOPS |
NVIDIA GeForce RTX 2060 Super (Turing) | 1.65GHz | 272tc 2176sp | 7.18 TFLOPS | 57.4 (14.36) TFLOPS | | | 114.9 TOPS | 229.8 TOPS |
NVIDIA GeForce RTX 2070 (Turing) | 1.62GHz | 288tc 2304sp | 7.46 TFLOPS | 59.7 (14.93) TFLOPS | | | 119.4 TOPS | 238.9 TOPS |
NVIDIA GeForce RTX 2070 Super (Turing) | 1.77GHz | 320tc 2560sp | 9.62 TFLOPS | 72.5 (18.12) TFLOPS | | | 145.0 TOPS | 290.0 TOPS |
NVIDIA GeForce RTX 2080 (Turing) | 1.71GHz | 368tc 2944sp | 10.07 TFLOPS | 80.5 (20.14) TFLOPS | | | 161.1 TOPS | 322.2 TOPS |
NVIDIA GeForce RTX 2080 Super (Turing) | 1.81GHz | 384tc 3072sp | 11.15 TFLOPS | 89.2 (22.3) TFLOPS | | | 178.4 TOPS | 356.8 TOPS |
NVIDIA GeForce RTX 2080 Ti (Turing) | 1.55GHz | 544tc 4352sp | 13.45 TFLOPS | 107.6 (26.9) TFLOPS | | | 215.2 TOPS | 430.3 TOPS |
NVIDIA Quadro RTX 4000 (Turing) | 1.55GHz | 288tc 2304sp | 7.12 TFLOPS | 57.0 (14.2) TFLOPS | | | 113.9 TOPS | 227.8 TOPS |
NVIDIA Quadro RTX 5000 (Turing) | 1.81GHz | 384tc 3072sp | 11.15 TFLOPS | 89.2 (22.3) TFLOPS | | | 178.4 TOPS | 356.8 TOPS |
NVIDIA Quadro RTX 6000/8000 (Turing) | 1.77GHz | 576tc 4608sp | 16.31 TFLOPS | 130.5 (32.6) TFLOPS | | | 261.0 TOPS | 522.0 TOPS |
NVIDIA TITAN V (Volta) | 1.46GHz | 640tc 5120sp | 14.90 TFLOPS | 119.2 (29.8) TFLOPS | ||||
NVIDIA Tesla V100 (Volta) | 1.53GHz | 640tc 5120sp | 15.67 TFLOPS | 125.3 (31.3) TFLOPS | ||||
References
Related
NVIDIA TensorCore
- Volta
- Turing
1 TensorCore = 64 MAD (multiply-add) per clock = 128 operations per clock, so fp16 GFLOPS = TensorCore count * 128 * clock (GHz)
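The rule above can be checked against the GPU table with a few lines of Python. This is only a minimal sketch: the function name `tensor_fp16_tflops` is made up for illustration, and it assumes every Tensor Core sustains 64 multiply-adds (counted as 128 operations) on every clock at the listed frequency, so the result is a theoretical peak rather than a measured figure.

```python
def tensor_fp16_tflops(tensor_cores: int, clock_ghz: float) -> float:
    """Theoretical peak fp16 Tensor Core throughput in TFLOPS.

    Assumption: 64 multiply-adds per Tensor Core per clock, each
    multiply-add counted as 2 operations (128 operations per clock).
    """
    gflops = tensor_cores * 128 * clock_ghz
    return gflops / 1000.0


# Tensor Core counts and clocks (GHz) copied from the GPU table.
gpus = {
    "GeForce RTX 2060": (240, 1.68),
    "GeForce RTX 2080": (368, 1.71),
    "Tesla V100": (640, 1.53),
}

for name, (tc, ghz) in gpus.items():
    print(f"{name}: {tensor_fp16_tflops(tc, ghz):.1f} TFLOPS")
```

The clocks in the table are rounded to two decimals, so for some rows (for example the RTX 2080 Ti) the recomputed value can differ from the listed one in the last digit.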
SBC
SBC | SoC | CPU core | ISA | core | CPU clock | CPU fp32 | GPU | GPU API | sp | GPU clock | GPU fp32 | GPU fp16 | ROP | NPU | NPU perf | RAM | MEM B/W | ROM | price |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Coral Dev Board | NXP i.MX 8M | Cortex-A53 | ARMv8.0A | 4 | 1.5 GHz | 48 GFLOPS | Vivante GC7000 Lite | ES3.x | 16 sp | 1.0 GHz | 32 GFLOPS | 64 GFLOPS | 1 | Edge TPU | 4 TOPS | LPDDR4-3200 1GB | 32bit 12.8 GB/s | eMMC 8GB | $150 |
ASUS Tinker Edge T | NXP i.MX 8M | Cortex-A53 | ARMv8.0A | 4 | 1.5 GHz | 48 GFLOPS | Vivante GC7000 Lite | ES3.x | 16 sp | 1.0 GHz | 32 GFLOPS | 64 GFLOPS | 1 | Edge TPU | 4 TOPS | LPDDR4-3200 1GB | 32bit 12.8 GB/s | eMMC 8GB | |
ASUS Tinker Edge R | RK3399Pro | Cortex-A72+A53 | ARMv8.0A | 2+4 | 1.8+1.4 GHz | GFLOPS | Mali-T860MP4 | ES3.x | sp | 800 MHz | GFLOPS | GFLOPS | | NPU | 3 TOPS | LPDDR4 4+2GB | 64bit GB/s | eMMC 16GB | |
NVIDIA Jetson Nano (DevKit) | Tegra X1 | Cortex-A57 | ARMv8.0A | 4 | 1.43 GHz | 46 GFLOPS | Maxwell | ES3.2/GL4.6/Vulkan/CUDA | 128 sp | 922 MHz | 236 GFLOPS | 472 GFLOPS | 16 | – | – | LPDDR4-3200 4GB | 64bit 25.6 GB/s | – | $99 |
NVIDIA Jetson Xavier NX | Xavier NX | Carmel | ARMv8.2A | 6 | 1.4-1.9 GHz | GFLOPS | Volta | ES3.2/GL4.6/Vulkan/CUDA | 384 sp | 1100 MHz | 844.8 GFLOPS | 1689.6 GFLOPS | | Tensor Core | 21 TOPS | LPDDR4-3200 8GB | 128bit 51.2 GB/s | eMMC 16GB | $399 |
Raspberry Pi 4B | BCM2711 | Cortex-A72 | ARMv8.0A | 4 | 1.5 GHz | 48 GFLOPS | VideoCore VI | ES3.x | sp | 500 MHz | GFLOPS | GFLOPS | | – | – | LPDDR4-2400 1-8GB | 32bit 9.6 GB/s | – | $35-75 |
Raspberry Pi 3B+ | BCM2837B0 | Cortex-A53 | ARMv8.0A | 4 | 1.4 GHz | 45 GFLOPS | VideoCore IV | ES2.0 | 48 sp | 300 MHz | 28.8 GFLOPS | – | 4 | – | – | LPDDR2-900 1GB | 32bit 3.6 GB/s | – | $35 |
Raspberry Pi 3B | BCM2837 | Cortex-A53 | ARMv8.0A | 4 | 1.2 GHz | 38 GFLOPS | VideoCore IV | ES2.0 | 48 sp | 300 MHz | 28.8 GFLOPS | – | 4 | – | – | LPDDR2-900 1GB | 32bit 3.6 GB/s | – | $35 |
Raspberry Pi 2B v1.2 | BCM2837 | Cortex-A53 | ARMv8.0A | 4 | 0.9 GHz | 29 GFLOPS | VideoCore IV | ES2.0 | 48 sp | 300 MHz | 28.8 GFLOPS | – | 4 | – | – | LPDDR2-900 1GB | 32bit 3.6 GB/s | – | $35 |
Raspberry Pi 2B | BCM2836 | Cortex-A7 | ARMv7A | 4 | 0.9 GHz | 7 GFLOPS | VideoCore IV | ES2.0 | 48 sp | 250 MHz | 24.0 GFLOPS | – | 4 | – | – | LPDDR2-900 1GB | 32bit 3.6 GB/s | – | $35 |
Raspberry Pi 1B | BCM2835 | ARM1176JZF-S | ARMv6 | 1 | 0.7 GHz | 0.7 GFLOPS | VideoCore IV | ES2.0 | 48 sp | 250 MHz | 24.0 GFLOPS | – | 4 | – | – | 0.5GB | | – | $35 |
Dragonboard 410c | Snapdragon 410 | Cortex-A53 | ARMv8.0A | 4 | 1.2 GHz | 38 GFLOPS | Adreno 306 | ES3.0 | 24 sp | 450 MHz | 21.6 GFLOPS | – | 2? | – | – | LPDDR3-1066 1GB | 32bit 4.3 GB/s | eMMC 8GB | $75 |
ASUS Tinker Board | RK3288 | Cortex-A17 | ARMv7A | 4 | 1.8 GHz | 58 GFLOPS | Mali-T764MP4 | ES3.x | 68 sp | 600 MHz | 81.6 GFLOPS | 163.2 GFLOPS | 4 | – | – | LPDDR3 2GB | 64bit GB/s | – | $60 |
ASUS Tinker Board S | RK3288 | Cortex-A17 | ARMv7A | 4 | 1.8 GHz | 58 GFLOPS | Mali-T764MP4 | ES3.x | 68 sp | 600 MHz | 81.6 GFLOPS | 163.2 GFLOPS | 4 | – | – | LPDDR3 2GB | 64bit GB/s | eMMC 16GB | |
Mali-T760 17sp/core
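The GPU fp32 column of the SBC table appears to follow sp * 2 * clock, i.e. one multiply-add per shader processor per clock. That rule is inferred from the listed values, not stated in the source, so treat the following Python as a rough sketch under that assumption. The function names are illustrative only, and the 17 sp/core figure is the one given in the note above.

```python
def mali_t760_sp(cores: int, sp_per_core: int = 17) -> int:
    """Total shader processors, using the 17 sp/core figure from the note."""
    return cores * sp_per_core


def gpu_fp32_gflops(sp: int, clock_ghz: float) -> float:
    """Theoretical peak fp32 GFLOPS.

    Assumption: each sp issues one multiply-add (2 operations) per clock.
    """
    return sp * 2 * clock_ghz


# Tinker Board row: 4-core Mali (MP4) at 600 MHz.
sp = mali_t760_sp(4)
print(f"{sp} sp, {gpu_fp32_gflops(sp, 0.6):.1f} GFLOPS")

# Raspberry Pi 3B row: VideoCore IV, 48 sp at 300 MHz.
print(f"48 sp, {gpu_fp32_gflops(48, 0.3):.1f} GFLOPS")
```

Both results can be compared with the sp and GPU fp32 cells of the corresponding rows in the SBC table.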
ai/npu.txt · Last modified: 2020/06/13 19:51 by oga