Neural Network Accelerator (NPU/NNE)
CPU core | Clock | core | fp32 | fp16 | bfloat16 | int16 | int8 | int4 |
---|---|---|---|---|---|---|---|---|
Qualcomm Snapdragon 835 CPU Kryo 280 | 2.45GHz + 1.9GHz | 4+4 | 0.139 TFLOPS | |||||
Qualcomm Snapdragon 845 CPU Kryo 385 | 2.8GHz + 1.77GHz | 4+4 | 0.146 TFLOPS | 0.29 TFLOPS | ||||
Intel Core i9-9900K | 3.6GHz | 8 | 0.922 TFLOPS | |||||
AMD Ryzen 9 3950X | 3.5GHz | 16 | 1.792 TFLOPS | |||||
Mobile NPU/NNE | Clock | core | fp32 | fp16 | bfloat16 | int16 | int8 | int4 |
Apple iPhone X Apple A11 Bionic NE | 0.6 TOPS | |||||||
Apple iPhone XS Apple A12 Bionic NE | 5 TOPS | |||||||
Google Pixel 2 VisualCore | 3 TOPS ? | |||||||
Google Pixel 3 VisualCore | 3 TOPS ? | |||||||
Huawei P20 Pro Kirin 970 NPU | 1.92 TFLOPS ? | |||||||
Huawei P30 Pro Kirin 980 NPU | 4.22 TFLOPS ? | |||||||
Samsung Exynos 9820 NPU | ||||||||
Google Edge TPU | 4 TOPS | |||||||
Intel Movidius Neural Compute Stick Myriad 2 VPU | ||||||||
Intel Neural Compute Stick 2 Myriad X VPU | 4 TOPS ? | |||||||
GPU core | Clock | core | fp32 | fp16 | bfloat16 | int16 | int8 | int4 |
Google Edge TPU Vivante GC7000Lite | 1.0GHz? | 16sp? | 0.032 TFLOPS | 0.064 TFLOPS | ||||
NVIDIA Jetson Nano Tegra X1 Maxwell | 0.92GHz | 128sp | 0.236 TFLOPS | 0.472 TFLOPS | ||||
AMD RADEON Vega 56 | 1.47GHz | 3584sp | 10.54 TFLOPS | 21.09 TFLOPS | | | 42.18 TOPS | 84.35 TOPS |
AMD RADEON Vega 64 | 1.55GHz | 4096sp | 12.67 TFLOPS | 25.33 TFLOPS | | | 50.66 TOPS | 101.32 TOPS |
AMD RADEON VII | 1.75GHz | 3840sp | 13.8 TFLOPS | 27.7 TFLOPS | | | 55.3 TOPS | 110.7 TOPS |
NVIDIA GeForce RTX 2060 (Turing) | 1.68GHz | 240tc 1920sp | 6.45 TFLOPS | 51.6 (12.90) TFLOPS | | | 103.2 TOPS | 206.4 TOPS |
NVIDIA GeForce RTX 2060 Super (Turing) | 1.65GHz | 272tc 2176sp | 7.18 TFLOPS | (14.36) TFLOPS | ||||
NVIDIA GeForce RTX 2070 (Turing) | 1.62GHz | 288tc 2304sp | 7.46 TFLOPS | 59.7 (14.93) TFLOPS | | | 119.4 TOPS | 238.9 TOPS |
NVIDIA GeForce RTX 2070 Super (Turing) | 1.77GHz | 320tc 2560sp | 9.06 TFLOPS | 72.5 (18.12) TFLOPS | | | 145.0 TOPS | 290.0 TOPS |
NVIDIA GeForce RTX 2080 (Turing) | 1.71GHz | 368tc 2944sp | 10.07 TFLOPS | 80.5 (20.14) TFLOPS | | | 161.1 TOPS | 322.2 TOPS |
NVIDIA GeForce RTX 2080 Super (Turing) | 1.81GHz | 384tc 3072sp | 11.15 TFLOPS | 89.2 (22.3) TFLOPS | | | 178.4 TOPS | 356.8 TOPS |
NVIDIA GeForce RTX 2080 Ti (Turing) | 1.55GHz | 544tc 4352sp | 13.45 TFLOPS | 107.6 (26.9) TFLOPS | | | 215.2 TOPS | 430.3 TOPS |
NVIDIA Quadro RTX 4000 (Turing) | 1.55GHz | 288tc 2304sp | 7.12 TFLOPS | 57.0 (14.2) TFLOPS | | | 113.9 TOPS | 227.8 TOPS |
NVIDIA Quadro RTX 5000 (Turing) | 1.81GHz | 384tc 3072sp | 11.15 TFLOPS | 89.2 (22.3) TFLOPS | | | 178.4 TOPS | 356.8 TOPS |
NVIDIA Quadro RTX 6000/8000 (Turing) | 1.77GHz | 576tc 4608sp | 16.31 TFLOPS | 130.5 (32.6) TFLOPS | | | 261.0 TOPS | 522.0 TOPS |
NVIDIA TITAN V (Volta) | 1.46GHz | 640tc 5120sp | 14.90 TFLOPS | 119.2 (29.8) TFLOPS | ||||
NVIDIA Tesla V100 (Volta) | 1.53GHz | 640tc 5120sp | 15.67 TFLOPS | 125.3 (31.3) TFLOPS |
References
NVIDIA TensorCore
- Volta
- Turing
1 Tensor Core = 64 multiply-adds (mad) per clock = 128 FLOPs per clock, so GFLOPS = Tensor Cores * 128 * GHz
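The formula above can be checked against the GPU table with a few lines of Python. This is a minimal sketch: the function names are made up for illustration, and the fp32 helper assumes one FMA (2 FLOPs) per shader core per clock, which the page does not state but which matches the fp32 column.

```python
def tensor_core_tflops(tensor_cores: int, clock_ghz: float) -> float:
    """Peak fp16 Tensor Core throughput in TFLOPS.

    Each Tensor Core does 64 multiply-adds per clock, i.e. 128 FLOPs.
    """
    gflops = tensor_cores * 128 * clock_ghz
    return gflops / 1000.0


def shader_fp32_tflops(shader_cores: int, clock_ghz: float) -> float:
    """Peak fp32 throughput in TFLOPS.

    Assumption (not stated on the page): 1 FMA = 2 FLOPs per sp per clock.
    """
    return shader_cores * 2 * clock_ghz / 1000.0


if __name__ == "__main__":
    # RTX 2060 row: 240 Tensor Cores, 1920 sp, 1.68 GHz
    print(f"{tensor_core_tflops(240, 1.68):.1f} TFLOPS (Tensor Core fp16)")
    print(f"{shader_fp32_tflops(1920, 1.68):.2f} TFLOPS (fp32)")
```

For the RTX 2060 inputs this prints 51.6 and 6.45, matching the table row. Rows whose listed clock is rounded (for example 1.81GHz for the RTX 2080 Super) can differ from the table in the last digit.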
SBC
SBC | SoC | CPU core | core | CPU clock | GPU | sp | GPU clock | GPU fp32 | GPU fp16 | NPU | NPU int16 | RAM | MEM B/W | ROM |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Coral Dev Board | NXP i.MX 8M | Cortex-A53 | 4 | 1.5 GHz | Vivante GC7000 Lite | 16 sp | 1.0 GHz | 32 GFLOPS | 64 GFLOPS | Edge TPU | 4 TOPS | LPDDR4-3200 1GB | 32bit 12.8 GB/s | eMMC 8GB |
NVIDIA Jetson Nano | Tegra X1 | Cortex-A57 | 4 | 1.4 GHz | Maxwell | 128 sp | 0.92 GHz | 236 GFLOPS | 472 GFLOPS | – | – | LPDDR4-3200 4GB | 64bit 25.6 GB/s | eMMC 16GB |
Raspberry Pi 4 | BCM2711 | Cortex-A72 | 4 | 1.5 GHz | VideoCore VI | ? sp | 0.5 GHz | ? GFLOPS | | – | – | LPDDR4-2400 4GB | ?bit ? GB/s | – |
Raspberry Pi 3+ | BCM2837B0 | Cortex-A53 | 4 | 1.4 GHz | VideoCore IV | 48 sp | 0.3 GHz | 28.8 GFLOPS | | – | – | LPDDR2-900 1GB | 32bit 3.6 GB/s | – |
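The MEM B/W column follows from the RAM type and bus width. A rough sketch, assuming the number after the dash in names like LPDDR4-3200 is the transfer rate in MT/s and that 1 GB = 10^9 bytes (the function name is hypothetical):

```python
def peak_bandwidth_gbs(transfer_rate_mts: float, bus_width_bits: int) -> float:
    """Theoretical peak memory bandwidth in GB/s.

    Assumption: transfer_rate_mts is mega-transfers per second, and each
    transfer moves bus_width_bits / 8 bytes.
    """
    bytes_per_transfer = bus_width_bits / 8
    return transfer_rate_mts * bytes_per_transfer / 1000.0


if __name__ == "__main__":
    # (board, transfer rate in MT/s, bus width in bits) taken from the SBC table
    boards = [
        ("Coral Dev Board", 3200, 32),
        ("NVIDIA Jetson Nano", 3200, 64),
        ("Raspberry Pi 3+", 900, 32),
    ]
    for name, rate, width in boards:
        print(f"{name}: {peak_bandwidth_gbs(rate, width):.1f} GB/s")
```

These are theoretical peaks; sustained bandwidth on real hardware is lower.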
ai/npu.1562249501.txt.gz · Last modified: 2019/07/04 23:11 by oga