Details of CPU floating-point performance
For each arithmetic instruction, these tables give the number of operations that can be executed in one cycle.
IPC
float 32bit | Scalar (32bit) | SIMD 2 (64bit) | SIMD 4 (128bit) | SIMD 8 (256bit) | SIMD 16 (512bit) |
CPU/SoC | CPU core | FPU | SIMD Width (total) | SIMD Width (units) | add | mul | mad/fma | total | add | mul | mad/fma | total | add | mul | mad/fma | total | add | mul | mad/fma | total | add | mul | mad/fma | total |
BCM2835 | ARM1176JZF-S | VFPv2 | 64bit | 64bit mad | 0.5 | 0.5 | 0.5 | 0.5 | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – |
S5PC100 | Cortex-A8 | VFPv3 NEON | 128bit | 128bit mad | 0.1 | 0.1 | 0.1 | 0.1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | – | – | – | – | – | – | – | – |
BCM2836 | Cortex-A7 | VFPv4 NEON | 32bit | 32bit fma | 1 | 1 | 1 | 1 | 0.5 | 0.5 | 0.5 | 0.5 | 0.25 | 0.25 | 0.25 | 0.25 | – | – | – | – | – | – | – | – |
Apple S1 | Cortex-A7 | VFPv4 NEON | 32bit | 32bit fma | 1 | 1 | 1 | 1 | 0.5 | 0.5 | 0.5 | 0.5 | 0.25 | 0.25 | 0.25 | 0.25 | – | – | – | – | – | – | – | – |
Apple S2 | Cortex-A7 | VFPv4 NEON | 32bit | 32bit fma | 1 | 1 | 1 | 1 | 0.5 | 0.5 | 0.5 | 0.5 | 0.25 | 0.25 | 0.25 | 0.25 | – | – | – | – | – | – | – | – |
Tegra 2 | Cortex-A9 | VFPv3 | 64bit | 64bit mad | 1 | 1 | 1 | 1 | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – |
Apple A5 | Cortex-A9 | VFPv3 NEON | 128bit | 128bit mad | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | – | – | – | – | – | – | – | – |
Tegra 4 | Cortex-A15 | VFPv4 NEON | 128bit | 64bit fma x2 | 1 | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 1 | 1 | 1 | 1 | – | – | – | – | – | – | – | – |
BCM2837 | Cortex-A53 | AArch64 ASIMD | 128bit | 64bit fma x2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 1 | 1 | 1 | 1 | – | – | – | – | – | – | – | – |
Snapdragon 845 | (Cortex-A55) | AArch64 ASIMD FP16 | 128bit | 64bit fma x2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 1 | 1 | 1 | 1 | – | – | – | – | – | – | – | – |
Tegra X1 | Cortex-A57 | AArch64 ASIMD | 128bit | 64bit fma x2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 1 | 1 | 1 | 1 | – | – | – | – | – | – | – | – |
BCM2711 | Cortex-A72 | AArch64 ASIMD | 128bit | 64bit fma x2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 1 | 1 | 1 | 1 | – | – | – | – | – | – | – | – |
Snapdragon 835 | (Cortex-A73) | AArch64 ASIMD | 128bit | 64bit fma x2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 1 | 1 | 1 | 1 | – | – | – | – | – | – | – | – |
Snapdragon 845 | (Cortex-A75) | AArch64 ASIMD FP16 | 128bit | 64bit fma x2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 1 | 1 | 1 | 1 | – | – | – | – | – | – | – | – |
Apple A6 | Swift | VFPv4 NEON | 128bit | 128bit fma | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | – | – | – | – | – | – | – | – |
Apple A7 | Cyclone | AArch64 ASIMD | 384bit | 128bit add + 128bit fma x2 | 3 | 2 | 2 | 3 | 3 | 2 | 2 | 3 | 3 | 2 | 2 | 3 | – | – | – | – | – | – | – | – |
Apple A8 | Typhoon | AArch64 ASIMD | 384bit | 128bit add + 128bit fma x2 | 3 | 2 | 2 | 3 | 3 | 2 | 2 | 3 | 3 | 2 | 2 | 3 | – | – | – | – | – | – | – | – |
Apple A9 | Twister | AArch64 ASIMD | 384bit | 128bit fma x3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | – | – | – | – | – | – | – | – |
Apple A10 | Hurricane | AArch64 ASIMD | 384bit | 128bit fma x3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | – | – | – | – | – | – | – | – |
Apple A11 | Monsoon | AArch64 ASIMD | 384bit | 128bit fma x3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | – | – | – | – | – | – | – | – |
Apple M1 | Firestorm | AArch64 ASIMD FP16 | 512bit | 128bit add/mul + 128bit fma x3 | 4 | 4 | 3 | 4 | 4 | 4 | 3 | 4 | 4 | 4 | 3 | 4 | – | – | – | – | – | – | – | – |
Apple S6 | Icestorm? | AArch64 ASIMD FP16 | 256bit | 128bit add + 128bit fma | 2 | 1 | 1 | 2 | 2 | 1 | 1 | 2 | 2 | 1 | 1 | 2 | – | – | – | – | – | – | – | – |
Tegra K1 | Denver | AArch64 ASIMD | 256bit | 128bit add + 128bit fma | 2 | 1 | 1 | 2 | 2 | 1 | 1 | 2 | 2 | 1 | 1 | 2 | – | – | – | – | – | – | – | – |
Snapdragon MSM8250 | Scorpion | VFPv3 NEON | 128bit | 128bit mad | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | – | – | – | – | – | – | – | – |
Snapdragon S4 Pro MSM8264 | Krait | VFPv4 NEON | 128bit | 128bit fma | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | – | – | – | – | – | – | – | – |
Snapdragon 820 | Kryo | AArch64 ASIMD | 256bit | 128bit add + 128bit fma | 2 | 1 | 1 | 2 | 2 | 1 | 1 | 2 | 2 | 1 | 1 | 2 | – | – | – | – | – | – | – | – |
Atom Z2560 | Saltwell | SSSE3 | 192bit | 128bit add + 64bit mul | 1 | 1 | (1) | 2 | – | – | – | – | 1 | 0.5 | (0.7) | 0.7 | – | – | – | – | – | – | – | – |
Celeron J1900 | Silvermont | SSE4.2 | 192bit | 128bit add + 64bit mul | 1 | 1 | (1) | 2 | – | – | – | – | 1 | 0.5 | (0.7) | 0.7 | – | – | – | – | – | – | – | – |
Atom x7-Z8700 | Airmont | SSE4.2 | 192bit | 128bit add + 64bit mul | 1 | 1 | (1) | 2 | – | – | – | – | 1 | 0.5 | (0.7) | 0.7 | – | – | – | – | – | – | – | – |
Core 2 | Penryn | SSE4.1 | 256bit | 128bit add + 128bit mul | 1 | 1 | (1) | 2 | – | – | – | – | 1 | 1 | (1) | 2 | – | – | – | – | – | – | – | – |
Core i7-2700 | SandyBridge | AVX | 512bit | 256bit add + 256bit mul | 1 | 1 | (1) | 2 | – | – | – | – | 1 | 1 | (1) | 2 | 1 | 1 | (1) | 2 | – | – | – | – |
Core i7-3615QM | IvyBridge | AVX | 512bit | 256bit add + 256bit mul | 1 | 1 | (1) | 2 | – | – | – | – | 1 | 1 | (1) | 2 | 1 | 1 | (1) | 2 | – | – | – | – |
Core i7-4790K | Haswell | AVX2/FMA3 | 512bit | 256bit fma/add + 256bit fma/mul | 1 | 2 | 2 | 2 | – | – | – | – | 1 | 2 | 2 | 2 | 1 | 2 | 2 | 2 | – | – | – | – |
Core i7-6700K | Skylake | AVX2/FMA3 | 512bit | 256bit fma + 256bit fma | 2 | 2 | 2 | 2 | – | – | – | – | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | – | – | – | – |
Core i7-1030NG7 | IceLake | AVX512FVLBWDQ | 512bit | 256bit fma + 256bit fma | 2 | 2 | 2 | 2 | – | – | – | – | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 1 | 1 | 1 | 1 |
Athlon 5350 | Jaguar | AVX | 256bit | 128bit add + 128bit mul | 1 | 1 | (1) | 2 | – | – | – | – | 1 | 1 | (1) | 2 | 0.5 | 0.5 | (0.5) | 0.5 | – | – | – | – |
A10-7870K | Steamroller | AVX/FMA3 | 256bit | 128bit fma + 128bit fma | 2 | 2 | 2 | 2 | – | – | – | – | 2 | 2 | 2 | 2 | 1 | 1 | 1 | 1 | – | – | – | – |
Ryzen 7 1800X | Zen | AVX2/FMA3 | 512bit | 128bit add x2 + 128bit mul x2 | 2 | 2 | 2 | 3 | – | – | – | – | 2 | 2 | 2 | 3 | 1 | 1 | 1 | 2 | – | – | – | – |
Ryzen 5 3400G | Zen+ APU | AVX2/FMA3 | 512bit | 128bit add x2 + 128bit mul x2 | 2 | 2 | 2 | 3 | – | – | – | – | 2 | 2 | 2 | 3 | 1 | 1 | 1 | 2 | – | – | – | – |
Ryzen 7 PRO 4750G | Zen2 APU | AVX2/FMA3 | 1024bit | 256bit add x2 + 256bit mul x2 | 2 | 2 | 2 | 4 | – | – | – | – | 2 | 2 | 2 | 4 | 2 | 2 | 2 | 4 | – | – | – | – |
Ryzen 9 3950X | Zen2 | AVX2/FMA3 | 1024bit | 256bit add x2 + 256bit mul x2 | 2 | 2 | 2 | 4 | – | – | – | – | 2 | 2 | 2 | 4 | 2 | 2 | 2 | 4 | – | – | – | – |
float 64bit | Scalar (64bit) | SIMD 2 (128bit) | SIMD 4 (256bit) | SIMD 8 (512bit) |
CPU/SoC | CPU core | FPU | SIMD Width (total) | SIMD Width (units) | add | mul | mad/fma | total | add | mul | mad/fma | total | add | mul | mad/fma | total | add | mul | mad/fma | total |
BCM2835 | ARM1176JZF-S | VFPv2 | 64bit | 64bit mad | 0.5 | 0.5 | 0.5 | 0.5 | – | – | – | – | – | – | – | – | – | – | – | – |
S5PC100 | Cortex-A8 | VFPv3 NEON | 128bit | 128bit mad | 0.1 | 0.1 | 0.1 | 0.1 | – | – | – | – | – | – | – | – | – | – | – | – |
BCM2836 | Cortex-A7 | VFPv4 NEON | 32bit | 32bit fma | 1 | 0.2 | 0.2 | 0.2 | – | – | – | – | – | – | – | – | – | – | – | – |
Apple S1 | Cortex-A7 | VFPv4 NEON | 32bit | 32bit fma | 1 | 0.2 | 0.2 | 0.2 | – | – | – | – | – | – | – | – | – | – | – | – |
Apple S2 | Cortex-A7 | VFPv4 NEON | 32bit | 32bit fma | 1 | 0.2 | 0.2 | 0.2 | – | – | – | – | – | – | – | – | – | – | – | – |
Tegra 2 | Cortex-A9 | VFPv3 | 64bit | 64bit mad | 1 | 0.5 | 0.5 | 0.5 | – | – | – | – | – | – | – | – | – | – | – | – |
Apple A5 | Cortex-A9 | VFPv3 NEON | 128bit | 128bit mad | 1 | 0.5 | 0.5 | 0.5 | – | – | – | – | – | – | – | – | – | – | – | – |
Tegra 4 | Cortex-A15 | VFPv4 NEON | 128bit | 64bit fma x2 | 1 | 1 | 1 | 1 | – | – | – | – | – | – | – | – | – | – | – | – |
BCM2837 | Cortex-A53 | AArch64 ASIMD | 128bit | 64bit fma x2 | 2 | 2 | 2 | 2 | 1 | 1 | 1 | 1 | – | – | – | – | – | – | – | – |
Snapdragon 845 | (Cortex-A55) | AArch64 ASIMD FP16 | 128bit | 64bit fma x2 | 2 | 2 | 2 | 2 | 1 | 1 | 1 | 1 | – | – | – | – | – | – | – | – |
Tegra X1 | Cortex-A57 | AArch64 ASIMD | 128bit | 64bit fma x2 | 2 | 2 | 2 | 2 | 1 | 1 | 1 | 1 | – | – | – | – | – | – | – | – |
BCM2711 | Cortex-A72 | AArch64 ASIMD | 128bit | 64bit fma x2 | 2 | 2 | 2 | 2 | 1 | 1 | 1 | 1 | – | – | – | – | – | – | – | – |
Snapdragon 835 | (Cortex-A73) | AArch64 ASIMD | 128bit | 64bit fma x2 | 2 | 2 | 2 | 2 | 1 | 1 | 1 | 1 | – | – | – | – | – | – | – | – |
Snapdragon 845 | (Cortex-A75) | AArch64 ASIMD FP16 | 128bit | 64bit fma x2 | 2 | 2 | 2 | 2 | 1 | 1 | 1 | 1 | – | – | – | – | – | – | – | – |
Apple A6 | Swift | VFPv4 NEON | 128bit | 128bit fma | 1 | 1 | 1 | 1 | – | – | – | – | – | – | – | – | – | – | – | – |
Apple A7 | Cyclone | AArch64 ASIMD | 384bit | 128bit add + 128bit fma x2 | 3 | 2 | 2 | 3 | 3 | 2 | 2 | 3 | – | – | – | – | – | – | – | – |
Apple A8 | Typhoon | AArch64 ASIMD | 384bit | 128bit add + 128bit fma x2 | 3 | 2 | 2 | 3 | 3 | 2 | 2 | 3 | – | – | – | – | – | – | – | – |
Apple A9 | Twister | AArch64 ASIMD | 384bit | 128bit fma x3 | 3 | 2 | 2 | 3 | 3 | 2 | 2 | 3 | – | – | – | – | – | – | – | – |
Apple A10 | Hurricane | AArch64 ASIMD | 384bit | 128bit fma x3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | – | – | – | – | – | – | – | – |
Apple A11 | Monsoon | AArch64 ASIMD | 384bit | 128bit fma x3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | – | – | – | – | – | – | – | – |
Apple M1 | Firestorm | AArch64 ASIMD FP16 | 512bit | 128bit add/mul + 128bit fma x3 | 4 | 4 | 3 | 4 | 4 | 4 | 3 | 4 | – | – | – | – | – | – | – | – |
Apple S6 | Icestorm? | AArch64 ASIMD FP16 | 256bit | 128bit add + 128bit fma | 2 | 1 | 1 | 2 | 2 | 1 | 1 | 2 | – | – | – | – | – | – | – | – |
Tegra K1 | Denver | AArch64 ASIMD | 256bit | 128bit add + 128bit fma | 2 | 1 | 1 | 2 | 2 | 1 | 1 | 2 | – | – | – | – | – | – | – | – |
Snapdragon MSM8250 | Scorpion | VFPv3 NEON | 128bit | 128bit mad | 1 | 1 | 1 | 1 | – | – | – | – | – | – | – | – | – | – | – | – |
Snapdragon S4 Pro MSM8264 | Krait | VFPv4 NEON | 128bit | 128bit fma | 1 | 1 | 1 | 1 | – | – | – | – | – | – | – | – | – | – | – | – |
Snapdragon 820 | Kryo | AArch64 ASIMD | 256bit | 128bit add + 128bit fma | 2 | 1 | 1 | 2 | 2 | 1 | 1 | 2 | – | – | – | – | – | – | – | – |
Atom Z2560 | Saltwell | SSSE3 | 192bit | 128bit add + 64bit mul | 1 | 0.5 | (0.7) | 0.7 | 0.5 | 0.25 | (0.3) | 0.3 | – | – | – | – | – | – | – | – |
Celeron J1900 | Silvermont | SSE4.2 | 192bit | 128bit add + 64bit mul | 1 | 0.5 | (0.7) | 0.7 | 0.5 | 0.25 | (0.3) | 0.3 | – | – | – | – | – | – | – | – |
Atom x7-Z8700 | Airmont | SSE4.2 | 192bit | 128bit add + 64bit mul | 1 | 0.5 | (0.7) | 0.7 | 0.5 | 0.25 | (0.3) | 0.3 | – | – | – | – | – | – | – | – |
Core 2 | Penryn | SSE4.1 | 256bit | 128bit add + 128bit mul | 1 | 1 | (1) | 2 | 1 | 1 | (1) | 2 | – | – | – | – | – | – | – | – |
Core i7-2700 | SandyBridge | AVX | 512bit | 256bit add + 256bit mul | 1 | 1 | (1) | 2 | 1 | 1 | (1) | 2 | 1 | 1 | (1) | 2 | – | – | – | – |
Core i7-3615QM | IvyBridge | AVX | 512bit | 256bit add + 256bit mul | 1 | 1 | (1) | 2 | 1 | 1 | (1) | 2 | 1 | 1 | (1) | 2 | – | – | – | – |
Core i7-4790K | Haswell | AVX2/FMA3 | 512bit | 256bit fma/add + 256bit fma/mul | 1 | 2 | 2 | 2 | 1 | 2 | 2 | 2 | 1 | 2 | 2 | 2 | – | – | – | – |
Core i7-6700K | Skylake | AVX2/FMA3 | 512bit | 256bit fma + 256bit fma | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | – | – | – | – |
Core i7-1030NG7 | IceLake | AVX512FVLBWDQ | 512bit | 256bit fma + 256bit fma | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 1 | 1 | 1 | 1 |
Athlon 5350 | Jaguar | AVX | 256bit | 128bit add + 128bit mul | 1 | 1 | (1) | 2 | 1 | 1 | (1) | 2 | 0.5 | 0.5 | (0.5) | 0.5 | – | – | – | – |
A10-7870K | Steamroller | AVX/FMA3 | 256bit | 128bit fma + 128bit fma | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 1 | 1 | 1 | 1 | – | – | – | – |
Ryzen 7 1800X | Zen | AVX2/FMA3 | 512bit | 128bit add x2 + 128bit mul x2 | 2 | 2 | 2 | 3 | 2 | 2 | 2 | 3 | 1 | 1 | 1 | 2 | – | – | – | – |
Ryzen 5 3400G | Zen+ APU | AVX2/FMA3 | 512bit | 128bit add x2 + 128bit mul x2 | 2 | 2 | 2 | 3 | 2 | 2 | 2 | 3 | 1 | 1 | 1 | 2 | – | – | – | – |
Ryzen 7 PRO 4750G | Zen2 APU | AVX2/FMA3 | 1024bit | 256bit add x2 + 256bit mul x2 | 2 | 2 | 2 | 4 | 2 | 2 | 2 | 4 | 2 | 2 | 2 | 4 | – | – | – | – |
Ryzen 9 3950X | Zen2 | AVX2/FMA3 | 1024bit | 256bit add x2 + 256bit mul x2 | 2 | 2 | 2 | 4 | 2 | 2 | 2 | 4 | 2 | 2 | 2 | 4 | – | – | – | – |
FOP
Scalar
Scalar | single float (32bit x1) | double float (64bit x1) | |
CPU | FPU | mul | add | mad | fma | mul | add | mad | fma | |
XBurst JZ4775 | FPU F32 | 0.1 | 0.1 | 0.07 | – | 0.06 | 0.1 | 0.06 | – | ZWatch |
ARM1176JZF-S | VFPv2 | 0.5 | 0.5 | 1 | – | 0.5 | 0.5 | 1 | – | iPhone 3G , Raspberry Pi |
Cortex-A7 | VFPv4 + NEON | 1 | 1 | 2 | 2 | 0.25 | 1 | 0.5 | 0.4 | Raspberry Pi 2 |
Cortex-A8 | VFPv3 + NEON | 0.14 | 0.14 | 0.18 | – | 0.1 | 0.1 | 0.1 | – | iPhone 3GS, Nexus S |
Cortex-A9 | VFPv3 + NEON | 1 | 1 | 2 | – | 0.5 | 1 | 1 | – | Nexus 7 (2012), iPhone 4, Galaxy Nexus |
Cortex-A15 | VFPv4 + NEON | 1 | 1 | 1.4 | 2 | 1 | 1 | 1.4 | 1.4 | Nexus 10 |
Cortex-A53 64 | AArch64 NEON | 2 | 2 | – | 2 | 2 | 2 | – | 2 | Dragonboard 410c |
Cortex-A57 64 | AArch64 NEON | 2 | 2 | – | 2 | 2 | 2 | – | 2 | SHIELD Android TV |
Cortex-A72 64 | AArch64 NEON | 2 | 2 | – | 2.3 | 2 | 2 | – | 2.3 | Fire TV 2015 |
Scorpion | VFPv3 + NEON | 1 | 1 | 2 | – | 0.5 | 1 | 1 | – | Nexus One |
Krait (400) | VFPv4 + NEON | 1 | 1 | 2 | 2 | 1 | 1 | 1.6 | 2 | Nexus 4/5, Nexus 7 (2013) |
Kryo 64 | AArch64 NEON | 1 | 2 | – | 2 | 1 | 2 | – | 2 | HTC 10 |
A6 Swift | VFPv4 + NEON | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | iPhone 5, iPad 4 |
A7 Cyclone 32 | AArch32 NEON | 1 | 1 | 2 | 2 | 2 | 3 | 3 | 3 | iPhone 5s, iPad Air |
A7 Cyclone 64 | AArch64 NEON | 2 | 3 | – | 4 | 2 | 3 | – | 1.6 | iPhone 5s, iPad Air |
A8X Typhoon 64 | AArch64 NEON | 2 | 3 | – | 4 | 2 | 3 | – | 4 | iPad Air 2 |
A9 Twister 64 | AArch64 NEON | 3 | 3 | – | 6 | 2 | 3 | – | 4 | iPhone SE |
Denver 64 | AArch64 NEON | 1 | 2 | – | 2 | 1 | 2 | – | 2 | Nexus 9 |
Atom Bonnell 32 | SSSE3 | 1 | 1 | (2) | – | 0.5 | 1 | (1.5) | – | |
Atom Silvermont 64 | SSE4.2 | 1 | 1 | ? | – | 0.5 | 1 | ? | – | BayTrail |
AMD Jaguar | SSE4.2/AVX | 1 | 1 | (2) | – | 0.5 | 1 | ? | – | Athlon 5350 (Kabini) |
Core2 Penryn 64 | SSE4.1 | 1 | 1 | (2) | – | 1 | 1 | (2) | – | |
Core i7 Sandy 64 | SSE4.2/AVX | 1 | 1 | (2) | – | 1 | 1 | (2) | – | |
Core i7 Ivy 64 | SSE4.2/AVX | 1 | 1 | (2) | – | 1 | 1 | (2) | – | |
Core i7 Haswell 64 | SSE4.2/AVX2/FMA3 | 1.6 | 1 | – | 3.2 | 1.6 | 1 | – | 3.2 | Core i7-4790K |
Celeron Haswell 64 | SSE4.2 | 1.6 | 1 | (1.6) | – | 1.6 | 1 | (1.6) | – | Celeron 2955U |
Core i7 Skylake 64 | SSE4.2/AVX2/FMA3 | 2 | 2 | – | 4 | 2 | 2 | – | 4 | Core i7-6700K |
Ryzen 7 1800X 64 | SSE4.2/AVX2/FMA3 | 2 | 2 | – | 3.2 | 2 | 2 | – | 3.2 | Ryzen 7 1800X |
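↑ Compute capability per core (Scalar)
Values are the number of operations that can be executed in one cycle. Higher is faster.
ARM: mad is the legacy multiply-accumulate instruction, and fma is the fused multiply-add instruction. fma is supported from VFPv4 onward, and AArch64 provides fma only.
Intel: mad is not a single multiply-add instruction; the value is the throughput measured with add and mul instructions interleaved. These values are shown in parentheses to distinguish them.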
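The mad/fma distinction in the note above is numerical as well as a matter of encoding. The following is a minimal sketch, assuming the legacy mad rounds the intermediate product before the addition while fma rounds only once at the end. The helper names `mad` and `fma` are hypothetical, and the fused result is emulated with exact rational arithmetic rather than a hardware instruction, so this says nothing about throughput.

```python
from fractions import Fraction


def mad(a: float, b: float, c: float) -> float:
    """Unfused multiply-add: the product is rounded, then the sum is rounded."""
    return a * b + c


def fma(a: float, b: float, c: float) -> float:
    """Fused multiply-add, emulated: compute a*b+c exactly, then round once."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# Inputs chosen so that the exact product, 1 - 2**-54, is not representable
# as a double and rounds to 1.0 when the multiplication is performed alone.
a = 1.0 + 2.0 ** -27
b = 1.0 - 2.0 ** -27
c = -1.0

print("mad:", mad(a, b, c))  # the low-order bits of the product are lost
print("fma:", fma(a, b, c))  # the low-order bits survive the single rounding
```

With these inputs the unfused form returns exactly zero, while the emulated fused form keeps the small residual of the product.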
SIMD sp
SIMD (Vector) sp | SIMD2 single fp (32bit x2) | SIMD4 single fp (32bit x4) | SIMD8 single fp (32bit x8) |
CPU | FPU | mul | add | mad | fma | mul | add | mad | fma | mul | add | mad | fma |
Cortex-A7 | VFPv4 + NEON | 1 | 1 | 2 | 2 | 1 | 1 | 2 | 2 | – | – | – | – |
Cortex-A8 | VFPv3 + NEON | 2 | 2 | 4 | – | 2 | 2 | 4 | – | – | – | – | – |
Cortex-A9 | VFPv3 + NEON | 2 | 2 | 4 | – | 2 | 2 | 4 | – | – | – | – | – |
Cortex-A15 | VFPv4 + NEON | 4 | 4 | 8 | 8 | 4 | 4 | 8 | 8 | – | – | – | – |
Cortex-A53 | AArch64 NEON | 4 | 4 | – | 8 | 4 | 4 | – | 8 | – | – | – | – |
Cortex-A57 | AArch64 NEON | 4 | 4 | – | 8 | 4 | 4 | – | 8 | – | – | – | – |
Cortex-A72 | AArch64 NEON | 4 | 4 | – | 8 | 4 | 4 | – | 8 | – | – | – | – |
Scorpion | VFPv3 + NEON | 2 | 2 | 4 | – | 4 | 4 | 8 | – | – | – | – | – |
Krait 400 | VFPv4 + NEON | 2 | 2 | 4 | 4 | 4 | 4 | 8 | 8 | – | – | – | – |
Kryo | AArch64 NEON | 2 | 4 | – | 4 | 2 | 4 | – | 4 | – | – | – | – |
A6 Swift | VFPv4 + NEON | 2 | 2 | 4 | 4 | 4 | 4 | 8 | 8 | – | – | – | – |
A7 Cyclone 32 | AArch32 NEON | 4 | 6 | 8 | 8 | 8 | 12 | 16 | 16 | – | – | – | – |
A7 Cyclone 64 | AArch64 NEON | 4 | 6 | – | 8 | 8 | 12 | – | 16 | – | – | – | – |
A8X Typhoon 64 | AArch64 NEON | 4 | 6 | – | 8 | 8 | 12 | – | 16 | – | – | – | – |
A9 Twister 64 | AArch64 NEON | 6 | 6 | – | 12 | 12 | 12 | – | 24 | – | – | – | – |
Denver 64 | AArch64 NEON | 2 | 3 | – | 4 | 4 | 6 | – | 8 | – | – | – | – |
Atom Bonnell 32 | SSSE3 | – | – | – | – | 2 | 4 | (6) | – | – | – | – | – |
Atom Silvermont 64 | SSE4.2 | – | – | – | – | 2 | 4 | (6) | – | – | – | – | – |
AMD Jaguar 64 | SSE4.2/AVX | – | – | – | – | 4 | 4 | (8) | – | 4 | 4 | (8) | – |
Core2 Penryn 64 | SSE4.1 | – | – | – | – | 4 | 4 | (8) | – | – | – | – | – |
Core i7 Sandy 64 | SSE4.2/AVX | – | – | – | – | 4 | 4 | (8) | – | 8 | 8 | (16) | – |
Core i7 Ivy 64 | SSE4.2/AVX | – | – | – | – | 4 | 4 | (8) | – | 8 | 8 | (16) | – |
Core i7 Haswell 64 | SSE4.2/AVX2/FMA3 | – | – | – | – | 8 | 4 | (8) | 16 | 16 | 8 | (16) | 32 |
Celeron Haswell 64 | SSE4.2 | – | – | – | – | 8 | 4 | (8) | – | – | – | – | – |
Core i7 Skylake 64 | SSE4.2/AVX2/FMA3 | – | – | – | – | 8 | 8 | (8) | 16 | 16 | 16 | (16) | 32 |
Ryzen 7 1800X 64 | SSE4.2/AVX2/FMA3 | – | – | – | – | 8 | 8 | (12) | 12 | 8 | 8 | (16) | 16 |
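↑ Compute capability per core (Vector) sp
Values are the number of operations that can be executed in one cycle. Higher is faster.
Parentheses mean that the CPU has no dedicated multiply-add instruction, but that add and multiply instructions can be paired.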
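Many of the vector values are consistent with a simple relation between the IPC tables and the FOP tables: operations per cycle = instructions per cycle × lanes × operations per lane, where mad/fma counts as two operations per lane. The sketch below assumes that relation. The function name `ops_per_cycle` is hypothetical, and measured entries (especially in the scalar table) do not always match this theoretical figure.

```python
def ops_per_cycle(ipc: float, lanes: int, fused: bool) -> float:
    """Theoretical floating-point operations per cycle for one instruction type.

    ipc   -- instructions per cycle, as listed in the IPC tables
    lanes -- number of SIMD elements per instruction (1 for scalar)
    fused -- True for mad/fma, which counts as a multiply plus an add per lane
    """
    ops_per_lane = 2 if fused else 1
    return ipc * lanes * ops_per_lane


# Rows taken from the tables above (single precision).
# Skylake, SIMD 8 (256bit) fma: IPC 2 -> 32 operations per cycle
print(ops_per_cycle(ipc=2, lanes=8, fused=True))
# Cortex-A53, SIMD 4 (128bit) fma: IPC 1 -> 8 operations per cycle
print(ops_per_cycle(ipc=1, lanes=4, fused=True))
# A9 Twister, SIMD 4 (128bit) mul: IPC 3 -> 12 operations per cycle
print(ops_per_cycle(ipc=3, lanes=4, fused=False))
```

Multiplying the result by the clock frequency and the core count would give a theoretical peak FLOPS figure. Neither of those inputs appears in these tables.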
SIMD dp
SIMD (Vector) dp | SIMD2 double fp (64bit x2) | SIMD4 double fp (64bit x4) |
CPU | FPU | mul | add | mad | fma | mul | add | mad | fma |
Cortex-A53 | AArch64 NEON | 2 | 2 | – | 4 | – | – | – | – |
Cortex-A57 | AArch64 NEON | 2 | 2 | – | 4 | – | – | – | – |
Cortex-A72 | AArch64 NEON | 2 | 2 | – | 4 | – | – | – | – |
A7 Cyclone 64 | AArch64 NEON | 4 | 6 | – | 8 | – | – | – | – |
A8 Typhoon 64 | AArch64 NEON | 4 | 6 | – | 8 | – | – | – | – |
A9 Twister 64 | AArch64 NEON | 4 | 6 | – | 8 | – | – | – | – |
Kryo 64 | AArch64 NEON | 1 | 2 | – | 2 | – | – | – | – |
Denver 64 | AArch64 NEON | 2 | 3 | – | 4 | – | – | – | – |
Atom Bonnell 32 | SSSE3 | 0.4 | 0.5 | – | – | – | – | – | – |
Atom Silvermont 64 | SSE4.2 | 0.5 | 1 | (1.5) | – | – | – | – | – |
AMD Jaguar | SSE4.2/AVX | 1 | 2 | (3) | – | 1 | 2 | (3) | – |
Core2 Penryn 64 | SSE4.1 | 2 | 2 | (3?) | – | – | – | – | – |
Core i7 Sandy 64 | SSE4.2/AVX | 2 | 2 | (4) | – | 4 | 4 | (8) | – |
Core i7 Ivy 64 | SSE4.2/AVX | 2 | 2 | (4) | – | 4 | 4 | (8) | – |
Core i7 Haswell 64 | SSE4.2/AVX2/FMA3 | 4 | 2 | (4) | 8? | 8 | 4 | (8) | 16? |
Celeron Haswell 64 | SSE4.2 | 4 | 2 | (4) | – | – | – | – | – |
Core i7 Skylake 64 | SSE4.2/AVX2/FMA3 | 4 | 4 | – | 8 | 8 | 8 | – | 16 |
Ryzen 7 1800X 64 | SSE4.2/AVX2/FMA3 | 4 | 4 | – | 6.3 | 4 | 4 | (8) | 8 |