Details of CPU Floating-Point Performance

This page lists, for each arithmetic instruction, the number of operations that can be executed in one cycle.

Scalar

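↑ Per-core throughput (Scalar)
Values are the number of operations that can be executed in 1 cycle. Larger values are faster.
- Example: mad/fma counts as 1 instruction = 2 fop (floating-point operations), so a value of 2 means one mad/fma instruction can be executed per cycle.
- Likewise, a mad/fma value of 4 means two instructions can be executed simultaneously.
ARM: mad is the older multiply-accumulate instruction, and fma is the fused multiply-add instruction. fma is supported from VFPv4 onward; AArch64 has only fma.
Intel: mad is not a single multiply-add instruction; the value is the throughput when add and mul instructions are interleaved. These values are shown in parentheses to distinguish them.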
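
The counting convention in the notes above can be turned into a small calculation. The following is a minimal sketch, not part of the original measurements: the function name `scalar_instructions_per_cycle` and the lookup table are hypothetical names introduced here for illustration. It converts a table value (operations per cycle) back into instructions per cycle.

```python
# FLOPs counted per scalar instruction, following the convention in the notes:
# mul and add count as 1, mad/fma count as 2 (one multiply plus one add).
# The names below are hypothetical, introduced only for this illustration.
FLOPS_PER_SCALAR_INSTRUCTION = {"mul": 1, "add": 1, "mad": 2, "fma": 2}


def scalar_instructions_per_cycle(table_value, op):
    """Convert a table value (operations per cycle) to instructions per cycle."""
    return table_value / FLOPS_PER_SCALAR_INSTRUCTION[op]


# A mad/fma value of 2 corresponds to one instruction per cycle,
# and a value of 4 corresponds to two instructions per cycle.
print(scalar_instructions_per_cycle(2, "fma"))  # 1.0
print(scalar_instructions_per_cycle(4, "fma"))  # 2.0
```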

SIMD (Vector) sp

		SIMD2 single fp (32bit x2)				SIMD4 single fp (32bit x4)				SIMD8 single fp (32bit x8)
CPU	FPU	mul	add	mad	fma	mul	add	mad	fma	mul	add	mad	fma
Cortex-A7	VFPv4 + NEON	1	1	2	2	1	1	2	2	–	–	–	–
Cortex-A8	VFPv3 + NEON	2	2	4	–	2	2	4	–	–	–	–	–
Cortex-A9	VFPv3 + NEON	2	2	4	–	2	2	4	–	–	–	–	–
Cortex-A15	VFPv4 + NEON	4	4	8	8	4	4	8	8	–	–	–	–
Cortex-A53	AArch64 NEON	4	4	–	8	4	4	–	8	–	–	–	–
Cortex-A57	AArch64 NEON	4	4	–	8	4	4	–	8	–	–	–	–
Cortex-A72	AArch64 NEON	4	4	–	8	4	4	–	8	–	–	–	–
Scorpion	VFPv3 + NEON	2	2	4	–	4	4	8	–	–	–	–	–
Krait 400	VFPv4 + NEON	2	2	4	4	4	4	8	8	–	–	–	–
Kryo	AArch64 NEON	2	4	–	4	2	4	–	4	–	–	–	–
A6 Swift	VFPv4 + NEON	2	2	4	4	4	4	8	8	–	–	–	–
A7 Cyclone 32	AArch32 NEON	4	6	8	8	8	12	16	16	–	–	–	–
A7 Cyclone 64	AArch64 NEON	4	6	–	8	8	12	–	16	–	–	–	–
A8X Typhoon 64	AArch64 NEON	4	6	–	8	8	12	–	16	–	–	–	–
A9 Twister 64	AArch64 NEON	6	6	–	12	12	12	–	24	–	–	–	–
Denver 64	AArch64 NEON	2	3	–	4	4	6	–	8	–	–	–	–
Atom Bonnell 32	SSSE3	–	–	–	–	2	4	(6)	–	–	–	–	–
Atom Silvermont 64	SSE4.2	–	–	–	–	2	4	(6)	–	–	–	–	–
AMD Jaguar 64	SSE4.2/AVX	–	–	–	–	4	4	(8)	–	4	4	(8)	–
Core2 Penryn 64	SSE4.1	–	–	–	–	4	4	(8)	–	–	–	–	–
Core i7 Sandy 64	SSE4.2/AVX	–	–	–	–	4	4	(8)	–	8	8	(16)	–
Core i7 Ivy 64	SSE4.2/AVX	–	–	–	–	4	4	(8)	–	8	8	(16)	–
Core i7 Haswell 64	SSE4.2/AVX2/FMA3	–	–	–	–	8	4	(8)	16	16	8	(16)	32
Celeron Haswell 64	SSE4.2	–	–	–	–	8	4	(8)	–	–	–	–	–

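↑ Per-core throughput (Vector) sp
Values are the number of operations that can be executed in 1 cycle. Larger values are faster.
- Example: SIMD4 add is 1 instruction = 4 fop, so a value of 4 means one instruction can be executed per cycle.
- Example: SIMD4 mad/fma is 1 instruction = 8 fop, so a value of 8 means one instruction can be executed per cycle.
Parentheses indicate that the CPU has no dedicated multiply-add instruction, but add and multiply instructions can be paired.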
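
For the vector table, the same reasoning can be sketched with the SIMD width included. This is an illustrative sketch under stated assumptions: the function names are hypothetical, and the 3.0 GHz clock used in the last line is an assumed value that does not come from the table.

```python
def flops_per_simd_instruction(lanes, op):
    """FLOPs counted for one SIMD instruction: 1 per lane, or 2 per lane for mad/fma."""
    per_lane = 2 if op in ("mad", "fma") else 1
    return lanes * per_lane


def simd_instructions_per_cycle(table_value, lanes, op):
    """Convert a table value (operations per cycle) to SIMD instructions per cycle."""
    return table_value / flops_per_simd_instruction(lanes, op)


def peak_gflops(table_value, clock_ghz, cores=1):
    """Theoretical peak: operations per cycle x clock (GHz) x core count."""
    return table_value * clock_ghz * cores


# SIMD4 add with a table value of 4: one instruction per cycle.
print(simd_instructions_per_cycle(4, 4, "add"))   # 1.0
# SIMD4 fma with a table value of 8: one instruction per cycle.
print(simd_instructions_per_cycle(8, 4, "fma"))   # 1.0
# SIMD8 fma with a table value of 32 (the Haswell row): two instructions per cycle.
print(simd_instructions_per_cycle(32, 8, "fma"))  # 2.0
# Peak for one core at an assumed 3.0 GHz clock (the clock is not from the table).
print(peak_gflops(32, 3.0))                       # 96.0
```

These figures are theoretical upper bounds derived from the table values; measured throughput depends on the actual clock and on how well the instruction stream keeps the execution units busy.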