Nemotron 3 Super (120b)

Nemotron 3 Super を Local PC で動かす

Active パラメータ数が多いためか qwen3.5 122b-a10b よりも速度は落ちます。

量子化	Size	OS	CPU	RAM	RAM	GPU	ctx	token/s	software
Q4_K_M	80 GB	Windows11	Ryzen 7 9700X	DDR5-5600	128GB	GeForce RTX 5060Ti 16GB	4096	11.18 tps	LMStudio 0.4.6 CUDA 12 v2.7.0	PC1 cpu=82
Q4_K_M	80 GB	Windows11	Ryzen 7 9700X	DDR5-5600	128GB	GeForce RTX 5060Ti 16GB	4096	11.43 tps	llama.cpp CUDA 12 b8303	PC1
UD_Q4_K_XL	78 GB	Windows11	Ryzen 7 9700X	DDR5-5600	128GB	GeForce RTX 5060Ti 16GB	4096	10.89 tps	llama.cpp CUDA 12 b8303	PC1
UD_Q6_K_XL	109 GB	Windows11	Ryzen 7 9700X	DDR5-5600	128GB	GeForce RTX 5060Ti 16GB	4096	7.92 tps	llama.cpp CUDA 12 b8303	PC1
UD_Q4_K_XL	78 GB	Ubuntu 24.04	Ryzen 9 3950X (65W)	DDR4-3200	128GB	GeForce RTX 4060Ti 16GB	4096	8.3 tps	llama.cpp CUDA 12 b8319	PC2
Q4_K_M	80 GB	Ubuntu 24.04	Ryzen 9 3950X	DDR4-3200	128GB	GeForce RTX 4060Ti 16GB	4096	9.70 tps	llama.cpp CUDA 12 b8482	PC2
Q4_K_M	80 GB	Ubuntu 24.04	Core i7-13700	DDR5-5600	96GB	GeForce RTX 4060Ti 16GB	4096	13.12 tps	llama.cpp CUDA 12 b8446	PC3
Q4_K_M	80 GB	Windows11	Ryzen AI Max+ 395	LPDDR5-8000	128GB	Radeon 8060S	4096	14.48 tps	LMStudio 0.4.6 Vulkan v2.7.1	PC4 EVO-X2
Q4_K_M	80 GB	Windows11	Ryzen AI Max+ 395	LPDDR5-8000	128GB	Radeon 8060S	4096	15.73 tps	llama.cpp Vulkan b8429	PC4 EVO-X2
Q4_K_M	80 GB	Windows11	Ryzen 7 5700X	DDR4-3200	96GB	Radeon RX 9060 XT 16GB	4096	6.97 tps	llama.cpp Vulkan b8429	PC5

PC 1
- CPU: Ryzen 7 9700X (65W Default)
- RAM: DDR5-5600 128GB (128bit)
- GPU: GeForce RTX 5060Ti 16GB
PC 2
- CPU: Ryzen 9 3950X
- RAM: DDR4-3200 128GB (128bit)
- GPU: GeForce RTX 4060Ti 16GB
PC 3
- CPU: Core i7-13700
- RAM: DDR5-5600 96GB (128bit)
- GPU: GeForce RTX 4060Ti 16GB
PC 4
- CPU: Ryzen AI Max+ 395
- RAM: LPDDR5-8000 128GB (256bit)
- GPU: Radeon 8060S
PC 5
- CPU: Ryzen 7 5700X
- RAM: DDR4-3200 96GB (128bit)
- GPU: Radeon RX 9060 XT 16GB

設定など

注意: 計測時は、OS の電力設定を Performance (Windows の場合は最適なパフォーマンス) にしています。

llama.cpp

PC1 Ryzen 7 9700X + DDR5-5600 128GB + GeForce RTX 5060Ti 16GB + Windows 11

llama-server --model NVIDIA-Nemotron-3-Super-120B-A12B-Q4_K_M-00001-of-00003.gguf --alias NVIDIA-Nemotron-3-Super-120B-A12B --ctx-size 4096 -t 16 --host 0.0.0.0 --port 8080 --temp 1.0 --top-p 1.0

PC2 Ryzen 9 3950X + DDR4-3200 128GB + GeForce RTX 4060Ti 16GB + Linux Ubuntu 24.04LTS

llama-server --model NVIDIA-Nemotron-3-Super-120B-A12B-Q4_K_M-00001-of-00003.gguf --alias NVIDIA-Nemotron-3-Super-120B-A12B --ctx-size 4096 -t 16 --host 0.0.0.0 --port 8080 --temp 1.0 --top-p 1.0

PC3 Core i7-13700 + DDR5-5600 96GB + GeForce RTX 4060Ti 16GB + Linux Ubuntu 24.04LTS

llama-server --model NVIDIA-Nemotron-3-Super-120B-A12B-Q4_K_M-00001-of-00003.gguf --alias NVIDIA-Nemotron-3-Super-120B-A12B --ctx-size 4096 -t 16 --host 0.0.0.0 --port 8080 --temp 1.0 --top-p 1.0

PC4 EVO-X2 Ryzen AI Max+ 395 + LPDDR5-8000 128GB + Radeon 8060S + Windows 11

BIOS (UEFI) で VRAM 割当を 96GB に変更しています。この場合 –no-mmap が必須です。

llama-server --model NVIDIA-Nemotron-3-Super-120B-A12B-Q4_K_M-00001-of-00003.gguf --alias NVIDIA-Nemotron-3-Super-120B-A12B --ctx-size 4096 -t 16 --host 0.0.0.0 --port 8080 --temp 1.0 --top-p 1.0 --no-mmap

PC5 Ryzen 7 5700X + DDR4-3200 96GB + Radeon RX 9060 XT 16GB + Windows 11

llama-server --model NVIDIA-Nemotron-3-Super-120B-A12B-Q4_K_M-00001-of-00003.gguf --alias NVIDIA-Nemotron-3-Super-120B-A12B --ctx-size 4096 -t 16 --host 0.0.0.0 --port 8080 --temp 1.0 --top-p 1.0

Agent 用設定例

PC1 Ryzen 7 9700X + DDR5-5600 128GB + GeForce RTX 5060Ti 16GB + Windows 11

llama-server --model NVIDIA-Nemotron-3-Super-120B-A12B-Q4_K_M-00001-of-00003.gguf --alias NVIDIA-Nemotron-3-Super-120B-A12B --ctx-size 65536 -t 16 --host 0.0.0.0 --port 8080 --temp 0.6 --top-p 0.95

PC3 Core i7-13700 + DDR5-5600 96GB + GeForce RTX 4060Ti 16GB + Linux Ubuntu 24.04LTS

llama-server --model NVIDIA-Nemotron-3-Super-120B-A12B-Q4_K_M-00001-of-00003.gguf --alias NVIDIA-Nemotron-3-Super-120B-A12B --ctx-size 65536 -t 16 --host 0.0.0.0 --port 8080 --temp 0.6 --top-p 0.95

目次

Nemotron 3 Super (120b)

Nemotron 3 Super を Local PC で動かす

設定など

llama.cpp

PC1 Ryzen 7 9700X + DDR5-5600 128GB + GeForce RTX 5060Ti 16GB + Windows 11

PC2 Ryzen 9 3950X + DDR4-3200 128GB + GeForce RTX 4060Ti 16GB + Linux Ubuntu 24.04LTS

PC3 Core i7-13700 + DDR5-5600 96GB + GeForce RTX 4060Ti 16GB + Linux Ubuntu 24.04LTS

PC4 EVO-X2 Ryzen AI Max+ 395 + LPDDR5-8000 128GB + Radeon 8060S + Windows 11

PC5 Ryzen 7 5700X + DDR4-3200 96GB + Radeon RX 9060 XT 16GB + Windows 11

Agent 用設定例

PC1 Ryzen 7 9700X + DDR5-5600 128GB + GeForce RTX 5060Ti 16GB + Windows 11

PC3 Core i7-13700 + DDR5-5600 96GB + GeForce RTX 4060Ti 16GB + Linux Ubuntu 24.04LTS