Multi-GPU inference with Ollama
I use Ollama to run local LLM inference on ordinary PCs.
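The page does not say how the token/s figures were collected. As a minimal sketch of one way to obtain such a number, assuming a local Ollama server on its default port: the final response from Ollama's `/api/generate` endpoint reports `eval_count` (generated tokens) and `eval_duration` (nanoseconds), from which the generation rate can be computed. The function names `generate` and `tokens_per_second` are illustrative, not part of Ollama.

```python
import json
import urllib.request


def tokens_per_second(response: dict) -> float:
    """Generation speed in tokens/s from a final /api/generate response."""
    # eval_count: number of generated tokens
    # eval_duration: time spent generating them, in nanoseconds
    return response["eval_count"] / (response["eval_duration"] / 1e9)


def generate(model: str, prompt: str,
             host: str = "http://localhost:11434") -> dict:
    """Send one non-streaming request to an Ollama server (assumed reachable)."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        f"{host}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)
```

With a server running, `tokens_per_second(generate("gemma2:2b", "Why is the sky blue?"))` would give a figure comparable to the token/s columns in the tables, although prompt and output length affect the result.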
Multi GPU: comparison by model size
70b (llama3.3:70b)
● Linux installed directly (bare metal)
Total VRAM | Devices | Processor | VRAM | MEM | CPU | GPU | token/s | host |
| CPU | Ryzen 9 3950X (Zen2) | | 46GB | 100% | 0% | 1.01 tps | 3950X |
4GB | GPUx1 | RADEON RX 6400 | 4GB | 47GB | 92% | 8% | 1.03 tps | 3950X |
8GB | GPUx2 | RADEON RX 6400 + RX 6400 | 4GB+4GB | 49GB | 84% | 16% | 1.05 tps | 3950X |
8GB | GPUx1 | RADEON RX 7600 | 8GB | 47GB | 83% | 17% | 1.12 tps | 3950X |
| CPU | Ryzen 7 9700X (Zen5) | | 46GB | 100% | 0% | 1.22 tps | 9700X |
8GB | GPUx1 | GeForce GTX 1070 | 8GB | 47GB | 83% | 17% | 1.31 tps | 3950X |
8GB | GPUx1 | GeForce GTX 1080 | 8GB | 47GB | 83% | 17% | 1.32 tps | 9700X |
8GB | GPUx1 | RADEON RX Vega 64 | 8GB | 47GB | 83% | 17% | 1.34 tps | 9700X |
16GB | GPUx2 | GeForce GTX 1080 + GTX 1070 | 8GB+8GB | 48GB | 67% | 33% | 1.45 tps | 9700X |
24GB | GPUx3 | RADEON RX 7600 + RADEON RX Vega 64 + RX Vega 56 | 8GB+8GB+8GB | 50GB | 52% | 48% | 1.51 tps | 3950X |
16GB | GPUx2 | RADEON RX Vega 64 + RX Vega 56 | 8GB+8GB | 48GB | 65% | 35% | 1.55 tps | 9700X |
32GB | GPUx2 | GeForce RTX 4060Ti + RTX 4060Ti | 16GB+16GB | 47GB | 32% | 68% | 2.22 tps | 3950X |
40GB | GPUx3 | GeForce RTX 4060Ti + RTX 4060Ti + RTX 2070S | 16GB+16GB+8GB | 49GB | 17% | 83% | 3.08 tps | 3950X |
48GB | GPUx4 | GeForce RTX 4060Ti + RTX 4060Ti + RTX 2070S + GTX 1080 | 16GB+16GB+8GB+8GB | 51GB | 6% | 94% | 4.25 tps | 3950X |
56GB | GPUx5 | GeForce RTX 4060Ti + RTX 4060Ti + RTX 2070S + GTX 1080 + GTX 1070 | 16GB+16GB+8GB+8GB+8GB | 55GB | 0% | 100% | 5.10 tps | 3950X |
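The CPU and GPU columns give the share of the loaded model held in system memory versus VRAM. As a rough sketch of how such a split can be derived, assuming the `size` and `size_vram` byte counts that Ollama's `/api/ps` endpoint reports for a loaded model (the helper name `cpu_gpu_split` and the rounding choice are assumptions, not Ollama's exact code):

```python
def cpu_gpu_split(size: int, size_vram: int) -> tuple[int, int]:
    """Return (cpu_percent, gpu_percent) for a loaded model.

    size      -- total bytes occupied by the loaded model
    size_vram -- bytes of that total resident in GPU memory
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= size_vram <= size:
        raise ValueError("size_vram must be between 0 and size")
    gpu_percent = round(100 * size_vram / size)
    # The CPU share is whatever was not offloaded to the GPUs.
    return 100 - gpu_percent, gpu_percent
```

For example, a hypothetical 50 GiB model with 24 GiB offloaded would be reported as a 52% / 48% CPU/GPU split. Once a model no longer fits entirely in VRAM, the CPU-resident part tends to dominate the runtime, which is consistent with token/s rising sharply only as the CPU share approaches 0%.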
● GPU passthrough to a VM on Proxmox
Total VRAM | Devices | Processor | OS | VRAM (RAM for CPU rows) | MEM | CPU | GPU | token/s | host |
| CPU | Ryzen 9 3950X (Zen2) | VM | DDR4-3200 96GB | 46GB | 100% | 0% | 0.69 tps | 3950X |
| CPU | Ryzen 7 9700X (Zen5) | VM | DDR5-5600 96GB | 46GB | 100% | 0% | 0.88 tps | 9700X |
16GB | GPUx2 | GeForce GTX 1080 + GTX 1070 | VM | 8GB+8GB | 48GB | 67% | 33% | 1.07 tps | 3950X |
24GB | GPUx3 | GeForce RTX 2070S + GTX 1080 + GTX 1070 | VM | 8GB+8GB+8GB | 50GB | 52% | 48% | 1.09 tps | 3950X |
24GB | GPUx2 | GeForce RTX 4060Ti + RTX 2070S | VM | 16GB+8GB | 48GB | 49% | 51% | 1.46 tps | 3950X |
32GB | GPUx3 | GeForce RTX 4060Ti + RTX 2070S + GTX 1080 | VM | 16GB+8GB+8GB | 49GB | 35% | 65% | 1.69 tps | 3950X |
40GB | GPUx4 | GeForce RTX 4060Ti + RTX 2070S + GTX 1080 + GTX 1070 | VM | 16GB+8GB+8GB+8GB | 51GB | 21% | 79% | 2.22 tps | 3950X |
32GB | GPUx2 | GeForce RTX 4060Ti + RTX 4060Ti | VM | 16GB+16GB | 49GB | 17% | 83% | 2.76 tps | 3950X |
40GB | GPUx3 | GeForce RTX 4060Ti + RTX 4060Ti + RTX 2070S | VM | 16GB+16GB+8GB | 49GB | 17% | 83% | 2.76 tps | 3950X |
48GB | GPUx4 | GeForce RTX 4060Ti + RTX 4060Ti + RTX 2070S + GTX 1070 | VM | 16GB+16GB+8GB+8GB | 51GB | 5% | 95% | 3.94 tps | 3950X |
56GB | GPUx5 | GeForce RTX 4060Ti + RTX 4060Ti + RTX 2070S + GTX 1080 + GTX 1070 | VM | 16GB+16GB+8GB+8GB+8GB | 55GB | 0% | 100% | 4.09 tps | 3950X |
32b (qwen2.5:32b)
● Linux installed directly (bare metal)
Total VRAM | Devices | Processor | OS | VRAM (RAM for CPU rows) | MEM | CPU | GPU | token/s | host |
| CPU | Ryzen 9 3950X (Zen2) | UB | DDR4-3200 96GB | 22GB | 100% | 0% | 2.16 tps | 3950X |
| CPU | Ryzen 7 9700X (Zen5) | UB | DDR5-5600 96GB | 22GB | 100% | 0% | 2.62 tps | 9700X |
8GB | GPUx2 | RADEON RX 6400 + RX 6400 | UB | 4GB+4GB | 24GB | 66% | 34% | 2.42 tps | 3950X |
8GB | GPUx1 | RADEON RX 7600 | UB | 8GB | 22GB | 63% | 37% | 2.86 tps | 3950X |
8GB | GPUx1 | GeForce GTX 1080 | UB | 8GB | 22GB | 64% | 36% | 3.27 tps | 9700X |
8GB | GPUx1 | RADEON RX Vega 64 | UB | 8GB | 22GB | 63% | 37% | 3.44 tps | 9700X |
16GB | GPUx2 | RADEON RX 7600 + RX Vega 64 | UB | 8GB+8GB | 23GB | 31% | 69% | 4.34 tps | 3950X |
16GB | GPUx2 | GeForce GTX 1080 + GTX 1070 | UB | 8GB+8GB | 23GB | 31% | 69% | 4.41 tps | 9700X |
16GB | GPUx2 | RADEON RX Vega 64 + RX Vega 56 | UB | 8GB+8GB | 23GB | 30% | 70% | 5.10 tps | 9700X |
24GB | GPUx3 | RADEON RX 7600 + RX Vega 64 + RX Vega 56 | UB | 8GB+8GB+8GB | 25GB | 3% | 97% | 9.12 tps | 3950X |
32GB | GPUx2 | GeForce RTX 4060Ti + RTX 4060Ti | UB | 16GB+16GB | 25GB | 0% | 100% | 12.74 tps | 3950X |
● GPU passthrough to a VM on Proxmox
Total VRAM | Devices | Processor | OS | VRAM (RAM for CPU rows) | MEM | CPU | GPU | token/s | host |
| CPU | Ryzen 9 3950X (Zen2) | VM | DDR4-3200 96GB | 22GB | 100% | 0% | 1.23 tps | 3950X |
| CPU | Ryzen 7 9700X (Zen5) | VM | DDR5-5600 96GB | 22GB | 100% | 0% | 1.77 tps | 9700X |
16GB | GPUx2 | GeForce GTX 1070 + GTX 1080 | VM | 8GB+8GB | 23GB | 30% | 70% | 3.37 tps | 3950X |
24GB | GPUx3 | GeForce RTX 2070S + GTX 1080 + GTX 1070 | VM | 8GB+8GB+8GB | 25GB | 3% | 97% | 7.69 tps | 3950X |
24GB | GPUx2 | GeForce RTX 4060Ti + RTX 2070S | VM | 16GB+8GB | 23GB | 0% | 100% | 13.56 tps | 3950X |
32GB | GPUx3 | GeForce RTX 4060Ti + RTX 2070S + GTX 1080 | VM | 16GB+8GB+8GB | 26GB | 0% | 100% | 11.54 tps | 3950X |
32GB | GPUx2 | GeForce RTX 4060Ti + RTX 4060Ti | VM | 16GB+16GB | 25GB | 0% | 100% | 12.46 tps | 3950X |
Multi GPU: comparison by runtime environment
Linux installed directly (bare metal)
CPU Only : Ryzen 9 3950X |
Model | Memory | CPU | GPU | token/s | host |
llama3.3:70b | 46GB | 100% | 0% | 1.01 tps | 3950X |
qwen2.5:32b | 22GB | 100% | 0% | 2.16 tps | 3950X |
gemma2:27b | 18GB | 100% | 0% | 2.54 tps | 3950X |
phi4:14b | 10GB | 100% | 0% | 4.63 tps | 3950X |
gemma2:9b | 7.7GB | 100% | 0% | 6.94 tps | 3950X |
qwen2.5:7b | 4.8GB | 100% | 0% | 9.27 tps | 3950X |
gemma2:2b | 2.1GB | 100% | 0% | 19.73 tps | 3950X |
CPU Only : Ryzen 7 9700X |
Model | Memory | CPU | GPU | token/s | host |
llama3.3:70b | 46GB | 100% | 0% | 1.22 tps | 9700X |
qwen2.5:32b | 22GB | 100% | 0% | 2.62 tps | 9700X |
gemma2:27b | 18GB | 100% | 0% | 3.14 tps | 9700X |
phi4:14b | 10GB | 100% | 0% | 5.73 tps | 9700X |
gemma2:9b | 7.7GB | 100% | 0% | 8.71 tps | 9700X |
qwen2.5:7b | 4.8GB | 100% | 0% | 11.40 tps | 9700X |
gemma2:2b | 2.1GB | 100% | 0% | 26.96 tps | 9700X |
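To make the two CPU-only tables easier to compare, the following sketch computes the per-model speedup of the Ryzen 7 9700X over the Ryzen 9 3950X. The numbers are copied from the tables above; the variable names are illustrative.

```python
# token/s on bare-metal Linux, CPU only, copied from the two tables above
tps_3950x = {
    "llama3.3:70b": 1.01,
    "qwen2.5:32b": 2.16,
    "gemma2:27b": 2.54,
    "phi4:14b": 4.63,
    "gemma2:9b": 6.94,
    "qwen2.5:7b": 9.27,
    "gemma2:2b": 19.73,
}
tps_9700x = {
    "llama3.3:70b": 1.22,
    "qwen2.5:32b": 2.62,
    "gemma2:27b": 3.14,
    "phi4:14b": 5.73,
    "gemma2:9b": 8.71,
    "qwen2.5:7b": 11.40,
    "gemma2:2b": 26.96,
}

# Speedup of the 9700X relative to the 3950X for each model
speedup = {model: tps_9700x[model] / tps_3950x[model] for model in tps_3950x}

for model, ratio in speedup.items():
    print(f"{model:14s} {ratio:.2f}x")
```

The ratios fall between about 1.21x (llama3.3:70b) and 1.37x (gemma2:2b), so the 9700X is faster on every model despite having half as many cores.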
Earlier data (Proxmox VM)
CPU Only: Ryzen 7 9700X (Zen5) DDR5-5600 |
Model | Memory | VRAM used (GB per GPU) | CPU | GPU | token/s | host |
llama3.3:70b | 46GB | | 100% | | 0.88 tps | 9700X |
qwen2.5:32b | 22GB | | 100% | | 1.77 tps | 9700X |
gemma2:27b | 19GB | | 100% | | 1.97 tps | 9700X |
phi4:14b | 11GB | | 100% | | 3.06 tps | 9700X |
gemma2:9b | 9.0GB | | 100% | | 5.01 tps | 9700X |
gemma2:2b | 3.1GB | | 100% | | 18.76 tps | 9700X |
CPU Only: Ryzen 9 3950X (Zen2) DDR4-3200 |
Model | Memory | VRAM used (GB per GPU) | CPU | GPU | token/s | host |
llama3.3:70b | 46GB | | 100% | | 0.69 tps | 3950X |
qwen2.5:32b | 22GB | | 100% | | 1.23 tps | 3950X |
gemma2:27b | 19GB | | 100% | | 1.47 tps | 3950X |
phi4:14b | 11GB | | 100% | | 2.25 tps | 3950X |
gemma2:9b | 9.0GB | | 100% | | 3.15 tps | 3950X |
gemma2:2b | 3.1GB | | 100% | | 7.83 tps | 3950X |
GPU x2 (8+8=16GB): GeForce GTX 1070 + GeForce GTX 1080 |
Model | Memory | VRAM used (GB per GPU) | CPU | GPU | token/s | host |
llama3.3:70b | 48GB | 6.2, 6.5 | 67% | 33% | 1.07 tps | 3950X |
qwen2.5:32b | 23GB | 6.6, 7.2 | 30% | 70% | 3.37 tps | 3950X |
gemma2:27b | 21GB | 6, 7.3 | 21% | 79% | 4.08 tps | 3950X |
phi4:14b | 14GB | 6, 6 | | 100% | 25.33 tps | |
gemma2:9b | 7.3GB | 6.4 | | 100% | 25.33 tps | (1070) |
gemma2:9b | 7.3GB | 6.4 | | 100% | 28.28 tps | (1080) |
gemma2:2b | 3.6GB | 3 | | 100% | 48.92 tps | (1070) |
gemma2:2b | 3.6GB | 3 | | 100% | 57.77 tps | (1080) |
GPU x2 (8+8=16GB): GeForce RTX 2070 Super + GeForce GTX 1070 |
Model | Memory | VRAM used (GB per GPU) | CPU | GPU | token/s | host |
phi4:14b | 16GB | 6, 6 | | 100% | 19.31 tps | |
GPU x3 (8+8+8=24GB): GeForce RTX 2070 Super + GeForce GTX 1080 + GeForce GTX 1070 |
Model | Memory | VRAM used (GB per GPU) | CPU | GPU | token/s | host |
llama3.3:70b | 50GB | 6.7, 6, 6.3 | 52% | 48% | 1.09 tps | 3950X |
qwen2.5:32b | 25GB | 6.7, 7.5, 6 | 3% | 97% | 7.69 tps | 3950X |
gemma2:27b | 23GB | 5.5, 6, 5.5 | | 100% | 12.89 tps | |
phi4:14b | 16GB | 4.2, 4.2, 4.2 | | 100% | 17.38 tps | |
gemma2:9b | 7.3GB | 6.5 | | 100% | 25.03 tps | |
gemma2:2b | 3.6GB | 3 | | 100% | 58.12 tps | |
GPU x2 (16+8=24GB): GeForce RTX 4060Ti 16GB + GeForce RTX 2070 Super |
Model | Memory | VRAM used (GB per GPU) | CPU | GPU | token/s | host |
llama3.3:70b | 48GB | 15, 6.2 | 49% | 51% | 1.46 tps | 3950X |
qwen2.5:32b | 23GB | 13, 7 | | 100% | 13.56 tps | |
gemma2:27b | 23GB | 12.5, 7 | | 100% | 15.94 tps | |
phi4:14b | 14GB | 14 | | 100% | 37.38 tps | |
gemma2:9b | 9.4GB | 10 | | 100% | 35.74 tps | |
gemma2:2b | 3.6GB | 3 | | 100% | 86.07 tps | |
GPU x3 (16+8+8=32GB): GeForce RTX 4060Ti 16GB + GeForce RTX 2070 Super + GeForce GTX 1080 |
Model | Memory | VRAM used (GB per GPU) | CPU | GPU | token/s | host |
llama3.3:70b | 49GB | 14.4, 5.8, 6.7 | 35% | 65% | 1.69 tps | 3950X |
qwen2.5:32b | 26GB | 9, 7.6, 6.7 | | 100% | 11.54 tps | |
gemma2:27b | 25GB | 7, 7, 6.3 | | 100% | 14.69 tps | |
phi4:14b | 12GB | 10 | | 100% | 26.67 tps | |
gemma2:9b | 9.4GB | 8.5 | | 100% | 38.08 tps | |
gemma2:2b | 3.6GB | 3 | | 100% | 92.48 tps | |
GPU x2 (16+16=32GB): GeForce RTX 4060 Ti x2 |
Model | Memory | VRAM used (GB per GPU) | CPU | GPU | token/s | host |
llama3.3:70b | 49GB | 14.4, 14 | 17% | 83% | 2.76 tps | 3950X |
qwen2.5:32b | 25GB | 11.2, 11.2 | | 100% | 12.46 tps | |
gemma2:27b | 23GB | 9.7, 9.7 | | 100% | 26.75 tps | |
GPU x4 (16+8+8+8=40GB): GeForce RTX 4060Ti 16GB + GeForce RTX 2070 Super + GeForce GTX 1080 + GeForce GTX 1070 |
Model | Memory | VRAM used (GB per GPU) | CPU | GPU | token/s | host |
llama3.3:70b | 51GB | 13.5, 5.3, 5.4, 6.2 | 21% | 79% | 2.22 tps | 3950X |
GPU x3 (16+16+8=40GB): GeForce RTX 4060Ti 16GB x2 + GeForce RTX 2070 Super |
Model | Memory | VRAM used (GB per GPU) | CPU | GPU | token/s | host |
llama3.3:70b | 49GB | 14.6, 6.2, 14 | 17% | 83% | 2.76 tps | 3950X |
GPU x4 (16+16+8+8=48GB): GeForce RTX 4060Ti 16GB x2 + GeForce RTX 2070 Super + GeForce GTX 1070 |
Model | Memory | VRAM used (GB per GPU) | CPU | GPU | token/s | host |
llama3.3:70b | 51GB | 14.6, 6, 13.6, 7 | 5% | 95% | 3.94 tps | 3950X |
GPU x5 (16+16+8+8+8=56GB): GeForce RTX 4060Ti 16GB x2 + GeForce RTX 2070 Super + GeForce GTX 1080 + GeForce GTX 1070 |
Model | Memory | VRAM used (GB per GPU) | CPU | GPU | token/s | host |
llama3.3:70b | 55GB | 13.6, 7.7, 14.2, 7, 7 | | 100% | 4.09 tps | |
qwen2.5:32b | 30GB | 5, 5.4, 5, 5, 5 | | 100% | 10.15 tps | |
Specs of the PCs used
PC
CPU | Ryzen 9 3950X | Ryzen 7 9700X (105W mode) |
RAM | DDR4-3200 96GB | DDR5-5600 96GB |
Motherboard | TUF GAMING X570-Plus | TUF GAMING B650M-E |
GPU1 | PCIe 4.0 x16 | PCIe 4.0 x16 |
GPU2 | PCIe 4.0 x4 | PCIe 4.0 x4 M.2 DEG1 OCuLink |
GPU3 | PCIe 4.0 x4 M.2 DEG1 OCuLink | PCIe 4.0 x4 M.2 DEG1 OCuLink |
GPU4 | PCIe 4.0 x4 M.2 DEG1 OCuLink | |
GPU5 | PCIe 3.0 x1 | |
GPU6 | PCIe 3.0 x1 | |
OS | Ubuntu 22.04/24.04 | Ubuntu 22.04 |
GPU
GPU | Architecture | Chip | VRAM | Clock | Mem B/W | PCIe | SM/CU | sp | Shader fp32 | TensorCore | TensorCore int8 |
GeForce RTX 4060 Ti | Ada Lovelace | | 16GB | 2540 MHz | 288 GB/s | PCIe 4.0 x8 | 34 | 4352 sp | 22108.16 GFLOPS | 136 | 176.87 TOPS |
GeForce RTX 2080 Ti | Turing | | 11GB | 1545 MHz | 616 GB/s | PCIe 3.0 x16 | 68 | 4352 sp | 13447.68 GFLOPS | 544 | 215.16 TOPS |
GeForce RTX 2070 Super | Turing | | 8GB | 1770 MHz | 448 GB/s | PCIe 3.0 x16 | 40 | 2560 sp | 9062.40 GFLOPS | 320 | 145.00 TOPS |
GeForce GTX 1080 | Pascal | | 8GB | 1733 MHz | 320 GB/s | PCIe 3.0 x16 | 20 | 2560 sp | 8872.96 GFLOPS | | |
GeForce GTX 1070 | Pascal | | 8GB | 1683 MHz | 256 GB/s | PCIe 3.0 x16 | 15 | 1920 sp | 6462.72 GFLOPS | | |
RADEON RX 7600 | RDNA3 | navi33 gfx1102 | 8GB | 2655 MHz | 288 GB/s | PCIe 4.0 x8 | 32 | 2048 sp | 21749.76 GFLOPS | | |
RADEON RX 6400 | RDNA2 | navi24 gfx1034 | 4GB | 2321 MHz | 128 GB/s | PCIe 4.0 x4 | 12 | 768 sp | 3565.06 GFLOPS | | |
RADEON 610M | RDNA2 | gfx1037 | 8GB | 1900 MHz | 90 GB/s | | 2 | 128 sp | 486.40 GFLOPS | | |
RADEON RX Vega 64 | GCN5 | vega10 gfx900 | 8GB | 1546 MHz | 484 GB/s | PCIe 3.0 x16 | 64 | 4096 sp | 12664.83 GFLOPS | | |
RADEON RX Vega 56 | GCN5 | vega10 gfx900 | 8GB | 1471 MHz | 410 GB/s | PCIe 3.0 x16 | 56 | 3584 sp | 10544.13 GFLOPS | | |
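The Shader fp32 column can be reproduced from the sp and clock columns. The sketch below assumes one fused multiply-add (2 FLOPs) per shader per clock, with an extra factor of 2 for RDNA3 dual-issue, which is what the RX 7600 figure implies. The function name `fp32_gflops` and the `dual_issue` flag are illustrative.

```python
def fp32_gflops(shaders: int, clock_mhz: float, dual_issue: bool = False) -> float:
    """Theoretical peak fp32 throughput in GFLOPS.

    shaders    -- number of shader processors (the 'sp' column)
    clock_mhz  -- clock in MHz (the clock column)
    dual_issue -- True for RDNA3, which can issue two FMAs per clock
    """
    flops_per_clock = 2  # one fused multiply-add counts as 2 FLOPs
    if dual_issue:
        flops_per_clock *= 2
    # shaders * FLOPs/clock * MHz gives MFLOPS; divide by 1000 for GFLOPS
    return shaders * flops_per_clock * clock_mhz / 1000


print(f"RTX 4060 Ti: {fp32_gflops(4352, 2540):.2f} GFLOPS")
print(f"RX 7600:     {fp32_gflops(2048, 2655, dual_issue=True):.2f} GFLOPS")
print(f"RX Vega 64:  {fp32_gflops(4096, 1546):.2f} GFLOPS")
```

These are theoretical peaks. The measured token/s above do not follow this column: two RTX 4060 Ti cards reach 12.74 tps on qwen2.5:32b, while the three-Radeon 24GB configuration, with more combined fp32 throughput, reaches 9.12 tps.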