
Multi-GPU inference with Ollama

I use ollama to run local LLM inference on ordinary PCs.

Multi GPU: comparison by model size

  • All models are Q4 quantized.
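
As a rough cross-check of the Memory column in the tables below, the weight size of a Q4 model can be estimated from its parameter count. This is a minimal sketch that assumes an average of about 4.85 bits per weight (roughly what llama.cpp's Q4_K_M mix works out to; the exact quantization behind each ollama tag is an assumption here) and uses the nominal parameter counts in the tag names. `q4_weights_gb` is an illustrative helper, not part of ollama.

```python
# Back-of-the-envelope weight size for a Q4-quantized model.
# ASSUMPTION: ~4.85 bits per weight on average (approximately llama.cpp's
# Q4_K_M mix); the real format of each ollama tag may differ.
# q4_weights_gb is a hypothetical illustrative helper.

ASSUMED_BITS_PER_WEIGHT = 4.85


def q4_weights_gb(params_billion: float, bits_per_weight: float = ASSUMED_BITS_PER_WEIGHT) -> float:
    """Approximate weight size in GB (1 GB = 10**9 bytes)."""
    # params_billion * 1e9 weights * bits / 8 bits-per-byte / 1e9 bytes-per-GB
    return params_billion * bits_per_weight / 8


# Nominal parameter counts taken from the tag names.
for tag, params in [("llama3.3:70b", 70), ("qwen2.5:32b", 32), ("phi4:14b", 14)]:
    print(f"{tag:13s} ~{q4_weights_gb(params):.1f} GB of weights")
```

For llama3.3:70b this gives about 42 GB, below the 46 to 55 GB reported in the tables; the KV cache and other runtime buffers, which this estimate ignores, presumably make up the difference.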

70b (llama3.3:70b)

● Linux installed directly (bare metal)

| Setup | Processor | VRAM | Memory | CPU | GPU | token/s | host |
|---|---|---|---|---|---|---|---|
| CPU | Ryzen 9 3950X (Zen2) | - | 46GB | 100% | 0% | 1.01 tps | 3950X |
| 4GB GPUx1 | RADEON RX 6400 | 4GB | 47GB | 92% | 8% | 1.03 tps | 3950X |
| 8GB GPUx2 | RADEON RX 6400 + RX 6400 | 4GB+4GB | 49GB | 84% | 16% | 1.05 tps | 3950X |
| 8GB GPUx1 | RADEON RX 7600 | 8GB | 47GB | 83% | 17% | 1.12 tps | 3950X |
| CPU | Ryzen 7 9700X (Zen5) | - | 46GB | 100% | 0% | 1.22 tps | 9700X |
| 8GB GPUx1 | GeForce GTX 1070 | 8GB | 47GB | 83% | 17% | 1.31 tps | 3950X |
| 8GB GPUx1 | GeForce GTX 1080 | 8GB | 47GB | 83% | 17% | 1.32 tps | 9700X |
| 8GB GPUx1 | RADEON RX Vega 64 | 8GB | 47GB | 83% | 17% | 1.34 tps | 9700X |
| 16GB GPUx2 | GeForce GTX 1080 + GTX 1070 | 8GB+8GB | 48GB | 67% | 33% | 1.45 tps | 9700X |
| 24GB GPUx3 | RADEON RX 7600 + RADEON RX Vega 64 + RX Vega 56 | 8GB+8GB+8GB | 50GB | 52% | 48% | 1.51 tps | 3950X |
| 16GB GPUx2 | RADEON RX Vega 64 + RX Vega 56 | 8GB+8GB | 48GB | 65% | 35% | 1.55 tps | 9700X |
| 32GB GPUx2 | GeForce RTX 4060Ti + RTX 4060Ti | 16GB+16GB | 47GB | 32% | 68% | 2.22 tps | 3950X |
| 40GB GPUx3 | GeForce RTX 4060Ti + RTX 4060Ti + RTX 2070S | 16GB+16GB+8GB | 49GB | 17% | 83% | 3.08 tps | 3950X |
| 48GB GPUx4 | GeForce RTX 4060Ti + RTX 4060Ti + RTX 2070S + GTX 1080 | 16GB+16GB+8GB+8GB | 51GB | 6% | 94% | 4.25 tps | 3950X |
| 56GB GPUx5 | GeForce RTX 4060Ti + RTX 4060Ti + RTX 2070S + GTX 1080 + GTX 1070 | 16GB+16GB+8GB+8GB+8GB | 55GB | 0% | 100% | 5.10 tps | 3950X |

  • OS: Ubuntu 22.04/24.04
  • Note that two different PCs were used. When the GPU share is low, results may differ depending on the host PC.
    • host = 3950X : Ryzen 9 3950X (Zen2), 16C32T, DDR4-3200 51.2GB/s
    • host = 9700X : Ryzen 7 9700X (Zen5), 8C16T, DDR5-5600 89.6GB/s
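
The bandwidth figures above follow directly from the memory configuration, and for CPU-only runs they put a ceiling on speed. A minimal sketch, assuming dual-channel memory with a 64-bit (8-byte) bus per channel and a memory-bound decoder that streams every weight once per generated token (a common rule of thumb, not something measured on this page); the helper names are illustrative:

```python
# Peak DRAM bandwidth and the resulting ceiling on CPU-only decode speed.
# ASSUMPTIONS: dual-channel memory, 8 bytes (64 bits) per transfer per channel,
# and a memory-bound decoder that reads every weight once per token.
# Both functions are hypothetical illustrative helpers.


def dram_bandwidth_gbs(mt_per_s: int, channels: int = 2, bytes_per_transfer: int = 8) -> float:
    """Peak bandwidth in GB/s = MT/s x bytes per transfer x channels / 1000."""
    return mt_per_s * bytes_per_transfer * channels / 1000


def tps_ceiling(bandwidth_gbs: float, model_gb: float) -> float:
    """Upper bound on tokens/s if each token streams the whole model once."""
    return bandwidth_gbs / model_gb


for host, mts in [("3950X (DDR4-3200)", 3200), ("9700X (DDR5-5600)", 5600)]:
    bw = dram_bandwidth_gbs(mts)
    print(f"{host}: {bw:.1f} GB/s, llama3.3:70b (46GB) ceiling {tps_ceiling(bw, 46):.2f} tok/s")
```

With the 46GB llama3.3:70b this gives ceilings of about 1.11 tok/s on the 3950X and 1.95 tok/s on the 9700X, against the measured 1.01 and 1.22 tok/s: the 3950X runs close to its ceiling, while the 9700X leaves more of its bandwidth unused.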

● GPU passthrough to a VM on Proxmox



32b (qwen2.5:32b)

● Linux installed directly (bare metal)

| Setup | Processor | OS | VRAM (RAM for CPU) | Memory | CPU | GPU | token/s | host |
|---|---|---|---|---|---|---|---|---|
| CPU | Ryzen 9 3950X (Zen2) | Ubuntu | DDR4-3200 96GB | 22GB | 100% | 0% | 2.16 tps | 3950X |
| CPU | Ryzen 7 9700X (Zen5) | Ubuntu | DDR5-5600 96GB | 22GB | 100% | 0% | 2.62 tps | 9700X |
| 8GB GPUx2 | RADEON RX 6400 + RX 6400 | Ubuntu | 4GB+4GB | 24GB | 66% | 34% | 2.42 tps | 3950X |
| 8GB GPUx1 | RADEON RX 7600 | Ubuntu | 8GB | 22GB | 63% | 37% | 2.86 tps | 3950X |
| 8GB GPUx1 | GeForce GTX 1080 | Ubuntu | 8GB | 22GB | 64% | 36% | 3.27 tps | 9700X |
| 8GB GPUx1 | RADEON RX Vega 64 | Ubuntu | 8GB | 22GB | 63% | 37% | 3.44 tps | 9700X |
| 16GB GPUx2 | RADEON RX 7600 + RX Vega 64 | Ubuntu | 8GB+8GB | 23GB | 31% | 69% | 4.34 tps | 3950X |
| 16GB GPUx2 | GeForce GTX 1080 + GTX 1070 | Ubuntu | 8GB+8GB | 23GB | 31% | 69% | 4.41 tps | 9700X |
| 16GB GPUx2 | RADEON RX Vega 64 + RX Vega 56 | Ubuntu | 8GB+8GB | 23GB | 30% | 70% | 5.10 tps | 9700X |
| 24GB GPUx3 | RADEON RX 7600 + RX Vega 64 + RX Vega 56 | Ubuntu | 8GB+8GB+8GB | 25GB | 3% | 97% | 9.12 tps | 3950X |
| 32GB GPUx2 | GeForce RTX 4060Ti + RTX 4060Ti | Ubuntu | 16GB+16GB | 25GB | 0% | 100% | 12.74 tps | 3950X |
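
In both size comparisons, speed barely moves until most of the model sits in VRAM, and a simple sequential model reproduces this. The sketch below assumes that the CPU/GPU percentages are the share of the model's bytes held in system RAM vs VRAM, that every byte is read once per token, and that the CPU and GPU parts run one after the other so their times add. It takes the CPU bandwidth from the measured CPU-only 3950X row and the RX 7600's 288 GB/s from the GPU table; `split_tps` is an illustrative helper.

```python
# Sequential model of a partially offloaded (layer-split) model.
# ASSUMPTIONS: the CPU/GPU percentages are the share of the model's bytes in
# system RAM vs VRAM, every byte is read once per token, and the CPU part and
# the GPU part run one after the other, so their times add up.
# split_tps is a hypothetical illustrative helper, not an ollama API.


def split_tps(model_gb: float, cpu_share: float, cpu_bw_gbs: float, gpu_bw_gbs: float) -> float:
    cpu_time = model_gb * cpu_share / cpu_bw_gbs        # seconds per token in the CPU part
    gpu_time = model_gb * (1 - cpu_share) / gpu_bw_gbs  # seconds per token in the GPU part
    return 1 / (cpu_time + gpu_time)


# Effective CPU bandwidth from the measured CPU-only 3950X row: 2.16 tok/s x 22GB.
cpu_bw = 2.16 * 22
# 288 GB/s is the RX 7600's rated memory bandwidth from the GPU table.
rx7600_bw = 288

tps = split_tps(22, 0.63, cpu_bw, rx7600_bw)
cpu_fraction = (22 * 0.63 / cpu_bw) * tps  # share of per-token time spent in the CPU part
print(f"RX 7600, 63% on CPU: ~{tps:.2f} tok/s (measured 2.86), CPU part = {cpu_fraction:.0%} of time")
```

For the RX 7600 row it predicts about 3.1 tok/s against the measured 2.86, and the CPU-resident 63% of the model accounts for about 91% of the per-token time, which is why a 37% GPU share helps so little.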

● GPU passthrough to a VM on Proxmox



Multi GPU: comparison by environment

Linux installed directly (bare metal)

CPU only: Ryzen 9 3950X

| Model | Memory | CPU | GPU | token/s | host |
|---|---|---|---|---|---|
| llama3.3:70b | 46GB | 100% | 0% | 1.01 tps | 3950X |
| qwen2.5:32b | 22GB | 100% | 0% | 2.16 tps | 3950X |
| gemma2:27b | 18GB | 100% | 0% | 2.54 tps | 3950X |
| phi4:14b | 10GB | 100% | 0% | 4.63 tps | 3950X |
| gemma2:9b | 7.7GB | 100% | 0% | 6.94 tps | 3950X |
| qwen2.5:7b | 4.8GB | 100% | 0% | 9.27 tps | 3950X |
| gemma2:2b | 2.1GB | 100% | 0% | 19.73 tps | 3950X |

CPU only: Ryzen 7 9700X

| Model | Memory | CPU | GPU | token/s | host |
|---|---|---|---|---|---|
| llama3.3:70b | 46GB | 100% | 0% | 1.22 tps | 9700X |
| qwen2.5:32b | 22GB | 100% | 0% | 2.62 tps | 9700X |
| gemma2:27b | 18GB | 100% | 0% | 3.14 tps | 9700X |
| phi4:14b | 10GB | 100% | 0% | 5.73 tps | 9700X |
| gemma2:9b | 7.7GB | 100% | 0% | 8.71 tps | 9700X |
| qwen2.5:7b | 4.8GB | 100% | 0% | 11.40 tps | 9700X |
| gemma2:2b | 2.1GB | 100% | 0% | 26.96 tps | 9700X |
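
Multiplying Memory by tokens/s gives the bandwidth each CPU-only run effectively sustained. This is a sketch under the assumption that decoding is memory-bound and that the Memory column roughly equals the bytes read per token (it also contains the KV cache, so it is only a proxy); the peak figures are the dual-channel numbers from the host notes above, and `effective_bw_gbs` is an illustrative helper.

```python
# Effective bandwidth implied by the CPU-only tables: Memory x tokens/s.
# ASSUMPTION: decoding is memory-bound and the "Memory" column roughly equals
# the bytes read per token (it also includes the KV cache, so it is a proxy).
# effective_bw_gbs is a hypothetical illustrative helper, not an ollama API.

CPU_ONLY = {  # host -> [(model, memory GB, tokens/s)], copied from the tables above
    "3950X": [("llama3.3:70b", 46, 1.01), ("qwen2.5:32b", 22, 2.16), ("gemma2:27b", 18, 2.54),
              ("phi4:14b", 10, 4.63), ("gemma2:9b", 7.7, 6.94), ("qwen2.5:7b", 4.8, 9.27),
              ("gemma2:2b", 2.1, 19.73)],
    "9700X": [("llama3.3:70b", 46, 1.22), ("qwen2.5:32b", 22, 2.62), ("gemma2:27b", 18, 3.14),
              ("phi4:14b", 10, 5.73), ("gemma2:9b", 7.7, 8.71), ("qwen2.5:7b", 4.8, 11.40),
              ("gemma2:2b", 2.1, 26.96)],
}
PEAK_GBS = {"3950X": 51.2, "9700X": 89.6}  # dual-channel DDR4-3200 / DDR5-5600


def effective_bw_gbs(memory_gb: float, tokens_per_s: float) -> float:
    """GB streamed per second if every token reads memory_gb once."""
    return memory_gb * tokens_per_s


for host, rows in CPU_ONLY.items():
    for model, mem_gb, tps in rows:
        bw = effective_bw_gbs(mem_gb, tps)
        print(f"{host} {model:13s} {bw:5.1f} GB/s ({bw / PEAK_GBS[host]:.0%} of peak)")
```

Excluding gemma2:9b, the 3950X rows land at roughly 41 to 48 GB/s (about 80 to 93% of its 51.2 GB/s peak) and the 9700X rows at roughly 55 to 58 GB/s (about 61 to 64% of 89.6 GB/s). The gemma2:9b rows stand out at about 53 and 67 GB/s, the former above the 3950X's theoretical peak, which shows the limits of using the Memory column as a proxy.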

Earlier data (Proxmox VM)

CPU only: Ryzen 7 9700X (Zen5) DDR5-5600

| Model | Memory | CPU | token/s | host |
|---|---|---|---|---|
| llama3.3:70b | 46GB | 100% | 0.88 tps | 9700X |
| qwen2.5:32b | 22GB | 100% | 1.77 tps | 9700X |
| gemma2:27b | 19GB | 100% | 1.97 tps | 9700X |
| phi4:14b | 11GB | 100% | 3.06 tps | 9700X |
| gemma2:9b | 9.0GB | 100% | 5.01 tps | 9700X |
| gemma2:2b | 3.1GB | 100% | 18.76 tps | 9700X |

CPU only: Ryzen 9 3950X (Zen2) DDR4-3200

| Model | Memory | CPU | token/s | host |
|---|---|---|---|---|
| llama3.3:70b | 46GB | 100% | 0.69 tps | 3950X |
| qwen2.5:32b | 22GB | 100% | 1.23 tps | 3950X |
| gemma2:27b | 19GB | 100% | 1.47 tps | 3950X |
| phi4:14b | 11GB | 100% | 2.25 tps | 3950X |
| gemma2:9b | 9.0GB | 100% | 3.15 tps | 3950X |
| gemma2:2b | 3.1GB | 100% | 7.83 tps | 3950X |

GPU x2 (8+8=16GB): GeForce GTX 1070 + GeForce GTX 1080

| Model | Memory | VRAM per GPU (GB) | CPU | GPU | token/s | host |
|---|---|---|---|---|---|---|
| llama3.3:70b | 48GB | 6.2, 6.5 | 67% | 33% | 1.07 tps | 3950X |
| qwen2.5:32b | 23GB | 6.6, 7.2 | 30% | 70% | 3.37 tps | 3950X |
| gemma2:27b | 21GB | 6, 7.3 | 21% | 79% | 4.08 tps | 3950X |
| phi4:14b | 14GB | 6, 6 | | 100% | 25.33 tps | |
| gemma2:9b (1070) | 7.3GB | 6.4 | | 100% | 25.33 tps | |
| gemma2:9b (1080) | 7.3GB | 6.4 | | 100% | 28.28 tps | |
| gemma2:2b (1070) | 3.6GB | 3 | | 100% | 48.92 tps | |
| gemma2:2b (1080) | 3.6GB | 3 | | 100% | 57.77 tps | |

GPU x2 (8+8=16GB): GeForce RTX 2070 Super + GeForce GTX 1070

| Model | Memory | VRAM per GPU (GB) | CPU | GPU | token/s | host |
|---|---|---|---|---|---|---|
| phi4:14b | 16GB | 6, 6 | | 100% | 19.31 tps | |

GPU x3 (8+8+8=24GB): GeForce RTX 2070 Super + GeForce GTX 1080 + GeForce GTX 1070

| Model | Memory | VRAM per GPU (GB) | CPU | GPU | token/s | host |
|---|---|---|---|---|---|---|
| llama3.3:70b | 50GB | 6.7, 6, 6.3 | 52% | 48% | 1.09 tps | 3950X |
| qwen2.5:32b | 25GB | 6.7, 7.5, 6 | 3% | 97% | 7.69 tps | 3950X |
| gemma2:27b | 23GB | 5.5, 6, 5.5 | | 100% | 12.89 tps | |
| phi4:14b | 16GB | 4.2, 4.2, 4.2 | | 100% | 17.38 tps | |
| gemma2:9b | 7.3GB | 6.5 | | 100% | 25.03 tps | |
| gemma2:2b | 3.6GB | 3 | | 100% | 58.12 tps | |

GPU x2 (16+8=24GB): GeForce RTX 4060Ti 16GB + GeForce RTX 2070 Super

| Model | Memory | VRAM per GPU (GB) | CPU | GPU | token/s | host |
|---|---|---|---|---|---|---|
| llama3.3:70b | 48GB | 15, 6.2 | 49% | 51% | 1.46 tps | 3950X |
| qwen2.5:32b | 23GB | 13, 7 | | 100% | 13.56 tps | |
| gemma2:27b | 23GB | 12.5, 7 | | 100% | 15.94 tps | |
| phi4:14b | 14GB | 14 | | 100% | 37.38 tps | |
| gemma2:9b | 9.4GB | 10 | | 100% | 35.74 tps | |
| gemma2:2b | 3.6GB | 3 | | 100% | 86.07 tps | |

GPU x3 (16+8+8=32GB): GeForce RTX 4060Ti 16GB + GeForce RTX 2070 Super + GeForce GTX 1080

| Model | Memory | VRAM per GPU (GB) | CPU | GPU | token/s | host |
|---|---|---|---|---|---|---|
| llama3.3:70b | 49GB | 14.4, 5.8, 6.7 | 35% | 65% | 1.69 tps | 3950X |
| qwen2.5:32b | 26GB | 9, 7.6, 6.7 | | 100% | 11.54 tps | |
| gemma2:27b | 25GB | 7, 7, 6.3 | | 100% | 14.69 tps | |
| phi4:14b | 12GB | 10 | | 100% | 26.67 tps | |
| gemma2:9b | 9.4GB | 8.5 | | 100% | 38.08 tps | |
| gemma2:2b | 3.6GB | 3 | | 100% | 92.48 tps | |

GPU x2 (16+16=32GB): GeForce RTX 4060 Ti x2

| Model | Memory | VRAM per GPU (GB) | CPU | GPU | token/s | host |
|---|---|---|---|---|---|---|
| llama3.3:70b | 49GB | 14.4, 14 | 17% | 83% | 2.76 tps | 3950X |
| qwen2.5:32b | 25GB | 11.2, 11.2 | | 100% | 12.46 tps | |
| gemma2:27b | 23GB | 9.7, 9.7 | | 100% | 26.75 tps | |

GPU x4 (16+8+8+8=40GB): GeForce RTX 4060Ti 16GB + GeForce RTX 2070 Super + GeForce GTX 1080 + GeForce GTX 1070

| Model | Memory | VRAM per GPU (GB) | CPU | GPU | token/s | host |
|---|---|---|---|---|---|---|
| llama3.3:70b | 51GB | 13.5, 5.3, 5.4, 6.2 | 21% | 79% | 2.22 tps | 3950X |

GPU x3 (16+16+8=40GB): GeForce RTX 4060Ti 16GB x2 + GeForce RTX 2070 Super

| Model | Memory | VRAM per GPU (GB) | CPU | GPU | token/s | host |
|---|---|---|---|---|---|---|
| llama3.3:70b | 49GB | 14.6, 6.2, 14 | 17% | 83% | 2.76 tps | 3950X |

GPU x4 (16+16+8+8=48GB): GeForce RTX 4060Ti 16GB x2 + GeForce RTX 2070 Super + GeForce GTX 1070

| Model | Memory | VRAM per GPU (GB) | CPU | GPU | token/s | host |
|---|---|---|---|---|---|---|
| llama3.3:70b | 51GB | 14.6, 6, 13.6, 7 | 5% | 95% | 3.94 tps | 3950X |

GPU x5 (16+16+8+8+8=56GB): GeForce RTX 4060Ti 16GB x2 + GeForce RTX 2070 Super + GeForce GTX 1080 + GeForce GTX 1070

| Model | Memory | VRAM per GPU (GB) | CPU | GPU | token/s | host |
|---|---|---|---|---|---|---|
| llama3.3:70b | 55GB | 13.6, 7.7, 14.2, 7, 7 | | 100% | 4.09 tps | |
| qwen2.5:32b | 30GB | 5, 5.4, 5, 5, 5 | | 100% | 10.15 tps | |

Specs of the PCs used

PC

| Item | 3950X host | 9700X host |
|---|---|---|
| CPU | Ryzen 9 3950X | Ryzen 7 9700X (105W mode) |
| RAM | DDR4-3200 96GB | DDR5-5600 96GB |
| Motherboard | TUF GAMING X570-Plus | TUF GAMING B650M-E |
| GPU1 | PCIe 4.0 x16 | PCIe 4.0 x16 |
| GPU2 | PCIe 4.0 x4 | PCIe 4.0 x4 M.2 DEG1 OCuLink |
| GPU3 | PCIe 4.0 x4 M.2 DEG1 OCuLink | PCIe 4.0 x4 M.2 DEG1 OCuLink |
| GPU4 | PCIe 4.0 x4 M.2 DEG1 OCuLink | |
| GPU5 | PCIe 3.0 x1 | |
| GPU6 | PCIe 3.0 x1 | |
| OS | Ubuntu 22.04/24.04 | Ubuntu 22.04 |
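
The slots above differ a lot in raw link bandwidth. As a sketch using the standard per-lane signalling rates (8 GT/s for PCIe 3.0, 16 GT/s for PCIe 4.0, both with 128b/130b encoding) and ignoring packet overhead, the one-direction bandwidth of each slot type works out as below; `pcie_gbs` is an illustrative helper.

```python
# One-direction bandwidth of the PCIe links listed in the spec table.
# PCIe 3.0 signals at 8 GT/s and PCIe 4.0 at 16 GT/s per lane, both using
# 128b/130b encoding; packet/protocol overhead is ignored, so real transfers
# are somewhat slower. pcie_gbs is a hypothetical illustrative helper.

GT_PER_S = {"3.0": 8.0, "4.0": 16.0}  # per-lane signalling rate in GT/s


def pcie_gbs(gen: str, lanes: int) -> float:
    """Approximate GB/s per direction: GT/s x lanes x encoding efficiency / 8 bits."""
    return GT_PER_S[gen] * lanes * (128 / 130) / 8


for gen, lanes in [("4.0", 16), ("4.0", 8), ("4.0", 4), ("3.0", 16), ("3.0", 1)]:
    print(f"PCIe {gen} x{lanes:<2d}: {pcie_gbs(gen, lanes):6.2f} GB/s")
```

This gives about 31.5 GB/s for PCIe 4.0 x16, 7.9 GB/s for PCIe 4.0 x4 and 0.98 GB/s for PCIe 3.0 x1, so simply copying an 8GB share of a model to a GPU in an x1 slot takes at least about 8 seconds.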

GPU

| GPU | Architecture | VRAM | Clock | Mem B/W | PCIe | SM/CU | Shaders | fp32 | Tensor Cores | Tensor int8 |
|---|---|---|---|---|---|---|---|---|---|---|
| GeForce RTX 4060 Ti | Ada Lovelace | 16GB | 2540 MHz | 288 GB/s | PCIe 4.0 x8 | 34 | 4352 sp | 22108.16 GFLOPS | 136 | 176.87 TOPS |
| GeForce RTX 2080 Ti | Turing | 11GB | 1545 MHz | 616 GB/s | PCIe 3.0 x16 | 68 | 4352 sp | 13447.68 GFLOPS | 544 | 215.16 TOPS |
| GeForce RTX 2070 Super | Turing | 8GB | 1770 MHz | 448 GB/s | PCIe 3.0 x16 | 40 | 2560 sp | 9062.40 GFLOPS | 320 | 145.00 TOPS |
| GeForce GTX 1080 | Pascal | 8GB | 1733 MHz | 320 GB/s | PCIe 3.0 x16 | 20 | 2560 sp | 8872.96 GFLOPS | - | - |
| GeForce GTX 1070 | Pascal | 8GB | 1683 MHz | 256 GB/s | PCIe 3.0 x16 | 15 | 1920 sp | 6462.72 GFLOPS | - | - |
| RADEON RX 7600 | RDNA3 navi33 gfx1102 | 8GB | 2655 MHz | 288 GB/s | PCIe 4.0 x8 | 32 | 2048 sp | 21749.76 GFLOPS | - | - |
| RADEON RX 6400 | RDNA2 navi24 gfx1034 | 4GB | 2321 MHz | 128 GB/s | PCIe 4.0 x4 | 12 | 768 sp | 3565.06 GFLOPS | - | - |
| RADEON 610M | RDNA2 gfx1037 | 8GB | 1900 MHz | 90 GB/s | - | 2 | 128 sp | 486.40 GFLOPS | - | - |
| RADEON RX Vega 64 | GCN5 vega10 gfx900 | 8GB | 1546 MHz | 484 GB/s | PCIe 3.0 x16 | 64 | 4096 sp | 12664.83 GFLOPS | - | - |
| RADEON RX Vega 56 | GCN5 vega10 gfx900 | 8GB | 1471 MHz | 410 GB/s | PCIe 3.0 x16 | 56 | 3584 sp | 10544.13 GFLOPS | - | - |

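
The fp32 and int8 columns can be reproduced from the other columns. The sketch assumes one FMA (2 FLOPs) per shader per clock, with an extra factor of 2 for the RX 7600 because its figure counts RDNA3 dual-issue; for int8 it uses per-Tensor-Core rates of 256 ops/clock (Turing) and 512 ops/clock (Ada), which are back-calculated from the table's own values rather than taken from a vendor source. The function names are illustrative.

```python
# Reproduce the "fp32" and "Tensor int8" columns of the GPU table.
# fp32: one FMA (2 FLOPs) per shader per clock; the RX 7600 figure counts
# RDNA3 dual-issue, hence the optional extra factor of 2.
# int8: the ops-per-Tensor-Core-per-clock values (256 Turing, 512 Ada) are
# back-calculated from the table itself (dense, no sparsity).
# Both functions are hypothetical illustrative helpers.


def fp32_gflops(shaders: int, clock_mhz: float, dual_issue: bool = False) -> float:
    return shaders * 2 * clock_mhz / 1000 * (2 if dual_issue else 1)


def int8_tops(tensor_cores: int, clock_mhz: float, ops_per_clock: int) -> float:
    return tensor_cores * ops_per_clock * clock_mhz / 1e6


for name, shaders, mhz, dual in [
    ("GeForce RTX 4060 Ti", 4352, 2540, False),
    ("GeForce GTX 1080", 2560, 1733, False),
    ("RADEON RX 7600", 2048, 2655, True),
    ("RADEON RX Vega 64", 4096, 1546, False),
]:
    print(f"{name:20s} {fp32_gflops(shaders, mhz, dual):9.2f} GFLOPS fp32")

print(f"RTX 4060 Ti int8:    {int8_tops(136, 2540, 512):.2f} TOPS")
print(f"RTX 2070 Super int8: {int8_tops(320, 1770, 256):.2f} TOPS")
```

The printed values match the table, for example 22108.16 GFLOPS and 176.87 TOPS for the RTX 4060 Ti.
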
ai/ollama.txt · Last modified: 2025/02/19 15:17 by oga
