@CodeMercenary Did some tests with the A2000 and, as expected, the 12GB VRAM is the biggest limitation. I used vanilla installations of Ollama and ComfyUI with no tweaking or optimization. In Stable Diffusion especially, the A2000 is clearly faster than the P40: about 2x at batch 1 and almost 5x at batch 4, which is to be expected. In LLM inference the gap is much smaller, roughly 25%. I have added the results below.
Stable Diffusion tests
A2000
1024x1024, batch 1, iterations 30, cfg 4.0, euler 1.4s/it
1024x1024, batch 4, iterations 30, cfg 4.0, euler 2.5s/it (about 0.6s/it per image)
P40
1024x1024, batch 1, iterations 30, cfg 4.0, euler 2.8s/it
1024x1024, batch 4, iterations 30, cfg 4.0, euler 12.1s/it (about 3.0s/it per image)
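For anyone who wants to check the arithmetic, here is a small sketch (not part of the original test setup) that derives the per-image time and the A2000-vs-P40 speedup from the s/it figures above. The dictionary layout and the function names are my own, purely for illustration.

```python
# Seconds per iteration as reported above: {card: {batch_size: s/it}}
results = {
    "A2000": {1: 1.4, 4: 2.5},
    "P40": {1: 2.8, 4: 12.1},
}


def per_image(sec_per_it: float, batch: int) -> float:
    """Time per iteration divided over the images in the batch."""
    return sec_per_it / batch


def speedup(batch: int) -> float:
    """How many times faster the A2000 is than the P40 at this batch size."""
    return results["P40"][batch] / results["A2000"][batch]


if __name__ == "__main__":
    for batch in (1, 4):
        for card, timings in results.items():
            t = per_image(timings[batch], batch)
            print(f"{card} batch {batch}: {t:.2f} s/it per image")
        print(f"A2000 speedup at batch {batch}: {speedup(batch):.1f}x")
```

The gap widens at batch 4 because the P40's time per iteration grows more than fourfold when going from batch 1 to batch 4, while the A2000's less than doubles.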
Inference tests
A2000
qwen2.5:14b 21 tokens/sec
qwen2.5-coder:14b 21 tokens/sec
llama3.2:3b-Q8 50 tokens/sec
llama3.2:3b-Q4 60 tokens/sec
P40
qwen2.5:14b 17 tokens/sec
qwen2.5-coder:14b 17 tokens/sec
llama3.2:3b-Q8 40 tokens/sec
llama3.2:3b-Q4 48 tokens/sec
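If you want to reproduce numbers like these, one option (an assumption on my part, the post does not say how the rates were read) is `ollama run <model> --verbose`, which prints an eval rate. The final JSON response from Ollama's /api/generate endpoint also carries `eval_count` and `eval_duration` (in nanoseconds), from which the rate can be computed. The sketch below does that calculation on an already-fetched response dict and also works out the A2000/P40 ratio from the figures above; the function name and the sample response values are made up for illustration.

```python
def tokens_per_second(response: dict) -> float:
    """Generation rate from an Ollama response dict.

    Expects `eval_count` (tokens generated) and `eval_duration`
    (nanoseconds spent generating them).
    """
    seconds = response["eval_duration"] / 1e9
    return response["eval_count"] / seconds


# Tokens per second as reported above: model -> (A2000, P40)
reported = {
    "qwen2.5:14b": (21, 17),
    "qwen2.5-coder:14b": (21, 17),
    "llama3.2:3b-Q8": (50, 40),
    "llama3.2:3b-Q4": (60, 48),
}

if __name__ == "__main__":
    # Hypothetical response values, chosen only to show the calculation.
    sample = {"eval_count": 210, "eval_duration": 10_000_000_000}
    print(f"sample rate: {tokens_per_second(sample):.1f} tokens/sec")

    for model, (a2000, p40) in reported.items():
        print(f"{model}: A2000 is {a2000 / p40:.2f}x the P40")
```

On these figures the A2000 comes out roughly 1.25x faster than the P40 for every model, a much smaller gap than in the Stable Diffusion runs.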
During heavy testing, the A2000 reached 70°C and the P40 reached 60°C, both with the Dell R720 set to automatic fan control.