Part 2: Hardware Selection and Component Overview

Part 2 Illustration

Building a powerful and stable AI server starts with selecting the right components. Each part plays a critical role in the overall performance and reliability of the system, especially when dealing with the unique demands of GPU-accelerated virtualization. This section provides a detailed breakdown of the recommended hardware for this build.

Core Components

The following table outlines the key components for your home AI server. The selections are based on a balance of performance, compatibility with virtualization and GPU passthrough, and value.

| Component | Recommendation | Key Considerations |
|---|---|---|
| GPU 1 (Training) | NVIDIA Tesla P100 (16GB PCIe) | Excellent FP16/FP32 performance, HBM2 memory for high bandwidth. Ideal for model training. |
| GPU 2 (Inference) | NVIDIA Tesla P40 (24GB PCIe) | Massive 24GB VRAM, strong INT8 performance. Perfect for running large models. |
| CPU | Intel Xeon E5-2600 v3/v4 or AMD EPYC 7001/7002 Series | High core count, ample PCIe lanes, and robust IOMMU support are critical for virtualization. |
| Motherboard | Supermicro, ASRock Rack, or Tyan server motherboard | Must have sufficient PCIe x16 slots for dual GPUs and strong VT-d/AMD-Vi support. |
| Memory (RAM) | 128GB DDR4 ECC Registered | ECC (Error-Correcting Code) memory is crucial for server stability, especially during long training runs. |
| Storage (OS) | 2x 512GB NVMe SSD (RAID 1) | A mirrored pair for the Proxmox host OS ensures redundancy and fast boot/response times. |
| Storage (VMs) | 4x 2.5TB SATA SSDs (RAID 10) | A RAID 10 array of SSDs provides ~5TB of usable, fast, redundant storage for the Ubuntu VM (10TB raw). |
| Power Supply (PSU) | 1200W+ 80+ Gold/Platinum | A high-quality, high-wattage PSU is non-negotiable to handle the power draw of two 250W GPUs plus the rest of the system. |
| Chassis | 4U Rackmount or Large Tower | Must have excellent airflow to cool the passively cooled Tesla GPUs, as well as space for all your GPUs (see below). |
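To sanity-check the 1200W+ PSU recommendation, a rough power budget can be sketched. The GPU figures below are the 250W TDPs from the table; the CPU figure is a typical Xeon E5-2600 v4 TDP, and the RAM, storage, and miscellaneous numbers are my own rough estimates, not measured draw:

```python
# Rough peak power budget for the build.
# GPU TDPs are from the component table; the rest are nominal/estimated values.
TDP_WATTS = {
    "Tesla P100": 250,
    "Tesla P40": 250,
    "Xeon E5-2600 v4": 135,
    "128GB DDR4 ECC (8 DIMMs)": 40,   # ~5W per DIMM, estimated
    "2x NVMe + 4x SATA SSD": 30,      # estimated
    "Fans / motherboard / misc": 75,  # estimated
}

psu_watts = 1200
total = sum(TDP_WATTS.values())
headroom = 1 - total / psu_watts

print(f"Estimated peak draw: {total} W")          # -> Estimated peak draw: 780 W
print(f"Headroom on a {psu_watts} W PSU: {headroom:.0%}")
```

Even with generous estimates, a 1200W unit leaves roughly a third of its capacity in reserve, which matters because transient GPU power spikes can briefly exceed TDP.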

Understanding the GPU Choices

The selection of the NVIDIA Tesla P100 and P40 is the cornerstone of this build. By no means are these cards lightning fast by modern standards, but at the time of my research both were fairly inexpensive and easy to source on eBay. I also nabbed an NVIDIA K80, but had no room in the chassis for a third GPU, so beware and plan your case needs accordingly. As detailed in its datasheet, the P100's architecture is optimized for raw computational throughput, making it a beast for training models from scratch [3]. Its 16GB of HBM2 memory delivers a staggering 732 GB/s of bandwidth, feeding the 3584 CUDA cores with data at an incredible rate.

Conversely, the Tesla P40 is an inference champion. While its single-precision performance is lower than the P100’s, its 24GB of VRAM allows it to load and run enormous models that would simply not fit on other cards [1]. Its INT8 capabilities provide a significant speedup for inference tasks where slightly lower precision is acceptable.
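As a back-of-the-envelope check on what "enormous models" means in practice, weight memory scales with parameter count times bytes per parameter (weights only; activations and KV caches add further overhead on top). A quick sketch, using a hypothetical 13-billion-parameter model as the example:

```python
def weights_gib(params_billion: float, bytes_per_param: int) -> float:
    """Approximate weight memory in GiB (weights only; no activation/KV overhead)."""
    return params_billion * 1e9 * bytes_per_param / 2**30

P40_VRAM_GIB = 24

# FP16 uses 2 bytes per parameter, INT8 uses 1.
for dtype, nbytes in [("FP16", 2), ("INT8", 1)]:
    need = weights_gib(13, nbytes)
    verdict = "fits" if need < P40_VRAM_GIB else "does not fit"
    print(f"13B @ {dtype}: ~{need:.1f} GiB -> {verdict} in {P40_VRAM_GIB} GiB")
```

This is exactly why the P40's INT8 support pairs well with its 24GB of VRAM: halving the bytes per parameter roughly doubles the model size you can serve.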

CPU and Motherboard Synergy

For a virtualization server with GPU passthrough, the CPU and motherboard are just as important as the GPUs. You need a platform that provides a high number of PCIe lanes to run both GPUs at their full x16 bandwidth, and robust IOMMU (Input-Output Memory Management Unit) support. The IOMMU is the hardware feature that allows the host (Proxmox) to pass direct control of a physical device, like a GPU, to a virtual machine. Intel’s implementation is called VT-d, and AMD’s is AMD-Vi. Server-grade platforms from Intel (Xeon) and AMD (EPYC) are designed for these scenarios and are highly recommended over consumer-grade hardware.
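Before committing to a platform, it is worth verifying that the IOMMU is actually enabled at boot. On Linux this typically means checking the kernel command line for `intel_iommu=on` or `amd_iommu=on` (and later confirming `/sys/kernel/iommu_groups` is populated). A minimal sketch of that first check, written as a parser over `/proc/cmdline`-style text; note that on recent kernels the AMD IOMMU is often on by default, so a negative result here means "investigate further", not a definitive failure:

```python
def iommu_enabled(cmdline: str) -> bool:
    """Return True if the kernel command line contains a flag commonly used
    to enable IOMMU/passthrough support in Proxmox guides."""
    flags = {"intel_iommu=on", "amd_iommu=on", "iommu=pt"}
    return any(token in flags for token in cmdline.split())

# On the real host you would read the live command line:
#   with open("/proc/cmdline") as f:
#       print(iommu_enabled(f.read()))
example = "BOOT_IMAGE=/boot/vmlinuz root=/dev/mapper/pve-root ro quiet intel_iommu=on"
print(iommu_enabled(example))  # -> True
```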

In the next part, we will delve deeper into the critical topics of power and cooling, which are paramount when running two high-power, passively cooled GPUs. This led me down a rabbit hole of designing fan shrouds for the passively cooled GPUs and 3D printing them.