The Modern Digital World
This is a work-in-progress.
This is my attempt to go over all of the players in what is being defined as the modern digital world [in one stack]. Everything I care about fits into this system:
- Energy & Physics
- Semiconductors
- Compute Hardware (CPU / GPU)
- Memory & Interconnect
- Servers
- Data Centers
- Networking & Internet
- Cloud & Virtualization
- Distributed Systems
- ML Frameworks
- Models
- AI Applications
Energy
Everything starts with energy. Fundamentally, computation is turning electricity into heat in a controlled way. Importantly, every extra unit of intelligence costs more energy to burn. As a case in point, if a GPU is rated at 400W, then it consumes 400 joules of energy per second, all of which ultimately becomes heat. Technically, GPUs use energy to move and flip charge; heat is the unavoidable byproduct (Landauer's principle sets the theoretical floor).
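As a quick back-of-the-envelope check (treating the 400W figure above as an assumption), a rated power in watts converts directly into heat per day:

```python
# Back-of-the-envelope: a 400 W GPU turns 400 J into heat every second.
# All numbers here are illustrative assumptions, not vendor specs.
gpu_power_watts = 400          # assumed rated board power
seconds_per_day = 24 * 60 * 60

joules_per_day = gpu_power_watts * seconds_per_day   # 1 W = 1 J/s
kwh_per_day = joules_per_day / 3.6e6                  # 1 kWh = 3.6 MJ

print(f"{joules_per_day:.2e} J/day ≈ {kwh_per_day:.1f} kWh/day of heat to remove")
# -> roughly 3.46e+07 J/day ≈ 9.6 kWh/day
```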
Transistors
A transistor is a microscopic on/off switch. It's controlled by voltage and etched into silicon. Modern silicon chips have billions of transistors, each able to switch billions of times per second. This enables logic (AND, OR, NOT), memory, and arithmetic operations. We use silicon because it's cheap, abundant, has predictable electrical behavior, and is manufacturable at nanometer scale. Historically, Moore's Law observed that transistor counts doubled every ~2 years, enabling exponential compute growth. That rate is now slowing, and AI hardware is changing first in response. Parallelism (GPUs), specialization (TPUs, ASICs), and scale (clusters) all help compensate for Moore's Law slowing. This means that performance gains will increasingly come from architectural choices, not smaller transistors.
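To make the logic point concrete, here is a toy Python sketch of gate composition: a single NAND primitive (which a handful of transistors can implement) is enough to build NOT, AND, and OR. It's an analogy for how switches compose, not a description of actual chip layout.

```python
# Toy sketch: transistors wired together give you universal logic.
# A NAND gate modeled in Python, with the other gates composed from it.

def nand(a: bool, b: bool) -> bool:
    return not (a and b)

def not_(a: bool) -> bool:
    return nand(a, a)

def and_(a: bool, b: bool) -> bool:
    return not_(nand(a, b))

def or_(a: bool, b: bool) -> bool:
    return nand(not_(a), not_(b))

# Truth-table check against Python's built-in logic
for a in (False, True):
    for b in (False, True):
        assert and_(a, b) == (a and b)
        assert or_(a, b) == (a or b)
```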
CPU vs. GPU
A central processing unit (CPU) is designed for control flow, branching, and sequential logic. It has few cores (in the ballpark of 8–64), a lot of capability per core, large caches, and high clock speeds. We use CPUs for operating systems, databases, web servers, and orchestration. Graphics processing units (GPUs) are designed for massively parallel math. They have thousands of simple cores, are optimized for matrix operations (e.g., matrix multiplication), and have very high throughput but weaker control flow. We use GPUs for graphics, ML, scientific computing, and [historically] crypto. An easy mental model: the CPU is the orchestra conductor, and the GPU is the massive factory floor.
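A rough way to feel the difference is to compare an explicit Python loop (sequential, branch-heavy, the CPU style) against one big vectorized matrix multiply (the throughput style GPUs are built for). This NumPy sketch is only illustrative; both paths here still run on the CPU.

```python
# Illustrative only: "one thing at a time" control flow vs. bulk parallel math.
# NumPy's vectorized matmul stands in for the GPU style; the explicit triple
# loop stands in for purely sequential execution.
import time
import numpy as np

n = 128
A = np.random.rand(n, n)
B = np.random.rand(n, n)

# Sequential style: explicit loops, one multiply-add at a time.
t0 = time.perf_counter()
C_loop = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        s = 0.0
        for k in range(n):
            s += A[i, k] * B[k, j]
        C_loop[i, j] = s
t_loop = time.perf_counter() - t0

# Throughput style: one big matrix multiply handed to optimized parallel code.
t0 = time.perf_counter()
C_fast = A @ B
t_fast = time.perf_counter() - t0

print(f"loops: {t_loop:.2f}s  vectorized: {t_fast:.4f}s")
assert np.allclose(C_loop, C_fast)
```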
Memory
Compute is useless without memory. From fast to slow, the memory hierarchy is something along the lines of registers (inside the CPU/GPU), cache (L1, L2, L3), RAM, NVMe SSD, and disk/object storage. The key constraint is that moving data costs more energy than computing on it; bandwidth and latency dominate performance. For modern AI workloads, memory bandwidth is often the dominant bottleneck, not raw FLOPS. GPUs especially require high bandwidth memory (HBM) because AI models are huge and parameters must be accessed constantly. Essentially, HBM is vertically stacked memory placed extremely close to the GPU die, providing massive bandwidth at lower energy per bit moved.
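A roofline-style sketch of that claim: given an assumed peak FLOP rate and an assumed memory bandwidth, a kernel's arithmetic intensity (FLOPs per byte moved) tells you which one limits it. Every number below is made up for illustration.

```python
# Rough roofline-style check: is a workload compute-bound or bandwidth-bound?
# All hardware numbers below are assumptions for illustration, not real specs.
peak_flops = 300e12        # assumed peak compute: 300 TFLOP/s
peak_bandwidth = 2e12      # assumed HBM bandwidth: 2 TB/s

# Arithmetic intensity = FLOPs performed per byte moved from memory.
# Below this "ridge point", the workload is limited by memory bandwidth.
ridge_point = peak_flops / peak_bandwidth   # FLOPs per byte

workload_intensity = 60    # assumed FLOPs/byte for some kernel
if workload_intensity < ridge_point:
    print(f"memory-bound (need > {ridge_point:.0f} FLOPs/byte to saturate compute)")
else:
    print("compute-bound")
```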
Interconnects
Modern AI means hundreds of thousands of GPUs acting as one machine. To connect components within a server, we use PCIe or NVLink (from Nvidia). To connect across servers, the main options are Ethernet or InfiniBand (common in AI clusters). Notably, networking must be fast and reliable for training to thrive.
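One reason the interconnect matters so much is collective communication: during training, every GPU has to end up with the same summed gradients. Below is a toy Python simulation of a ring all-reduce, the pattern libraries like NCCL run over NVLink/InfiniBand; it's a sketch of the idea, not production code.

```python
# Toy ring all-reduce: N simulated GPUs end up with identical summed gradients
# while only ever talking to their ring neighbor.
import numpy as np

def ring_allreduce(grads):
    """grads: list of equal-length 1D arrays, one per simulated GPU."""
    n = len(grads)
    chunks = [np.array_split(g.copy(), n) for g in grads]  # n chunks per GPU

    # Phase 1: reduce-scatter. After n-1 steps, GPU i holds the full sum
    # for chunk (i+1) mod n.
    for step in range(n - 1):
        for i in range(n):
            dst = (i + 1) % n
            c = (i - step) % n
            chunks[dst][c] = chunks[dst][c] + chunks[i][c]

    # Phase 2: all-gather. The completed chunks circulate around the ring.
    for step in range(n - 1):
        for i in range(n):
            dst = (i + 1) % n
            c = (i + 1 - step) % n
            chunks[dst][c] = chunks[i][c]

    return [np.concatenate(c) for c in chunks]

# Four simulated GPUs, each with its own gradient vector of length 8.
grads = [np.arange(8, dtype=float) * (g + 1) for g in range(4)]
out = ring_allreduce(grads)
assert all(np.allclose(o, sum(grads)) for o in out)
```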
Servers
A server is made up of CPUs, RAM, GPUs (optionally), storage, network cards, and power supplies. AI servers are exceptionally power-dense: they usually have 4–8 GPUs per node (nodes are packed into racks) and can draw up to ~10 kW per node. Power density compounds at the rack level.
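A quick sketch of how node power compounds into rack power, with every number below an assumption rather than a spec:

```python
# Back-of-the-envelope rack math; all figures are illustrative assumptions.
gpus_per_node = 8
watts_per_gpu = 700          # assumed accelerator board power
node_overhead_watts = 2_000  # CPUs, fans, NICs, storage (assumed)
nodes_per_rack = 4

node_watts = gpus_per_node * watts_per_gpu + node_overhead_watts   # ~7.6 kW
rack_watts = node_watts * nodes_per_rack                           # ~30 kW

print(f"node ≈ {node_watts / 1e3:.1f} kW, rack ≈ {rack_watts / 1e3:.1f} kW")
```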
Data Centers
A data center is an industrial power and cooling facility. The layout goes servers, racks, rows, pods, facility. Racks draw 20–100 kW. Power moves from the grid to the substation, transformers, UPS, PDUs, and finally to the server. Because energy becomes heat, cooling is an essential part of data centers: air cooling has been the legacy method, but liquid cooling is generally considered the method for an AI future (note: immersion cooling is also being explored). We measure efficiency with power usage effectiveness (PUE = total facility power / IT power). Old DCs were at 1.5+, hyperscalers are at 1.1–1.2, and now we're pushing toward 1.0. In general, these data centers have power and cooling bottlenecks, not compute bottlenecks.
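The PUE formula is simple enough to show directly; the numbers here are illustrative, chosen to land in the hyperscaler range mentioned above:

```python
# PUE = total facility power / IT power. Illustrative numbers only.
it_power_mw = 10.0   # power delivered to the servers themselves (assumed)
overhead_mw = 1.5    # cooling, power conversion losses, lighting (assumed)

pue = (it_power_mw + overhead_mw) / it_power_mw
print(f"PUE = {pue:.2f}")   # 1.15, in the hyperscaler range quoted above
```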
AI Breaks the Data Center Model
AI workloads run 24/7, max out GPUs constantly, and create extreme heat density. Consequently, there are pushes toward new data center designs. This also suggests that proximity to cheap power matters, long-term power contracts become strategic assets, and permitting and grid access become a moat. This is why we see Microsoft partnering on nuclear, AWS building custom substations, and OpenAI caring so much about infrastructure. This is also why so many firms have growing involvement in the digital infrastructure space, e.g., Blackstone's huge infrastructure PE practice or Goldman's TMT group restructuring to explicitly address digital infrastructure.
Cloud, Virtualization, and Containers
The cloud is a set of software abstractions that lets us rent compute, storage, and networking without managing physical hardware. Back when I resold sneakers, I would pay for an AWS EC2 instance from my home iMac to get the extra compute needed to run sneaker bots.
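For a sense of what "renting compute" looks like in practice, here is a minimal boto3 sketch that launches one EC2 instance. The AMI ID is a placeholder and it assumes AWS credentials are already configured; treat it as an illustration, not a recipe.

```python
# Minimal sketch of renting compute: launch a single EC2 instance with boto3.
# Assumes AWS credentials are configured; the AMI ID below is a placeholder.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder, not a real image ID
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
)
instance_id = response["Instances"][0]["InstanceId"]
print(f"launched {instance_id}")
```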
Virtual machines (VMs) carve one physical server into multiple isolated machines. Each VM thinks that it has its own CPU, memory, disk, etc. VMs are created and monitored by a hypervisor, which makes sure physical resources are allocated to each VM appropriately.
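A toy model of the bookkeeping a hypervisor does: carving a host's physical CPUs and memory into VMs and refusing to over-allocate. This is a conceptual sketch, not how KVM, Xen, or ESXi actually work.

```python
# Toy hypervisor bookkeeping: allocate vCPUs and memory from one physical host.
from dataclasses import dataclass, field

@dataclass
class Host:
    total_vcpus: int
    total_mem_gb: int
    vms: dict = field(default_factory=dict)   # name -> (vcpus, mem_gb)

    def create_vm(self, name: str, vcpus: int, mem_gb: int) -> bool:
        used_cpu = sum(v[0] for v in self.vms.values())
        used_mem = sum(v[1] for v in self.vms.values())
        if used_cpu + vcpus > self.total_vcpus or used_mem + mem_gb > self.total_mem_gb:
            return False   # not enough physical resources left
        self.vms[name] = (vcpus, mem_gb)
        return True

host = Host(total_vcpus=64, total_mem_gb=512)
assert host.create_vm("web", vcpus=8, mem_gb=32)
assert host.create_vm("db", vcpus=16, mem_gb=128)
assert not host.create_vm("train", vcpus=64, mem_gb=512)   # would exceed the host
```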