The price of compute refers to the cost of performing computational operations, most relevant today in terms of FLOPS (floating-point operations per second) per dollar for AI workloads, GPU/cloud rental rates, or total cost of ownership (TCO) including hardware, power, and infrastructure. It has declined dramatically over decades due to technological progress but faces new constraints from AI-driven demand, chip supply, energy, and infrastructure limits.
Historical Trends: Exponential Declines Followed by Specialization
Compute costs have followed a long-term downward trajectory, often linked to (but distinct from) Moore's Law, which observed transistor counts doubling roughly every two years with minimal cost increase. For GPUs and AI-relevant performance:
- Early eras: In 1961, costs were around $18 billion per GFLOPS (inflation-adjusted extremes in the trillions). By the 2000s–2010s, GPU price-performance improved rapidly.
- 2006–2021 data: FLOP/s per dollar for GPUs roughly doubled every ~2.5 years overall. ML-specific GPUs improved faster (~every 2.07 years), while top GPUs were slower (~2.95 years). This is close to but somewhat slower than classic Moore's Law doubling.
- Recent GPU trends: Price per FP32 FLOP has trended downward, with 2025 prices around 26% of 2019 levels in some analyses (a ~74% drop). Single-precision FLOPS costs fell ~17% per year in earlier periods.
These gains came from process shrinks, architectural improvements (e.g., tensor cores for AI), and economies of scale. However, general-purpose compute improvements have slowed in recent years as the industry shifted to specialized AI accelerators. Training costs for large ML models have still risen overall because compute demand has grown faster than efficiency gains (e.g., ~0.5 orders of magnitude per year in dollar training costs despite hardware improvements).
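The trend figures above mix two ways of stating the same thing: a doubling time for FLOP/s per dollar and a yearly percentage fall in price per FLOP. The sketch below converts between them so the numbers can be compared on one scale. It assumes a constant exponential trend, which real hardware generations only approximate, and the function names are illustrative rather than taken from any cited analysis.

```python
import math


def annual_gain_from_doubling(doubling_years: float) -> float:
    """Fractional yearly improvement in FLOP/s per dollar for a given doubling time."""
    return 2 ** (1 / doubling_years) - 1


def annual_price_decline_from_doubling(doubling_years: float) -> float:
    """Fractional yearly drop in dollars per FLOP/s for the same doubling time."""
    return 1 - 2 ** (-1 / doubling_years)


def halving_years_from_decline(annual_decline: float) -> float:
    """Years for price per FLOP to halve at a constant yearly decline."""
    return math.log(2) / -math.log(1 - annual_decline)


def annual_decline_from_ratio(price_ratio: float, years: float) -> float:
    """Constant yearly decline implied by ending at price_ratio of the starting price."""
    return 1 - price_ratio ** (1 / years)


if __name__ == "__main__":
    for label, years in [("all GPUs", 2.5), ("ML GPUs", 2.07), ("top GPUs", 2.95)]:
        print(f"{label}: {annual_price_decline_from_doubling(years):.1%} price decline per year")
    print(f"17%/yr decline halves price every {halving_years_from_decline(0.17):.2f} years")
    print(f"26% of 2019 price after 6 years: {annual_decline_from_ratio(0.26, 6):.1%} per year")
```

Under that assumption, a 2.5-year doubling corresponds to roughly a 24% yearly fall in price per FLOP/s, a 17% yearly fall corresponds to a halving about every 3.7 years, and reaching 26% of the 2019 price by 2025 implies about a 20% yearly fall.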
Cloud/GPU rental perspective: Prices have varied with supply. As of mid-2026, H100 GPUs (a current AI workhorse) rent from ~$0.47–$14.90/hr per GPU across providers, with competitive rates around $1.38–$2.50/hr on specialized platforms (vs. much higher on major hyperscalers like AWS/Azure). A100s are cheaper (e.g., $0.60–$1+/hr). Newer Blackwell B200s start higher, often $2.99–$6+/hr, but are expected to decline with supply ramps.
What Determines the Price of Compute?
Several interlocking factors drive costs:
- Chip Availability and Pricing: NVIDIA dominates AI GPUs. High demand for H100/H200/B200 creates shortages and premiums. Production ramps (e.g., TSMC capacity), export restrictions, and yields affect supply. New generations (Blackwell) offer better performance/watt but command initial premiums ($30k–$50k+ per GPU purchase estimates). Custom ASICs (e.g., from Google, Amazon) can lower long-term costs for specific workloads.
- Electricity Availability and Price: AI data centers are power-hungry. GPUs consume significant energy; TCO includes power (often 30–50%+ of costs). Global data center electricity use is projected to double to ~945 TWh by 2030 (AI as the main driver), rivaling major countries. Wholesale prices have spiked in data center hubs (up to 267% in some U.S. areas). Constraints on grid capacity, permitting, and renewables/gas/nuclear buildout raise effective costs. Efficient cooling and location (cheap power) matter hugely.
- Infrastructure and Data Center Construction: Building facilities is capital-intensive. McKinsey projects ~$5.2T for AI data centers by 2030 (part of $6.7T–$7T total), covering land, power generation/transmission (~150–200 GW needed), cooling, networking, etc. Delays from power queues and construction push timelines and costs up. Utilization rates, networking (e.g., NVLink), and software efficiency also factor in.
- 2026: Continued tightness for leading GPUs (H100/B200). Rental prices may fluctuate with supply (recent surges noted in some segments, but spot/neocloud often cheaper). Blackwell ramp-up should ease some pressure later in the year, potentially lowering effective $/FLOP. Hyperscaler on-demand remains premium; specialized providers offer discounts.
- Electricity and power constraints will keep TCO elevated in hot regions. Capex boom: AI data center spending ~$400B+ in 2026.
- Efficiency gains (quantization, better algorithms, inference optimization) could moderate effective costs even if raw hardware stays pricey.
- Continued but uneven declines in raw compute price: GPU price-performance should improve with new nodes (e.g., 2nm+), 3D stacking, and specialized chips, though physical limits (e.g., power density) slow Moore-like gains. Expect FLOP/s/$ to keep improving, potentially at 1.5–2.5 year doublings depending on segment.
- TCO pressures: Energy and infrastructure could dominate. Projections show data center power demand surging; without massive generation buildout (renewables, gas, nuclear/SMRs), costs rise. McKinsey/Goldman-style forecasts imply trillions in cumulative spend, with annual AI capex potentially reaching $1T+ levels.
- Scenarios: High-efficiency case (better models/hardware) could cut demand 20%+. Supply bottlenecks or slower AI adoption moderate it. Inference (growing share) may be more cost-sensitive than training.
- Financial projections (rough, illustrative): Assume the effective AI GPU cost (hardware + power + infra, amortized) declines 20–40% per generation while compute demand for leading models grows 4–5x per year. Training a frontier model today might cost tens to hundreds of millions of dollars; by 2030, on current trends, 1000x more compute could mean billions unless efficiencies offset the increase. Cloud rental for H100-class GPUs might trend toward $1/hr or below in competitive markets, but premium clusters stay higher. The overall market for AI chips could reach hundreds of billions of dollars annually.
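The illustrative projection in the last bullet can be reproduced with a few lines of arithmetic. The sketch below is a rough model, not a forecast: it assumes training cost scales linearly with compute, that each hardware generation cuts effective cost by a fixed fraction, and that the number of generations before 2030 is a free parameter. The base cost and generation count in the example are hypothetical inputs chosen only to show the mechanics.

```python
import math


def years_to_multiplier(multiplier: float, growth_per_year: float) -> float:
    """Years needed for compute to grow by `multiplier` at a constant yearly growth factor."""
    return math.log(multiplier) / math.log(growth_per_year)


def projected_training_cost(
    base_cost_usd: float,
    compute_multiplier: float,
    decline_per_generation: float,
    generations: int,
) -> float:
    """Cost of a run using `compute_multiplier` times more compute, after
    `generations` hardware generations that each cut cost per FLOP by
    `decline_per_generation` (a fraction between 0 and 1)."""
    cost_per_flop_factor = (1 - decline_per_generation) ** generations
    return base_cost_usd * compute_multiplier * cost_per_flop_factor


if __name__ == "__main__":
    # Hypothetical inputs: a $100M run today, 1000x more compute, two generations.
    for decline in (0.2, 0.3, 0.4):
        cost = projected_training_cost(100e6, 1000, decline, generations=2)
        print(f"{decline:.0%} decline per generation: ${cost / 1e9:.1f}B")
    print(f"1000x compute at 4x/year takes {years_to_multiplier(1000, 4):.1f} years")
```

With these placeholder inputs the result lands in the tens of billions of dollars, which is consistent with the "billions unless efficiencies offset" framing; 1000x growth at 4x per year takes about five years.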
In summary, compute has never been cheaper in absolute historical terms, enabling the AI boom, but marginal costs for frontier capabilities are rising due to scale and bottlenecks. The "price of compute" is increasingly a story of energy, infrastructure, and supply chain mastery as much as silicon innovation.
Expect volatility in the near term and massive investment shaping the 2030 landscape. The winners will optimize not just for FLOPS, but for FLOPS per dollar per watt in real-world deployments.
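To make the TCO framing concrete, here is a minimal sketch that folds hardware amortization, electricity, and infrastructure into a single cost per utilized GPU-hour. Every input value is a hypothetical placeholder rather than a sourced estimate, and the model makes simplifying assumptions that are noted in the comments (straight-line amortization, constant power draw, a flat yearly infrastructure charge).

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 8760


@dataclass
class GpuCostInputs:
    purchase_price_usd: float       # up-front hardware price per GPU
    lifetime_years: float           # straight-line amortization period (assumption)
    power_kw: float                 # average draw per GPU, including its share of the host
    pue: float                      # facility overhead multiplier on IT power
    electricity_usd_per_kwh: float
    infra_usd_per_year: float       # amortized facility and network cost per GPU (assumption)
    utilization: float              # fraction of hours doing useful work, 0..1


def cost_per_utilized_gpu_hour(c: GpuCostInputs) -> float:
    """Yearly cost divided by the hours the GPU is actually used.

    Simplification: the GPU is assumed to draw `power_kw` around the clock,
    so idle hours still cost electricity.
    """
    hardware_per_year = c.purchase_price_usd / c.lifetime_years
    energy_per_year = c.power_kw * c.pue * HOURS_PER_YEAR * c.electricity_usd_per_kwh
    total_per_year = hardware_per_year + energy_per_year + c.infra_usd_per_year
    return total_per_year / (HOURS_PER_YEAR * c.utilization)


if __name__ == "__main__":
    example = GpuCostInputs(
        purchase_price_usd=30_000, lifetime_years=4, power_kw=1.0, pue=1.3,
        electricity_usd_per_kwh=0.08, infra_usd_per_year=2_000, utilization=0.7,
    )
    print(f"${cost_per_utilized_gpu_hour(example):.2f} per utilized GPU-hour")
```

Dividing by utilized hours rather than calendar hours is the key design choice: halving utilization doubles the effective hourly cost, which is why utilization appears alongside chips and power as a price driver.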
The Price of Compute: Detailed Financial Projections (2025–2035)
The price of compute is tracked through several metrics: raw hardware price-performance (e.g., FLOPS per dollar), cloud/GPU rental rates (effective cost per hour or per FLOP), and total cost of ownership (TCO) including power, infrastructure, and amortization. Projections are illustrative, based on historical trends (FLOPS/$ roughly doubling every 2–3 years), current 2025–2026 data, and industry forecasts from McKinsey, Goldman Sachs, Epoch AI, IEA, and others.
Key Assumptions:
- Hardware efficiency continues with generational improvements (e.g., Blackwell → Rubin), but at a moderated pace due to physical limits.
- Demand growth outpaces some efficiency gains, pressuring TCO.
- Electricity prices rise 5–10% annually in high-demand regions; PUE improves modestly.
- Capex scales with GW buildout; ~$5.2T AI-specific cumulative by 2030 (McKinsey baseline).
- Volatility from supply ramps, geopolitics, and utilization rates (assume 60–80% for projections).
- Inflation-adjusted where possible; ranges reflect scenarios (base/moderate growth vs. accelerated/constrained).
- 2025: H100 purchase ~$25k–$40k; rental $2–$4+/hr (spot lower ~$1–2.5).
- 2026: B200 purchase ~$30k–$50k+; rental $4–$6+/hr initially, trending down with ramp.
- Longer term: Effective $/FLOP declines 20–40% per major generation.
- Base Case: Moderate demand growth; efficiencies (e.g., better models, liquid cooling, custom chips) offset some costs. Compute price continues long-term decline in $/FLOP, but TCO rises due to energy/infra.
- Accelerated: Higher AI adoption → $7.9T+ capex by 2030; tighter supply keeps rentals elevated longer.
- Constrained: Slower ROI or bottlenecks → $3.7T capex; faster price drops from oversupply.
- Uncertainties: Electricity prices/grid delays (+costs), chip geopolitics, breakthroughs (e.g., optical/neuromorphic computing could accelerate declines), utilization rates, and AI monetization (must justify trillions in spend).
These projections are not financial advice and carry high uncertainty due to rapid technological and market changes. Actual outcomes depend on innovation pace, policy, and economic returns from AI. For custom modeling (e.g., specific workload TCO), more granular inputs would refine these further.
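One way to use the purchase and rental ranges above is a simple buy-versus-rent break-even. The sketch below is a rough illustration under stated assumptions: the owner's running cost per utilized hour (`owner_opex_usd_per_hour`) is a hypothetical parameter covering power and hosting, resale value and financing are ignored, and a month is taken as 730 hours.

```python
HOURS_PER_MONTH = 730  # 8760 hours per year divided by 12


def break_even_months(
    purchase_price_usd: float,
    rental_usd_per_hour: float,
    utilization: float,
    owner_opex_usd_per_hour: float = 0.0,
) -> float:
    """Calendar months until buying a GPU costs less than renting one.

    Returns infinity when running an owned GPU costs as much per hour as
    renting, because the purchase price is then never recovered.
    """
    saving_per_used_hour = rental_usd_per_hour - owner_opex_usd_per_hour
    if saving_per_used_hour <= 0:
        return float("inf")
    used_hours_needed = purchase_price_usd / saving_per_used_hour
    calendar_hours = used_hours_needed / utilization
    return calendar_hours / HOURS_PER_MONTH


if __name__ == "__main__":
    # Hypothetical combinations drawn from the ranges quoted in this section.
    print(f"{break_even_months(30_000, 2.0, 0.7, 0.3):.1f} months")
    print(f"{break_even_months(25_000, 4.0, 0.8):.1f} months")
```

At $2/hr rental and 70% utilization, a $30k GPU with $0.30/hr of running cost takes close to three years to pay back, which is a sizeable fraction of its useful life; at $4/hr the payback is under a year. That sensitivity is why falling rental prices shift the calculation toward renting.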
AI Energy Consumption Trends: Rapid Growth Driven by Data Centers and Workloads
AI energy consumption is surging, primarily through data centers powering training and especially inference. While AI currently represents a modest share of global electricity, its growth rate far outpaces overall demand, raising infrastructure, grid, and sustainability challenges.
Current Levels (2024–2026)
- Global data centers: Approximately 415 TWh in 2024, accounting for ~1.5% of global electricity consumption. This has grown at ~12% CAGR since 2017, more than four times faster than total global electricity demand.
- 2025 updates: Data center electricity demand grew ~17% in 2025, with AI-focused facilities surging ~50%. Global estimates for 2025 hover around 460–490 TWh.
- AI-specific share: AI/accelerated servers (mainly GPUs) drive a growing portion. Estimates for AI-related consumption in 2024 range from ~53–85 TWh globally (roughly 13–20% of data center use), with projections showing it rising quickly. Non-AI data centers still consume more overall, but AI accounts for a large share of net growth.
- A single advanced generative AI query (e.g., ChatGPT-like) uses ~0.24–2.9 Wh, vs. ~0.3 Wh for a traditional search (though efficiency has improved).
- Training frontier models is energy-intensive (gigawatt-hours for large runs), but inference dominates long-term use (often estimated at 80–90% of total AI compute energy).
Regional concentration:
- US: ~45% of global data center electricity in 2024; demand projected to rise sharply (e.g., from ~176 TWh in 2023 to 325–580 TWh by 2028, or 6.7–12% of US electricity).
- China and Europe follow as major players.
Growth trends:
- Accelerated servers (AI): Electricity use growing ~30% annually (IEA Base Case), vs. ~9% for conventional servers. AI/accelerated servers could account for nearly half the net increase in global data center consumption through 2030.
- Power density: Traditional data centers: 10–30 kW/rack. AI-optimized: 50–150+ kW/rack. Hyperscale AI campuses can reach hundreds of MW to GW scale.
- Historical efficiency gains previously kept demand flat despite workload growth, but AI scale (more models, users, multimodal) has reversed this. GPU/accelerator efficiency improves (~26–40% per year in some metrics), but demand outpaces it.
Key drivers:
- Explosive adoption of generative AI (training + massive inference scaling).
- Larger models, reasoning chains, and multimodal capabilities.
- Hyperscaler capex boom (hundreds of billions annually).
Projections:
- 2030: ~945 TWh (roughly double 2024 levels), ~3% of global electricity.
- 2035: ~1,200 TWh in base scenarios.
- Other estimates: 980 TWh (Gartner), ~1,000–1,065 TWh (various). AI-optimized servers could reach ~44% of data center power by 2030.
- Significant growth to 8–12%+ of national electricity in some forecasts by 2030; power demand from AI data centers potentially 30x higher by 2035 in aggressive scenarios.
Frontier training: Power for largest models growing >2x per year; could reach multi-GW scale by 2030 for single runs.
Scenario Ranges and Uncertainties
- Base/moderate: Doubling by 2030, with efficiencies (better chips, quantization, liquid cooling, custom ASICs) and utilization helping moderate growth.
- High/accelerated: Faster adoption could push higher (e.g., data centers rivaling major countries' total use).
- Constraints: Grid bottlenecks, power generation delays, permitting, and chip supply could temper near-term growth. Regional concentration (e.g., Virginia, Ireland, Frankfurt) creates local strains (up to 20–80% of grid in hotspots).
Mitigation factors:
- Hardware efficiency gains.
- Software optimizations (model compression, sparse inference).
- Renewables + on-site generation (hyperscalers signing PPAs).
- Location strategies (cheaper power regions) and advanced cooling.
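The statement that efficiency improves but demand outpaces it can be checked with a small identity: energy use grows by the ratio of compute growth to efficiency growth. The sketch below applies it to the figures quoted in this section. It assumes the "26–40% per year" efficiency figure means useful work per unit of energy, which is one reading of that metric and may not match how every source defines it.

```python
def energy_growth(compute_growth: float, efficiency_gain: float) -> float:
    """Yearly growth in energy use when compute grows by `compute_growth`
    and work done per unit of energy improves by `efficiency_gain`
    (both as fractions, e.g. 0.30 for 30%)."""
    return (1 + compute_growth) / (1 + efficiency_gain) - 1


def implied_compute_growth(observed_energy_growth: float, efficiency_gain: float) -> float:
    """Compute growth that would explain an observed energy growth rate
    given a yearly efficiency gain. Inverse of `energy_growth`."""
    return (1 + observed_energy_growth) * (1 + efficiency_gain) - 1


if __name__ == "__main__":
    # Accelerated-server electricity growing ~30%/yr with 26-40%/yr efficiency gains.
    for gain in (0.26, 0.40):
        growth = implied_compute_growth(0.30, gain)
        print(f"efficiency +{gain:.0%}/yr implies compute growing {growth:.0%}/yr")
```

Read this way, 30% yearly growth in accelerated-server electricity alongside 26–40% yearly efficiency gains implies the underlying compute is growing roughly 64–82% per year, which is the sense in which mitigation slows the curve without flattening it.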
Risks: Delayed projects, higher electricity prices, water use for cooling, and emissions if fossil-heavy.
Opportunities: AI could optimize energy systems elsewhere (e.g., grid management), potentially offsetting some impacts.
In summary, AI energy trends show a clear break from past efficiency-dominated flat demand—now entering a high-growth phase led by inference at scale. The trajectory through 2030 is roughly a doubling of data center power needs, with AI as the primary accelerator. Monitoring real-world utilization, efficiency breakthroughs, and power buildout will be key to refining these trends.
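The headline figures in this section (415 TWh in 2024, ~945 TWh in 2030, ~1,200 TWh in 2035) can be turned into implied growth rates with the standard compound annual growth rate formula. This is a small arithmetic sketch using only the numbers quoted above; the function names are illustrative.

```python
def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate that takes `start` to `end` over `years`."""
    return (end / start) ** (1 / years) - 1


def project(start: float, rate: float, years: float) -> float:
    """Value after compounding `start` at `rate` per year for `years`."""
    return start * (1 + rate) ** years


if __name__ == "__main__":
    print(f"2024 to 2030: {cagr(415, 945, 6):.1%} per year")
    print(f"2030 to 2035: {cagr(945, 1200, 5):.1%} per year")
    print(f"415 TWh at the historical 12%/yr for 6 years: {project(415, 0.12, 6):.0f} TWh")
```

The 2030 projection implies about 14.7% growth per year, faster than the ~12% historical rate (which would give roughly 819 TWh by 2030), while the base-case 2035 figure implies growth slowing to about 4.9% per year after 2030.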
AI Cooling Technologies: From Air Limits to Liquid Dominance and Beyond
AI workloads, particularly training and inference on dense GPU/accelerator clusters, generate extreme heat densities (racks often 50–150+ kW, with next-gen chips exceeding 1,000W TDP). Traditional air cooling is hitting physical limits, driving a rapid shift to advanced liquid-based and hybrid solutions. This transition is critical for sustaining AI growth while managing energy consumption, water use, and Power Usage Effectiveness (PUE).
Traditional Air Cooling: Still Dominant but Insufficient for AI
- How it works: CRAC/CRAH units, fans, hot/cold aisle containment, and raised floors circulate chilled air over servers.
- Pros: Mature, low upfront cost, easy maintenance, suitable for lower-density IT.
- Cons for AI: Poor thermal conductivity of air limits heat dissipation. High fan power consumption (up to 40% of data center energy). Hotspots, noise, and scaling issues at high densities. Cannot efficiently handle modern AI racks without massive airflow (effectively requiring "wind tunnel" conditions).
- Efficiency: Contributes to average PUE ~1.5–1.6 globally. Improvements via containment and free cooling help modestly.
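The "wind tunnel" point follows from the basic heat balance Q = m_dot * c_p * dT: the coolant flow needed to carry away a given heat load depends on the fluid's heat capacity and density. The sketch below compares air and water for a hypothetical 100 kW rack. The fluid properties are standard approximate room-temperature values, and the temperature rises (15 K for air, 10 K for water) are illustrative assumptions, not design figures.

```python
# Approximate fluid properties near room temperature.
AIR_CP_J_PER_KG_K = 1005.0
AIR_DENSITY_KG_PER_M3 = 1.2
WATER_CP_J_PER_KG_K = 4186.0
WATER_DENSITY_KG_PER_M3 = 1000.0

M3_PER_S_TO_CFM = 2118.88  # cubic metres per second to cubic feet per minute


def volumetric_flow_m3_per_s(
    heat_w: float, delta_t_k: float, cp_j_per_kg_k: float, density_kg_per_m3: float
) -> float:
    """Coolant volume flow needed to absorb `heat_w` watts with a
    temperature rise of `delta_t_k`, from Q = m_dot * c_p * dT."""
    mass_flow_kg_per_s = heat_w / (cp_j_per_kg_k * delta_t_k)
    return mass_flow_kg_per_s / density_kg_per_m3


if __name__ == "__main__":
    rack_heat_w = 100_000  # hypothetical 100 kW AI rack
    air = volumetric_flow_m3_per_s(rack_heat_w, 15, AIR_CP_J_PER_KG_K, AIR_DENSITY_KG_PER_M3)
    water = volumetric_flow_m3_per_s(rack_heat_w, 10, WATER_CP_J_PER_KG_K, WATER_DENSITY_KG_PER_M3)
    print(f"air:   {air:.2f} m^3/s (about {air * M3_PER_S_TO_CFM:,.0f} CFM)")
    print(f"water: {water * 1000:.2f} L/s")
```

Under these assumptions the rack needs about 5.5 cubic metres of air per second, versus roughly 2.4 litres of water per second. That gap of three orders of magnitude in volume flow is the physical reason liquid takes over at high rack densities.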
Key Liquid Cooling Variants:
- Rear-Door Heat Exchangers (RDHx): Passive or active doors on racks that capture heat from exhaust air into liquid loops. Good retrofit option.
  - Moderate density support; lower disruption.
- Direct-to-Chip (DTC) / Direct Liquid Cooling (DLC) / Cold Plates: Liquid flows through plates directly attached to CPUs, GPUs, and high-heat components.
  - Advantages: Excellent for high-density AI (handles 100–600+ kW/rack). Reduces cooling energy by 30–50%+ vs. air. Scalable with hybrids. Many hyperscalers are deploying it (e.g., Microsoft on Azure).
  - Challenges: Requires plumbing, leak detection, and server modifications. Higher upfront complexity.
- Immersion Cooling:
  - Single-phase: Servers submerged in dielectric fluid; fluid circulates and is cooled externally.
  - Two-phase: Fluid boils on hot components for even higher efficiency.
  - Advantages: Superior heat transfer, high rack densities, PUE as low as 1.03–1.1, minimal/no fans, reduced water use with closed loops. Quiet operation.
  - Challenges: Fluid cost and compatibility, maintenance (draining for repairs), facility redesign for greenfield sites. Growing fast (highest projected CAGR in some segments).
Efficiency Gains:
- Liquid cooling can cut cooling energy by up to 90% in optimized setups and lower overall facility power significantly.
- PUE improvements: From ~1.5+ (air-heavy) to 1.1–1.3 common with liquid; extremes near 1.05 with advanced optimization.
- Water Usage Effectiveness (WUE) benefits from reduced evaporative cooling needs, though closed-loop systems vary.
- AI-Driven Cooling: Machine learning for predictive, dynamic control (e.g., Google's DeepMind reduced cooling energy by ~40%). Real-time sensor data optimizes pumps, fans, and setpoints. Quantum/federated learning models explored for further gains.
- Phase-Change Materials (PCMs), Spray/Jet Impingement, Heat Pipes, and Thermoelectrics: For targeted or novel applications.
- Membrane-Based and Low-Water Systems: Reduce water footprint, important in water-stressed areas.
- Nuclear-Inspired (e.g., subcooled boiling adaptations): Waterless or ultra-efficient options borrowing from reactor tech.
- Waste Heat Reuse: Pair with district heating, absorption chillers, or industrial processes. Increasingly a permitting requirement in some regions.
- Exotic Concepts: Orbital/radiative cooling proposals (e.g., space-based data centers using vacuum radiation), though highly speculative.
- Liquid cooling adoption surging: 19–22% current usage in surveys, with 20–50%+ planned in coming years. Market growing at 16–24%+ CAGR, driven by AI.
- Hyperscalers (Google, Microsoft, etc.) leading deployments; colocation providers offering high-density AI suites with advanced cooling.
- 2026 focus: Heat recovery integration, hybrid systems, and balancing capex with efficiency/ROI.
- Air: Cheaper initially, simpler → but higher long-term energy costs and density limits.
- Liquid: Higher efficiency, density support, lower TCO for AI → but complexity, fluid management, and retrofit costs.
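The energy side of this tradeoff can be estimated from PUE, which is total facility energy divided by IT equipment energy. The sketch below compares yearly energy and cost at two PUE values for the same IT load. The 10 MW load and $80/MWh price are hypothetical inputs; the PUE values come from the ranges quoted earlier in this section.

```python
HOURS_PER_YEAR = 8760


def facility_energy_mwh(it_load_mw: float, pue: float, hours: float = HOURS_PER_YEAR) -> float:
    """Total facility energy for a constant IT load: IT energy multiplied by PUE."""
    return it_load_mw * pue * hours


def overhead_share(pue: float) -> float:
    """Fraction of facility energy that goes to non-IT overhead (mostly cooling)."""
    return (pue - 1) / pue


def annual_savings_usd(
    it_load_mw: float, pue_before: float, pue_after: float, usd_per_mwh: float
) -> float:
    """Yearly electricity cost saved by lowering PUE at the same IT load."""
    saved_mwh = facility_energy_mwh(it_load_mw, pue_before) - facility_energy_mwh(it_load_mw, pue_after)
    return saved_mwh * usd_per_mwh


if __name__ == "__main__":
    # Hypothetical 10 MW IT load moving from air-heavy (1.5) to liquid (1.1).
    print(f"overhead at PUE 1.5: {overhead_share(1.5):.1%}")
    print(f"overhead at PUE 1.1: {overhead_share(1.1):.1%}")
    print(f"savings: ${annual_savings_usd(10, 1.5, 1.1, 80):,.0f} per year")
```

For these inputs the overhead share drops from about a third of facility energy to about 9%, and the saving is about $2.8M per year. Whether that justifies the retrofit depends on the plumbing and fluid costs listed under liquid cooling's drawbacks, which this sketch does not model.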
In summary, AI is forcing a cooling revolution. Liquid and hybrid technologies are no longer optional for frontier workloads—they are foundational to scaling compute sustainably. Continued innovation in AI-optimized controls, materials, and integration with power sources (e.g., on-site nuclear/SMRs) will determine how effectively the industry manages the heat of intelligence. Expect rapid evolution through 2030 as densities climb further.