Friday, June 12, 2026

The Price Of Compute

 

The price of compute refers to the cost of performing computational operations, most relevant today in terms of FLOPS (floating-point operations per second) per dollar for AI workloads, GPU/cloud rental rates, or total cost of ownership (TCO) including hardware, power, and infrastructure. It has declined dramatically over decades due to technological progress but faces new constraints from AI-driven demand, chip supply, energy, and infrastructure limits.
Historical Trends: Exponential Declines Followed by Specialization
Compute costs have followed a long-term downward trajectory, often linked to (but distinct from) Moore's Law, which observed transistor counts doubling roughly every two years with minimal cost increase. For GPUs and AI-relevant performance:
  • Early eras: In 1961, a GFLOPS of capacity cost around $18 billion (some inflation-adjusted estimates run into the trillions). By the 2000s–2010s, GPU price-performance improved rapidly.
  • 2006–2021 data: FLOP/s per dollar for GPUs roughly doubled every ~2.5 years overall. ML-specific GPUs improved faster (~every 2.07 years), while top GPUs were slower (~2.95 years). This is close to but somewhat slower than classic Moore's Law doubling.
  • Recent GPU trends: Price per FP32 FLOP has trended downward, with 2025 prices around 26% of 2019 levels in some analyses (a ~74% drop). Single-precision FLOPS costs fell ~17% per year in earlier periods.
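The doubling times and percentage drops quoted above are easier to compare as annual rates. The sketch below does that conversion, assuming smooth exponential change over each period (a simplification; the function names are illustrative and not from any cited analysis).

```python
import math

def annual_gain_from_doubling(doubling_years: float) -> float:
    """Fractional yearly improvement in FLOP/s per dollar for a given doubling time."""
    return 2 ** (1 / doubling_years) - 1

def annual_price_decline(remaining_fraction: float, years: float) -> float:
    """Constant yearly price decline that leaves `remaining_fraction` after `years`."""
    return 1 - remaining_fraction ** (1 / years)

def halving_time(remaining_fraction: float, years: float) -> float:
    """Years for the price to halve under the same constant-rate assumption."""
    return years * math.log(0.5) / math.log(remaining_fraction)

# Doubling times quoted in the 2006-2021 data.
for label, t in [("all GPUs", 2.5), ("ML GPUs", 2.07), ("top GPUs", 2.95)]:
    print(f"{label}: doubling every {t} yr = {annual_gain_from_doubling(t):.1%} per year")

# 2025 price at 26% of the 2019 level, i.e. a ~74% drop over six years.
decline = annual_price_decline(0.26, 2025 - 2019)
print(f"2019-2025: {decline:.1%} per year, halving every {halving_time(0.26, 6):.1f} yr")
```

Under this assumption, a 2.5-year doubling works out to roughly 32% more FLOP/s per dollar each year, and the 2019–2025 drop corresponds to about a 20% annual price decline, in the same range as the ~17% per year figure cited for earlier periods.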

These gains came from process shrinks, architectural improvements (e.g., tensor cores for AI), and economies of scale. However, general-purpose compute improvements have slowed in recent years as the industry shifted to specialized AI accelerators. Training costs for large ML models have still risen overall because compute demand has grown faster than efficiency gains (e.g., ~0.5 orders of magnitude per year in dollar training costs despite hardware improvements).
Cloud/GPU rental perspective: Prices have varied with supply. As of mid-2026, H100 GPUs (a current AI workhorse) rent from ~$0.47–$14.90/hr per GPU across providers, with competitive rates around $1.38–$2.50/hr on specialized platforms (vs. much higher on major hyperscalers like AWS/Azure). A100s are cheaper (e.g., $0.60–$1+/hr). Newer Blackwell B200s start higher, often $2.99–$6+/hr, but are expected to decline with supply ramps.
What Determines the Price of Compute?
Several interlocking factors drive costs:
  1. Chip Availability and Pricing: NVIDIA dominates AI GPUs. High demand for H100/H200/B200 creates shortages and premiums. Production ramps (e.g., TSMC capacity), export restrictions, and yields affect supply. New generations (Blackwell) offer better performance/watt but command initial premiums ($30k–$50k+ per GPU purchase estimates). Custom ASICs (e.g., from Google, Amazon) can lower long-term costs for specific workloads.
  2. Electricity Availability and Price: AI data centers are power-hungry. GPUs consume significant energy, and power is often cited at 30–50%+ of operating costs within TCO. Global data center electricity use is projected to roughly double to ~945 TWh by 2030 (AI as the main driver), rivaling the consumption of major countries. Wholesale prices have spiked in data center hubs (up as much as 267% over five years in some U.S. areas). Constraints on grid capacity, permitting, and renewables/gas/nuclear buildout raise effective costs. Efficient cooling and siting near cheap power matter hugely.
  3. Infrastructure and Data Center Construction: Building facilities is capital-intensive. McKinsey projects ~$5.2T for AI data centers by 2030 (part of $6.7T–$7T total), covering land, power generation/transmission (~150–200 GW needed), cooling, networking, etc. Delays from power queues and construction push timelines and costs up. Utilization rates, networking (e.g., NVLink), and software efficiency also factor in.
Other drivers: Memory (HBM), interconnects, talent, and regulatory/environmental factors. Jevons Paradox applies: efficiency gains spur more demand.
Near-Term Outlook (Next Few Months to 1–2 Years)
  • 2026: Continued tightness for leading GPUs (H100/B200). Rental prices may fluctuate with supply (recent surges noted in some segments, but spot/neocloud often cheaper). Blackwell ramp-up should ease some pressure later in the year, potentially lowering effective $/FLOP. Hyperscaler on-demand remains premium; specialized providers offer discounts.
  • Electricity and power constraints will keep TCO elevated in hot regions. Capex boom: AI data center spending ~$400B+ in 2026.
  • Efficiency gains (quantization, better algorithms, inference optimization) could moderate effective costs even if raw hardware stays pricey.
Longer-Term Projections (3–5+ Years)
  • Continued but uneven declines in raw compute price: GPU price-performance should improve with new nodes (e.g., 2nm+), 3D stacking, and specialized chips, though physical limits (e.g., power density) slow Moore-like gains. Expect FLOP/s/$ to keep improving, potentially at 1.5–2.5 year doublings depending on segment.
  • TCO pressures: Energy and infrastructure could dominate. Projections show data center power demand surging; without massive generation buildout (renewables, gas, nuclear/SMRs), costs rise. McKinsey/Goldman-style forecasts imply trillions in cumulative spend, with annual AI capex potentially reaching $1T+ levels.
  • Scenarios: High-efficiency case (better models/hardware) could cut demand 20%+. Supply bottlenecks or slower AI adoption moderate it. Inference (growing share) may be more cost-sensitive than training.
  • Financial projections (rough, illustrative): Assume the effective cost of an AI GPU (hardware + power + infra, amortized) declines 20–40% per generation while compute demand for leading models grows 4–5x/year. Training a frontier model today might cost tens to hundreds of millions of dollars; by 2030, on trend, 1000x more compute could mean billions unless efficiencies offset. Cloud rental for H100-class might trend toward $1/hr or below in competitive markets, but premium clusters stay higher. Overall market for AI chips could reach hundreds of billions annually.
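The arithmetic behind the training-cost projection can be sketched as compute growth multiplied by a falling unit cost. The starting cost ($100M), the two-year generation cadence, and the four-year horizon are all assumptions chosen for illustration; only the 4–5x/year growth and 20–40% per-generation decline come from the text.

```python
def projected_training_cost(
    cost_today_usd: float,
    compute_growth_per_year: float,
    cost_decline_per_generation: float,
    years_per_generation: float,
    years: float,
) -> float:
    """Scale today's cost by compute growth, then discount for cheaper hardware."""
    compute_multiple = compute_growth_per_year ** years
    generations = years / years_per_generation
    unit_cost_multiple = (1 - cost_decline_per_generation) ** generations
    return cost_today_usd * compute_multiple * unit_cost_multiple

# Hypothetical starting point: a $100M run today, projected four years out.
for growth in (4, 5):
    for decline in (0.2, 0.4):
        cost = projected_training_cost(100e6, growth, decline, 2.0, 4)
        print(f"{growth}x/yr growth, {decline:.0%}/gen decline: ${cost / 1e9:.1f}B")
```

Under these assumptions the result lands in the range of several billion to a few tens of billions of dollars, because a 20–40% unit-cost decline per generation is far too small to offset 4–5x annual growth in compute.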
Uncertainties: Geopolitics (chip supply chains), breakthroughs in neuromorphic/optical computing, or regulatory limits on energy use could shift trajectories. AI revenue must justify the spend—productivity gains and new applications are key to sustaining investment.
In summary, compute has never been cheaper in absolute historical terms, enabling the AI boom, but marginal costs for frontier capabilities are rising due to scale and bottlenecks. The "price of compute" is increasingly a story of energy, infrastructure, and supply chain mastery as much as silicon innovation.
Expect volatility in the near term and massive investment shaping the 2030 landscape. The winners will optimize not just for FLOPS, but for FLOPS per dollar per watt in real-world deployments.



The Price of Compute: Detailed Financial Projections (2025–2035)
The price of compute is tracked through several metrics: raw hardware price-performance (e.g., FLOPS per dollar), cloud/GPU rental rates (effective cost per hour or per FLOP), and total cost of ownership (TCO) including power, infrastructure, and amortization. Projections are illustrative, based on historical trends (FLOPS/$ roughly doubling every 2–3 years), current 2025–2026 data, and industry forecasts from McKinsey, Goldman Sachs, Epoch AI, IEA, and others.
Key Assumptions:
  • Hardware efficiency continues with generational improvements (e.g., Blackwell → Rubin), but at a moderated pace due to physical limits.
  • Demand growth outpaces some efficiency gains, pressuring TCO.
  • Electricity prices rise 5–10% annually in high-demand regions; PUE improves modestly.
  • Capex scales with GW buildout; ~$5.2T AI-specific cumulative by 2030 (McKinsey baseline).
  • Volatility from supply ramps, geopolitics, and utilization rates (assume 60–80% for projections).
  • Inflation-adjusted where possible; ranges reflect scenarios (base/moderate growth vs. accelerated/constrained).
1. GPU Hardware & Rental Price Projections (Per GPU, Illustrative for Leading AI GPUs)
H100-class equivalent (or normalized) → evolving to B200/Rubin equivalents. Purchase prices and rentals decline with supply, but the newest generations carry a premium.
  • 2025: H100 purchase ~$25k–$40k; rental $2–$4+/hr (spot lower ~$1–2.5).
  • 2026: B200 purchase ~$30k–$50k+; rental $4–$6+/hr initially, trending down with ramp.
  • Longer term: Effective $/FLOP declines 20–40% per major generation.
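An hourly rental rate can be turned into a normalized cost per 10^18 floating-point operations. The sketch below assumes a round peak throughput of 1e15 FLOP/s per GPU and 40% sustained utilization; both are illustrative placeholders rather than vendor specifications, and real workloads vary widely.

```python
def cost_per_exaflop(rate_usd_per_hour: float, peak_flops: float, utilization: float) -> float:
    """Dollars per 1e18 floating-point operations actually delivered."""
    flop_per_hour = peak_flops * utilization * 3600  # operations completed in one hour
    return rate_usd_per_hour / flop_per_hour * 1e18

PEAK_FLOPS = 1e15    # assumed peak throughput per GPU (placeholder)
UTILIZATION = 0.40   # assumed fraction of peak sustained in practice

for rate in (1.0, 2.0, 4.0, 6.0):
    usd = cost_per_exaflop(rate, PEAK_FLOPS, UTILIZATION)
    print(f"${rate:.2f}/hr -> ${usd:.2f} per 1e18 FLOP")
```

The result scales linearly with the rental rate and inversely with utilization, which is why utilization assumptions matter as much as the headline hourly price.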
2. Effective Compute Cost (Illustrative $/10^18 FLOPS or Similar Normalized Metric)
Historical: Significant declines (e.g., price per FP32 FLOP ~74% drop 2019–2025 in some analyses).
Projection Table: Key Metrics (Annual or Cumulative)
Table 1: Global AI Data Center Capex & Infrastructure Projections (USD Trillions, Cumulative Unless Noted)

Year | AI-Specific Capex (Cumulative) | Total Data Center Capex (Cumulative) | Annual AI Capex (Approx.) | Notes/Sources
--- | --- | --- | --- | ---
2025 | ~0.5–0.8 | ~1.0+ | ~300–400B (Big Tech) | Hyperscaler ramp; ~$350B planned for majors.
2026 | ~1.2–1.5 | ~2.0+ | 765B (baseline) | ~$400B Big Tech; approaching $1T total DC.
2027–2030 (cumul. to 2030) | 5.2 (baseline) | 6.7 | Rising to ~1.6T by 2031 | McKinsey; 125+ GW incremental AI capacity. Higher scenarios: 7.9T.
2031–2035 | +4–8 (est.) | +10–15 (est.) | 1–2T+ annually | Extrapolated; sustained if AI ROI justifies.

Table 2: Power & Electricity Projections (Global Data Centers)

Year | Electricity Consumption (TWh) | % of Global Electricity | Power Capacity Demand (GW, AI-related) | Est. Annual Electricity Cost Impact (at avg. $0.05–0.10/kWh)
--- | --- | --- | --- | ---
2024/2025 | ~415 | ~1.5% | Baseline building | ~$20–40B annually (415 TWh at the stated rates).
2030 | ~945 (base) | ~3% | 150–200+ GW total AI-related | $50–100B+ annually (rising with prices).
2035 | 1,200+ | 3–4%+ | Higher | Significant TCO share (30–50%+ for ops).

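The electricity cost column follows from multiplying consumption by an average price. The sketch below applies the $0.05–0.10/kWh range from the column header to the consumption figures in the table; the function name is illustrative.

```python
def annual_cost_billion_usd(consumption_twh: float, usd_per_kwh: float) -> float:
    """Annual electricity bill in billions of USD (1 TWh = 1e9 kWh)."""
    kwh = consumption_twh * 1e9
    return kwh * usd_per_kwh / 1e9

for year, twh in [("2024", 415), ("2030", 945), ("2035", 1200)]:
    low = annual_cost_billion_usd(twh, 0.05)
    high = annual_cost_billion_usd(twh, 0.10)
    print(f"{year}: {twh} TWh -> ${low:.0f}B to ${high:.0f}B per year")
```

At those prices, 415 TWh comes to roughly $21–42B per year and 945 TWh to roughly $47–95B per year. Actual bills depend on regional tariffs and contract structures, which this flat-rate calculation ignores.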
Table 3: Effective Price of Compute – TCO & Rental Examples (Per GPU-Hour or Normalized; 1 GW DC Context)

Metric | 2025–2026 | 2030 (Projected) | 2035 (Projected) | Key Drivers
--- | --- | --- | --- | ---
H100/B200-class Rental (spot/competitive) | $1.5–$6/hr | $1–$4/hr (newer gens) | <$1–$3/hr | Supply ramps, competition from custom ASICs.
Annualized TCO per MW (1 GW DC example) | ~$8–11M/MW | $7–10M/MW (efficiency gains) | $6–9M/MW | Servers ~60% of TCO; power rising then stabilizing.
Effective Cost Decline (FLOPS/$) | Doubling every ~2.5 yrs (historical trend) | Continued 1.5–2.5 yr doublings | Slowing to 3+ yrs | Architecture, nodes (2nm+), software (quantization).
1 GW DC Total Annual TCO | ~$8.5B (servers dominant) | $7–12B range | Scaled with efficiency | Lifespan 3–7 yrs for IT equipment.

Table 4: Breakdown for a Typical Large AI Data Center (e.g., 1 GW Scale, Annualized TCO)

Component | Share of TCO | 2026 Est. Cost (1 GW) | 2030 Projection Notes
--- | --- | --- | ---
Servers/IT Hardware (GPUs dominant) | ~60% | ~$5B | Largest driver; price/performance improves but volume surges.
Power/Electricity (OpEx) | Major OpEx (~$0.6B) | Rising with demand | 30–50%+ sensitivity; location & PUE key.
Other CapEx Amortized (Cooling, Building, Network) | 20–30% | Balance of ~$8.5B total | $15M+/MW build costs; power gen/transmission add-ons.
Total Annualized | 100% | ~$8.5B | Efficiencies vs. scale trade-off.

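The 1 GW breakdown can be sanity-checked with a few lines of arithmetic. The sketch below assumes the facility draws a constant 1 GW all year at $0.07/kWh (both assumptions, chosen to sit inside the price range used earlier), then derives the residual "other" component from the ~$8.5B total, ~$5B servers, and ~$0.6B power figures in the table.

```python
HOURS_PER_YEAR = 8760

def annual_electricity_cost(capacity_gw: float, usd_per_kwh: float, load_factor: float = 1.0) -> float:
    """Yearly electricity bill in USD for a facility drawing capacity_gw * load_factor."""
    kwh = capacity_gw * 1e6 * HOURS_PER_YEAR * load_factor  # 1 GW = 1e6 kW
    return kwh * usd_per_kwh

power_check = annual_electricity_cost(1.0, 0.07)  # assumed price: $0.07/kWh
print(f"Electricity for 1 GW at $0.07/kWh: ${power_check / 1e9:.2f}B per year")

# Figures taken from the table; 'other' is whatever remains of the total.
total = 8.5e9
components = {"servers": 5.0e9, "power": 0.6e9}
components["other"] = total - sum(components.values())

shares = {name: cost / total for name, cost in components.items()}
for name, share in shares.items():
    print(f"{name}: ${components[name] / 1e9:.1f}B ({share:.0%} of TCO)")

print(f"TCO per MW: ${total / 1000 / 1e6:.1f}M")  # 1 GW = 1000 MW
```

On these inputs the electricity bill comes to about $0.61B, consistent with the ~$0.6B entry, and the total works out to $8.5M per MW. The residual component comes out near 34% of TCO, slightly above the 20–30% band in the table, which suggests the table's figures are rounded estimates rather than a closed accounting.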
Scenario Analysis & Risks
  • Base Case: Moderate demand growth; efficiencies (e.g., better models, liquid cooling, custom chips) offset some costs. Compute price continues long-term decline in $/FLOP, but TCO rises due to energy/infra.
  • Accelerated: Higher AI adoption → $7.9T+ capex by 2030; tighter supply keeps rentals elevated longer.
  • Constrained: Slower ROI or bottlenecks → $3.7T capex; faster price drops from oversupply.
  • Uncertainties: Electricity prices/grid delays (+costs), chip geopolitics, breakthroughs (e.g., optical/neuromorphic computing could accelerate declines), utilization rates, and AI monetization (must justify trillions in spend).
Financial Implications: The AI compute market (GPUs alone) could grow to tens–hundreds of billions annually. Hyperscalers and neoclouds face massive capex but potential high margins if utilization is strong. Investors should watch power availability and effective FLOPS/watt/dollar as the true "price" metric evolves.
These projections are not financial advice and carry high uncertainty due to rapid technological and market changes. Actual outcomes depend on innovation pace, policy, and economic returns from AI. For custom modeling (e.g., specific workload TCO), more granular inputs would refine these further.


AI Energy Consumption Trends: Rapid Growth Driven by Data Centers and Workloads
AI energy consumption is surging, primarily through data centers powering training and especially inference. While AI currently represents a modest share of global electricity, its growth rate far outpaces overall demand, raising infrastructure, grid, and sustainability challenges.
Current Levels (2024–2026)
  • Global data centers: Approximately 415 TWh in 2024, accounting for ~1.5% of global electricity consumption. This has grown at ~12% CAGR since 2017, more than four times faster than total global electricity demand.
  • 2025 updates: Data center electricity demand grew ~17% in 2025, with AI-focused facilities surging ~50%. Global estimates for 2025 hover around 460–490 TWh.
  • AI-specific share: AI/accelerated servers (mainly GPUs) drive a growing portion. Estimates for AI-related consumption in 2024 range from ~53–85 TWh globally (roughly 13–20% of data center use), with projections showing it rising quickly. Non-AI data centers still consume more overall, but AI accounts for a large share of net growth.
Per-query and model examples:
  • A single advanced generative AI query (e.g., ChatGPT-like) uses ~0.24–2.9 Wh, vs. ~0.3 Wh for a traditional search (though efficiency has improved).
  • Training frontier models is energy-intensive (gigawatt-hours for large runs), but inference dominates long-term use (often estimated at 80–90% of total AI compute energy).
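Per-query figures become more meaningful once scaled to a yearly total. The sketch below multiplies the 0.24–2.9 Wh range by a hypothetical volume of one billion queries per day; that volume is an assumption for illustration, not a reported figure.

```python
def annual_inference_twh(queries_per_day: float, wh_per_query: float) -> float:
    """Yearly energy in TWh for a fixed daily query volume (1 TWh = 1e12 Wh)."""
    wh_per_year = queries_per_day * wh_per_query * 365
    return wh_per_year / 1e12

QUERIES_PER_DAY = 1e9  # hypothetical volume, for illustration only

for wh in (0.24, 0.3, 2.9):
    twh = annual_inference_twh(QUERIES_PER_DAY, wh)
    print(f"{wh} Wh/query -> {twh:.2f} TWh per year")
```

At that volume the range spans roughly 0.09 to 1.06 TWh per year, a more than tenfold spread, which shows how sensitive aggregate estimates are to the per-query assumption.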

Regional concentration:
  • US: ~45% of global data center electricity in 2024; demand projected to rise sharply (e.g., from ~176 TWh in 2023 to 325–580 TWh by 2028, or 6.7–12% of US electricity).
  • China and Europe follow as major players.
Growth Trends and Drivers
  • Accelerated servers (AI): Electricity use growing ~30% annually (IEA Base Case), vs. ~9% for conventional servers. AI/accelerated servers could account for nearly half the net increase in global data center consumption through 2030.
  • Power density: Traditional data centers: 10–30 kW/rack. AI-optimized: 50–150+ kW/rack. Hyperscale AI campuses can reach hundreds of MW to GW scale.
  • Historical efficiency gains previously kept demand flat despite workload growth, but AI scale (more models, users, multimodal) has reversed this. GPU/accelerator efficiency improves (~26–40% per year in some metrics), but demand outpaces it.
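The two growth rates can be compounded to see how quickly the accelerated segment takes over. The sketch below is a deliberate simplification: it splits all 2024 data center electricity into an AI segment and a conventional segment, assumes a 15% starting AI share (within the 13–20% range cited earlier), and grows each at its server rate, ignoring cooling and other overhead.

```python
def project(start_twh: float, annual_growth: float, years: int) -> float:
    """Compound a starting consumption figure at a fixed annual growth rate."""
    return start_twh * (1 + annual_growth) ** years

total_2024 = 415.0     # TWh, global data centers
ai_share_2024 = 0.15   # assumed starting share for accelerated servers
years = 6              # 2024 -> 2030

ai_2024 = total_2024 * ai_share_2024
conv_2024 = total_2024 - ai_2024

ai_2030 = project(ai_2024, 0.30, years)      # ~30% per year
conv_2030 = project(conv_2024, 0.09, years)  # ~9% per year
total_2030 = ai_2030 + conv_2030

ai_share_of_increase = (ai_2030 - ai_2024) / (total_2030 - total_2024)
print(f"2030 total: {total_2030:.0f} TWh (AI segment {ai_2030:.0f} TWh)")
print(f"AI share of net increase: {ai_share_of_increase:.0%}")
```

Even this crude model lands near 890 TWh for 2030, with the AI segment contributing about half of the net increase, broadly in line with the figures quoted in the text.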

Key drivers:
  • Explosive adoption of generative AI (training + massive inference scaling).
  • Larger models, reasoning chains, and multimodal capabilities.
  • Hyperscaler capex boom (hundreds of billions annually).
Projections to 2030 and Beyond
Global data centers (IEA Base Case and similar):
  • 2030: ~945 TWh (roughly double 2024 levels), ~3% of global electricity.
  • 2035: ~1,200 TWh in base scenarios.
  • Other estimates: 980 TWh (Gartner), ~1,000–1,065 TWh (various). AI-optimized servers could reach ~44% of data center power by 2030.
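The implied growth rates behind these projections can be recovered with a compound annual growth rate (CAGR) calculation. The sketch below assumes smooth growth between the quoted endpoints, which real demand will not follow exactly.

```python
import math

def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate between two values."""
    return (end / start) ** (1 / years) - 1

def doubling_time(rate: float) -> float:
    """Years to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + rate)

rate_to_2030 = cagr(415, 945, 2030 - 2024)
rate_to_2035 = cagr(945, 1200, 2035 - 2030)

print(f"2024-2030: {rate_to_2030:.1%} per year (doubling in {doubling_time(rate_to_2030):.1f} yr)")
print(f"2030-2035: {rate_to_2035:.1%} per year")
```

The base case implies roughly 15% annual growth through 2030, slowing to about 5% per year between 2030 and 2035.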
US-specific:
  • Significant growth to 8–12%+ of national electricity in some forecasts by 2030; power demand from AI data centers potentially 30x higher by 2035 in aggressive scenarios.

Frontier training: Power for largest models growing >2x per year; could reach multi-GW scale by 2030 for single runs.
Scenario Ranges and Uncertainties
  • Base/moderate: Doubling by 2030, with efficiencies (better chips, quantization, liquid cooling, custom ASICs) and utilization helping moderate growth.
  • High/accelerated: Faster adoption could push higher (e.g., data centers rivaling major countries' total use).
  • Constraints: Grid bottlenecks, power generation delays, permitting, and chip supply could temper near-term growth. Regional concentration (e.g., Virginia, Ireland, Frankfurt) creates local strains (up to 20–80% of grid in hotspots).

Mitigation factors:
  • Hardware efficiency gains.
  • Software optimizations (model compression, sparse inference).
  • Renewables + on-site generation (hyperscalers signing PPAs).
  • Location strategies (cheaper power regions) and advanced cooling.
Implications
AI is reshaping electricity demand growth after decades of stagnation in advanced economies. While global share remains limited (~3% by 2030), the absolute addition equals entire countries' consumption, driving trillions in infrastructure investment, grid upgrades, and potential shifts in energy mix (gas, nuclear/SMRs, renewables).
Risks: Delayed projects, higher electricity prices, water use for cooling, and emissions if fossil-heavy.
Opportunities: AI could optimize energy systems elsewhere (e.g., grid management), potentially offsetting some impacts.
In summary, AI energy trends show a clear break from past efficiency-dominated flat demand—now entering a high-growth phase led by inference at scale. The trajectory through 2030 is roughly a doubling of data center power needs, with AI as the primary accelerator. Monitoring real-world utilization, efficiency breakthroughs, and power buildout will be key to refining these trends.


AI Cooling Technologies: From Air Limits to Liquid Dominance and Beyond
AI workloads, particularly training and inference on dense GPU/accelerator clusters, generate extreme heat densities (racks often 50–150+ kW, with next-gen chips exceeding 1,000 W TDP). Traditional air cooling is hitting physical limits, driving a rapid shift to advanced liquid-based and hybrid solutions. This transition is critical for sustaining AI growth while managing energy consumption, water use, and Power Usage Effectiveness (PUE).
Traditional Air Cooling: Still Dominant but Insufficient for AI
  • How it works: CRAC/CRAH units, fans, hot/cold aisle containment, and raised floors circulate chilled air over servers.
  • Pros: Mature, low upfront cost, easy maintenance, suitable for lower-density IT.
  • Cons for AI: Poor thermal conductivity of air limits heat dissipation. High energy overhead (cooling, including fans, can account for up to 40% of data center energy). Hotspots, noise, and scaling issues at high densities. Cannot efficiently handle modern AI racks without massive airflow (effectively requiring "wind tunnel" conditions).
  • Efficiency: Contributes to average PUE ~1.5–1.6 globally. Improvements via containment and free cooling help modestly.
Air cooling remains viable for mixed or legacy workloads but is being supplemented or replaced for AI/HPC.
Liquid Cooling Technologies: The AI Standard
Liquids (water-glycol mixes or dielectric fluids) transfer heat ~1,000x more effectively than air, enabling higher densities, lower energy use, quieter operation, and better waste heat recovery.
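One way to see why air struggles at AI rack densities is the steady-state heat balance Q = m_dot × c_p × ΔT, which gives the coolant flow needed to carry away a given heat load. The sketch below compares air and water for a 100 kW rack, using textbook fluid properties and assumed temperature rises of 15 K for air and 10 K for water. It compares required volumes of coolant, which is a different measure from the ~1,000x heat-transfer figure and is not meant to reproduce it.

```python
def volumetric_flow_m3_per_s(
    heat_w: float, density_kg_m3: float, cp_j_per_kg_k: float, delta_t_k: float
) -> float:
    """Coolant volume flow needed to remove heat_w at a given temperature rise."""
    mass_flow_kg_s = heat_w / (cp_j_per_kg_k * delta_t_k)  # from Q = m_dot * c_p * dT
    return mass_flow_kg_s / density_kg_m3

RACK_HEAT_W = 100_000  # 100 kW rack

# Approximate room-temperature properties; temperature rises are assumptions.
air_flow = volumetric_flow_m3_per_s(RACK_HEAT_W, 1.2, 1005.0, 15.0)
water_flow = volumetric_flow_m3_per_s(RACK_HEAT_W, 1000.0, 4186.0, 10.0)

print(f"Air:   {air_flow:.2f} m^3/s")
print(f"Water: {water_flow * 1000:.2f} L/s ({water_flow * 60000:.0f} L/min)")
print(f"Volume ratio (air/water): {air_flow / water_flow:.0f}x")
```

Under these assumptions a single rack needs several cubic meters of air per second but only a couple of liters of water per second, which is the practical meaning of the "wind tunnel" remark above.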
Key Variants:
  1. Rear-Door Heat Exchangers (RDHx): Passive or active doors on racks that capture heat from exhaust air into liquid loops. Good retrofit option.
    • Moderate density support; lower disruption.
  2. Direct-to-Chip (DTC) / Direct Liquid Cooling (DLC) / Cold Plates: Liquid flows through plates directly attached to CPUs, GPUs, and high-heat components.
    • Advantages: Excellent for high-density AI (handles 100–600+ kW/rack). Reduces cooling energy by 30–50%+ vs. air. Scalable with hybrids. Many hyperscalers deploying (e.g., Microsoft on Azure).
    • Challenges: Requires plumbing, leak detection, and server modifications. Higher upfront complexity.
  3. Immersion Cooling:
    • Single-phase: Servers submerged in dielectric fluid; fluid circulates and is cooled externally.
    • Two-phase: Fluid boils on hot components for even higher efficiency.
    • Advantages: Superior heat transfer, high rack densities, PUE as low as 1.03–1.1, minimal/no fans, reduced water use with closed loops. Quiet operation.
    • Challenges: Fluid cost and compatibility, maintenance (draining for repairs), facility redesign for greenfield sites. Growing fast (highest projected CAGR in some segments).
Hybrid Systems: Combine air for lighter loads with liquid for AI racks—practical for brownfield retrofits.
Efficiency Gains:
  • Liquid cooling can cut cooling energy by up to 90% in optimized setups and lower overall facility power significantly.
  • PUE improvements: From ~1.5+ (air-heavy) to 1.1–1.3 common with liquid; extremes near 1.05 with advanced optimization.
  • Water Usage Effectiveness (WUE) benefits from reduced evaporative cooling needs, though closed-loop systems vary.
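PUE is total facility power divided by IT power, so a PUE improvement translates directly into facility-level savings. The sketch below uses a hypothetical 100 MW IT load and compares an air-heavy PUE of 1.5 with a liquid-cooled PUE of 1.2, both taken from the ranges above.

```python
def facility_power_mw(it_power_mw: float, pue: float) -> float:
    """Total facility power implied by a given IT load and PUE."""
    return it_power_mw * pue

def overhead_mw(it_power_mw: float, pue: float) -> float:
    """Non-IT power (cooling, power conversion, lighting) implied by PUE."""
    return it_power_mw * (pue - 1)

IT_LOAD_MW = 100.0  # hypothetical IT load

air_total = facility_power_mw(IT_LOAD_MW, 1.5)
liquid_total = facility_power_mw(IT_LOAD_MW, 1.2)
facility_saving = 1 - liquid_total / air_total
overhead_saving = 1 - overhead_mw(IT_LOAD_MW, 1.2) / overhead_mw(IT_LOAD_MW, 1.5)

print(f"Facility power: {air_total:.0f} MW -> {liquid_total:.0f} MW")
print(f"Facility-level saving: {facility_saving:.0%}")
print(f"Overhead saving: {overhead_saving:.0%}")
```

Moving from PUE 1.5 to 1.2 cuts overhead power by 60% but total facility power by only 20%, since the IT load itself is unchanged. This distinction matters when reading headline claims about cooling energy reductions.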
Emerging and AI-Optimized Innovations
  • AI-Driven Cooling: Machine learning for predictive, dynamic control (e.g., Google's DeepMind reduced cooling energy by ~40%). Real-time sensor data optimizes pumps, fans, and setpoints. Quantum/federated learning models explored for further gains.
  • Phase-Change Materials (PCMs), Spray/Jet Impingement, Heat Pipes, and Thermoelectrics: For targeted or novel applications.
  • Membrane-Based and Low-Water Systems: Reduce water footprint, important in water-stressed areas.
  • Nuclear-Inspired (e.g., subcooled boiling adaptations): Waterless or ultra-efficient options borrowing from reactor tech.
  • Waste Heat Reuse: Pair with district heating, absorption chillers, or industrial processes. Increasingly a permitting requirement in some regions.
  • Exotic Concepts: Orbital/radiative cooling proposals (e.g., space-based data centers using vacuum radiation), though highly speculative.
Market and Adoption Trends (2025–2026+)
  • Liquid cooling adoption surging: 19–22% current usage in surveys, with 20–50%+ planned in coming years. Market growing at 16–24%+ CAGR, driven by AI.
  • Hyperscalers (Google, Microsoft, etc.) leading deployments; colocation providers offering high-density AI suites with advanced cooling.
  • 2026 focus: Heat recovery integration, hybrid systems, and balancing capex with efficiency/ROI.
Trade-offs Summary:
  • Air: Cheaper initially, simpler → but higher long-term energy costs and density limits.
  • Liquid: Higher efficiency, density support, lower TCO for AI → but complexity, fluid management, and retrofit costs.
Implications for Energy Consumption and Sustainability
Better cooling directly mitigates AI's energy surge (data centers projected ~945 TWh by 2030). It enables higher utilization, reduces PUE/WUE, and supports waste heat recovery. However, challenges remain: supply chain for fluids/components, skilled labor, and upfront capital. Location (climate, power, water) and regulatory pressures will shape choices.
In summary, AI is forcing a cooling revolution. Liquid and hybrid technologies are no longer optional for frontier workloads—they are foundational to scaling compute sustainably. Continued innovation in AI-optimized controls, materials, and integration with power sources (e.g., on-site nuclear/SMRs) will determine how effectively the industry manages the heat of intelligence. Expect rapid evolution through 2030 as densities climb further.
