Netizen: Baseten's Path To A Trillion

Monday, June 22, 2026

Baseten's Path To A Trillion

We closed our Series F today at a $13B valuation.

Our inference business grew 20x in the last year. I want to explain why:

The growth comes from a shift I think is permanent: companies want to own their intelligence layer. Instead of relying exclusively on closed models, teams…
— Amir Haghighat (@amiruci) June 22, 2026

Amir. Congrats. I have an offer you can't refuse. I offer some of the cheapest compute in the world. At a discount. As you know, the demand for compute is huge, unmet and exploding. Future proof yourself, let's talk.
— Paramendra Kumar Bhagat (@paramendra) June 22, 2026

Baseten (baseten.co) is a San Francisco-based AI infrastructure company specializing in high-performance model inference for production workloads. 
It provides a dedicated platform for deploying, serving, optimizing, and scaling open-source, custom, fine-tuned, and proprietary AI models (including LLMs, image generation, transcription, embeddings, TTS, and compound AI systems). The focus is on low-latency, high-throughput, cost-efficient inference with strong reliability (99.99% uptime claims), rather than general cloud computing or primarily training.
Key Offerings and Features
Inference Platform: Purpose-built stack with custom kernels, advanced decoding, caching, quantization, batching, and hardware optimization. Supports rapid cold starts, global multi-cloud scaling (or self-hosted/hybrid in customer VPCs), and autoscaling.
Pre-optimized Model APIs/Library: Instant access to models like Kimi, DeepSeek, GLM, etc.
Training/Post-Training: Tools like Baseten Loops (Training SDK for frontier RL), fine-tuning, and one-click deployment from training to inference. 
Specialized Optimizations: Fastest Whisper transcription (with streaming/diarization), real-time TTS/audio streaming, high-throughput embeddings (BEI: >2x throughput, lower latency), image gen (ComfyUI/custom), ultra-low-latency compound AI (Chains), and performant LLM runtimes. 
Developer Experience & Support: Model management, observability, forward-deployed engineers for hands-on optimization, and strong DevEx for iteration. Single-tenant and self-hosted options for security/enterprise. 
Other: Frontier Gateway for monetizing models; partnerships with clouds (e.g., Google Cloud, AWS) and NVIDIA for hardware optimization (e.g., Blackwell GPUs). 
Customers include high-growth AI-native companies such as Abridge, Cursor, Clay, Decagon, Descript, EliseAI, Gamma, HeyGen, Lovable, Notion, OpenEvidence, Parallel, Poolside, World Labs, Writer, Zed Industries, Wispr, Clickup, and others. Testimonials highlight performance gains, reliability, cost savings, speed to production, and engineering support. 
The company emphasizes "owned intelligence" via open-weight and custom models, helping teams move beyond reliance on closed APIs like OpenAI/Anthropic for production apps. It positions itself in the "inference gold rush," addressing bottlenecks in serving models at scale amid GPU scarcity and exploding demand. 
Founded in 2019, Baseten started with a focus on making ML deployment easier (initially for more traditional ML) before growing heavily with the generative AI boom (e.g., post-Stable Diffusion/Whisper). It originated from frustrations with fragmented tools for training, serving, and scaling models in production.
Funding and Growth
Baseten has seen explosive growth with multiple large rounds in a short period, driven by revenue surges (reported at 6x and 20x over different periods), high demand for inference infra, and strong customer retention.
Notable rounds (approximate, based on reports):
Early: Seed (~$2.5M, First Round), Series A (~$13.5M, Sequoia).
2025: $75M Series C (~$825M valuation); $150M Series D (~$2.15B valuation).
2026: $300M Series E (~$5B valuation, with NVIDIA participation); recent ~$1.5B Series F at ~$13B valuation (led by Altimeter, Conviction, Spark, etc.). 
It has raised roughly $2B across the rounds listed above, reaching unicorn status (and well beyond) rapidly. Investors include Greylock, IVP, Spark, BOND, CapitalG, and others, with NVIDIA investing directly.
The company has a research arm (performance, kernels, infrastructure, post-training), open-sources some tools, and maintains a strong engineering culture. It offers a startup program with credits/support.
Founders
Baseten was founded by engineers who knew each other from prior work (including a startup ~14 years before Baseten).
Tuhin Srivastava (CEO & Co-Founder): Primary public face and leader. Background includes investment banking (Macquarie Group), ML engineering, and founding Shape Analytics (acquired). Worked at Gumroad as a data scientist. Emphasizes practical production challenges, custom/owned models, and scaling AI products. Active in podcasts/interviews on inference trends. 
Amir Haghighat (CTO & Co-Founder): Key technical leader. Prior roles include Head of Engineering at Gumroad, Engineering Manager at Clover Health, and software engineering at Yelp. Long-time collaborator with other founders. 
Phil (Philip) Howes (Co-Founder & Chief Scientist): Focus on science, research, and deep technical aspects (neural nets, inference engineering). Background includes co-founding Shape and work at Gumroad. PhD from University of Sydney. Contributes to research, writing, and tools (e.g., performance clients). 
Pankaj Gupta (Co-Founder): Involved in early stages; less public detail in sources, but part of the core founding team. 
The team is described as interdisciplinary (researchers, engineers, operators) obsessed with hard systems problems across the full stack. 
Baseten operates in a competitive inference space (alongside companies focused on serving/optimization) but differentiates via performance research, multi-cloud flexibility, developer experience, embedded engineering support, and enterprise-grade features. It has grown rapidly as inference has become a major bottleneck and cost driver in AI. For the absolute latest details, check their site, blog, or recent funding announcements, as the space moves extremely fast.

Baseten’s inference optimization techniques combine low-level custom engineering, state-of-the-art open-source frameworks, production hardening, and modality-specific tuning. This creates a high-performance stack focused on low latency (especially TTFT and p99), high throughput, cost-efficiency, reliability (99.99%+ uptime), and minimal quality degradation. 
The Baseten Inference Stack has two integrated layers:
Inference Runtime: Model execution, kernels, decoding, quantization, etc.
Inference-Optimized Infrastructure: Routing, autoscaling, multi-cloud/hybrid scaling, KV/LoRA cache-aware placement, and request prioritization. 
Baseten layers proprietary enhancements on top of frameworks like TensorRT-LLM (core for many workloads), SGLang, vLLM, and others. They benchmark and select per workload/hardware, then extend with custom code (e.g., C++ executor API usage, patches, and a custom server replacing Triton).
1. Custom Kernels and Low-Level Optimizations
Kernel fusion: Combines operations (e.g., matmul + bias + activation) into single kernels to reduce memory traffic and launch overhead.
Memory hierarchy optimization: Prioritizes fast memory (registers, shared) over global memory.
Custom/tailored attention kernels: Optimized for speed, memory use, context length, and hardware (e.g., Flash Attention variants, workload-specific balancing of quantization vs. quality for video). 
Asynchronous compute and PDL (Programmatic Dependent Launch): Better GPU utilization on Hopper/Blackwell architectures.
Modality-specific kernels (e.g., for video denoise loops, embeddings, Whisper). 
These yield gains in tokens/second, TTFT, and overall efficiency. Baseten also supports features like structured output via state machines (e.g., Outlines/xGrammar integration with custom CUDA kernels, no inter-token latency penalty).
2. Batching and Scheduling
Continuous / in-flight batching (vs. static/dynamic): Processes tokens iteratively; new requests join as others finish. Maximizes GPU utilization for variable-length LLM outputs (huge throughput win over traditional batching). TensorRT-LLM uses in-flight batching.
Request prioritization: Prefill (more expensive, latency-critical) over decode.
Disaggregated serving (prefill vs. decode on separate hardware/runtimes): Scales components independently. 
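The throughput argument for continuous batching can be sketched with a toy step counter. This is a minimal illustration, not Baseten's or TensorRT-LLM's scheduler: the function names and request lengths are made up, every request is assumed to need one decode step per output token, and prefill cost is ignored.

```python
from collections import deque

def static_batching_steps(lengths, batch_size):
    """Decode steps when each batch runs until its longest request finishes."""
    steps = 0
    for i in range(0, len(lengths), batch_size):
        steps += max(lengths[i:i + batch_size])
    return steps

def continuous_batching_steps(lengths, batch_size):
    """Decode steps when a finished request's slot is refilled immediately."""
    waiting = deque(lengths)
    active = []  # remaining output tokens for each in-flight request
    steps = 0
    while waiting or active:
        # Admit waiting requests into any free slots before the next step.
        while waiting and len(active) < batch_size:
            active.append(waiting.popleft())
        steps += 1  # one iteration emits one token per active request
        active = [r - 1 for r in active if r > 1]
    return steps

# Hypothetical output lengths (in tokens) for eight requests.
lengths = [5, 200, 8, 12, 150, 6, 9, 180]
print(static_batching_steps(lengths, 4), continuous_batching_steps(lengths, 4))
```

With these made-up lengths the static schedule needs 380 steps and the continuous one 200, because short requests no longer wait for the longest request in their batch.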
3. Quantization
Post-training quantization, preferring floating-point formats (FP8, FP4 on newer GPUs) for minimal perplexity/quality loss vs. integer methods.
Supports KV cache quantization, selective schemes (e.g., FP8 weights + higher-precision KV), and per-model/GPU tuning.
Examples: Significant speedups on H100 (FP8), Blackwell (FP4), with tools like Engine Builder for automated compilation. 
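One intuition for preferring floating-point formats is that their rounding error is relative to each value, while a single shared integer scale gives a fixed absolute step. The sketch below is an assumption-laden toy: `round_to_float_format` only models a 3-bit mantissa (as in FP8 E4M3) and ignores exponent range and saturation, and the INT8 scheme is the simplest per-tensor variant, which production integer quantizers usually improve on with per-channel or per-group scales.

```python
import math

def round_to_float_format(x, mantissa_bits):
    """Round x to a float with `mantissa_bits` explicit mantissa bits.

    Simplification: exponent range and saturation are ignored.
    """
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)  # x = m * 2**e with 0.5 <= |m| < 1
    scale = 2 ** (mantissa_bits + 1)
    return math.ldexp(round(m * scale) / scale, e)

def round_to_int8(x, max_abs):
    """Symmetric per-tensor INT8: one shared scale, 127 levels per sign."""
    scale = max_abs / 127
    q = max(-127, min(127, round(x / scale)))
    return q * scale

def rel_error(x, x_hat):
    return abs(x - x_hat) / abs(x)

# Hypothetical weights spanning several orders of magnitude.
weights = [0.001, 0.02, 0.3, 1.7, 9.5]
max_abs = max(abs(w) for w in weights)
for w in weights:
    fp = round_to_float_format(w, mantissa_bits=3)
    i8 = round_to_int8(w, max_abs)
    print(f"{w:>6}: float-like err {rel_error(w, fp):.3f}, int8 err {rel_error(w, i8):.3f}")
```

In this toy the float-like format keeps every value within about 6% relative error, while the shared INT8 scale rounds the smallest weight to zero.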
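4. Speculative Decoding (SpecDec) and Related
Draft-target: Smaller draft model proposes tokens; target verifies (cheap). Dynamically enabled based on load.
Self-speculative: Medusa (extra decoding heads on the target model), EAGLE (a lightweight draft head that predicts from the target's hidden features), Lookahead decoding (n-gram candidates generated and verified in parallel).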
Optimized for code/structured/predictable content; productionized with better batching and orchestration to avoid crashes or throughput drops. 
Can double+ tokens-per-second in favorable conditions; dynamically managed.
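The draft-target loop described above can be sketched with two stand-in "models". Everything here is hypothetical: `target_next` and `draft_next` are deterministic toy functions rather than real networks, decoding is greedy, and verification is counted as one target pass per round on the assumption that a real system checks all proposed positions in a single batched forward pass.

```python
def target_next(tokens):
    """Hypothetical large model: deterministic next token from the context."""
    return (sum(tokens) * 31 + len(tokens)) % 50

def draft_next(tokens):
    """Hypothetical small model: wrong whenever the context length is a multiple of 4."""
    guess = target_next(tokens)
    return guess if len(tokens) % 4 else (guess + 1) % 50

def greedy_decode(prompt, n_new):
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(target_next(tokens))
    return tokens, n_new  # one target pass per generated token

def speculative_decode(prompt, n_new, k=4):
    tokens = list(prompt)
    goal = len(prompt) + n_new
    target_passes = 0
    while len(tokens) < goal:
        # 1. The draft proposes k tokens autoregressively (cheap).
        proposal = []
        for _ in range(k):
            proposal.append(draft_next(tokens + proposal))
        # 2. The target verifies the proposal; counted as a single pass.
        target_passes += 1
        accepted = []
        for tok in proposal:
            expected = target_next(tokens + accepted)
            if tok != expected:
                accepted.append(expected)  # keep the target's correction
                break
            accepted.append(tok)
        else:
            accepted.append(target_next(tokens + accepted))  # bonus token
        tokens.extend(accepted)
    return tokens[:goal], target_passes

baseline, baseline_passes = greedy_decode([1, 2, 3], 20)
spec, spec_passes = speculative_decode([1, 2, 3], 20)
print(spec == baseline, baseline_passes, spec_passes)
```

The output sequence matches plain greedy decoding exactly, and the number of target passes falls in proportion to how often the draft is right, which is why the technique pays off most on predictable content.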
5. KV Cache Optimizations
Prefix caching / Radix Attention (high hit rates, fine-grained in SGLang).
Reuse, offloading (GPU → CPU/system memory), and cache-aware routing (geographic + warm cache placement). 
Critical for long contexts and low TTFT (avoids recomputing prefixes). Techniques like chunked prefill help manage memory. 
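Why prefix caching lowers TTFT can be shown with a toy block-level cache. This is a sketch under simplifying assumptions: the `PrefixCache` class is hypothetical, it stores whole token prefixes as tuples (real engines typically hash blocks or use a radix tree), and it never evicts.

```python
class PrefixCache:
    """Toy block-level prefix cache keyed by the full token prefix."""

    def __init__(self, block_size=4):
        self.block_size = block_size
        self.blocks = set()  # each entry is a prefix ending on a block boundary

    def prefill(self, tokens):
        """Return how many tokens must actually be computed for this request."""
        boundaries = range(self.block_size, len(tokens) + 1, self.block_size)
        cached = 0
        # Walk whole blocks from the start and stop at the first miss.
        for end in boundaries:
            if tuple(tokens[:end]) not in self.blocks:
                break
            cached = end
        # Remember every full block of this request for later reuse.
        for end in boundaries:
            self.blocks.add(tuple(tokens[:end]))
        return len(tokens) - cached

cache = PrefixCache(block_size=4)
system_prompt = [1, 2, 3, 4, 5, 6, 7, 8]      # shared by both requests
first = cache.prefill(system_prompt + [20, 21, 22])
second = cache.prefill(system_prompt + [30, 31, 32, 33, 34])
print(first, second)
```

The first request computes all 11 tokens; the second reuses the two cached blocks of the shared system prompt and computes only its 5 new tokens. Cache-aware routing matters for the same reason: the saving only appears if the second request lands on the replica that holds those blocks.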
6. Model Parallelism and Scaling
Tensor parallelism (TP), expert parallelism (EP), and topology-aware blends to minimize communication for large/multi-GPU models.
Hybrid multi-cloud scaling, fast cold starts (weights distribution, <10s small / <1min large models), autoscaling, and active-active failover. 
7. Modality-Specific and Other Techniques
Embeddings (BEI): >2x throughput, ~10% lower latency via dedicated runtime.
Whisper ASR: Custom optimizations for sub-300ms transcription, streaming, diarization, and real-time factor gains. 
TTS/Audio: Real-time streaming, low TTFB; leverages LLM-like backbones with TensorRT-LLM/FP8. 
Image/Video: Kernel fusion, custom attention/denoise kernels, timestep distillation (e.g., for FLUX), ComfyUI support. 
LoRA support: Serve many fine-tunes from one base model. 
Compound AI (Chains): Granular hardware allocation and autoscaling for multi-model systems.
Additional Production Features
Automated Engine Builder for TensorRT-LLM compilation (minutes vs. hours).
Forward-deployed engineers for custom optimization.
Rigorous quality/latency testing; configurable tradeoffs.
Strong DevEx: Config-driven (e.g., config.yaml for quantization, decoding), observability, and iteration speed.
Baseten’s approach (detailed in their Inference Engineering resources by Philip Kiely) emphasizes combining techniques—individual optimizations address only parts of the problem (e.g., KV cache helps TTFT but not decode speed). Results include customer-reported 2x+ throughput, 60%+ gains, massive cost-performance improvements (e.g., on Google Cloud/Blackwell), and production reliability at scale. The field evolves rapidly (new models like DeepSeek bring architecture changes like MLA; new GPUs like Blackwell enable FP4, etc.), so Baseten maintains a dedicated model performance team for day-zero support and ongoing research. For the latest benchmarks or hands-on details, check their blog, docs, or Model Library.

Baseten’s path to a $1 trillion valuation by 2036 (from its current ~$13B in mid-2026) is extraordinarily ambitious but plausible in the AI infrastructure supercycle, where cumulative AI CapEx could reach trillions and inference becomes the dominant, high-margin layer. 
This requires ~77x growth in valuation, implying compound annual growth in revenue/profitability that outpaces historical tech giants (e.g., via massive market expansion, platform lock-in, and multiple expansion as it becomes a "picks-and-shovels" leader). The plan assumes sustained AI adoption, Baseten executing flawlessly on differentiation, favorable macro (energy, chips, regulation), and successful M&A/expansion. Risks are high: execution failures, commoditization, energy constraints, or regulatory shifts could derail it.
Current Baseline (2026)
Valuation: ~$13B (recent $1.5B round).
Revenue: Reports suggest hundreds of millions ARR (e.g., ~$600M in some estimates), with 20x+ growth in periods driven by inference demand. 
Strengths: Superior performance (custom kernels, optimizations for LLMs, embeddings, Whisper, TTS, compound AI), developer experience, hybrid/self-hosted options, forward-deployed engineers, blue-chip customers (Cursor, Notion, HeyGen, etc.), and research edge. 
Focus: Owned intelligence via open/custom models on optimized inference.
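The ~77x multiple cited above can be restated as an annual growth rate. The sketch below uses only the article's own endpoints ($13B in 2026, $1T in 2036); the helper name `required_cagr` is an illustrative choice.

```python
def required_cagr(start_value, end_value, years):
    """Compound annual growth rate needed to go from start_value to end_value."""
    return (end_value / start_value) ** (1 / years) - 1

# Valuations in billions of dollars, taken from the text.
multiple = 1_000 / 13
cagr = required_cagr(13, 1_000, 10)
print(f"{multiple:.0f}x over 10 years -> {cagr:.1%} per year")
```

That works out to roughly 54% compounded every year for a decade, which is the concrete meaning of "outpaces historical tech giants".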
Competitive Landscape (2026)
Key Inference Specialists:
Together AI: Broad platform (training + inference), strong revenue (~$1B ARR estimates), developer-friendly. Valuations in talks ~$7.5B range earlier. 
Fireworks AI: Speed-focused, high performance on select models, strong revenue (~$800M ARR estimates), valuations talked at $15B. Ex-PyTorch talent. 
DeepInfra: Cost leader, attractive pricing for high-volume. 
Groq: Hardware (LPUs) for ultra-low latency; fast but specialized and potentially higher cost. 
Others: Modal (serverless Python), Replicate, RunPod, Hugging Face Inference, Anyscale. Hyperscalers (AWS SageMaker, GCP Vertex, Azure) for integrated ecosystems. 
Broader Landscape: NVIDIA dominates hardware; hyperscalers offer scale but less specialization; chip startups (Cerebras, etc.) compete on custom silicon. 
Evolution Over 10 Years:
Short-term (2026-2028): Fragmentation with many specialists; performance/cost differentiation wins. Commoditization pressure on raw APIs; winners add enterprise features (SLAs, compliance, hybrid).
Mid-term (2028-2032): Consolidation via M&A. Inference shifts heavily to agents/compound systems, multi-modal, edge/on-prem. Regulatory scrutiny on energy/use rises. Custom silicon and vertical integration grow.
Long-term (2032-2036): Inference as ubiquitous "AI OS/utility" layer. Survivors become platform monopolies (like AWS in cloud) with network effects from models, data, optimizations. Edge/distributed inference explodes; new bottlenecks (power, data, orchestration) create adjacencies. Hyperscalers and big tech (Google, Microsoft, Amazon, Meta) integrate deeply but leave room for specialized "AI-native" leaders. Open-source momentum could accelerate or disrupt depending on model quality. 
Baseten differentiates via full-stack performance research + enterprise-grade flexibility (vs. pure API players or rigid hyperscalers).
Phased Super-Ambitious Growth Plan
Phase 1: Dominate Inference (2026-2028). Target: $20B+ Valuation, $2-5B ARR
Product: Double down on optimizations (custom kernels, speculative decoding, KV cache, disaggregated serving, FP8/FP4, Chains for compound AI). Launch Baseten Loops (RL/training) for seamless train-to-infer. Expand modality leadership (video, audio agents). Build "Inference Fabric" — unified API + self-hosted with one-click portability. 
Go-to-Market: Embed deeply with AI-native startups (agents, coding, voice, enterprise apps). Enterprise push: compliance (SOC2, HIPAA, GDPR), SLAs, dedicated capacity. Startup credits + marketplace for fine-tunes/models. Frontier Gateway for model monetization.
Tech/Moat: Open-source selective tools for community; proprietary engine + research (performance team). Partner deeper with NVIDIA (Blackwell/Rubin), Google Cloud, etc., while building multi-cloud neutrality.
Metrics: 5-10x revenue via volume + premium pricing for performance. 99.999% uptime, sub-100ms p99 latency benchmarks. Acquire smaller inference/optimization startups.
Funding: Continue large rounds; prepare for IPO ~2028 at $20B+.
Phase 2: Platform Empire & Expansion (2028-2032) — Target: $100-300B Valuation, $20-50B+ ARR
Horizontal Platform: Evolve into full "AI Operating System" — inference + orchestration, observability, agent frameworks, fine-tuning marketplace, data flywheel (anonymized optimizations). Add edge inference, on-prem hardware integrations.
Verticals: Deep industry solutions (healthcare agents via Abridge-like, legal, finance, creative with media optimizations). Acquire or build vertical models/tools.
Global/Infra Scale: Massive capacity deals (GW-scale reservations). Invest in/partner for power (nuclear/SMRs, renewables). Expand self-hosted/hybrid for sovereign AI (governments, enterprises wary of hyperscalers). International data centers (EU, Asia, Middle East).
M&A/Ecosystem: Buy competitors (e.g., Fireworks/Together assets), tooling companies, or chip design IP. Build developer ecosystem (SDKs, plugins) like NVIDIA CUDA. Potential partnerships with foundational model labs.
Monetization: Tiered (usage + platform fees + enterprise support + marketplace cuts). High-margin software + managed hardware revenue.
Defensibility: Data moat (performance insights), talent (top inference engineers), switching costs (optimized pipelines hard to migrate).
Phase 3: AI Utility & New Frontiers (2032-2036) — Target: $1T+ Valuation
Ubiquity: Become default for production AI (like AWS for cloud). Power autonomous agents, robotics, AR/VR, scientific discovery at global scale.
Adjacencies: Expand into training/post-training at scale, synthetic data, energy optimization software, or even custom silicon co-design. Enter consumer via partnerships (e.g., devices with on-device inference).
Sustainability/Impact: Lead on green AI (efficiency reducing total power needs). Sovereign AI platforms for nations.
Financials: Aim for 30-50%+ operating margins at scale. Revenue mix: 60% inference usage, 20% platform/subscriptions, 20% premium/verticals. Multiple expansion as "must-own" infra with recurring, sticky revenue.
Exit/Structure: Public company with massive float; potential spin-offs or holding structure.
Key Levers for Trillion-Dollar Scale
Market Tailwinds: Inference market to hundreds of billions (some forecasts $200B+ by 2035); broader AI infra CapEx in trillions cumulatively. Baseten captures 10-20%+ share via superiority.
Execution Musts: Obsessive performance (benchmark leadership), customer obsession (embedded teams), talent density, capital efficiency then scale.
Risk Mitigation: Diversify beyond GPUs (multi-vendor), geopolitical hedging, continuous innovation against commoditization.
Comparables: NVIDIA’s rise (chips), AWS (cloud utility), or hypothetical "AI AWS." Few reach $1T, but AI scale (trillions in spend) enables it if Baseten owns the critical layer. 
This plan is deliberately aggressive — requiring near-perfect execution, continued AI hype-to-reality transition, and favorable economics (e.g., inference costs dropping but volume exploding). Success hinges on Baseten staying ahead on the "last mile" of AI deployment: making frontier intelligence reliable, cheap, and ubiquitous in production. The inference gold rush is real; owning its infrastructure layer could mint the next mega-cap. For implementation, leadership must prioritize long-term moats over short-term revenue.

Baseten would need an enormous scale of compute, potentially millions of advanced GPUs (H100/Blackwell-equivalent or better) by the mid-2030s, to support a $1T valuation trajectory. This assumes the company captures a significant share of a rapidly growing inference market through superior optimization, platform stickiness, and expansion into orchestration/verticals. Projections involve many assumptions and high uncertainty due to efficiency gains, hardware evolution (e.g., Blackwell → Rubin → successors), utilization rates, pricing compression, and mix of managed vs. self-hosted workloads.
Key Assumptions (2026 Baseline to 2036)
Current (mid-2026): ~$600M ARR. Valuation ~$13B. Strong growth from inference demand.
Revenue Path (ambitious, aligned with prior plan):
2028: $2-5B ARR (Phase 1 dominance).
2032: $20-50B+ ARR (platform empire).
2036: $100B+ ARR (AI utility layer; implies high multiples for $1T valuation via margins, ecosystem, and market expansion). 
Inference Market Context: $100-250B+ by 2030 (various forecasts; inference ~70-80% of AI compute by mid-2030s). Broader AI infra CapEx in trillions cumulatively. Baseten aims for 10-20%+ share via differentiation. 
Revenue per GPU: Rough estimates from providers (Together, Fireworks, etc.) suggest $20K–$100K+ annual revenue per H100-equivalent in 2026, depending on utilization (high for inference), optimizations (Baseten's edge helps), pricing model (per-token premium vs. raw GPU-hour), and mix. Optimistic blended average: ~$40K–$80K/year initially, declining over time due to efficiency but offset by volume/premium features. 
Hardware Evolution: 2-3x+ efficiency gains per generation (throughput, tokens/$, power). Baseten’s custom kernels, quantization (FP8/FP4), speculative decoding, etc., amplify this. Shift to newer chips (Blackwell B200/GB200, Rubin, etc.) and potential custom silicon/edge. 
Power: H100 ~700W TDP (cluster ~1-1.4kW/GPU effective with overhead). Newer chips higher TDP but far better perf/W. Data center PUE, cooling, etc. 
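The assumptions above can be combined into a back-of-the-envelope estimate of GPU count and power. This is a rough sketch using the 2028 ARR range, the blended $40K–$80K revenue per GPU, and the 1–1.4 kW effective draw quoted in the list; the function names are illustrative, and self-hosted revenue (which needs no Baseten-owned GPUs) is ignored.

```python
def gpu_range(arr_low, arr_high, rev_per_gpu_low, rev_per_gpu_high):
    """Fewest GPUs: low ARR at high revenue/GPU. Most: high ARR at low revenue/GPU."""
    return arr_low / rev_per_gpu_high, arr_high / rev_per_gpu_low

def power_mw(gpus, kw_per_gpu):
    """Total draw in megawatts for a fleet at a given effective kW per GPU."""
    return gpus * kw_per_gpu / 1_000

# 2028 scenario: $2-5B ARR, $40K-$80K revenue per GPU per year.
low, high = gpu_range(2e9, 5e9, 40e3, 80e3)
print(f"GPUs: {low:,.0f} to {high:,.0f}")
print(f"Power: {power_mw(low, 1.0):,.0f} to {power_mw(high, 1.4):,.0f} MW")
```

Under these inputs the estimate is 25,000 to 125,000 GPUs and roughly 25 to 175 MW. Lower revenue per GPU, toward the $20K end of the wider range, pushes both figures up.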
Projected Compute Needs
By 2028 (Phase 1: Dominate Inference):
~50,000–150,000+ advanced GPUs (H100/Blackwell-equivalent).
Supports $2-5B ARR at high utilization and premium pricing for performance/low-latency/enterprise features.
Power: ~50-200 MW (rough; scales with efficiency).
By 2032 (Phase 2: Platform Empire):
500,000–2M+ GPUs (or equivalents).
Revenue scales with massive volume (compound AI, verticals, global/edge), self-hosted/hybrid (less direct CapEx but orchestration revenue), and marketplace.
Power: Hundreds of MW to low single-digit GW. Partnerships for capacity reservations, multi-cloud, and power deals (nuclear/SMRs critical).
By 2036 (Phase 3: AI Utility, $1T Valuation):
2M–10M+ GPUs/equivalents (or far fewer next-gen systems due to efficiency; e.g., Rubin and its successors could deliver 5-10x+ perf/W). This is a massive but plausible slice if Baseten becomes a leading "AI OS" layer powering agents, robotics, enterprise, sovereign AI, etc.
Equivalent to a significant fraction of global AI compute supply at the time. Total AI compute demand could reach hundreds of GW by 2030-2035. 
Power: Low single-digit to tens of GW (depending on efficiency). Global data center power demand is projected to reach hundreds of GW; as a major player, Baseten would need to build or partner on dedicated capacity.
Cumulative CapEx Implications: Tens to hundreds of billions in GPU/hardware procurement (or reserved capacity), data centers, power infrastructure over 10 years. Baseten would leverage hyperscaler partnerships, self-hosted customer infra (revenue from software/orchestration), M&A, and potential co-design/investments (e.g., with NVIDIA). Direct ownership of all compute is unlikely; a hybrid model is key for scale and margins.
Challenges and Mitigations in the Plan
Supply Constraints: GPU scarcity, power availability (nuclear/renewables critical), geopolitics. Mitigation: Multi-vendor (AMD, custom), edge/distributed inference, efficiency leadership.
Efficiency Gains: Offsets raw GPU count (e.g., 10x lower cost/token via software + hardware). Baseten’s research moat (kernels, disaggregated serving, Chains) crucial. 
Utilization & Pricing: High utilization for inference (vs. training) helps revenue/GPU. Per-token + platform fees sustain margins as raw costs fall.
Competition: Hyperscalers (AWS/Azure/GCP) integrate deeply; specialists (Together, Fireworks) consolidate; Groq/custom silicon for niches. Baseten wins on performance, DevEx, hybrid flexibility, and enterprise SLAs.
Risks: Overbuild (demand slowdown), commoditization, energy bottlenecks, regulation. Success requires flawless execution, talent, and ecosystem lock-in (data flywheels, developer platform).
This is an extremely aggressive projection—few companies achieve this scale. It positions Baseten as an infrastructure giant akin to a specialized AWS for AI production. Real numbers depend on execution, AI adoption curves, and hardware roadmaps (NVIDIA Rubin, etc.). For finer modeling, detailed benchmarks on tokens/GPU and utilization would refine these estimates. The inference layer’s centrality in the AI stack makes this compute hunger both a risk and a massive opportunity.

Baseten’s Bold Bet: Why a $500M Investment in Himalayan Compute Could Secure the Cheapest AI Inference on Earth 
In the white-hot race for AI dominance, compute is the new oil: scarce, expensive, and increasingly the deciding factor between breakout success and also-ran status. Recent mega-deals underscore the frenzy: Anthropic committed to paying SpaceX $1.25 billion per month for access to Colossus data center capacity, while Google signed on for $920 million monthly for roughly 110,000 NVIDIA GPUs. These contracts, stretching into 2029 and potentially worth tens of billions, reflect companies paying a premium (reportedly 2-3x market rates in some analyses) for guaranteed future capacity amid exploding, unmet demand.
For an inference specialist like Baseten, which has ridden the wave to a ~$13B valuation on superior optimization and production reliability, the message is clear: secure low-cost, scalable compute now or risk margin compression and competitive disadvantage as the inference gold rush intensifies. A strategic $500 million investment in a forward-looking venture like Himalayan Compute, leveraging Nepal’s vast Himalayan hydropower for ultra-cheap, green AI data centers, could be the masterstroke that delivers discounted, abundant capacity and positions Baseten as a vertically integrated leader.
The Compute Crunch Is Real, And Getting Worse
The SpaceX deals are not anomalies; they are symptoms of structural imbalance. Global AI demand, particularly for inference (which now dominates total compute spend), is growing exponentially. Training runs for frontier models consume massive clusters, but serving those models to millions of users with low latency, high throughput, and reliability multiplies the need. Hyperscalers and AI labs are scrambling, signing eye-watering forward contracts because spot or near-term capacity simply isn’t available at scale.
Market rental prices for H100-class GPUs hover in the $2–$4+ per hour range depending on provider and commitment, but premium reserved capacity commands far higher effective rates, especially when bundled with SLAs and power infrastructure.  At the premiums Anthropic and Google are reportedly paying, they are effectively subsidizing rapid buildout while locking in supply. This dynamic creates a perfect storm: demand outstrips supply, power and grid constraints bite in traditional hubs (U.S., Europe), and costs remain elevated even as hardware efficiency improves.
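The gap between contract and market pricing can be made concrete with the figures quoted in this post. This is a rough sketch: it assumes the $920 million monthly payment covers only the roughly 110,000 GPUs and that they run every hour of the month, whereas the real contract presumably bundles power, facilities, and SLAs and may involve newer hardware than H100-class parts.

```python
HOURS_PER_MONTH = 8760 / 12  # 730 hours in an average month

def effective_hourly_rate(monthly_payment, gpu_count):
    """Implied dollars per GPU-hour if every GPU runs the whole month."""
    return monthly_payment / gpu_count / HOURS_PER_MONTH

rate = effective_hourly_rate(920e6, 110_000)
print(f"${rate:.2f} per GPU-hour, vs a $2-$4 market range")
```

The implied figure is a little over $11 per GPU-hour, well above the $2–$4 range for H100-class rentals, though the comparison is not like-for-like for the reasons given above.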
Baseten’s core strength (custom kernels, advanced quantization, speculative decoding, disaggregated serving, and modality-specific optimizations) already delivers superior price-performance for customers. But owning or controlling cheaper underlying compute would amplify this edge dramatically, enabling lower prices, higher margins, or both.
Why the Himalayas? Cheap Power, Natural Advantages, and Green AI
Nepal and the broader Himalayan region represent one of the most compelling untapped frontiers for AI infrastructure. The area boasts enormous hydropower potential: renewable, dispatchable baseload energy from high-altitude rivers and reservoirs. Unlike solar/wind-heavy regions plagued by intermittency or U.S. grids strained by permitting and NIMBY issues, Himalayan hydro offers stable, low-cost electricity ideal for 24/7 AI workloads.
Additional advantages include:
Lower land and construction costs compared to Silicon Valley or Northern Virginia.
Natural cooling from high altitudes, reducing energy overhead for thermal management.
Geopolitical diversification — reducing reliance on strained U.S.-China supply chains or single-country regulatory risks.
Emerging ecosystem interest: Proposals and discussions around “Himalayan Compute” or Nepal-based AI data centers highlight the vision of turning the region into a “trillion-dollar AI engine” powered by green hydro. 
A $500M infusion from Baseten could catalyze a dedicated campus or joint venture: funding initial GW-scale power reservations, liquid-cooled GPU clusters optimized for Baseten’s stack, and fiber connectivity for low-latency global routing. In return, Baseten secures multi-year discounted capacity (potentially at a meaningful fraction of U.S./SpaceX premiums) plus priority access for its inference platform and customers.
Strategic Fit for Baseten’s Ambitious Trajectory
Baseten has already demonstrated capital discipline and rapid scaling, raising roughly $2B while growing revenue aggressively through performance leadership. A $500M strategic investment (a fraction of recent rounds) aligns with its hybrid/multi-cloud model: it doesn’t abandon partnerships with NVIDIA, Google Cloud, or others but adds proprietary low-cost capacity as a differentiator.
Benefits include:
Cost leadership: Dramatically lower power and opex enable sub-market pricing or fatter margins on high-volume inference (embeddings, Whisper, TTS, compound AI systems).
Supply chain resilience: Dedicated capacity hedges against future shortages and premium spikes.
Vertical integration moat: Combine Baseten’s software optimizations with hardware-level control for end-to-end superiority — from kernel tuning to power procurement.
Sustainability angle: Green hydro appeals to enterprise customers facing ESG pressure and regulatory scrutiny on AI’s energy footprint.
Expansion platform: Use the site for sovereign AI offerings, regional markets (Asia), or even R&D into next-gen efficiency.
This move would accelerate Baseten’s path toward platform empire status in the earlier 10-year trillion-dollar vision, moving beyond pure software into controlled infrastructure without the full CapEx burden of building everything in-house.
Risks and Execution Realities
Challenges exist: Nepal faces infrastructure hurdles (connectivity, skilled labor, political stability), longer lead times for grid and permitting, and the need for strong local partnerships. Geopolitical risks in the region and water/climate concerns tied to Himalayan hydro must be managed carefully. Execution would require forward-deployed teams, phased buildout, and blending with Baseten’s existing multi-cloud strategy.
Yet these are surmountable with the right JV structure, government incentives (common for data centers), and Baseten’s engineering DNA. Compared to paying ongoing premiums for constrained capacity, the upside of securing “the cheapest compute in the world” is transformative.
The Bottom Line
The AI compute market is in a classic shortage-driven boom. While others pay billions monthly for tomorrow’s capacity at today’s inflated prices, forward-thinking players like Baseten can invest in abundant, low-cost alternatives. A $500M stake in Himalayan Compute isn’t just opportunistic; it’s a potential masterclass in vertical strategy that could supercharge margins, lock in customers, and provide the fuel for Baseten’s decade-long sprint to infrastructure supremacy.
In the race to own production AI, cheap and reliable electrons may prove as decisive as clever kernels. The Himalayas, long a source of natural wonder, could soon power the artificial intelligence revolution — with Baseten at the forefront.

10 Slides: Himalayan Compute: The Grand Solara Vision
Himalayan Compute: Grand Solara Vision
Himalayan Compute: 10 Years To A Trillion: Detailed Roadmap
Himalayan Compute: Podcasts
The Founder: Profile By Adam Shuaib
The Columbus Way, The Neil Armstrong Way From Unicorns to Solaras: Building Trillion-Dollar Companies That Transform Humanity
Nepal's Trillion Dollar Himalayan Compute Plan 🏔️
Himalayan Compute: Nepal’s Blueprint for Triple-Digit Economic Growth

Sameer Maskey: Why Nepal must build a sovereign ‘AI Factory’
Budhanilkantha School: Fell From The Apple Tree, Shot Toward Space
Himalayan Compute: The Vehicle For Nepal's Economic Revolution
The First And Last Chance For Every Nepali In America To Bring About An Economic Revolution
Nepal's Trillion Dollar Himalayan AI Moonshot
🇳🇵 The Super App That Will Transform Nepal

https://t.co/YPONx4IZSz
— Tuhin Srivastava (@tuhinone) May 13, 2026

The GLM moment is going to be bigger than the DeepSeek moment.

Baseten has the fastest inference on the best open-weight model. >280 tps and <0.8 ttft. https://t.co/xoM5s5ApmD pic.twitter.com/wwG6XLS9qn
— Tuhin Srivastava (@tuhinone) June 22, 2026

.@Baseten is building the Inference Cloud, and has raised another $1.5B to invest aggressively in their capacity, infrastructure platform and research products.

Today, they serve the leading AI-native companies who want to own and improve their intelligence. These frontier… https://t.co/EgAFyuPSwy
— sarah guo (@saranormous) June 22, 2026

Baseten's Path To A Trillion https://t.co/e3PTeAIlop @tuhinone @saranormous @amiruci
— Paramendra Kumar Bhagat (@paramendra) June 22, 2026

The next decade will be the Inference Decade.

Open-weight models are reaching frontier-level performance at a fraction of the cost. The need for independent inference infrastructure will only grow.

That’s why we’re backing Baseten for the fourth time. https://t.co/rLfIKoYptd
— adam bain (@adambain) June 22, 2026

Baseten's Path To A Trillion https://t.co/e3PTeAIlop
— Paramendra Kumar Bhagat (@paramendra) June 22, 2026

INSTEAD OF WATCHING AN HOUR OF NETFLIX TONIGHT.

This 1 hour Stanford lecture by Joel Peterson will teach you more about negotiation and getting what you want than most people learn in years.

Bookmark it and give it an hour, no matter what. https://t.co/ASPQxGajgP pic.twitter.com/tT2UFzBpHP
— Swati Gupta (@hrswatigupta) June 22, 2026

AI advantage is shifting from raw model scale to deep integration with unique organizational data, workflows, and human expertise through continual learning loops. Ideally, this will result in distributed, defensible value rather than winner-take-all dynamics around general… https://t.co/TibhvwVb2o
— Tuhin Srivastava (@tuhinone) June 14, 2026
