Netizen: 2026

Tuesday, July 28, 2026

28: Kimi K3

Honoured to meet Hon’ble Prime Minister Shri @narendramodi ji along with my co-founder, Bharath.

From two engineers who started out with a bold idea which had low chance of success—to standing together with the Prime Minister after Vikram-1 became India’s first private orbital… pic.twitter.com/FFcbgF5VTc
— Pawan (@PawanKChandana) July 27, 2026

T 5812 - Hands folded, gratitude!! And an offering of water to the Tulsi 🕉️ pic.twitter.com/1wyIIrM3Ng
— Amitabh Bachchan (@SrBachchan) July 27, 2026

People see a cool F/A‑18 axial roll and as a controls systems engineer, I just see a control allocation flex 😅 😛

The N30’s flight controls aren’t that different from a fighter jet’s. The F/A‑18’s inner loop runs~ 80 Hz, ours runs ~50 Hz- constantly adjusting the control… https://t.co/SuMLub7ucP pic.twitter.com/HUAfGzqHCr
— Sampriti Bhattacharyya (@sampritibh) July 27, 2026

The best way to communicate with the world is directly in your own words, not through the reality distortion nightmare mirror that is the legacy mainstream press https://t.co/cuCKtoMLmd
— Elon Musk (@elonmusk) July 27, 2026

Be earnest, helpful, and sincere

Being mean, nihilistic, or clout chasing will infect your soul and you’ll never make something great if you do that https://t.co/bMPcDYty0D
— Garry Tan (@garrytan) July 27, 2026

I think this might be the first time I've starred in a startup's promo vid. Glad it was Sarvam!

Epoch will be epic! https://t.co/e6yzggQDfJ
— Caleb (@caleb_friesen) July 27, 2026

Congrats.
— Paramendra Kumar Bhagat (@paramendra) July 28, 2026

Outlier talents often change their minds and contradict themselves constantly; they will bin strong opinions the instant new data materialises, with zero emotional attachment.

I've met founders who pivot their whole company after one news release, and a researcher who spent 5…
— Adam Shuaib (@adamshuaib) July 27, 2026

The older I get, the more I realize that the happiest people are very difficult to rush.

I recently sat next to a gentleman on a flight who was returning from his 60th high school reunion. We struck up a conversation and ended up chatting for the whole time we were in the air.…
— Sahil Bloom (@SahilBloom) July 27, 2026

Only 2 months of creating content on @X led to a job offer directly from the founder of a YC-backed startup and the craziest part is, I never even applied & I don't have any degree yet.

I didn't grow by chasing the algorithm.

I simply started sharing what I was learning & doing… pic.twitter.com/zhVRP81hrx
— Nandinyy (@nandiny77300) July 27, 2026

My former boss.

He funded Android while an exec at Google. https://t.co/pwmzLALpDq
— Robert Scoble (@Scobleizer) July 28, 2026

Sunday, July 26, 2026

Kimi K3 Is Making Waves

Kimi K3: Moonshot AI's 2.8 Trillion-Parameter Open-Weight Frontier Model Shakes Up the AI Landscape

On July 16, 2026, Beijing-based Moonshot AI released Kimi K3, a massive multimodal AI model that quickly captured global attention. As the latest flagship in the Kimi series, it stands as the world's largest announced open-weight model to date, with 2.8 trillion total parameters. Its performance places it in the top tier: on many independent benchmarks it trails only a handful of proprietary models, led by Anthropic's Claude Fable 5 and OpenAI's GPT-5.6 Sol, while excelling in key areas like long-horizon coding and agentic tasks.

Background on Moonshot AI and the Kimi Family

Moonshot AI was founded in March 2023 in China. The company launched its first Kimi model in October 2023, notable early on for strong long-context capabilities (initially up to 128K tokens). Subsequent iterations built momentum: the open-weights Kimi K2 arrived in July 2025, followed by variants like K2 Thinking (November 2025), K2.5, and K2.6 (a multimodal model in April 2026).

Kimi K3 represents a major leap forward. It builds on prior versions with architectural innovations while scaling aggressively. The model is already available via the Kimi platform (web, iOS, Android apps), Kimi Work desktop client, Kimi Code, and API. Full model weights were scheduled for release by July 27, 2026, making it accessible for self-hosting, fine-tuning, and community innovation.

Technical Specifications and Architecture

Kimi K3 is a sparse Mixture-of-Experts (MoE) model:

2.8 trillion total parameters, with only 16 out of 896 experts activated per token (roughly 1.8% active). This keeps inference costs manageable despite the scale—closer to a much smaller dense model in practice.
1-million-token context window (1,048,576 tokens), enabling it to handle enormous inputs like entire code repositories, long documents, or extended conversations.
Native multimodal capabilities: Text, image, and video understanding (inputs); text output. It supports visual feedback loops, such as analyzing its own generated interfaces or designs.
Key innovations: Kimi Delta Attention (KDA, a hybrid linear attention mechanism) and Attention Residuals (AttnRes). These improve information flow in deep, long-sequence models and enable up to 6.3x faster decoding at million-token scales. Moonshot reports roughly 2.5x better scaling efficiency than K2.
Reasoning features: Always-on "thinking mode" (max effort at launch), with plans for adjustable levels. Strong tool use, structured outputs, and context caching.
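To make the "16 out of 896 experts" figure concrete, here is a minimal sketch of top-k expert routing for a single token. It assumes a plain softmax gate over the selected experts, which is a common textbook choice; the source does not describe Moonshot's actual router, and the random logits are only a stand-in for a learned one.

```python
import heapq
import math
import random

NUM_EXPERTS = 896  # from the spec list above
TOP_K = 16         # experts activated per token

def route_token(router_logits, k=TOP_K):
    """Return [(expert_index, gate_weight)] for the k highest-scoring experts."""
    top = heapq.nlargest(k, range(len(router_logits)), key=router_logits.__getitem__)
    # Softmax over the selected logits only, so the k gate weights sum to 1.
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

rng = random.Random(0)
# Stand-in for the scores a learned router would produce for one token.
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
selected = route_token(logits)

active_fraction = TOP_K / NUM_EXPERTS
print(len(selected), f"{active_fraction:.2%}")  # prints: 16 1.79%
```

Only the 16 selected experts run for that token; the other 880 are skipped, which is where the inference savings come from.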

The architecture targets long-horizon tasks—sustained software engineering, complex knowledge work, agentic workflows, and deep reasoning—rather than simple Q&A.
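The source describes Kimi Delta Attention only as a hybrid linear attention mechanism. As a rough illustration of why linear attention helps at long sequence lengths, the sketch below implements the generic, ungated delta rule: a fixed-size state matrix is corrected toward each new key/value pair, so per-token decode work does not grow with context length. This is a textbook form chosen for illustration, not Moonshot's implementation; the function names and the tiny dimensions are assumptions.

```python
def delta_rule_step(state, key, value, beta=1.0):
    """Update the d_v x d_k state in place so that (state @ key) moves toward value."""
    predicted = [sum(s * k for s, k in zip(row, key)) for row in state]
    for row, v, p in zip(state, value, predicted):
        err = beta * (v - p)          # how far the current memory is from the target
        for j, k in enumerate(key):
            row[j] += err * k         # rank-1 correction along the key direction
    return state

def read(state, query):
    """Retrieve a value from the state: state @ query."""
    return [sum(s * q for s, q in zip(row, query)) for row in state]

d_k, d_v = 4, 2
state = [[0.0] * d_k for _ in range(d_v)]  # size is fixed, independent of sequence length

key_a = [1.0, 0.0, 0.0, 0.0]
key_b = [0.0, 1.0, 0.0, 0.0]
delta_rule_step(state, key_a, [0.5, -2.0])
delta_rule_step(state, key_b, [3.0, 1.0])

print(read(state, key_a))  # [0.5, -2.0]
print(read(state, key_b))  # [3.0, 1.0]
```

Writing a new value under an existing key overwrites the old association instead of accumulating on top of it, which is the property that distinguishes the delta rule from a plain additive linear-attention update.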

Pricing (API): $3 per million input tokens (non-cached), $0.30 per million cached input tokens, $15 per million output tokens. This is pricier than many earlier Chinese models but still competitive with Western frontier offerings.
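As a quick worked example of the list prices above, the sketch below estimates the cost of one request with and without a cache hit. It assumes cached input tokens are billed at the cached rate and all other input at the full rate; the token counts are hypothetical, and actual billing rules may differ.

```python
# USD per million tokens, taken from the pricing line above.
PRICE_PER_MTOK = {"input": 3.00, "cached_input": 0.30, "output": 15.00}

def request_cost(input_tokens, output_tokens, cached_tokens=0):
    """Estimate the USD cost of one request; cached_tokens is the cached part of the input."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    return (
        fresh_tokens * PRICE_PER_MTOK["input"]
        + cached_tokens * PRICE_PER_MTOK["cached_input"]
        + output_tokens * PRICE_PER_MTOK["output"]
    ) / 1_000_000

# Hypothetical workload: a 200K-token repository prompt and a 4K-token answer.
cold = request_cost(200_000, 4_000)
warm = request_cost(200_000, 4_000, cached_tokens=180_000)
print(f"cold: ${cold:.3f}  warm: ${warm:.3f}")  # cold: $0.660  warm: $0.174
```

For long, repeated prompts the cached rate dominates the bill, so cache hit rate matters more than the headline input price.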

Performance and Benchmarks

Kimi K3 debuts strongly on independent evaluations:

Artificial Analysis Intelligence Index v4.1: 57.1 (listed 4th overall, behind leaders including Claude Fable 5 at 59.9 and GPT-5.6 Sol at 58.9; ahead of Claude Opus 4.8 at 55.7).
GDPval-AA v2 (Elo, economically valuable knowledge work): Around 1,668–1,687 (strong improvement from prior Kimi versions; competitive or ahead of Opus 4.8).

It particularly shines in coding and agentic benchmarks:

#1 on Arena.ai Frontend Code Arena (1,679 points, ahead of Claude Fable 5), topping 6 of 7 domains (e.g., brand/marketing, data/analytics, consumer products).
Strong scores on Terminal-Bench 2.1 (88.3), BrowseComp (91.2), SWE Marathon (42.0, leading in some reports), and others. It often leads or ties in practical software engineering and agentic tasks.

Vendor and independent tests confirm it trails the absolute leaders on broad intelligence but outperforms most competitors (including many proprietary models) in specialized, real-world scenarios. Hallucination rates may be slightly higher than predecessors in some evaluations.

Why Kimi K3 Has Been Making Waves

Several factors explain the excitement and market impact:

Scale + Open Weights at Frontier Level: It is the first open model in the ~3T-parameter class. Releasing weights democratizes access, allowing global developers, researchers, and companies to run, modify, and build on it—potentially turning others' compute into an advantage for Moonshot. This contrasts with closed U.S. leaders and echoes (but surpasses) prior Chinese open models like DeepSeek.
Closes the Gap with U.S. Frontier Models: In a context of U.S. export controls on advanced chips, Kimi K3 demonstrates China's ability to innovate architecturally (MoE sparsity, custom attention) and compete closely on performance. It has sparked "another DeepSeek moment" discussions, with analysts noting an "all-round catch-up."
Practical Strengths and Demand: Exceptional for coding, visual/agentic workflows, and long-context work. Demand overwhelmed Moonshot's capacity, leading to a temporary pause on new subscriptions days after launch.
Geopolitical and Market Ripples: The release coincided with broader AI news, contributing to temporary sell-offs in chip and tech stocks. It highlights shifting dynamics in the global AI race, with implications for valuations, investment, and open vs. closed model strategies. Moonshot reportedly eyes high valuations and potential listing.
Broader Ecosystem Signal: It signals maturing Chinese AI capabilities in efficiency, multimodality, and agentic systems. While not the cheapest option (marking a shift from "super cheap" Chinese models), its pricing and openness could accelerate adoption and innovation.

Limitations and Outlook

Kimi K3 is not perfect. It trails top proprietary models on some general benchmarks, has noted sensitivities (e.g., to thinking history), and inference at full scale requires significant resources. Early feedback praises it as an outstanding pair programmer and agent tool but not fully autonomous for every complex project.

With weights now (or soon) available, the community will likely push its boundaries through fine-tunes, optimizations, and applications. Moonshot continues iterating, and K3 sets a high bar for what open-weight frontier models can achieve.

Kimi K3 is more than just another large model—it exemplifies how architectural ingenuity, strategic openness, and focused capabilities can challenge incumbents. Whether it reshapes the broader AI market or sparks further acceleration in the U.S.-China race, it has undeniably made waves and will influence development for months or years to come.

Kimi K3 vs. DeepSeek V3: A Head-to-Head Comparison (as of July 2026)

Kimi K3 (Moonshot AI, July 2026) and DeepSeek V3 (DeepSeek AI, initial release December 2024, with updates like V3-0324 and later variants) represent two major Chinese open-weight MoE models. Kimi K3 is a newer, much larger frontier challenger, while DeepSeek V3 (and its evolutions) pioneered highly efficient, cost-effective performance.

Key Specifications

Parameters & Architecture:
- Kimi K3: 2.8 trillion total parameters (sparse MoE, 16 of 896 experts active per token, an estimated ~50B active parameters). Uses Kimi Delta Attention (KDA) and Attention Residuals for long-context efficiency.
- DeepSeek V3: 671B total parameters (37B active per token; updates around 685B). Employs Multi-head Latent Attention (MLA) and DeepSeekMoE, with auxiliary-loss-free load balancing and multi-token prediction.
Context Window:
- Kimi K3: 1 million tokens (strong for long-horizon tasks like full repositories).
- DeepSeek V3: 128K tokens (solid but significantly smaller).
Modalities:
- Kimi K3: Native text + image + video understanding.
- DeepSeek V3: Primarily text (no native vision in base versions).
Release & Openness:
- Both open-weights (Kimi K3 weights by ~July 27, 2026; DeepSeek V3 earlier with MIT/permissive licenses).
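The active-parameter figures in the list above can be checked with a few lines of arithmetic. The sketch assumes that Kimi K3's active share equals its expert ratio (16 of 896), which appears to be how the ~50B estimate was derived; this is only a proxy, since attention layers, embeddings, and any shared experts run for every token.

```python
# Kimi K3: estimate active parameters from the expert ratio (a rough proxy).
k3_total_b = 2800.0                      # 2.8 trillion, in billions
k3_fraction = 16 / 896
k3_active_b = k3_total_b * k3_fraction

# DeepSeek V3: both figures are stated directly in the list above.
ds_total_b, ds_active_b = 671.0, 37.0
ds_fraction = ds_active_b / ds_total_b

print(f"Kimi K3: ~{k3_active_b:.0f}B active ({k3_fraction:.1%})")      # ~50B active (1.8%)
print(f"DeepSeek V3: {ds_active_b:.0f}B active ({ds_fraction:.1%})")   # 37B active (5.5%)
```

On these numbers Kimi K3 is about four times larger in total but activates a smaller share of its weights per token, so the per-token compute gap is much narrower than the headline sizes suggest.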

Performance and Benchmarks

Kimi K3 operates at a higher overall capability level, especially in frontier evaluations, while DeepSeek V3 remains strong in efficiency-focused scenarios.

Overall Intelligence (e.g., Artificial Analysis Intelligence Index):
- Kimi K3: ~57 (4th overall, ahead of many closed models like Claude Opus 4.8; competitive with the top proprietary models).
- DeepSeek V3: Lower (roughly the 14–30 range in early evaluations; later variants improved but still trail K3). Kimi K3 shows clear superiority on shared benchmarks like GPQA Diamond (93.5% vs. ~59% for V3).
Coding & Agentic:
- Kimi K3 excels in long-horizon/agentic coding: #1 on Arena.ai Frontend Code Arena, strong on Terminal-Bench 2.1 (88.3), SWE Marathon (42.0), BrowseComp (91.2), and FrontierSWE. Designed for sustained engineering projects with visual feedback.
- DeepSeek V3 (and coder lineage) is excellent for code generation, math, and competition benchmarks (e.g., strong LiveCodeBench, SWE-bench in variants). It is a proven, efficient workhorse but lags K3 on advanced agentic/long-context tasks.
Other Areas:
- Kimi K3 leads in multimodal, long-context knowledge work, and many agentic benchmarks (e.g., Automation Bench).
- DeepSeek V3 shines in math/reasoning efficiency and multilingual (especially Chinese) tasks, with very stable training.

Kimi K3 ranks much higher on aggregate leaderboards (e.g., a top-5 position versus a rank beyond 100th for DeepSeek V3 in some July 2026 evaluations).

Pricing and Efficiency

Kimi K3 API: $3 / $15 per million input/output tokens ($0.30 cached). More premium, reflecting frontier positioning.
DeepSeek V3: Significantly cheaper (e.g., ~$0.27–0.50 input / $0.42–1.10 output in various offerings; often sub-$1 blended). Excellent value for high-volume or cost-sensitive use.

Inference Efficiency: Both leverage MoE for strong speed relative to dense models. DeepSeek V3 emphasizes low training/inference costs (e.g., ~60 tokens/sec in early reports); Kimi K3's sparsity and attention innovations support fast decoding at million-token scales.

Strengths and Use Cases

Choose Kimi K3 if you need:

Frontier-level performance on complex, long-horizon, or agentic tasks.
Native vision/multimodal.
Massive context (e.g., whole codebases + visuals).
Cutting-edge coding with iteration and feedback.

Choose DeepSeek V3 (or variants) if you need:

Best-in-class cost-efficiency for high-volume coding, math, or general tasks.
Proven reliability in open-source ecosystems.
Strong Chinese/multilingual performance without paying frontier premiums.

Summary

Kimi K3 is the more powerful, newer model—pushing open-weight boundaries closer to (or matching aspects of) 2026 proprietary frontiers like Claude Fable 5 or GPT-5.6 Sol, especially in practical agentic and long-context scenarios. DeepSeek V3 remains a landmark for accessibility and efficiency, democratizing strong AI capabilities at low cost.

For many developers, the choice depends on budget vs. capability needs: DeepSeek for scale and affordability, Kimi K3 for maximum performance on demanding workloads. As both are open-weight, the community will continue to fine-tune and optimize them. Later DeepSeek variants (e.g., V3.2) narrowed some gaps, but Kimi K3's scale and timing give it the edge in mid-2026 evaluations.

Kimi K3 vs. Claude Opus 4.8: A 2026 Frontier Comparison

Kimi K3 (Moonshot AI, released July 16, 2026) is a 2.8T-parameter open-weight MoE model that positions itself as a strong challenger to leading closed models. Claude Opus 4.8 (Anthropic, released around May 2026) is a high-end proprietary model known for strong reasoning, safety, and reliability.

Kimi K3 is newer, larger in scale (though MoE sparsity keeps active parameters lower), open-weight, and competitive or superior in several agentic/coding areas. Claude Opus 4.8 offers polished enterprise features, strong knowledge/presentation, and proven production reliability.

Specifications

Parameters & Architecture:
- Kimi K3: 2.8 trillion total (16 of 896 experts active; sparse MoE). Innovations include Kimi Delta Attention and Attention Residuals for long-context efficiency.
- Claude Opus 4.8: Undisclosed (dense or hybrid; prior Opus models were high-capability but not as explicitly massive-MoE). Focuses on constitutional AI/safety.
Context Window:
- Both support ~1 million tokens (Kimi K3 lists 1,048,576). Practical output limits may vary (Opus has published ~128K output in some contexts).
Modalities:
- Kimi K3: Native text + image + video input.
- Claude Opus 4.8: Strong vision/multimodal support (Anthropic's Claude family excels here).
Availability:
- Kimi K3: API available now; full open weights by ~July 27, 2026 (self-hosting/fine-tuning possible).
- Claude Opus 4.8: API-only (proprietary, with enterprise controls and safety features).

Performance and Benchmarks

On aggregate intelligence, they are very close, with Kimi K3 often edging ahead in independent evaluations while shining in specific practical tasks.

Artificial Analysis Intelligence Index:
- Kimi K3: ~57 (4th overall; ahead of Opus 4.8).
- Claude Opus 4.8: ~55–56.
Agentic & Coding (Kimi K3's strength):
- Kimi K3 leads on several: Terminal-Bench 2.1 (88.3 vs. ~84.6), SWE Marathon (42.0 vs. ~40 or lower for Opus), BrowseComp (91.2 vs. ~84), Automation Bench, and Frontend Code Arena (#1). It excels in long-horizon agentic work (e.g., AA-Briefcase Elo second only to Fable 5, ahead of Opus).
- Claude Opus 4.8 is competitive and sometimes stronger in verified production coding (e.g., certain SWE-bench variants) or balanced agentic tasks with safety. It performs well on knowledge-heavy or presentation-focused work.
Knowledge & Reasoning:
- Claude Opus 4.8 often edges in general knowledge, factuality, or areas requiring careful judgment/hallucination control (Anthropic's strength).
- Kimi K3 is comparable or ahead on GPQA Diamond and other reasoning benchmarks but may have slightly higher hallucination rates in some reports.

Kimi K3 frequently beats or ties Opus 4.8 on Moonshot's and independent agentic/coding suites, while trailing top models like Claude Fable 5 overall.

Pricing and Practicality

Kimi K3: $3 input / $0.30 cached / $15 output per million tokens. Cheaper than Opus and competitive for frontier performance.
Claude Opus 4.8: Higher (~$5 input / $25 output per million; varies with tiers). Enterprise plans add compliance features.
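List prices alone do not settle per-task cost if one model spends more tokens on the same job. The sketch below computes how many times more tokens Kimi K3 could use before a task costs the same as on Claude Opus 4.8. It uses the approximate prices quoted above, ignores caching and tier discounts, and the token counts are hypothetical.

```python
def task_cost(input_tokens, output_tokens, in_price, out_price):
    """USD cost of one task given per-million-token prices."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

K3_PRICES = (3.00, 15.00)    # input, output; from the pricing lines above
OPUS_PRICES = (5.00, 25.00)  # approximate, "varies with tiers" per the source

def break_even_token_multiplier(input_tokens, output_tokens):
    """How many times more tokens K3 can use before matching the Opus cost."""
    opus = task_cost(input_tokens, output_tokens, *OPUS_PRICES)
    k3 = task_cost(input_tokens, output_tokens, *K3_PRICES)
    return opus / k3

m = break_even_token_multiplier(150_000, 8_000)
print(f"{m:.2f}")  # 1.67
```

Because both prices differ by the same 5:3 ratio, the break-even is about 1.67x regardless of the input/output mix: K3 stays cheaper per task unless it needs roughly two-thirds more tokens than Opus.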

Efficiency: Both handle long contexts well. Kimi K3's MoE design aids cost at scale; Claude emphasizes controllable reasoning effort.

Strengths and Ideal Use Cases

Kimi K3 advantages:

Superior on many agentic/long-horizon coding tasks.
Multimodal native + massive context.
Open weights (future customization).
Better price/performance for high-volume or developer use.

Claude Opus 4.8 advantages:

Polished safety, low hallucination, and judgment (ideal for sensitive/enterprise work).
Strong in knowledge presentation and balanced reasoning.
Mature ecosystem with Anthropic's reliability tools.

Verdict

In mid-2026, Kimi K3 is a strong peer or slight leader over Claude Opus 4.8 on many capability benchmarks (especially coding/agentic), at a lower price, with the bonus of openness. It narrows the gap with Western frontier models effectively.

Claude Opus 4.8 remains preferable for applications prioritizing safety, compliance, or refined output quality. Test both on your specific workloads—Kimi K3's open weights (post-July 27) make experimentation easier. The choice often comes down to priorities: raw frontier performance/value (Kimi K3) vs. enterprise polish (Claude Opus).

Kimi K3 vs. Claude Fable 5: 2026 Frontier Showdown

Claude Fable 5 (Anthropic) is the current top proprietary model, leading most aggregate benchmarks with exceptional reasoning, safety, and balanced performance. Kimi K3 (Moonshot AI, released July 16, 2026) is a 2.8T-parameter open-weight MoE model that narrows the gap significantly, often matching or beating Fable 5 in specific agentic and coding tasks while offering lower cost and openness.

Kimi K3 trails overall but delivers impressive value and leads in targeted areas, especially for developers and open ecosystems.

Key Specifications

Parameters & Architecture:
- Kimi K3: 2.8 trillion total parameters (sparse MoE, 16 of 896 experts active). Features Kimi Delta Attention and Attention Residuals for efficient long-context handling.
- Claude Fable 5: Undisclosed size (frontier-scale, likely dense/hybrid with advanced reasoning optimizations). Emphasizes constitutional AI for safety and controllability.
Context Window: Both ~1 million tokens (Kimi K3 at 1.048M). Excellent for long-horizon work.
Modalities:
- Kimi K3: Native text + image + video.
- Claude Fable 5: Strong multimodal (vision) capabilities with high reliability.
Availability:
- Kimi K3: API now; full open weights ~July 27, 2026.
- Claude Fable 5: API-only (proprietary, with enterprise safety features).

Performance and Benchmarks

Claude Fable 5 holds the overall lead, but the gap is small, and Kimi K3 wins several practical categories.

Artificial Analysis Intelligence Index:
- Claude Fable 5: ~59–60 (top or near-top).
- Kimi K3: ~57 (4th overall; strong for an open model).
Agentic & Knowledge Work (Mixed results):
- Fable 5 leads on GDPval-AA v2 (1,760 Elo vs. Kimi’s 1,668), AA-Briefcase (higher Elo), and some broad agentic tasks.
- Kimi K3 wins or ties on Automation Bench, BrowseComp (91.2 vs. 88.0), Terminal-Bench 2.1 (88.3 vs. 84.6), and others. It ranks 2nd on AA-Briefcase overall.
Coding & Software Engineering (Kimi K3 shines):
- Kimi K3 #1 on Arena.ai Frontend Code Arena (1,679 points, ahead of Fable 5; tops 6/7 domains). Strong on SWE Marathon (42.0 vs. 35.0) and Program Bench.
- Fable 5 leads on FrontierSWE and some DeepSWE variants; highly reliable in production coding.

Kimi K3 often beats or closely trails Fable 5 on agentic coding/long-horizon tasks but lags slightly on general intelligence, presentation quality, and some knowledge benchmarks. Real-world routing (using both) can achieve high combined performance.

Pricing and Efficiency

Kimi K3: $3 input / $0.30 cached / $15 output per million tokens. Significantly cheaper.
Claude Fable 5: Much higher (output priced in the ~$10–50 range per million tokens; premium pricing).

Kimi K3 offers better cost per task in many agentic scenarios (sometimes 2–3x better value), though it may use more tokens or turns to finish the same task. Fable 5 is faster or better optimized in some deployments.

Strengths and Use Cases

Claude Fable 5 advantages:

Highest overall capability and reliability.
Superior safety, low hallucination, and judgment (enterprise-grade).
Strong across broad intelligence, presentation, and complex reasoning.

Kimi K3 advantages:

Excellent (sometimes leading) in frontend/agentic coding and specific long-horizon tasks.
Multimodal native + open weights for customization/self-hosting.
Dramatically better price/performance; accessible frontier capabilities.

Verdict

Claude Fable 5 remains the stronger overall model in mid-2026, particularly for general intelligence, safety-critical, or polished outputs. However, Kimi K3 is remarkably close—often superior in coding/agentic niches—and provides outstanding value as an open-weight option.

For many developers, researchers, or cost-sensitive teams, Kimi K3 (especially post-weights release) is the practical winner or strong complement. Test on your workflows: Kimi excels in sustained coding/visual/agent loops, while Fable 5 sets the ceiling for balanced, trustworthy performance. The rapid progress from Chinese open models like Kimi K3 is compressing the frontier gap.

Kimi K3 Agentic Coding Benchmarks: Strengths, Comparisons, and Implications (July 2026)

Kimi K3 stands out for agentic coding—tasks requiring sustained tool use, multi-step reasoning, repository navigation, terminal interaction, web browsing, and long-horizon project completion. Its 1M-token context, native vision, always-on reasoning (max effort at launch), and architectural innovations (Kimi Delta Attention + Attention Residuals) support these workloads effectively.
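A practical first question for repository-scale work is whether the inputs fit in the window at all. The sketch below is a rough pre-flight check against the 1,048,576-token limit. The 4-characters-per-token ratio is a rule of thumb rather than Kimi's tokenizer, and it assumes output tokens share the same window; both are assumptions to replace with measured values.

```python
import math

CONTEXT_WINDOW = 1_048_576  # tokens, per the published spec
CHARS_PER_TOKEN = 4.0       # assumption: rough heuristic, real tokenizers vary

def estimate_tokens(text):
    """Very rough token estimate from character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)

def fits_in_context(documents, reserve_for_output=32_000):
    """Return (fits, estimated_input_tokens, input_budget)."""
    used = sum(estimate_tokens(doc) for doc in documents)
    budget = CONTEXT_WINDOW - reserve_for_output
    return used <= budget, used, budget

# Hypothetical inputs: two source dumps of 2.0 MB and 1.5 MB of text.
repo = ["x" * 2_000_000, "y" * 1_500_000]
ok, used, budget = fits_in_context(repo)
print(ok, used, budget)  # True 875000 1016576
```

When the check fails, the usual options are to trim or summarize files, or to split the task across several calls.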

Key Agentic Coding Benchmarks

Here are the main ones where Kimi K3 shows strong results (often using "max" reasoning effort):

Terminal-Bench 2.1: 88.3%. Just behind the top score (vs. GPT-5.6 Sol ~88.8%, Claude Fable 5 84.6%, Opus 4.8 84.6%). Tests command-line tool use, debugging, and system tasks.
SWE Marathon: 42.0%. Leads (vs. Claude Opus 4.8 ~40.0, GPT-5.6 Sol 39.0, Fable 5 35.0). Measures sustained software engineering over long sessions with large codebases. Kimi K3 excels in endurance and iteration.
Program Bench: 77.8%. Narrowly leads the top models (vs. GPT-5.6 Sol 77.6, Fable 5 76.8). General program construction and problem-solving.
Frontend Code Arena (Arena.ai / LMArena): #1 with 1,679 Elo (ahead of Fable 5 ~1,631). Tops 6 of 7 domains (e.g., brand/marketing, data/analytics, consumer products). Blind human preference for generated interfaces.
BrowseComp: 91.2%. Leads (vs. GPT-5.6 Sol 90.4, Fable 5 88.0). Agentic web research and information gathering.
Automation Bench: 30.8%. Leads narrowly (vs. GPT-5.6 Sol 29.7). SaaS workflow automation.
FrontierSWE / DeepSWE: Trails leaders (81.2% vs. Fable 5 86.6 on FrontierSWE; 67.5 vs. 70–73 on DeepSWE). These test deep repo understanding and complex engineering.
Other: Strong on SpreadsheetBench, OmniDocBench, and internal Kimi Code Bench. Artificial Analysis Coding Agent Index: ~57 (joint #5, ahead of Opus 4.8).
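Elo gaps like the Frontend Code Arena figures above are easier to read as head-to-head win rates. The sketch below applies the standard logistic Elo formula; it assumes the arena uses the conventional 400-point scale, which the source does not state.

```python
def elo_expected_score(rating_a, rating_b, scale=400.0):
    """Expected score of A against B under the standard logistic Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))

# Frontend Code Arena ratings quoted above: Kimi K3 1,679 vs. Fable 5 ~1,631.
p = elo_expected_score(1679, 1631)
print(f"{p:.2f}")  # 0.57
```

Under that assumption, a 48-point lead means Kimi K3's output would be preferred in roughly 57% of blind pairwise comparisons, a real but modest edge.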

How Kimi K3 Performs in Context

Kimi K3 shines in long-horizon, iterative, and tool-heavy agentic workflows (e.g., terminal ops, sustained coding, frontend generation, browsing+automation). Its massive context and sparsity help maintain coherence over extended tasks.

It is competitive with (or beats) top closed models like Claude Fable 5 and GPT-5.6 Sol in several practical areas but trails on the absolute hardest repo-level tasks (DeepSWE/FrontierSWE). Independent tests (e.g., Artificial Analysis) broadly confirm vendor claims, with Kimi K3 ranking high among open models and in the frontier tier overall.

Cost Efficiency: Often 2–3x better value than Fable 5 or similar (e.g., ~$3–4 per task vs. higher for closed models), making it attractive for scaling agents.

Caveats: Some scores are vendor-reported (weights release enables more verification). It can be verbose (higher token use) and may have slightly elevated hallucination in some evals. Real-world results depend on scaffolding, tools, and prompting.

Why It Matters

Kimi K3 demonstrates that open-weight models can reach (or exceed) proprietary performance in key agentic coding niches, especially long-running and visual/terminal tasks. Its Frontend Code lead and SWE Marathon strength make it particularly relevant for developers building agents or UIs. Combined with openness and cost advantages, it accelerates experimentation and deployment in agentic systems.

For production, many teams route between Kimi K3 (for volume/specific strengths) and top closed models like Fable 5 (for peak reliability). As weights become available, community fine-tunes and optimizations will likely push its agentic capabilities further.
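The routing idea described above can be sketched as a small rule-based dispatcher. The task categories, the model identifier strings, and the rule that safety-critical work always goes to the closed model are illustrative assumptions drawn loosely from the benchmark pattern in this post, not a documented configuration from either vendor.

```python
from dataclasses import dataclass

# Hypothetical identifiers; real API model names may differ.
OPEN_MODEL = "kimi-k3"
CLOSED_MODEL = "claude-fable-5"

# Task kinds where the benchmarks above favor Kimi K3 (assumed mapping).
K3_STRENGTHS = {"frontend", "terminal", "browse", "long_session"}

@dataclass(frozen=True)
class Task:
    kind: str                     # e.g. "frontend", "terminal", "deep_repo"
    safety_critical: bool = False

def pick_model(task: Task) -> str:
    """Route a task to a model using simple, explicit rules."""
    if task.safety_critical:
        return CLOSED_MODEL       # prioritize reliability over cost
    if task.kind in K3_STRENGTHS:
        return OPEN_MODEL         # volume work in K3's stronger areas
    return CLOSED_MODEL           # default for everything else, e.g. deep repo tasks

print(pick_model(Task("frontend")))                        # kimi-k3
print(pick_model(Task("deep_repo")))                       # claude-fable-5
print(pick_model(Task("terminal", safety_critical=True)))  # claude-fable-5
```

In practice teams would tune such rules against their own evaluation results rather than public leaderboards.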

Kimi K3 doesn't dominate every benchmark but carves out a strong position in the agentic coding frontier, making it one of the most exciting releases of 2026 for practical AI engineering.

26: Kimi K3

For all the young founders who told me two or three years ago that they had to drop out of school to start companies at that moment or it would be too late: https://t.co/ottoNZVptI
— Paul Graham (@paulg) July 25, 2026

Quagmire of the Vanities: Why the world economy is still at risk from the Iran debacle

The Crash of 2026: I hope I'm being unnecessarily alarmist

Elon Musk, the New Hero of Populism: The useful narcissism of the new class of oligarchs

26: Iran

The early projects I did were writing software to automate the control and calibration of superconducting magnets for the Tevatron. Then I got really into piezo actuators- for compensating Lorentz force in pulsed accelerators like the ILC (so that the beam stays focused). Did a… https://t.co/dRNkwSN2pv pic.twitter.com/jpFRdV0xfm
— Sampriti Bhattacharyya (@sampritibh) July 24, 2026

China goes behind Putin's back to plot with Russian elite “China has a truly remarkable chance of transforming Russia into something like a giant Laos or Pakistan. Russia is becoming increasingly dependent on China to avoid being completely left behind and isolated on the global political scene, as well as being far from modern technological trends,” Gabuyev said. .......... Military ties are also deepening in the shadows. An investigation by Der Spiegel, Insider, and Le Monde uncovered a vast exchange of battlefield data......... Beijing supplies the Kremlin with missing electronics. In return, Russia shares real world tactics about fighting Western weapons......... The Chinese embassy in Berlin denied the accusations to Deutsche Welle. Officials called the reports “slander” and stated that “these claims have no factual basis.”

OpenAI says AI acted on its own in an 'unprecedented' hack of another company OpenAI said its AI used stolen credentials and discovered a previously unknown vulnerability to access Hugging Face servers......... It went to “extreme lengths to achieve a rather narrow testing goal” and “found ways to gain access to secret information that it could use to cheat the evaluation,” the company said.

One way you can recognize AI slop is when the ideas and the diction don't match — when the ideas are completely ordinary, but the diction is that of someone announcing a brilliant discovery that they're really excited about.
— Paul Graham (@paulg) July 22, 2026

When software was expensive - thin, horizontal, best-of-breed software stacks extracted rents across every business.

Now that software is cheap - value moves to vertically integrated businesses that deliver opinionated end-to-end experiences.
— Naval (@naval) July 22, 2026

Wanted to take a moment to thank someone who gave @SkyrootA its first real lift-off—Mukesh Bansal.

8 Years ago, we sent him a cold LinkedIn message.

He replied.

We met that very weekend.

Soon after, he became Skyroot’s first investor with a ₹10 crore cheque.

At the time,… pic.twitter.com/4SGY5SzJDJ
— Pawan (@PawanKChandana) July 23, 2026

In entrepreneurship, you need to have grit because things will be challenging and most of the time they won't go your way.
- Via Mark Zuckerberg #Entrepreneurship #Business #Entrepreneurs #Founders #Startups pic.twitter.com/vbfZO4pRuc
— Jonathan Aufray (@jonathan_aufray) July 21, 2026

At my Homeowners Clinic today, one constituent shared what it meant to finally feel heard and hopeful.

This is what these events are about: connecting homeowners with real help and a path forward.

The work continues—because serving people has always been more than a job. It is… pic.twitter.com/3pOvdbnsr4
— Jenifer Rajkumar (@JeniferRajkumar) July 21, 2026

A passion is something you love doing. A purpose is something you love doing that benefits the world.
— Peter H. Diamandis, MD (@PeterDiamandis) July 21, 2026

Today we launch the next chapter of Gushwork.

Over the last 30 years, we saw commerce moving online -

But what actually moved online were low value retail products: books, clothing, electronics.

Anything high value, from $5,000 to $1M - a hydraulic crane, an industrial roofing… pic.twitter.com/NRqIW1eZ8c
— Nayrhit B (@NayrhitB) July 21, 2026

Isn't that a separate standalone company?
— Paramendra Kumar Bhagat (@paramendra) July 21, 2026

hot take.

being early is overrated.

being early with distribution is underrated.

i've seen great products disappear because nobody knew they existed.

i've seen average products grow because the founders obsessed over getting in front of customers every single week.

product…
— Sridhar A (@sridharfyi) July 21, 2026

Everyone is now a programmer, because the present and future programming language is your natural language. If you can write or speak, you can build. This is the most important evolution in our field since its creation. And it's a tsunami that has *just* started.
— Guillermo Rauch (@rauchg) July 21, 2026

America’s edge over China is open markets, permissionless innovation, and attracting the world’s best talent.
China’s edge is long-term coordination and massive state-led execution.
Banning open-weight AI means abandoning America’s strength to play China’s game.
We will lose that…
— Xiaoyin Qu (@quxiaoyin) July 21, 2026

How I used to break up tweets to make them hard for mobs to attack at the peak of the woke era. Strange to think this was only 6 years ago. https://t.co/z4O8tD0R6k
— Paul Graham (@paulg) July 21, 2026

AI painting of the day.

Created with ChatGPT Image 2.0 pic.twitter.com/jgA3Yfixl2
— Derya Unutmaz, MD (@DeryaTR_) July 21, 2026

RIP
— Paramendra Kumar Bhagat (@paramendra) July 21, 2026

I have been doing the written version of that for a while now. https://t.co/4onfo6URsP
— Paramendra Kumar Bhagat (@paramendra) July 21, 2026

One pattern I find useful for working with LLMs is a nice long ramble session. Sometimes the LLM needs more bits to understand what you're trying to achieve, but you're too lazy to type them. In these cases I like to lean back, switch to /voice and just ramble for like 10…
— Andrej Karpathy (@karpathy) July 21, 2026

You should hire me to create a Grand Solara Vision (TM) for your company.
— Paramendra Kumar Bhagat (@paramendra) July 21, 2026

Me at 20. In my natural environment.
—Always be building 🔩🛠️👩🏻‍💻🔭🌌#fermilab pic.twitter.com/fjJTTkSK9q
— Sampriti Bhattacharyya (@sampritibh) July 21, 2026

"Love you, man!" works.

But "I love you" does not seem to work for men. Weird. Inexplicable.
— Paramendra Kumar Bhagat (@paramendra) July 21, 2026

How Toyota has put every automaker on notice with its 745-mile solid-state battery

Iran may be stronger than we thought

According to assessments reported in June, Iran still has access to most of its missile sites and around 70 percent of its prewar stockpile ...

“We do not have enough [munitions] to safely sustain operations, and I don’t think the White House is aware of that,” one US official told the Washington Post this week ...

the proliferation of one-way attack drones like Iran’s Shaheds have shifted the balance, giving weaker adversaries the ability to target US bases at low cost. The attack that killed the two servicemembers on Friday was, like many of Iran’s attacks in this war, a combined missile and drone barrage intended to overwhelm the facility’s defenses. The best-case scenario from these strikes, from a US perspective, is that they are intercepted, albeit at the cost of draining valuable air defense resources. In the worst case, US soldiers are killed ...

“air superiority is something that we no longer have all the time” due to the proliferation of low-cost drones, which he compared to the improvised explosive devices, many planted by Iranian proxies, that killed hundreds of US troops during the war in Iraq ...

The hard logic is that just as high fuel prices and declining poll numbers have not yet been enough to deter Trump from continuing to fight Iran, a few American casualties are not enough to be a deterrent to military action. In fact, as this weekend’s retaliatory strikes against Tabriz showed, they may only pull the US deeper into the war ...

As long as Trump avoids a ground invasion, Iran’s ability to target and kill US troops will remain limited. But as this past weekend’s events showed, that ability exists, and there is only so much the US can do to prevent it. The risk of an even larger catastrophe for American troops in the Middle East will only grow the longer this war goes on.

AI company hit by hack entirely done by artificial intelligence
NYC mayor Zohran Mamdani doubles down on threat to arrest Israeli Prime Minister Benjamin Netanyahu
Hawks around Putin are now demanding strikes on NATO itself — and a green light for 'limited' nuclear weapons

Awesome to have Jensen on X! https://t.co/QVJhqDljo3
— Gavin Baker (@GavinSBaker) July 24, 2026

For my first post, I’m sharing a letter @NVIDIA signed on why open models matter.

AI will transform every industry, power every company, and be built by every country.

Open models strengthen safety and cybersecurity, accelerate innovation and diffusion, and enable sovereignty.… pic.twitter.com/t02bi51N4C
— Jensen Huang (@JensenHuang) July 24, 2026

30% chance Jensen Huang is the world's "second" trillionaire 👀 pic.twitter.com/5k2bAwPxN3
— Kalshi (@Kalshi) July 24, 2026

lmao even i gave up on understanding myself and life has been happy ever since https://t.co/N4JJ12mxQI
— Parmita Mishra 🧬 🇺🇸 🚀 💪🏼 (@parmita) July 24, 2026

This has my full support.

Jensen is right. https://t.co/FubUPT6DVJ
— Elon Musk (@elonmusk) July 24, 2026

America's tech leadership has always come from ecosystems where the best ideas compete.

AI is no different. Open-weight models expand access, speed innovation, and give organizations real control, and long-term leadership requires choice across the full ecosystem.

Dell is… pic.twitter.com/Tu50ECWfvl
— Michael Dell 🇺🇸 (@MichaelDell) July 24, 2026

Biology is stuck in a tradeoff that semiconductors escaped decades ago: we constantly "dumb down" data richness (like using flat petri dishes) just to "amp up" our throughput (getting more rows of data). But semis broke this exact SAME bottleneck! It is not an unfamiliar problem.… pic.twitter.com/QcaB1TudwX
— Parmita Mishra 🧬 🇺🇸 🚀 💪🏼 (@parmita) July 24, 2026

Friday, July 24, 2026

Himalayan Compute: Picks and Shovels

Everyone is talking about AI.

Almost nobody is talking about the commodity that actually powers AI.

That commodity is compute.

Not GPUs. Not chatbots. Not apps.

Compute.

It may become the oil of the 21st century. 🗻👇
— Paramendra Kumar Bhagat (@paramendra) July 24, 2026

People ask if we're in an AI bubble.

Wrong question.

The Internet wasn't a bubble. https://t.co/DMWZpJewfl was.

Likewise, some AI companies will fail.

But compute isn't a bubble. It's infrastructure. 👆🗻👇
— Paramendra Kumar Bhagat (@paramendra) July 24, 2026

Think of the AI stack.

Top layer:
• OpenAI
• Google
• Anthropic
• Microsoft
• xAI
Bottom layer:
• NVIDIA and AI chips
Between them?
Electricity flowing through GPUs becomes compute.
That's the product. 👆🗻👇
— Paramendra Kumar Bhagat (@paramendra) July 24, 2026

Selling electricity is like selling apples.

Turning that electricity into compute is like selling premium apple sauce.

The same raw material.

Far more value added.

The AI age changes the economics of renewable energy. 👆🗻👇
— Paramendra Kumar Bhagat (@paramendra) July 24, 2026

We're also thinking differently about ownership.

10% equity for the Government of Nepal to align incentives and accelerate execution.

10% equity for an independent foundation dedicated to reducing poverty through direct cash transfers. 👆🗻👇
— Paramendra Kumar Bhagat (@paramendra) July 24, 2026

Himalayan Compute: Why Compute Is the Most Valuable Commodity of the AI Age https://t.co/xzjngatjsE 👆🗻👇
— Paramendra Kumar Bhagat (@paramendra) July 24, 2026

The biggest opportunities in AI may not be building the next chatbot.

They may be building the infrastructure that powers every chatbot.

The companies that supply compute could become the railroads, utilities, and energy giants of the AI economy. 👆🗻👇
— Paramendra Kumar Bhagat (@paramendra) July 24, 2026

That's the thesis behind Himalayan Compute. 🗻👆
— Paramendra Kumar Bhagat (@paramendra) July 24, 2026
