Adversarial AI Research — Live Tracker
A living research document. Two adversarial rounds across three frontier AI models. Updated continuously as new evidence arrives.
Current Confidence
This report tested a specific thesis: that AI's recursive self-improvement loop is already compressing economic disruption timelines beyond historical precedent. The strongest counter-arguments were stress-tested and then retracted by the model that originally proposed them. What follows is what survived.
Anthropic's release gaps: 335 → 186 → 73 → 70 → 42 days. Opus 4.8 shipped May 28. Mythos (tier above Opus) is weeks from public release. OpenAI shipped GPT-5.5 on April 23. DeepSeek V4 on April 24. Gemini 3.5 Flash on May 19. Google I/O showed Gemini 3.5 Flash building an OS from scratch. Ten model launches in 22 days during May alone.
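The release-gap figures above can be sanity-checked with simple arithmetic. The sketch below uses only the numbers quoted in this tracker; the variable names are illustrative, and the ratios are a descriptive summary, not a forecast.

```python
# Release gaps in days, copied from the tracker text above.
gaps = [335, 186, 73, 70, 42]

# Ratio of each gap to the one before it (values below 1.0 mean the gap shrank).
ratios = [later / earlier for earlier, later in zip(gaps, gaps[1:])]

# Overall compression from the first gap to the last.
overall = gaps[0] / gaps[-1]

# Average pace implied by "ten model launches in 22 days".
days_per_launch = 22 / 10

for earlier, later, ratio in zip(gaps, gaps[1:], ratios):
    print(f"{earlier} -> {later} days: x{ratio:.2f}")
print(f"first gap / last gap: {overall:.1f}x")
print(f"days per launch: {days_per_launch:.1f}")
```

Every step shrinks the gap, but not evenly: the move from 73 to 70 days is nearly flat, so the series reads as a downward trend with a plateau rather than a smooth curve.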
90% of Claude Code is self-written. Cowork built by Claude Code in two weeks. 16 agents built a 100K-line C compiler. Perplexity Computer completed 3.25 years of enterprise work in 4 weeks. OpenClaw: 196K GitHub stars, built by one person using 4-10 AI agents (6,600 commits/month). Creator acqui-hired by OpenAI.
Jevons Paradox: "Cheaper services increase demand." Gemini proposed, stress-tested, retracted. AI is an outcome machine, not a task tool.
Abstraction Upward: "Workers move to higher roles." Gemini: automating entry level severs the ladder to senior roles.
HSB (Munich Re): AI liability insurance, March 2026. EU Product Liability Directive (PLD): AI as "products" with strict liability, December 2026. Full decoupling projected late 2027.
Goldman Sachs: ~3% lower earnings, 10 percentage points less growth over a decade. New-grad hiring down 50%. 60%+ of displaced tech workers moved outside tech. Manufacturing parallel: persistent scarring for a decade.
72+ governments on DeepSeek. 1% to 15% market share in 12 months. V4 on Huawei chips. 98% cheaper than Opus 4.7. Deploy first, govern later.
Anthropic alone: 5 GW (Amazon) + 5 GW (Google/Broadcom) + $30B Azure + $50B Fluidstack + 300 MW SpaceX Colossus. Exploring orbital AI compute. Combined Big Tech AI capex: $600B+. ByteDance: $70B. xAI: 1.5 GW cluster.
Mythos: first to complete UK AISI 32-step attack simulation. GPT-5.5 shipped comparable capability 16 days later. "Mythos-like hacking, open to all." Containment lasted a fortnight.
Codebases legible to AI. Build costs collapsing. SaaS growth compression = valuation haircuts. For K-6: adaptability, judgement, AI fluency over vocational training. If entry-level roles disappear, career paths lose their bottom rungs.
Each item below has a specific trigger. When evidence arrives, update the status and add an entry to the Update Log.
Scoring: 7+ confirmed = 9.5 · 5-6 confirmed = 9 · 3-4 confirmed = 8.5 (hold) · METR flattens + layoffs plateau = drop to 7
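The scoring rule above can be written as a small function. This is a minimal sketch, not the tracker's actual implementation: the function and parameter names are hypothetical, it assumes the downgrade condition overrides the confirmed-count bands, and it returns None for fewer than three confirmed items because the rule does not define a score for that range.

```python
from typing import Optional


def confidence_score(
    confirmed: int,
    metr_flattens: bool = False,
    layoffs_plateau: bool = False,
) -> Optional[float]:
    """Map tracker state to a confidence score using the stated thresholds."""
    # Downgrade rule: both conditions together drop the score to 7.
    # Assumption: this takes priority over the confirmed-count bands.
    if metr_flattens and layoffs_plateau:
        return 7.0
    if confirmed >= 7:
        return 9.5
    if confirmed >= 5:
        return 9.0
    if confirmed >= 3:
        return 8.5  # "hold" band
    # The scoring line gives no value for 0-2 confirmed items.
    return None


print(confidence_score(6))
print(confidence_score(8, metr_flattens=True, layoffs_plateau=True))
```

Returning None rather than guessing a low score keeps the gap in the rule visible, so whoever updates the tracker has to decide what fewer than three confirmations should mean.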
Newest entries first. Each entry cites a source.
This analysis was developed in conversation with Claude, made by Anthropic. Anthropic benefits from the narrative that AI is powerful, fast-moving, and potentially dangerous. This justifies Anthropic's safety-focused market positioning, premium pricing, and support for regulation that favours large, well-funded labs over open-source competitors.
Every alarming conclusion in this analysis serves Anthropic's commercial interests. The critic model (Gemini, by Google) explicitly flagged this: framing the "governance gap" as the central risk encourages regulatory frameworks that lock out smaller competitors.
The evidence has been verified against external sources including Reuters, METR, IMF, NBER, Goldman Sachs, Fortune, the Australian Financial Review, UK AISI, Foreign Policy, Wolters Kluwer, Clifford Chance, McKinsey, Deloitte, BCG, Stanford HAI, TechCrunch, VentureBeat, CBS News, the ABS, and Jobs and Skills Australia. The strongest counter-arguments (Jevons Paradox, abstraction upward) were retracted by the model that proposed them.
However: verification of evidence does not neutralise framing bias. The selection of which evidence to foreground, which gaps to emphasise, and which narrative to construct inevitably reflects the system it was built inside. Read accordingly.
This document is not financial, legal, investment, or professional advice. It is a research exercise. Do not make employment, investment, education, or policy decisions based solely on this analysis. Independently verify all claims before acting.
The authors are not economists, labour market specialists, actuaries, or policy professionals. This analysis was produced by AI models (Claude/Anthropic, ChatGPT/OpenAI, Gemini/Google) in conversation with a non-specialist human researcher. AI models can be confidently wrong, can hallucinate sources, and can be systematically biased toward narratives that serve their creators' interests.
Forward-looking claims about job displacement, professional services compression ratios, and economic restructuring timelines are projections based on early-stage evidence, not established facts. Historical parallels (manufacturing displacement, China Shock) are illustrative, not predictive. Past technology transitions are not reliable guides to future outcomes.
The "confidence score" in this document is a subjective assessment by an AI model evaluating its own analysis. It should not be interpreted as a statistical probability or an objective measure of accuracy.