DISCLAIMER: This is a research exercise developed in conversation with AI models (Claude, ChatGPT, Gemini). It is not financial, legal, investment, or professional advice. The authors have conflicts of interest (see full disclosure). All claims should be independently verified. Do not make decisions based solely on this document.

Adversarial AI Research — Live Tracker

AGI & Implications

A living research document. Two adversarial rounds across three frontier AI models. Updated continuously as new evidence arrives.

Claude × ChatGPT × Gemini · Last updated: May 29, 2026 · Confidence: 8.5/10

Current Confidence

8.5 / 10
Scale: Speculative · Directionally confirmed · Operationally certain

255+ model releases (Q1 2026)
12+ labs shipping frontier models
42 days: latest release gap
89 days: METR doubling
1/100: DeepSeek Flash vs Opus 4.7
100K+ tech layoffs (2026 YTD)

The Report

What survived adversarial review

This report tested a specific thesis: that AI's recursive self-improvement loop is already compressing economic disruption timelines beyond historical precedent. The strongest counter-arguments were tested and retracted by the models that originally proposed them. What follows is what survived.

The Cadence

Anthropic's release gaps: 335 → 186 → 73 → 70 → 42 days. Opus 4.8 shipped May 28. Mythos (tier above Opus) is weeks from public release. OpenAI shipped GPT-5.5 on April 23. DeepSeek V4 on April 24. Gemini 3.5 Flash on May 19. Google I/O showed Gemini 3.5 Flash building an OS from scratch. Ten model launches in 22 days during May alone.
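The 42-day figure can be checked directly from the two Opus dates given in this report. The sketch below is a minimal illustration using only the dates and gap series quoted above; the variable names are hypothetical, and the step-to-step ratio is just one possible way to read the trend.

```python
from datetime import date

# Release dates as stated in this report.
opus_4_7 = date(2026, 4, 16)
opus_4_8 = date(2026, 5, 28)

# Calendar days between the two releases.
latest_gap = (opus_4_8 - opus_4_7).days

# Gap series quoted in the text, oldest first (days).
gaps = [335, 186, 73, 70, 42]

# Ratio of each gap to the one before it; values below 1.0 mean the gap shrank.
ratios = [round(later / earlier, 2) for earlier, later in zip(gaps, gaps[1:])]

print(latest_gap)  # 42
print(ratios)      # [0.56, 0.39, 0.96, 0.6]
```

Every ratio is below 1.0, so each gap is shorter than the last, but the shrinkage is uneven: the 73 → 70 step is nearly flat.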

The Recursive Loop

90% of Claude Code is self-written. Cowork built by Claude Code in two weeks. 16 agents built a 100K-line C compiler. Perplexity Computer completed 3.25 years of enterprise work in 4 weeks. OpenClaw: 196K GitHub stars, built by one person using 4-10 AI agents (6,600 commits/month). Creator acqui-hired by OpenAI.

The Compression Canaries

PwC Australia

40 advisers → 6 people
Cloud migration cut from 18 months to 90 days, using 18 AI agents. 4,000 lines of code rewritten in 4 hours. Big Four shed 500 partners, 9,100 staff from 2023 peak. AFR, Apr 17.

Klarna

7,000 → 2,000
$250M loss to profit. 3.5:1 compression heading lower.

Medvi

$401M / 2 people
Competitor: $2.4B, 2,442 employees.

Layoffs

100K+ YTD
Meta 8K, Microsoft 8K, PayPal 20%, Intuit 17%, Cisco 4K. 50K explicitly AI-linked.
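The compression figures above reduce to simple ratios. The sketch below recomputes them from the numbers quoted in this section. It assumes 30-day months for the PwC timeline, and it assumes the Medvi and competitor dollar figures are the same kind of measure, which the text does not specify. The helper name `ratio` is illustrative.

```python
def ratio(before: float, after: float) -> float:
    """Return how many 'before' units correspond to one 'after' unit."""
    return before / after


# PwC Australia: headcount and project duration.
pwc_headcount = ratio(40, 6)        # 40 advisers -> 6 people
pwc_duration = ratio(18 * 30, 90)   # 18 months (assumed 30-day months) -> 90 days

# Klarna: headcount.
klarna_headcount = ratio(7000, 2000)

# Medvi vs competitor: dollars per person, in $M.
# Assumption: both dollar figures measure the same thing.
medvi_per_head = 401 / 2
competitor_per_head = 2400 / 2442

print(f"PwC headcount:    {pwc_headcount:.2f}:1")
print(f"PwC duration:     {pwc_duration:.2f}:1")
print(f"Klarna headcount: {klarna_headcount:.2f}:1")
print(f"Medvi $M/head:    {medvi_per_head:.1f}")
print(f"Competitor $M/head: {competitor_per_head:.2f}")
```

Under those assumptions the Klarna ratio matches the 3.5:1 quoted above, and the PwC case comes out at roughly 6:1 to 7:1 on both headcount and duration.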

Counter-Arguments That Fell

Jevons Paradox: "Cheaper services increase demand." Gemini proposed, stress-tested, retracted. AI is an outcome machine, not a task tool.

Abstraction Upward: "Workers move to higher roles." Gemini: automating entry level severs the ladder to senior roles.

"Abstraction upward is the most comfortable outcome, not the most likely." — Gemini

The Liability Gap

HSB (Munich Re): AI liability insurance, March 2026. EU PLD: AI as "products" with strict liability, December 2026. Full decoupling projected late 2027.

The Deskilling Pattern

Goldman Sachs: ~3% lower earnings, 10pp less growth over a decade. New-grad hiring down 50%. 60%+ of displaced tech workers moved outside tech. Manufacturing parallel: persistent scarring for a decade.

The China Pressure

72+ governments on DeepSeek. 1% to 15% market share in 12 months. V4 trained partly on Huawei Ascend chips. V4 Flash 98% cheaper than Opus 4.7. Deploy first, govern later.

The Infrastructure Arms Race

Anthropic alone: 5 GW (Amazon) + 5 GW (Google/Broadcom) + $30B Azure + $50B Fluidstack + 300 MW SpaceX Colossus. Exploring orbital AI compute. Combined Big Tech AI capex: $600B+. ByteDance: $70B. xAI: 1.5 GW cluster.

Cybersecurity Convergence

Mythos: first to complete UK AISI 32-step attack simulation. GPT-5.5 shipped comparable capability 16 days later. "Mythos-like hacking, open to all." Containment lasted a fortnight.

Software & Education

Codebases legible to AI. Build costs collapsing. SaaS growth compression = valuation haircuts. For K-6: adaptability, judgement, AI fluency over vocational training. If entry-level roles disappear, career paths lose their bottom rungs.

Honest Gaps

Propagation. 80% of firms still see zero impact. Frontier vs majority gap is enormous.
Timeline. 3 to 15 years. Anyone claiming precision is selling something.
Jagged frontier. Coding: nearly solved. Creative work: still uneven.
Open source. Chinese models at 15% share, 98% cheaper. Meta going proprietary. Unresolved.

Live Monitoring

What we're watching by June 30, 2026

Each item below has a specific trigger. When evidence arrives, update the status and add an entry to the Update Log.

Confirmed · Model cadence <42 days
Opus 4.7 (Apr 16) → Opus 4.8 (May 28) = 42 days. Watching for a sub-42-day next gap or Mythos public release.
Watching · METR TH1.2 update
TH1.1 was January 2026. Quarterly update expected. If 88.6-day doubling holds or compresses, acceleration confirmed.
Confirmed · Monthly layoffs >20K
Every month of 2026 except April exceeded 20K. 100K+ YTD. 50K explicitly AI-linked. PayPal, Intuit, Cisco all in May.
Watching · Enterprise adoption >50%
McKinsey 2025: 39% report EBIT impact. Waiting for mid-2026 update. Three of four Big Four now on Claude.
Watching · DeepSeek pricing drops further
950 new supernodes coming online. Watching for sub-$0.10/M token pricing.
Watching · New AI insurance products
HSB launched March 2026. EU PLD December 2026. Watching for Australian or UK AI liability frameworks.
Watching · China market share update
Was 15% in January. Watching for Q2 data and Huawei chip benchmarks.
Watching · More Medvi-pattern companies
Sub-5-person companies at $100M+ revenue. If 3+ emerge, compression thesis upgrades from outlier to pattern.
Watching · Agentic product adoption data
Perplexity Personal Computer (Apr 16), Gemini Spark (May 19). Watching for usage numbers.
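For scale on the METR item: a fixed doubling period compounds quickly. The sketch below shows what an 88.6-day doubling would imply over one year if it held steady. It assumes clean exponential growth, which the report does not claim, and the function name is illustrative rather than anything METR publishes.

```python
def growth_factor(doubling_days: float, horizon_days: float) -> float:
    """Multiplicative growth over horizon_days, given a fixed doubling period."""
    return 2 ** (horizon_days / doubling_days)


# Doubling period quoted in this tracker (days).
METR_DOUBLING_DAYS = 88.6

annual = growth_factor(METR_DOUBLING_DAYS, 365)
print(f"Implied growth over 365 days: about {annual:.1f}x")
```

Under that assumption the tracked metric would grow roughly 17-fold in a year. A shorter doubling period in the next update would push this higher; a longer one would signal flattening.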

Scoring: 7+ confirmed = 9.5 · 5-6 confirmed = 9 · 3-4 = 8.5 (hold) · METR flattens + layoffs plateau = drop to 7
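The scoring rule can be written out as a small function. This is a sketch of one reading of the line above, not a definitive implementation. It assumes the downgrade condition takes priority over the confirmed count, and it returns `None` for fewer than three confirmed items because the rule does not define that case. The function and parameter names are hypothetical.

```python
from typing import Optional


def confidence_score(confirmed: int,
                     metr_flattens: bool = False,
                     layoffs_plateau: bool = False) -> Optional[float]:
    """Map monitoring outcomes to a confidence score per the scoring line."""
    # Assumption: the downgrade condition overrides the confirmed count.
    if metr_flattens and layoffs_plateau:
        return 7.0
    if confirmed >= 7:
        return 9.5
    if confirmed >= 5:
        return 9.0
    if confirmed >= 3:
        return 8.5  # hold
    # The scoring line does not say what happens below 3 confirmed items.
    return None


print(confidence_score(4))              # 8.5
print(confidence_score(6))              # 9.0
print(confidence_score(8, True, True))  # 7.0
```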

Update Log

Chronological evidence trail

Newest entries first. Each entry cites a source.

2026-05-29
Initial publication. Report compiled from two rounds of adversarial review (Claude, ChatGPT, Gemini) plus additional research through May 29. Confidence: 8.5/10. Nine monitoring items set for June 30 review.
2026-05-28
Opus 4.8 released. 42-day gap from Opus 4.7 (Apr 16). First model to complete every case on Super-Agent benchmark. Highest score on Legal Agent Benchmark. Fast mode 3x cheaper. Source: anthropic.com/news/claude-opus-4-8
2026-05-24
Tech layoffs pass 100,000 in 2026. Layoffs exceeded 20K every month except April. 50,000 explicitly AI-linked (17% of 300K total cuts). PayPal targeting 20% reduction. Intuit cut 17%. Meta cut 8,000. Microsoft first voluntary redundancies in 51 years. Source: TechSpot, CBS News, Yahoo Finance
2026-05-19
Google I/O: Gemini 3.5 Flash + Gemini Spark. 10+ launches in 22 days (May 13-23). Gemini 3.5 Flash built an OS from scratch in internal tests. Google shifting from AI-as-conversation to AI-as-agent. Three of four Big Four now on Claude. Source: TechCrunch, Google Blog, Digital Applied
2026-05-06
Anthropic + SpaceX compute deal. 300+ MW, 220K+ GPUs from Colossus 1. Total Anthropic compute: ~10+ GW across Amazon (5 GW), Google/Broadcom (5 GW), Microsoft/NVIDIA ($30B), Fluidstack ($50B). Exploring orbital AI compute with SpaceX. Source: anthropic.com/news/higher-limits-spacex
2026-04-24
DeepSeek V4 launched. 1.6T params, MIT licence, 1/6th cost of Opus 4.7. Flash version 98% cheaper. Trained partly on Huawei Ascend chips. Same day: Meta cuts 8K jobs, Microsoft offers first voluntary redundancies. Source: VentureBeat, AFR Chanticleer
2026-04-23
GPT-5.5 "Spud" launched. "A new class of intelligence for real work." Missed vulnerability rate down to 10% (from 40% in GPT-5). Researcher: "Mythos-like hacking, open to all." 16 days after Anthropic withheld Mythos for safety. Source: OpenAI, The New Stack, Xbow
2026-04-17
PwC Australia: 40 → 6. Cloud migration compressed from 18 months/40 advisers to 90 days/6 people using 18 AI agents. 4,000 lines of code rewritten in 4 hours. Big Four shed 500 partners, 9,100 staff from 2023 peak. Source: Australian Financial Review, Edmund Tadros
2026-04-16
Opus 4.7 released. Mythos-class improvements, surgically reduced cyber capability. "Months of senior engineering, delivered autonomously." Same day: Perplexity Personal Computer launched. Source: anthropic.com/news/claude-opus-4-7

Conflict of Interest Disclosure

This analysis was developed in conversation with Claude, made by Anthropic. Anthropic benefits from the narrative that AI is powerful, fast-moving, and potentially dangerous. This justifies Anthropic's safety-focused market positioning, premium pricing, and support for regulation that favours large, well-funded labs over open-source competitors.

Every alarming conclusion in this analysis serves Anthropic's commercial interests. The critic model (Gemini, by Google) explicitly flagged this: framing the "governance gap" as the central risk encourages regulatory frameworks that lock out smaller competitors.

The evidence has been verified against external sources including Reuters, METR, IMF, NBER, Goldman Sachs, Fortune, the Australian Financial Review, UK AISI, Foreign Policy, Wolters Kluwer, Clifford Chance, McKinsey, Deloitte, BCG, Stanford HAI, TechCrunch, VentureBeat, CBS News, the ABS, and Jobs and Skills Australia. The strongest counter-arguments (Jevons Paradox, abstraction upward) were retracted by the model that proposed them.

However: verification of evidence does not neutralise framing bias. The selection of which evidence to foreground, which gaps to emphasise, and which narrative to construct inevitably reflects the system it was built inside. Read accordingly.

This document is not financial, legal, investment, or professional advice. It is a research exercise. Do not make employment, investment, education, or policy decisions based solely on this analysis. Independently verify all claims before acting.

Additional Disclaimers

The authors are not economists, labour market specialists, actuaries, or policy professionals. This analysis was produced by AI models (Claude/Anthropic, ChatGPT/OpenAI, Gemini/Google) in conversation with a non-specialist human researcher. AI models can be confidently wrong, can hallucinate sources, and can systematically bias toward narratives that serve their creators' interests.

Forward-looking claims about job displacement, professional services compression ratios, and economic restructuring timelines are projections based on early-stage evidence, not established facts. Historical parallels (manufacturing displacement, China Shock) are illustrative, not predictive. Past technology transitions are not reliable guides to future outcomes.

The "confidence score" in this document is a subjective assessment by an AI model evaluating its own analysis. It should not be interpreted as a statistical probability or an objective measure of accuracy.