DISCLAIMER: This is a research exercise developed in conversation with AI models. It is not financial, legal, investment, or professional advice. The authors have conflicts of interest (see the Conflict of Interest Disclosure below). All claims should be independently verified. Do not make decisions based solely on this document.

Adversarial AI Research — Evidence-Graded Live Tracker

AGI & Implications

Two adversarial rounds. Three frontier models. Evidence-tiered. Observed and projected claims separated. Falsification criteria published.

Claude × ChatGPT × Gemini · Last updated: May 30, 2026 · Confidence: 8.5/10

Current Confidence

8.5 / 10
Scale: Speculative · Directionally confirmed · Operationally certain
255+ model releases (Q1 2026)
42 days: latest release gap
89 days: METR doubling time
100K+ tech layoffs (2026 YTD)
98% cheaper: DeepSeek Flash vs frontier
Evidence tiers: T1 = primary source, government, or peer-reviewed · T2 = company disclosure or industry survey · T3 = trade press · T4 = social or community

The Report

The evidence increasingly supports a widening gap between AI capability growth and institutional adaptation

The magnitude of eventual labour-market disruption remains uncertain. The direction of travel is becoming clearer.

Capability Acceleration

Observed
Model cadence compressing. Anthropic release gaps: 335 → 186 → 73 → 70 → 42 days. Opus 4.8 shipped May 28. Mythos (tier above Opus) weeks from public release. GPT-5.5 shipped April 23. DeepSeek V4 shipped April 24. Gemini 3.5 Flash shipped May 19. Ten launches in 22 days during May. T1 T2
Observed
METR task horizon accelerating. Doubling time for post-2024 models: 88.6 days (down from 196.5 days historically). No flattening detected. Frontier estimate noisy but trend intact. T1
Observed
12+ labs across 3 continents shipping frontier models simultaneously. Anthropic, OpenAI, Google, Meta, xAI, DeepSeek, Alibaba/Qwen, Moonshot, Zhipu, ByteDance, Mistral, NVIDIA. 255+ releases in Q1. Meta abandoned open-source (Muse Spark). Chinese models training on Huawei Ascend chips. T2 T3
Observed
Cost collapse. DeepSeek V4 Flash: priced 98% below Opus 4.7. V4 Pro: about 1/6th the cost of frontier models. DeepSeek and Qwen grew from 1% to 15% global market share in 12 months. T2 T3
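The cadence, doubling-time, and cost figures in this section can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: it reuses the numbers quoted above, the function names are hypothetical, and the annual growth factor assumes the METR trend stays exactly exponential for a full year, which the tracker does not claim.

```python
# Sanity-check arithmetic for the capability figures quoted in this tracker.
# Function names are hypothetical; the numbers are copied from the text above.

RELEASE_GAPS_DAYS = [335, 186, 73, 70, 42]  # Anthropic release gaps, oldest first
DOUBLING_POST_2024 = 88.6    # METR doubling time for post-2024 models (days)
DOUBLING_HISTORICAL = 196.5  # METR historical doubling time (days)


def gap_ratios(gaps):
    """Ratio of each release gap to the previous one; below 1.0 means the gap shrank."""
    return [later / earlier for earlier, later in zip(gaps, gaps[1:])]


def annual_growth_factor(doubling_days, days_per_year=365.0):
    """Task-horizon multiplier over one year, assuming a constant exponential trend."""
    return 2 ** (days_per_year / doubling_days)


def cost_fraction(percent_cheaper):
    """Share of the reference price still paid when something is 'X% cheaper'."""
    return 1 - percent_cheaper / 100


if __name__ == "__main__":
    print("gap ratios:", [round(r, 2) for r in gap_ratios(RELEASE_GAPS_DAYS)])
    print(f"post-2024 trend:  x{annual_growth_factor(DOUBLING_POST_2024):.1f} per year")
    print(f"historical trend: x{annual_growth_factor(DOUBLING_HISTORICAL):.1f} per year")
    print(f"98% cheaper -> pay {cost_fraction(98):.0%} of the reference price")
```

Run as-is, the gap ratios come out at roughly 0.56, 0.39, 0.96 and 0.6, so the 73 → 70 step is nearly flat: compression has been steep overall but not uniform. An 88.6-day doubling compounds to roughly a 17x longer task horizon per year, against roughly 3.6x on the historical 196.5-day trend.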

The Recursive Loop

Observed
AI builds its own tools. 90% of Claude Code self-written (Pragmatic Engineer). Cowork built by Claude Code in two weeks (Axios, Fortune). 16 agents built a 100K-line C compiler (Anthropic Engineering). Opus 4.7: "months of senior engineering, delivered autonomously" (Koios Audio). T1 T2
Observed
Agentic layer compounding. Perplexity Computer: 3.25 years of enterprise work in 4 weeks (company claim). OpenClaw: 196K GitHub stars; one person making 6,600 commits/month with AI agents (GitHub, Fortune). Creator acqui-hired by OpenAI. T2 T3
Projected
Each model generation will be more AI-built than the last. If tooling is ~90% AI-built and model development is substantially AI-assisted, the boundary between human and AI contribution moves in one direction: toward AI. No public data exists on the exact ratio for model development itself. T2

Professional Services Compression

Observed
PwC Australia: 40 → 6 people on one engagement. A cloud migration was compressed from 18 months to 90 days using 18 AI agents, with 4,000 lines of code rewritten in 4 hours. The Big Four in Australia have shed 500 partners and 9,100 staff since their 2023 peak. T1 AFR, Edmund Tadros, April 17, 2026.
Observed
Klarna: headcount 7,000 → 2,000. Swung from a $250M loss to profit. AI doing the work of 700 customer-service agents. T2 Fortune, Final Round AI.
Medvi: $401M revenue, 2 employees. Competitor Hims & Hers: $2.4B revenue, 2,442 employees. T3 NYT, PYMNTS.
Citi: contractor share 50% → 20%. HSBC: 20,000 roles under review. Baker McKenzie: cutting <10% of business staff, citing AI. T1 Reuters.
Observed
100,000+ tech layoffs in 2026 YTD. 50,000 cuts explicitly AI-linked (17% of 300K total job cuts). Meta 8K, Microsoft 8K, PayPal 20%, Intuit 17%, Cisco 4K. Layoffs exceeded 20K every month except April. T1 CBS News, TechSpot, Challenger Gray & Christmas.
Projected
Professional services compression of 5:1 to 20:1 at scale. PwC achieved 6.7:1 on one engagement. No firm has reported this as a firm-wide norm. Jevons Paradox (demand increases with cheaper services) was tested and retracted by Gemini, but has not been empirically falsified at macro level. Upper-end ratios (10:1+) remain ahead of evidence.
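The ratios in this section reduce to simple division. A minimal sketch using only figures quoted above; the helper names are hypothetical, and `in_projected_band` just encodes the projected 5:1 to 20:1 range.

```python
# Ratio arithmetic for the professional-services figures quoted above.
# Helper names are hypothetical; all inputs are copied from this tracker.


def compression_ratio(people_before, people_after):
    """People needed before vs after for the same piece of work."""
    return people_before / people_after


def revenue_per_employee(revenue_usd, employees):
    return revenue_usd / employees


def in_projected_band(ratio, low=5.0, high=20.0):
    """Whether a ratio falls inside the projected 5:1 to 20:1 compression band."""
    return low <= ratio <= high


pwc = compression_ratio(40, 6)            # PwC Australia, one engagement
medvi = revenue_per_employee(401e6, 2)    # Medvi
hims = revenue_per_employee(2.4e9, 2442)  # Hims & Hers
ai_linked_share = 50_000 / 300_000        # AI-linked cuts vs 300K total

print(f"PwC: {pwc:.1f}:1, inside band: {in_projected_band(pwc)}")
print(f"Medvi: ${medvi / 1e6:.1f}M per employee; Hims & Hers: ${hims / 1e6:.2f}M")
print(f"Medvi vs Hims & Hers: {medvi / hims:.0f}x revenue per employee")
print(f"AI-linked share of cuts: {ai_linked_share:.0%}")
```

Run as-is, it prints a 6.7:1 PwC ratio (inside the band), about $200.5M of revenue per Medvi employee against about $0.98M at Hims & Hers (roughly 204x), and a 17% AI-linked share. Revenue per employee says nothing about margins or contracted labour, and one engagement is not a firm-wide norm.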

Labour Market Impact

Observed
Early displacement signals. Goldman Sachs: displaced tech workers suffer ~3% lower real earnings, 10pp less growth over a decade. SignalFire: new-grad tech hiring down 50% from 2019. Revelio Labs: 60%+ of tech workers changing jobs moved outside tech; juniors disproportionately pushed out. Job-switch pay premium: 54% → 41%. T1 T2
Observed
Historical parallel: manufacturing. China Shock (Autor, Dorn, Hanson, NBER): wages and participation depressed for a decade. Higher overdose mortality, disability uptake, political polarisation in exposed communities. T1
Projected
Mass deskilling as the likely outcome. The thesis projects that millions will work for less in roles below their training, mirroring manufacturing towns. This is inferred from early signals and the historical parallel, not yet observed at scale among AI-displaced knowledge workers. No public longitudinal study tracks where laid-off knowledge workers ended up, or at what salary.

Counter-Arguments Tested and Retracted

Observed — Adversarial Review Result
Jevons Paradox: "Cheaper services increase demand." Gemini proposed it, stress-tested it against ATMs, self-checkout, and auto-trading, then retracted it. Key distinction: Jevons holds for task automation but fails for outcome automation. T2 Gemini adversarial output.

Abstraction Upward: "Workers move to higher roles." Gemini found new roles (agentic architect, AU$275-540K), then undermined its own case: automating entry-level work severs the ladder to senior roles. T2 Gemini adversarial output.
"Abstraction upward is the most comfortable outcome, not the most likely." — Gemini's own verdict

Structural Factors

Observed
Liability Gap closing. HSB (Munich Re): AI liability insurance, March 2026. EU Product Liability Directive (PLD): AI treated as "products" under strict liability from December 2026. Clifford Chance: full decoupling by late 2027. T1 T2
Observed
China: deploy first, govern later. 72+ governments on DeepSeek. 1% to 15% market share in 12 months. V4 on Huawei chips. US advisers: "self-reinforcing competitive advantage." T1 Reuters, USSC, Jamestown.
Observed
Cybersecurity convergence. Mythos: first to complete UK AISI 32-step attack simulation. GPT-5.5 shipped comparable capability 16 days later. "Mythos-like hacking, open to all." T1 UK AISI, Xbow. T3 The New Stack.
Projected
SaaS valuation compression. A slowdown in revenue growth from 40% to 15% would imply valuation haircuts of roughly two-thirds. Build costs are collapsing. Perplexity connects to 400+ tools. Not yet confirmed in broad market data; needs verification against Bessemer and Jamin Ball data.
Observed
Infrastructure arms race. Anthropic: ~10+ GW total compute (Amazon 5 GW, Google/Broadcom 5 GW, Azure $30B, Fluidstack $50B, SpaceX Colossus 300 MW). Exploring orbital AI compute. Combined Big Tech capex: $600B+. ByteDance: $70B. xAI: 1.5 GW. T1 T2
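The two-thirds SaaS haircut projected above depends on how valuation multiples respond to growth. The sketch below assumes a hypothetical linear model in which the EV/revenue multiple equals a base plus a slope times growth; neither the model nor its parameters come from the sources cited in this tracker, and current revenue is held constant.

```python
# Hypothetical growth-to-multiple model; not taken from any source cited in this tracker.


def ev_revenue_multiple(growth_pct, base_multiple=0.0, multiple_per_point=0.25):
    """EV/revenue multiple = base + slope * revenue growth (growth in percentage points)."""
    return base_multiple + multiple_per_point * growth_pct


def valuation_haircut(old_growth_pct, new_growth_pct, **model_params):
    """Fractional valuation drop when growth slows, holding current revenue constant.

    With base_multiple=0 the slope cancels out, so only the growth ratio matters.
    """
    old_multiple = ev_revenue_multiple(old_growth_pct, **model_params)
    new_multiple = ev_revenue_multiple(new_growth_pct, **model_params)
    return 1 - new_multiple / old_multiple


proportional = valuation_haircut(40, 15)                   # multiple strictly proportional to growth
with_floor = valuation_haircut(40, 15, base_multiple=2.0)  # same slope plus a 2x floor

print(f"proportional model: {proportional:.1%} haircut")
print(f"with a 2x floor:    {with_floor:.1%} haircut")
```

With a multiple strictly proportional to growth, slowing from 40% to 15% cuts valuation by 62.5%, close to the two-thirds figure. Adding a modest 2x floor to the multiple shrinks the haircut to about 52%, so the projection is sensitive to the assumed shape of the growth-multiple relationship.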

Honest Gaps

Propagation. 80% of firms see zero impact (NBER, Feb 2026, T1). The gap between frontier and median firms is enormous.
Destination data. Workers are moving out of tech and down the pay scale, but there are no public role-by-role salary maps.
Timeline. 3 to 15 years. Anyone claiming precision is selling something.
Jagged frontier. Coding: nearly solved. Creative work: still uneven. "The jagged frontier is still there. It is just much further out." — Mollick, Wharton T3
Open source. Chinese models at 15% market share, 98% cheaper. Meta going proprietary. Unresolved.

Falsification Ledger

What would kill this thesis

If any of the following are observed, the thesis requires significant revision or abandonment. This ledger is published to increase accountability.

METR flattening. If the task-horizon doubling time plateaus above 200 days for 12+ months, the acceleration thesis collapses. This is the single most important falsification trigger. Status: Not triggered. Last checked: METR TH1.1, January 2026.
Entry-level hiring recovery. If new-grad tech hiring returns to 2019 levels (or within 20%) while AI adoption continues growing, the deskilling thesis is wrong. Status: Not triggered. SignalFire shows -50% from 2019 as of May 2025.
Professional services employment holds steady. If Australian or US professional services employment is flat or growing through Q4 2026 despite widespread AI adoption, the compression thesis is premature. Status: Mixed. Support staff cuts observed; fee-earner headcount not yet declining at scale.
Displaced workers land in equal-or-better roles. If a longitudinal study shows that AI-displaced knowledge workers routinely transition to roles at equal or higher pay, the deskilling thesis fails. Status: No such study exists publicly. Revelio shows 60%+ leaving tech; salary data is thin.
Open-source fragmentation destroys concentration. If open-weight models reach true frontier capability and fragment the market, the wealth-concentration thesis weakens. Status: DeepSeek V4 reaches roughly 90% of frontier capability at 1/6th the cost, but Meta went proprietary. Unresolved.
AI deployment plateau. If the share of enterprises reporting measurable AI impact stalls below 50% for 12+ months, the propagation thesis fails. Status: McKinsey 2025 showed 39%. Waiting for mid-2026 update.
Jevons Paradox proves dominant. If cheaper AI-delivered professional services demonstrably increase total employment in those sectors (not just revenue), the demand-elasticity counter-argument wins. Status: Wolters Kluwer shows firm revenue up 20%, but headcount data is inconclusive. Gemini retracted the argument but empirical data is still early.
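Several ledger triggers are quantitative and could be checked mechanically as new readings arrive. A minimal sketch of three of them (METR flattening, entry-level hiring recovery, adoption plateau), assuming readings are supplied as (date, value) pairs sorted by date and hiring is expressed as an index where 2019 = 1.0. The function names and input shapes are hypothetical, and `adoption_plateau` simplifies "stalls" to "stays below 50%".

```python
from datetime import date


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end (day of month ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def sustained_condition(readings, condition, min_months=12):
    """True if condition(value) holds over an unbroken run spanning >= min_months.

    `readings` is a list of (date, value) pairs sorted by date.
    """
    run_start = None
    for when, value in readings:
        if condition(value):
            if run_start is None:
                run_start = when
            if months_between(run_start, when) >= min_months:
                return True
        else:
            run_start = None  # a single non-matching reading breaks the run
    return False


def metr_flattening(doubling_readings):
    # Doubling time plateaus above 200 days for 12+ months.
    return sustained_condition(doubling_readings, lambda days: days > 200)


def hiring_recovered(current_index, baseline_2019=1.0, adoption_growing=True):
    # New-grad hiring back within 20% of 2019 while AI adoption keeps growing.
    return adoption_growing and current_index >= 0.8 * baseline_2019


def adoption_plateau(impact_readings):
    # Share of firms reporting measurable impact stays below 50% for 12+ months.
    return sustained_condition(impact_readings, lambda share: share < 0.50)


if __name__ == "__main__":
    print(metr_flattening([(date(2026, 1, 1), 88.6)]))  # METR TH1.1 reading, January 2026
    print(hiring_recovered(0.5))  # SignalFire: -50% vs 2019
```

With the only METR reading quoted here (88.6 days) and SignalFire's 50% decline (an index of 0.5), both triggers evaluate to False, consistent with the ledger's "Not triggered" status.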

Live Monitoring

Watching by June 30, 2026

Confirmed
Model cadence ≤42 days: Opus 4.7 → 4.8 = 42 days. Watching for a sub-42-day gap or a public Mythos release.
Watching
METR TH1.2 update: quarterly update expected. Will the 88.6-day doubling hold, compress, or flatten?
Confirmed
Monthly layoffs >20K: every month except April. 100K+ YTD. 50K AI-linked.
Watching
Enterprise adoption >50% impact: McKinsey 2025 showed 39%. Three of the Big Four on Claude. Waiting for the mid-2026 survey.
Watching
Cost floor drops further: DeepSeek 950 supernodes coming. Sub-$0.10/M tokens?
Watching
New AI insurance products: HSB, March 2026. EU PLD, December 2026. Australian/UK frameworks?
Watching
China market share update: 15% in January. Q2 data? Huawei benchmarks?
Watching
More Medvi-pattern companies: under 5 people, $100M+ revenue. Three or more would make it a pattern, not an outlier.
Watching
Agentic adoption data: Perplexity, Gemini Spark, Codex agents. Usage numbers?

Scoring (watch items confirmed → confidence): 7+ = 9.5 · 5-6 = 9 · 3-4 = 8.5 · METR flattens and layoffs plateau = 7
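The scoring line maps watch-list results to a confidence score. A minimal sketch, assuming the METR-plus-layoffs downgrade overrides the count-based tiers (the rubric does not say which takes precedence). The published rubric assigns no score below three confirmations, so the sketch returns None there; both function names are hypothetical.

```python
def count_confirmed(statuses):
    """Number of watch items whose status is exactly 'Confirmed'."""
    return sum(1 for status in statuses if status == "Confirmed")


def confidence_score(confirmed, metr_flattened=False, layoffs_plateaued=False):
    """Map watch-list results to the tracker's published confidence rubric.

    Assumption: the METR-plus-layoffs downgrade overrides the count-based tiers.
    Returns None below three confirmations, where the rubric gives no value.
    """
    if metr_flattened and layoffs_plateaued:
        return 7.0
    if confirmed >= 7:
        return 9.5
    if confirmed >= 5:
        return 9.0
    if confirmed >= 3:
        return 8.5
    return None
```

For example, `confidence_score(5)` returns 9.0, and `confidence_score(8, metr_flattened=True, layoffs_plateaued=True)` returns 7.0 under the precedence assumption above.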

Update Log

Chronological evidence trail

Newest first. Every entry cites sources with evidence tier.

2026-05-30
Tracker rebuilt with evidence tiers, observed/projected split, and falsification ledger following external review. Confidence holds at 8.5/10. Capability acceleration evidence rated HIGH. Labour market evidence rated MODERATE. Governance gap evidence rated HIGH.
2026-05-28
Opus 4.8 released. 42-day gap from Opus 4.7. First to complete every Super-Agent benchmark case. Highest Legal Agent Benchmark score. Fast mode 3x cheaper. T2 anthropic.com/news/claude-opus-4-8
2026-05-24
Tech layoffs pass 100K in 2026. 50K AI-linked (17% of 300K total). PayPal 20%, Intuit 17%, Cisco 4K, Meta 8K, Microsoft 8K. T1 CBS News, TechSpot, Challenger Gray & Christmas
2026-05-19
Google I/O: Gemini 3.5 Flash + Spark. Flash built OS from scratch in internal tests. 10+ launches in 22 days. Three of the Big Four on Claude. T2 T3 TechCrunch, Google Blog, Digital Applied
2026-05-06
Anthropic + SpaceX compute deal. 300+ MW, 220K+ GPUs. Total Anthropic compute: ~10+ GW. Exploring orbital compute. T2 anthropic.com/news/higher-limits-spacex
2026-04-24
DeepSeek V4 + Meta/Microsoft cuts. V4: 1.6T params, MIT licence, 1/6th cost. Flash: 98% cheaper. Meta: 8K cuts. Microsoft: first voluntary redundancies in 51 years. T1 T2 VentureBeat, AFR, Reuters
2026-04-23
GPT-5.5 shipped. "New class of intelligence." Missed vulnerabilities down to 10%. "Mythos-like hacking, open to all." 16 days after Mythos was withheld. T2 T3 OpenAI, The New Stack, Xbow
2026-04-17
PwC Australia: 40 → 6. 18 months to 90 days. 18 agents. 4,000 lines in 4 hours. Big Four shed 500 partners, 9,100 staff. T1 AFR, Edmund Tadros
2026-04-16
Opus 4.7 + Perplexity Personal Computer. Mythos-class improvements, reduced cyber capabilities. "Months of engineering, delivered autonomously." T2 anthropic.com, Perplexity

Conflict of Interest Disclosure

This analysis was developed in conversation with Claude, made by Anthropic. Anthropic benefits commercially from the narrative that AI is powerful, fast-moving, and potentially dangerous. This justifies Anthropic's safety-focused market positioning, premium pricing, and support for regulation that favours large, well-funded labs over open-source competitors. Every alarming conclusion in this analysis serves Anthropic's commercial interests.

The critic model (Gemini, by Google) explicitly flagged this: framing the "governance gap" as the central risk encourages regulatory frameworks that lock out smaller competitors.

Evidence has been verified against external sources including Reuters, METR, IMF, NBER, Goldman Sachs, Fortune, AFR, UK AISI, Foreign Policy, Wolters Kluwer, Clifford Chance, McKinsey, Deloitte, BCG, Stanford HAI, CBS News, TechCrunch, VentureBeat, ABS, and Jobs and Skills Australia. The strongest counter-arguments were retracted by the model that proposed them. However, verification of evidence does not neutralise framing bias. The selection of which evidence to foreground inevitably reflects the system it was built inside.

Additional Disclaimers

The authors are not economists, labour market specialists, actuaries, or policy professionals. This analysis was produced by AI models in conversation with a non-specialist human researcher. AI models can be confidently wrong, hallucinate sources, and systematically bias toward narratives that serve their creators.

Forward-looking claims about job displacement, compression ratios, and timelines are projections based on early-stage evidence, not established facts. Historical parallels are illustrative, not predictive. The "confidence score" is a subjective AI assessment, not a statistical probability.

This document is not financial, legal, investment, or professional advice. Do not make employment, investment, education, or policy decisions based solely on this analysis. Independently verify all claims.