I get pitched a lot. Every week brings a fresh email from founders promising "revolutionary" AI that will disrupt industries, save time, or unlock new business models overnight. As an editor who covers tech and a journalist who wants the truth, I have to separate the genuinely novel from the spin. Over time I've developed a set of concrete tests — the sort of questions investors, skeptical journalists, and experienced engineers use — to evaluate whether a claim that an AI startup is "revolutionary" holds water.
## Start by defining what "revolutionary" means
Before you dig into demos and numbers, press the founders to define their terms. Do they mean radically better accuracy? A new algorithmic paradigm? Mass market adoption within a year? Or lower cost for a specific task? The label "revolutionary" is meaningless without a benchmark for comparison.
Ask for a clear, explicit claim in plain language. For example: "Our model reduces false positives in medical image diagnosis by 30% versus current state-of-the-art," or "We can summarize legal contracts in under 30 seconds with less than 5% factual error." Those are testable. Broad, emotional claims are a red flag.
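A quantified claim like "30% fewer false positives" can be checked with basic statistics once you have the confusion counts. This is a minimal sketch using hypothetical numbers (the counts, and the 1,000-negative sample size, are invented for illustration): it computes the relative reduction and a two-proportion z-statistic to see whether the difference could be noise.

```python
import math

def fp_rate(false_positives: int, negatives: int) -> float:
    """False-positive rate: FP / total actual negatives."""
    return false_positives / negatives

def two_proportion_z(fp_a: int, n_a: int, fp_b: int, n_b: int) -> float:
    """z-statistic for the difference between two false-positive rates."""
    p_a, p_b = fp_a / n_a, fp_b / n_b
    pooled = (fp_a + fp_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Hypothetical counts: the baseline flags 120 false positives on 1,000
# negatives; the startup's model flags 80 on the same 1,000.
baseline_fpr = fp_rate(120, 1000)                 # 0.12
new_fpr = fp_rate(80, 1000)                       # 0.08
relative_reduction = 1 - new_fpr / baseline_fpr   # ~0.33, i.e. "30%+ fewer FPs"
z = two_proportion_z(120, 1000, 80, 1000)         # |z| > 1.96 suggests it's real
print(f"relative reduction: {relative_reduction:.2f}, z = {z:.2f}")
```

If a founder can't produce the raw counts behind a percentage like this, the percentage isn't a claim; it's a slogan.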
## Reproducibility and evidence
A true breakthrough survives independent verification. When a startup claims a new model or system is revolutionary, I ask for the following immediately:

- Runnable code, a sandboxed demo, or API access I can test myself
- The evaluation data, or at minimum a precise description of how it was built
- A written methodology: metrics, baselines, seeds, and configurations
- Permission to share results with an independent reviewer
If founders decline, citing IP concerns, ask for a controlled third-party audit or an academic partner willing to validate the results. Big names like OpenAI and DeepMind publish benchmarks and research code; that openness is part of what makes their claims credible.
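One concrete thing to request is a seedable evaluation harness: if every source of randomness is controlled by a seed, an independent re-run should reproduce the pitch-deck number exactly. This is a minimal sketch with a stand-in `evaluate` function (the function and its 87% hit rate are hypothetical, just to show the shape of the check):

```python
import random

def evaluate(seed: int, n: int = 500) -> float:
    """Stand-in for a vendor's evaluation harness: the seed controls
    every source of randomness (sampling, initialization, ordering)."""
    rng = random.Random(seed)
    # Hypothetical scoring: each sampled example is right ~87% of the time.
    return sum(rng.random() < 0.87 for _ in range(n)) / n

reported = evaluate(seed=42)     # the number in the pitch deck
reproduced = evaluate(seed=42)   # an independent re-run with the same seed
print(f"reported={reported:.3f} reproduced={reproduced:.3f}")
assert reproduced == reported, "evaluation is not reproducible"
```

If re-running with the same seed and data produces a different number, something in the pipeline is uncontrolled, and the headline metric is not yet evidence.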
## Benchmark performance — not company slides
Benchmarks matter. For language models, look to MMLU, GLUE, SuperGLUE, or newer domain-specific suites. For vision, ImageNet variants and COCO remain baseline touchstones. For multimodal systems, check established evaluation suites or cross-modal retrieval metrics.
But be cautious: benchmarks are gamable. Ask:

- Was any benchmark data in the training set (contamination)?
- Are comparisons apples-to-apples (same model sizes, prompting, and fine-tuning budgets)?
- Are results averaged over multiple runs, with variance reported?
- Does the benchmark actually resemble the customer's real task?
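Reported variance is easy to check yourself if you can get per-example results. This is a minimal sketch of a percentile bootstrap confidence interval over per-example correctness scores; the 300-item benchmark and its 82% accuracy are invented for illustration:

```python
import random

def bootstrap_ci(correct: list[int], n_boot: int = 2000,
                 alpha: float = 0.05, seed: int = 0) -> tuple[float, float]:
    """Percentile bootstrap CI for accuracy over per-example 0/1 scores."""
    rng = random.Random(seed)
    n = len(correct)
    # Resample the per-example scores with replacement n_boot times.
    stats = sorted(
        sum(rng.choice(correct) for _ in range(n)) / n
        for _ in range(n_boot)
    )
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Hypothetical per-example results on a 300-item benchmark (1 = correct).
rng = random.Random(7)
results = [1 if rng.random() < 0.82 else 0 for _ in range(300)]
lo, hi = bootstrap_ci(results)
print(f"accuracy {sum(results)/len(results):.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

A 95% interval several points wide on a small test set is a reminder that a one-point lead over the state of the art may be nothing at all.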
## Sanity-check the demo
Demos are persuasive, but they can be cherry-picked. When I attend demos I press for live, unscripted tests — ideally with audience-supplied inputs. Watch for these warning signs:

- Only pre-loaded examples, with a refusal to take inputs from the room
- Heavy retries, edits, or rephrasing between attempts
- A human quietly in the loop behind the interface
- Results that can't be repeated a second time on the same input
Ask for a hands-on trial or a recorded session where inputs are chosen by independent testers. If the product is a model, request evaluation on a public benchmark or an anonymized set of real customer queries.
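One way to keep a "live" test honest is to commit to the input selection in advance: both sides agree on a public input pool and a commitment string, and the test inputs are drawn deterministically from that string, so neither side can cherry-pick. This is a sketch; the pool contents and the `draw_demo_inputs` helper are hypothetical.

```python
import hashlib
import random

def draw_demo_inputs(pool: list[str], n: int, commitment: str) -> list[str]:
    """Draw n demo inputs from a public pool using a seed derived from a
    string both sides committed to beforehand, so the draw is verifiable."""
    seed = int(hashlib.sha256(commitment.encode()).hexdigest(), 16)
    rng = random.Random(seed)
    return rng.sample(pool, n)

# Hypothetical public input pool agreed on before the demo.
pool = [f"query-{i}" for i in range(100)]
picks = draw_demo_inputs(pool, 5, "demo-2024-06-01")
print(picks)
```

Anyone with the pool and the commitment string can recompute the same five inputs, which makes "we just happened to pick these examples" easy to rule out.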
## Data provenance and labeling
A model is only as good as its data. I dig into data sources, labeling quality, and potential biases. Key questions I ask:

- Where did the training data come from, and is it licensed for this use?
- Who labeled it, and under what written annotation guidelines?
- What is the inter-annotator agreement on a double-labeled sample?
- Which populations, languages, or edge cases are underrepresented?
For high-stakes use cases — medicine, finance, legal — expert-labeled data and documented annotation guidelines are non-negotiable. If a startup uses poorly labeled or noisy data, any claim of revolution is dubious.
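Labeling quality is also checkable: ask for a double-annotated sample and compute inter-annotator agreement. This is a minimal sketch of Cohen's kappa on an invented ten-item sample (the annotations are hypothetical; real audits use larger samples):

```python
def cohens_kappa(a: list[int], b: list[int]) -> float:
    """Cohen's kappa for two annotators labeling the same items."""
    assert len(a) == len(b)
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n                # observed agreement
    labels = set(a) | set(b)
    pe = sum((a.count(lab) / n) * (b.count(lab) / n)          # chance agreement
             for lab in labels)
    return (po - pe) / (1 - pe)

# Hypothetical double-annotated sample of 10 items (binary labels).
ann1 = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
ann2 = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]
print(f"kappa = {cohens_kappa(ann1, ann2):.2f}")
```

Raw percent agreement can look high purely by chance; kappa corrects for that, and a low kappa on a high-stakes dataset undercuts any downstream accuracy claim.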
## Robustness, safety, and adversarial testing
Revolutionary claims should include evidence of robustness. I ask founders to show:

- Results on adversarial and perturbed inputs, not just clean test sets
- Behavior on out-of-distribution (OOD) data
- A documented catalog of known failure modes and mitigations
- Stress tests at realistic load and input diversity
Companies serious about safety will have red-teaming reports, bug bounty programs, or formal verification where applicable. If there is no documented approach to known failure modes, be skeptical.
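Even without access to the vendor's red-team reports, a cheap perturbation probe tells you something: apply trivial input noise (a single typo) and count how many predictions flip. This sketch uses a hypothetical stand-in classifier; the `predict` and `perturb` functions are invented for illustration.

```python
import random

def predict(text: str) -> str:
    """Hypothetical stand-in classifier: flags texts mentioning 'refund'."""
    return "refund" if "refund" in text.lower() else "other"

def perturb(text: str, rng: random.Random) -> str:
    """Cheap robustness probe: swap two adjacent characters (a typo)."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    return text[:i] + text[i + 1] + text[i] + text[i + 2:]

inputs = ["please refund my order", "where is my package",
          "refund request", "change my address"]
clean_preds = [predict(t) for t in inputs]
rng = random.Random(0)
flips = sum(predict(perturb(t, rng)) != y for t, y in zip(inputs, clean_preds))
print(f"{flips}/{len(inputs)} predictions flipped under a one-typo perturbation")
```

A system whose outputs flip under single-character noise will not survive real users, whatever its benchmark score.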
## Cost, latency, and engineering constraints
Sometimes "revolutionary" means cheaper or faster. Validate those efficiency claims by asking for:

- The hardware required to hit the quoted numbers
- Cost per query (or per 1,000 requests) at production volume
- Latency percentiles (p50/p95), not just a best-case average
- Throughput, and how it scales with concurrent users
Startups that tout state-of-the-art scores but require impractical hardware or excessive latency aren’t ready for mainstream adoption. I compare their numbers with public models from Hugging Face, Meta, or Google to gauge realism.
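Latency claims are the easiest to verify yourself. This is a minimal harness that times repeated calls and reports p50/p95 rather than a single best case; `fake_inference` is a hypothetical stand-in for the real model call.

```python
import math
import statistics
import time

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile (p in [0, 100]) of a list of samples."""
    s = sorted(samples)
    k = max(1, math.ceil(p / 100 * len(s)))
    return s[k - 1]

def time_call(fn, *args, n: int = 200) -> list[float]:
    """Wall-clock latency in milliseconds for n calls to fn."""
    out = []
    for _ in range(n):
        t0 = time.perf_counter()
        fn(*args)
        out.append((time.perf_counter() - t0) * 1000)
    return out

# Hypothetical stand-in for a model endpoint; swap in the real call.
def fake_inference(x):
    return sum(i * i for i in range(1000))

lat = time_call(fake_inference, None)
print(f"p50 = {percentile(lat, 50):.3f} ms, "
      f"p95 = {percentile(lat, 95):.3f} ms, "
      f"mean = {statistics.mean(lat):.3f} ms")
```

The p50/p95 gap matters: a system with a fast median but a long tail will feel broken to real users even though its "average latency" slide looks fine.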
## Product-market fit and customer evidence
Technical novelty is only part of the story. Revolution also requires customers and workflows that will change because of the product. Ask for:

- Paying customers, not just pilots or letters of intent
- Usage and retention data over time
- Reference customers you can contact independently
- Concrete before-and-after evidence that a workflow actually changed
I often reach out to named customers directly, especially for enterprise startups. If references are evasive or only include close friends and angel investors, that’s a sign to probe deeper.
## Team, IP, and reproducible research pedigree
A revolutionary idea needs the team to execute it. Look for:

- A track record in the relevant research or engineering domain
- Peer-reviewed publications or open-source code behind the core claim
- Patents or other defensible IP, with clear ownership
- Engineers who have shipped production systems, not just prototypes
Academic publications (peer-reviewed) and reproducible research are especially persuasive. If the founders claim an algorithmic breakthrough but have no prior publications or code, ask why — and watch how convincingly they answer.
## Regulatory, ethical, and IP risks
Consider non-technical constraints. Revolutionary AI often raises regulatory or ethical flags. I ask:

- Which regulations apply to the target market, and what is the compliance plan?
- How is user and training data handled, and who consented to its use?
- Has anyone assessed the product's potential for misuse or harm?
- Is the IP actually theirs, or does it rest on data and models they don't control?
Revolutionary tools that ignore compliance are likely to be slowed or blocked by regulators, which reduces their practical impact.
## A practical checklist I use
| Test | What I expect to see |
| --- | --- |
| Clear claim | Quantified, testable statement |
| Reproducibility | Code/demo/data or third-party audit |
| Benchmarks | Fair comparisons, variance reported |
| Demo | Live, unscripted inputs |
| Data quality | Documented provenance and labeling |
| Robustness | Adversarial and OOD tests |
| Cost & latency | Production-level metrics |
| Customers | Independent references and usage data |
| Team & IP | Relevant pedigree, publications, patents |
| Compliance | Plan for regulation and ethics |
If a startup clears most items on that checklist, I start taking "revolutionary" seriously. If they clear a few items but not the core ones — reproducibility, real-world customers, and robustness — I file the claim under "promising but unproven." If they avoid scrutiny or use opaque language, I treat the claim as marketing until proven otherwise.
I don’t expect every early-stage company to have perfect answers. But I do expect transparency, rigorous thinking, and a willingness to let others test their claims. Calling something revolutionary is a high bar; the best teams welcome the scrutiny that lets the market — and the press — separate genuine breakthroughs from clever language. If you want, I can walk you through evaluating a specific startup pitch and apply these tests in real time.