"The Old Model Was Better"
Someone said something that stuck with us: the v3-era picks — back when Balliqa used historical patterns like hot/cold streaks, co-occurring number pairs, and drought bonuses — seemed to match more numbers than the current purely combinatorial model.
It's a fair observation. Our scoring model has evolved through eight versions. The early models (v1 through v4) leaned heavily on empirical signals: which numbers were "hot" in recent draws, which pairs appeared together historically, which Powerballs were "overdue." Starting with v5, we stripped all of that out. The current model (v6.0) is purely combinatorial — every criterion derives from the mathematical structure of C(69,5), with zero reliance on historical patterns.
The philosophical argument for going combinatorial was strong. Powerball drawings are independent random events. A number isn't more likely to appear because it hasn't appeared recently (that's the gambler's fallacy). Hot streaks are retrospective noise, not forward-looking signals.
But philosophy is one thing. Data is another. So we decided to stop debating and start measuring.
The Setup
We built a head-to-head experiment. Every draw day, both models compete under identical conditions:
Same random pool. We generate 50,000 random number combinations — the same 50,000 for both models. The only difference is how each model scores and ranks them.
Same pick count. Each model selects its top 10 highest-scoring picks from the shared pool. These selections are stored in our database before the draw happens. No hindsight, no cherry-picking.
Same evaluation. After each draw, we count how many white balls and Powerballs each model's picks actually matched. Prize tiers follow official Powerball rules.
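The protocol above is small enough to sketch in a few lines. This is an illustrative outline, not our production code; the pool size and pick count come from the description, while function names and the scorer interface are ours.

```python
import random

WHITE_MAX, PB_MAX = 69, 26     # Powerball: 5 whites from 1-69, one Powerball from 1-26
POOL_SIZE, TOP_N = 50_000, 10  # shared pool size and picks per model, per the setup

def random_combo(rng):
    """One candidate ticket: 5 distinct sorted white balls plus a Powerball."""
    whites = tuple(sorted(rng.sample(range(1, WHITE_MAX + 1), 5)))
    return whites, rng.randint(1, PB_MAX)

def run_head_to_head(score_a, score_b, seed):
    """Score one shared random pool with both models; each keeps its top 10."""
    rng = random.Random(seed)
    pool = [random_combo(rng) for _ in range(POOL_SIZE)]
    top_a = sorted(pool, key=score_a, reverse=True)[:TOP_N]
    top_b = sorted(pool, key=score_b, reverse=True)[:TOP_N]
    return top_a, top_b  # stored before the draw; evaluated after
```

The seed is fixed per draw day so the two models genuinely see the identical 50,000 combinations.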
The results are public. You can watch the experiment unfold in real time at /experiments/ab-testing.
The Two Models
Combinatorial (v6.0)
The current production model. Ten criteria, 100 points, all derivable from pure math:
| Criterion | Points | What It Checks |
|---|---|---|
| Unique Digits | 17 | All 5 whites have different last digits |
| Parity | 14 | Mix of 2-3 even and 2-3 odd |
| High/Low | 14 | Mix of 2-3 high and 2-3 low |
| Sum Range | 13 | Sum falls within 1 standard deviation of mean |
| Spread | 11 | Range between highest and lowest is adequate |
| Modular Balance | 8 | All 3 remainder classes (mod 3) represented |
| Range Coverage | 7 | At least one number in each third of 1-69 |
| Tens Diversity | 6 | 4+ different tens groups represented |
| Even Spacing | 5 | No huge gaps between consecutive numbers |
| Primes | 5 | 1-2 prime numbers included |
Every criterion answers a question about the structure of the combination. None of them look at what happened in previous draws.
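Several of these criteria are simple enough to show directly. The sketch below scores five of the ten; the specific cutoffs (high meaning 36-69, a spread of at least 25, and the one-standard-deviation sum band of roughly 175 ± 43) are our illustrative assumptions, not necessarily the production thresholds.

```python
def score_structure(whites):
    """Partial combinatorial score: 5 of the 10 criteria, with assumed cutoffs."""
    score = 0
    # Unique Digits (17): all five last digits differ
    if len({w % 10 for w in whites}) == 5:
        score += 17
    # Parity (14): 2-3 even numbers
    if 2 <= sum(w % 2 == 0 for w in whites) <= 3:
        score += 14
    # High/Low (14): 2-3 numbers in the high half (36-69, our assumption)
    if 2 <= sum(w >= 36 for w in whites) <= 3:
        score += 14
    # Sum Range (13): within one SD of the expected sum (~175 +/- 43 for 5-of-69)
    if 132 <= sum(whites) <= 218:
        score += 13
    # Spread (11): max - min of at least 25 (illustrative threshold)
    if max(whites) - min(whites) >= 25:
        score += 11
    return score

score_structure((3, 14, 27, 41, 58))  # passes all five checks above -> 69
```

Note that every input to this function is the combination itself; no draw history is consulted anywhere.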
Empirical (E1.0)
A resurrected version of the v3/v4 criteria set, reweighted to a fair 100-point scale. It shares six structural criteria with the combinatorial model, then adds four historical pattern criteria:
| Criterion | Points | What It Checks |
|---|---|---|
| Parity | 12 | Same as combinatorial |
| High/Low | 12 | Same as combinatorial |
| Sum Range | 12 | Same as combinatorial |
| **Co-occurrence** | 10 | Pick contains number pairs that appeared together historically |
| **Hot/Cold Mix** | 10 | Includes numbers that are hot or cold in recent draws |
| Spread | 10 | Same as combinatorial |
| Unique Digits | 10 | Same as combinatorial |
| **Drought Bonus** | 10 | Includes numbers that are overdue for selection |
| **PB Weighting** | 8 | Powerball hasn't appeared recently |
| Primes | 6 | Same as combinatorial |
The four historical criteria (Co-occurrence, Hot/Cold Mix, Drought Bonus, and PB Weighting) are the ones we removed during the v5/v6 evolution. They look at recent draw history and assume that patterns either persist or revert. This is exactly the kind of signal the gambler's fallacy warns against, but it's also what many lottery analysts swear by.
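To make the contrast concrete, here is a sketch of two of the removed criteria. The 20-draw window and top/bottom-10 hot/cold cutoffs are illustrative assumptions; the point is only that these scores read draw history, which the combinatorial model never does.

```python
from collections import Counter

def hot_cold_sets(history, window=20, k=10):
    """Most and least frequent white balls over the last `window` draws."""
    counts = Counter(n for draw in history[-window:] for n in draw)
    ranked = sorted(range(1, 70), key=lambda n: (counts[n], n), reverse=True)
    return set(ranked[:k]), set(ranked[-k:])

def pattern_bonus(whites, history):
    """Hot/Cold Mix (10) and Drought Bonus (10) only, per the table above."""
    hot, cold = hot_cold_sets(history)
    score = 0
    if any(w in hot for w in whites) and any(w in cold for w in whites):
        score += 10  # Hot/Cold Mix: at least one hot and one cold number
    recent = {n for draw in history[-20:] for n in draw}
    if any(w not in recent for w in whites):
        score += 10  # Drought Bonus: at least one "overdue" number
    return score
```

`history` is a list of past draws (each a set or list of five white balls), so the score of the same combination changes from week to week as the history shifts.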
What We're Measuring
The primary metric is match rate: what percentage of each model's picks matched one, two, three, or more of the winning white balls? We also track Powerball matches and whether any picks would have won a prize.
We're not measuring who scores higher on their own criteria — that would be circular. Both models will obviously score their own picks highly. What matters is how those picks perform against reality.
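Measuring this for a single draw is straightforward. A minimal sketch (our function names): one helper counts matches for a pick, another summarizes a model's ten picks. Mapping match counts to prize tiers is a lookup against the official prize chart and is omitted here.

```python
def evaluate_pick(pick, winning_whites, winning_pb):
    """White-ball match count and Powerball hit for a single pick."""
    whites, pb = pick
    return len(set(whites) & set(winning_whites)), pb == winning_pb

def match_rates(picks, winning_whites, winning_pb):
    """Fraction of picks matching at least k white balls, for k = 1..5."""
    counts = [evaluate_pick(p, winning_whites, winning_pb)[0] for p in picks]
    return {k: sum(c >= k for c in counts) / len(picks) for k in range(1, 6)}
```

Because both models face the same winning numbers every draw, these per-draw rates can be compared head to head and accumulated over time.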
What We Expect
Honestly? We expect no significant difference. Here's why:
Powerball is a genuinely random process. Neither model has any information about which numbers will be drawn next. The combinatorial model selects "well-structured" combinations. The empirical model selects combinations that are well-structured and align with recent historical patterns. But if those historical patterns don't predict the future — and probability theory says they don't — then the empirical criteria are just noise layered on top of the structural base.
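That null hypothesis has a precise shape. For any pick, however it was chosen, the number of white-ball matches against a fair draw follows a hypergeometric distribution, so both models should converge to the same baseline:

```python
from math import comb

def p_white_matches(k):
    """P(exactly k of your 5 whites are among the 5 drawn): 5-of-69 hypergeometric."""
    return comb(5, k) * comb(64, 5 - k) / comb(69, 5)

for k in range(6):
    print(f"P({k} white matches) = {p_white_matches(k):.7f}")
```

Working the numbers: about 32% of picks match at least one white ball, and matching three or more happens roughly once per 550 picks, for any selection method with no information about the upcoming draw. Sustained deviation from this baseline is what the experiment is looking for.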
The interesting case would be if we do see a difference. If the empirical model consistently matches more numbers, it would suggest that short-term patterns in Powerball draws carry some forward-looking information — which would be genuinely surprising from a statistical standpoint. If the combinatorial model wins, it validates our decision to strip out the empirical criteria.
Either way, we'll have data instead of opinions.
Why This Matters
Most lottery tools make claims they never verify. "Our algorithm picks better numbers" is easy to say and hard to disprove if you never track results transparently.
We're not claiming either model will help you win. The odds are 1 in 292 million regardless. But we are claiming that intellectual honesty matters — and that means putting our models to the test publicly, even if the result proves us wrong.
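The "1 in 292 million" figure is just counting: C(69,5) white-ball combinations times 26 possible Powerballs.

```python
from math import comb

total_tickets = comb(69, 5) * 26
print(total_tickets)  # 292201338 -> jackpot odds of 1 in ~292.2 million
```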
Follow the experiment at /experiments/ab-testing. New data arrives every Monday, Wednesday, and Saturday after each Powerball draw.