"The Old Model Was Better"
Someone said something that stuck with us: the v3-era picks — back when Balliqa used historical patterns like hot/cold streaks, co-occurring number pairs, and drought bonuses — seemed to match more numbers than the current purely combinatorial model.
It's a fair observation. Our scoring model has evolved through eight versions. The early models (v1 through v4) leaned heavily on empirical signals: which numbers were "hot" in recent draws, which pairs appeared together historically, which Powerballs were "overdue." Starting with v5, we stripped all of that out. The current model (v6.0) is purely combinatorial — every criterion derives from the mathematical structure of C(69,5), with zero reliance on historical patterns.
The philosophical argument for going combinatorial was strong. Powerball drawings are independent random events. A number isn't more likely to appear because it hasn't appeared recently (that's the gambler's fallacy). Hot streaks are retrospective noise, not forward-looking signals.
But philosophy is one thing. Data is another. So we decided to stop debating and start measuring.
The Setup
We built a head-to-head experiment. Every draw day, both models compete under identical conditions:
Same random pool. We generate 50,000 random number combinations — the same 50,000 for both models. The only difference is how each model scores and ranks them.
Same pick count. Each model selects its top 10 highest-scoring picks from the shared pool. These selections are stored in our database before the draw happens. No hindsight, no cherry-picking.
Same evaluation. After each draw, we count how many white balls and Powerballs each model's picks actually matched. Prize tiers follow official Powerball rules.
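The protocol above is small enough to sketch in a few lines. This is an illustrative outline, not our production code; the pool size and pick count come from the description, while function names and the scorer interface are ours.

```python
import random

WHITE_MAX, PB_MAX = 69, 26     # Powerball: 5 whites from 1-69, one Powerball from 1-26
POOL_SIZE, TOP_N = 50_000, 10  # shared pool size and picks per model, per the setup

def random_combo(rng):
    """One candidate ticket: 5 distinct sorted white balls plus a Powerball."""
    whites = tuple(sorted(rng.sample(range(1, WHITE_MAX + 1), 5)))
    return whites, rng.randint(1, PB_MAX)

def run_head_to_head(score_a, score_b, seed):
    """Score one shared random pool with both models; each keeps its top 10."""
    rng = random.Random(seed)
    pool = [random_combo(rng) for _ in range(POOL_SIZE)]
    top_a = sorted(pool, key=score_a, reverse=True)[:TOP_N]
    top_b = sorted(pool, key=score_b, reverse=True)[:TOP_N]
    return top_a, top_b  # stored before the draw; evaluated after
```

The seed is fixed per draw day so the two models genuinely see the identical 50,000 combinations.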
The results are public. You can watch the experiment unfold in real time at /experiments/ab-testing.
The Two Models
Combinatorial (v6.0)
The current production model. Ten criteria, 100 points, all derivable from pure math:
| Criterion | Points | What It Checks |
|---|---|---|
| Unique Digits | 17 | All 5 whites have different last digits |
| Parity | 14 | Mix of 2-3 even and 2-3 odd |
| High/Low | 14 | Mix of 2-3 high and 2-3 low |
| Sum Range | 13 | Sum falls within 1 standard deviation of mean |
| Spread | 11 | Range between highest and lowest is adequate |
| Modular Balance | 8 | All 3 remainder classes (mod 3) represented |
| Range Coverage | 7 | At least one number in each third of 1-69 |
| Tens Diversity | 6 | 4+ different tens groups represented |
| Even Spacing | 5 | No huge gaps between consecutive numbers |
| Primes | 5 | 1-2 prime numbers included |
Every criterion answers a question about the structure of the combination. None of them look at what happened in previous draws.
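Several of these criteria are simple enough to show directly. The sketch below scores five of the ten; the specific cutoffs (high meaning 36-69, a spread of at least 25, and the one-standard-deviation sum band of roughly 175 ± 43) are our illustrative assumptions, not necessarily the production thresholds.

```python
def score_structure(whites):
    """Partial combinatorial score: 5 of the 10 criteria, with assumed cutoffs."""
    score = 0
    # Unique Digits (17): all five last digits differ
    if len({w % 10 for w in whites}) == 5:
        score += 17
    # Parity (14): 2-3 even numbers
    if 2 <= sum(w % 2 == 0 for w in whites) <= 3:
        score += 14
    # High/Low (14): 2-3 numbers in the high half (36-69, our assumption)
    if 2 <= sum(w >= 36 for w in whites) <= 3:
        score += 14
    # Sum Range (13): within one SD of the expected sum (~175 +/- 43 for 5-of-69)
    if 132 <= sum(whites) <= 218:
        score += 13
    # Spread (11): max - min of at least 25 (illustrative threshold)
    if max(whites) - min(whites) >= 25:
        score += 11
    return score

score_structure((3, 14, 27, 41, 58))  # passes all five checks above -> 69
```

Note that every input to this function is the combination itself; no draw history is consulted anywhere.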
Empirical (E1.0)
A resurrected version of the v3/v4 criteria set, reweighted to a fair 100-point scale. It shares six structural criteria with the combinatorial model, then adds four historical pattern criteria:
| Criterion | Points | What It Checks |
|---|---|---|
| Parity | 12 | Same as combinatorial |
| High/Low | 12 | Same as combinatorial |
| Sum Range | 12 | Same as combinatorial |
| **Co-occurrence** | 10 | Pick contains number pairs that appeared together historically |
| **Hot/Cold Mix** | 10 | Includes numbers that are hot or cold in recent draws |
| Spread | 10 | Same as combinatorial |
| Unique Digits | 10 | Same as combinatorial |
| **Drought Bonus** | 10 | Includes numbers that are overdue for selection |
| **PB Weighting** | 8 | Powerball hasn't appeared recently |
| Primes | 6 | Same as combinatorial |
The four historical criteria (Co-occurrence, Hot/Cold Mix, Drought Bonus, and PB Weighting) are the ones we removed during the v5/v6 evolution. They look at recent draw history and assume that patterns either persist or revert. This is exactly the kind of signal the gambler's fallacy warns against, but it's also what many lottery analysts swear by.
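To make the contrast concrete, here is a sketch of two of the removed criteria. The 20-draw window and top/bottom-10 hot/cold cutoffs are illustrative assumptions; the point is only that these scores read draw history, which the combinatorial model never does.

```python
from collections import Counter

def hot_cold_sets(history, window=20, k=10):
    """Most and least frequent white balls over the last `window` draws."""
    counts = Counter(n for draw in history[-window:] for n in draw)
    ranked = sorted(range(1, 70), key=lambda n: (counts[n], n), reverse=True)
    return set(ranked[:k]), set(ranked[-k:])

def pattern_bonus(whites, history):
    """Hot/Cold Mix (10) and Drought Bonus (10) only, per the table above."""
    hot, cold = hot_cold_sets(history)
    score = 0
    if any(w in hot for w in whites) and any(w in cold for w in whites):
        score += 10  # Hot/Cold Mix: at least one hot and one cold number
    recent = {n for draw in history[-20:] for n in draw}
    if any(w not in recent for w in whites):
        score += 10  # Drought Bonus: at least one "overdue" number
    return score
```

`history` is a list of past draws (each a set or list of five white balls), so the score of the same combination changes from week to week as the history shifts.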
What We're Measuring
The primary metric is match rate: what percentage of each model's picks matched one, two, three, or more of the winning white balls? We also track Powerball matches and whether any picks would have won a prize.
We're not measuring who scores higher on their own criteria — that would be circular. Both models will obviously score their own picks highly. What matters is how those picks perform against reality.
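Measuring this for a single draw is straightforward. A minimal sketch (our function names): one helper counts matches for a pick, another summarizes a model's ten picks. Mapping match counts to prize tiers is a lookup against the official prize chart and is omitted here.

```python
def evaluate_pick(pick, winning_whites, winning_pb):
    """White-ball match count and Powerball hit for a single pick."""
    whites, pb = pick
    return len(set(whites) & set(winning_whites)), pb == winning_pb

def match_rates(picks, winning_whites, winning_pb):
    """Fraction of picks matching at least k white balls, for k = 1..5."""
    counts = [evaluate_pick(p, winning_whites, winning_pb)[0] for p in picks]
    return {k: sum(c >= k for c in counts) / len(picks) for k in range(1, 6)}
```

Because both models face the same winning numbers every draw, these per-draw rates can be compared head to head and accumulated over time.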
What We Expect
Honestly? We expect no significant difference. Here's why:
Powerball is a genuinely random process. Neither model has any information about which numbers will be drawn next. The combinatorial model selects "well-structured" combinations. The empirical model selects combinations that are well-structured and align with recent historical patterns. But if those historical patterns don't predict the future — and probability theory says they don't — then the empirical criteria are just noise layered on top of the structural base.
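That null hypothesis has a precise shape. For any pick, however it was chosen, the number of white-ball matches against a fair draw follows a hypergeometric distribution, so both models should converge to the same baseline:

```python
from math import comb

def p_white_matches(k):
    """P(exactly k of your 5 whites are among the 5 drawn): 5-of-69 hypergeometric."""
    return comb(5, k) * comb(64, 5 - k) / comb(69, 5)

for k in range(6):
    print(f"P({k} white matches) = {p_white_matches(k):.7f}")
```

Working the numbers: about 32% of picks match at least one white ball, and matching three or more happens roughly once per 550 picks, for any selection method with no information about the upcoming draw. Sustained deviation from this baseline is what the experiment is looking for.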
The interesting case would be if we do see a difference. If the empirical model consistently matches more numbers, it would suggest that short-term patterns in Powerball draws carry some forward-looking information — which would be genuinely surprising from a statistical standpoint. If the combinatorial model wins, it validates our decision to strip out the empirical criteria.
Either way, we'll have data instead of opinions.
Why This Matters
Most lottery tools make claims they never verify. "Our algorithm picks better numbers" is easy to say and hard to disprove if you never track results transparently.
We're not claiming either model will help you win. The odds are 1 in 292 million regardless. But we are claiming that intellectual honesty matters — and that means putting our models to the test publicly, even if the result proves us wrong.
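The "1 in 292 million" figure is just counting: C(69,5) white-ball combinations times 26 possible Powerballs.

```python
from math import comb

total_tickets = comb(69, 5) * 26
print(total_tickets)  # 292201338 -> jackpot odds of 1 in ~292.2 million
```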
Follow the experiment at /experiments/ab-testing. New data arrives every Monday, Wednesday, and Saturday after each Powerball draw.