Greedy Test Case Algorithm in a SQL Stored Proc

Here’s a straightforward problem: I have a table with a lot of fields in it (in this case, several tables — new Fact and Dimension tables in a star schema data warehouse, but, you know, any wide table will do).

I want to extract a few real world test records that exercise the entire table… a “covering set” of test cases… so if I have 100 columns, and record A has non-zero, non-null values in columns 1-50 and record B has good values in columns 51-100, then I only need to test those two records. How great is that?!

Ok, I should probably BUILD test cases, but I like using real data since there are always unseen business rules lurking about. Anyway, this is a pretty basic math problem: Select the minimal number of objects from the set of rows where the union of the viable (non-null, non-zero) columns across the subset covers all possible columns.
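For the curious, here is a rough sketch of the greedy idea in Python rather than SQL. It is not the stored procedure itself; the row ids, column names, and the `greedy_cover` helper are all made up for illustration. One caveat: greedy selection generally produces a small covering set, but it is not guaranteed to be the true minimum.

```python
def viable_columns(row):
    """Columns of a row that hold a non-null, non-zero value."""
    return {col for col, value in row.items() if value is not None and value != 0}


def greedy_cover(rows):
    """Pick row ids until every coverable column is covered.

    rows: dict mapping a (hypothetical) row id to a dict of column -> value.
    """
    coverage = {row_id: viable_columns(row) for row_id, row in rows.items()}
    # Only columns that at least one row can cover are worth chasing.
    uncovered = set().union(*coverage.values())
    chosen = []
    while uncovered:
        # Greedy step: take the row that covers the most still-uncovered columns.
        best = max(coverage, key=lambda row_id: len(coverage[row_id] & uncovered))
        chosen.append(best)
        uncovered -= coverage[best]
    return chosen


# Toy data: A is good in c1-c2, B is good in c3-c4, C overlaps both.
rows = {
    "A": {"c1": 5, "c2": 7, "c3": None, "c4": 0},
    "B": {"c1": None, "c2": 0, "c3": 2, "c4": 9},
    "C": {"c1": 1, "c2": 0, "c3": 3, "c4": None},
}
picked = greedy_cover(rows)
```

With this toy data the two-record answer from the 100-column example falls out: rows A and B together cover every column, so C is never needed.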

There’s some code below.  Note that it is very bad code.  I use the wrong scope on global temporary tables, I don’t do lots of checking of things, I generate SQL and execute it, I debug with print statements.  It is also formatted poorly, but that’s actually more of a wordpress/plugin issue than anything else.

But it’s mine, and I love it…

Posted in Data, happytechnologist, SQL

Horses and Olympians and Data and Such

I pick on Aaron Carroll a lot, and it’s really not that he deserves picking on, it’s in fact because he writes so much good stuff that I like that I am compelled to investigate at length.  Fortunately this time he actually asked for comments, so here we are…

He recently posted a simple question (itself based upon a piece by Ben Rosen):  “Why are people getting so much faster, but not horses?” on The Incidental Economist blog, which you should read (the post, the site, all of it).

It has two pretty charts, and otherwise it’s very short.  But he asks for an answer to the question.  First, the charts… one depicting the winning Kentucky Derby time over 60ish years and the next depicting the history of the human world record in the 1-mile run:

Charts from Ben Rosen

So, without the raw numbers or validating the trend lines, sure… the chart on the left shows a pretty even-keel trend (note the axis labels: the deviation is less than 5%, even though it looks spiky), while the one on the right shows a clearly, quickly decreasing line.

Of course, the error is easy to spot, and I’m sure Aaron was being coy about it… you can’t compare best times at one race against a world record history.  The right hand chart is NECESSARILY going to go down all the time.  It would be astounding if the left hand chart did… if every single race beat the best time of the previous race.  And remember, the right hand chart is the aggregation of hundreds, possibly thousands of races competing for that best race.  It’s apples to oranges, and that explanation fits in a tweet:

Right, right, “monotonically decreasing” because world record times vs. one race is the core issue… “But wait!” I can hear you all gasp, in the back of your minds, why throw in the bit about training and breeding?
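To make the "monotonically decreasing" point concrete, here is a small Python sketch. The winning times are invented for illustration (they are not real Derby results); the point is only that a record history is a running minimum, so it can never go back up even when the individual race times bounce around.

```python
from itertools import accumulate

# Hypothetical winning times (seconds) for one race over successive years.
winning_times = [122.0, 121.4, 123.1, 120.9, 122.6, 121.8, 120.2, 121.5]

# The record after each year is the best (lowest) time seen so far.
record_progression = list(accumulate(winning_times, min))


def is_non_increasing(values):
    """True if no value is larger than the one before it."""
    return all(later <= earlier for earlier, later in zip(values, values[1:]))


print(is_non_increasing(winning_times))       # the raw race results
print(is_non_increasing(record_progression))  # the record history
```

The raw results fail the check and the record history passes it, by construction. That is the apples-to-oranges problem in the two charts.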

See, the reason this deserves more than a tweet, and thus the deep dive blog post, is that just because the charts used to ask a question were bad doesn’t mean the original question _itself_ was bad, and certainly we haven’t answered it. The answer to “why are humans improving faster than horses” is not “your charts don’t match”.  It’s just the charts don’t prove the premise (that people ARE improving faster than horses), so we have to go back and look at it a bit first.  (NB: I think they might be, thus the breeding vs. training comment, but we’ll come back to that).

To check the underlying assumption, we have to look at world-record times for horses. And that takes a bit of digging. The Guinness World Record people keep track of the “Fastest Race Horse”, but there’s no easily accessible history.  And the horse-racing statistics site Equibase has a page of about 75 record times (combinations of different length horse races on dirt, turf, and all-weather tracks), but again these are only the current record holders.

But it’s a good starting point.  The Kentucky Derby is run over 1-1/4 miles on a dirt track.  It turns out that the world record for a 1-1/4 mile horse run was NEVER set there (as far as I can tell)… the three records I can find are:

  1. Spectacular Bid ran 1:57.8 (117.8 seconds) on Feb. 3, 1980 at Santa Anita Park
  2. Noor ran it in 1:58.1 (118.1 seconds) on June 20, 1950 “beating the prior record by 1.6 seconds” meaning…
  3. someone ran it in 1:59.8 (119.8 seconds)… sometime (Coaltown matched this in 1949, but the record had already been set)

Ok, so that’s not much data and it was hard to find.  I’m searching for a better set of world record horse racing data, but for now at least we can compare a chart in Excel because why not!

Horse Records vs. People Records (on a bad scale)

Hey look!  Our trendlines nearly match!  Horses HAVE been improving just like people have, right?!  Excellent, case closed, let’s go home.

Except, of course, it’s not that easy either.  These scales aren’t remotely similar (the left hand chart is egregious, but both of them are misleading in order to fill in the space with an arbitrary y-axis) and of course the left hand side is based on exactly two points (note that the trendline is NOT the line between the two points… this is some of the oddity of evaluating a world record trend, since the data points for subsequent years are really imputed).


But… but… right, but… the original chart… I mean… that was kind of my p…. oh, never mind.

Anyway, at this point, we’re going to need more data and a method for testing the base premise… it’s easy to say that since a horse record was broken once in 50 years and a human record 10 times or so (I’m counting inflection points), the idea that humans are improving faster is valid, but that’s far from a rigorous statistical analysis.  Is one incident at one distance, or one second in 120 vs. 15 from 240, THAT different?  As mentioned before, there are dozens of horse race distances and categories, and the same goes for human records.  Is the mile an outlier?  Do records get beaten as frequently and to the same degree in the 100-meter?  In the half- and full-marathons?
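One way to put the "one second in 120 vs. 15 from 240" comparison on the same footing is to express each as a fraction of the starting time. This is a back-of-the-envelope sketch in Python using only those round numbers from the paragraph above, not real record data, and `relative_improvement` is just a helper name I made up.

```python
def relative_improvement(old_seconds, new_seconds):
    """Fraction of the old time that was shaved off."""
    return (old_seconds - new_seconds) / old_seconds


horse = relative_improvement(120.0, 119.0)  # "one second in 120"
human = relative_improvement(240.0, 225.0)  # "15 from 240"

print(f"horse: {horse:.2%}, human: {human:.2%}, ratio: {human / horse:.1f}x")
```

On these round numbers the human improvement is several times larger in relative terms, which is suggestive, but it is still one distance and a handful of points, not a statistical analysis.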

I mean, it seems like it… from this limited data, I just think we should be more careful taking it for granted.  Still, I stand by my point that breeding plus classic training, which would have shown benefits a hundred plus years ago in horses, beats out modern training which has only recently become the massively streamlined situation it is today.  Sure people are physically changing too, and we may yet start engineering people for their speed, but for now, well…

let me see if I can dig up some data and maybe we’ll write a part 2…

Update update: I’m probably not going to get any more data, and since I think I wasn’t on the same page as the original post I’ll just bow out quietly without starting much more of a row, and point my reader(s?) to Stephen Few’s blog, where we can discuss visualizing data best practices (like zero-axis bases and similar scales) a bit better (and who, similar to Mr. Carroll, I pick on sometimes mostly because I like his writing and agree with him so much of the time).

Posted in Data, happytechnologist, Random Musings

No Man’s Sky – Finally! Nearly as good as Star Control II

I’ve long been on record (well, I’ve long said it I promise) that my favorite game ever is still Star Control II, and I’ve been waiting for a game to recapture what I loved about it. (You can download the “Ur-Quan Masters” for free — a 100% original code port of Star Control II; go spend a few days playing it after you read this)

The Zoq, the Fot, and the Pik

No Man’s Sky isn’t QUITE going to replace what I loved about SCII, but it comes as close as anything has in a long time.

Let’s back up.  Star Control II is a space exploration game with a huge number of stars and planets.  The first time I played it, I found out that if you dilly-dally and just wander around searching all the stars, rather than following hints and clues left by chance alien encounters and random artifacts found on planets, you will surely die, having lost the game after, say, 72 hours of painstaking gameplay, since the main story arc of the universe continues whether or not you’re paying attention.  Even after rolling back to a save game from 12 or 24 hours prior to dying, my first game was unsalvageable; I had to restart.

No Man’s Sky doesn’t seem to do that to you.  But you DO have a sense that the universe is still chugging along while you hop from system to system and planet to planet.  And, while I haven’t finished it after, say, 10 hours of gameplay, NMS DOES seem to have a compelling story arc, AND it lets you literally just while away your time throughout a nearly endless universe.

No Man’s Sky Press Release Screenshot

And that’s pretty much all you do.

And it’s pretty cool.

And some people agree with me!

Yes, there are aliens and outposts EVERYWHERE — some nicer than others — some humanoid, some not, some apparently androids… none of the planets feel truly “undiscovered”.  Other similarities include animals and plants and mining and sentries and artifacts and languages you have to learn and upgrades and mysteries that you may never uncover and lots of awesomesauce.  Except NMS has 20 years of improved technology behind it, compared to SCII, and a cutting edge planetary generator that is pretty awesome, even if some people expected better.

Star Control II made me want to get into video game programming more than any other game.  Sure I’d already written a (horrible) Asteroids clone back in high school (some day I’ll find the source code for that… it’s on one of these floppies… somewhere).  But SCII was the most balanced game I’d seen and it was simply enough made that it looked approachable.  I’d spent time studying the 3D modeling in Doom and figured it wasn’t worth my time… by the time I got an animated cube spinning in front of me, I tired of the matrix manipulation and the constant bounds checking and accidental inversions that plagued that sort of development at the time.  Now, of course, you can just describe a polyhedron and send it to a GPU, but at the time the math hadn’t been built into silicon.  Trust me, it was daunting.

NMS takes the procedurally generated plasmas that SCII made so easy and successfully DOES leverage that modern hardware to pull them into the 3D world.  And it’s gorgeous.

Star Control II procedurally generated rainbow planet vs. No Man’s Sky terrain and lifeforms

The “randomness” of the worlds isn’t very high.  The important parts boil down to a few things:  How hospitable is the planet (temperature and toxicity), how much flora and fauna there are, and how angry the local aliens are.  The rest is straight landscape… water, land, and caves make up a pretty reliable system, but there’s not REALLY a lot of noteworthy variety, at least not yet.  People have complained about gorgeous but repetitive color schemes and easy-to-get-lost-in caves due to the lack of real on-planet variety.  But these are complaints when compared against hand-crafted easy-to-navigate arenas people have been weaned on in video games.  It actually IS easy to get lost in a cave or in a forest IRL (actually, I have yet to encounter anything forest-like in NMS, but others have, so I’ll see).

These are all trade-offs I’m willing to take for the vast expanse of the universe.  Of note, many people are complaining about it crashing a lot… that absolutely would bother me… but on my ancient gaming rig (the case is, what, at least 15 years old — the GPU over 3), it runs just under 1080p, looks great, and doesn’t hiccup at all (if you don’t count how the procedurally generated landscapes come to visibility… but that seems to happen with everyone).

There are other complaints floating around that don’t bother me.  One is the fairly restrictive inventory and upgrade path for your suit, ship, and handheld-mining-tool (gun)… compare this to, say, EVE Online, which took it to the other extreme, and I far prefer the NMS version.  Space combat isn’t “all that”… well, yes, I agree, but it’s not a space combat game.  SCII had fun combat, but it was of the cheap-arcade easy-to-master style that was more like a fun Android game.  Elite: Dangerous is the go-to space battle game now, right?  Meh… I only wish I could run away from fights, but even losing your spaceship isn’t so bad in NMS, as you get popped to the nearest space station with a brand new one awaiting you.  For combat I miss X-Wing vs. Tie Fighter and Wing Commander.

"Melee" in SCII vs a combat ready scene in NMS (remember square aspect ratios?)

Other games have other mixtures of these basic elements that I love… Mass Effect was a pretty good spiritual relative of SCII, but it wasn’t a clear hit for me… the game play was too linear, although you did have the sense of the impending lose-everything ending, but the openness was really an illusion.  There are myriad others that try certain aspects… Sins of a Solar Empire, Kerbal Space Program, anything Star Wars or Star Trek related… but nothing hits quite right.

And No Man’s Sky still doesn’t.  But it’s close.  Very very close, and it’s easily worth my first-run money (a rarity).  I expect to be disappointed when the very loose “Atlas” story line is exhausted, but who knows.  And maybe they’ll add other story lines in over time that are more universe-shattering?  I doubt there will be a game again that you can actually, you know… LOSE, at least not of this caliber, but for now I’m engrossed in an infinite-as-necessary universe…

and I’m having a grand time.

Posted in Gaming, Programming, Reviews, Video Game

On Repeated Medical “testing”

I’ve blogged about this before, but I wanted to revisit because FiveThirtyEight, whom I love and adore, just posted about it again, in light of the recent Theranos problems.

My last blog post was long, so let me simplify my position… there’s a big graphic in the middle of the FiveThirtyEight article:

And it’s fine, it’s very simple, very accurate math, EXCEPT, it only deals with the case where each person takes exactly one test.  This isn’t FiveThirtyEight’s fault, they borrowed a common example, and that example is what most medical research is based on… see the flow chart in my last post… you take blood, you look for things, and you (doctors and patients) react to what you find.

This is NOT the best way forward if cheap repeatable tests become available.  It’s not: test, react, test, react, test, react (which I agree leads to over-reactive medicine).  It’s: test, test, test, react, which I’m arguing will reduce over-reaction (but I admit may not be for the faint of heart).
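Here is a minimal sketch of why "test, test, test, react" changes the math, in Python. The prevalence, sensitivity, and specificity are hypothetical numbers I picked for illustration, not figures from the FiveThirtyEight graphic, and the sketch assumes each repeat test is independent given the patient's true status. Real tests often share systematic errors, so treat this as the optimistic case.

```python
def posterior_after_positives(prevalence, sensitivity, specificity, n_positives):
    """Probability of disease after n_positives positive results in a row.

    Assumes each test result is independent given the true disease status.
    """
    probability = prevalence
    for _ in range(n_positives):
        # Bayes' rule, using the previous posterior as the new prior.
        true_positive = sensitivity * probability
        false_positive = (1.0 - specificity) * (1.0 - probability)
        probability = true_positive / (true_positive + false_positive)
    return probability


# Hypothetical test: 1% prevalence, 90% sensitivity, 95% specificity.
one_test = posterior_after_positives(0.01, 0.90, 0.95, 1)
three_tests = posterior_after_positives(0.01, 0.90, 0.95, 3)

print(f"after 1 positive: {one_test:.1%}, after 3 positives: {three_tests:.1%}")
```

Under these assumptions a single positive leaves the patient far more likely healthy than sick, while three positives in a row make disease very likely. Reacting only after the repeats means reacting to a much stronger signal.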

Posted in Healthcare, Rant

On Key Performance Indicators

You run a company, or an organization, or you’re just going about everyday life, and you have goals:  you want to grow, you want to learn, you want to educate.  But how do you quantify those goals?  From a business perspective, how do you build successful metrics or Key Performance Indicators (KPIs)* when you want to quantify how well you’re meeting them?

This came up recently when working on the nonprofit Louisville Makes Games organization, and it’s something I deal with at work a lot, so I thought I’d collect all my thoughts in one place.

First, of course, it’s not easy.  There are books, and books, and books on the topic.  There’s even a book for Dummies:

Posted in Data, Gaming, Work