California – Week One

Wow, so, let me catch you up:

  • Got engaged to Mary Ann
  • Moved to Los Angeles, CA last week and started a new job at SpaceX

Ok, I guess it’s just two things, but they’re REALLY big things!

Posted in Personal | Leave a comment

Once More Into the Bitcoin…

Last time I talked about Bitcoin, in 2011, I said this:

Prices quickly went from a few dollars to around $30, although they’ve now backed off a bit to around $20/BTC (Bitcoin).

And, you will notice, I was not predicting great things from Bitcoin (although I DID admire the underlying blockchain and proof of work technology).

Seven years later, let me follow up.

First, despite what I said, I did do some bitcoin speculation myself. I ended up losing thousands, not because I didn’t make the right choices in market timing and investment… but because I was part of the MtGox Hack. I lost several thousand dollars, the equivalent of about five bitcoin at the time, which would have been worth $50,000 today.

So that sucked.

Despite that, I got back in a few months ago, using Coinbase, gained back about the amount I lost in MtGox, sold all my bitcoin and bailed. I’m completely out now.

I’m not saying that Bitcoin prices won’t go higher. They might.

But I AM still saying that I don’t believe in its long-term stability. And let me tell you why.
Continue reading

Posted in Uncategorized | Leave a comment

Best Reddit Yet…

Ok, I’ve only posted to reddit a couple of times, but this one will be my favorite for probably forever I expect…

I posted this thread on reddit asking about an MP3 song I’ve had lying around since college, or shortly thereafter (1997? 98? 99?).

It’s on the “Name That Song” subreddit because I wasn’t sure I had the name of the artist or the song correct. I’d transcribed the lyrics, searched the internet as best I could (and I generally consider myself fairly adept at that), and finally resorted to posting to reddit.

A few weeks go by… crickets. No comments; just a few views.

Until I woke up this morning to not only a comment on the thread, but an email from the comment-writer, who was none other than the original artist, excited to have googled himself and found me posting about a song he wrote 20 years ago!

Made his day, and it made mine too. It’s a fun little thread; check it out. Good times.

Posted in Uncategorized | Leave a comment

Greedy Test Case Algorithm in a SQL Stored Proc

Here’s a straightforward problem: I have a table with a lot of fields in it (in this case, several tables — new Fact and Dimension tables in a star schema data warehouse, but, you know, any wide table will do).

I want to extract a few real world test records that exercise the entire table… a “covering set” of test cases… so if I have 100 columns, and record A has non-zero, non-null values in columns 1-50 and record B has good values in columns 51-100, then I only need to test those two records. How great is that?!

Ok, I should probably BUILD test cases, but I like using real data since there are always unseen business rules lurking about. Anyway, this is a pretty basic math problem: Select the minimal number of objects from the set of rows where the union of the viable (non-null, non-zero) columns across the subset covers all possible columns.

There’s some code below.  Note that it is very bad code.  I use the wrong scope on global temporary tables, I don’t do lots of checking of things, I generate SQL and execute it, I debug with print statements.  It is also formatted poorly, but that’s actually more of a wordpress/plugin issue than anything else.

But it’s mine, and I love it…
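For anyone who wants the idea without the SQL, here is a minimal Python sketch of the greedy approach described above. It is not the stored proc from this post: the function name `greedy_covering_rows` and the sample rows are made up for illustration, and "viable" is assumed to mean exactly "not null and not zero". Note that greedy selection is a heuristic; it usually gives a small covering set, but it is not guaranteed to find the smallest one.

```python
def greedy_covering_rows(rows):
    """Pick row ids whose viable columns together cover every coverable column.

    rows: dict mapping a row id to a dict of column name -> value.
    A column counts as viable for a row when its value is neither None nor zero.
    """
    def viable_columns(record):
        return {col for col, val in record.items() if val is not None and val != 0}

    coverage = {row_id: viable_columns(rec) for row_id, rec in rows.items()}

    # Only columns that at least one row can cover are targeted;
    # a column that is null/zero everywhere cannot be covered by real data.
    uncovered = set()
    for cols in coverage.values():
        uncovered |= cols

    chosen = []
    while uncovered:
        # Greedy step: take the row that covers the most still-uncovered columns.
        best = max(coverage, key=lambda row_id: len(coverage[row_id] & uncovered))
        chosen.append(best)
        uncovered -= coverage[best]
    return chosen


# Hypothetical sample data: A is good in c1-c2, B is good in c3-c4, C adds nothing new.
sample_rows = {
    "A": {"c1": 1, "c2": 2, "c3": None, "c4": 0},
    "B": {"c1": 0, "c2": None, "c3": 5, "c4": 7},
    "C": {"c1": 1, "c2": None, "c3": None, "c4": None},
}
picked = greedy_covering_rows(sample_rows)
print(picked)
```

With this sample, rows A and B between them cover all four columns, so row C is never picked.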

Continue reading

Posted in Data, happytechnologist, SQL | Leave a comment

Horses and Olympians and Data and Such

I pick on Aaron Carroll a lot, and it’s really not that he deserves picking on, it’s in fact because he writes so much good stuff that I like that I am compelled to investigate at length.  Fortunately this time he actually asked for comments, so here we are…

He recently posted a simple question (itself based upon a piece by Ben Rosen):  “Why are people getting so much faster, but not horses?” on The Incidental Economist blog, which you should read (the post, the site, all of it).

It has two pretty charts, and otherwise it’s very short.  But he asks for an answer to the question.  First, the charts… one depicting the winning Kentucky Derby time over 60ish years and the next depicting the world record time for the human 1-Mile run:

Charts from Ben Rosen

So, without the raw numbers or validating the trend lines, sure… the chart on the left shows a pretty even-keel trend (note the axis labels: the deviation is less than 5%, even though it looks spikey), while the one on the right shows a clearly and quickly decreasing line.

Of course, the error is easy to spot, and I’m sure Aaron was being coy about it… you can’t compare best times at one race against a world record history.  The right hand chart is NECESSARILY going to go down all the time.  It would be astounding if the left hand chart did… if every single race beat the best time of the previous race.  And remember, the right hand chart is the aggregation of hundreds, possibly thousands of races competing for that best race.  It’s apples to oranges, and that explanation fits in a tweet:

Right, right, “monotonically decreasing” because world record times vs. one race is the core issue… “But wait!” I can hear you all gasp, in the back of your minds, why throw in the bit about training and breeding?

See, the reason this deserves more than a tweet, and thus the deep dive blog post, is that just because the charts used to ask a question were bad doesn’t mean the original question _itself_ was bad, and certainly we haven’t answered it. The answer to “why are humans improving faster than horses” is not “your charts don’t match”.  It’s just the charts don’t prove the premise (that people ARE improving faster than horses), so we have to go back and look at it a bit first.  (NB: I think they might be, thus the breeding vs. training comment, but we’ll come back to that).

To check the underlying assumption, we have to look at world-record times for horses. And that takes a bit of digging. The Guinness World Record people keep track of the “Fastest Race Horse”, but there’s no easily accessible history.  And the horse-racing statistics site Equibase has a page of about 75 record times (combinations of different length horse races on dirt, turf, and all-weather tracks), but again these are only the current record holders.

But it’s a good starting point.  The Kentucky Derby is a 1-1/4 mile race on a dirt track.  It turns out that the world record for a 1-1/4 mile horse run was NEVER set there (as far as I can tell)… the three records I can find are:

  1. Spectacular Bid ran 1:57.8 (117.8 seconds) on Feb. 3, 1980 at Santa Anita Park
  2. Noor ran it in 1:58.1 (118.1 seconds) on June 20, 1950 “beating the prior record by 1.6 seconds” meaning…
  3. someone ran it in 1:59.8 (119.8 seconds)… sometime (Coaltown matched this in 1949, but the record had already been set)

Ok, so that’s not much data and it was hard to find.  I’m searching for a better set of world record horse racing data, but for now at least we can compare a chart in Excel because why not!

Horse Records vs. People Records (on a bad scale)

Hey look!  Our trendlines nearly match!  Horses HAVE been improving just like people have, right?!  Excellent, case closed, let’s go home.

Except, of course, it’s not that easy either.  These scales aren’t remotely similar (the left hand chart is egregious, but both of them are misleading in order to fill in the space with an arbitrary y-axis) and of course the left hand side is based on exactly two points (note that the trendline is NOT the line between the two points… this is some of the oddity of evaluating a world record trend, since the data points for subsequent years are really imputed).
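To see why a trendline through a record history need not match the line between the two record points, here is a rough Python sketch. It takes the two dated horse records listed above (1950 and 1980), carries the standing record forward one value per year, and fits an ordinary least-squares line to the result. The end year of 2010 and the helper names are assumptions for illustration only; the chart in this post may have used a different range or a different fitting method.

```python
def yearly_record_series(record_events, last_year):
    """Expand (year, seconds) record events into one (year, seconds) pair per year.

    Each year with no new record is imputed as the record still standing.
    """
    events = sorted(record_events)
    series = []
    current = None
    idx = 0
    for year in range(events[0][0], last_year + 1):
        # Advance to the most recent record set on or before this year.
        while idx < len(events) and events[idx][0] <= year:
            current = events[idx][1]
            idx += 1
        series.append((year, current))
    return series


def least_squares_slope(points):
    """Slope of the ordinary least-squares line through (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in points)
    den = sum((x - mean_x) ** 2 for x, _ in points)
    return num / den


# The two dated records from the list above: Noor (1950), Spectacular Bid (1980).
records = [(1950, 118.1), (1980, 117.8)]

# Assumption: the chart runs through 2010. Any end year after 1980 shows the same effect.
series = yearly_record_series(records, last_year=2010)

two_point_slope = (117.8 - 118.1) / (1980 - 1950)
fitted_slope = least_squares_slope(series)

print(f"line between the two records: {two_point_slope:.4f} s/year")
print(f"least-squares fit to imputed series: {fitted_slope:.4f} s/year")
```

The imputed series is a staircase, not a slope, so the fitted line depends on how long each record stood and on where the chart stops, not just on the two record points.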


But… but… right, but… the original chart… I mean… that was kind of my p…. oh, never mind.

Anyway, at this point, we’re going to need more data and to come up with a method of comparing the base premise… it’s easy to say that, since a horse record was broken once in 50 years and a human record 10 times or so (I’m counting inflection points), the idea that humans are improving faster is valid, but that’s far from a rigorous statistical analysis.  Is one incident at one distance, or one second in 120 vs. 15 from 240, THAT different?  As mentioned before, there are dozens of horse race distances and categories, and the same goes for human records: is the mile an outlier?  Do records get beaten as frequently and to the same degree in the 100-meter?  In the half- and full-marathons?
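As a back-of-the-envelope check on the "one second in 120 vs. 15 from 240" comparison, the two improvements can be put on the same relative scale. This small Python sketch uses only the rough figures from the paragraph above; the function name is made up, and the result says nothing about statistical significance.

```python
def relative_improvement(seconds_gained, baseline_seconds):
    """Fraction of the baseline time that was shaved off."""
    return seconds_gained / baseline_seconds


# Rough figures quoted in the text, not precise record data.
horse = relative_improvement(1, 120)    # about one second off a ~120 second race
human = relative_improvement(15, 240)   # about 15 seconds off a ~240 second mile

print(f"horse: {horse:.2%} faster")
print(f"human: {human:.2%} faster")
print(f"human improvement is {human / horse:.1f}x the horse improvement")
```

On these rough numbers the human improvement is several times larger in relative terms, but a single distance and a handful of records is still too little data to call it settled.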

I mean, it seems like it… from this limited data, I just think we should be more careful taking it for granted.  Still, I stand by my point that breeding plus classic training, which would have shown benefits a hundred plus years ago in horses, beats out modern training which has only recently become the massively streamlined situation it is today.  Sure people are physically changing too, and we may yet start engineering people for their speed, but for now, well…

let me see if I can dig up some data and maybe we’ll write a part 2…

Update update: I’m probably not going to get any more data, and since I think I wasn’t on the same page as the original post I’ll just bow out quietly without starting much more of a row, and point my reader (s?) to Stephen Few’s blog, where we can discuss data visualization best practices (like zero-axis bases and similar scales) a bit better (and whom, similar to Mr. Carroll, I pick on sometimes mostly because I like his writing and agree with him so much of the time).

Posted in Data, happytechnologist, Random Musings | Leave a comment