Datamining, Privacy, and Ethics

[I’m trying to write shorter blog posts these days — let’s see how that goes]

There was a lot of chatter recently about how Target (the shopping chain) has used data mining to identify pregnant shoppers in an effort to woo them as loyal customers. This is a prime example of the things that interest me directly: data mining, privacy, and the ethics surrounding the vast amount of knowledge we can compile about everything today, so I thought I’d share my perspective.

First off, the NYT article should not have been a surprise to anyone familiar with data. I’ve worked very closely with data mining teams at large retailers, insurance companies, and government agencies, and they uncover correlations all the time that lead to spooky predictability. The classic example of this is correlated sales of diapers and beer (from

A number of convenience store clerks, the story goes, noticed that men often bought beer at the same time they bought diapers. The store mined its receipts and proved the clerks’ observations correct. So, the store began stocking diapers next to the beer coolers, and sales skyrocketed.

One common interpretation was that a new father was sent out in the night to get much needed diapers, which put him in the mood to buy a six-pack.  Of course, that last part is purely subjective, but that’s the story.

The article goes on to call this a “myth,” but even if the specific case isn’t verifiable, the decades-old example is on point for what it describes: Everyone™ is trying to make money by learning about predictable patterns, then exploiting those patterns to achieve their goals.  This has been going on for thousands of years at a very human level in sales: vendors put up shops in high traffic areas, they’re careful what they put in plain view to attract customers, they offer sales on one item and try to get you to buy more things once you’re there, they give better prices to loyal customers.  Think of those examples in a modern shopping mall, then think of them in an ancient city square. It’s not hard to imagine examples in both places.

The difference is that we’re getting to the point where we can see patterns that would not occur to even a very thorough store clerk.  These insights require large amounts of data to verify and to pluck out the most important and exploitable correlations, but the results, as the NYT article indicates, can be pretty spooky.
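To make "plucking out the most important correlations" a little more concrete, here is a minimal sketch of one common way to score item pairs, a measure usually called lift. The receipts are made up for illustration, and `pair_lift` is a hypothetical helper name; this is not the method Target or any particular retailer uses, just the basic idea.

```python
from collections import Counter
from itertools import combinations


def pair_lift(baskets):
    """Return {(item_a, item_b): lift} for every pair that co-occurs at least once."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # de-duplicate and fix the pair ordering
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    lifts = {}
    for (a, b), together in pair_counts.items():
        # lift = P(a and b) / (P(a) * P(b)). Values above 1 mean the pair
        # shows up together more often than independent purchases would.
        lifts[(a, b)] = (together / n) / ((item_counts[a] / n) * (item_counts[b] / n))
    return lifts


# Made-up receipts, purely for illustration.
receipts = [
    ["diapers", "beer"],
    ["diapers", "beer", "chips"],
    ["milk", "bread"],
    ["diapers", "beer"],
    ["milk", "chips"],
    ["bread", "milk"],
]

lifts = pair_lift(receipts)
for pair, lift in sorted(lifts.items(), key=lambda kv: -kv[1]):
    print(pair, round(lift, 2))
```

With six receipts any pattern is trivially visible by eye; the point is that the same counting scales to millions of receipts, where no clerk could spot the pairs that stand out.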

Stores are now trying to gather more information and be more specific about where it comes from, specifically to learn these patterns.  “Loyalty” card programs are an attempt to do just that… when you use a supermarket card you may save a few dollars (which is often incentive enough to use the card and to keep coming back), but the supermarket gets to associate your purchases with the purchases you’ve made every other time you visited the store.  Stores can be very clever about this, too… if you and your four roommates all use the same loyalty card, but frequently pay with different credit cards, they can learn something from this… if you and your spouse use different loyalty cards but the same credit card number then, again, they learn something different.
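The card-matching trick can be sketched in a few lines. This is only an illustration under assumptions: the transaction tuples, the IDs, and the function names (`link_cards`, `shared_loyalty_cards`, `shared_credit_cards`) are all hypothetical, and a real retailer would work from far messier data.

```python
from collections import defaultdict


def link_cards(transactions):
    """Map each loyalty card to the credit cards seen with it, and vice versa."""
    loyalty_to_credit = defaultdict(set)
    credit_to_loyalty = defaultdict(set)
    for loyalty_id, credit_id in transactions:
        loyalty_to_credit[loyalty_id].add(credit_id)
        credit_to_loyalty[credit_id].add(loyalty_id)
    return loyalty_to_credit, credit_to_loyalty


def shared_loyalty_cards(transactions, min_payers=3):
    """Loyalty cards paid for with many different credit cards (the roommates case)."""
    loyalty_to_credit, _ = link_cards(transactions)
    return sorted(l for l, cards in loyalty_to_credit.items() if len(cards) >= min_payers)


def shared_credit_cards(transactions):
    """Credit cards used with more than one loyalty card (the spouses case)."""
    _, credit_to_loyalty = link_cards(transactions)
    return sorted(c for c, cards in credit_to_loyalty.items() if len(cards) > 1)


# Hypothetical (loyalty_id, credit_id) pairs, one per checkout.
checkouts = [
    ("L1", "C1"), ("L1", "C2"), ("L1", "C3"), ("L1", "C4"),  # one card, many payers
    ("L2", "C9"), ("L3", "C9"),                              # two cards, one payer
    ("L4", "C5"),
]

print(shared_loyalty_cards(checkouts))  # loyalty cards that look like a shared household
print(shared_credit_cards(checkouts))   # credit cards that tie two loyalty cards together
```

The `min_payers` threshold is an arbitrary choice here; where to draw that line is exactly the kind of thing a data team would tune against real purchase histories.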

Some people are even trying to identify what your privacy is worth to these companies.

Countless examples exist in retail.  If your consumption doubles, then you may be shopping for two people; if your food choices get “healthier” you may be on a diet; if you buy cupcakes and birthday candles at the same time every year there may be a birthday… if you buy the big blue candles in the shape of numbers, they can probably figure out what age the birthday boy is.  Facebook knows if you’re in a relationship: if you’re single, expect to see advertisements for dating sites at some point… if not, expect to see advertisements for Ashley Madison [thanks for pointing that out, Barrett].

And the article is correct — these data initiatives are powerful.  It is something most people don’t understand, and it has the potential to be very creepy.  But there has been a lot of backlash about whether or not it is “ethical” or if it’s an invasion of privacy, and I think that’s much harder to discuss.

The main argument that these ARE invasions of privacy is that the information gathered could have negative consequences or could be used for more nefarious means.  In the case of the Target article, some young woman was “outed” as being pregnant to people she was apparently trying to hide the fact from.  If it was a doctor’s office that had revealed the same information, very specific privacy laws could have been broken, but no such laws exist for shopping patterns, to my knowledge.  If the issue was not health related, it still could have been embarrassing — imagine if someone was shopping for an engagement ring, or was having an affair, or was planning to move or sell their house or send a relative to a retirement home.  Is it safe for a company to know this about us?

Or what if you were planning on murdering someone?  You obviously wouldn’t want them to find out, right?  🙂

And what is unsafe about it, exactly… is the only risk that they’ll leak the information — intentionally or not — to someone we mind knowing?  And not just the people we’re trying to hide a secret (or a surprise) from, but are the people we “mind knowing” about us the entire general public?  Conspiracy theories abound — some more valid than others.  Someone that can know when you’re on vacation can know when to rob you. Someone that can identify who you are and where you are can profile, stalk, and do nefarious things to you, right?

Well… sort of.

It’s fairly clear that younger crowds that have grown up with internet access and the ability to connect to all their friends instantly have a different attitude towards privacy.  Consider the backlash against the recent app Girls Around Me, which ties together information that users explicitly make available on Foursquare and Facebook and lets people see who is nearby on a handy map with people’s faces on it. Is this a tool for stalking, or is it a viable social tool? All the data is actively being shared, and the legitimate uses are pretty cool: if you’re out and want to hang out with friends, this is useful technology. This is exactly what Google Glass does when two people are trying to meet for lunch: it plucks their exact location from Latitude and puts them on a map.

I personally LIKE retailers to have a good idea of the things I want; advertisements tailored to me are better than the alternative. I already know too much about Kotex Tampons from mis-targeted TV advertising. It’s a win-win for the advertiser and for, well, me. To me, the potential downsides, whether criminal (burglars, stalkers, and identity thieves) or financial (retailers ARE trying to get more money in the long run), are outweighed by the benefits: lower prices, tailored content, and a more custom experience dealing with, well, everything.

I DON’T like the idea of my insurance company finding out how often I order pizza or what size pants I wear and then charging me more for health care because of it. But there’s a whole other side to that story — I’m sure tobacco smokers don’t like being charged extra either, but that’s actively happening and most people justify it.

The real problem is that people need to be made more aware of what their data can be used for. Young children need to be taught not only not to take candy from strangers, but also to not send risqué pictures of themselves to friends (a relatively new problem parents are facing). Consumers need to understand the risks of using loyalty cards and credit cards in terms of their shopping habits, and articles like the NYT Target article are one of the ways that education has to happen. However, I caution against a knee-jerk reaction that these things are bad. Yes, a poor girl’s pregnancy was announced to the world in an unfortunate manner, but had things gone the way they were intended, that girl would have saved hundreds of dollars using coupons at Target, and Target would have earned a loyal, and lucrative, customer.
