Friday, January 27, 2017

Selective Sampling

I think that Donald Trump is really dangerous.

Here are some true facts.* These facts include moderately detailed descriptions of sexual assaults—skip past the bullet list if you like; the relevance of these facts will be explained below.

  • In 2008, a 54-year-old New York man named Donald Bowen traveled to Texas in order to pursue a sexual relationship with a 14-year-old boy he'd been grooming on the internet.
  • In 2009, another New York man, one Donald Caban, was convicted of molesting multiple teenage girls in his home over several decades.
  • Donald Darwell, of New York, is a convicted rapist. In 1978 he forcibly raped a woman at knifepoint. She was badly injured in the attack, and was unable to work for nearly a year.
  • In 2006, New Yorker Donald Valentine was convicted on two counts of sexual assault. His victim was a 16-year-old mentally disabled girl.
  • Donald Brown, a 42-year-old New York man, was apprehended in the act of the attempted rape of a 10-year-old boy whose parents had left him in his care.
  • 55-year-old Donald Glenn, of New York, convinced a reluctant acquaintance to go out on a date with him. That night, he overpowered and choked her before raping her both vaginally and anally.
  • Donald Jones struck a female stranger with his fist in 1993, then held her at gunpoint and attempted to rape her. When she screamed, passers-by stopped and he fled, before being caught by police and convicted of first-degree attempted rape.

([*] These are not quite facts, but they're close. In the interest of privacy, I have changed the surnames of these individuals; importantly, I have left all of their given names unchanged. I've also embellished the stories slightly to add detail. I did this for vivacity, and to make the fallacy more tempting; I didn't have easy access to the actual details, or I would have used them. I found the underlying cases via the New York State Criminal Justice Services website.)

There's a real pattern to these facts, right? New York men named Donald are bad news! So many of them are sexual criminals! And believe me, folks, I just listed the first seven instances I found. I could give you a lot more.

OK, end satire. Obviously this is a terrible argument for being worried about people from New York named 'Donald'. A long, colourful list of terrible things Donalds have done carries practically no epistemic weight at all with respect to whether Donalds on the whole are dangerous. I didn't tell you how I collected my facts, but it's pretty obvious that I wasn't sampling randomly. Here's what I did: I looked at the New York state sex offender lists, and searched for people named Donald. They're long lists, and Donald is a common name, so it's easy to find a lot of examples. I knew before I looked at the list, on purely statistical grounds, that I was going to be able to find a bunch of bad Donald stories.

(This is an illustration of the same point I was making a couple of months ago in this post.)

If I'd sampled sex offenders at random, and almost all of them turned out to be Donalds, that would be a striking fact, and maybe it would make us worry about that name. Or if I'd randomly sampled some Donalds, and almost all of them turned out to be sex offenders, that'd be telling. But if what I did (and this is what I did) is go through the list of sex offenders and pick out every Donald to add to my own list, it tells you nothing at all.
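For readers who like to see this sort of thing run, here is a minimal sketch of the point in Python. Everything in it is made up for illustration: the population sizes, the rates, and the function names (`make_population`, `cherry_pick`, `offender_rate`) are my assumptions, not real data. It builds two toy worlds, one where Donalds offend at the same rate as everyone else and one where they offend at half that rate, and then applies the search-the-registry-for-Donalds method to both.

```python
def make_population(n_donalds, n_others, donald_rate, other_rate):
    """Return (is_donald, is_offender) records with exact offender counts.

    All sizes and rates are hypothetical toy numbers, not real statistics.
    """
    people = []
    for is_donald, n, rate in ((True, n_donalds, donald_rate),
                               (False, n_others, other_rate)):
        n_offenders = round(n * rate)
        people += [(is_donald, True)] * n_offenders
        people += [(is_donald, False)] * (n - n_offenders)
    return people


def cherry_pick(people, how_many):
    """Search the offender registry for Donalds and keep the first few found."""
    registry = [p for p in people if p[1]]       # offenders only
    donalds_on_registry = [p for p in registry if p[0]]
    return donalds_on_registry[:how_many]


def offender_rate(people, is_donald):
    """Share of a whole group (Donalds or non-Donalds) who are offenders."""
    group = [p for p in people if p[0] == is_donald]
    return sum(1 for p in group if p[1]) / len(group)


# World 1: Donalds offend at the same rate as everyone else.
same_world = make_population(10_000, 90_000, donald_rate=0.01, other_rate=0.01)
# World 2: Donalds offend at HALF the rate of everyone else.
safer_world = make_population(10_000, 90_000, donald_rate=0.005, other_rate=0.01)

for label, world in (("same", same_world), ("safer", safer_world)):
    print(label,
          "| cherry-picked stories:", len(cherry_pick(world, 7)),
          "| Donald rate:", offender_rate(world, True),
          "| other rate:", offender_rate(world, False))
```

In both worlds the cherry-picked list comes back with seven entries, even though in the second world Donalds are safer than average. Only the group-wide rates, which is what random sampling would estimate, tell the two worlds apart.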

This week the Trump administration announced its intention to publish a weekly list of crimes committed by non-citizens. The point of this list is to make Americans think that immigrants are likelier than others to commit crimes. (This is very far from true.) It may succeed in its aim, but it will do so only to the extent to which Americans fall prey to this fallacious form of reasoning.

There are over 40 million immigrants in the United States. (There are about 1.5 million Donalds.) About 1% of the general American population is in prison, so if we took that as a very rough estimate (setting aside factors like systemic racial discrimination in law enforcement), we'd expect that there'd be at least 400,000 cases (1% of 40 million) of relatively recent criminal convictions of immigrants. (Add in non-resident aliens, and we'd expect it to be higher.) Some of those alien criminals will have done some awful things. We know that in advance, on general statistical grounds.
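The back-of-the-envelope arithmetic here is simple enough to write down. The sketch below uses only the figures quoted above; applying the same rough 1% rate to the 1.5 million Donalds is my own extension for comparison, and the helper name `expected_cases` is just an illustrative choice.

```python
# Figures quoted in the post; the 1% rate is a very rough estimate and is
# assumed here to apply uniformly to every group, purely for illustration.
IMMIGRANTS = 40_000_000
DONALDS = 1_500_000
RATE_PERCENT = 1


def expected_cases(group_size, rate_percent=RATE_PERCENT):
    """Expected number of cases in a group if the base rate applied uniformly."""
    return group_size * rate_percent // 100


expected_immigrant_cases = expected_cases(IMMIGRANTS)
expected_donald_cases = expected_cases(DONALDS)

print(f"immigrants: {expected_immigrant_cases:,}")
print(f"Donalds:    {expected_donald_cases:,}")
```

Either number is far more than enough to fill a weekly list with awful stories, which is exactly why the existence of such a list tells you nothing about the rate.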

Publishing a list of these cases—and doing so by the same cherry-picking methodology used above—gives practically no evidence at all that aliens are likelier to be dangerous criminals. The Trump list will have the same epistemic significance as did my list of Donalds above. If we avoid this fallacy, we'll be immune to this particular bit of propaganda.


  1. The more I re-read this, the more excellent I find it.

    1. Thanks! See, all we have to do is make sure everyone's a perfect Bayesian reasoner, and presto, we've fixed like half the world's problems.

  2. Wouldn't it be better for the registry to be exclusively of aliens charged with SEX crimes? Just as you have used sex for its prurient value to your publication, the executive branch could use it for its own purposes.

    1. Would that be more effective harmful propaganda? Maybe. Trump's original remark along these lines basically said just that: Mexican immigrants are rapists.

  3. Anonymous Sense, 5/09/2017 02:01:00 AM

    Gonna get one last comment in here off the main stage before you aver on whether you will allow me to continue to keep participating.

    The reasoning here seems like it might also potentially debunk the idea of "rape culture", and the spurious correlation of all sorts of cultural phenomena.

    Also not sure why your readers weren't bothered by this one, nakedly discussing multiple cases of sexual violence.

    & I promise this is not a comment made in bad faith. I think it would be useful to differentiate, or fail to, between the kind of emotionally driven generalizations you're debunking here, and the generalizations some parts of your discipline accept with some certainty.

    1. I am a believer in data. If you want to know how common sexual assault is, or how often allegations are just made up, or how often survivors are subject to victim-blaming, I think that going by how many cases you notice would be a terrible methodology.

      Measuring these things isn't easy, but it can be done, and it's the only trustworthy method I know. If you want an example, I am pretty impressed by Jennifer Freyd's work along these lines.

      I don't understand your remark about why readers weren't "bothered" by my discussing cases of sexual violence. I think most of my readers understand that it's appropriate and important to discuss sexual violence sometimes. In this instance, I included a content warning for people who'd prefer not to read the descriptions.