Jamie Bernstein

  • This year was an unusual year at the Boston Marathon. In a typical year, the top 15 winners in each gender division are given cash prizes with no controversy over whether they really placed in the top 15. This […]

  • Welcome to the second installment of Mansplain Monday. Last time I wrote about men who mansplain statistics to me. Of course, as any woman on the internet knows, if you mention the word “mansplaining” ine […]

    • Since you’re commenting on economics and statistics where an academic and work background makes a difference, have the courtesy to put those credentials in your bio so readers know ahead of time whether they should even bother considering your views to be credible or not.

      • I’d love to take your advice, but I typically don’t listen to anything random internet men say. If you could please provide your C.V., I will go over it when I get a chance and determine whether your comment is worth reading or not.

  • Throwback Thursday is a series we sometimes do at Skepchick wherein we re-post old Skepchick pieces that have updates, are relevant again, or that we just really want to re-share. The U.S. is currently in the […]

    • Thank you for this. My son and his girlfriend have started talking about a baby in a couple of years. I know she’s antivax but I don’t know to what level. I’m hoping I can just get her to open up and tell me what she thinks and why she thinks it. I think I’ve got the facts down pretty well (plus citations!). You give a lot of good information up above. I’ll need to think on it. I just hope she hasn’t gone too far down the rabbit hole on this.

      And also. A criticism. I really don’t like the way you state “Don’t lie about vaccines being 100% safe”. First, the “Don’t lie” part. That just bugs me. Hadn’t planned on lying at all.

      And then the context. I read a lot of provax stuff. All over the web. I’ve never seen anybody state that. Hey, it’s a big internet, so somebody probably did. But I’ve never seen it and I’m pretty widely read.

      But I have seen the antivaxxers show up on a message thread and claim it. And it is always patiently explained to the troll that nobody claims that. There are side effects, swelling, redness blah blah blah. And then another thread and another troll claiming it.

      So. It bugged me and I thought I would point out why.

  • I actually thought the research showed that in general most people overestimate the percentage of the population of their own ethnic group. It makes sense when you consider the fact that we live in segregated areas, so the people you live around are likely to look just like you.

    A lot (A LOT) of the Hispanic population is going to overlap with…[Read more]

  • Back in November 2012 I went on a road trip from San Diego to Chicago. At one point I spent a full day in Memphis and while there took a ton of photos. I was taking some photos of the Mississippi river when I saw […]

    • Many years ago (about 1968, give or take a year) I had just read the Report on Unidentified Flying Objects, and decided to fake some pictures. I got a real insight into human nature. Most people who saw them were taken in, but when I explained how the pictures were made and pointed out the evidence that showed they were fake, they were more amused than anything else. There were, however, a few who were infuriated; the one I remember most vividly said that it was people like me (bad people, presumably) who stopped people from believing in UFOs. He seemed to think I must be the spawn of Satan or something similar. True Believers in almost anything can be kind of scary.

      • I’ve had people cite groups of mostly non-Indian academics who claimed to be our allies. I started going through my social networks, finding ndn activists who might have heard of them; no one from the NCAI, from AIM, from Idle No More, or from any other org ever had. I called the person on this. He continued to cite them. I continued to call him on it. And I get blocked from more Facebook pages that way than any other.

    • btw – the trees in that Bigfoot photo look SO fake!

    • Question, related to the plausibility of Bigfoot existing:
      Some of the people who believe cite our finding of other new species, which is superficially fair, but when was the last time we found a land animal of a similar order of size that was a new species we hadn’t seen before? Also, is species the right level of differentiation? Genus? Family?
      I tried a little google, but my google fu is weak. It mostly returned a ton of dinosaur discoveries, which are great, but not what I meant =P

      • I think there was a member of the horse/zebra family discovered in the jungles of southeast Asia about 30 years ago. It is very rare and very similar to other wild horse species, not an extremely distinctive animal in an unexpected environment like a Bigfoot would be. People catching glimpses of it or discovering remains would probably have assumed it was one of the known horse species or a feral domesticated horse or pony.

  • It may seem unlikely that a telenovela adaptation about a 23-year-old virgin by choice who was accidentally artificially inseminated by her soon-to-be baby daddy’s gynecologist sister would be a bastion of […]

    • This doesn’t surprise me at all; the last telenovela-to-American-television translation (Ugly Betty) was similarly deft at handling touchy subjects in a very progressive way (not surprising given that Salma Hayek was executive producer) while still being plenty goofy. I already watch too many shows, but once I get bored with too many procedurals I’ll have to give Jane the Virgin a look, since Glee has failed to scratch my “goofy melodrama with a heart AND a head” itch sufficiently, especially of late.

    • Especially important is that there is no way anyone can say her pregnancy is her fault. And she chooses (There’s that word, chooses. See, right-wingers? Pro-choice doesn’t necessarily mean everyone has an abortion.) not to have an abortion.

    • Interesting writeup. I’d never heard of the show (don’t turn the tv on much these days).

      I’ve also never heard the word “telenovela” before. What is it?

    • Another show I’ve found surprisingly progressive is Disney’s ‘Girl Meets World’, a sequel to the old ‘Boy Meets World’ from the nineties.

      The main characters are in the 7th grade (I think), but they’re up front with the idea that girls are attracted to boys. Sometimes I’m amazed they’re as forward as they are. It is Disney, after all.

      And their friend, Lucas, is a pretty good role model for young men, too. Though he’s a little overly traditional.

  • TW for sexist slurs and threats of gender violence.

    Recently, the online discussion over the harassment of women has become a hot topic. We’ve dealt with our share of harassment here at Skepchick and we’ve […]

    • Jamie – thank you so much for this. Now I have somewhere to send people with the infrequent but regular comment like “but she calls herself ‘babe,’ what did she expect?”

      • The thing that really gets me about the whole “What did she expect?” argument is that I don’t understand how that somehow makes it ok. I mean, I know that whenever I write about feminism I’m opening myself up to harassment, but that doesn’t suddenly make that harassment ok. Women who state their opinions publicly are targeted for harassment, so should all women stop being opinionated? Would that solve harassment? If she is opinionated and then gets harassed, well, what did she expect?

        I can never tell if people who make that argument truly believe the women are at fault for their own harassment or just haven’t thought through the implications all that much.

        • I’ve had limited success using the analogy that if it were a black person who somehow either worked being black into their public identity and/or used a nickname like ChocolateBabe, would it be an open invite to use racial epithets against them? Even if they themselves used them, would it make it ethical and rational for you to use them against that person?

          For some people I’ve seen the light bulb go on after that, but then there are others so entrenched in their self-narrative, or simply such fans of racist humor, that it falls on deaf ears.

        • They also never apply the “What did she expect?” to their side.

          Did any of them say “What did he expect?” about the Rosetta scientist that caused #shirtstorm?

          Just as it is easy to predict the kind of sexualised and violent response a woman will get on the internet, it is equally easy to predict that someone doing something sexist in a very visible way will get criticised.

          Yet they don’t apply the same bullshit argument consistently, complaining instead of how unfair it is to oppress them with criticism, and they do have a right to free speech, don’t you know…

          So they try to excuse appalling behaviour on the basis that it is very common, which is somehow supposed to make it OK, but they hypocritically fail to apply that argument to criticism that comes in response to appalling behaviour.

    • Even if you’re not a feminist, even if you have the lowest possible opinion of women in general, this is still true: name-calling hurts your argument. Every time. Name-calling tends to happen in that last flailing attempt to win a debate, and we have ALL OF SCIENCE on our side. There is absolutely no reason to call her names, be they gendered or otherwise, when you can just micdrop some science and win every time.

      • The very first comment that Jamie quoted uses the phrase “Vani, you ignorant slut…” I doubt the commenter knows the origin of that phrase, since it is a concession that he has no rational argument and is resorting to insult.

        It comes from a segment in the early years of Saturday Night Live which parodied a segment on 60 Minutes called Point/Counterpoint, where Dan Aykroyd (playing a conservative) would “debate” Jane Curtin (taking the liberal position). Jane would invariably have facts and rational argument on her side, and Dan, having nothing, would lead his reply with “Jane, you ignorant slut”, and follow it with a string of slurs, peppered with false claims and illogical arguments. So using this phrase is a concession of defeat. Or at least it was intended that way. People like the first commenter seem to think it is a clever rejoinder.

        In the real Point/Counterpoint, James J Kilpatrick was a sexist reactionary, but I don’t think he ever called his opponent a slut. At least, not on the air. (Tried to find some original segments online, but the only one I could find was a New Year’s bit from 1978 without much rational content from either side.)

        TL;DR: Yeah, use science, not slurs.

    • To anyone not convinced that this type of argumentation is harmful, let me add this: Think of the collateral damage. For a moment, forget about the target of the abuse. The fact of this abuse being thrown out there (much of it in public) normalizes it, both in the mind of the abuser and in the mind of anyone else who sees it. So if you say it’s okay to abuse Vani Hari because she calls herself “Food Babe” and you think she’s wrong about a lot, you’re opening the door to someone else thinking, “Well, Rebecca Watson calls herself ‘Skepchick,’ and I think she’s wrong about a lot….”

      You can’t have it both ways on this. If you try to split hairs, even for people you really really hate, you make it alright for other people to split hairs for people they really really hate – who might be people you like or love. The only ethically consistent stance, the only way to move to a culture where women in general don’t have to put up with this kind of abuse for being openly female on the internet, is to stand against this type of harassment no matter the target.

    • As the British comedian David Mitchell once said, in response to jokes about the appearance of Ann Widdecombe:

      “Is this the best we can do? Make fun of her looks? What’s bad about her is everything she says and everything she does and we call her fat? Okay then.”

    • This is a little different from Sarah’s post. Sarah wasn’t saying that no one deserves rape threats or other such abuse. She was arguing for the presumption that all criticism of famous women was based on misogyny, which I must disagree with. Obviously, we shouldn’t be sexist towards Sarah Palin. That doesn’t mean, however, that all dislike of Sarah Palin is based in misogyny.

      • That isn’t what Sarah said.

        If presumption* means what you think it means, no one could ever be found guilty of any crime.

        A presumption is a starting point, not a postulate and not a conclusion.

        [*] I hope this works. I’m at work and my PC is running Windows Server, which doesn’t play videos.

        • The precise word she used was “assume” all criticism of famous women is rooted in misogyny.

          • So much criticism of famous women comes from a misogynist position that the presumption is warranted until contrary proof can be established; call it the null hypothesis.

            I would liken it to reports of celebrity deaths on twitter, always assume it is a hoax unless confirmed by outside sources.

            It’s not that the misogyny and the hoax are always the case, just that starting there will save you some legwork and keep you from being wrong more often than not.

          • I used the word “presume” in my response to you because you used that word. Yes, she originally said “assume”, not “presume” (you are the one who misquoted her), but in context, she said “I operate under the assumption that all criticism of famous women are rooted in misogyny until I find conclusive evidence otherwise.” “assume … until” means the same as “presume”.

            My dictionary defines presume as “to take for granted; accept as true lacking proof to the contrary; assume; suppose.”

            How dare you misquote someone and then play gotcha with words because someone used the same words you used in the misquote instead of the original words with identical meaning? You are proving Sarah’s and Jamie’s point.

      • I understand Sarah’s argument to be that, in her experience, if someone is criticising a famous woman then the prior probability is that the criticism is based on sexism.

        It doesn’t mean that the posterior probability will be the same as the prior probability, but by then you will have some evidence either way.
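        To make the prior/posterior distinction concrete, here is a toy Python sketch of a Bayes update (every number here is invented purely for illustration):

            # Invented numbers: start from a prior, update on one piece of evidence.
            prior = 0.7                   # assumed P(criticism is rooted in misogyny)
            p_ev_given_yes = 0.1          # P(careful, non-gendered critique | misogyny)
            p_ev_given_no = 0.8           # P(careful, non-gendered critique | no misogyny)

            p_ev = prior * p_ev_given_yes + (1 - prior) * p_ev_given_no
            posterior = prior * p_ev_given_yes / p_ev
            print(round(posterior, 2))    # 0.23 -- the prior is a starting point, not a verdict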

    • There is no excuse for misogyny, rape threats, comparisons to Hitler etc. Simple as that.
      Less evil, but close to encouraging it, is ignoring it when “your side” does it. I have yet to find Vani Hari or anyone in her camp discussing or even protesting these things when directed at the skeptic community. It’s ignored and it goes on.
      As long as Vani does not address it in the way it is done here, or as Science Babe did, she is part of the problem. Science Babe got banned for politely pointing it out. Vani uses it to argue she is right. She is, in effect, a cheerleader for anti-skeptic bullying. Sadly I have to add this, lest people draw the wrong conclusion: bullying FB cs is very wrong.

    • I can never thank Dr. Gorski enough for supporting women who are targeted for misogyny-based harassment. It’s not something he’ll ever personally face, but he addresses the issue anyway. I admire that kind of integrity and compassion.

    • I prefer to stick with calling everyone an asshole. Everyone has an asshole! Even non-binary folks.

      And this is never without a well-reasoned response on why that person is an asshole. Typically requires some willful deception of others or something actually intentional. Without knowing exactly what you’re doing, when you’re doing it, you can’t be an asshole. I’d argue Food Babe knows EXACTLY what she’s doing when she does things like call out Starbucks for not having any pumpkin in their pumpkin spice syrup, and then immediately endorse another product from a company that is SPONSORING HER, which also does not have any pumpkin in it.

      Otherwise, I’ll say “ignorant.” I usually give the benefit of the doubt that someone just legitimately doesn’t know what they’re doing unless they provide a lot of proof to the contrary. I think we should move away from racial or gendered slurs anyways. It’s not like people are ever going to stop calling each other names, but at least the names can focus on the behavior, or qualities of the person, and not just on “lol u hav vagina, shut up git ‘n kitchen.”

    • I would rather spend the rest of my life locked in a room with Vani Hari, Jenny McCarthy, Sarah Palin, Ann Coulter, and Sylvia Browne than be associated with the people making those kinds of comments. Please, folks, if you can’t give up the gendered insults, comments on women’s appearance, and rape and other violent threats, then could you at least stop calling yourselves skeptics? You’re making us look bad.

      • Exactly, and while you’re at it stop calling people stupid just because they disagree with you; it makes you look unskeptical.

      • You do realize that at least one of those people allows such comments to remain on her pages and site when they are directed at her opponents, and bans everyone politely pointing that out? Which, in my opinion, encourages such comments. I have yet to see Vani and her supporters raise an outcry over, or ban, the Vani supporters who make such comments. If both sides would combat it, we could get results. I would rather sit in a room with Science Babe and the moderator(s) of Banned By Food Babe, who do ban people who make vile comments, no matter who they support.
        In short:
        Food Babe says ‘they [implying every opponent] should stop’
        Science Babe says ‘those among us and those among them should stop’ and does something about it too.
        Being in a room with people who encourage it by looking the other way is as bad as being associated with those who do it.
        I’ve had the whole standard set from the idiots among the FB crowd: I should be killed, I poison my kids, and they’ve hoped I die a slow painful death. But Vani never spoke up against such things.

    • Long ago, Bill Maher referred to Sarah Palin as a “twat” and a “cunt” and called her son “retarded.” No feminists — and no associations for people with disabilities — spoke up. I was furious. QUOTE: not to mention calling Palin a “cunt” (“there’s just no other word for her”) http://www.mediaite.com/tv/bill-maher-calls-sarah-palin-the-c-word-during-his-stand-up-act/

      • No feminists spoke up? I guess NOW aren’t feminists? It really only took a simple Google search to turn up this and many others like it: http://www.dailykos.com/story/2011/03/27/960371/-Why-we-must-defend-Sarah-Palin

        Yes, Maher was wrong. Yes, there were far too many on the left willing to defend him, just like in the case of the misogynist “skeptics” referenced here.

        However, it’s just flat not true to say no feminists spoke up. Plenty did.

      • “No feminists — and no associations for people with disabilities — spoke up.”
        That is completely untrue. I know various writers at Freethought Blogs were all over it, and I definitely read criticisms of Maher’s sexism on other feminist blogs at the time. “I didn’t see X,” and, “X didn’t happen,” are not equivalent statements.

    • I’ve seen countless people use misogynist slurs in criticisms of other people for using racist slurs or homophobic slurs, for example, with a total lack of awareness that they are replicating the exact same problem they’re criticizing. It’s so utterly disappointing; sexism is about as rampant in otherwise-liberal circles as it is in the culture at large.

      • In the tweet about Ann Coulter that I quoted, the person tweeting (actor David Anders http://www.imdb.com/name/nm1044403/) was calling out Coulter for using an ableist slur and then followed it up by calling her a sexist slur. At no point does he seem to realize that he is literally doing the exact thing that he’s calling out Ann Coulter for doing.

  • Yah, I thought about going into other issues with chocolate and cocoa bean growing but I didn’t have time to do all the research that would be needed for an article like that. It’s not clear to me whether cocoa substitutes are more environmentally friendly or human-rights friendly than growing cocoa beans themselves. Agriculture in general is…[Read more]

  • A couple days ago my Facebook feed blew up with everyone sharing a Chicago Tribune article predicting a bleak future where our tongues will never again taste the dark, nutty flavors of a slowly melting chocolate […]

    • Paid for by Big Chocolate… Now I’m thinking about a big chocolate bar fighting the Ghostbusters’ Marshmallow Man… s’mores…

    • Chocolate’s not disappearing. It’s just getting bad unless you get the expensive stuff. (Seriously, look at how many have replaced cocoa butter with palm oil or partially hydrogenated soybean oil.)

      There are, however, other issues with chocolate. Palm oil has deforestation issues and displacement of indigenous peoples in the Indonesian jungles. Cocoa itself has, well, slavery.

      • Yah, I thought about going into other issues with chocolate and cocoa bean growing but I didn’t have time to do all the research that would be needed for an article like that. It’s not clear to me whether cocoa substitutes are more environmentally friendly or human-rights friendly than growing cocoa beans themselves. Agriculture in general is pretty awful for the environment and often comes hand-in-hand with human rights abuses. But, we can’t just not have agriculture since we need it to feed all the humans in the world. We can survive without chocolate though so maybe chocolate is not the best use of our agriculture. On the other hand, wouldn’t this just increase demand for chocolate substitutes and aren’t those just as bad on all these fronts?

        So yah, it’s a bit complicated.

        • To be fair, there’s always the option of simply doing without.

          I bring up indigenous issues also because most people know little about such things. And it’s hard to place indigenous issues within our traditional understanding of politics, and we’re generally inconvenient, so we’re ignored.

          • “Failure is not an option!”

            Seriously, I agree. People should always take advantage of any opportunity to learn about how their choices and actions affect other people, especially when the other people don’t have equal freedom of action. Thank you for pointing this out.

    • Sigh. So running out is the same as producing tons of it and consuming it, right? Like how the dodo being extinct means that we breed millions of them and eat them all every year?

    • “We certainly aren’t “running out of chocolate” though, so stop worrying and go eat some Ghirardelli’s.”

      That sounds remarkably like Marie Antoinette’s infamous “Let them eat cake!”

  • Yep, like mrmisconception said, it’s a slur often slung at online activists and feminists like Skepchick. It’s most often used to construct strawman arguments for why activists are just being outragey for the sake of outrage rather than actually accomplishing anything. Case in point:…[Read more]

  • It’s Thursday and that means it’s time for another round of Bad Chart Thursday. This week, rather than make fun of a bad chart, I was inspired to write a bit about bell curves and more specifically Sam Harris’ […]

    • Not about the math, but the literature (Oh, the Humanities!) …

      Wasn’t the entire point of These aren’t the droids you’re looking for that they were the droids the Imperial Stormtroopers were seeking? In other words, it was a lie. Is Harris tacitly admitting to everyone who understands Jedi mind-games that he is in fact a Sexist Pig? Do we need to do math to prove it, or can we just take him at his word?

      P.S. I hope this is followed up by several hundred comments properly analyzing the tails of the bell curves (i.e. integrating from -∞ to -1 the men’s and women’s estrogen-vibe as a function of the difference in the means for each, then, by counting trolls, determining the best fit). It’s been decades since I’ve done calculus in earnest, but I think we could determine both the estrogen-vibe concentrations and the troll cut-off by making the sum of the integrals and the ratio of them match real-world troll statistics. This program of research should provide the fundamental theorems of the new sciences of statistical trollology and estrogen-vibrology.

    • Is it possible that Sam Harris made mistakes, that maybe he’s ignorant of the science that says the apparent difference in the genders is mostly cultural/experiential, not genetic, and that he’s not a sexist pig at heart?

      • Possible? Yes. Likely? No.

        The tell? He doubled down rather than acknowledging, and apologizing for, making the original ignorant comments.

        The double-down is always the tell.

      • #1) That doesn’t excuse it.

        #2) If it oinks like a pig and rolls in the mud…

        #3) Nobody is calling for a BBQ.

      • He likely is ignorant of the science. But after the response to his initial comments he could have taken the opportunity to rectify this before posting his full response.

        However, he seems to prefer to defend his position rather than consider that it could be wrong. To me this indicates that he has internalized these views to some extent.

        What is sexism if not internalized — and inaccurate — views about gender?

        • This seems to be, sadly, almost like a theme song, or a running gag?, or maybe some sort of viral disease, among the self-appointed “leaders” of atheism. Self-appointed, because, apparently, being there first privileges you to a) never be wrong, b) know how to do the job better, and c) magically know when a problem is real, or just imaginary. I am sure, back in the days of the alchemists, there were similar self-appointed “experts” complaining about how the new crowd were totally silly for suggesting that maybe all the mumbo jumbo they were doing didn’t actually work all that well, and maybe they were, like.. missing something, or something. This is just more of the same, from people who have been so “certain” that social issues shouldn’t be part of the movement that they can’t even see the very social issues waving their collective… johnsons in their faces.

      • I don’t get what your point is. Like, “sexist pig” is just a subjective label some people think applies. You might not, but what’s the practical upshot either way? His ignorance of the science discounting genetics as a cause of gender differences is inexcusable, and therefore sexist, and it’s important to talk about. Why is it important to you that no one call him a pig in the course of discussing it?

      • I’m not trying to defend Harris or anything, but I think at least some of this article contradicts the point it was originally trying to make. It starts off talking about how men and women are not, psychologically, any different, and then says

        “Lived experience does make us women different in ways that are so large that we can see them without the need of a scientific study. We can all see that men tend to like Sam Harris much more than do women. It is far more likely that something about the experiences of men versus women in our culture is the explanation for the differences rather than some psychological difference in our brains.”

        Does it matter if the differences Harris was talking about were genetic or cultural? In his own words, he was “talking about a fondness for a perceived style of religion bashing,” which is not necessarily something that stems from a genetic source. Harris was just saying that men and women are different, which this article agrees with. Now, calling it the “estrogen-vibe” may have been somewhat of a misnomer, seeing as it is more of a cultural/experience thing rather than a hormonal one, but there is a difference between the genders.

        I guess what I’m trying to say is, this article feels unnecessarily nitpick-ey for a topic that really doesn’t need its nits picked. Just say “yes, actually he is sexist” and you’re good.

    • Now what he could have said is that women have been rather less interested in attending meetings run by the old atheist establishment because it was a rather hostile sexist environment.

      Or he could have just asked the journalist what the evidence was for the original assertion that more women buy his books than men.

      Every book bought in this house goes on my amazon account. And this is surely the norm in most houses due to the idiotic way Amazon works. If we bought books separately then we would have to have a separate Kindle for each account. So if you looked at Amazon data it would appear that there is a huge skew in the purchases of books with one person buying 100% and the rest zero.

      • PS what is an SJW?

        • SJW stands for social justice warrior; it is used (usually in a derogatory manner) to describe a certain type of online activist. It suggests that their efforts are superficial and overly broad.

          There is an attempt to reclaim the idea but I’m not sure it will help.

          • There is a specific image of a SJW, though. Usually the big problem is the tendency for white people to speak for POC, and they get quite hostile if you tell them they’re wrong about your experiences. (I know Indians who have received death threats over this kind of thing.) There’s also a certain degree of US-centrism that I find irritating.

            The problem is that this legitimate criticism is invariably taken over by racists, MRAs, and the rest of the scum of the internet.

            • I wouldn’t suggest all social justice blogs (especially those that are overly casual) are equally committed or even correct. Believe me, as someone who spends time on Tumblr I understand the over-reach of the SJ blog. As you suggest, the term has been co-opted by racists and MRAs so as to make it even worse.

            • Yeah, but it’s pretty bad when you have people who take their memes seriously. One of these days, I’ll dedicate a blog posting on my Facebook page to it.

              For me, it’s nothing new; Indians have been tangling with animal rights groups since the 80s. But surprisingly enough, the Ted Nugents of the world sided with the animal rights activists in those cases.

        • Yep, like mrmisconception said, it’s a slur often slung at online activists and feminists like Skepchick. It’s most often used to construct strawman arguments for why activists are just being outragey for the sake of outrage rather than actually accomplishing anything. Case in point: https://twitter.com/RichardDawkins/status/482781472961859584

          I put it in as a bit of a joke (but perhaps a bad one). It’s meant to be read with an eyeroll.

          • I think most of us got that it was a jab at some phrase that members of the New Misogyny use. It got a chuckle from me, even before I saw the explanation.

    • Next article by Sam Harris: “I’m Not the Not-Bell-Curves-Understanding Pig You’re Looking For.”

      • Sam Harris, turn in your nerd card. Do you even Star Wars? (Because, you know, those were the droids they were looking for.)

        Plus, as of the Clone Wars cartoon, we now have Force gods. Three of them. (There was a fourth, but the recent torpedoing of the Expanded Universe ends that.) But that’s another thing entirely.

        • So the fourth is no longer with us?

          • She was only in novels that are no longer canon.

            The cartoon is still canon, though. But the three of them are all dead. You can probably find it on YouTube. (The fact that the Holiday Special can be found on YouTube is proof Lucas has little control there.)

    • I was recently analyzing the results of a GWAS (genome-wide association study) on intelligence, where (like all such studies) they treat genes and environment as independent and additive. I started thinking about how that affects things when we know that genes and environments interact.

      The example I started thinking of was that of stereotype threat.

      http://en.wikipedia.org/wiki/Stereotype_threat

      The classic example is PoC doing poorly because they are expected to do poorly and if they don’t do poorly they call attention to themselves and get beat down. It is pretty well established, even if people object to it.

      The premise of GWAS studies is that genes and environments are independent. What that means is if one set of genes does poorly in the same environment that another set of genes does well in, it is the fault of the inferior genes. This is an artifact of the analysis, because gene-environment interactions are assumed to be zero; that is gene effects are independent of environment effects, there is no interaction (in the model, not IRL).

      This of course holds for XX vs XY interactions, and has been demonstrated to hold for mathematical ability. In other words, women exhibit stereotype threat and do poorly at math when they are in the presence of people who expect them to do poorly at math because they are women.

      Are behaviors in the Atheist and Skeptic community that Sam Harris is trying to rationalize as being due to XX vs XY really due to genetics, or are they due to a gene-environment interaction, as in differential stereotype threat for XX vs XY?

      We could do an experiment. Count up the numbers of death threats, rape threats, and other threats of violence that XX people receive and that XY people receive.

      My guess is that the writers on Skepchick have each received at least an order of magnitude more threats of various types (even after going to considerable effort to reduce them) than has Sam Harris.

      I would be interested in seeing a graphic of “threats” vs number of X chromosomes for major figures in the Atheist and Skeptical community. It would probably need to be a log scale to keep the numbers on the same page.
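      To make the additive-model artifact concrete, here is a minimal Python sketch (all effect sizes invented): the simulated trait has no main "gene" effect at all, only a gene-environment interaction, yet the additive-only fit hands the genotype a sizable coefficient.

          import numpy as np

          rng = np.random.default_rng(0)
          n = 10_000
          g = rng.integers(0, 2, n)  # "genotype" group (0/1)
          e = rng.integers(0, 2, n)  # environment, e.g. stereotype threat absent/present

          # True model: zero main effect of g; g matters only via the interaction.
          y = 1.0 * e - 1.0 * (g * e) + rng.normal(0, 1, n)

          # Additive-only fit (intercept, g, e), with the g*e term omitted, as in a GWAS.
          X = np.column_stack([np.ones(n), g, e])
          beta, *_ = np.linalg.lstsq(X, y, rcond=None)
          print(beta)  # g's coefficient comes out near -0.5, not 0: the omitted
                       # interaction gets misread as an "inferior genes" effect.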

      • Good comment, but I wish you hadn’t equated woman=XX and man=XY. Transphobia can be just as unintentional as sexism.

        • I meant no transphobia, and apologize if any was perceived. The OP was about “estrogen-vibes” and “essential” characteristics of people (which I appreciate are not genetic). The Sam Harris meme is that the “estrogen-vibe” is genetic and due to elevated estrogen due to more active lady bits.

          A graph that includes threats against every category of person would illustrate the extent that privilege shields people from threats. I know that people who are trans receive many threats too.

          As a cis, white, straight male, I can’t remember having ever received a rape or death threat. (No, that is not an invitation to start sending them to me.)

          Maybe someone who knows more statistics than I do could do a multivariate analysis and include a few dozen categories. I am pretty sure that my demographic (cis, white, straight, male) would be in the lowest threat category. If someone could automate that and do it on Twitter, you could get a real-time measure of which kind of privilege is trending.

    • Ask not for whom the Bell Curve tolls, it tolls for thee, Mr. Harris.

      • Among ethnic minorities, the phrase ‘bell curve’ is particularly hated because of its association with ‘race and IQ’ bullshit.

    • Jamie, this is an incredibly well-written explanation. I laughed AND I learned. I laughearned. Learghed.

      • Rebecca, I really would like to see a comparison of rape and death threats received as a function of number of X chromosomes. My guess is that you are the all time leader.

        Among the menz, maybe PZ would be the leader, but he is probably 2 orders of magnitude lower.

        This would be objective data demonstrating that the “equal environment” hypothesis is fos.

      • I agree with Rebecca. This is a really well-written piece of work! Jamie, you did a masterful job of intermixing the humor and the statistics. Probably my favorite BCT ever!

      • Legumes! Am I doing it right?

    • Hi – a friend pointed me to this post. I am also a data, stats, policy, and economics nerd so the material here interests me.

      So obviously I think you’re right that “estrogen-vibe” is a ridiculously vague, stupid quantity. But for the sake of argument let’s say it’s measurable (as you’ve already done in your graphs). I’m not quite understanding how your criticism of Harris doesn’t rely on some really strong assumptions about that distribution. I can think of three:

      1. You assume it’s normally distributed across the population.
      2. You assume the dispersion across the population is the same, and
      3. You assume the cut-off for Harris-obsession is pretty far in on the distribution.

      None of these seem like the obvious assumptions to make, but if any of them are tossed Harris could be exactly right (again, assuming the underlying theory of estrogen-vibe is sensible and measurable!).

      It’s not so much that I disagree with Harris. I don’t know enough about behavioral differences across sexes to have a strong feeling on it one way or another. I’m just confused about your argument. There’s nothing here that really demonstrates Harris is misunderstanding anything.

      • ^ The above should read “it’s not so much that I DON’T disagree with Harris”.

        Granted, that’s functionally equivalent to “it’s not so much that I disagree with Harris”, but I wanted to signal that my hunch is that there’s a lot wrong with what he’s suggesting :)

      • Hi, welcome to Skepchick.

        Since you are new here I thought I would let you know that Bad Chart Thursday is a running gag. Jamie uses it to show how you can manipulate perceptions using bad data represented poorly on a chart.

        Most of the time she tackles poorly thought out arguments being passed off as truths but here she chose a topic that was already being discussed and regarding a non-existent thing called estrogen-vibe. It was a tweak at Sam Harris’ nose as much as a dissection of bell curves.

      • 1. Assuming a normal distribution is not a poor first attempt to understand a laughably false quantity such as ‘estrogen vibe’. (Yes, the assumption of normality should be tested as soon as a reliable measurement can be made, but I expect that to take a long time, like… forever.) There are also ways to process data to obtain a normal distribution even from non-normally distributed data, but without knowing the fictitious distribution it is difficult to compensate.
        2. Since the ‘estrogen vibe’ is fictitious, I would imagine the dispersion is reasonably comparable.
        3. Ahh, in this point you may have found the Achilles’ heel of a fabulously written piece. I assume the author was generous with the placement of the Harris-obsession cut-off in deference to his outsized ego. But the author is likely saved by the fact that it is a safe assumption that the separation in means between the male and female distributions of ‘estrogen vibe’ is fairly identical (i.e. non-existent).

        I hope my understanding has been of service.

    • Could you suggest some curves for being sexist pigs? Does it have a strong gender bias, or is it, like cognitive differences, minor? Is it related to chromosome ownership? What are the actual causes of these differences?

      If Dawkins, Harris, Hitchens, Shermer are all sexist pigs, of varying degrees, and fall in different areas of their bell curves than, say, PZ Myers, do you have any suggestions why this might be?

      • I think it has to do with biological determinism. If you start to think “atheism is Darwin” and people tell you over and over “evopsych is Darwin”, then by the transitive property, you get “atheism is evopsych”.

        But then you have the simple fact that biological determinism really has little to do with how religious one is; after all, skull-measuring antedates Darwin (the link in this syllogism) by quite a few years.

    • I LOVE this chart. It summarizes everything about Harris.

    • I read it more as Sam Harris claiming that he’s only liked by men who are more than a couple of standard deviations off the norm. If instead of looking at the population close to the mean you look at people who are, say, more than 3 standard deviations from the norm a small difference between the genders will produce a disproportionate imbalance.

      Whether you interpret that as “only the trolls”, “only the estrogen-haters” or “just the wacko fringe”, I find it hard to see as complimentary to Sam. But maybe to him it is – “I speak to my fellow gynophobes, the rest of you can GTFO”. Why he’d bother saying that to a wider audience I’m not sure, but I expect he has a rigorously logical explanation that my estrogen-addled brain just can’t grasp.
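      For what it’s worth, the tail arithmetic here checks out. A quick Python sketch (treating "estrogen-vibe" as standard normal in both groups, with an invented 0.2-SD gap between the means):

          from scipy.stats import norm

          # Two nearly identical bell curves: same spread, means 0.2 SD apart.
          tail_hi = norm.sf(3.0, loc=0.2, scale=1.0)  # P(X > 3), shifted group
          tail_lo = norm.sf(3.0, loc=0.0, scale=1.0)  # P(X > 3), baseline group
          print(tail_hi, tail_lo, tail_hi / tail_lo)
          # ~0.0026 vs ~0.0013: past 3 SD the shifted group is nearly twice as
          # common, even though the curves are almost indistinguishable overall.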

    • I have read a couple of responses to Sam Harris’ recent statements, and I have a few comments.

      1. Everyone knows he is tactless. I am an atheist, male, and I am pretty straightforward about condemning nonsense when I see it. I do however recognize the point at which your attitude and demeanor is detrimental to your cause. I believe Sam Harris is beyond that point, and this whole situation is just one example.

      2. His response was poor. From saying “I have a wife and daughters, how could I say something sexist?” to his general misunderstanding of why people are mad at him. This is similar to what Richard Dawkins recently went through when he was backpedaling through poorly thought out Twitter posts.

      3. What I will say however, is that I do have a distaste for the way non-traditional social science models are attacked for being intrinsically sexist.

      A biological justification for discrimination (whether it be racial or sexual) has been a tool in the asshole’s arsenal for centuries. It has been used to subjugate people, justify horrendous genocide or slavery, etc. It is encouraging to see that any possible attempt to imply similar value judgments based on sex or race is quickly attacked, as it rightfully should be.

      But I don’t think that is what is happening here.

      The leap from Sam Harris saying: “I think women find my abrasive style more distasteful than men do” to “ergo, women are gentler, and should be relegated to submissive social roles” is not a leap that Sam made.

      It is understandable why that leap WAS made by some people however, as patriarchal societies have (and still do) use that very argument constantly as justification for sexual discrimination and segregation.

      To me he was attempting to make a large-scale observation, much like someone observing differences in fan bases. Both men and women are often turned off to his ideas because of his abrasiveness; his observation, based on attendance at his talks and the interactions he has, was that this is the reason why he sees more men than women.

      So did he make conjecture based on anecdotal observations? Yes. But what if he is right? Simply observing an empirical generalization to be true in one case doesn’t mean he is sexist.

      I tend to dislike the majority notion that humans are tabula rasa, and all gender roles are products of socialization and culture. Because if you look at other primate species (and other animals in general), it becomes clear that having the female and male brains be 100% identical and function identically makes no sense in evolution. The value judgments on superiority and inferiority ARE socialization, and ARE a product of culture, that is without a doubt. I think the massive variety of cultures and personalities shows how socialization can alter innate positions to nearly any degree, but the ability for socialization to “move the needle” doesn’t say anything at all about where that needle starts.

      If population level behaviors have both biological and social contributing factors, only addressing a social factor is ignoring part of the problem and therefore missing a more effective solution.

      4. To comment on her link, she links a pop science book (a good one) as a means of refuting observations made by evolutionary psychologists. She then backtracks and says

      “and when differences are found, it’s unclear whether they are themselves innate or a product of our culture and experiences. If differences exist at all they are quite small and can only be seen in the aggregate.”

      That is my whole point. Evo-psych is only around 25-30 years old as a formal science discipline, and the nature of Evo-psych (when specifically applied to humans) makes it really, really tricky to actually prove anything, because we can’t observe human evolution and look at the brains of previous generations. So we have to rely on population-level empirical generalizations and comparative biology to make predictions about our own psychology. You look at the concepts of anisogamy and fitness variance among sexes. In these species, whichever sex (usually male, but sometimes female) has a wider fitness variance is more aggressive, as it has to compete for mates, whereas the sex with the more valuable (in reproductive terms) sex cell has a narrow fitness variance and does not benefit as much from competing. So looking at population-level statistics on violent crime and aggressive behavior, you can compare humans (sexually dimorphic, anisogamous, where males have higher fitness variance) and say: “well, if this were another species I would say it completely jibes with evolutionary biology that the males of this species are more aggressive.”

      Now the complicating factor is that humans have an immensely intricate social and cultural structure, which can completely override any and all biological impulses, even the most basic ones (sexual reproduction, hunger, self-preservation). So while socialization can completely erase any effect in an individual (suicide cults, eunuchs, celibates, suicide bombings, etc.), for the average person the effect of socialization is probably not that extreme.

      So Evo-psych is a really difficult field because it’s extremely difficult to tell where biology ends and culture and socialization begin, but you can be certain in saying that biology has a non-zero effect on behavior. Once everyone agrees on that, the conversation can be more open about the various degrees of influence that each party plays.

      • Because if you look at other primate species (and other animals in general), it becomes clear that having the female and male brains be 100% identical and function identically makes no sense in evolution.

        Bit of a blanket statement, there. I can’t speak for other species, but for human beings it runs contrary to biology.

        So while socialization can completely erase any effect on an individual (suicide cults, unics, celebates, suicide bombings, etc. etc.) for the average person, the effect of socialization is probably not that extreme.

        You might want to look into WEIRD.

        in the Müller-Lyer illusion, most people in industrialized societies think line A is shorter than line B, though the lines are equally long. But in small-scale traditional societies, the illusion is much less powerful or even absent. […]

        Textbooks also frequently describe people as valuing a wide range of options when making choices, being analytical in their reasoning, being motivated to maintain a highly positive self-image, and having a tendency to rate their capabilities as above average. Again, the review article contends, this picture breaks down for people from non-WEIRD societies: These groups tend to place less importance on choice, be more holistic in their reasoning, and be less concerned with seeing themselves as above average.

        While the authors of the actual paper are careful not to deny biological explanations, they do state:

        Thus, our thesis is not that humans share few basic psychological properties or processes; rather we question our current ability to distinguish these reliably developing aspects of human psychology from more developmentally, culturally, or environmentally contingent aspects of our psychology given the disproportionate reliance on WEIRD subjects. Our aim here, then, is to inspire efforts to place knowledge of such universal features of psychology on a firmer footing by empirically addressing, rather than a priori dismissing or ignoring, questions of population variability.

        Henrich, Joseph, Steven J. Heine, and Ara Norenzayan. “The weirdest people in the world?” Behavioral and Brain Sciences 33.2-3 (2010): 61-83.

    • Ah, those charts take me back. Anyway, I can’t help but share my favorite scientific paper on sex differences.

      The gender similarities hypothesis stands in stark contrast to the differences model, which holds that men and women, and boys and girls, are vastly different psychologically. The gender similarities hypothesis states, instead, that males and females are alike on most— but not all—psychological variables. Extensive evidence from meta-analyses of research on gender differences supports the gender similarities hypothesis. A few notable exceptions are some motor behaviors (e.g., throwing distance) and some aspects of sexuality, which show large gender differences. Aggression shows a gender difference that is moderate in magnitude.

      It is time to consider the costs of overinflated claims of gender differences. Arguably, they cause harm in numerous realms, including women’s opportunities in the workplace, couple conflict and communication, and analyses of self-esteem problems among adolescents. Most important, these claims are not consistent with the scientific data.

      Hyde, Janet Shibley. “The gender similarities hypothesis.” American Psychologist 60.6 (2005): 581.

    • Harris then posted a response to all the criticism where he assures us all that he is totally not a sexist in a piece he calls “I’m Not the Sexist Pig You’re Looking For.” Sure, some of his statements may seem sexist, but according to Sam Harris it’s totally not sexist if it’s based in scientific fact and everyone just knows that science says ladies don’t like Sam Harris because of their estrogen-vibe.

      This makes me wonder what the heck Harris thinks sexism even *is*. I mean, his remarks are blatantly sexist, and if he doesn’t understand that, I question how much he even “gets” sexism.

    • Sure, Harris is wrong and sexist, that’s why the whole concept of targeted audiences is just bogus ::eyeroll::

      • ??? What the hell are you trying to insinuate? It may help to actually flesh out your ideas and opinions, rather than leaving very vague, rather empty comments. YOU SURE SHOWED US!

        • What I’m “insinuating” is that this is much ado about nothing. Why are there films labelled “chick flicks”? Why do men buy Tom Clancy novels? Why do politicians draft policy to “target women voters”? Why do market analysts segment products by demographics?

          Because different things appeal to different groups of people, in aggregate, as Harris pointed out. And BTW, his use of “critical” has been equivocated to mean “critical faculties” whereas he meant it as tenor of debate. Whether you think that’s sexist or not, at least get his meaning right to begin with.

          • Because people are fucking apes. That doesn’t mean that we shouldn’t try to change things.

          • Yeah, because as everybody knows, marketers and corporations are managed by emotionless robots who are immune to cultural conditioning, right?

            • It doesn’t matter what their conditioning is; the fact is that if the targets didn’t exist their performance would suffer, the product wouldn’t be bought, the movie not watched, the candidate not voted into office. The market, as it were, would have spoken. Now, you’re free to say that “the market” operates on motivators that are determined by both nature and nurture. Incidentally, I think that’s just what Harris said. Or, you’re free to say it’s just nature or just nurture. It’s your opinion to give. Did someone solve the nature/nurture debate while I was sleeping?

              I just don’t think it’s right to accuse someone of sexism because, in his opinion (which is what it was, after all) various demographics are attracted to different things and that it’s partly based on nature. He may be wrong! That’s kind of how opinions work.

    • I had an idea on how to help with all the misogyny, hate and death threats on line, but it will take cooperation from ad providers like Google.

      The idea is that when a site posts a link or a threat or hate speech against a specific person, that the ad provider to that site diverts ad revenues to that person for all views of that hate speech.

      This wouldn’t be a big deal for sites that are not using hate speech to generate clicks, but if a site is using hate speech to generate clicks, it would backfire by diverting the click revenue to the victim of their hate speech (or the victim’s designated charity).

      It would also give feedback as to how much hate speech there is going on, and from where.
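      For concreteness, a hypothetical Python sketch of the routing rule being proposed (no real ad network exposes anything like this, and the hate-speech detector below is a bare stub):

          def find_targets(page_text, known_people):
              """Stub detector: who is this page directing hate speech at?"""
              return [p for p in known_people if p.lower() in page_text.lower()]

          def route_revenue(page_text, publisher, revenue, known_people):
              targets = find_targets(page_text, known_people)
              if not targets:
                  return {publisher: revenue}     # clean page: normal payout
              share = revenue / len(targets)      # hateful page: pay the victims
              return {t: share for t in targets}  # (or their designated charities)

          print(route_revenue("...slurs aimed at Jane...", "example.com", 1.00, ["Jane"]))
          # {'Jane': 1.0} -- the click revenue bypasses the publisher entirely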

    • You know, Sam Harris has gotten lots of flak for this “estrogen-vibe” comment, as though it were some sort of vague, ill-defined concept with no scientific basis. Hogwash, I say! Sam Harris is a scientist, you guys, he wouldn’t do something so disingenuous as to refer to endocrinology to make his argument seem grounded in biology while in fact only using “estrogen” as a metaphorical stand-in for culturally prescribed feminine traits.

      “Estrogen-vibe” has an obvious meaning. The bonds in every molecule, including the estrogens, have natural vibration frequencies. These natural vibration frequencies are important for understanding the biochemical properties of the molecule. Sam Harris is obviously referring to a recent paper which characterized the vibration frequencies of two of the estrogens from quantum mechanical calculations. See, you guys, it’s science! Specifically, chemistry.

      I’ll admit to some puzzlement as to how the vibration frequencies of the estrogens relate to the rest of what Sam Harris was talking about, though…

    • Oh, my. All those boob shaped graphs.
      Men everywhere are being corrupted.

    • Sam Harris is going to be on Air Talk with Larry Mantle on KPCC (http://205.144.168.164/programs/airtalk/) in Southern California on Monday. It’s a call-in show that comes on from 11am – 1pm PDT on public radio. Maybe someone in the area who’s more well-spoken than I would like to call in? The number’s 866-893-5722.

    • I found this article funny, but I must admit I’m a bit lost by your main point. What exactly does Sam Harris not seem to understand about the statistics of Bell Curves? It seems more like you’re saying that his estimates of the distribution of ‘estrogen vibe’ in the population are inaccurate. Isn’t that different from saying he doesn’t understand Bell Curves?

    • It strikes me that this article is intended as a polemic. However, on the off-chance that you did want to say something serious about statistics and probability theory, allow me to point out a few misconceptions:

      1. You say “Many researchers have searched for cognitive and psychological differences between the genders and taken as a whole, researchers have found little to no difference in cognitive ability or psychological differences between men and women and when differences are found, it’s unclear whether they are themselves innate or a product of our culture and experiences.” The second part of this sentence is indeed true, but the first part is false. (Or, let me say – more cautiously – that it is not obviously true.) As far as I can see, most empirical studies do find that gender is a statistically significant factor in mathematical and language abilities, for example, even after controlling for cross-national variations. Briefly, boys tend to score higher in mathematics, on average, and the variance of their mathematics scores is higher. Moreover, the tails of the distribution of mathematics scores tends to be dominated by boys. The reverse appears to be true, when it comes to language scores. Also, these results are apparently not explained by traditional proxies for gender equality. (Some studies dispute these findings, but they seem to be in the minority.)

      2. You say “The only way you would be able to see differences between the psychologies of men and women is if those differences were quite large.” Here you appear to be talking about differences between the means of the two sub-populations. This is not generally true, since it depends on the standard deviations of the two sub-populations. Specifically, the standard error of the estimate for the mean of a normally distributed population is proportional to its standard deviation. Hence, if the means of the two sub-populations are different enough, or if their standard deviations are small enough, you will be able to reject the null hypothesis that their means are equal, using only a modest sample. However, you won’t be able to reject the null hypothesis if the means of the two sub-populations are very similar, and their standard deviations are large. I agree that this is likely to be the case for most data where the sub-populations are selected on the basis of gender.
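      A quick simulation makes the point concrete. This is my own sketch, not anything from the studies under discussion; the gap, standard deviations, and sample size are invented purely for illustration:

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(1)

        def rejection_rate(mean_gap, sd, n=200, trials=2000):
            # Fraction of simulated studies in which a two-sample t-test
            # rejects "equal means" at the 5% level.
            hits = 0
            for _ in range(trials):
                a = rng.normal(0.0, sd, n)
                b = rng.normal(mean_gap, sd, n)
                if stats.ttest_ind(a, b).pvalue < 0.05:
                    hits += 1
            return hits / trials

        print(rejection_rate(mean_gap=0.1, sd=1.0))  # large SDs: gap detected ~17% of the time
        print(rejection_rate(mean_gap=0.1, sd=0.1))  # small SDs: same gap detected almost always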

      3. It strikes me that your biggest error is to think that any statement about gender-based cognitive/psychological differences must be a statement about the means of the distributions of some trait. (This is borne out by your illustrations, which depict two distributions with the same variance, but shifted means.) Actually, you are much more likely to see differences between the two distributions when you compare their higher-order moments. You would especially like to compare the tail probabilities for the two distributions. Tests that do so require modest samples, compared with tests that compare sample means. That is to say, if it is the case that males inhabit the tails of the distribution for some cognitive ability/psychological trait, then it is likely that statistical tests with reasonable sample sizes will be able to detect this.

      4. One final point where I think you’re mistaken. Let us say, for the sake of argument, that Harris is right, in that men are “more attracted to this style of communication than women are.” Let us also assume that the standard deviations of the extent to which men and women are “attracted to this style of communication” are large. That is to say, men are – as Harris claims – generally more likely to prefer his style of communication, but there is a substantial variation in this preference across the populations of both men and women. Then two things are likely to happen: (i) Estimates of the mean preferences of men and women won’t reveal any gender-related differences (due to the problem with the standard errors of sample means mentioned earlier); and yet (ii) There will be more men than women attending Harris’ talks (due to the fact that the true – but unobservable – means of the two populations are different).

      • Actually… The studies show that “math ability” and the like are heavily influenced by the conditions of the testing. Literally, reminding women that they tend to do worse than guys causes them to do worse, and reminding men that they “do better” causes their scores to rise slightly. Do both, and you end up with a huge gap. Language is an even dumber one. Something like 90% of the “best” linguists on the planet are male, yet, supposedly, women are better at language. This doesn’t make logical sense, at all. But, honestly, it might make sense if you recognize that men are more likely to get hired, so may be more promoted as linguists, whereas men in general, just from my own observations, tend to think they have “better things to do” than waste time doing a lot of reading, i.e., unless they are linguists they don’t actually bloody use their language skills a lot, unlike readers.

        Just those factors, by themselves, are enough to skew the results, and it’s not like you can control for them, unless you find tens of thousands of men and women, who all happen to read the same amount, and so on. Same problem with math – men barely use those skills, and women…. yeah, no inherent bias, which throws a monkey wrench into the works when studying the problem…

        • Literally, reminding women that they tend to do worse than guys causes them to do worse, and reminding men that they “do better” causes their scores to rise slightly.

          Not only this, but merely reminding women that they are women (e.g., by first asking questions that encourage them to reflect on their gender) can cause women to do worse on math tests, so strongly entrenched and internalized is the cultural notion that “girls are bad at math”. (Honestly I’m loath to type that phrase even in scare quotes, considering how easy a message it is to reinforce. Girls: you are great at math. Or you can be, if you choose it!)

          • Girls: you are great at math. Or you can be, if you choose it!

            That’s what I keep telling my daughter but I don’t think she’s buying it.

            • A female friend of mine in college, who was extremely smart, studying chemical engineering, and in an elite honors program, once told me that she consciously chose to act less intelligent than the men around her, particularly ones she was romantically interested in, so she wouldn’t intimidate them and they’d like her better. As a naive male freshman with feminist sensibilities but little real-world experience, it was an eye-opening moment for me to see first-hand how a woman I liked and respected might see her own intelligence as an impediment rather than a strength, at least in some circumstances, in ways that would never be true for me. Of course, in her case she was consciously aware of what she was doing, but clearly this decision process acts subconsciously as well, in people with or without my friend’s level of self-awareness.

              Your daughter’s lucky to have a parent who lets her know she doesn’t have to be bad at math in order to be accepted. You’re tackling a lot of cultural inertia there!

        • kagehi wrote “Something like 90% of the ‘best’ linguists […] are male, yet, supposedly, women are better at language. This doesn’t make logical sense at all.”

          No, unfortunately you’re labouring under a misapprehension. If the distribution of linguistic abilities for males has a higher variance than the distribution for females, then it is quite possible for men to have poorer language skills on average AND for the most gifted linguists to be men.

          This is in fact a very common statistical phenomenon. For example, active fund managers perform worse than passive fund managers, on average, yet the best performing funds over any period are all active. Why is that? Active funds hold riskier portfolios, which means that the variances of their returns are higher than is the case for passive funds.
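          A minimal numerical sketch of that effect (all numbers here are invented purely for illustration):

            from scipy.stats import norm

            mu_f, sd_f = 0.10, 1.00  # women: higher mean (made-up units)
            mu_m, sd_m = 0.00, 1.15  # men: lower mean, higher variance

            cutoff = 3.0  # a "most gifted" threshold, far out in the right tail
            p_f = norm.sf(cutoff, mu_f, sd_f)  # fraction of women above the cutoff
            p_m = norm.sf(cutoff, mu_m, sd_m)  # fraction of men above the cutoff

            print(f"men's share of the extreme tail: {p_m / (p_m + p_f):.0%}")  # ~71%

          So even with women ahead on average, the extreme tail comes out mostly male once the male variance is modestly larger.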

          Of course, such a statistical explanation need not provide the correct interpretation of the evidence you cited – your explanation in terms of employment bias may well be the correct one. But your evidence does not expose an inherent logical problem with an explanation that invokes inherent gender-based differences in linguistic skills (as loath as we may be to countenance it).

          • No, you’re “labouring under a misapprehension,” which is the assumption that, just because 90% of people working in linguistics are men, they must somehow be “the most gifted linguists.” The point kagehi is making, which you seem loath to acknowledge, is that there are structural barriers in place that set up these differences, and in many cases exaggerate these differences, which cannot be attributed to some intrinsic difference between men and women. So it’s not that somehow magically the most gifted people in linguistics happen to be men and they are filling those jobs, it’s that the way academia, and education more broadly, is structured sets up a situation where men are hired for positions more than women. You are confusing the quantity of people in a particular position with the quality of their talent.

            • You should probably reread my last message. I don’t claim that there are not “structural barriers” that create and/or exaggerate gender imbalances; I specifically allow that such an explanation for the existence of gender imbalances may be correct. My point is simply that kagehi’s inference is incorrect. In particular, I show that it is possible for average female linguistic ability to exceed average male linguistic ability AND for 90% of the top linguists to be male. So, kagehi’s claim that these two observations don’t “make logical sense” when taken together is false. This is simply a fact of probability theory.

              Now, your preferred explanation for the phenomenon described by kagehi may be correct. But the data he provided does not establish that. Of course, the data also fails to establish that this particular gender imbalance has a biological cause.

            • Seems to be a semantic argument here; is this better, Hardy – there is no automatic reason to assume that such a bias is real, other than its prevalence in academia, but there are vast amounts of evidence implying a likely alternative explanation, which is sufficiently pervasive across all disciplines and skills to suggest that it’s **unlikely** to be an accurate conclusion.

              Since you object to my suggesting that it’s just pure BS, without going into why.

    • The quote I start my blog post on Theory of Mind with is:

      In any great organization it is far, far safer to be wrong with the majority than to be right alone. — John Kenneth Galbraith

      Pretty obvious that if women get death threats for being good at math, it is safer to pretend to suck at math. The easiest way to pretend to suck at math is to actually suck at math.

    • @Hardy Hulley: No, look, I agree with Kagehi – you need to show the data for this. Yes it is theoretically possible to have different shaped distributions as you say, but male & female distributions for the vast majority of attributes (constructs) are similar.

      • @Jack99: If you agree when Kagehi says “Something like 90% of the ‘best’ linguists […] are male, yet, supposedly, women are better at language. This doesn’t make logical sense at all,” then I’m afraid you’re wrong. It is quite possible for women to perform better than men, on average, and yet for the best performers to be men. This is because the higher-order moments of the distributions of language skills for men and women are more important for determining the right-hand tail of the overall distribution than the means of the individual distributions. This is not my opinion, it is a (simple) probability-theoretic fact.

        Note that I have not claimed that gender *is* an important determinant of language ability; I’m simply pointing out that two facts adduced by Kagehi are not mutually inconsistent. In other words, Kagehi has drawn a false inference (which he/she shouldn’t feel bad about, because many non-statisticians make the same error).

        Okay, so what does the data say? Well, the latest analysis, based on 10 years of PISA data, finds persistent and statistically significant differences in both the language and mathematics gender gaps. The sample is important, because it is international, and therefore accounts for the effects of gender-based policy variations across different countries. Moreover, since it is based on tests of 15 year-olds, which is the highest age of mandatory schooling across all participating nations, the PISA data also accounts for possible biases associated with variations in school participation rates across different countries.

        The main findings are:
        1. In mathematics, boys perform better than girls, on average; however, the left-hand tail (i.e. the worst performers) reveals little difference between boys and girls, while the right-hand tail (i.e. the best performers) is dominated by boys.
        2. In language, girls perform better than boys, on average; however, the left-hand tail (i.e. the worst performers) is dominated by boys, while the right-hand tail (i.e. the best performers) shows little difference between boys and girls.
        3. In the cross-section (i.e. when comparing the results across different countries), the gender gaps for mathematics and language are inversely related (i.e. countries that manage to decrease the mathematics gap increase the language gap, and vice versa). Interestingly, there are no exceptions to this pattern!

        Point 3 is extremely important from a policy point-of-view, since it suggests that differences in performance between boys and girls are not corrected by gender equality and empowerment programmes. In fact, the data suggests that countries with such programmes exhibit higher gender gaps in mathematics.

        You can find a video discussion of the results at https://www.youtube.com/watch?v=m9WxvT-82Xg. There you will also find a link to the published paper (which is available for free). It appeared in the journal PLoS ONE in 2013.

        • The problem with statistics, even accurate ones, is that they can lie, while telling the truth. For example – how many of those “countries” are female dominated, or neutral on gender, instead of part of the vast majority, which have been, and continue to be, male dominated? This is the critical problem with trying to measure such things. There is literally no way, at all, to create a situation in which to collect such statistical data, in which there is not **already** a bias in the data set. It’s like trying to study how wet something can get, when rained on, while 50 feet under water. It doesn’t matter how you collect your “statistics”, they are never going to be accurate.

          To put it simply, to acquire an accurate assessment of the *real* differences, you would have to do the unethical and immoral – remove thousands of people from society, into a lab, from birth, and have them grow up with “only” those skills you are testing for, while removing any possible accidental biases that might arise from literature source (in the case of language, for example), as well as any, and all possible other sources, by which they might be prone to bias their own behavior, or have it biased, by the expectations of outside influences.

          There is no way to collect statistics, from any kind of meaningful category of nations, without running head first into the **preexisting** cultural biases, which, by existing, alter the baseline, right from moment one, into a framework that would, however unintentionally, alter the perceptions, and expectations, of the very people you want to test.

          Show me how you **ever** successfully correct for that, without the inherent margin of error that it causes, presuming you can even reliably define the margin of error, without a neutral baseline to start with, and.. maybe I will agree with your assessment. But… the actual evidence suggests that the bias persists, all across the board, even as the presumption of just what the statistics **should show** has drifted closer and closer together, narrowing the perceived gap.

          The point being, there may be one, but you first have to show that, in fact, that gap is real, not artificially induced, by the continued perceptions and assumptions, which plague every single one of the nations in the studies. Until you can show that the data is not, in effect, poisoned at the well, you have no grounds to claim that the differences detected are “innate” characteristics, instead of statistical anomalies, arising from uncontrolled, nah.. uncontrollable variables. And none of these studies are capable of showing this to be the case; prior “studies”, which failed to even acknowledge such bias, have been rendered worthless, because “culture” changed, and made the assumptions change with it.

          Your hubris is in assuming that they won’t change again, that you have somehow controlled for the uncontrollable, *and* that time will somehow prove out the differences as real, and not a product of a nearly universal set of cultural assumptions, which are prevalent even among people that insist they have left them long behind, but who, nevertheless, cannot escape the influence of the wider culture, which has not given them up, and the influence of which cannot be avoided entirely (and perhaps not even significantly).

          • @kagehi said “Until you can show that the data is not, in effect, poisoned at the well, you have no grounds to claim that the differences detected are ‘innate’ characteristics, instead of statistically anomalies,…” Please don’t put words in my mouth – I never claimed the results of the study in question were evidence of “innate characteristics.” That is your interpretation.

            “The point being, there may be one, but you first have to show that, in fact, that gap is real, not artificially induced, by the continued perceptions and assumptions, which plague every single one of the nations in the studies.” You’re clutching at straws, my friend. When somebody presents empirical evidence of this sort that you wish to refute, you have the following options: (i) replicate the study, and demonstrate that you don’t obtain the same results; (ii) identify a methodological flaw in the analysis; or (iii) explain exactly what sort of bias could bedevil the results, and demonstrate it. Vague nonsense about “…continued perceptions and assumptions…” is just pseudo-science.

            In truth, the results obtained by the study in question make it quite robust to claims of bias. In particular, what type of bias could generate a negative relationship between the gender gaps for language and mathematics? Be specific if you want to tackle that question – no “culture” mumbo-jumbo.

            Unfortunately, the rest of what you wrote only reveals a poor understanding of statistical inference. For example, the stuff about “baselines” is completely irrelevant, and implies that you really don’t understand the type of analysis conducted in the study. I suggest you actually read the article, and pay attention to the authors’ interpretation of the results, and their resulting policy recommendations. You may find yourself agreeing with them.

            • Just going to say this, since Hornbeck already explains it way better than I can. If you can’t determine, or factor for, or control, the variables that confound your statistics, then your statistics are useless when determining what underlies the system.

              Your own challenge to me is, absurdly, “Of course not, the statistics are only showing the current conditions of the system.” – i.e. “I never claimed the results of the study in question were evidence of ‘innate characteristics.’” But, we are not talking about whether or not the current state exists, or is real. We are talking about whether or not there are innate reasons, based purely on the gender of the individuals, without any other confounding factors, that result in the gaps seen.

              My argument is, in a nutshell, that it’s bloody meaningless to talk about how likely it is that your car will get wet in the rain, if your “statistics” are based on a community where 90% of the population walks every place, and all of their cars remain parked in a garage 24/7, something like 200 out of 365 days a year. Of course the statistics are going to say, “Cars don’t get wet when it rains.”, if those are the conditions under which you are bloody collecting the evidence. Only, what we have here is the opposite situation, **no one** is keeping their car in a garage, so, of course, the statistics are saying, “When it rains, everyone’s car gets wet.” And, you actually seem to think this means anything, when the question is, “What if it was parked in a garage?”

              Or, in other words, the question isn’t, “Do these biases exist, period.”, which you rightly say they do, but, “Would they still exist, if you changed the bloody variables, so that the whole entire system wasn’t stacked against women, from how much they get paid, compared to men, to what they are told, via the mythologies of society, about their math skills, to how they are told to act, to think, to react, etc. If you removed the variables, what would be the result?”

              All you do is keep babbling, “But…. with the variables still there, there is this huge gap!”

              No one cares, because it’s not about the state of the system, as it exists, it’s about the state of the system, without the confounding variables. There is clear evidence that, when removed, the “gap” starts disappearing, but we have no way to completely remove the variables, so we have no valid statistical data to say if the gap **is** still there, when/if you remove all of the external factors. But, there is sufficient evidence to suggest that, with the ones we **can** remove dealt with, the gap all but vanishes anyway. And then, you show up again, and start rambling about, “Yes, but, when we flat out refuse to adjust for, or remove those variables, at all, there is this massive gap!!!” Argh!!!

        • 1. In mathematics, boys perform better than girls, on average

          Yeah-huh.

          Meta-analytic findings from 1990 (6, 7) indicated that gender differences in math performance in the general population were trivial, d= –0.05, where the effect size, d, is the mean for males minus the mean for females, divided by the pooled within-gender standard deviation. […]

          Effect sizes for gender differences, representing the testing of over 7 million students in state assessments, are uniformly <0.10, representing trivial differences … Of these effect sizes, 21 were positive, indicating better performance by males; 36 were negative, indicating better performance by females; and 9 were exactly 0. From this distribution of effect sizes, we calculate that the weighted mean is 0.0065, consistent with no gender difference …. In contrast to earlier findings, these very current data provide no evidence of a gender difference favoring males emerging in the high school years; effect sizes for gender differences are uniformly <0.10 for grades 10 and 11 …. Effect sizes for the magnitude of gender differences are similarly small across all ethnic groups …. The magnitude of the gender difference does not exceed d= 0.04 for any ethnic group in any state.

          So much for that part.

          the left-hand tail (i.e. the worst performers) reveals little difference between boys and girls, while the right-hand tail (i.e. the best performers) is dominated by boys.

          Survey says:

          Greater male variance is indicated by VR > 1.0. All VRs, by state and grade, are >1.0 [range 1.11 to 1.21 …]. Thus, our analyses show greater male variability, although the discrepancy in variances is not large […]

          For whites [in grade 11 in Minnesota], the ratios of boys:girls scoring above the 95th percentile and 99th percentile are 1.45 and 2.06, respectively, and are similar to predictions from theoretical models. For Asian Americans, ratios are 1.09 and 0.91, respectively. Even at the 99th percentile, the gender ratio favoring males is small for whites and is reversed for Asian Americans. If a particular specialty required mathematical skills at the 99th percentile, and the gender ratio is 2.0, we would expect 67% men in the occupation and 33% women. Yet today, for example, Ph.D. programs in engineering average only about 15% women.
          Hyde, Janet S., et al. “Gender similarities characterize math performance.” Science 321.5888 (2008): 494-495.

          Ok, so there is some evidence for variance, but it’s not on the scale that would explain real-world gender disparities and there might be a strong cultural effect driving it, instead of biology.

          2. In language, girls perform better than boys, on average

          Didn’t I mention this earlier? [scrolls up] Oh, no, that meta-analysis glossed over the detail. Time for a fresh one!

          Located 165 studies that reported data on gender differences in verbal ability. The weighted mean effect size was +0.11, indicating a slight female superiority in performance. The difference is so small that we argue that gender differences in verbal ability no longer exist. Analysis of tests requiring different cognitive processes involved in verbal ability yielded no evidence of substantial gender differences in any aspect of processing.
          Hyde, Janet S., and Marcia C. Linn. “Gender differences in verbal ability: A meta-analysis.” Psychological Bulletin 104.1 (1988): 53.

          I don’t see any mention of variability there, alas, but I will note that while the general idea that men exhibit more variance than women has been around for a hundred years, the magnitude of the variance seems inversely proportional to the sample size, and there are a few external factors that can have the same effect. Boys tend to disproportionately drop out of school, for instance, which will artificially boost variability (as low-scorers will be encouraged to stay, while high-scorers desire to stay).

          So while that might be a nice study, when you gather together all studies you get a different picture.

          • @Hj Hornbeck: You quote extensively from Janet S. Hyde, et al., “Gender similarities characterize math performance,” Science, 321:494-495, 2008. Unfortunately, you probably haven’t chosen the best article upon which to base your case.

            1. First, the data obtained by Hyde et al. (2008) appears to be a snapshot of NAEP scores for 10 US states (presumably for one year – they’re not clear about this). By contrast, the PISA data is a panel, where the time-series covers four years (2000, 2003, 2006, and 2009), and the cross-section consists of 75 countries. You will struggle to challenge the findings based on such a broad dataset, by appealing to results of a much narrower study.

            2. You quote the following passage from Hyde et al. (2008): “Meta-analytic findings from 1990 (6,7) indicated that gender differences in math performance in the general population were trivial, d=–0.05, where the effect size, d, is the mean for males minus the mean for females, divided by the pooled within-gender standard deviation.” Curiously, you omit the very next sentence: “However, measurable differences existed for complex problem-solving beginning in high school years (d=+0.29 favoring males), which might forecast underrepresentation of women in science, technology, engineering, and mathematics (STEM) careers.”

            3. You quote the following passage from Hyde et al. (2008): “Effect sizes for gender differences, representing the testing of over 7 million students in state assessments, are uniformly <0.10, representing trivial differences…” Here I’m concerned about how the authors interpret their results. The statistic they’re referring to is Cohen’s d-statistic (I apologise if I’m telling you something you already know), and they cite Cohen’s book, which offers a heuristic to the effect that d-values less than 0.2 are small. Sure, but that only deals with economic significance; what about statistical significance (which is probably more important in this case)? The authors are completely silent on this issue.

            4. You quote the following passage from Hyde et al. (2008): “Of these effect sizes, 21 were positive, indicating better performance by males; 36 were negative, indicating better performance by females; and 9 were exactly 0. From this distribution of effect sizes, we calculate that the weighted mean is 0.0065, consistent with no gender difference….” This looks like poor methodology. First, the authors had data for 10 grades from 10 states, yet they’ve calculated only 36+21+9=66 d-statistics; you would expect 100 such values. Second, they report that the weighted average of the individual d-statistics is 0.0065. However, averaging d-statistics across completely different distributions is pretty meaningless. To see why, note that girls and boys may well develop at different rates from Grade 2-Grade 11, so that overall test score distributions could vary substantially over time. On top of that, the mathematics tests they write at different ages are completely different. I can’t see how to interpret a composite statistic constructed as a weighted average across heterogeneous distributions.

            5. You quote the following passage from Hyde et al. (2008): “In contrast to earlier findings, these very current data provide no evidence of a gender difference favoring males emerging in the high school years; effect sizes for gender differences are uniformly <0.10 for grades 10 and 11….”

            6. You quote the following passage from Hyde et al. (2008): “Greater male variance is indicated by VR > 1.0. All VRs, by state and grade, are >1.0 [range 1.11 to 1.21 …]. Thus, our analyses show greater male variability, although the discrepancy in variances is not large…” In keeping with their general indifference to statistical significance, the authors provide no p-values to accompany their results (which is very poor form in any empirical study). Coincidentally, the very next issue of Science published the study by Stephen Machin and Tuomas Pekkarinen, “Global sex differences in test score variability,” Science, 322:1331-1332, 2008. It repeated much of the analysis performed by Hyde et al. (2008), using PISA data for 2003. However, the authors also had the good sense to disclose the statistical significance of their results. To summarise, for the U.S. alone, they found d-statistics for reading and mathematics of -0.32 and 0.07, respectively, with p<0.01, in both cases (i.e. both results are significant at the 1% level). They also found variance ratios of 1.17 for reading and 1.19 for mathematics, with p<0.01, in both cases. Finally, the d-statistics (gender gaps) for mathematics above the 95th percentile and below the 5th percentile were found to be 0.22 and -0.11, respectively (p<0.01).

            7. Hyde et al. (2008) simply announce that variance ratios between 1.11 and 1.21 are “not large.” Really? Suppose boys’ and girls’ maths scores are normally distributed with the same mean, and a variance ratio of 1.2. Then a quick calculation reveals that boys will outnumber girls by nearly 2:1 in the top percentile of maths performers (and the ratio grows the further out into the tail you go). Since those children are most likely to enter high-earning professions, how can anyone argue that such values are economically insignificant?
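            That quick calculation is easy to verify; here is a sketch using scipy (the 50/50 pooling of boys and girls is my own assumption):

              from scipy.optimize import brentq
              from scipy.stats import norm

              sd_g, sd_b = 1.0, 1.2 ** 0.5  # equal means, variance ratio of 1.2

              # Cutoff that exactly 1% of the pooled population scores above.
              t = brentq(lambda x: 0.5 * norm.sf(x, 0, sd_g)
                                   + 0.5 * norm.sf(x, 0, sd_b) - 0.01,
                         0.0, 10.0)

              ratio = norm.sf(t, 0, sd_b) / norm.sf(t, 0, sd_g)
              print(f"boys per girl above the 99th percentile: {ratio:.2f}")  # ~1.8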

            8. However, variance ratios are not even that interesting – we're really interested in the tails of the distributions. (Recall that my initial summary of the results in Stoet and Geary (2013) said "…the left-hand tail (i.e. the worst performers) reveals little difference between boys and girls, while the right-hand tail (i.e. the best performers) is dominated by boys." I said nothing about variance ratios.) Unfortunately, Hyde et al. (2008) undermine your argument a bit on this score, by reporting that boys outnumber girls by 2:1 in the top percentile of maths performers.

            9. You write: "Ok, so there is some evidence for variance, but it’s not on the scale that would explain real-world gender disparities and there might be a strong cultural effect driving it, instead of biology." Here I agree with you. Even if boys outnumber by girls by 2:1 in the top percentile of maths performers, we should still expect to see around 33% of scientists, engineers and mathematicians being women. I also agree that the data is insufficient to infer a biological explanation for the gender gap in maths test scores – there is more likely a constellation of environmental factors at play (Stoet and Geary (2013) argue that boils down to issues with pedagogy). In any case, irrespective of the causes of the gender gap in maths, there is no doubt that it is a socially harmful phenomenon, and we have every incentive to eliminate it. Where I disagree with you is on the question of whether the gap exists in the first place, and on its economic and statistical significance; I think it is a robust empirical feature – especially in the tails of the distribution – and while denying its existence may be comforting, you can't solve the problem that way.

            10. You write: "So while that might be a nice study, when you gather together all studies you get a different picture." I think you're wrong about this. As far as I can tell, the claim that the distributions of mathematics scores for boys and girls are identical represents a minority view. Very few serious studies make such a claim – not even Heyde et. al. (2008), when you consider their evidence on the tails.

            • Unfortunately, you probably haven’t chosen the best article upon which to base your case.

              Fair enough, I’ll use yours instead.

              Curiously, you omit the very next sentence: “However, measurable differences existed for complex problem-solving beginning in high school years (d=+0.29 favoring males), which might forecast underrepresentation of women in science, technology, engineering, and mathematics (STEM) careers.”

              I omitted it because it was irrelevant. The corollary of “regression to the mean” is that “a subset or sample can vary significantly from the mean.” Finding one subset of mathematical skill that favors men does not contradict the theory that there are no sex differences overall. What it does do is demonstrate the variability in the dataset, which is associated with cultural explanations (as the realist position assumes the difference is universal and thus should not vary), so at best you’ve just shown cultural factors can cause Cohen’s d to vary by about 0.29. This will come in handy later.

              Sure, but that only deals with economic significance; what about statistical significance (which is probably more important in this case)? The authors are completely silent on this issue.

              You don’t seem to understand what statistical significance means. Suppose I demonstrate to a very high statistical significance that flipping a specific coin will result in heads 51% of the time. Should I stop using it for flipping? Probably not, as most coin flips are one-off events, and the consequences of having a slight bias are negligible. Statistical significance only tells us how unlikely it is that results like these would arise by chance alone; it says nothing about how those results came about or what they mean in daily life.
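              To make the coin example concrete (the numbers are invented for illustration), here is a sketch using scipy:

                from scipy.stats import binomtest

                flips, heads = 1_000_000, 510_000  # a coin that lands heads 51% of the time
                print(binomtest(heads, flips, p=0.5).pvalue)  # astronomically small: "significant", yet trivial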

              Also, what I linked to was a two-page summary published in Science. There wasn’t enough room there to cover the subtleties. Hyde has covered your basic argument about variance; in fact, she did it over thirty years ago.

              Assuming that engineering requires a high level of spatial ability, can the gender difference in spatial ability account for the relative absence of women in this profession? The above findings of such a small gender difference would appear to argue that the answer is no. However, the question has now shifted from a discussion of overall mean differences in the population to differences at the upper end of the distribution. And relatively small mean differences can generate rather large differences at the tails of distributions, as the following sample calculation will show. Assume, conservatively, that the gender difference in spatial ability is .40 SD. Using z scores, the mean score for males will be .20 and the mean for females will be —.20. Assume also that being a successful engineer requires spatial ability at least at the 95th percentile for the population. A continuation of the z-score computation shows that about 7.35% of males will be above this cutoff, whereas only 3.22% of females will be. This amounts to about a 2:1 ratio of males to females with sufficient ability for the profession. This could, therefore, generate a rather large difference although certainly not as large a one as the existing one.

              The disparity would become even larger if one considered some occupational feat, such as winning a Nobel prize or a Pulitzer prize, that would require even higher levels of the ability. For example, suppose that spatial ability at the 99.5th percentile is now required. The same z-score calculations indicate that .85563% of males and .27375% of females would be above that cutoff, for an approximate 3:1 ratio of males to females. Once again, though, this is not nearly a large enough difference to account for the small proportions of women winning Nobel prizes.
              Hyde, Janet S. “How large are cognitive gender differences? A meta-analysis using ω² and d.” American Psychologist 36.8 (1981): 892.
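              Hyde’s arithmetic is easy to reproduce; here is a minimal sketch of the quoted z-score computation:

                from scipy.stats import norm

                d = 0.40                    # Hyde's assumed gender difference, in SD units
                mu_m, mu_f = d / 2, -d / 2  # male and female means of +0.20 and -0.20

                for pct in (0.95, 0.995):   # the 95th and 99.5th percentile cutoffs
                    z = norm.ppf(pct)
                    males = norm.sf(z - mu_m)    # fraction of males above the cutoff
                    females = norm.sf(z - mu_f)  # fraction of females above the cutoff
                    print(f"{pct:.1%}: males {males:.2%}, females {females:.2%}, "
                          f"ratio {males / females:.1f}:1")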

              This is a finding she repeats nearly thirty years later: “Gender differences in math performance, even among high scorers, are insufficient to explain lopsided gender patterns in participation in some STEM fields.”

              This brings me to your points 7 through 9. Note that in order to do the above calculations, Hyde assumed those d values were entirely explained by biological factors. As you pointed out above, though, they’re within the range of what can be generated by social factors. So the central assumption behind her calculation is not established, and thus we can wave the entire thing away with Hitch’s Razor: that which can be asserted without evidence can be dismissed without it. The same applies to your calculations.

              First, the authors had data for 10 grades from 10 states, yet they’ve calculated only 36+21+9=66 d-statistics; you would expect 100 such values.

              If all ten states returned scores for all ten grades, that is. Look at the N values on the main chart; Grade 4 has 763,155 samples, while Grade 5 has 929,155. Is it more likely there was a massive demographic bump that caused an increase in the birth rate by 160,000 over those ten states, or that some of them didn’t return test scores below Grade 5? This guesswork is confirmed in the chart on page two, which breaks down the results by state and grade. Wyoming has six gold squares, indicating that it returned only six grades worth of data.

              However, averaging d-statistics across completely different distributions is pretty meaningless. To see why, note that girls and boys may well develop at different rates from Grade 2-Grade 11, so that overall test score distributions could vary substantially over time.

              Hold up, I thought you were arguing for an overall sex difference? Now you’re shifting the goalposts, quietly switching your hypothesis to another one that argues for transitory sex differences during development. It doesn’t help that your own dataset is drawn from an even narrower age range (“the exact age for inclusion is 15 years and 3 months to 16 years and 2 months”), so by taking this tack you’ve also defanged your own citation.

              In keeping with their general indifference to statistical significance, the authors provide no p-values to accompany their results (which is very poor form in any empirical study).

              The p-value represents the probability of seeing results at least as extreme as these if the null hypothesis were true, rather than the testing hypothesis; in this case, the null hypothesis is that there is no correlation between sex and math scores. Hyde’s hypothesis is that there is no correlation between sex and math scores.

              Hyde’s hypothesis is the null hypothesis, thus it carries no p-value. And as I hinted at before, p-values are overrated.

              Statistical significance is the least interesting thing about the results. You should describe the results in terms of measures of magnitude –not just, does a treatment affect people, but how much does it affect them.

              -Gene V. Glass

              The primary product of a research inquiry is one or more measures of effect size, not P values.

              -Jacob Cohen

              These statements about the importance of effect sizes were made by two of the most influential statistician-researchers of the past half-century. Yet many submissions to Journal of Graduate Medical Education omit mention of the effect size in quantitative studies while prominently displaying the P value.
              Sullivan, Gail M., and Richard Feinn. “Using effect size-or why the P value is not enough.” Journal of Graduate Medical Education 4.3 (2012): 279-282.

              But back to you:

              As far as I can tell, the claim that the distributions of mathematics scores for boys and girls are identical represents a minority view.

              And yet the very study you cite argues for gender similarities rather than differences. Don’t believe me? Ask yourself this: what are the effect sizes that paper found?

              Stoet G, Geary DC (2013) “Sex Differences in Mathematics and Reading Achievement Are Inversely Related: Within- and Across-Nation Assessment of 10 Years of PISA Data.” PLoS ONE 8(3): e57988. doi: 10.1371/journal.pone.0057988

              Don’t see them? That’s because the authors buried those numbers in a misleading chart, and hid the decryption key two paragraphs back:

              For all analyses, we express sex differences in PISA score points. These scores are not “raw” scores, but result from a statistical analysis that normalizes student scores … such that the average student score of OECD countries is 500 points with a standard deviation of 100 points. The advantage of this is that scores become easily comparable and differences easily to interpret. For example, a 10 point difference between boys and girls reflects approximately 1/10th of a standard deviation.

              There’s no indication of sample sizes or standard deviations for boys and girls separately, so we have to assume they’re roughly equal and approximate Cohen’s d. That makes the calculations easy here; just take the point difference and divide it by 100. So for the gender gap in median math performance, d is constant at roughly 0.1, and for median reading we get values that increase from 0.3 to 0.4, ish.

              OK, so what do those numbers mean? Here’s a handy way to visualize them; adjust the slider to the effect size you want, and watch the distributions and numbers change. The percentage of overlap is the area where both distributions visually overlap, relative to all samples. My favorite number on that page is the “probability of superiority;” invented by KO McGraw and SP Wong in 1992, it asks a simple question: if you picked a random thing from sample A, and a random thing from sample B, what are the odds that A would be “superior” to B? I invented a similar-but-slightly-different metric less than a year ago, “predictability:” if I plucked a random value from the total pool of samples, how accurately could you predict which distribution it came from? My metric tends to be a smidge smaller, but in the same ballpark; calculate it by subtracting the percentage of overlap from 100%, dividing the result by two, and adding that to the 50% you would get by guessing blindly.
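              If you’d rather check those numbers without the slider, here is a sketch of all three metrics under the same equal-SD normal approximation (the “predictability” formula is my reading of the description above):

                from math import sqrt
                from scipy.stats import norm

                def metrics(d):
                    overlap = 2 * norm.cdf(-abs(d) / 2)       # overlapping coefficient
                    superiority = norm.cdf(abs(d) / sqrt(2))  # P(random A beats random B)
                    predictability = 0.5 + (1 - overlap) / 2  # chance plus half the non-overlap
                    return overlap, superiority, predictability

                for d in (0.1, 0.4):  # the math gap and the 2009 reading gap estimated above
                    o, s, p = metrics(d)
                    print(f"d={d}: overlap {o:.0%}, superiority {s:.0%}, predictability {p:.0%}")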

              Sorry, I got distracted. Point is: even a Cohen’s d of 0.3 isn’t much of a difference, and yet the overall gender differences in this paper struggle to hit that mark. For math, there’s a 96% overlap between boys and girls, a 53% probability of superiority, or a “predictability” of 52%. For the 2009 overall reading difference of 0.4, we find an overlap of 84%, a 61% probability of superiority, or a “predictability” of 58%. Not impressive.

              But it gets worse, because those are overall differences. There’s very little genetic difference between human beings, so any variation we see in the numbers should be indicative of culture, not biology. So to get a true sense of who’s in charge, we need to consider the variation between countries as well. A look at the relevant chart shows a huge amount of variation. Math differences, translated back into Cohen’s d, range between 0.32 and -0.175; reading differences range between 0.7 and 0.05. Those ranges span about 0.5 and 0.6, respectively, or twice the overall mean difference for each category. Culture not only has quite a bit more sway over scores than biology, it looks capable of obliterating any biological gap.

              Except the picture is even worse than that. Which countries were sampled?

              The number of countries contributing to the PISA data sets include both OECD and OECD-partner countries. The number of participating countries/regions (e.g., Hong Kong) has increased to 74 in 2009

              “Organization for Economic Cooperation and Development.” Not only were the sampled countries skewed towards the richest on the planet, they were also skewed towards countries that engage in substantial economic trade with one another… and this would tend to homogenize their cultures. But sitting behind the assertion “the central value represents biological tendencies” is the assumption “all our samples were taken from heterogeneous cultures, representative of humanity as a whole.” That plainly isn’t true, and worse still, the paper conveniently provides evidence both of this homogeneity, and that a more heterogeneous sample would show a smaller gender gap.

              It’s all in this chart. I’ll let the paper speak for itself here:

              The OECD countries not only have higher overall scores, their mathematics gap, favoring boys, is more tightly clustered between -5.5 and 17.5 points (M = 10.5, SD = 5.1). The two outliers are Iceland (HDI rank = 2, GGGI rank = 1) and Georgia (HDI rank = 61, GGGI rank = 40). In contrast, there is considerable variability in the non-OECD countries (between -15.0 and 30.0 points, M = 5.4, SD = 10.5), with boys having higher mathematics achievement in some of them (e.g., Costa Rica) and girls having higher mathematics achievement in others (e.g., Albania).

              The included “non-OECD” countries still skew towards the rich and trade-happy, but nonetheless form a more culturally heterogeneous sample. And they demonstrate both a smaller gender gap and greater variation when separated out from the official OECD countries.

              I could go on (students demonstrate a much greater gender gap than the general population, for instance, and I never discussed sample sizes), but this comment is getting a bit long. So, let’s cut it short with a summary of what your own citation demonstrates:

              – It demonstrates the gender differences in math scores are negligible, and small-ish when looking at reading scores.
              – It demonstrates cultural factors have a greater influence than any biological ones, and that they’re capable of wiping out any biological difference.
              – It is drawn from a pool of rich countries that have some degree of cultural homogeneity, and demonstrates the central scores are likely contaminated by cultural overlap, further minimizing the influence of biology.

              The authors were oblivious to all this, too, as they approached their dataset from the assumption of difference, rather than similarity.

          • @Hj Hornbeck wrote: “… I will note that while the general idea that men exhibit more variance than women has been around for a hundred years, the magnitude of the variance seems inversely proportional to the sample size…”

            The idea that variance estimates could be inversely proportional to sample sizes doesn’t make much sense. After all, the standard variance estimator is unbiased. Of course, the standard errors of estimated variances certainly do depend on sample sizes, which in turn means that the statistical significance of the estimates depends on sample sizes as well. But any perceived sample size bias is bound to be spurious.

            What could happen (but you’d have to provide evidence for this) is that more modern studies are progressively using larger datasets, while *population* variances are simultaneously decreasing (for exogenous reasons).

            You wrote: “Boys tend to disproportionately drop out of school, for instance, which will artificially boost variability (as low-scorers will be encouraged to stay, while high-scorers desire to stay).” That’s hard to believe. Drop-outs are bound to affect the left tails of performance distributions disproportionately, as the poorest students opt out of a school education. Since truncating left tails should reduce variances, the effect should be the opposite of what you describe.
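            The direction of that truncation effect is easy to check by simulation (the 10% drop-out rate here is an arbitrary illustration, not anything from the data):

              import numpy as np

              rng = np.random.default_rng(0)
              scores = rng.normal(500, 100, 1_000_000)  # the full cohort
              stayers = scores[scores > np.quantile(scores, 0.10)]  # weakest 10% drop out

              print(scores.std(), stayers.std())  # the truncated group has the smaller SD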

            In any case, survivorship bias is not a big issue with the PISA data, since schooling is mandatory until the age of 15 in all participating countries.

  • I’m glad you kept bugging me about it. This was originally a 3-post series on statistical significance but I had so much trouble with it and it went through 2 complete rewrites before I decided to just strip it of all but the Bayes stuff. Explaining math concepts in writing is so much more difficult than explaining it to someone in person.

  • Yes! That would be fantastic! I’m glad I succeeded in making Bayes understandable and fun. <3

  • Yes! That is… a restating of my entire post? The post was disproving the myth that if a test reaches statistical significance then it must be a true effect and vice versa. I apologize if it wasn’t clear. I did start out with some misconceptions so that I could later rip them apart.

    I do disagree that P(no psychic powers) is unknowable. We…[Read more]

  • Back in 2010, researcher Daryl Bem at Cornell University supposedly proved that psychic precognition exists. He did a variety of tests on college students in which he reversed some common psychological tests and found that the students were able to predict something that had not yet happened. For example, he put pornographic photos behind one of two curtains. The students had to guess which curtain the photo was behind before seeing the answer, and they were correct a statistically significant amount of the time. When statisticians use the term “statistically significant” they are typically referring to the p-value. If the study had a p-value cut-off of 5%, that would mean that if students were truly guessing randomly and did not have any psychic powers, they would still “pass” the test 5% of the time just by coincidence. So, supposing no bias, intentional or otherwise, answer the following question:

    If Bem’s study on psychic powers had a p-value of 5%, what is the probability that the students were able to “pass” the test merely by coincidence rather than a true psychic ability?

    My guess is that you probably answered this question with “Uh…Jamie, you already told us. There would be a 5% probability that the students passed the psychic test by coincidence.” Now, let me ask you a slightly different version of the same question:

    Do you think it is more likely that the students in Bem’s study are truly psychic or that they just got really lucky?

    Since you are currently reading Skepchick, you’re probably a skeptic and probably do not believe in psychic powers, so you likely answered this question by saying that you believe that the students in Bem’s study are not actually psychic. However, this means that you think it’s more likely that the students fell into the 1 in 20 tests that would pass by coincidence rather than that they are actually psychic. In other words, you believe there is an over 50% chance that the students are not psychic even though you previously said that according to the p-value there was only a 5% chance of the study being wrong. Regardless of what the p-value says, you know that the chance of there being real psychic powers is extremely unlikely based on the fact that there is no previous evidence of psychic powers. The problem with p-value is that it doesn’t take any past evidence into consideration.

    Black Cat
    I know some of you may get anxiety around math, so I’ll be peppering the rest of this post with photos of my cat. If you ever start to feel anxiety, just look at my cat!

    So, if using a basic p-value for statistical significance of such an unlikely phenomenon as psychic powers is flawed, then what can we use? Well, this is where Bayes’ Theorem comes in. You see, Bayes’ Theorem is a way of considering our prior knowledge in our calculation of probability. Using Bayes’ Theorem we can calculate the probability that the students in Bem’s study are really psychic or just got lucky in their guesses, while considering prior evidence as well as the new evidence. Lucky for us, Bayes’ Theorem has a simple formula:

    Bayes' Theorem Formula

    A formula is not very intuitive though, so let’s just ignore it for now, because an equation is not needed for actually understanding Bayes’ Theorem. Instead of looking at a formula, let’s forget about Bem and his psychic students for a second and instead talk through the following scenario. One of our newest contributors here at Skepchick is Courtney Caldwell. The posts she’s done so far are pretty good, but what if she’s not who she says she is? What if she’s actually a spy! I mean, she says she’s a skeptic and a feminist and all that, but can we really know? She could have joined Skepchick just to spy on us for …. ummm….. whatever. Look, the point is that we just don’t know that she’s not a spy.

    You see, it turns out that in this data that is totally 100% true and not something I’m just making up for purposes of this scenario, 15% of new writers at Skepchick turn out to be spies. To determine if Courtney is one of them, we’ll give her a polygraph test. Polygraphs are correct in determining lying versus truth-telling 80% of the time. Courtney says she is not a spy during the polygraph, but the polygraph tells us she is lying! So, what is the probability that Courtney is a real spy?

    Black Cat
    I have a totally legitimate reason for putting cat photos in a post about Bayes’ Theorem.

    If polygraphs are right 80% of the time and the polygraph says Courtney is a spy, does that mean there’s an 80% chance she is an actual spy? How does the 15% probability of new Skepchicks being spies factor in? To answer this, let’s step back for a second and consider a theoretical scenario where we have 100 new Skepchicks. The following box represents 100 new Skepchicks. The blue skepchicks are the ones telling the truth while the purple ones are the secret spies (15% —  but not to scale in the image just to make it clearer).

    15 Spies and 85 Not Spies

    We don’t know which people fall in the purple box or which in the blue, so we give them all a polygraph. They all claim they are not spies. The polygraph is correct 80% of the time, so out of our spies, 80% of them will be correctly identified as spies, while 20% will be misidentified as telling the truth even when they are not. In the following box, you can see that the 80% of the true spies that the polygraph labels as spies are represented in a red box and the 20% of the true spies that the polygraph has mislabeled as not spies are in the green box.

    3 Spies the polygraph says are not spies and 12 spies the polygraph says are spies

    However, the spies aren’t the only ones that took the polygraph. All the new Skepchicks that were telling the truth about not being a spy also took the test. Because the polygraph is correct 80% of the time, 80% of the true Skepchicks were correctly labeled as not being spies, whereas 20% of them were misidentified as spies. The following box adds the polygraph results for the truth-tellers. Those in the green boxes were identified as telling the truth by the polygraph whereas those in the red boxes were identified as being lying liars.

    68 skepchicks are identified as not spies and 17 are misidentified as being spies

    However, we don’t know which Skepchicks are actual spies versus those that are telling the truth. All we know are their polygraph results. When considering the above box, all we can see is whether a person falls in the green box (pass polygraph) or red box (fail polygraph). For the sake of clarity, let’s label each of those boxes with a letter to make it easier to identify which box I am talking about at any given moment.

    Bayes' Theorem Figure 4

    Let’s consider Courtney again. Courtney failed the polygraph, so what is the probability she is a real spy?

    Considering the above boxes, you can see that Courtney falls only in one of the bottom two red boxes, boxes C or D, with her failed polygraph. We know that she is either a spy correctly identified by the polygraph (box C) or she is telling the truth but received a false positive on the test (box D). We don’t need to consider box A or B at all because we already know she failed the polygraph. Another way to phrase our question then would be: Given that we know Courtney failed the polygraph, what is the probability that she is in box C versus box D? Or, you can think of it as: C/(C+D) =  probability Courtney is a spy!
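    If it helps to see the tally written out, here is the box arithmetic as a few lines of Python (the numbers are the ones from our scenario):

      spies, truthful = 15, 85        # out of 100 hypothetical new Skepchicks
      box_c = spies * 0.80            # 12 spies correctly flagged by the polygraph
      box_d = truthful * 0.20         # 17 truth-tellers falsely flagged

      print(box_c / (box_c + box_d))  # 0.4137... i.e. a ~41.4% chance of being a spy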

    Black Cat in Box
    Some boxes may have cats in them.

    In fact, we already have enough information to calculate the size of each box.

    Box C is the probability that Courtney is a spy (15%) multiplied by the probability the polygraph is correct (80%): 15% * 80% = 12%

    Box D is the probability that Courtney is not a spy (85%) multiplied by the probability that the polygraph is wrong (20%): 85% * 20% = 17%

    We can now easily get C/(C+D) → 12%/(12%+17%) = 41.4% and…HOLY SHIT YOU GUYS WE JUST DERIVED THE FORMULA FOR BAYES’ THEOREM! Seriously, this complicated thing, which is the formula from before modified to incorporate our spy scenario:

    P(spy | failed polygraph) = P(failed polygraph | spy) * P(spy) / P(failed polygraph)

    just means that the probability that Courtney is a true spy given that she failed the polygraph is equal to the probability that the polygraph says she is a spy and she is a spy divided by the probability that the polygraph says Courtney is a spy regardless of whether she is or not. It’s just C/(C+D) in our box figure: the numerator is box C, and the denominator is box C plus box D.

    So, what does this mean exactly and how does it relate to Bem’s psychic research? The probability of Courtney being a spy pre-polygraph was 15%, but since we know she failed the polygraph, the probability she is a spy went up to 41.4%. That means that even though the polygraph says that Courtney is a spy and the polygraph is 80% accurate, it is still more likely that Courtney is not a spy. In fact, there is a 58.6% chance that Courtney is not a spy! Yay!
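
    If you’d rather see the box arithmetic as code, here is a minimal Python sketch of the spy scenario, using the made-up numbers from this post:

        # Made-up numbers from the spy scenario above.
        p_spy = 0.15      # prior: P(spy) among new Skepchicks
        p_correct = 0.80  # polygraph accuracy, for spies and non-spies alike

        box_c = p_spy * p_correct              # real spy, flagged as a spy
        box_d = (1 - p_spy) * (1 - p_correct)  # not a spy, falsely flagged

        # P(spy | failed polygraph) = C / (C + D)
        print(box_c / (box_c + box_d))  # 0.4137..., i.e. ~41.4%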

    You can apply Bayes’ Theorem to any type of “test” where a true positive result is quite rare. For example, the fact that breast cancer is rare is why, if you get a positive result from a mammogram, you still have a pretty big chance of not having breast cancer even if the test is fairly accurate. It’s also why the World Anti-Doping Agency’s protocol requires athletes who test positive for doping to take a second, different test rather than just trusting one result. Psychic abilities are almost infinitely more unlikely to be real than breast cancer or doping, so even though Bem’s research passed a 5% p-value threshold, it’s more likely that the test was wrong than not.
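
    To make the rarity point concrete, here is the same arithmetic with screening numbers invented purely for illustration (they are not real mammography statistics):

        # Hypothetical screening numbers, for illustration only.
        p_disease = 0.01       # the condition is rare
        sensitivity = 0.90     # P(positive test | disease)
        false_positive = 0.09  # P(positive test | no disease)

        true_pos = p_disease * sensitivity            # box C
        false_pos = (1 - p_disease) * false_positive  # box D

        # Even with a fairly accurate test, most positives are false alarms.
        print(true_pos / (true_pos + false_pos))  # ~0.092, i.e. about 9%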

    In fact, now that we know the formula for Bayes’ Theorem we can calculate it. Let’s say the chance of psychic powers being real was 1%, which is frankly a lot higher than the lack of evidence for such a phenomenon would suggest. Let’s further say that Bem’s test has a 5% p-value, which is close to what he claims to have had on some of his tests. The p-value may suggest that there is a 95% chance that Bem’s students passed the test because they are truly psychic, but we can use Bayes’ Theorem to consider the previous evidence regarding psychic powers.

    Black Cat

    Going back to our boxes, we want to calculate the values of C and D.

    Box C = The probability psychic powers are real (1%) multiplied by the probability Bem’s test is correct (95%) = 95% * 1% = 0.95%

    Box D = The probability psychic powers are not real (99%) multiplied by the probability Bem’s test is wrong (5%) = 5% * 99% = 4.95%

    Bayes’ Theorem = C / (C+D) = 0.95%/(0.95%+4.95%) = 16.1%

    In other words, even in extremely favorable conditions where the chance of psychic powers existing is 1% and Bem had a 5% p-value, the probability that psychic powers are real is still only a hair above 16%. The probability that psychic powers exist is much, much lower than 1%, and none of this rules out the rather high probability of there being some sort of bias in the test protocols. It’s extremely likely that the positive result he got on his research is not a real one, and in fact future researchers have been unable to replicate his results. Bem is certainly not the only researcher to “prove” a phenomenon that does not actually exist. This is why many science-based researchers and skeptics have argued that Bayes’ Theorem should be used when considering whether results are statistically significant in research involving paranormal phenomena that are extremely unlikely to exist. Next time a study like Bem’s is released and you read articles asking why p-values were used instead of Bayes’ Theorem, you’ll know exactly what they are talking about.
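
    If you want to play with the numbers yourself, here is the whole calculation as a small Python function. The 1% prior and the 5% error rate are the assumptions from this post, not anything Bem reported, and the tiny prior in the last line is just an illustration:

        def bayes_posterior(prior, p_correct, p_wrong):
            """P(real effect | positive test), i.e. C / (C + D)."""
            c = prior * p_correct      # real effect, and the test says yes
            d = (1 - prior) * p_wrong  # no effect, but the test says yes anyway
            return c / (c + d)

        # The generous assumptions from this post: 1% prior, 5% error rate.
        print(bayes_posterior(0.01, 0.95, 0.05))    # 0.161..., i.e. ~16.1%

        # With a far smaller (and arguably more realistic) prior, the
        # posterior collapses to well under 1%.
        print(bayes_posterior(0.0001, 0.95, 0.05))  # ~0.0019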

    • Great post! Statistics are so interesting, and so confusing to me. Luckily cat pix are soothing to a confused mind. :-)

    • Wait… there was something other than cat pictures in that article? I think my brain has finally blocked out math due to all the trauma it has caused me. Look, I know you don’t want to hear my sad story. It really is sad… look, that’s not important. What is important is that math and I have come to terms with each other: we avoid each other at events we may both show up at, and I retain the right to do basic math, you know, like where money is involved or cooking, but that’s it.

      Oh and I am not psychic… and neither is anyone else…

      okay one last thing I am reasonably sure cats will take over the planet someday opposable thumbs or not.

      I’m not crazy

    • Jamie, I love this. When you explained this in person at SkepchickCON, something clicked and I finally understood Bayes’ Theorem. I’m so glad you were able to make it into a post. Wonderful job.

      • I’m glad you kept bugging me about it. This was originally a 3-post series on statistical significance but I had so much trouble with it and it went through 2 complete rewrites before I decided to just strip it of all but the Bayes stuff. Explaining math concepts in writing is so much more difficult than explaining them to someone in person.

    • That is an excellent, pass-around-in-class-able rundown on Bayes’ Theorem.

      But I want to know more about the 15% of new Skepchick bloggers that turn out to be spies. That’s not really true, is it?

      For one thing, only a certain percentage of spies are ever uncovered. Maybe one in 10. So, that 15% is subject to further analysis…

      • I think I know where the 15% number came from. I think it’s the posterior estimate of the probability of being a spy after observing how suave, sophisticated, witty, and charming your typical Skepchick is. See, spies are known for being all of the above, P(suave | spy) = 1. Based on my totally great guessing skills, I estimate the number of spies employed by the CIA to be around 5,000, which is 0.16% of the U.S. population of 3.2 million people, so with a little rounding P(spy) = 0.0015. And the Skepchicks are clearly in the top 1% of suave, sophisticated, witty, and charming-ness, P(suave) = 0.01. Plugging into Bayes’ Theorem, the probability of being a spy given that you’re as suave, sophisticated, witty, and charming as a typical Skepchick is given by:

        P(spy | suave) = P(suave | spy) * P(spy) / P(suave)
        P(spy | suave) = 1 * .0015 / .01
        P(spy | suave) = .15 = 15%

        So just another example of the power of Bayes’ Theorem!

    • Love this post! It shows the math behind Carl Sagan’s (popularization of the) quote “Extraordinary claims require extraordinary evidence.”

      If you’re making an extraordinary claim — something people think is very unlikely to start with, thus very low P(A) — you need very good evidence (way better than a 5% p-value) for P(A|B) to be convincing.

    • May I show this article to my statistics class? Besides being a beautifully clear depiction of Bayes’ Theorem, it’s pretty damn funny, and I like to think my classmates will get a kick out of it.

    • This is fine as long as the researcher is genuinely open-minded about the truth. Since there is plenty of evidence from previous publications to suggest that Bem’s prior belief on precognition is very close to 100%, attempting to argue with him on this basis is comparable to an atheist arguing with a religious believer. And I suspect – given the number of depressing studies showing the extent of p-hacking and publication bias that seem to be appearing every week – that “doing whatever it takes to get my theory confirmed” instead of “attempting to find anything that might disprove my idea” may be more prevalent than we would like to think.

    • Great explanation! And really important not only for understanding pseudoscience but popular reporting of science more generally.

      Some of my colleagues (in neuroscience) actually advocate for the abolition of the p-value in peer-reviewed work, arguing that it’s a number that’s so prone to misinterpretation, even by professional scientists who ostensibly know what it means, that it is as likely to mislead the reader as it is to accurately communicate the scientific results. But then, I’m at an unusually Bayesian school — this is probably a real minority opinion in the neuroscience community at large, so you may wish to set your priors appropriately (har har I made a stats joke).

    • Excellent article, Jamie. You are my favorite blogger.

    • Great description of Bayes’ theorem! However, you make a very common error in your discussion of p-values. You first state, correctly, that a p-value of 5% means that if the student did not have psychic powers, they would pass the test 5% of the time. Unfortunately, you then restate this conclusion as “there would be a 5% probability that the student passed the psychic test by coincidence”. This is not true.

      In Bayesian terms, it is true that P(passing test | no psychic powers) = 0.05. However, you are interested in P(no psychic powers | passing test). P(A|B) is not the same as P(B|A).

      In plain terms, you want to know whether the students have psychic powers, knowing that the students passed the test. But all you know is that there is a 5% probability of passing the test, knowing that the students have no psychic powers.

      What’s worse is that there is no good way to actually determine whether the students have psychic powers given any experimental results. You can’t use Bayes theorem, because first you would need to determine P(no psychic powers) – and of course, that’s part of the unknown.

      This is a longstanding problem in statistics: scientists are forced to make do with an expression of P(B|A), because P(A|B) is unknowable.

      • You’re actually restating the whole point of the article as though you were contradicting it, which makes me think you’ve misread it — another quick read might be warranted. Jamie put the incorrect interpretation “there would be a 5% probability that the student passed the psychic test by coincidence” in quotes, rhetorically attributed to the naive reader, precisely as the reasonable-seeming but actually-wrong interpretation which she then disproves, using the same reasoning from Bayes’ Theorem that you did.

      • Yes! That is… a restating of my entire post? The post was disproving the myth that if a test reaches statistical significance then it must be a true effect and vice versa. I apologize if it wasn’t clear. I did start out with some misconceptions so that I could later rip them apart.

        I do disagree that P(no psychic powers) is unknowable. We know from a history of tests of psychic powers that they are very, very unlikely to exist (approaching 0%) because no other test of psychic powers has ever found evidence of them. Therefore we would need a lot of tests like Bem’s that all come up with positive results consistently in replications before we could really even think about considering it a real effect. As Carl Sagan said (and a commenter pointed out above) “extraordinary claims require extraordinary evidence.” That’s Bayes’ Theorem in a nutshell.

        • My totally informal understanding is that one way to interpret Bayes’ Theorem is as a way of representing how we should rationally change our beliefs on the basis of new information, though I’m no statistician, and I hope you’ll correct me if I’m wrong, Jamie.

          That is, Skeptics who are familiar with tests of psychic powers might have the prior belief that P(psychic powers) is very, very small. People who are philosophically inclined toward a supernaturalist view of the universe might have a prior belief that P(psychic powers) is very large. People who haven’t ever put any thought into the issue might shrug and guess that P(psychic powers) = 50%. Then, Bem performs his experiment. No matter what your prior belief, you can apply Bayes’ rule to see how your beliefs should change to accommodate the new evidence.

          Part of the problem for Skeptics is that if someone’s prior belief is that P(psychic powers) = 50%, then maybe their posterior belief, represented by P(psychic powers | Bem), should actually be to favor psychic powers. So for an ignorant but rational person, it could actually be totally reasonable to conclude that Bem’s experiment is convincing that psychic powers probably exist. The Skeptic’s job, then, is to show this person that P(psychic powers | Bem AND decades of failed paranormal research AND known physics, biology, and psychology) really is actually very small. A further problem, of course, is that most people (including Skeptics) aren’t actually rational in this way, with all the cognitive biases we all know about, including the bias to favor whatever information comes first. Living as we do in a society where experiments like Bem’s get more attention than negative results do, most people are probably more likely to encounter the individual small positive result before they encounter the huge mass of negative results (if they ever do).

    • In order to use Bayes’ theorem, as you note, you need to calculate Box C and Box D. Both depend on the probability that Bem’s students are psychic, given the test results, i.e. P(students psychic | test results). In other words, you need the type I error rate. You seem to assume that this is equal to the p-value, i.e. 1 – P(test results | students psychic). And you start your calculations with this assumption:

      “The p-value may suggest that there is a 95% chance that Bem’s students passed the test because they are truly psychic”.

      “the probability Bem’s test is wrong (5%)”

      Both of those statements are false. The p-value is *not* the type I error rate. In fact, determining the type I error rate usually requires making a bunch of extra assumptions.

      The p-value only tells you the probability that you *would have* gotten the same results if you assumed the null hypothesis were true. Loosely, the p-value pertains to a hypothetical world, whereas the type I error rate pertains to the real world.

      If I am just confusing you more, this explanation might help:
      http://blog.minitab.com/blog/adventures-in-statistics/how-to-correctly-interpret-p-values

      • Hi StackExchangeUser1,

        I’m not sure why you’re so interested in explaining “errors” which simply aren’t present in the original article. Once again, a more careful reading of what Jamie actually wrote should make it clear that she’s saying the same thing as you. You seem to have a strong… prior… that there must be an error in this article. Why is that?

        You wrote:

        In other words, you need the type I error rate. You seem to assume that this is equal to the p-value, i.e. 1 – P(test results | students psychic).

        But Jamie wrote:

        If the study had a p-value cut-off of 5%, that would mean that if students were truly guessing randomly and did not have any psychic powers, they would still “pass” the test 5% of the time just by coincidence.

        (Emphasis mine.) The p-value cut-off (commonly denoted by alpha) is the false positive rate. By definition. Under standard frequentist hypothesis testing, you get to pick the false positive rate, which is kind of the whole point of having a p-value cut-off. So when you say:

        In fact, determining the type I error rate usually requires making a bunch of extra assumptions.

        I have no idea what you mean.

        So yes, the p-value is not the false positive rate, but Jamie is pretty clearly not claiming that.

        I do agree with you that out of context, Jamie’s sentence “The p-value may suggest that there is a 95% chance that Bem’s students passed the test because they are truly psychic” is a bit misleading. In context, I think it’s pretty clear what she meant (particularly given that the next word is literally “but”), but I suppose she might have been more clear by writing “may seem to suggest”. I’m not really sure what you’re going for by picking apart the phrasing so finely here — under the larger structure of the argument, it’s obvious that she’s making the exact same point that you are.

        This is twice now that you’ve rather condescendingly explained an “error” that was actually just your own misreading of the article. If I am just confusing you more, this explanation might help: http://skepchick.org/2014/09/so-you-want-to-understand-bayes-theorem

    • Hi biogeo, I don’t mean to be condescending. I actually find this discussion quite interesting. But I still think you are wrong, I hope that’s ok.

      You cannot use a p-value to calculate the probability that a single experiment is a false positive. Don’t take my word for it, though:

      “Fisher was insistent that the significance level of a test had no ongoing sampling interpretation. With respect to the .05 level, for example, he emphasized that this does not indicate that the researcher “allows himself to be deceived once in every twenty experiments. The test of significance only tells him what to ignore, namely all experiments in which significant results are not obtained” (Fisher 1929, p. 191). For Fisher, the significance level provided a measure of evidence for the “objective” disbelief in the null hypothesis; it had no long-run frequentist characteristics.”(1)

      It’s true that you could ignore Fisher and, following Neyman and Pearson, use alpha/beta to guide your experiments. In this case, though, you are following an a priori procedure to ensure that your false positive rate never exceeds alpha. You are not allowed to examine individual results to determine the level of evidence. Your experiments could have a 4.999% probability of error or a 0.0001% probability of error, you’ll never know. Again:

      “Moreover, while Fisher claimed that his significance tests were applicable to single experiments (Johnstone 1987a; Kyburg 1974; Seidenfeld 1979), Neyman–Pearson hypothesis tests do not allow an inference to be made about the outcome of any specific hypothesis that the researcher happens to be investigating. The latter were quite specific about this: “We are inclined to think that as far as a particular hypothesis is concerned, no test based upon the theory of probability can by itself provide any valuable evidence of the truth or falsehood of that hypothesis (Neyman and Pearson 1933, pp. 290-291). But since scientists are in the business of gleaning evidence from individual studies, this limitation of Neyman–Pearson theory is severe.”(1)

      To use Bayes’ theorem, you need to know the probability of error. I’m afraid neither the p-value approach (which does not give you the probability of error) nor the alpha approach (which only tells you its maximum value) will work.

      Source:
      (1) http://drsmorey.org/bibtex/upload/Hubbard:Bayarri:2003.pdf

      • Moderator(s), in retrospect I believe I linked to a source pdf which is not yet in the public domain. I’m sorry, I can’t seem to edit it. Please remove the link as necessary.

        The citation follows:

        The American Statistician, Volume 57, Issue 3, 2003. “Confusion Over Measures of Evidence (p’s) Versus Errors (α’s) in Classical Statistical Testing.” Raymond Hubbard & M. J. Bayarri.

      • Sorry for taking a while to respond; I had some internet connection trouble last night.

        I believe that you didn’t mean to be condescending. But, it comes across as condescending when you respond to an article that has the argument structure “You might think X, but actually Y, because Z” by saying “Well, you’re wrong about X, it’s actually Y, because Z.” Of course, a simple misreading means anyone can do that once, but after doing it twice… well, perhaps you’ll understand why that updates my posterior belief.

        It’s totally okay that you think I’m wrong! I agree that this is an interesting topic for discussion, and exploring points of disagreement can be very informative. I’m having a bit of a hard time understanding what exactly you think I’m wrong about, though. You said, “You cannot use a p-value to calculate the probability that a single experiment is a false positive.” But, I never made any such claim. Nor would I make such a claim, because I think you’re absolutely right. In fact, as far as I can tell, no one here has made such a claim, so as far as I know we’re all in agreement here.

        I do appreciate your bringing in some discussion of the Fisher and Neyman-Pearson interpretations of frequentist stats. I will admit that I always have a bit of trouble keeping Fisher’s interpretations straight, as I find them a bit abstruse, so it’s helpful to have it laid out here. I find the Neyman-Pearson interpretation much more straightforward. I will say that I find their epistemology which you quoted (“no test based upon the theory of probability can by itself provide any valuable evidence of the truth or falsehood of that hypothesis”), to be a philosophically sound position to hold, but so restrictive as to have little practical value, and I don’t think it’s an epistemological position we are required to take if we use frequentist statistics (if it were, statistics would be useless for science).

        You wrote:

        To use Bayes’ theorem, you need to know the probability of error.

        Again, in my opinion this is more of an epistemological issue rather than a statistical one per se. Certainly your position that P(B|A) cannot be known absolutely is consistent with the frequentist position, but in general Bayesians are quite comfortable using a less restrictive definition of “knowledge” and estimating P(B|A) according to some reasonable standard. At any rate, this strikes me as a rather technical point on which even professional statisticians disagree, and seems pretty far afield from Jamie’s article.

        • > in general Bayesians are quite comfortable using a less restrictive definition of “knowledge” and estimating P(B|A) according to some reasonable standard.

          Yes, I think this is the crux of the issue! Bayesian interpretation is a wonderful way to express the added value of new knowledge, but it requires a consensus on what the new knowledge actually is.

          For instance, you and I might have a hunch that Bem’s study has a false positive probability of ~5% (i.e. close to the upper limit imposed by alpha). But Bem could counter-argue that our intuition is wrong, and his intuition tells him that his work has only a ~1% false positive probability. There’s no easy way to resolve our differences. If he’s right, then the posterior probability of psychic powers existing would be 50%, not 16%. And note that I’m assuming that Bem is actually *skeptical* of psychics, and *shares* our 1% prior probability of the existence of psychic powers. The problem of priors is yet another issue, which needs no elaboration given your familiarity with the subject.

          In short, to me Bayesian analysis seems best suited when dealing with non-controversial issues (e.g. using a diagnostic test for medical decision-making), but it runs into serious problems when used to back up arguments in an open debate.

          • You wrote:

            For instance, you and I might have a hunch that Bem’s study has a false positive probability of ~5% (i.e. close to the upper limit imposed by alpha). But Bem could counter-argue that our intuition is wrong, and his intuition tells him that his work has only a ~1% false positive probability. There’s no easy way to resolve our differences.

            This is a very interesting point. My expectation would be that good science demands that we take the most conservative position when interpreting our statistics. I.e., if I a priori say, “I will be applying a p-value cutoff of 0.05 when performing hypothesis tests in this study,” then I should also be committed to saying “I will accept the largest possible false positive rate estimate consistent with this cutoff, namely, 5%.” I think I can see why that isn’t a requirement based on the stats alone, but it seems like bad science to me to argue for a more generous false positive rate than what is imposed by alpha.

            In short, to me Bayesian analysis seems best suited when dealing with non-controversial issues (e.g. using a diagnostic test for medical decision-making), but it runs into serious problems when used to back up arguments in an open debate.

            What’s funny is I think the Bayesians I know would make the exact opposite argument! I think they would say that frequentist analysis works fine when there’s general consensus on the methods and prior plausibility of a study’s possible outcomes, but that when there’s less general agreement about how we should proceed in interpreting a study, the Bayesian requirement that you make your assumptions very explicit aids in identifying points of disagreement.

          • For instance, you and I might have a hunch that Bem’s study has a false positive probability of ~5% (i.e. close to the upper limit imposed by alpha). But Bem could counter-argue that our intuition is wrong, and his intuition tells him that his work has only a ~1% false positive probability.

            No, that’s not how hypothesis testing works. I tracked down one of Bem’s papers, and in his pooled results he saw 2,790 hits out of 5,400 trials of “negative stimuli.” The Binomial test calculates the odds of that arising by chance as 0.7%.

            His false positive probability is thus 0.7%. No more, no less, and no room for hunches. Neither of us can assert otherwise without providing additional evidence; I could find evidence he falsified some results, he could find evidence he recorded his trial data wrong, and actually had more hits, and so on.
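
              For anyone who wants to check that number, here is a quick sketch using scipy; the one-sided binomial test against 50% chance performance is an assumption about which test was used:

                from scipy.stats import binomtest

                # 2,790 hits in 5,400 trials, tested one-sided against 50% chance.
                result = binomtest(2790, 5400, p=0.5, alternative='greater')
                print(result.pvalue)  # ~0.0074, i.e. roughly 0.7%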

            • So we’ve been walked through one example, but here’s where I always get stuck: what do we do with that result in the future? If my understanding is correct, the final result gets incorporated into the interpretation of the next test. So consider the two opposing scenarios: Jamie puts the probability of psychic powers existing at 16.1% (using the 5% alpha) based on Bem’s 2010 study and prior evidence, while HJ, working with different assumptions (0.7% alpha), gets a result of 58.9% probability of psychic powers existing. So there’s some disagreement; that’s to be expected, as new data frequently cause controversy.

              We’ll settle it as all good scientists should by running another test. Let’s say Bem redoes his study exactly as he previously had. But this time the results are inconclusive, returning a nonsignificant p-value of 0.30. How would this affect each person’s interpretation that psychic powers exist?

              • Sorry for the delay in response(s)!

                alwayscurious: It’s to be expected that new data cause new controversy. But here, everyone agrees on the new data, and in principle everyone agrees on the aggregate meaning of all of the old data (summarized as the prior probability of psychics existing). Surely we should be able to reach a logical consensus in such a straightforward case?

                biogeo: I’ve been thinking a while about your post, and I think I see a problem. You say that a conservative scientist would “accept the largest possible false positive rate estimate consistent with this cutoff, namely, 5%.” I would suggest that a conservative should only accept that FPR <= 5%. Isn't it more conservative to accept an inequality, with its maddening limitations, than force it into an identity? After all, science would look a lot different if we could actually solve for position using Heisenberg's inequality. Instead, we just accept the inequality, along with its epistemic implications.

                Hj Hornbeck: I'm afraid you've fallen for the same trap (also reinforced by Santosh Manicka). There is a 0.7% probability of getting Bem's results if the students are not psychic. That's not the same as a 0.7% false positive rate (i.e. 0.7% probability that the students are not psychic).

                Let me offer an imperfect analogy. Most medical researchers are familiar with the concepts of sensitivity, specificity, positive predictive value, and negative predictive value. Specificity, in particular, tells you what to expect *when you already know there is nothing to see*. Specificity can't be used to make useful predictions by itself. For that you need to combine it with an extra bit of information about the real world – disease prevalence – and derive the negative predictive value.

                P-values are like specificity. They tell you what to expect when you already know there is nothing to see. Sadly, unlike specificity, it's very difficult to derive any predictive value from p-values. There is no good analogue for disease prevalence in experimental science.

            • StackExchangeUser1 wrote “But here, everyone agrees on the new data, and in principle everyone agrees on the aggregate meaning of all of the old data (summarized as the prior probability of psychics existing).”

              Sure, in principle everyone should agree on a prior probability, but in practice they don’t. And there appears to be a debate about the exact nature of the probabilities in between. My question is this: Can a Bayesian approach eventually lead everyone to agree on a conclusion even if they have different starting points?

              If the original prior probability firmly defines the final outcome, then Bayes seems rather weak: I firmly assume that no one will ever get a perfect or agreed-upon prior; different people read different literature, accept different assumptions, etc. But if the original prior probability can eventually be overcome by a collection of evidence, then I can really appreciate its strength. Sit down with someone, run through the same gamut of research results and hammer out a narrowly defined posterior probability. But I’ve yet to find a multi-study example that demonstrates how that would happen. [And I’m not statistically literate enough to work through an effective one on my own]

            • StackExchangeUser1 wrote:

              I would suggest that a conservative should only accept that FPR <= 5%. Isn't it more conservative to accept an inequality, with its maddening limitations, than force it into an identity?

              That’s an interesting point. I’m not particularly accustomed to seeing calculations involving false positive rates worked using inequalities, but it seems like there would be advantages to doing so. This actually fits with some (pretty informal) thinking I’ve been doing on the false negative rate regarding the reporting of negative results in science. It’s unfortunately very common (at least in my field) to read studies in which hypothesis tests yield p > .05, and the authors interpret this as “there is no effect”. Without having thought all that rigorously about it, it seems to me that it would typically be better to report an inequality on the effect size based on the false negative rate. I.e., something like “From our results we conclude that the effect of treatment X on observation Y is unlikely to be greater than E, if there is indeed any such effect.” I suspect it might be challenging to do this in a frequentist framework (though it should be possible), but it seems to my naive perspective a reasonably natural thing to do in a Bayesian framework. It has the advantage of turning a relatively weak negative statement (“we failed to reject the null hypothesis”) into a stronger positive statement (“the effect size is no greater than E”), which may have greater use in comparing similar studies and guiding future studies.

            • always curious:

              Can a Bayesian approach eventually lead everyone to agree on a conclusion even if they have different starting points?

              You bet. Here’s a spreadsheet to help demonstrate that. Download a copy, tweak the variables, and prove it to yourself.
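
              If you’d rather see it in code than a spreadsheet, here is a rough Python sketch of the same idea with made-up numbers: three people with very different priors all update on a run of hypothetical negative experiments, and their posteriors converge toward zero.

                # Made-up likelihoods of a negative result under each hypothesis.
                p_neg_if_psychic = 0.20
                p_neg_if_not = 0.95

                def update(prior):
                    """Posterior P(psychic) after one negative experiment."""
                    num = prior * p_neg_if_psychic
                    return num / (num + (1 - prior) * p_neg_if_not)

                for prior in (0.01, 0.50, 0.99):  # skeptic, shrugger, believer
                    p = prior
                    for _ in range(10):           # ten negative results in a row
                        p = update(p)
                    print(prior, '->', p)         # all three end up near zero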

              StackExchangeUser1:

              There is a 0.7% probability of getting Bem’s results if the students are not psychic. That’s not the same as a 0.7% false positive rate (i.e. 0.7% probability that the students are not psychic).

              I disagree. See my reply to Santosh Manicka for why.

    • Oops, my quotes actually came from a pre-publication version of that paper, and the last line of the second quote was omitted in the final version. The gist is the same, though.

    • Okay. So let me get this straight. You’ve got a 55/45 chance that it’s Bo Hopkins, and not Jerry Reed, on the Burt Reynolds set. Then you’ve got a 33/33/33 probability split that it’s Dom DeLuise, Paul Prudhomme, or Luciano Pavarotti, in Dick Van Dyke’s kitchen behind the opera house. Burt Reynolds meets Dick Van Dyke for a drink. Aida is playing at the Met. Reed and Hopkins have aged equally poorly. There’s a Ford Gran Torino parked out back, big-block 7.0 429. Clint Eastwood is nowhere to be seen. But you hear that ooo-wooo-oo-wooo-oooooooo, from The Good The Bad & The Ugly.

      Quick, who do you toss the keys to, Bo or Jerry?

    • Your cat is adorable and there’s no such thing as psychics because math. That’s what I got out of the post.

    • Actually, you might have just made a good argument for pre-cognition, as I’m shopping around an article on statistical analysis that invokes both the box metaphor and Bayes’ Theorem. Mind you, thinking of probabilities in terms of partitioning the space of all possibilities isn’t all that new or unheard of, so P(coincidence | two articles) > P(precog | two articles).

    • I agree with StackExchangeUser1’s questioning of the assumption that “95% chance that Bem’s students passed the test because they are truly psychic”. The p-value of 5% only suggests that the probability of passing the test (getting some x or more answers to be correct) while being not-a-psychic is 5%. That is Pr(Pass | Not psychic) = 5%. This does not imply that Pr(Pass | Psychic) = 95%, because no assumption is being made about what the “ground truth” is, that is, what Pr(Psychic) and Pr(Not psychic) are.

      To be precise, let me use the following notations:
      P = Psychic; P’ = Not psychic; T = Pass test; T’ = Fail test

      Pr(T) = Pr(T, P) + Pr(T, P’)
      = Pr(P).Pr(T | P) + Pr(P’).Pr(T | P’)

      We only know that Pr(T | P’) = 0.05
      In order to calculate Pr(T | P), we ought to know Pr(T), Pr(P) and Pr(P’).

      If and only if we (arbitrarily) assume these values are 50%, then it would turn out that:

      Pr(T | P) = 1 – Pr(T | P’) = 1 – 0.05 = 0.95 or 95%.

      • The p-value of 5% only suggests that the probability of passing the test (getting some x or more answers to be correct) while being not-a-psychic is 5%. That is Pr(Pass | Not psychic) = 5%. This does not imply that Pr(Pass | Psychic) = 95%, because no assumption is being about what the “ground truth” is, that is what Pr(Psychic) and Pr(Not psychic) are.

        From Bem’s experimental setup, specifically for experiment 2:

        As in Experiment 1, the participant was seated in front of the computer and asked to respond to the two items on the stimulus seeking scale. This was followed by the same 3-min relaxation period. Then, on each of 36 trials, the participant was shown a low-arousal, affectively neutral picture and its mirror image side by side and asked to press one of two keys on the keyboard to indicate which neutral picture he or she liked better. Using the Araneus Alea I hardware-based RNG, the computer then randomly designated one of the two pictures to be the target. Note that the computer did not determine which of the two neutral pictures would be the target until after the participant had registered his or her preference. Whenever the participant had indicated a preference for the target-to-be, the computer flashed a positively valenced picture on the screen subliminally three times. Whenever the participant had indicated a preference for the nontarget, the computer subliminally flashed a highly arousing, negatively valenced picture.

        Bolding mine. Bem carefully set up the conditions so that if people were not precognitive, their success rate would be 50% in the long run. So we know that given an infinite number of trials, Pr( T | P’ ) = 0.5 and Pr( T’ | P’ ) = 0.5. We don’t know Pr( T | P ) or Pr( T’ | P ), but we know neither can equal 50%, and that random variation can cause Pr( T | P’ ) != 0.5 for small numbers of trials. The solution is obvious: run the experiment, calculate Pr( T ), then determine Pr( Pr(T) | P’ ).

        I think that’s where you’re going wrong, by confusing Pr( T | P’ ) for Pr( Pr(T) | P’ ). We cannot dispute Pr(T), if the experiment was flawless, and as there are only two possibilities in the entire probability space (P and P’), it must follow that Pr( Pr(T) | P ) + Pr( Pr(T) | P’ ) = 1.

        Ergo, if Pr( Pr(T) | P’ ) = 5%, Pr( Pr(T) | P ) = 95% by definition.

        • I have two responses to make:

          1) By Pr(T) I meant the probability of *passing the test*. I haven’t read Bem’s paper, but I suspect that passing the test requires getting much more than 50% of the responses to be correct. If that is the case, then Pr(T) would be some value much less than 0.5. But I agree that if one randomly chooses their responses, they would be expected to get approximately 50% of them to be correct, assuming there is no other bias in the questions.

          2) I am not sure what you mean by Pr(Pr(T)). Whatever it might be, it should still satisfy the following if it is a random variable:
          Pr(Pr(T)) = Pr(Pr(T), P) + Pr(Pr(T), P’)
          = Pr(P).Pr(Pr(T) | P) + Pr(P’).Pr(Pr(T) | P’)
          So, determining Pr(Pr(T) | P) still requires knowledge of Pr(P) and Pr(P’). Of course, it makes total sense to assume only P and P’ given the experimental conditions. But that says nothing about what proportions P and P’ are a priori.

          • 1) There is no “pass” or “fail” for the test itself, because Bem had no idea how strong the precognition effect is. All he knows is that it cannot be equivalent to blind-chance guessing, as I said last comment. There were no questions, either, just photos; the methodology portion I quoted is for one experiment, but similar to the others in the paper.

            2) What are the odds of determining the correct image? We can find that by doing multiple rounds, and dividing the total number of correct predictions (T) by the total number of predictions, which is denoted by Pr(T). If the person was guessing by blind chance, without precognition, then Pr(T) -> 50% as the number of trials reaches infinity. We’re not doing infinite trials, though, so we’ll get some number around 50%. So we need to take one more step, and calculate the odds, (Pr( …. )), of getting that successful test odds, (Pr(T)), assuming blind chance (Pr(T) | P), which translates to Pr( Pr(T) | P ).

            I already explained why Pr( Pr(T) | P ) + Pr( Pr(T) | P’ ) = 1, and I hinted that Pr(Pr(T)) = 1 (“We cannot dispute Pr(T), if the experiment was flawless”).
