Results of The Great Apple Experiment
Since I began my own apple experiment last week, it appears as though now hundreds of apples have been subjected to cruel abuse and endearing flattery in the interest of scientifically testing Nikki Owens’ hypothesis that speaking to apples will affect their decomposition. I’m very interested in how they’re coming along . . . you can follow the progress on the Facebook group or on some YouTube channels like SkepticallyPwned, who really brought the emotion, and Hayley Stevens who FYI is adorable. I’ll post another blog entry to publicize their results (and any others I find with good controls) when they’re finished.
For my own experiment, I used three pieces of the same apple: one was spoken to lovingly, one with hate, and one with neutral statements (“the control apple of indifference”). If we assume that any three pieces of apple will decompose at varying rates, then each piece has a 1/3 chance of landing in its predicted position, so by chance alone we’d expect to see one apple of our three correspond. If Nikki Owens’ theory were correct, though, we would expect that 3 out of 3 apples would correspond, meaning that the love apple would look the best, the hate apple would look the worst, and the neutral apple would be somewhere in the middle.
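For anyone who wants to check the "one out of three by chance" reasoning, here is a minimal sketch that lists every possible ranking of the three pieces and counts how many land in their predicted spot. It assumes that, with no effect at all, every ranking is equally likely; the label names are just placeholders for illustration.

```python
from fractions import Fraction
from itertools import permutations

# Predicted ranking under the hypothesis, from best looking to worst looking.
# These label names are illustrative placeholders.
predicted = ("love", "neutral", "hate")

# Every way the three pieces could end up ranked if treatment has no effect.
orderings = list(permutations(predicted))

# For each ordering, count the pieces that sit in their predicted position.
matches_per_ordering = [
    sum(actual == expected for actual, expected in zip(ordering, predicted))
    for ordering in orderings
]

# Average number of matching pieces across all equally likely orderings.
expected_matches = Fraction(sum(matches_per_ordering), len(orderings))

# Chance that all three pieces match the prediction purely by luck.
p_all_match = Fraction(matches_per_ordering.count(3), len(orderings))

print("possible orderings:", len(orderings))
print("expected matches by chance:", expected_matches)
print("chance of a perfect match:", p_all_match)
```

Under that assumption the average comes out to one matching piece, and a perfect three-for-three match happens in only one of the six possible orderings.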
Read on to discover if that’s what happened.
After I posted my first video, many people offered adjustments to the study in order to control for all variables. These included sterilizing the knife and cutting board to ensure no microbes would transfer to one piece but not another, and holding each jar for the same amount of time to control for variations caused by the transfer of heat from my hands.
The most interesting thing to me was how many of Nikki Owens’ fans offered adjustments, too. They suggested I use “love” and “hate” labels instead of mustaches, because mustaches are inherently funny and would cause the hate apple to “feel” better. They also suggested that I should use halves instead of quarters (because that’s what Owens did), and that I get angrier at the hate apple.
And you know what?
They’re absolutely right.
Many of us might laugh at the idea that mustaches instead of labels might affect the mood of an apple, because let’s face it, that’s completely absurd. The apple slices couldn’t even see the mustaches. If they’d had eyes. But think of it this way: these people, who bought into Owens’ story without question, were questioning me in a very skeptical way. Owens didn’t record her process, or even note how she prepared the apples and jars. She also didn’t acknowledge the statistics involved: with two apple halves, there is near 100% certainty that they will decompose in unique ways, which means that there’s a 50% chance that the “love” apple will look better than the “hate” apple.
My on-call mathematician Matt Parker commented on this statistical quirk, saying, “when people have done the ‘positive words’ experiment at home with two halves of an apple, 50% of people have gotten a strong ‘positive’ result, and 50% a strong ‘negative’. It’s not hard to guess which half went rushing to upload their photos and which half thought they might try it again. It explains the supply of seemingly supporting photos of apples halves.”
The point is that my experiment was much better than Owens’ because I described my process, considered the variables, used a control, and publicized every step. But as many people pointed out, my experiment was far from perfect. Even after I posted the photos here on Skepchick and asked you all to tell me which apple looked the best and which looked the worst, you all quite rightfully pointed out that there were too many shadows in the pics and that each apple slice should have been shot with direct light, preferably from multiple angles.
I think it’s awesome that so many of you and so many of Owens’ fans are thinking about this admittedly silly experiment in a critical manner, because it exposes exactly how much the press and believers missed when Owens presented her “results.”
That said, I’ll be repeating the experiment next week with all the controls suggested.
But first: on to the results!
Your votes, according to Matt, were quite conclusive on both polls:
I’ll quote Matt’s results in full so you can see the statistics:
You don’t have to do much maths to see that Apple 3 has a mighty big bar. I’ve still crunched the numbers and your confidence value is off the proverbial chart. (Also: the literal chart. Yes, I drew a chart. It was off it.)
For both cases, Apple 3 definitely is the best (or least worst). There is a zero chance this is by chance, either it does look the best or everyone is playing a trick on you.
I then discounted damn Apple 3 and just looked at Apples 1 and 2. In both cases the difference between them is “very significant” in a statistical sense. Your voters had a very definite preference for Apple 1 looking better than Apple 2.
If you want to stick some numbers on it, if Apple 1 and 2 actually did look identical and voters were just picking at random which looks best, there is only a 0.36% chance you would see that difference in votes (anything below 0.5% is “very significant”).
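The kind of calculation Matt describes can be sketched as an exact two-sided binomial test against a 50/50 split. This is only an illustration: the post does not give the actual vote counts or say which test Matt ran, so the counts below are made up and the function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def two_sided_p_value(votes_a: int, votes_b: int) -> Fraction:
    """Exact two-sided binomial test against a fair 50/50 split.

    Returns the probability of seeing a split at least as lopsided as the
    observed one if every voter were picking at random.
    """
    n = votes_a + votes_b
    observed_weight = comb(n, votes_a)
    # With a fair coin, each split k has probability comb(n, k) / 2**n, so
    # "at least as unlikely as observed" means comb(n, k) <= comb(n, votes_a).
    extreme_weight = sum(
        comb(n, k) for k in range(n + 1) if comb(n, k) <= observed_weight
    )
    return Fraction(extreme_weight, 2 ** n)


# HYPOTHETICAL vote counts, for illustration only (not the real poll numbers).
hypothetical_apple1_votes = 60
hypothetical_apple2_votes = 35

p = two_sided_p_value(hypothetical_apple1_votes, hypothetical_apple2_votes)
print(f"chance of a split this lopsided from random voting: {float(p):.4%}")
```

Swapping in the real poll counts would show whether the result lands under the 0.5% line quoted above.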
Matt has offered to write more about the statistics for a future post, for you math geeks.
So basically what this means is that Apple #3 looks the best, Apple #2 looks the worst, and Apple #1 is somewhere between the two.
Though my photos weren’t ideal, I will say I photographed each slice in the same place under the same lighting, and I’ll also mention that your results mirror my own in-person notes. Apple #3 was just about as discolored as the other two but had no mold spores. Apple #1 had a spot of mold, and Apple #2 definitely had the most mold and the most discoloration.
The results?
Exactly what we’d expect by chance! Apple #3, voted the best looking, was in fact the love apple. But the control apple, which received no love and no hate, was Apple #2, voted the worst looking. The hate apple, Apple #1, was somewhere in between.
Conclusion? The experiment failed to prove Nikki Owens’ hypothesis that saying loving or hateful things to an apple will affect its rate of decay.
Did you do an experiment? Leave the link in the comments and I’ll include it in a follow-up post!
I wonder if there could be a prayer variant to this test, in that volunteers could pray for the health of one slice, and for the destruction of another.
In any event, this is all wonderful, and while I have not yet done the experiment, I am thinking of doing so.
Also, I was wondering if quantity could be a factor as well. What if I did the experiment at work, and asked my coworkers to add their hate or love to the slices. It might be difficult to control for that, but it could be an interesting variation.
Not that I predict any difference to the result, mind you….
Maybe the apples prefer negative attention to no attention at all. Your apple just doesn’t want to be ignored!
Your experiment might show that saying loving things can have a positive effect, but there’s no negative effect for saying hateful things.
I guess apples have thick skin.
@Zapski:
If you do a prayer experiment, you can’t use apples, you’ll have to use bananas.
My experiment resulted in the quarantined control piece becoming more rotten than the others. If we go with Nikki’s hypothesis this means living on your own in my closet will make you look nasty.
Hmm.
@Hayley: not true. I’ve been living in your closet for years, and I don’t look THAT nasty.
…Ok, maybe I do.
So you do an experiment and prove that when you say loving things to an apple and give it a nice mustache it looks prettier, and you conclude that there’s no effect!!?? Where’s the skepticism? You just demonstrated that 72% of people will think an apple slice looks better when it’s told loving things EVEN WHEN THEY DIDN’T KNOW THAT. I’m citing this study as positive in my next Youtube video.
Happy April 1st.
I wonder if the results were biased by forcing people to choose – I wonder how many people, given a ‘they all look the same to me, maybe Nikki Owen’s talking out of her arse’ option, would have chosen it.
Just something to ponder if anyone repeats the experiment.
@Narvi: I want rent. Now.
I totally predicted a comment like banyan’s would show up. For people like him or Nikki Owens, it simply doesn’t matter what the stats, math or logic show.
That said, this test could have had the result Owens expected and still not have proven her hypothesis correct.
For that, you need to compile the data from tens of trials. I hope you will be able to do that from the other people who were doing this.
And to correct Banyan,
No, this experiment was not to prove (or disprove) that when you say nice things to an apple it looks better. It was to test the hypothesis that “apples are affected by what you say to them.”
What you just did is called cherry picking. It’s pretty much the most basic of fallacies.
@Techskeptic: And what you just did is ignore banyan‘s “Happy April 1st”, also known as the Fool’s Fallacy.
So I left this comment on the video as well…
LOL! I did. I read it on my iPhone and missed that.
[feels small and foolish]
Then I predict someone else will say exactly that same thing and mean it.
@Kimbo Jones: /snort Yes exactly!
@Marsh: I agree… there should be 4 choices available.
@Marsh:
I would have definitely chosen “they all look the same to me” option. The only reason I clicked number 3 as the best looking one was because I thought it looked a little whiter than the rest. As for the other two, I guessed.
@Hayley:
“If we go with Nikki’s hypothesis this means living on your own in my closet will make you look nasty.”
Do you have a problem with gay apples?
Just because they don’t look attractive to you doesn’t mean they’re not attractive to someone.
Rebecca, when you re-do the test, I think you should make a point of periodically eating an apple slice in front of a hate apple. If you have a particularly suspicious apple, it might not believe your hate speech — but if you show how far you’re willing to go, it’ll know you’re not fucking around.
Meh. Techskeptic is right; you need more than one trial before you can conclude “The experiment failed to prove Nikki Owens’ hypothesis” or conclude anything at all, no matter what the statistical strength of the ordering. For 3 apple pieces you have 3! = 6 possible orderings, i.e. 1 chance in 6 that your pieces will be classified in Owens’ ‘right’ order. That they weren’t in a single trial is not terribly surprising; if they had been, that wouldn’t be terribly surprising either. You need a lot more trials to obtain statistics on the correlation between how you speak to apple pieces and how rapidly they age.
That said, if you sliced me into three pieces I would be pretty pissed no matter what you said to me.
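To put rough numbers on the "you need a lot more trials" point, here is a small sketch using the 1-in-6 figure above. It assumes each trial is independent and that, with no real effect, every ordering is equally likely; the function name and the choice of 12 trials are hypothetical.

```python
from fractions import Fraction
from math import comb

# Chance that one three-piece trial lands in Owens' predicted order by luck.
P_CHANCE = Fraction(1, 6)


def p_at_least(successes: int, trials: int, p: Fraction = P_CHANCE) -> Fraction:
    """Probability that at least `successes` of `trials` independent trials
    show the predicted order purely by chance (binomial upper tail)."""
    return sum(
        comb(trials, k) * p ** k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# HYPOTHETICAL number of pooled trials, for illustration only.
hypothetical_trials = 12
for k in range(hypothetical_trials + 1):
    chance = float(p_at_least(k, hypothetical_trials))
    print(f"at least {k:2d} of {hypothetical_trials} in the predicted order: {chance:.4%}")
```

A single trial in the predicted order happens by luck one time in six, so it says very little either way; only when many pooled trials agree does the chance explanation become hard to believe.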
@dahduh: Er, no, in fact I can conclude the experiment failed to prove her hypothesis. Because that’s what happened.
Yes, more experiments would be nice (which is why there are more going on), but the fact remains that this experiment did not support the hypothesis.
The results prove that Love is a much stronger force in the world than Hate.
Duh.
The largest issue here is statistical. As you point out, with only a 50/50 chance, confirmation bias is a very large problem. Adding a control still leaves a great deal of room for chance to slip into the results. What you need is a larger sample size. With many people doing this experiment for you, you can do a metasurvey and attempt to use all of their results together, but that leaves room for many variables and variability of study quality.
What you need is one large study of apples.
Start with a large sample of apples. Say, a dozen.
Sterilize your work area, and wear gloves (without the powder, it may have an effect on the apples and the amount on the gloves would decrease with time).
Sterilize a very sharp knife. A dull knife requires more pressure and could lead to bruising along the edges.
Cut each apple into thirds, or quarters and discard one.
Place each slice into a container that carries the appropriate label but that you cannot see into. This prevents you from biasing the results by subconsciously decreasing praise or criticism if it looks like the apples are changing too much or too little.
Each apple slice should be alone, not with its fellow emotional category members. As apples decay, they give off a gas that causes fruit to decay faster, which is where the saying about one bad apple ruining the bunch comes from. All it would take is one apple decaying faster to skew the results for all of a category.
The boxes of apple slices should be kept in a controlled area where they will all experience the same temperature and humidity. Do not touch or move the boxes when giving them their praise or criticism.
This would provide a far more statistically significant data set, which can easily be further improved by increasing the number of apples used beyond one dozen. With a large enough sample size, you can start to make a statement about whether Ms. Owens’ claims are based on fact, or based on confirmation bias and anecdote.
Now see, the elaborate testing protocols you have to set up just to (dis)prove one stupid inane claim from some batshit crazy woman should help expose the problems in trying to debunk something as complicated and pervasive as homeopathy or acupuncture.
If you want to do a proper test, you’re gonna be spending quite a bit more money and time on this than this one woman and her wasted apple did.
The biggest problem is the creduloids get the cheaper, quicker, easier alternative. Sure, we could set the record straight, but by the time we get our results, they’ve moved on to something else already, and we’re out a bunch of money and time (and probably not a single step closer in convincing anyone). Not to mention lagging behind trying to disprove the next crazy notion on the list.
Will there ever come a time when we’re finally allowed to outright dismiss someone’s harebrained ideas UNTIL they have done this kind of testing themselves?
I agree whole-heartedly. The cost of a dozen apples plus separate containers and the time to prep, sterilize, praise/berate, and then analyze the data is far greater than the investment needed to cut up one apple and then leap to grand conclusions without attempting to analytically examine the results. The cost of the greater experiment need not be borne by the skeptical community, though.
Luckily, in this case the experiment is far easier than doing clinical trials with animals or humans, and far cheaper.
In a case such as this, those who believe Ms. Owen’s theories should leap at the opportunity to prove them. If you are afraid to test it, how solid is your belief? And if it proves false, wouldn’t they rather know?
When presented with an absurd claim, we should analyze that claim and provide a clear methodology for testing it; one that is designed to prove or disprove the principle and that is kept as simple and affordable as possible. The goal is not to make the testing burdensome, but to allow them to test their theories in a way that prevents fallacy as much as possible.
If Ms. Owen or one of her followers wants to repeat the experiment in a controlled and statistically significant way, we should encourage that. If, after doing a proper controlled and statistically significant study, their results continue to support their claim, then, and only then, should the skeptical community make the investment to repeat the experiment.
@Quintero: the only reason we have this problem with homeopathy and acupuncture is because they refuse to follow this exact same mode of operation WRT testing (unlike any other medical product or procedure, for which it is required by law).
Which, come to think of it, is only fair as the company that produces a remedy is also the one that’s going to make a profit off of it.
Homeopathy doesn’t have this burden because they weasel out of the “medicine” definition at first to avoid the testing phase, yet they try to sneak back in later on to reap the benefits of marketing their products as “medicines” or “cures” (even though they never tested this claim and thus don’t deserve that title).
It’s like making a brass plaque and attaching it to your door, then claiming you’re a doctor because the plaque says so, and not because you graduated from university after half a decade of studying.
(Come to think of it, the homeopaths do that too. Sneaky bastards …)
*sigh*
Rigorously, only one instance of testing proves nothing. But what this one actually suggests to me is what many people are saying: indifference affects apples worse than hate. You still have love having an overwhelmingly positive effect.
I do believe with enough experiments you would be able to zero the results around control. I don’t believe emotions affect apples at all. But I don’t feel you pointed in that direction with this one result.
I agree with Elyse. The apples prefer attention but will thrive on more positive attention than negative attention. Imagine if your mum was very loving to you. You would thrive. Imagine if your mum was always criticizing you? You would flag but not as much if she completely ignored you.
The experiment reflects real life. An apple a day keeps the doctor away and always remember to pray before eating. The food appreciates a nice prayer.
Unfortunately all the pictures are broken :(