Afternoon Inquisition

Sunday AI: Correlations

Google Labs has released a new tool:

“Google Correlate is an experimental new tool on Google Labs which enables you to find queries with a similar pattern to a target data series. The target can either be a real-world trend that you provide (e.g., a data set of event counts over time) or a query that you enter.”

Basically, Google takes a search term that you enter (“insects”) and compares its search pattern over time with the patterns of other search terms in its database, calculating a correlation coefficient (r) for each. It’s an extension of Google Trends; it’s looking to see which search terms trend together.

In case you barely remember that statistics class from your misspent youth, the correlation coefficient is a value between -1 and +1. The closer r is to ±1, the more closely the two patterns track each other (a negative value means one rises as the other falls). The closer r is to zero, the less the two patterns are related.
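If you want to see what that number looks like in practice, here is a minimal sketch in Python. It computes the Pearson correlation coefficient for two series and then ranks a few candidate series against a target. The function names and the weekly numbers are made up for illustration, and this is not a description of how Google actually does it at scale.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Deviations of each point from its series mean.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def rank_by_correlation(target, candidates):
    """Return (name, r) pairs sorted from highest r to lowest."""
    scored = [(name, pearson_r(target, series))
              for name, series in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical weekly search volumes, invented for this example.
target = [1, 2, 3, 4, 5]
candidates = {
    "rises with target": [2, 4, 6, 8, 10],
    "falls as target rises": [5, 4, 3, 2, 1],
    "unrelated": [3, 1, 4, 1, 3],
}

for name, r in rank_by_correlation(target, candidates):
    print(f"{r:+.4f}  {name}")
```

The series that moves in lockstep with the target lands at the top of the list, the mirror-image series lands at the bottom with a negative r, and the unrelated one sits near zero.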

Of course, we’ve all heard the “correlation is not causation” trope a million times. It’s especially true here; when you don’t even have a hypothesis about a relationship, the data points are just amusing.

For example, searches for “birds” appear to be correlated with “Eliza Doolittle.”

Of course, you know what I searched for. “Pubic Lice” is most strongly correlated with… a host of Civil War search terms.

I am frankly rather baffled about why these search terms should be seasonally correlated, unless Civil War Re-enactors are taking things a little too seriously in their search for authenticity.

Give it a spin–what fun correlations can you discover with Google Correlate?



Bug_girl has a PhD in Entomology, and is a pointy-headed former academic living in Ohio. She is obsessed with insects, but otherwise perfectly normal. Really! If you want a daily stream of cool info about bugs, follow her Facebook page or find her on Twitter.

  1. The Civil War correlates LMAO!!! It’s just so funny on so many levels.

    I would love to play around with this; it looks more fun than Ngram Viewer, but I have finals next week and am using my non-existent free time to fight creationists.

  2. Seasonal forcing in things like the Civil War, Martin Luther King Jr., etc. follows from the school year. There is a HUGE wintertime dip as kids go on their winter breaks, and a mild summer dip for summer vacation (during which time some kids take summer classes, or have summer reading lists or whatever, hence the less impressive dip).

    I discovered this while comparing “butter” and “exercise” on Google Trends. Huge spike in “butter” around Xmas and Thanksgiving, followed by a spike in “exercise” due to New Year’s resolutions and the post-holiday pounds.

  3. You get that seasonal school-forced pattern with most science topics (confounded by pseudoscience topics, quantum woo, etc). “neuron” for example gives the same thing. Interestingly, “magnets” and “does music affect plant growth” are highly correlated. Science fair stuff, I guess.

  4. “Porn” “Pornography” “Fucking” and “Sex” as well as “Vagina” all come up with

    “No results found for {X}. Suggestions:”

    Seeing as these are easily some of the most searched terms on google, I was hoping they wouldn’t self censor. Thanks for being douchebags, google.

  5. Hi bug_girl and all. I’d like to point out that the way Google Correlate is being used here is not the primary way it’s supposed to be used.

    The real input to it is a time series with values, to generate as output some terms whose search patterns correlate with that time series.

    As a way to make it easy to use right away, it provides the search box so you can input some terms and get out the time series values for that term, which then is automatically fed to Google Correlate to generate the terms matching that trend.

    That part is what has been shown in this post, but the interesting part is to provide external time series that may be interesting like in the case of Flu Trends, cases of flu per day.

    The tutorial explains a bit about how to input the data:

    1. You are absolutely right Isaac. However, we are having a lot of fun using it incorrectly, and I am sure no one thinks there is a causal relationship between these terms.

      Well, I hope no one does.

    Whoops, don’t know how I managed to reproduce ALL of them, since only the ones down to ‘baby scoop’ were even displayed, but… almost all of these are incomprehensible to me!

    Also… ‘montana’! Shudder!

  7. “Science” correlates best with “Kitten behaviour”!
    (but it is still only r=0.74)

    However, “mad scientist” is rather better correlated with “mini lobster” (r=0.93). Fear the venomous flying mini lobsters with lasers!

    Here’s a scary one: “truth”s best correlate is “google adsense”.

    Bizarrely, ‘quark’ has been declining for years, has no obvious academic year correlation, and best matches “midi file” (r=0.98!)

  8. Put in beer and you get an interesting list of correlations including ice cube trays 0.8190, cheap sports 0.8082, and dog benadryl 0.8040; my favorites are cat peeing 0.8131 and, of course, cat pooping 0.7964. This makes perfect sense; beer makes me need to pee more than poop too.
    Did I just overshare?

  9. Skepchick number one correlation was star celeb at 0.9286. Well, that’s really no surprise, the Skepchicks are star celebs!! Oddly, when I added an “s” at the end and tried Skepchicks, it correlated best with doggy steps. Hmm..not sure how to interpret that.

  10. Correlated with penile:

    0.8828 wallpaper designs
    0.8811 ford explorer problems
    0.8808 images for websites
    0.8777 samples free
    0.8774 vortec engine
    0.8708 beads wholesale
    0.8692 find people by phone number
    0.8658 fax number lookup
    0.8632 belly pictures
    0.8596 repair estimates

