Afternoon Inquisition

Sunday AI: Correlations

Google Labs has released a new tool:

“Google Correlate is an experimental new tool on Google Labs which enables you to find queries with a similar pattern to a target data series. The target can either be a real-world trend that you provide (e.g., a data set of event counts over time) or a query that you enter.”

Basically, Google takes a search term that you enter (“insects”) and examines search patterns for other search terms in its database to calculate a correlation coefficient (r).  It’s an extension of Google Trends; it’s looking to see which search terms trend together.

In case you barely remember that statistics class from your misspent youth, the correlation coefficient is a value between -1 and +1.  The closer to ±1 the r value is, the more closely correlated the patterns are.  The closer the value to zero, the less the two patterns are related.
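If you would rather see the arithmetic than trust your memory of stats class, here is a minimal Python sketch of the standard Pearson formula, used to rank a few candidate series against a target. The counts, the term labels, and the helper names `pearson_r` and `rank_by_correlation` are all made up for illustration; this is the textbook calculation, not a claim about how Google computes or searches its database.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]  # deviations from the mean
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def rank_by_correlation(target, candidates):
    """Return (r, term) pairs sorted from highest r to lowest."""
    scored = [(pearson_r(target, series), term)
              for term, series in candidates.items()]
    return sorted(scored, reverse=True)


# Hypothetical search counts over five time periods (invented data).
target = [1, 2, 3, 4, 5]
candidates = {
    "rises with the target": [2, 4, 6, 8, 10],
    "falls as the target rises": [10, 8, 6, 4, 2],
    "zig-zags regardless": [1, 3, 2, 3, 1],
}

for r, term in rank_by_correlation(target, candidates):
    print(f"{r:+.2f}  {term}")
```

The rising series scores +1, the falling one scores -1, and the zig-zag scores 0, which matches the scale described above.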

Of course, we’ve all heard the “correlation is not causation” trope a million times. It’s especially true here; when you don’t even have a hypothesis about a relationship, the data points are just amusing.
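To see why a high r means so little when you go fishing without a hypothesis, here is a small illustrative Python sketch. It compares one series of pure random noise against a couple of thousand other random series and reports the strongest correlation it stumbles on. The function name `best_chance_match`, the series length of 10 points, and the 2,000 candidates are arbitrary assumptions chosen for the demonstration; real search data is much longer and is not random noise.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    return cov / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def best_chance_match(n_candidates=2000, n_points=10, seed=0):
    """Largest |r| between one random series and many unrelated random ones."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    target = [rng.gauss(0, 1) for _ in range(n_points)]
    best = 0.0
    for _ in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(n_points)]
        best = max(best, abs(pearson_r(target, candidate)))
    return best


print(f"strongest correlation found by chance: {best_chance_match():.2f}")
```

With enough candidates to choose from, the best match tends to look impressive even though every series here is noise.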

For example, searches for “birds” appear to be correlated with “Eliza Doolittle.”

Of course, you know what I searched for. “Pubic Lice” is most strongly correlated with… a host of Civil War search terms.

I am frankly rather baffled about why these search terms should be seasonally correlated, unless Civil War Re-enactors are taking things a little too seriously in their search for authenticity.

Give it a spin–what fun correlations can you discover with Google Correlate?

 

Bug_girl

Bug_girl has a PhD in Entomology, and is a pointy-headed former academic living in Ohio. She is obsessed with insects, but otherwise perfectly normal. Really! If you want a daily stream of cool info about bugs, follow her Facebook page or find her on Twitter.

26 Comments

  1. The civil war correlates, LMAO!!! It’s just so funny on so many levels.

    I would love to play around with this; it looks more fun than Ngram Viewer, but I have finals next week and am using my non-existent free time to fight creationists.

  2. Seasonal forcing in things like the Civil War, Martin Luther King Jr., etc. follows from the school year. There is a HUGE wintertime dip as kids go on their winter breaks, and a milder summer dip for summer vacation (during which some kids take summer classes or have summer reading lists or whatever, hence the less impressive dip).

    I discovered this while comparing “butter” and “exercise” on Google Trends. There is a huge spike in “butter” around Xmas and Thanksgiving, followed by a spike in “exercise” due to New Year’s resolutions and the post-holiday pounds.

  3. You get that seasonal school-forced pattern with most science topics (confounded by pseudoscience topics, quantum woo, etc). “neuron” for example gives the same thing. Interestingly, “magnets” and “does music affect plant growth” are highly correlated. Science fair stuff, I guess.

  4. I had been hoping that “montana” would return at least something with Frank Zappa. The actual correlations were much more disturbing.

  5. “unless Civil War Re-enactors are taking things a little too seriously in their search for authenticity.”

    Knowing some serious re-enactors I am not sure you can rule that out :-)>

  6. “Porn” “Pornography” “Fucking” and “Sex” as well as “Vagina” all come up with

    “No results found for {X}. Suggestions:”

    Seeing as these are easily some of the most searched terms on google, I was hoping they wouldn’t self censor. Thanks for being douchebags, google.

    1. Google also considers “Gay” and “Lesbian” to be bad words as well.

  7. Hi bug_girl and all. I’d like to point out that the way Google Correlate is being used here is not the primary way it’s supposed to be used.

    The real input to it is a time series with values, to generate as output some terms whose search patterns correlate with that time series.

    As a way to make it easy to use right away, it provides the search box so you can input some terms and get out the time series values for that term, which then is automatically fed to Google Correlate to generate the terms matching that trend.

    That part is what has been shown in this post, but the more interesting use is to provide an external time series of your own, such as cases of flu per day in the case of Flu Trends.

    The tutorial explains a bit about how to input the data: http://correlate.googlelabs.com/tutorial/

    1. You are absolutely right Isaac. However, we are having a lot of fun using it incorrectly, and I am sure no one thinks there is a causal relationship between these terms.

      Well, I hope no one does.

      1. Maybe if you think your home is liable to be invaded in the middle of the night, you’re less inclined to sleep naked, and more inclined to sleep with a gun.

  8. I found that ‘avocado’ correlates with ‘for rent’, and that ‘kitteh’ correlates with…

    1. 0.9328 bluecotton
    2. 0.9310 accidental facial
    3. 0.9285 texting program
    4. 0.9275 paypal account number
    5. 0.9245 dealer services
    6. 0.9222 sfmta
    7. 0.9204 unas
    8. 0.9201 ratatat ratatat
    9. 0.9201 tierra cali
    10. 0.9200 philips color kinetics
    11. 0.9199 make it so
    12. 0.9198 ??
    13. 0.9196 yelp seattle
    14. 0.9194 myvetsmeds
    15. 0.9190 celebrity baby scoop
    16. 0.9186 emo girl
    17. 0.9185 ashford university
    18. 0.9185 lo mas nuevo
    19. 0.9185 whos in jail
    20. 0.9184 baby scoop
    21. 0.9183 cummins forum
    22. 0.9182 convert xps
    23. 0.9182 loving someone
    24. 0.9181 aries man
    25. 0.9178 free lightroom
    26. 0.9178 county inmate search
    27. 0.9176 m1330 specs
    28. 0.9176 option one credit union
    29. 0.9175 that look
    30. 0.9174 edit google
    31. 0.9174 save text messages
    32. 0.9172 fiddler2
    33. 0.9172 email to text
    34. 0.9172 the sharpest rides
    35. 0.9171 a youtube
    36. 0.9171 elitefts
    37. 0.9167 how to check
    38. 0.9165 find youtube
    39. 0.9165 sharpest rides
    40. 0.9164 soapui
    41. 0.9164 for sale craigslist
    42. 0.9164 watchuseek
    43. 0.9164 stouffers dinner
    44. 0.9164 hsbc premier
    45. 0.9163 meijer pharmacy
    46. 0.9163 how much is the
    47. 0.9163 san jose yelp
    48. 0.9162 wilson bank and trust
    49. 0.9162 fleshjack
    50. 0.9162 godaddy email
    51. 0.9162 house music blog
    52. 0.9162 taurus man
    53. 0.9162 mixed chicks
    54. 0.9161 ex gf
    55. 0.9159 pa docket
    56. 0.9159 birthday youtube
    57. 0.9159 colored lines
    58. 0.9159 zolpidem side effects
    59. 0.9159 light brown
    60. 0.9159 rockstar rims
    61. 0.9158 johns creek ga
    62. 0.9157 drupal jquery
    63. 0.9157 chicago yelp
    64. 0.9157 u pull and pay
    65. 0.9157 linda ikeji
    66. 0.9156 how to not
    67. 0.9156 facepalm
    68. 0.9156 besties
    69. 0.9156 radgrid
    70. 0.9156 wyoming at work
    71. 0.9155 vintage photoshop
    72. 0.9154 cmdlets
    73. 0.9152 yelp chicago
    74. 0.9152 part1
    75. 0.9152 make someone
    76. 0.9152 ms krazie
    77. 0.9152 asian homemade
    78. 0.9151 to check
    79. 0.9151 btjunkie
    80. 0.9151 prodigy msn
    81. 0.9150 cedar rapids craigslist
    82. 0.9150 sudoku kingdom
    83. 0.9150 artists similar to
    84. 0.9149 how to stop
    85. 0.9149 girl plays
    86. 0.9149 observablecollection
    87. 0.9147 blue cotton
    88. 0.9147 youtube train
    89. 0.9146 take snapshot
    90. 0.9146 json_encode
    91. 0.9146 blackberry keypad
    92. 0.9146 slow mo
    93. 0.9146 how much is
    94. 0.9145 recover deleted text
    95. 0.9145 girl strips
    96. 0.9145 powershell for
    97. 0.9145 deleted text messages
    98. 0.9144 or gtfo
    99. 0.9144 hot emo

    Whoops, don’t know how I managed to reproduce ALL of them, since only the ones down to ‘baby scoop’ were even displayed, but… almost all of these are incomprehensible to me!

    Also… ‘montana’! Shudder!

  9. The best correlate with ‘Atheist’ is ‘very’. The rest are also highly varied, as it is pretty much a secular trend rather than cyclic.

  10. “Science” correlates best with “Kitten behaviour”!
    (but it is still only r=0.74)

    However, “mad scientist” is rather better correlated with “mini lobster” (r=0.93). Fear the venomous flying mini lobsters with lasers!

    Here’s a scary one: the best correlate for “truth” is “google adsense”.

    Bizarrely, ‘quark’ has been declining for years, has no obvious academic year correlation, and best matches “midi file” (r=0.98!)

  11. Put in beer and you get an interesting list of correlations including ice cube trays 0.8190, cheap sports 0.8082, and dog benadryl 0.8040; my favorites are cat peeing 0.8131 and, of course, cat pooping 0.7964. This makes perfect sense: beer makes me need to pee more than poop, too.
    Did I just overshare?

  12. Skepchick’s number one correlation was star celeb at 0.9286. Well, that’s really no surprise, the Skepchicks are star celebs!! Oddly, when I added an “s” at the end and tried Skepchicks, it correlated best with doggy steps. Hmm... not sure how to interpret that.

  13. Correlated with penile:

    0.8828 wallpaper designs
    0.8811 ford explorer problems
    0.8808 images for websites
    0.8777 samples free
    0.8774 vortec engine
    0.8708 beads wholesale
    0.8692 find people by phone number
    0.8658 fax number lookup
    0.8632 belly pictures
    0.8596 repair estimates
