Sunday AI: Correlations
Google Labs has released a new tool:
“Google Correlate is an experimental new tool on Google Labs which enables you to find queries with a similar pattern to a target data series. The target can either be a real-world trend that you provide (e.g., a data set of event counts over time) or a query that you enter.”
Basically, Google takes a search term that you enter (“insects”) and examines search patterns for other search terms in its database to calculate a correlation coefficient (r). It’s an extension of Google Trends; it’s looking to see which search terms trend together.
In case you barely remember that statistics class from your misspent youth, the correlation coefficient is a value between -1 and +1. The closer r is to ±1, the more closely the two patterns track each other: a positive r means they rise and fall together, and a negative r means one rises as the other falls. The closer r is to zero, the weaker the relationship between the two patterns.
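For anyone who wants to see the arithmetic, here is a minimal Python sketch of the standard Pearson correlation coefficient for two equal-length series. It assumes Correlate's r is the ordinary Pearson coefficient (the post doesn't say exactly which formula Google uses). The numbers are made up purely to show the three landmark values: +1, -1 and 0.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of the same length (at least 2 points)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations (covariance numerator) and the two sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)

# Made-up weekly counts, purely for illustration
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # perfectly in step: 1.0
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # perfectly opposite: -1.0
print(pearson_r([1, 2, 3, 4], [1, 3, 3, 1]))  # no linear trend: 0.0
```

Note that r only measures how well one series lines up with a straight-line function of the other. It says nothing about why.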
Of course, we’ve all heard the “correlation is not causation” trope a million times. It’s especially true here; when you don’t even have a hypothesis about a relationship, the data points are just amusing.
For example, searches for “birds” appear to be correlated with “Eliza Doolittle.”
Of course, you know what I searched for. “Pubic Lice” is most strongly correlated with… a host of Civil War search terms.
I am frankly rather baffled about why these search terms should be seasonally correlated, unless Civil War Re-enactors are taking things a little too seriously in their search for authenticity.
Give it a spin: what fun correlations can you discover with Google Correlate?
Kittens and mango chutney
http://correlate.googlelabs.com/search?e=kittens&t=weekly#
The Civil War correlates, LMAO!!! It’s just so funny on so many levels.
I would love to play around with this; it looks more fun than the Ngram Viewer. But I have finals next week and am using my non-existent free time to fight creationists.
creationist and the population of Alaska: http://correlate.googlelabs.com/search?e=creationist&t=weekly#
Also: skepchick and star celeb :D
http://correlate.googlelabs.com/search?e=skepchick&t=weekly#
I let the adolescent out:
http://correlate.googlelabs.com/search?e=poop&t=weekly
poop & your nose?
All 8 correlations with “gravitas” are variations of “what does gravitas mean?”
http://correlate.googlelabs.com/search?e=gravitas&t=weekly#default,20
Seasonal forcing in topics like the Civil War or Martin Luther King Jr. follows from the school year. There is a HUGE wintertime dip as kids go on their winter breaks, and a milder summer dip for summer vacation (during which some kids take summer classes, have summer reading lists, or whatever, hence the less impressive dip).
I discovered this while comparing “butter” and “exercise” on Google Trends: a huge spike in “butter” around Christmas and Thanksgiving, followed by a spike in “exercise” thanks to New Year’s resolutions and the post-holiday pounds.
You get that seasonal, school-forced pattern with most science topics (confounded by pseudoscience topics, quantum woo, etc.). Searching “neuron,” for example, gives the same thing. Interestingly, “magnets” and “does music affect plant growth” are highly correlated. Science fair stuff, I guess.
I had been hoping that “montana” would return at least something with Frank Zappa. The actual correlations were much more disturbing.
“unless Civil War Re-enactors are taking things a little too seriously in their search for authenticity.”
Knowing some serious re-enactors, I am not sure you can rule that out :-)
“Porn,” “Pornography,” “Fucking,” and “Sex,” as well as “Vagina,” all come up with “No results found for {X}. Suggestions:”
Seeing as these are easily some of the most searched terms on Google, I was hoping they wouldn’t self-censor. Thanks for being douchebags, Google.
Google also considers “Gay” and “Lesbian” to be bad words.
pregnancy and lower back tattoos (r = 0.7425)
Hi bug_girl and all. I’d like to point out that the way Google Correlate is being used here is not the primary way it’s supposed to be used.
Its real input is a time series of values, and its output is a set of search terms whose search patterns correlate with that time series.
To make it easy to use right away, it also offers a search box: you enter a term, it retrieves the time series values for that term, and that series is automatically fed to Google Correlate to find the terms matching its trend.
That is what this post shows, but the more interesting use is to supply an external time series of your own, as with Flu Trends (cases of flu per day).
The tutorial explains a bit about how to input the data: http://correlate.googlelabs.com/tutorial/
You are absolutely right Isaac. However, we are having a lot of fun using it incorrectly, and I am sure no one thinks there is a causal relationship between these terms.
Well, I hope no one does.
Gun sales and pyjamas
http://correlate.googlelabs.com/search?e=gun+sales&e=pyjamas&t=weekly#
I just realized that both spiked after the last election. I can understand the paranoid gun sales spike, but pyjamas? WTF?
Maybe if you think your home is liable to be invaded in the middle of the night, you’re less inclined to sleep naked, and more inclined to sleep with a gun.
Ear lobe and pre-owned cars!
I found that ‘avocado’ correlates with ‘for rent’, and that ‘kitteh’ correlates with…
1. 0.9328 bluecotton
2. 0.9310 accidental facial
3. 0.9285 texting program
4. 0.9275 paypal account number
5. 0.9245 dealer services
6. 0.9222 sfmta
7. 0.9204 unas
8. 0.9201 ratatat ratatat
9. 0.9201 tierra cali
10. 0.9200 philips color kinetics
11. 0.9199 make it so
12. 0.9198 ??
13. 0.9196 yelp seattle
14. 0.9194 myvetsmeds
15. 0.9190 celebrity baby scoop
16. 0.9186 emo girl
17. 0.9185 ashford university
18. 0.9185 lo mas nuevo
19. 0.9185 whos in jail
20. 0.9184 baby scoop
21. 0.9183 cummins forum
22. 0.9182 convert xps
23. 0.9182 loving someone
24. 0.9181 aries man
25. 0.9178 free lightroom
26. 0.9178 county inmate search
27. 0.9176 m1330 specs
28. 0.9176 option one credit union
29. 0.9175 that look
30. 0.9174 edit google
31. 0.9174 save text messages
32. 0.9172 fiddler2
33. 0.9172 email to text
34. 0.9172 the sharpest rides
35. 0.9171 a youtube
36. 0.9171 elitefts
37. 0.9167 how to check
38. 0.9165 find youtube
39. 0.9165 sharpest rides
40. 0.9164 soapui
41. 0.9164 for sale craigslist
42. 0.9164 watchuseek
43. 0.9164 stouffers dinner
44. 0.9164 hsbc premier
45. 0.9163 meijer pharmacy
46. 0.9163 how much is the
47. 0.9163 san jose yelp
48. 0.9162 wilson bank and trust
49. 0.9162 fleshjack
50. 0.9162 godaddy email
51. 0.9162 house music blog
52. 0.9162 taurus man
53. 0.9162 mixed chicks
54. 0.9161 ex gf
55. 0.9159 pa docket
56. 0.9159 birthday youtube
57. 0.9159 colored lines
58. 0.9159 zolpidem side effects
59. 0.9159 light brown
60. 0.9159 rockstar rims
61. 0.9158 johns creek ga
62. 0.9157 drupal jquery
63. 0.9157 chicago yelp
64. 0.9157 u pull and pay
65. 0.9157 linda ikeji
66. 0.9156 how to not
67. 0.9156 facepalm
68. 0.9156 besties
69. 0.9156 radgrid
70. 0.9156 wyoming at work
71. 0.9155 vintage photoshop
72. 0.9154 cmdlets
73. 0.9152 yelp chicago
74. 0.9152 part1
75. 0.9152 make someone
76. 0.9152 ms krazie
77. 0.9152 asian homemade
78. 0.9151 to check
79. 0.9151 btjunkie
80. 0.9151 prodigy msn
81. 0.9150 cedar rapids craigslist
82. 0.9150 sudoku kingdom
83. 0.9150 artists similar to
84. 0.9149 how to stop
85. 0.9149 girl plays
86. 0.9149 observablecollection
87. 0.9147 blue cotton
88. 0.9147 youtube train
89. 0.9146 take snapshot
90. 0.9146 json_encode
91. 0.9146 blackberry keypad
92. 0.9146 slow mo
93. 0.9146 how much is
94. 0.9145 recover deleted text
95. 0.9145 girl strips
96. 0.9145 powershell for
97. 0.9145 deleted text messages
98. 0.9144 or gtfo
99. 0.9144 hot emo
Whoops, don’t know how I managed to reproduce ALL of them, since only the ones down to ‘baby scoop’ were even displayed, but… almost all of these are incomprehensible to me!
Also… ‘montana’! Shudder!
The best correlate with ‘Atheist’ is ‘very’. The rest are also highly varied, since it is pretty much a secular (long-term) trend rather than a cyclic one.
“Science” correlates best with “Kitten behaviour”!
(but it is still only r=0.74)
However, “mad scientist” is rather better correlated with “mini lobster” (r=0.93). Fear the venomous flying mini lobsters with lasers!
Here’s a scary one: “truth”s best correlate is “google adsense”.
Bizarrely, ‘quark’ has been declining for years, has no obvious academic year correlation, and best matches “midi file” (r=0.98!)
Put in “beer” and you get an interesting list of correlations, including “ice cube trays” (0.8190), “cheap sports” (0.8082), and “dog benadryl” (0.8040). My favorites are “cat peeing” (0.8131) and, of course, “cat pooping” (0.7964). This makes perfect sense; beer makes me need to pee more than poop, too.
Did I just overshare?
I tried to correlate “overshare” but came up with nothing on Google Correlate. But I did find this:
http://www.poopreport.com/Doctor/Knowledgebase/beer_and_poop.html
Sick, sick, sick!
Heehee
Skepchick’s number one correlation was “star celeb” at 0.9286. Well, that’s really no surprise; the Skepchicks are star celebs!! Oddly, when I added an “s” at the end and tried “Skepchicks,” it correlated best with “doggy steps.” Hmm... not sure how to interpret that.
Correlated with penile:
0.8828 wallpaper designs
0.8811 ford explorer problems
0.8808 images for websites
0.8777 samples free
0.8774 vortec engine
0.8708 beads wholesale
0.8692 find people by phone number
0.8658 fax number lookup
0.8632 belly pictures
0.8596 repair estimates