New Study: Twitter Predicted COVID-19 Outbreaks


I often talk about the dangers of social media, particularly how easy it is for misinformation to spread on outlets like Twitter and Facebook, and how easy it is to push someone into a bigoted echo chamber and then radicalize them in places like Reddit and YouTube.

I don’t often get a chance to talk about how awesome social media is, at least not anymore. In the beforetimes, when people could gather in groups inside without fear of catching a deadly disease, and when skeptic organizations didn’t think I was the antichrist for politely suggesting they treat women like humans, I used to give a talk about how social media isn’t necessarily ALL bad. For instance, there’s research that shows sites can build algorithms that are able to detect and remove bots that try to spread what is now known as “fake news.” We have a term for it because none of the social media sites decided to act on that research. The research from 2010. Ten years ago.

Anyway, that research led to the creation of Truthy and other programs that could root out misinformation. Those systems worked so well in part because social media provided researchers with an absolutely bonkers amount of data: an unending stream of information that also allowed them to do cool studies like this one from last year, which found that in the aftermath of a magnitude 7.1 earthquake hitting Mexico, people used social media to share important information and receive help. By analyzing millions of posts on Twitter, Facebook, and other outlets, the researchers found that users actually developed their own techniques to sort false information from true. They also found that a subset of responsible users highlighted information that was both true and very important, and then took the time to spread that information from one social network to another.

All that freely shared data has led to a lot of cool studies in recent years, and not all of them are about how misinformation spreads. This week I learned about one that honestly filled me with a weird mix of awe and horror: Italian researchers discovered that social media posts in December of 2019 through January of 2020 contained a warning about the COVID-19 pandemic before it exploded in Europe. The new paper, published in Nature’s journal Scientific Reports, shows that before the World Health Organization officially recognized COVID-19 as a serious threat, people across seven European countries, speaking seven different languages, were posting about pneumonia at a rate far above and beyond such posts in the previous five years.

The researchers chose to look for mentions of “pneumonia” because that is the most dangerous result of a COVID-19 infection (and in fact the World Health Organization originally labeled COVID-19 as “cases of pneumonia of unknown etiology”) and because the flu season from last year was particularly mild, meaning that cases of pneumonia (and therefore people discussing it on Twitter) should be lower compared to previous years. Also, they took care to filter out any mentions that referred to the growing epidemic in China or other news reports.
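If you want a feel for what that kind of counting looks like, here is a minimal sketch in Python. To be clear, this is not the researchers' code: the exclusion terms, the function names, and the season labels are all my own assumptions, and a real filter would need to be far more careful than a substring match.

```python
from collections import Counter
from statistics import mean

# Hypothetical exclusion markers (an assumption, not the paper's actual filter):
# posts that mention China, Wuhan, or contain a link are treated as news.
NEWS_TERMS = ("china", "wuhan", "http")


def is_personal_mention(text, keyword="pneumonia"):
    """True if the post mentions the keyword and none of the news markers."""
    lowered = text.lower()
    return keyword in lowered and not any(term in lowered for term in NEWS_TERMS)


def mentions_per_season(posts, keyword="pneumonia"):
    """Count filtered mentions per season. posts is an iterable of (season, text)."""
    counts = Counter()
    for season, text in posts:
        if is_personal_mention(text, keyword):
            counts[season] += 1
    return counts


def excess_ratio(counts, current, baseline_seasons):
    """Current season's count divided by the mean count of the baseline seasons."""
    baseline = mean(counts[s] for s in baseline_seasons)
    return counts[current] / baseline if baseline else float("inf")
```

A ratio well above 1 for the current winter would be the kind of signal the study describes, though the paper's actual statistics are more involved than a simple ratio.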

After establishing that every country (except, strangely, Germany) saw a significant increase in mentions of pneumonia well before organizations and governments acknowledged COVID’s presence in Europe, the researchers then dug into the data more deeply, finding the location data for more than 13,000 users. During December and January, the preponderance of users discussing pneumonia on social media were residing in areas like Lombardy, Madrid, and Paris — areas that were about to be the earliest places to experience the worst COVID-19 outbreaks in the world.

To check if this was all just a fluke, the researchers then went back and did it all again but with “dry cough” instead of “pneumonia.” They got the exact same results: an overall large increase in discussion of the symptom, and elevated discussion in areas that were just about to be COVID hotspots.

Finally, the researchers did it all again with the word “coronavirus,” sort of as a control. “Coronavirus” wouldn’t necessarily correlate to someone experiencing personal symptoms of unknown origin, but would most likely be directly related to the news of the soon-to-be-pandemic. Sure enough, they found that discussion of “coronavirus” was more uniformly spread across the various countries and didn’t correlate to hotspots.
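The comparison between a symptom word and a control word can be sketched roughly like this. Again, this is only an illustration under my own assumptions: the `(user_id, region, text)` tuple layout and the `region_shares` name are hypothetical, and the paper's geolocation step is much more sophisticated than this.

```python
def region_shares(user_posts, keyword):
    """Share of distinct users per region who mention the keyword.

    user_posts is an iterable of (user_id, region, text) tuples
    (a hypothetical layout chosen for this sketch).
    """
    users_by_region = {}
    for user_id, region, text in user_posts:
        if keyword in text.lower():
            # Use a set so a user who posts many times is counted once.
            users_by_region.setdefault(region, set()).add(user_id)
    total = sum(len(users) for users in users_by_region.values())
    if total == 0:
        return {}
    return {region: len(users) / total for region, users in users_by_region.items()}
```

If the shares for “pneumonia” pile up in a few regions while the shares for “coronavirus” stay roughly even, that is the pattern the researchers describe.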

I love this study because it’s one of those things that in retrospect I’m like “oh, of course!” And yet it still feels revealing, like pulling the sheet off a new work of art. Think about it: humanity had the ability to see that something was about to go very wrong but we didn’t know where to look. We just went about our business until it was too late. It’s like a good film or play or book with dramatic irony, which is when the audience knows something important that the characters in the work don’t, like when we watch that idiot Romeo kill himself even though Juliet’s not really dead. Um, spoiler alert if you’ve been distracted since the 16th century.

The researchers point out how close to hopeless this is in their paper: “Since the detection and geo-localization of potential viral outbreaks are based on suitable keywords clearly linked to well-known symptoms, our approach cannot be directly used for the forecasting of otherwise unknown diseases. Indeed the usage of the word ‘pneumonia’ on Twitter could have served as a useful proper predictor only before pneumonia was publicly linked to COVID-19, and not at a time when news outlets and the public in general were already discussing it widely.”

They do point out that now that we know pneumonia is a symptom, an algorithm might be able to catch a recurrence of COVID once the news about it dies down, or catch some other disease whose symptoms we already know.

That said, I’m kind of surprised they’re not a little more optimistic about the possibility of this research leading to early detection of future unknown diseases. We may not know the symptoms of the next big pandemic or other public health crisis, but with enough computing power and a well-written algorithm we could certainly set up a system that detects unusual amounts of activity on any number of possible symptoms. For instance, we could have been monitoring “pneumonia” because even if COVID-19 never existed, an increase in posts about pneumonia could clue us in to possible influenza outbreaks. So why not have a system in place that monitors for things like “pneumonia,” or, I don’t know, “numb fingers” or “headaches” or any number of other words and phrases? Filter out news and memes and you may be left with an indicator that something has gone wrong somewhere. Maybe someplace has groundwater that’s slowly being contaminated with fracking chemicals: the right algorithm could highlight a localized cluster of similar symptoms being talked about across social media sources.
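Here is roughly what I have in mind, as a toy sketch rather than a real surveillance system. It assumes you already have weekly counts of filtered posts per region and symptom, and it flags anything whose current count sits far above its own history using a plain z-score. The function name, the data layout, and the threshold of 3 are all assumptions I made up for illustration.

```python
from statistics import mean, stdev


def flag_anomalies(history, current, threshold=3.0):
    """Flag (region, symptom) pairs whose current count is unusually high.

    history: {(region, symptom): [past weekly counts]}
    current: {(region, symptom): this week's count}
    Returns a list of ((region, symptom), z_score), highest z-score first.
    """
    flagged = []
    for key, past in history.items():
        if len(past) < 2:
            continue  # not enough history to estimate spread
        mu, sigma = mean(past), stdev(past)
        observed = current.get(key, 0)
        if sigma == 0:
            # Perfectly flat history: any increase counts as a maximal anomaly.
            z = float("inf") if observed > mu else 0.0
        else:
            z = (observed - mu) / sigma
        if z >= threshold:
            flagged.append((key, z))
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```

A z-score is about the simplest anomaly test there is. It ignores seasonality and the fact that the number of people posting changes over time, both of which a real system would have to handle.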

I know, it’s all a bit pie in the sky. We’d need a government or large medical organization to pour the money and time into building something like that, social media networks to share data, and someone to protect the privacy of the social media users whose data is being scraped, and we’d have to sort out a million other logistical concerns that I’m not even going to go into. But it’s certainly feasible in a near-future sci-fi utopia, and hey, I’ll take all the positive news I can get these days.

Rebecca Watson

Rebecca is a writer, speaker, YouTube personality, and unrepentant science nerd. In addition to founding and continuing to run Skepchick, she hosts Quiz-o-Tron, a monthly science-themed quiz show and podcast that pits comedians against nerds. There is an asteroid named in her honor. Twitter: @rebeccawatson. Instagram: @actuallyrebeccawatson. TikTok: @actuallyrebeccawatson. YouTube: @rebeccawatson.


One Comment

  1. Monitoring social media for reports of blue triangles with spots in each corner could have detected an outbreak of Scott Sigler’s alien space infection nanobots before we lost Chicago.

    The CDC should do this routinely for dozens of symptoms, searching for common as well as medical terms. For example, most people know “pneumonia”, but don’t know the medical terms for many less common symptoms such as “anosmia” for “loss of sense of smell”. The study included searching in the seven most common European languages. A global monitoring system should include similar phrases in the most common languages world-wide.

    The CDC could produce weekly, monthly or annual reports of each of a great many symptoms, which would be of no interest to anyone until either the next pandemic, or until right before the next pandemic, when Republicans cut off the minimal funding required right before the next pandemic.
