Home - FluTracking.net

Comparing Flutracking to Google Flu search terms

We were very excited to learn about Google's entry into influenza surveillance, since Google Flu Trends is similar to Flutracking in that it is an internet-based syndromal surveillance system. Google Flu Trends uses the frequency of searches for influenza symptoms and other influenza-associated terms to detect influenza activity. For more information on how Google Flu Trends works, see the Google.org site or, for a more lay perspective, the New York Times article. We thought it would be interesting to use Google Trends for Australian influenza-related search terms and then compare the results with our Flutracking.net data and influenza laboratory notifications.

Our methods

We used Google Trends, restricted to Australian searches for 2007 and 2008, to assess the frequency of searches for symptoms associated with influenza, such as "cough", "fever", "sore throat", "muscle aches" and "headache", and for terms such as "flu" and "influenza". We looked for seasonality in these searches coinciding with the usual influenza season in temperate Australia. The data publicly available from Google Trends is aggregated; individual search data is not available.

Looking first at the 2004 to 2008 data, we see some of the more common terms displayed below in Figure 1 by relative frequency.



Figure 1.  Relative frequency of searches in Google Trends 2004 - 2008 Australia for:
Cough = blue, Fever = red, Sore throat = yellow, Flu = green, Influenza = dark blue

Note: the "News reference volume" on the bottom axis is based on global news coverage in Google News and is not restricted to Australia, so it may not be related to, or have an impact on, Australian Google searches. "Flu" and "influenza" had multiple peaks in late 2005 and early 2006, most likely due to the international and Australian media coverage of avian influenza and pandemic influenza - these peaks track the peaks for the search terms "avian flu" and "pandemic flu".

2008

The Google Trends data for 2008 is shown in Figure 2 demonstrating that no one term showed a substantial spike during the typical influenza season of July to September.


Figure 2.  Relative frequency of searches in Google Trends 2008 Australia for:
Cough = blue, Fever = red, Sore throat = yellow, Flu = green, Influenza = dark blue



We used aggregated search terms for 2008 from Australia as follows:

  • "fever"
  • "cough"
  • "sore throat"
  • "flu"
The term "headache" did not show any significant seasonal fluctuation, so we did not consider it useful. The terms "muscle aches" and "influenza" were used too infrequently to be useful. To reduce the potential bias of individual terms driving the algorithm too greatly, we performed a very simple averaging of the search terms after scaling them so that they all approximated the weekly relative frequency of the "cough" term: we divided the "fever" term frequency by 3 and multiplied the "sore throat" term frequency by 3.

This is an extremely simplistic methodology and falls far short of the modeling performed by Google in collaboration with CDC, which explored 50 million candidate search queries and identified 53 high-scoring search queries related to influenza-like illness to develop the best model for predicting influenza. However, even given these limitations, we were surprised by the results. It is important to note that we used "Google Trends" data and not "Google Flu Trends" data, which is not available to us. The axes for the Google data are not real; they are adjusted to place the Google data close to the other data sets for ease of comparison between data series.
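The scaling and averaging of the search terms described above can be sketched roughly as follows. This is only an illustration: the weekly values are made up (they are not Google Trends data), the function and variable names are hypothetical, and leaving "cough" and "flu" unscaled is our assumption, since only the factors for "fever" and "sore throat" are stated.

```python
from statistics import mean

# Hypothetical weekly relative frequencies (NOT real Google Trends values).
weekly = {
    "cough":       [1.0, 1.2, 1.5, 1.1],
    "fever":       [3.0, 3.3, 4.5, 3.6],
    "sore throat": [0.3, 0.4, 0.5, 0.4],
    "flu":         [0.9, 1.1, 1.6, 1.2],
}

# Scale factors from the text: "fever" divided by 3, "sore throat" times 3.
# Assumption: "cough" and "flu" are used as-is (factor 1).
SCALE = {"cough": 1.0, "fever": 1 / 3, "sore throat": 3.0, "flu": 1.0}


def composite_index(series, scale):
    """Rescale each term, then take the plain average of the terms per week."""
    terms = list(series)
    n_weeks = len(series[terms[0]])
    return [
        mean(series[term][week] * scale[term] for term in terms)
        for week in range(n_weeks)
    ]


index = composite_index(weekly, SCALE)
for week, value in enumerate(index, start=1):
    print(f"week {week}: {value:.3f}")
```

Scaling before averaging keeps a high-volume term such as "fever" from dominating the composite, which is the bias the simple averaging is meant to reduce.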

2007

As demonstrated in Figure 3 below, the 2007 Google Trends search terms contrasted markedly with 2008: in 2007 there was an obvious seasonal peak for the search terms "flu" and "influenza" but not for any of the other search terms we explored. We determined that "flu" would be the best term to use for comparison with other data sources for 2007.


Figure 3. Relative frequency of searches in Google Trends 2007 Australia for:
Cough = blue, Fever = red, Sore throat = yellow, Flu = green, Influenza = dark blue

Note that the Google data has been adjusted to a scale that complements the scale on the vertical axis on the left in all graphs in the results section so it can be better compared with the other data series.  The data supplied by Google Trends is always a calculated figure relative to other search terms and not an actual count of searches.
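One way such an adjustment could be made is a linear rescale of the Google series onto the range of the series it is plotted against. The sketch below assumes a min-max mapping, which is our choice for illustration and not necessarily the adjustment used for the graphs. The function name and all values are hypothetical.

```python
def rescale_to(series, target):
    """Linearly map `series` onto the min-max range of `target`."""
    s_min, s_max = min(series), max(series)
    t_min, t_max = min(target), max(target)
    if s_max == s_min:
        # A flat series has no shape to preserve: put it mid-range.
        return [(t_min + t_max) / 2 for _ in series]
    factor = (t_max - t_min) / (s_max - s_min)
    return [t_min + (x - s_min) * factor for x in series]


# Hypothetical values, for illustration only.
google_index = [0.95, 1.10, 1.525, 1.20]   # relative search frequency
flutracking_pct = [2.0, 2.6, 4.0, 3.1]     # % of participants with symptoms

plotted = rescale_to(google_index, flutracking_pct)
print([round(v, 2) for v in plotted])
```

A rescale of this kind preserves the timing and shape of peaks but not their absolute height, so only the timing of the Google curve should be read against the other series.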


Our results

2008 Results


We found that the Flutracking data and the Google Trends search terms tracked very closely, as can be seen in Figure 4 below.

Figure 4.

The Google Trends algorithm begins to rise in the first week of August, at about the same time that cough and fever rates among both influenza-unvaccinated and vaccinated Flutracking participants are on the rise. The greatest divergence between unvaccinated and vaccinated Flutracking participants is in late August and early September; we believe this divergence indicates the impact of influenza activity. Google Trends peaks across the same 3-week period. One limitation to note is that we are as yet unable to determine whether the Google Trends data is based on the week ending or the week beginning, so the Google curve may shift one week backward.

Comparing the Google Trends influenza-associated search terms with national laboratory notifications for influenza, we find a similar correlation in Figure 5, again suggesting that there is merit in further analysis of the Google Trends data for surveillance of influenza in Australia.

 

Figure 5.
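The comparisons above were made by inspecting the graphs. If one wanted to put a number on the agreement, and to check how sensitive it is to the possible one-week shift in the Google data, a Pearson correlation at lag 0 and lag 1 is one option. The sketch below is an assumption on our part rather than the analysis behind the figures; the names and weekly values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def lagged_correlation(google, other, lag):
    """Correlate after shifting the Google series `lag` weeks backward.

    lag=0 compares the series as labelled; lag=1 pairs Google week t
    with the other series' week t-1.
    """
    if lag == 0:
        return pearson(google, other)
    return pearson(google[lag:], other[:-lag])


# Hypothetical weekly values, for illustration only.
google = [1, 2, 4, 7, 5, 3]
lab_notifications = [2, 4, 7, 5, 3, 2]

for lag in (0, 1):
    r = lagged_correlation(google, lab_notifications, lag)
    print(f"lag {lag}: r = {r:.2f}")
```

If the correlation changes a lot between the two lags, the week-ending versus week-beginning question matters for the conclusion; if it barely changes, it does not.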

2007 Results

In 2007 we can see that there is again a correlation between the Flutracking data and the laboratory notifications and the Google "flu" search term in Figure 6.  While it may appear from this graph that the Flutracking data signals many weeks ahead of the laboratory notifications and the Google "flu" search term, this may be because the Flutracking numbers were relatively small in early 2007 and restricted mostly to the Hunter New England area of NSW and may not be comparable to national laboratory or national Google search data. 



Conclusion

While we acknowledge the limitations of this simplistic analysis, Google Trends offers great promise for surveillance of influenza and other public health conditions, even with publicly available aggregated data. We will apply to Google for access to less aggregated data to explore its surveillance potential in real time in 2009, with comparisons to Flutracking data.

Last updated: Nov 21 2008.


