Comparing FluTracking to Google Flu Search
We were very excited to learn about Google's entry into influenza surveillance. Google Flu Trends is similar to FluTracking in that it is an internet-based syndromal surveillance system.
Google Flu Trends uses search term frequencies for symptoms of influenza and other influenza-associated searches to detect influenza activity. For more information on how Google Flu Trends works see the Google explanation or the New York Times article for a more lay perspective.
We thought it would be interesting to see what happened if we used Google Trends for Australian influenza-related search terms and then compared the results to our Flutracking.net data and to influenza laboratory notifications.
Our Methods
We used Google Trends restricted to Australian searches for 2007 and 2008 to assess the frequency of searches for symptoms associated with influenza, such as "cough", "fever", "sore throat", "muscle aches" and "headache", and for the terms "flu" and "influenza". We looked for seasonality of searches associated with the usual influenza season in temperate Australia. The data publicly available from Google Trends is aggregated and individual search data is not available.
Looking first at the 2004 to 2008 data, we see some of the more common terms displayed below by relative frequency.

Relative frequency of searches in Google Trends 2004-2008 Australia for:
Cough=blue Fever=red Sore throat=yellow Flu=green Influenza=dark blue
Note that the "News reference volume" on the bottom axis is based on global news coverage in Google News and is not restricted to Australia, so it may not be related to, or have an impact on, Australian Google searches.
"flu" and "influenza" had multiple peaks in late 2005 and early 2006 most likely due to the international and Australian media coverage associated with avian influenza and pandemic influenza - these peaks track the peaks for the search terms "avian flu" and "pandemic flu".
For 2008
We used aggregated search terms for 2008 from Australia as follows:
- "fever"
- "cough"
- "sore throat"
- "flu"
The term "headache" did not have any significant seasonal fluctuation so it was not deemed to be useful. The terms "muscle aches" and "influenza" were too infrequently used to be useful.
To reduce the potential for individual terms to drive the algorithm too greatly, we performed a very simple averaging of the search terms after rescaling them so that they all approximated the weekly relative frequency of the "cough" term: we divided the "fever" term frequency by 3 and multiplied the "sore throat" term frequency by 3.
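The rescaling and averaging described above can be sketched in a few lines of Python. This is only an illustration: the function name, the data layout and the sample values are hypothetical, and leaving the "flu" term unscaled is an assumption, since only the factors for "fever" and "sore throat" are stated.

```python
# Scaling factors: "cough" is the reference term; "fever" is divided by 3 and
# "sore throat" is multiplied by 3, as described in the text.
# Leaving "flu" unscaled is an assumption made for this sketch.
SCALE = {"cough": 1.0, "fever": 1.0 / 3.0, "sore throat": 3.0, "flu": 1.0}


def combined_index(weekly):
    """Average the rescaled terms week by week.

    weekly: dict mapping each term to a list of weekly relative frequencies
    (all lists the same length).
    """
    n_weeks = len(weekly["cough"])
    index = []
    for week in range(n_weeks):
        scaled = [weekly[term][week] * factor for term, factor in SCALE.items()]
        index.append(sum(scaled) / len(scaled))
    return index


# Made-up relative frequencies, for illustration only (not real Google Trends data).
example = {
    "cough": [1.0, 1.2, 1.5],
    "fever": [3.0, 3.3, 4.2],
    "sore throat": [0.3, 0.4, 0.5],
    "flu": [0.9, 1.1, 1.6],
}
print(combined_index(example))
```

Because each term is first brought to roughly the same level as "cough", no single high-volume term dominates the simple average.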
This is an extremely simplistic methodology and falls far short of the modeling performed by Google in collaboration with CDC, which explored 50 million candidate search queries and identified 53 high scoring search queries related to influenza-like illness to develop the best model for predicting influenza.
Given these limitations, however, we were surprised by the results. It is important to note that we used "Google Trends" data and not "Google Flu Trends" data, which is not available to us. The axis values for the Google data are not actual values; they have been adjusted to place the Google data close to the other data sets for ease of comparison between data series.
The Google Trends data for 2008 is shown below, demonstrating that no one term showed a substantial spike during the typical influenza season of July to September.

Relative frequency of searches in Google Trends 2008 Australia for:
Cough=blue Fever=red Sore throat=yellow Flu=green Influenza=dark blue
For 2007
As demonstrated below, the 2007 Google Trends results contrasted markedly with those for 2008: in 2007 there was an obvious seasonal peak for the search terms "flu" and "influenza", but not for any of the other search terms we explored. We therefore chose "flu" as the best term to use for comparison with other data sources for 2007.

Relative frequency of searches in Google Trends 2007 Australia for:
Cough=blue Fever=red Sore throat=yellow Flu=green Influenza=dark blue
Note that in all graphs in the results section the Google data has been adjusted to a scale that complements the vertical axis on the left, so that it can be better compared with the other data series. The data supplied by Google Trends is always a calculated figure relative to other search terms and not an actual count of searches.
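The text does not state how the Google series was adjusted to the left-hand axis. One simple possibility, shown here purely as an assumption, is a linear (min-max) rescaling of the Google values into the range of the series they are plotted against. The function name and the sample numbers are hypothetical.

```python
def rescale_to_range(values, target_min, target_max):
    """Linearly map values onto [target_min, target_max] for plotting.

    This changes only the display scale; the shape and timing of the
    peaks and troughs are preserved.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # A flat series has no shape to preserve; place it mid-range.
        return [(target_min + target_max) / 2 for _ in values]
    span = target_max - target_min
    return [target_min + (v - lo) / (hi - lo) * span for v in values]


# Made-up relative search frequencies mapped onto an axis running from 0 to 4.
print(rescale_to_range([0, 5, 10], 0, 4))
```

A rescaling of this kind is why the Google axis values carry no meaning in themselves: only the timing and relative size of the rises and falls can be compared across series.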
Our Results
We then compared the Google Trends search term results with the FluTracking results.
2008 Results
We found that the FluTracking data and the Google Trends search terms tracked each other very closely, as can be seen below.
The Google Trends algorithm begins to rise in the first week of August, at about the same time that cough and fever rates are rising among both influenza-unvaccinated and influenza-vaccinated FluTracking participants.
The greatest divergence between unvaccinated and vaccinated FluTracking participants is in late August and early September. We believe this divergence indicates the impact of influenza activity. Google Trends peaks across the same 3-week period.
One limitation is that we are as yet unable to determine whether the Google Trends weekly data is dated by week ending or week beginning, so the Google curve may need to shift one week backward.
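One way to gauge how much this one-week uncertainty matters is to compute the correlation between the Google series and another weekly series with and without a one-week shift. The sketch below is not part of our analysis; the function names and the sample values are hypothetical, and it uses a plain Pearson correlation.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (neither may be constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def lagged_correlation(google, other, lag):
    """Correlate google[t + lag] with other[t]; lag is in weeks and may be negative."""
    if lag >= 0:
        g, o = google[lag:], other[:len(other) - lag]
    else:
        g, o = google[:lag], other[-lag:]
    n = min(len(g), len(o))
    return pearson(g[:n], o[:n])


# Made-up weekly values, for illustration only.
google = [0, 1, 3, 2, 5, 4]
other = [1, 3, 2, 5, 4, 0]
for lag in (-1, 0, 1):
    print(lag, lagged_correlation(google, other, lag))
```

If the correlation is similar at each lag, the week-ending versus week-beginning question has little effect on the comparison; if it changes sharply, the alignment needs to be settled before timing conclusions are drawn.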
Comparing the Google Trends influenza-associated search terms with national laboratory notifications for influenza, we find a similar correlation (below), again suggesting that there is merit in further analysis of the Google Trends data for surveillance of influenza in Australia.

2007 Results
In 2007 we can see that there is again a correlation between the FluTracking data, the laboratory notifications and the Google "flu" search terms.

While it may appear from this graph that the FluTracking data signals many weeks ahead of the laboratory notifications and the Google "flu" search term, this may be because the FluTracking numbers were relatively small in early 2007 and restricted mostly to the Hunter New England area of NSW. As a result, they may not be comparable to national laboratory or national Google search data.
Conclusion
While acknowledging the limitations of this simplistic analysis, Google Trends offers great promise for surveillance of influenza and other public health conditions even with publicly available aggregated data.
We will apply to Google for access to less aggregated data to explore its surveillance potential in real time in 2009 with comparisons to FluTracking data.
Last updated: Nov 21 2008.




