Did Google Searches for HIV Differ Between Before and After the World Cup?

Thuy Luu, PharmD, BCPS, MPH Candidate

Introduction

                Human immunodeficiency virus (HIV) is spread through bodily fluids. The virus attacks the immune system and over time can diminish the body’s ability to fight off infection and disease [1].  This virus has no cure and can affect people of any social and economic status.  The health community took notice of HIV and its effects on the immune system in the early 1980s with case reports of the rare diseases Pneumocystis carinii pneumonia and Kaposi sarcoma in previously healthy gay men [1]. Soon, what was first recognized as a disease affecting the gay community was also reported in others, including the Indiana teenager with hemophilia Ryan White, the NBA star Earvin “Magic” Johnson, and most recently the actor Charlie Sheen [1].

                HIV can carry negative connotations, particularly because it has no cure and because of its reported origins in the gay community. However, when HIV affects public figures, such as Charlie Sheen, it can bring attention to and interest in the disease among a greater audience.  In an information technology savvy world, such attention can manifest itself as hashtag phrases on social media or as Google searches for “HIV”.

                Figure 1 below shows Google searches for “HIV” worldwide, from the inception of Google Trends data starting at April 2004 to February 2017 [2]. The two highest peaks correspond to reported cases of HIV in the adult film industry in April 2004 and to Charlie Sheen’s announcement of being HIV positive in November 2015 [2].

                It would be of interest to know whether an event such as the World Cup has any effect on Google searches for the term “HIV”.  The World Cup is a tournament-style soccer competition held every four years in a selected host country, in which men’s national teams qualify to compete. It is overseen by soccer’s global governing body, the Fédération Internationale de Football Association (FIFA) [3], and draws global spectatorship. Given this immense international interest, and in an effort to explore public health corollaries associated with the World Cup, this paper seeks to find whether there is a difference between Google searches for the term “HIV” before and after the start of the World Cup competition.

Method

                Google Trends is a tool that reports on terms searched in Google [4]. It provides the volume of searches for a term of interest relative to the volume when that term was most searched during a given time frame [4]. Google Trends was used to determine the relative volume of interest in the search term “HIV” in the countries hosting the last three World Cups, which took place in Germany in 2006, South Africa in 2010, and Brazil in 2014 [5].
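
                The relative scaling described above can be illustrated with a minimal Python sketch. It is a simplification: the raw monthly counts and the function name relative_interest are hypothetical, and Google Trends works from a term's share of all searches rather than from raw counts, which are not published.

```python
def relative_interest(raw_counts):
    """Scale values so that the busiest period equals 100."""
    peak = max(raw_counts)
    if peak == 0:
        # No searches at all: every period is reported as 0.
        return [0 for _ in raw_counts]
    return [round(100 * count / peak) for count in raw_counts]


# Hypothetical raw monthly search counts (not real Google data).
monthly_counts = [1200, 600, 300, 900]
print(relative_interest(monthly_counts))  # [100, 50, 25, 75]
```

                A month with half as many searches as the peak month is therefore reported as 50.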

                Tableau is visualization software [6] which was used to graphically display the Google searches for the term “HIV” in the country hosting the World Cup relative to when the World Cup occurred.

                R is statistical software [7] which was used to quantitatively determine if there was a difference between Google searches for the term “HIV” before and after the start of the World Cup.

Results and Discussion

                Figure 2 is the Tableau visual representation of Google searches for “HIV” relative to when the World Cup occurred in Germany, South Africa, and Brazil [2].  The line graph represents the Google searches and the bar graph represents the start of the World Cup.  The highest peak in each line graph represents when the search term was most searched from February 2004 to February 2017.  The other peaks are relative to the highest peak. For example, if there were 50% as many searches for “HIV” in a given month, then that peak would be half as tall as the tallest peak.

                Google searches for “HIV” in Brazil appeared to slowly decline up until the World Cup, where there appeared to be an increase.  From February 2004 to February 2017, “HIV” was searched most often in April 2015, which was associated with Big Brother Brazil season 15 [2].  It does appear that there is a difference in Google searches for “HIV” before and after the World Cup.

                Germany appears to have a steady volume of searches for the term before and after the World Cup, with April 2009 being when the term was searched the most.  In April 2009, an increase in Google searches for “HIV” was associated with the television actress Nadja Benaissa [2]. There does not appear to be a difference in Google searches for “HIV” before and after the World Cup.

                Google searches for “HIV” in South Africa appear more labile, making it difficult to tell if there was a difference in searches for the term before and after the World Cup.  The term was searched most often in September 2004, for which Google Trends provided only two related search topics, “HIV – virus” and “AIDS – illness” [2].  This is different from Brazil and Germany, where the peak was related to a television show and a public figure.

                R was used to determine if there was a statistically significant difference between Google searches before and after the World Cup in each respective country.  A two-sample t-test was used to determine if a statistical difference existed. The assumptions needed for this parametric test are independent observations, normally distributed data, and equal variances in the two groups.  If these assumptions were not met, then the Wilcoxon rank-sum test, a nonparametric test, would be used instead.  R was also used to test whether the data were normally distributed, with histograms, scatterplots, and the Shapiro-Wilk test, and to test for equal variances with the variance test. It is assumed that the data fulfill the assumption of independent observations.

                Figures 3, 4, and 5 test the assumption of normal distribution visually, while the Shapiro-Wilk test tests it statistically.  Figure 3 shows the data from Brazil. The BEFORE World Cup data appear normally distributed, given the bell-shaped histogram, and the BEFORE World Cup scatterplot is mostly tight around the line of normal distribution. However, the AFTER World Cup data do not appear to be normally distributed, as there is a missing bar in the histogram. This is reinforced in the scatterplot, where the upper tail trails away from the line of normal distribution.  The Shapiro-Wilk test echoed these findings statistically, showing that the BEFORE World Cup data are normally distributed (p value = 0.212, W = 0.954) whereas the AFTER World Cup data are not (p value = 1.029e-7, W = 0.612). The Google Trends data from Brazil are therefore treated as not normally distributed.

                The variances of the two groups are not equal (p value = 6.752e-5, F = 0.210, CI 0.099-0.441). Please see Table 1 for complete Shapiro-Wilk and variance statistics.
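
                The F value reported by the variance test is the ratio of the two sample variances. The following minimal Python sketch shows that ratio only. The data values and the function name variance_ratio are hypothetical, which group goes in the numerator is an assumption, and the p value and confidence interval (which require the F distribution) are not computed here.

```python
from statistics import variance


def variance_ratio(before, after):
    """Ratio of sample variances; values far from 1 suggest unequal spread."""
    return variance(before) / variance(after)


# Hypothetical relative search volumes (not the study's data).
before = [40.0, 42.0, 38.0, 41.0, 39.0]
after = [30.0, 50.0, 45.0, 60.0, 35.0]
print(variance_ratio(before, after))
```

                A ratio well below 1, as in this made-up example, would mean the AFTER values are far more spread out than the BEFORE values.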

                Nonparametric testing was therefore used to determine if there was a difference in Google searches for “HIV” before and after the World Cup in Brazil, since the data did not meet the assumptions of normally distributed data and equal variances.

                Figure 4 shows the data from Germany. Both the BEFORE and AFTER World Cup data appear normally distributed, given the bell-shaped histograms, although there is a bar missing from each histogram.  The BEFORE and AFTER World Cup scatterplots are mostly tight around the line of normal distribution, except for the trailing tails on the BEFORE scatterplot and the trailing upper tail on the AFTER scatterplot.  The Shapiro-Wilk test showed that the BEFORE and AFTER data were normally distributed, with p values of 0.166 (W = 0.948) and 0.592 (W = 0.971), respectively.  The Google Trends data from Germany are normally distributed.

                The variances of the two groups are equal (p value = 0.150, F = 1.738, CI 0.818-3.702). Parametric testing was therefore used to determine if there was a difference in Google searches for “HIV” before and after the World Cup in Germany, since the data met the assumptions of normally distributed data and equal variances.
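
                When both assumptions hold, the two-sample t-test can pool the variances of the two groups. The following Python sketch shows the pooled t statistic and its degrees of freedom. It assumes the equal-variance (pooled) form of the test, and the data values and the function name pooled_t are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(x, y):
    """Return (t, df) for the equal-variance two-sample t-test."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    # Pooled variance: each group's variance weighted by its degrees of freedom.
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    standard_error = sqrt(pooled_var * (1 / nx + 1 / ny))
    return (mean(x) - mean(y)) / standard_error, df


# Hypothetical values (not the study's data).
t_stat, df = pooled_t([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
print(round(t_stat, 3), df)  # 1.549 4
```

                With 29 monthly values in each group, this form of the test has 29 + 29 - 2 = 56 degrees of freedom.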

                Figure 5 shows the data from South Africa. The BEFORE World Cup data appear normally distributed, given the bell-shaped histogram, and the BEFORE World Cup scatterplot is mostly tight around the line of normal distribution, with the upper tail trailing slightly. The AFTER World Cup data do not appear to be normally distributed, as the histogram loses its bell shape. This is also seen in the scatterplot, where the tails trail from the line of normal distribution, particularly at the upper tail.  The Shapiro-Wilk test had different findings, showing that both the BEFORE and AFTER World Cup data are normally distributed, with p values of 0.840 (W = 0.990) and 0.163 (W = 0.976), respectively.  The Google Trends data from South Africa will be considered normally distributed.

                The variances of the two groups for South Africa are not equal (p value = 1.219e-6, F = 3.146, CI 2.000-4.949). Nonparametric testing was therefore used to determine if there was a difference in Google searches for “HIV” before and after the World Cup in South Africa, since the data failed to meet the assumption of equal variances.
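
                The choice between the parametric and nonparametric test for each country can be summarized as a simple decision rule. The following Python sketch applies that rule to the Shapiro-Wilk and variance test p values reported above. The significance level of 0.05 is an assumption, as the paper does not state one, and the function name choose_test is hypothetical.

```python
ALPHA = 0.05  # assumed significance level; not stated in the paper


def choose_test(p_normal_before, p_normal_after, p_equal_variance, alpha=ALPHA):
    """Pick the t-test only if normality and equal variance are not rejected."""
    normal = p_normal_before > alpha and p_normal_after > alpha
    equal_variance = p_equal_variance > alpha
    if normal and equal_variance:
        return "two-sample t-test"
    return "Wilcoxon rank-sum test"


# (Shapiro-Wilk BEFORE, Shapiro-Wilk AFTER, variance test) p values from the text.
reported = {
    "Brazil": (0.212, 1.029e-7, 6.752e-5),
    "Germany": (0.166, 0.592, 0.150),
    "South Africa": (0.840, 0.163, 1.219e-6),
}
for country, p_values in reported.items():
    print(country, "->", choose_test(*p_values))
```

                Under this assumed threshold, only Germany passes both checks and is analyzed with the two-sample t-test.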

                In Brazil, the median Google searches for “HIV” BEFORE the World Cup were not the same as the median Google searches for “HIV” AFTER the World Cup (p value = 2.089e-9, W = 45). In Germany, the mean Google searches for “HIV” BEFORE the World Cup were not the same as the mean Google searches for “HIV” AFTER the World Cup (p value = 2.015e-6, t = 5.301, df = 56, CI 0.816-3.703). Lastly, in South Africa, the median Google searches for “HIV” BEFORE the World Cup were not the same as the median Google searches for “HIV” AFTER the World Cup (p value = 3.148e-7, W = 4379.5).  There was a statistically significant difference in Google searches for “HIV” before and after the start of the World Cup in Brazil, Germany and South Africa.
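
                The W statistic of the Wilcoxon rank-sum test can be computed by ranking the pooled values and summing the ranks of one group. The following Python sketch computes W as the rank sum of the first group minus its smallest possible value, which is understood to be the form R reports, with tied values sharing their average rank. The data values and the function name rank_sum_w are hypothetical, and the p value is not computed here.

```python
def rank_sum_w(x, y):
    """Rank-sum statistic for x: sum of pooled ranks of x minus n_x(n_x+1)/2."""
    pooled = sorted(list(x) + list(y))
    midrank = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        # Tied values occupy ranks i+1 .. j and share their average.
        midrank[pooled[i]] = (i + 1 + j) / 2
        i = j
    rank_total = sum(midrank[value] for value in x)
    return rank_total - len(x) * (len(x) + 1) / 2


# Hypothetical values (not the study's data).
print(rank_sum_w([1, 2, 3], [4, 5, 6]))  # 0.0
print(rank_sum_w([4, 5, 6], [1, 2, 3]))  # 9.0
```

                W ranges from 0, when every value in the first group is below every value in the second, to the product of the two group sizes, when the opposite holds.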

Conclusion

                Looking at Brazil in Figure 2, it appeared that there was a difference in Google searches for “HIV” before the World Cup compared to afterwards.  This was confirmed by the Wilcoxon rank-sum test, which showed a statistically significant difference.  Germany looked more stable, and it was more difficult to visually determine if there was a difference in Google searches.  A difference existed, as determined in R using the two-sample t-test.  Unlike Germany, South Africa had more labile Google searches for “HIV”, and it was hard to tell if there was a difference before and after the World Cup.  The Wilcoxon rank-sum test showed a difference in Google searches for “HIV” before the World Cup compared to after the World Cup.

                Additionally, just as Charlie Sheen’s announcement of being positive for HIV coincided with a worldwide increase in Google searches for “HIV”, the times when “HIV” was most searched in Brazil and Germany were also tied to events that could reach a greater audience, in the form of a television show and a television actress.  The peak in South Africa could not be tied to an event in the same manner. This may be due to limited data, which is beyond the scope of this paper.

                A weakness of this research is that the number of data points had to be limited, since there needed to be the same number of data points before and after the World Cup.  For example, Germany hosted the World Cup beginning in June 2006, approximately 29 months after the start of Google Trends reported data, so only the 29 months after the World Cup in Germany were included in determining if there was a difference in Google searches for “HIV”.  This data limitation could have skewed the data in Brazil, as there was a large increase in Google searches for “HIV” after the start of the World Cup.
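
                The equal-length windows described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name symmetric_windows and the placeholder series are hypothetical, the series is assumed to run monthly from January 2004 to February 2017 (158 months), and the month in which the World Cup starts is assumed to count as the first AFTER month.

```python
def symmetric_windows(series, event_index):
    """Return equal-length (before, after) slices around event_index."""
    # The shorter side limits how many points can be used on both sides.
    n = min(event_index, len(series) - event_index)
    before = series[event_index - n:event_index]
    after = series[event_index:event_index + n]
    return before, after


# Placeholder monthly series: index 0 = January 2004, index 29 = June 2006.
series = list(range(158))
before, after = symmetric_windows(series, event_index=29)
print(len(before), len(after))  # 29 29
```

                For Germany the BEFORE side is the limiting one, so only 29 of the available AFTER months are kept.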

                Although beyond the scope of this paper, additional research to determine the direction of the difference, that is, whether Google searches for “HIV” increased or decreased after the World Cup, would add more benefit to this research.
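
                One simple way such follow-up work might describe the direction of the difference is to compare the medians of the two windows. The following Python sketch is only an illustration of that idea, not an analysis performed in this paper. The data values and the function name direction_of_change are hypothetical.

```python
from statistics import median


def direction_of_change(before, after):
    """Label the shift in median from the BEFORE window to the AFTER window."""
    shift = median(after) - median(before)
    if shift > 0:
        return "increase", shift
    if shift < 0:
        return "decrease", shift
    return "no change", shift


# Hypothetical relative search volumes (not the study's data).
print(direction_of_change([20, 22, 21, 19], [30, 35, 28, 40]))
```

                A one-sided version of the statistical test would be needed to say whether such a shift is statistically significant.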

                This research was able to determine that there was a statistical difference in Google searches for “HIV” before and after the World Cups in each respective country.  Visually, it appeared that there was likely an increase in Google searches for “HIV” in Brazil. However, it was more difficult to determine this change in Germany, because searches were so stable, and in South Africa, because searches were so labile. Such visual differences did not compare to that seen when Charlie Sheen announced his HIV status.

References

 

[1] “What Is HIV/AIDS?” [Online]. Available: https://www.aids.gov/hiv-aids-basics/hiv-aids-101/what-is-hiv-aids/. [Accessed: 02-May-2017].

[2] “Google Trends,” Google Trends. [Online]. Available: /trends/explore. [Accessed: 03-May-2017].

[3] “FIFA World Cup,” Wikipedia. 28-Mar-2017.

[4] “Google Trends,” Wikipedia. 19-Mar-2017.

[5] “Data Source: Google Trends.” [Online]. Available: https://trends.google.com/trends.

[6] “Tableau Software,” Tableau Software. [Online]. Available: https://www.tableau.com/. [Accessed: 21-Feb-2017].

[7] “R: The R Project for Statistical Computing.” [Online]. Available: https://www.r-project.org/. [Accessed: 21-Feb-2017].