A recent project within the Quality, Safety and Outcomes Policy Research Unit (QSO PRU) programme of work explored the feasibility of using Twitter data to understand public views about care delivery. In this blog we share some key learnings from this exploratory approach, of which there were two parts:
Twitter is increasingly used as a data source in research because it allows quick and relatively easy access to people’s views. With a reported 0.5 billion messages shared each day, it’s easy to see the interest: the platform offers an enormous and highly public database that could be investigated to explore a wide range of topics. Views shared on social media can complement traditional qualitative methods and may in some cases capture the views of people who are more critical about a given topic and who may be underrepresented when other methods are used.
Using Twitter data provided an opportunity to explore people’s views of changes to care delivery as a result of Covid-19. ‘Scraping’ Twitter data using specific search terms allowed us to access data about people’s views of care at a time when it was difficult to undertake primary research.
Despite carefully planned search terms, the extracted Twitter data was not always relevant to our research. Our search terms generated almost 50,000 tweets but the majority of these were not directly related to people’s views on changes to care delivery during Covid-19. For example, some tweets were about Covid-19 but did not refer to care delivery (such as tweets about Covid-19 symptoms or about the virus in general). The difficulty in defining the search terms to extract relevant tweets may have been exacerbated by the nature of the research topic; Covid-19 generated considerable social media activity due to the unprecedented circumstances. Using Twitter to extract data in other subject areas, where the search terms can be more defined, may generate fewer irrelevant tweets.
Given the large volume of tweets that our search terms generated, a fully manual data cleaning process (reviewing each tweet to determine relevance) would have been extremely time consuming. We therefore used both manual and automated approaches to identify irrelevant comments. The automated approach used a natural language processing (NLP) model: this was a necessity due to the volume of tweets, although it is likely that some relevant comments were missed.
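To give a flavour of how manual and automated screening can be combined, here is a minimal sketch in Python. It is not the model we used: it swaps in a simple keyword rule as a stand-in for an NLP model, and the term lists, the function name triage and the three labels are all hypothetical, chosen only for illustration. Tweets that mention both Covid-19 and care delivery are flagged as likely relevant, tweets that mention only one are passed to a person for manual review, and the rest are excluded.

```python
import re
from collections import Counter

# Hypothetical term lists for illustration only; these are NOT the
# search terms or the NLP model used in the project.
COVID_TERMS = re.compile(
    r"\b(covid(-?19)?|coronavirus|pandemic|lockdown)\b", re.IGNORECASE
)
CARE_TERMS = re.compile(
    r"\b(appointments?|operations?|surgery|treatment|cancelled|"
    r"postponed|delayed|referral|gp|hospital)\b",
    re.IGNORECASE,
)


def triage(tweet: str) -> str:
    """Label one tweet as 'likely_relevant', 'manual_review' or 'exclude'."""
    mentions_covid = bool(COVID_TERMS.search(tweet))
    mentions_care = bool(CARE_TERMS.search(tweet))
    if mentions_covid and mentions_care:
        return "likely_relevant"   # mentions both topics
    if mentions_covid or mentions_care:
        return "manual_review"     # ambiguous: a person decides
    return "exclude"               # mentions neither topic


# Made-up example tweets, not taken from the study data.
tweets = [
    "My hospital appointment was cancelled because of Covid-19",
    "Covid-19 symptoms include a cough and a fever",
    "Lovely weather today",
]
labels = Counter(triage(t) for t in tweets)
print(labels)
```

A rule like this is crude, and like any automated filter it will miss some relevant tweets, which is why a manual check of the ambiguous cases is kept in the loop.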
Following the data cleaning, just over 400 tweets were identified as relevant for inclusion in our analysis. Two main themes arose within these tweets: access to care (n=340) and impact of delayed or changed care (n=103). Within the overarching theme of ‘access to care’ we identified three main subthemes: cancellations and delays (n=179); Covid-19 treatment and care being prioritised over other conditions or illnesses (n=120); and changes in access to care or services (n=67).
Another challenge we experienced was gaining detailed insight into people’s views on the impact Covid-19 had on care delivery. The limit on the number of characters allowed per tweet could make it difficult for people to express in depth how the changes affected them, particularly where those effects were complex. Furthermore, some replies to tweets were included without the original tweet: these were difficult to code due to the lack of context and in some cases were therefore excluded from the analysis.
There are some limitations to consider when using Twitter data for research purposes.
Findings cannot be generalised to reflect public opinion, as they are limited not only to Twitter users but to those users who engage with the platform to share their views. There is evidence that British Twitter users are not representative of the general population: they are generally younger, wealthier, and better educated (Blank, 2017). This means we were unable to examine the impact of the changes to care on those with limited digital literacy and/or a lack of access to technology, as such groups are less likely to use social media platforms such as Twitter. Although the number of internet non-users has been declining over time, there is evidence of a “digital divide”, as internet use and digital skills vary between groups.
Although we have successfully used Twitter data to explore differences in views over time, for this research question we were unable to explore any differences in the views or experiences of Twitter users by demographic factors such as age, gender and ethnicity, because this information was not available. Due to the relatively low volume of relevant tweets, we were also unable to examine any variation in views by geographical region.
Our search terms sought to explore any changes in care delivery for people living with long-term conditions, but too few tweets referred to particular long-term conditions to allow for analysis by health condition. As long-term conditions are more prevalent in older and in more deprived groups (the same groups that experience greater digital exclusion), it is likely that those with long-term conditions were underrepresented in the tweets analysed. However, it is also likely that these groups are difficult to identify in the data, as people with a long-term condition may not always specify their condition in a tweet.
We found that using Twitter as a source of data in research can be extremely useful. It allowed us to explore people’s views on how Covid-19 impacted their experiences of care when it would have been difficult to conduct primary research. However, due to the limitations of using Twitter as a data source we would recommend that this approach is used alongside additional research methods to complement the findings. For example, conducting focus groups or in-depth interviews with patients could provide additional insight into their experiences of changes to care delivery during Covid-19.
Our top tips for using Twitter as a data source are to:
This research is funded by the National Institute for Health Research (NIHR) Policy Research Programme, conducted through the Quality, Safety and Outcomes Policy Research Unit, PR-PRU-1217-20702. The views expressed do not necessarily represent those of the NIHR or the Department of Health and Social Care.