Can Natural Language Processing Estimate Patient Preferences?

Project Summary

When researchers want to understand patient preferences, they often use interviews and focus groups. However, these methods have some limitations. For example, they often involve only a small number of participants, and recruitment and analysis can take a lot of work. It is also difficult to explore changes in patient preferences over time.

“Social media listening” is a new method of gathering information about patient preferences that overcomes some of these limitations by drawing on patient discussions held on social media, such as Reddit forums. However, analysing the data can still require a lot of work, since there is usually a large volume of text.

To try to address this problem, we developed a “natural language processing” tool called EXPECT-NLP (EXploration of Patient Experiences in Collected Texts using Natural Language Processing). This tool can automatically extract key themes from discussions and group them by sentiment, relatedness, and topic, and it allows users to explore the underlying body of text.

Project Findings

We used this tool to do a “preference exploration”: specifically, we analysed Reddit discussions about different drug therapies for multiple sclerosis (MS). The results were similar to those of previous research, which suggests that our natural language processing tool works for this purpose.

To use the tool on these Reddit discussions, we had to build a lexicon (word bank) of relevant terms using the tool and some human curation. This was practical to do because patients using social media express opinions about a limited number of concepts. In our case, the list of “aspect-opinion pairs” extracted using our tool stayed at around 1000, even as more text was added. This suggests our tool is usable at larger scales.

We also found that these curated lexicons could be used for other areas of health, suggesting that this tool can be versatile. Specifically, we successfully used the lexicon curated from MS, rheumatoid arthritis, and cancer forums to analyse data from COVID-19 forums.

Finally, one limitation of the tool is that it will likely be much more challenging to use to understand patient “trade-offs” (e.g., whether they would prefer a less effective but cheaper treatment vs a more effective but more expensive treatment).

Overall, our hope is that this will allow potential users to easily and quickly use the vast amount of social media data available to generate insights and hypotheses on patient experiences and preferences, and this will inform the development of new medical products, health services, and policies.


Oct 2020: SMDM (Society for Medical Decision Making) 42nd Annual North American Meeting: “Relationships in Medical Decision Making.” Presentation Abstract

This project is part of the Health Economics and Simulation Modelling Cluster.


Larry Lynd, PI
Nick Dragojlovic
Raymond Ng
Giuseppe Carenini
David Johnson
Nicola Kopac
Marilyn Lenzen
Sarah le Huray
Yifu (Charles) Chen
Samantha Pollard
Mark Harrison
Dean Regier
Kennedy Borle
Amy George

Methods Matters Webinar
