Harmony (Wellcome Data Prize in Mental Health entry)


Wellcome Data Prizes: four health challenges, with £500,000 to be shared between three teams.

Harmony is an open-source, NLP-driven data harmonisation tool developed for the Wellcome Data Prize in Mental Health.

What does Harmony do?

  • Psychologists and social scientists often have to match items in different questionnaires, such as “I often feel anxious” and “Feeling nervous, anxious or afraid”.
  • This is called harmonisation.
  • Harmonisation is a time-consuming and subjective process.
  • Going through long PDFs of questionnaires and putting the questions into Excel is no fun.
  • Enter Harmony, a tool that uses natural language processing and generative AI models to help researchers harmonise questionnaire items, even in different languages.
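To make the matching task concrete, here is a minimal Python sketch of one common way to automate it. It assumes each question has already been turned into a numeric vector (an embedding) by some language model, scores every pair with cosine similarity, and keeps each item's best match. The helper `best_matches` and the three-dimensional vectors are hypothetical and made up for illustration; they are not Harmony's API or real model output.

```python
import math
from collections.abc import Callable, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v: dot(u, v) / (|u| * |v|), in [-1, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def best_matches(
    items_a: Sequence[str],
    items_b: Sequence[str],
    embed: Callable[[str], Sequence[float]],
) -> dict[str, tuple[str, float]]:
    """Hypothetical helper: map each item in A to its most similar item in B.

    `embed` turns question text into a vector. In a real tool this would be a
    sentence-embedding model; here the caller supplies any function.
    """
    vectors_b = [(q, embed(q)) for q in items_b]  # embed B once, reuse for every A item
    matches = {}
    for q_a in items_a:
        v_a = embed(q_a)
        score, best_q = max((cosine_similarity(v_a, v_b), q_b) for q_b, v_b in vectors_b)
        matches[q_a] = (best_q, score)
    return matches


# Made-up 3-dimensional vectors for illustration only (not real model output).
toy_vectors = {
    "I often feel anxious": [0.9, 0.1, 0.2],
    "Feeling nervous, anxious or afraid": [0.8, 0.2, 0.3],
    "Trouble relaxing": [0.3, 0.9, 0.1],
}

matches = best_matches(
    ["I often feel anxious"],
    ["Feeling nervous, anxious or afraid", "Trouble relaxing"],
    toy_vectors.__getitem__,
)
for question, (match, score) in matches.items():
    print(f"{question!r} -> {match!r} (cosine {score:.2f})")
```

Passing the embedding step in as a function keeps the matching logic independent of any particular model, so the same loop works whichever encoder a researcher chooses.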

We developed Harmony for the Wellcome Trust’s Data Prize in Mental Health, in collaboration with the University of Ulster, University College London and the Universidade Federal de Santa Maria in Brazil. It uses natural language processing (NLP) to help researchers conduct meta-analyses of mental health studies. You can read more on the project website.

The Wellcome Trust established the Wellcome Data Prizes for multidisciplinary teams using existing data to answer important research questions. The prizes cover health challenges in four areas: climate and health, infectious disease, mental health and discovery research. Our entry, in mental health, tackled the problem of data harmonisation using NLP.

Fast Data Science participated in the Wellcome Trust Data Prize in Mental Health in a team led by Dr. Eoin McElroy at the University of Ulster to develop a natural language processing data harmonisation tool called Harmony. You can read more about Harmony on Ulster’s website.

Researchers in psychology and the social sciences often have to conduct meta-analyses of research spanning long time periods and different cultures in order to identify trends. For example, our team investigated the effect of social isolation and loneliness on mental wellbeing over time in two societies, the UK and Brazil.

Primary care psychologists often use the Generalized Anxiety Disorder 7 (GAD-7) questionnaire to quantify anxiety, but other questionnaires have been used in the past, such as the Beck Anxiety Inventory. The Beck questionnaire focuses more on physical symptoms, whereas the GAD-7 asks more about psychological state, so datasets collected with different questionnaires are hard to compare directly.

Our tool, Harmony, allows researchers to upload a set of mental health questionnaires in PDF or Excel format, such as the GAD-7 anxiety questionnaire. It identifies which questions across the questionnaires are identical, similar in meaning, or antonyms of each other, and displays the result as a network graph. This allows researchers to harmonise their datasets.
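The step from pairwise scores to a network graph can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Harmony's actual logic: `build_match_graph` is a hypothetical helper, its thresholds are invented defaults, treating a strongly negative score as an antonym is a simplification chosen for the example, and the similarity values are made up.

```python
def build_match_graph(
    questions: list[str],
    similarity: list[list[float]],
    identical: float = 0.95,  # hypothetical threshold
    similar: float = 0.6,     # hypothetical threshold
    antonym: float = -0.6,    # hypothetical threshold
) -> list[tuple[str, str, str, float]]:
    """Turn a square matrix of pairwise similarity scores into labelled edges.

    Nodes are questions; an edge is kept only when the score crosses one of
    the thresholds. Each unordered pair is visited once.
    """
    edges = []
    n = len(questions)
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle only: skip self-pairs and duplicates
            score = similarity[i][j]
            if score >= identical:
                label = "identical"
            elif score >= similar:
                label = "similar"
            elif score <= antonym:
                label = "antonym"
            else:
                continue  # weak relationship: no edge drawn
            edges.append((questions[i], questions[j], label, score))
    return edges


questions = [
    "I often feel anxious",
    "Feeling nervous, anxious or afraid",
    "I feel calm and relaxed",
]
# Made-up, symmetric similarity scores for illustration only.
sim = [
    [1.00, 0.82, -0.71],
    [0.82, 1.00, -0.65],
    [-0.71, -0.65, 1.00],
]
for a, b, label, score in build_match_graph(questions, sim):
    print(f"{a!r} --[{label} {score:+.2f}]-- {b!r}")
```

The edge list can be handed to any graph library or visualisation tool. Keeping the raw score on each edge lets a researcher review borderline matches instead of relying on a single cut-off.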


Interested in FOSS? The Harmony project is an open source project looking for contributors. You can join Harmony’s Discord server or visit harmonydata.ac.uk/community/ for more information.

Uniquely, Harmony relies on a transformer neural network (large language model, or LLM) architecture rather than a dictionary approach or word list, a departure from previous approaches used in the social sciences. This enables multilingual support (English and Portuguese are our languages of focus, but we recently extended support to more than eight languages): Harmony correctly maps the GAD-7 used in the UK to the GAD-7 used in Brazil, even though the Brazilian questionnaire is in Brazilian Portuguese.
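To see why a word-list approach struggles, here is a minimal Python sketch of a purely lexical baseline: Jaccard word overlap, the number of shared distinct words divided by the number of all distinct words. It stands in for dictionary-style matching in general and is not a method used by Harmony. The Portuguese phrase is a hypothetical rendering written for this example, not the wording of the validated Brazilian GAD-7.

```python
import re


def words(text: str) -> set[str]:
    """Distinct lowercase word tokens; punctuation and hyphens act as separators."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Shared distinct words divided by all distinct words (0 = none shared, 1 = same set)."""
    wa, wb = words(a), words(b)
    if not wa and not wb:
        return 1.0  # two empty items: treat as identical
    return len(wa & wb) / len(wa | wb)


# Same-language pair from the introduction: only "anxious" is shared,
# and "feel" vs "feeling" count as different words.
print(jaccard("I often feel anxious", "Feeling nervous, anxious or afraid"))  # 0.125

# Cross-language pair (hypothetical Portuguese rendering): no words are shared.
print(jaccard("Feeling nervous, anxious or afraid", "Sentir-se nervoso, ansioso ou com medo"))  # 0.0
```

Two items that a clinician would treat as near-equivalent score close to zero on word overlap, and a translation scores exactly zero. A model that compares meaning rather than spelling does not have this limitation.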

Using Harmony, our team conducted research into social isolation and anxiety, with NLP supplying a quantitative measure of how equivalent the different mental health datasets were.

GAD-7 anxiety questionnaire

We also released the tool on GitHub as Python and R libraries, as well as a REST API and a Docker container.

You can also read about our consultation with the Software Sustainability Institute on making the Harmony project more sustainable after the end of Wellcome’s funding.

You can read how we validated Harmony’s performance against real-world data in our validation study, published in BMC Psychiatry in 2024.

The Harmony team


The team working on Harmony was made up of:

Dr. Eoin McElroy, Lecturer in the School of Psychology at the University of Ulster, Northern Ireland

Dr. Bettina Moltrecht, Research Fellow in Population Health and Quantitative Social Science at University College London

Prof. George Ploubidis, Professor of Population Health and Statistics at the Social Research Institute at University College London

Dr. Mauricio Scopel Hoffmann, Associate Professor in the Department of Neuropsychiatry at Universidade Federal de Santa Maria, Brazil

Thomas Wood, data scientist and natural language processing expert at Fast Data Science Ltd

How to cite Harmony?

You can cite our validation paper:

McElroy, Wood, Bond, Mulvenna, Shevlin, Ploubidis, Scopel Hoffmann, Moltrecht, Using natural language processing to facilitate the harmonisation of mental health questionnaires: a validation study using real-world data. BMC Psychiatry 24, 530 (2024), https://doi.org/10.1186/s12888-024-05954-2

A BibTeX entry for LaTeX users is:

@article{mcelroy2024using,
  title={Using natural language processing to facilitate the harmonisation of mental health questionnaires: a validation study using real-world data},
  author={McElroy, Eoin and Wood, Thomas and Bond, Raymond and Mulvenna, Maurice and Shevlin, Mark and Ploubidis, George B and Hoffmann, Mauricio Scopel and Moltrecht, Bettina},
  journal={BMC Psychiatry},
  volume={24},
  number={1},
  pages={530},
  year={2024},
  publisher={Springer}
}
