Harmony is an open source NLP-driven data harmonisation tool developed for the Wellcome Data Prize.
In collaboration with the University of Ulster, University College London and the Universidade Federal de Santa Maria in Brazil, we developed Harmony, a tool that uses natural language processing (NLP) to help researchers conduct meta-analyses of mental health studies, for the Wellcome Trust’s Data Prize in Mental Health. You can read more on the project website.
The Wellcome Trust established the Wellcome Data Prizes for multidisciplinary teams who use existing data to answer important research questions. The prizes address health challenges in four areas: climate and health, infectious disease, mental health, and discovery research. Our team tackled the problem of data harmonisation using NLP.
Fast Data Science participated in the Wellcome Trust Data Prize in Mental Health in a team led by Dr. Eoin McElroy at the University of Ulster to develop a natural language processing data harmonisation tool called Harmony. You can read more about Harmony on Ulster’s website.
Researchers in psychology and the social sciences often need to conduct meta-analyses of research spanning long time periods and different cultures to identify trends. For example, our team investigated the effect of social isolation and loneliness on mental wellbeing over time in two societies, the UK and Brazil.
Primary care psychologists often use the Generalized Anxiety Disorder 7 (GAD-7) questionnaire to quantify anxiety, but other questionnaires used in the past include the Beck Anxiety Inventory. The Beck questionnaire focuses more on physical symptoms, whereas the GAD-7 asks more about psychological state. These differences make it hard to compare datasets collected with different questionnaires.
Our tool, Harmony, allows researchers to upload a set of mental health questionnaires, such as the GAD-7 anxiety questionnaire, in PDF or Excel format. It identifies which questions across the questionnaires are identical, similar in meaning, or antonyms of each other, and generates a network graph of these relationships. This allows researchers to harmonise their datasets.
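To make the idea of a question network concrete, here is a minimal Python sketch (not Harmony's actual code) that turns a matrix of pairwise similarity scores into an adjacency list of labelled edges. The thresholds `IDENTICAL` and `SIMILAR`, the helper names `classify` and `build_graph`, and all score values are hypothetical. The sketch also assumes the scores are signed, so that opposite-meaning pairs come out strongly negative; how Harmony itself detects antonyms is not covered here.

```python
from itertools import combinations

# Hypothetical cut-offs, chosen for illustration only.
IDENTICAL = 0.95
SIMILAR = 0.6


def classify(score):
    """Map a signed similarity score in [-1, 1] to a relationship label, or None.

    Assumption: opposite-meaning questions receive strongly negative scores.
    """
    if score >= IDENTICAL:
        return "identical"
    if score >= SIMILAR:
        return "similar"
    if score <= -SIMILAR:
        return "antonym"
    return None


def build_graph(questions, similarity):
    """Return an adjacency dict {question: [(other, label, score), ...]}."""
    graph = {q: [] for q in questions}
    # Visit each unordered pair of questions exactly once.
    for i, j in combinations(range(len(questions)), 2):
        score = similarity[i][j]
        label = classify(score)
        if label is not None:
            # Undirected graph: record the edge on both endpoints.
            graph[questions[i]].append((questions[j], label, score))
            graph[questions[j]].append((questions[i], label, score))
    return graph


questions = [
    "Feeling nervous, anxious, or on edge",  # GAD-7 item
    "Trouble relaxing",                      # GAD-7 item
    "Unable to relax",                       # Beck Anxiety Inventory item
    "I feel calm and relaxed",               # hypothetical positively worded item
]

# Made-up symmetric similarity matrix standing in for model output.
similarity = [
    [1.00, 0.55, 0.50, -0.40],
    [0.55, 1.00, 0.97, -0.70],
    [0.50, 0.97, 1.00, -0.75],
    [-0.40, -0.70, -0.75, 1.00],
]

graph = build_graph(questions, similarity)
for question, edges in graph.items():
    for other, label, score in edges:
        print(f"{question!r} --{label} ({score})--> {other!r}")
```

A plain dictionary keeps the example dependency-free; a graph library such as NetworkX could hold the same edges if you wanted to draw or analyse the network.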
Uniquely, Harmony relies on a transformer neural network (large language model, or LLM) architecture rather than a dictionary or word list, a departure from approaches previously used in the social sciences. This approach enables multilingual support: English and Portuguese are our languages of focus, but we have recently extended support to more than eight languages. For example, Harmony correctly maps the GAD-7 used in the UK to the GAD-7 used in Brazil, even though the Brazilian questionnaire is written in Brazilian Portuguese.
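The multilingual matching can be illustrated with a short sketch. Assuming every question has already been embedded by a multilingual transformer model into a shared vector space, matching a UK item to its Brazilian counterpart reduces to picking the target item with the highest cosine similarity, whatever the language. The three-dimensional vectors below are made-up stand-ins for real embeddings, the Portuguese wording is illustrative, and `best_matches` is a hypothetical helper rather than part of Harmony's API.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


# Hypothetical embeddings standing in for the output of a multilingual
# transformer encoder; real sentence embeddings have hundreds of dimensions.
english = {
    "Feeling nervous, anxious, or on edge": [0.9, 0.1, 0.2],
    "Not being able to stop or control worrying": [0.2, 0.9, 0.1],
}
portuguese = {
    "Sentir-se nervoso, ansioso ou muito tenso": [0.85, 0.15, 0.25],
    "Não ser capaz de impedir ou de controlar as preocupações": [0.25, 0.88, 0.05],
}


def best_matches(source, target):
    """For each source item, return (best target item, rounded cosine score)."""
    matches = {}
    for s_text, s_vec in source.items():
        # Pick the target item whose embedding is closest in angle.
        t_text, t_vec = max(target.items(), key=lambda item: cosine(s_vec, item[1]))
        matches[s_text] = (t_text, round(cosine(s_vec, t_vec), 3))
    return matches


for en, (pt, score) in best_matches(english, portuguese).items():
    print(f"{en!r} -> {pt!r} (cosine {score})")
```

Because the comparison happens entirely in embedding space, no bilingual dictionary is needed; the quality of the match depends on how well the encoder aligns the two languages.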
Using Harmony, our team was able to investigate social isolation and anxiety across the UK and Brazil, with NLP supplying a quantitative measure of the equivalence of questions across the different mental health datasets.
We have also released Harmony on GitHub as a Python library and an R library, as well as a REST API and a Docker container.
You can read how we validated Harmony’s performance on real-world data in our validation study, published in BMC Psychiatry in 2024.
The team working on Harmony was made up of:
• Dr. Eoin McElroy, Lecturer in the School of Psychology at the University of Ulster, Northern Ireland
• Dr. Bettina Moltrecht, Research Fellow in Population Health and Quantitative Social Science at University College London
• Prof. George Ploubidis, Professor of Population Health and Statistics at the Social Research Institute at University College London
• Dr. Mauricio Scopel Hoffmann, Associate Professor in the Department of Neuropsychiatry at Universidade Federal de Santa Maria, Brazil
• Thomas Wood, data scientist and natural language processing expert at Fast Data Science Ltd
You can cite our validation paper:
McElroy, Wood, Bond, Mulvenna, Shevlin, Ploubidis, Scopel Hoffmann, Moltrecht, Using natural language processing to facilitate the harmonisation of mental health questionnaires: a validation study using real-world data. BMC Psychiatry 24, 530 (2024), https://doi.org/10.1186/s12888-024-05954-2
A BibTeX entry for LaTeX users is:
@article{mcelroy2024using,
  title={Using natural language processing to facilitate the harmonisation of mental health questionnaires: a validation study using real-world data},
  author={McElroy, Eoin and Wood, Thomas and Bond, Raymond and Mulvenna, Maurice and Shevlin, Mark and Ploubidis, George B and Hoffmann, Mauricio Scopel and Moltrecht, Bettina},
  journal={BMC psychiatry},
  volume={24},
  number={1},
  pages={530},
  year={2024},
  publisher={Springer}
}