In collaboration with the University of Ulster, University College London, and the Universidade Federal de Santa Maria in Brazil, we developed a harmonisation tool that uses Natural Language Processing to help researchers conduct meta-analyses of mental health studies. The work was carried out for the Wellcome Trust’s Data Prize in Mental Health.

The Wellcome Trust has established the Wellcome Data Prizes, which are aimed at multidisciplinary teams using existing data to answer important research questions. The prizes focus on health challenges in four areas: climate and health, infectious disease, mental health, and discovery research.

Fast Data Science participated in the Wellcome Trust Data Prize in Mental Health in a team led by Dr. Eoin McElroy at the University of Ulster to develop a natural language processing data harmonisation tool called Harmony.

Researchers in psychology and the social sciences often have to conduct meta-analyses of research across long time periods and cultures, to identify trends. For example, our team was investigating the effect of social isolation and loneliness on mental wellbeing over time, focusing on two societies (the UK and Brazil).

Primary care psychologists often use the Generalized Anxiety Disorder 7 (GAD-7) questionnaire to quantify anxiety, but other questionnaires, such as the Beck Anxiety Inventory, have also been used in the past. The Beck questionnaire focuses more on physical symptoms, whereas the GAD-7 includes more questions about psychological state. Because the two instruments ask different questions, datasets collected with them are hard to compare directly.


Our tool, Harmony, allows researchers to upload a set of mental health questionnaires in PDF or Excel format, such as the GAD-7 anxiety questionnaire. It identifies which questions across the questionnaires are identical, similar in meaning, or opposite in meaning, and generates a network graph of these relationships. Researchers can then use the graph to harmonise their datasets.
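To make the idea of a question network more concrete, here is a minimal sketch in Python. It is not Harmony's actual code: the similarity scores, the thresholds, the item labels and the helper names `classify` and `build_graph` are all assumptions made for illustration. It takes pairwise similarity scores between questions (assumed to lie between -1 and 1, with negative values standing for opposite meanings) and turns them into a labelled graph.

```python
# Illustrative sketch only: scores, thresholds and item labels are
# hypothetical and are not taken from Harmony or from any real dataset.

def classify(score, identical=0.95, similar=0.6):
    """Label a pair of questions from its similarity score in [-1, 1]."""
    if score >= identical:
        return "identical"
    if score >= similar:
        return "similar"
    if score <= -similar:
        return "opposite"
    return None  # too weakly related to draw an edge


def build_graph(scores):
    """Build an undirected graph as an adjacency list of labelled edges."""
    graph = {}
    for (a, b), score in scores.items():
        label = classify(score)
        if label is None:
            continue
        graph.setdefault(a, []).append((b, label, score))
        graph.setdefault(b, []).append((a, label, score))
    return graph


# Hypothetical pairwise scores between questionnaire items.
scores = {
    ("GAD-7 (UK) item 1", "GAD-7 (Brazil) item 1"): 0.97,
    ("GAD-7 (UK) item 1", "Beck item: nervous"): 0.71,
    ("GAD-7 (UK) item 1", "Hypothetical item: feeling calm"): -0.66,
    ("Beck item: nervous", "Hypothetical item: feeling calm"): -0.20,
}

graph = build_graph(scores)
for neighbour, label, score in graph["GAD-7 (UK) item 1"]:
    print(f"{neighbour}: {label} ({score:+.2f})")
```

Each edge in the resulting graph tells a researcher which variables in two datasets could be treated as equivalent, and which would need to be reverse-scored before being combined.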

Uniquely, Harmony relies on Transformer neural network architectures and is not dependent on a dictionary approach or word list. This allows for multilingual support (English and Portuguese are our languages of focus), and Harmony is able to correctly map the GAD-7 used in the UK to the GAD-7 used in Brazil, despite the Brazilian questionnaire being in Brazilian Portuguese.
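The sketch below illustrates, under stated assumptions, why an embedding-based approach can match questions across languages without a word list. A multilingual Transformer model would normally convert each question into a vector; here the vectors are tiny hand-written stand-ins, the item wordings are illustrative, and `best_match` is a hypothetical helper, so none of this should be read as Harmony's implementation. Questions are compared by the cosine similarity of their vectors, so two items can match even if they share no words.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy 3-dimensional vectors standing in for the output of a multilingual
# sentence embedding model. The numbers are invented for illustration.
toy_embeddings = {
    "Feeling nervous, anxious, or on edge": [0.90, 0.10, 0.20],
    "Sentir-se nervoso(a), ansioso(a) ou muito tenso(a)": [0.88, 0.12, 0.25],
    "Trouble falling asleep": [0.10, 0.90, 0.30],
}


def best_match(query, candidates, embeddings):
    """Return the candidate whose vector is closest to the query's vector."""
    return max(
        candidates,
        key=lambda c: cosine_similarity(embeddings[query], embeddings[c]),
    )


english_item = "Feeling nervous, anxious, or on edge"
others = [text for text in toy_embeddings if text != english_item]
print(best_match(english_item, others, toy_embeddings))
```

With these toy vectors, the English item is matched to the Portuguese item rather than to the unrelated sleep item, because the comparison depends only on the vectors and not on shared vocabulary.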

Using Harmony, our team was able to conduct groundbreaking research into social isolation and anxiety, with NLP supplying a quantitative measure of how equivalent the different mental health datasets are.


Wellcome Data Prizes

Four health challenges

£500,000 to be shared between three teams

The Harmony team

The team working on Harmony was made up of:

• Dr. Eoin McElroy, Lecturer in Psychology at the University of Ulster, Northern Ireland

• Dr. Bettina Moltrecht, Research Fellow in Population Health and Quantitative Social Science at University College London

• Prof. George Ploubidis, Professor of Population Health and Statistics at the Social Research Institute at University College London

• Dr. Mauricio Scopel Hoffman, Associate Professor in the Department of Neuropsychiatry at Universidade Federal de Santa Maria, Brazil

• Thomas Wood, data scientist and natural language processing expert at Fast Data Science Ltd