Legal chatbot using natural language processing to answer corporate insolvency questions

Thomas Wood

The Insolvency Bot takes into account statute law, forms and case law

We have developed a bot using natural language processing to demonstrate the power of legal AI and legal NLP.

Online demo of the tool

Fast Data Science have been working with a team of AI and legal experts at Royal Holloway University’s Department of Law and Criminology and the University of Surrey’s Department of Computer Science to build a chatbot that can answer questions on corporate insolvency in England and Wales.

Using prompt engineering, generative models and the text of key sources (UK statute law such as the Insolvency Act 1986, important case law from the National Archives, and information on procedures from HMRC’s website), the system triages incoming queries and sends a smart, informative prompt to a generative model.
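The post does not describe how the triage step works internally. As an illustration only, the sketch below assumes a simple keyword lookup: hypothetical trigger phrases map to source snippets, and any matches are placed in the prompt ahead of the user’s question. The names `KNOWLEDGE`, `triage` and `build_prompt` are our own, and a real system would likely use something more robust than substring matching.

```python
# Hypothetical trigger phrases mapped to source snippets; the real bot's
# triage rules and knowledge base are not described in this post.
KNOWLEDGE = {
    "statute": {
        "wrongful trading": "Insolvency Act 1986, s.214 (wrongful trading)",
        "fraudulent trading": "Insolvency Act 1986, s.213 (fraudulent trading)",
    },
    "case_law": {
        "corporate veil": "Salomon v A Salomon & Co Ltd [1897] AC 22",
    },
}


def triage(query: str) -> list[str]:
    """Return every snippet whose trigger phrase occurs in the query."""
    q = query.lower()
    return [
        snippet
        for source in KNOWLEDGE.values()
        for phrase, snippet in source.items()
        if phrase in q
    ]


def build_prompt(query: str) -> str:
    """Prepend the triaged sources to the user's question."""
    hits = triage(query)
    context = "\n".join(f"- {h}" for h in hits) if hits else "- (no matching sources found)"
    return (
        "You are answering a question about corporate insolvency in England and Wales.\n"
        f"Relevant sources:\n{context}\n\n"
        f"Question: {query}\n"
        "Answer:"
    )
```

For example, `build_prompt("Could I be liable for wrongful trading?")` produces a prompt whose context lists the s.214 snippet, while a question matching no trigger phrase is sent with an explicit "no matching sources" note.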

You can try the insolvency chatbot at this link.

Screenshot of the insolvency bot

The bot uses Retrieval Augmented Generation (RAG). RAG is a design pattern in which an information retrieval component is added to a large language model (LLM), allowing us to supplement the LLM’s capabilities with our own domain knowledge.
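To make the retrieval step concrete, here is a minimal RAG sketch under stated assumptions. The post does not say how passages are ranked, so we substitute simple word-count vectors and cosine similarity for the neural embeddings a real system would typically use; this keeps the example runnable with the standard library alone. The passages in `PASSAGES` and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical passages standing in for the bot's knowledge base.
PASSAGES = [
    "A director may be liable for wrongful trading if they knew the company "
    "could not avoid insolvent liquidation.",
    "A moratorium gives a company breathing space from creditor action.",
    "A creditor may present a winding-up petition to the court.",
]


def vectorise(text: str) -> Counter:
    """Bag-of-words term counts (a stand-in for a neural embedding)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, passages: list[str], k: int = 2) -> list[str]:
    """Return the k passages most similar to the query."""
    q = vectorise(query)
    return sorted(passages, key=lambda p: cosine(q, vectorise(p)), reverse=True)[:k]


def augment(query: str, passages: list[str], k: int = 2) -> str:
    """Build a prompt that places the retrieved passages before the question."""
    context = "\n".join(retrieve(query, passages, k))
    return f"Context:\n{context}\n\nQuestion: {query}"
```

With `k=1`, the question "Can a director be liable for wrongful trading?" retrieves the wrongful-trading passage, because it shares the most terms with the query. Swapping `vectorise` for a real embedding model would leave `retrieve` and `augment` unchanged.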

The information provided on this website does not, and is not intended to, constitute legal advice.

Validating the Insolvency Bot

Because the bot is a generative model, and generative models are typically hard to evaluate, we took an innovative approach to evaluating its output. We use a human-defined mark scheme and have an LLM assess the bot’s answers to test questions, marking them as if the bot were sitting a law exam.

We take the insolvency bot’s response and pass it to GPT-4 along with a “criterion” question such as “Does the lawyer mention that piercing the corporate veil may occur as a result of the director breaching their fiduciary duties towards the company?” If the answer comes back as ‘yes’, ‘maybe’, or a yes with caveats, points are awarded accordingly.
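The marking loop described above can be sketched as follows. This is not the code from the evaluation repository: it assumes a hypothetical points table (`POINTS`), since the actual weights are not stated here, and takes the grading model as a plain function `judge` so that any LLM client, or a stub for testing, can be plugged in.

```python
from typing import Callable

# Hypothetical weights per verdict; the real mark scheme may differ.
POINTS = {"yes": 2.0, "yes with caveats": 1.5, "maybe": 1.0, "no": 0.0}


def normalise_verdict(reply: str) -> str:
    """Map a free-text reply from the grading model onto a POINTS key."""
    text = reply.strip().lower()
    if text.startswith("yes"):
        # A bare "yes" scores in full; anything appended counts as a caveat.
        return "yes" if text.rstrip(".!") == "yes" else "yes with caveats"
    if text.startswith("maybe"):
        return "maybe"
    return "no"


def mark_answer(answer: str, criteria: list[str], judge: Callable[[str], str]) -> float:
    """Ask the judge each criterion question about the answer and sum the points."""
    total = 0.0
    for criterion in criteria:
        prompt = (
            f"Here is a lawyer's answer:\n{answer}\n\n"
            f"{criterion} Reply with yes, maybe or no."
        )
        total += POINTS[normalise_verdict(judge(prompt))]
    return total
```

In practice `judge` would wrap a call to GPT-4; dividing the total by `2.0 * len(criteria)` would turn it into a percentage-style mark comparable across questions with different numbers of criteria.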

We have some validation scripts in our GitHub repo at: https://github.com/fastdatascience/evaluate_insolvency

We tried a number of variants of the bot, including versions built around GPT-3.5 Turbo and GPT-4, and tested them head-to-head against the unmodified versions of GPT.

We found that GPT-4 is much slower to respond than GPT-3.5 Turbo, but is considerably more precise in its answers.

Insolvency Bot response times

The team

Our team on this project has been cross-disciplinary, with members from different universities and industries. You can read their profiles here.

Presentation at JURIX 2023

The Insolvency Bot was presented by Marton Ribary at JURIX 2023 (the 36th International Conference on Legal Knowledge and Information Systems), held at Maastricht University in the Netherlands, on 19 December 2023. At the conference we connected with a number of fascinating projects that also use AI and LLMs to improve access to justice (A2J), such as Toivonen et al.’s presentation Beyond Debt: The Intersection of Justice, Financial Wellbeing and AI, and Margaret Hagan’s presentation Good AI Legal Help, Bad AI Legal Help: Establishing quality standards for responses to people’s legal problem stories.

Citing the Insolvency Bot, DOIs, and resources

Our paper was published in the JURIX conference proceedings.

Paper: DOI
Evaluation scripts: DOI
PDF of presentation from JURIX 2023: [Click here to download the slideshow presented at the JURIX 2023 conference](/downloads/insolvency-llm-jurix-2023.pdf).

You can cite the project using the following citation:
@software{Ribary_Prompt_Engineering_and_2023,
author = {Ribary, Marton and Krause, Paul and Orban, Miklos and Vaccari, Eugenio and Wood, Thomas Andrew},
doi = {10.3233/FAIA230979},
month = dec,
title = {{Prompt Engineering and Provision of Context in Domain Specific Use of GPT}},
url = {https://fastdatascience.com/insolvency/},
year = {2023}
}

References

Ribary, M., et al. Insolvency Bot: A GPT-based Legal Advice Tool for Small Businesses in Distress [Long Paper Version]. Zenodo, 12 Sept. 2023, doi:10.5281/zenodo.10029735.

Bilgin, O., Fields, L., Laverghetta, A. Jr., Marji, Z., Nighojkar, A., Steinle, S., & Licato, J. (2023). AMHR Lab 2023 COLIEE Competition Approach. In J. Rabelo, R. Goebel, Y. Kano, M.-Y. Kim, K. Satoh, & M. Yoshioka (Eds.), Proceedings of the Tenth International Competition on Legal Information Extraction/Entailment (COLIEE 2023) in association with the 19th International Conference on Artificial Intelligence and Law (pp. 77–86).

Bittner, M. (1990). The IRAC Method of Case Study Analysis: A Legal Model for the Social Studies. Social Studies, 81(5), 227–230.

Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., … Amodei, D. (2020). Language models are few-shot learners. ArXiv. https://doi.org/10.48550/arXiv.2005.14165

Celikyilmaz, A., Clark, E., & Gao, J. (2020). Evaluation of text generation: A survey. ArXiv. https://doi.org/10.48550/arXiv.2006.14799

Chalkidis, I., Fergadiotis, M., Malakasiotis, P., Aletras, N., & Androutsopoulos, I. (2020). LEGAL-BERT: The Muppets straight out of Law School. Findings of the Association for Computational Linguistics: EMNLP 2020, 2898–2904. https://doi.org/10.18653/v1/2020.findings-emnlp.261

Chung, H. W., Hou, L., Longpre, S., Zoph, B., Tay, Y., Fedus, W., Li, Y., Wang, X., Dehghani, M., Brahma, S., Webson, A., Gu, S. S., Dai, Z., Suzgun, M., Chen, X., Chowdhery, A., Castro-Ros, A., Pellat, M., Robinson, K., … Wei, J. (2022). Scaling Instruction-Finetuned Language Models. ArXiv. https://doi.org/10.48550/arXiv.2210.11416

Code of Business Crisis and Insolvency 2022 (Italy). (2022). https://www.normattiva.it/uri-res/N2Ls?urn:nir:stato:decreto.legislativo:2019-01-12;14

Companies (Rescue Process for Small and Micro Companies) Act 2021 (Ireland). (2021). https://www.irishstatutebook.ie/eli/2021/act/30/section/3/enacted/en/html

Corporate Insolvency and Governance Act 2020 (UK). (2020). https://www.legislation.gov.uk/ukpga/2020/12/contents/enacted

Corporations Amendment (Corporate Insolvency Reforms) Act 2020 (Cth) (Act) (Australia). (2020). http://classic.austlii.edu.au/au/legis/cth/num_reg/cairr2020202001654694/

Debbarma, R., Prawar, P., Chakraborty, A., & Bedathur, S. (2023). IITDLI: Legal Case Retrieval Based on Lexical Models. In J. Rabelo, R. Goebel, Y. Kano, M.-Y. Kim, K. Satoh, & M. Yoshioka (Eds.), Proceedings of the Tenth International Competition on Legal Information Extraction/Entailment (COLIEE 2023) in association with the 19th International Conference on Artificial Intelligence and Law (pp. 40–47).

Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 4171–4186. https://doi.org/10.18653/v1/N19-1423

European Commission. (2003). Commission Recommendation of 6 May 2003 concerning the definition of micro, small and medium-sized enterprises (Text with EEA relevance) (notified under document number C(2003) 1422) [Technical report 2003/361/EC]. http://data.europa.eu/eli/reco/2003/361/oj

European Commission. (2022). Proposal for a Directive of the European Parliament and of the Council harmonising certain aspects of insolvency law [Technical report COM/2022/702]. https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:52022PC0702

Fujita, M., Kiyota, N., & Kano, Y. (2021). Predicate’s Argument Resolver and Entity Abstraction for Legal Question Answering: KIS teams at COLIEE 2021 shared task. In J. Rabelo, R. Goebel, Y. Kano, M.-Y. Kim, K. Satoh, & M. Yoshioka (Eds.), Proceedings of the Eighth International Competition on Legal Information Extraction/Entailment (COLIEE 2021) (pp. 15–24).

Glaese, A., McAleese, N., Trębacz, M., Aslanides, J., Firoiu, V., Ewalds, T., Rauh, M., Weidinger, L., Chadwick, M., Thacker, P., Campbell-Gillingham, L., Uesato, J., Huang, P.-S., Comanescu, R., Yang, F., See, A., Dathathri, S., Greig, R., Chen, C., … Irving, G. (2022). Improving alignment of dialogue agents via targeted human judgements. ArXiv. https://doi.org/10.48550/arXiv.2209.14375

Hardcastle, D., & Scott, D. (2008). Can we evaluate the quality of generated text? In N. Calzolari, K. Choukri, B. Maegaard, J. Mariani, J. Odijk, S. Piperidis, & D. Tapias (Eds.), Proceedings of the 6th Language Resources and Evaluation Conference (pp. 3151–3158).

Hart, H. L. A. ([1961] 2012). The concept of law (Third edition). Oxford University Press.

Hutchinson, G. B. (2021). The Small Companies Rescue Act – false hope for failing companies? Company Law Practice?, 7.

Katz, D. M., Bommarito, M. J., Gao, S., & Arredondo, P. (2023). GPT-4 passes the Bar Exam. SSRN. https://doi.org/10.2139/ssrn.4389233

Kim, M.-Y., Rabelo, J., Goebel, R., Kano, Y., Satoh, K., & Yoshioka, M. (2023). COLIEE 2022 Summary: Methods for Legal Document Retrieval and Entailment. In Y. Takama, K. Yada, K. Satoh, & S. Arai (Eds.), New Frontiers in Artificial Intelligence. JSAI-isAI 2022 (Issue 13859, pp. 51–67). Springer. https://doi.org/10.1007/978-3-031-29168-5_4

Kojima, T., Gu, S. S., Reid, M., Matsuo, Y., & Iwasawa, Y. (2023). Large Language Models are Zero-Shot Reasoners. ArXiv. https://doi.org/10.48550/arXiv.2205.11916

Lin, S., Hilton, J., & Evans, O. (2021). TruthfulQA: Measuring how models mimic human falsehoods. ArXiv. https://doi.org/10.48550/arXiv.2109.07958

Liu, Yiheng, Han, T., Ma, S., Zhang, J., Yang, Y., Tian, J., He, H., Li, A., He, M., Liu, Z., Wu, Z., Zhu, D., Li, X., Qiang, N., Shen, D., Liu, T., & Ge, B. (2023). Summary of ChatGPT/GPT-4 Research and Perspective Towards the Future of Large Language Models. ArXiv. https://doi.org/10.48550/arXiv.2304.01852

Liu, Yiqun, Li, H., Su, W., Wang, C., Wu, Y., & Ai, Q. (2023). THUIR@COLIEE 2023: Incorporating Structural Knowledge into Pre-trained Language Models for Legal Case Retrieval. In J. Rabelo, R. Goebel, Y. Kano, M.-Y. Kim, K. Satoh, & M. Yoshioka (Eds.), Proceedings of the Tenth International Competition on Legal Information Extraction/Entailment (COLIEE 2023) in association with the 19th International Conference on Artificial Intelligence and Law (pp. 1–6).

Lynch, K. (2023). Is ChatGPT a threat to the creative industries? University of Derby Magazine, 17. https://www.derby.ac.uk/magazine/issue-17/chat-gpt-threat-creative-industries/

Microsoft. (2023). Azure Functions. Computer software. https://azure.microsoft.com/en-gb/products/functions

Orban, M. (2023). Mickey-bot. Computer software.

Mishcon de Reya. (2023). Mishcon de Reya’s exploration of AI technologies featured in the media. Mishcon de Reya. https://www.mishcon.com/news/mishcon-de-reyas-exploration-of-ai-technologies-featured-in-the-media

Mokal, R. J., Davis, R., Madaus, S., Mazzoni, A., Mevorach, I., Romaine, B., Sarra, J. P., & Tirado, I. (2018). Micro, small, and medium enterprise insolvency: A modular approach. Oxford University Press.

National Statistics. (2023). Company Insolvency Statistics: April to June 2023 [Technical report]. https://www.gov.uk/government/statistics/company-insolvency-statistics-april-to-june-2023

Nguyen, M. L., Bui, Q. M., Do, D.-T., Le, N.-K., Nguyen, D.-H., Nguyen, K.-H., & Anh, T. P. N. (2023). JNLP@COLIEE 2023: Data Augmentation and Large Language Model for Legal Case Retrieval and Entailment. In J. Rabelo, R. Goebel, Y. Kano, M.-Y. Kim, K. Satoh, & M. Yoshioka (Eds.), Proceedings of the Tenth International Competition on Legal Information Extraction/Entailment (COLIEE 2023) in association with the 19th International Conference on Artificial Intelligence and Law (pp. 17–26).

Norton III, W. L., & Bailey, J. B. (2020). The pros and cons of the Small Business Reorganization Act of 2019. Emory Bankruptcy Developments Journal, 36(2), 383–393. https://scholarlycommons.law.emory.edu/ebdj/vol36/iss2/2

Novaes, L. P., Vianna, D., & da Silva, A. (2023). A Topic-Based Approach for the Legal Case Retrieval Task. In J. Rabelo, R. Goebel, Y. Kano, M.-Y. Kim, K. Satoh, & M. Yoshioka (Eds.), Proceedings of the Tenth International Competition on Legal Information Extraction/Entailment (COLIEE 2023) in association with the 19th International Conference on Artificial Intelligence and Law (pp. 27–31).

OpenAI. (2022a). Introducing ChatGPT. Blog post. https://openai.com/blog/chatgpt

OpenAI. (2022b). New and improved embedding model. Blog post. https://openai.com/blog/new-and-improved-embedding-model

OpenAI. (2023). GPT-4 Technical Report. ArXiv. https://doi.org/10.48550/arXiv.2303.08774

Oppenheimer, D. (2023). ChatGPT has arrived – and nothing has changed. Times Higher Education. https://www.timeshighereducation.com/campus/chatgpt-has-arrived-and-nothing-has-changed

Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C. L., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., Schulman, J., Hilton, J., Kelton, F., Miller, L., Simens, M., Askell, A., Welinder, P., Christiano, P., Leike, J., & Lowe, R. (2022). Training language models to follow instructions with human feedback. ArXiv. https://doi.org/10.48550/arXiv.2203.02155

Rabelo, J., Goebel, R., Kano, Y., Kim, M.-Y., Satoh, K., & Yoshioka, M. (2022). Overview and Discussion of the Competition on Legal Information Extraction/Entailment (COLIEE) 2021. The Review of Socionetwork Strategies, 16, 111–133. https://doi.org/10.1007/s12626-022-00105-z

Rattray, K. (2022). Will ChatGPT replace lawyers? Clio. https://www.clio.com/blog/chat-gpt-lawyers/

Reimers, N., & Gurevych, I. (2019). Sentence-BERT: Sentence embeddings using siamese BERT-networks. In K. Inui, J. Jiang, V. Ng, & X. Wan (Eds.), Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) (pp. 3982–3992). Association for Computational Linguistics. https://doi.org/10.18653/v1/D19-1410

Schilder, F., Chinnappa, D., Madan, K., Harmouche, J., Vold, A., Bretz, H., & Hudzina, J. (2021). A Pentapus Grapples with Legal Reasoning. In J. Rabelo, R. Goebel, Y. Kano, M.-Y. Kim, K. Satoh, & M. Yoshioka (Eds.), Proceedings of the Eighth International Competition on Legal Information Extraction/Entailment (COLIEE 2021) (pp. 60–68).

Shidiq, M. (2023). The use of artificial intelligence-based ChatGPT and its challenges for the world of education: from the viewpoint of the development of creative writing skills. Proceedings of the International Conference on Education, Society and Humanity, 1(1), 353–357.

Small Business, Enterprise and Employment Act 2015 (UK). (2015). https://www.legislation.gov.uk/ukpga/2015/26/contents/enacted

Small Business Reorganization Act of 2019 (US). (2019). https://www.congress.gov/bill/116th-congress/house-bill/3311

The World Bank. (2017). Report on the Treatment of MSME Insolvency [Technical report]. https://documents1.worldbank.org/curated/en/973331494264489956/pdf/114823-REVISED-PUBLIC-MSME-Insolvency-report-low-res-final.pdf

Thomson Reuters Institute. (2023). ChatGPT and Generative AI within Law Firms: Law firms see potential, eye practical use cases and more knowledge around risks [Technical report]. https://www.thomsonreuters.com/en-us/posts/wp-content/uploads/sites/20/2023/04/2023-Chat-GPT-Generative-AI-in-Law-Firms.pdf

UNCITRAL. (2021). Legislative Recommendations on Insolvency of Micro- and Small Enterprises [Technical report]. https://uncitral.un.org/en/ilmse

Vaccari, E. (2022). A Modular Approach to Restructuring and Insolvency Law: Executory Contracts and Onerous Property in England and Italy. Norton Journal of Bankruptcy Law and Practice, 5.

Vaccari, E., Ehmke, D., & Burigo, F. (2023). MSMEs in Distress: Regulatory Costs and Efficiency Considerations in the Implementation of Preventive Restructuring Mechanisms: An Anglo-German-Italian Perspective. Journal of International and Comparative Law.

Vaccari, E., & Ghio, E. (2022). English corporate insolvency law: A primer. Edward Elgar.

Van Rossum, G., & Drake, F. L. (2009). Python 3 Reference Manual: (Python Documentation Manual Part 2). CreateSpace.

Wakeling, D. (2023). A&O announces exclusive launch partnership with Harvey. Allen & Overy. https://www.allenovery.com/en-gb/global/news-and-insights/news/ao-announces-exclusive-launch-partnership-with-harvey

Walters, A. (2020). The Small Business Reorganization Act: America’s new tool for SME restructuring for the COVID and post-COVID era. The Company Lawyer, 10, 324–325.

Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., & Zhou, D. (2022). Self-consistency improves Chain of Thought reasoning in Language Models. ArXiv. https://doi.org/10.48550/arXiv.2203.11171

Warzel, C. (2023, February 8). Talking to AI might be the most important job skill of this century. The Atlantic. https://www.theatlantic.com/technology/archive/2023/02/openai-text-models-google-search-engine-bard-chatbot-chatgpt-prompt-writing/672991/

Wood, T. (2023a). Evaluate insolvency. GitHub. https://github.com/fastdatascience/evaluate_insolvency

Wood, T. (2023b). Evaluation script for insolvency bot. Zenodo. https://doi.org/10.5281/zenodo.8292105

Xian, Y., Akata, Z., Sharma, G., Nguyen, Q., Hein, M., & Schiele, B. (2016). Latent embeddings for zero-shot classification. https://doi.org/10.48550/arXiv.1603.08895

Ye, X., & Durrett, G. (2022). The unreliability of explanations in Few-shot Prompting for textual reasoning. ArXiv. https://doi.org/10.48550/arXiv.2205.03401

Yu, F., Quartey, L., & Schilder, F. (2022). Legal prompting: Teaching a Language Model to think like a lawyer. ArXiv. https://doi.org/10.48550/arXiv.2212.01326

Zelikman, E., Wu, Y., Mu, J., & Goodman, N. D. (2022). STaR: Bootstrapping Reasoning with Reasoning. ArXiv. https://doi.org/10.48550/arXiv.2203.14465

Zhang, T., Kishore, V., Wu, F., Weinberger, K. Q., & Artzi, Y. (2019). BERTScore: Evaluating text generation with BERT. ArXiv. https://doi.org/10.48550/arXiv.1904.09675
