Can AI handle legal questions yet? The return of the Insolvency Bot

· Thomas Wood

Can AI handle legal questions yet? We have compared the capabilities of older and newer large language models (LLMs) on questions of English and Welsh insolvency law, as a continuation of the Insolvency Bot project.

Thomas Wood tried asking several LLMs a series of questions about insolvency law, set by insolvency expert Eugenio Vaccari and designed to be at roughly undergraduate level. We tested older LLMs such as GPT-3.5 as well as newer entrants such as DeepSeek.

We tried using the LLMs “off the shelf” with no modification (our control), and then, as a comparator, we also tried including relevant English and Welsh case law, statutes, and HMRC forms in the prompt. For example, instead of asking an LLM

I have X debts and Y happened. Should I close my company?

we can ask the LLM,

The Insolvency Act 1986 Section 123 states that [paragraph]. The Companies Act 2006 Section 456 states that [paragraph]. This Supreme Court ruling is relevant: [ruling]. I have X debts and Y happened. Should I close my company?

In other words, we do the legwork of looking up the relevant information and inserting it into the prompt, so the LLM just has to do what it’s good at, namely formulating sentences. This technique of adding extra text to a prompt is called retrieval augmented generation, or RAG. What’s cool about RAG is that the user doesn’t need to see it.
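The idea can be sketched in a few lines of Python. This is a minimal, illustrative example only: the hard-coded snippets and the keyword-overlap retriever below are stand-ins for a real index of statutes, case law and HMRC forms, and the function names (`retrieve`, `build_rag_prompt`) are hypothetical, not part of the Insolvency Bot.

```python
import re

# Illustrative stand-ins for a real corpus of legal source material.
LEGAL_SNIPPETS = [
    "The Insolvency Act 1986 Section 123 defines when a company is deemed unable to pay its debts.",
    "The Companies Act 2006 Section 456 concerns revised accounts and reports.",
    "A company may be wound up voluntarily if its members so resolve.",
]

def tokens(text: str) -> set[str]:
    """Lowercase word tokens, ignoring punctuation."""
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(question: str, snippets: list[str], top_k: int = 2) -> list[str]:
    """Rank snippets by naive keyword overlap with the question
    (a stand-in for a real search index or embedding similarity)."""
    q = tokens(question)
    scored = sorted(snippets, key=lambda s: -len(q & tokens(s)))
    return scored[:top_k]

def build_rag_prompt(question: str) -> str:
    """Prepend the retrieved legal context to the user's question,
    so the LLM answers with the relevant passages in front of it."""
    context = retrieve(question, LEGAL_SNIPPETS)
    return "\n".join(context) + "\n\n" + question

print(build_rag_prompt("I have debts I cannot pay. Should I close my company?"))
```

In a production system the retrieval step would query an index or embedding store rather than scan a hard-coded list, but the shape is the same: retrieve, prepend, then send the augmented prompt to the LLM.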

We found that over the last few years, the more advanced LLMs have benefited more from the extra information that we included in the prompt. LLMs are constantly improving in their unmodified form, but you can see clearly in this time series plot that RAG has become more effective over time.

The models released in 2025, such as DeepSeek and the current iteration of Gemini, now outperform the earlier GPT-3.5 by a large margin. This was not surprising. What is unexpected is that the RAG-augmented models have an even bigger edge over their non-RAG counterparts than they did one or two years ago.

In other words, we built our RAG system long before Google Gemini or DeepSeek came out, and it performs far better with those models than with any model we had access to back in 2023, when we developed the system. Any ideas why this could be? Contact us and let us know your thoughts!

The pace of improvements is also astounding. Could we be facing a new Moore’s law in AI?

You can read our original paper (which predates the release of DeepSeek) here:

  • Marton Ribary, Paul Krause, Miklos Orban, Eugenio Vaccari, Thomas Wood, Prompt Engineering and Provision of Context in Domain Specific Use of GPT, Frontiers in Artificial Intelligence and Applications 379: Legal Knowledge and Information Systems, 2023. https://doi.org/10.3233/FAIA230979

And you can try the Insolvency Bot here: https://fastdatascience.com/insolvency

References

  1. Ribary, Marton, Paul Krause, Miklos Orban, Eugenio Vaccari, and Thomas Wood. Prompt Engineering and Provision of Context in Domain Specific Use of GPT. Legal Knowledge and Information Systems. IOS Press, 2023, pp. 305–310.


