By all accounts, AI (artificial intelligence) is quickly becoming the dominant trend when it comes to data ecosystems around the globe. As the decades unfold, this will only increase. IDC, a leading global market intelligence firm, estimates that the AI market will be worth $500 billion by 2024. Virtually all industries are going to be impacted, driving a string of new applications and services designed to make work and life in general easier. AI strategy consulting is becoming one of the top data science-related services as well.
This year, we can expect AI to become far better at solving the practical problems that typically get in the way of data-driven processing of unstructured language, thanks largely to advances in natural language processing (NLP).
If you are designing a job applicant system where jobseekers upload their CVs or résumés and the system recommends a salary band, how would you go about it?
Very simplified demonstration of how a symbolic AI might find seniority levels in a CV.
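There are broadly two ways to go about it: you could write the rules by hand, or you could let a model learn them from examples. The first approach is called symbolic AI, rule-based AI, or knowledge engineering, and the second approach can be called non-symbolic AI, or simply machine learning.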
Symbolic AI involves manual rules, whereas machine learning involves the learning of patterns from tagged data.
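To make the symbolic approach concrete, here is a minimal sketch of a hand-written rule set that looks for seniority levels in a CV. The rule table `SENIORITY_RULES`, the function name and the example CV are all assumptions made for illustration; a real rule set would need to cover far more turns of phrase.

```python
import re

# Hypothetical hand-written rules: each seniority level maps to phrases
# that a candidate might use. These keywords are illustrative only.
SENIORITY_RULES = {
    "senior": [r"\bsenior\b", r"\blead\b", r"\bhead of\b", r"\bprincipal\b"],
    "mid": [r"\bmanager\b", r"\bspecialist\b"],
    "junior": [r"\bjunior\b", r"\bintern\b", r"\bgraduate\b", r"\btrainee\b"],
}


def find_seniority_levels(cv_text: str) -> list:
    """Return every seniority level whose rules match the CV text."""
    text = cv_text.lower()
    return [
        level
        for level, patterns in SENIORITY_RULES.items()
        if any(re.search(pattern, text) for pattern in patterns)
    ]


cv = "2015-2018: Junior Analyst. 2018-2023: Senior Data Scientist and team lead."
levels = find_seniority_levels(cv)
print(levels)
```

Every decision here can be traced back to a specific rule, which is the appeal of the symbolic approach, but a phrase the rule author never thought of will simply be missed.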
Of course, there is no right or wrong answer. Clearly the symbolic AI approach is fraught with difficulty: How can you ever be sure that you have considered all the possible turns of phrase a candidate could use? Pure machine learning is also tricky because you may not have access to a labelled dataset, and you may not have control over the patterns that the AI learns.
How about a combined approach: use machine learning to identify important features of a candidate’s job history, such as career length, but combine that with a manual set of rules to translate it into a salary band? This approach, known as hybrid AI, allows you to reap the benefits of both symbolic AI and machine learning.
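A rough sketch of that combination might look like the following. Note the assumptions: `extract_career_years` is only a stand-in for a trained machine learning extractor (here it just sums year ranges with a regular expression so that the example runs on its own), and the thresholds and band names in `SALARY_BANDS` are invented for illustration.

```python
import re


def extract_career_years(cv_text: str) -> int:
    """Stand-in for a trained ML extractor (assumption): sums the
    'YYYY-YYYY' ranges found in the text to estimate career length."""
    ranges = re.findall(r"(\d{4})\s*-\s*(\d{4})", cv_text)
    return sum(int(end) - int(start) for start, end in ranges)


# Hypothetical manual rules: (minimum years of experience, salary band),
# checked from the highest threshold down.
SALARY_BANDS = [(10, "senior band"), (3, "mid band"), (0, "entry band")]


def salary_band(career_years: int) -> str:
    """Symbolic step: translate the extracted feature into a salary band."""
    for min_years, band in SALARY_BANDS:
        if career_years >= min_years:
            return band
    return "entry band"  # fallback for unexpected negative values


cv = "2015-2018: Junior Analyst. 2018-2023: Senior Data Scientist."
years = extract_career_years(cv)
band = salary_band(years)
print(years, band)
```

The design point is the split of responsibilities: the learned component only has to extract a feature from messy text, while the business decision stays in a small table that a human can read and change.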
Hybrid AI may be defined as the enrichment of existing AI models through specially obtained expert knowledge. Hybrid AI is one of the most debated topics in the field of technology, natural language processing and AI.
Before we explore hybrid AI in detail – i.e. what it is at the core and what some of its use cases may be – we ought to discuss a little background first, a backdrop, so to speak, for the emergence of hybrid AI.
Out of all the challenges AI must face, understanding language is probably one of the toughest. Even though the vast majority of AI solutions in existence today are capable of crunching massive volumes of both raw numbers and structured data in milliseconds, the multiple meanings and various nuances in language according to context is a whole different issue altogether.
Words are contextual to an AI system, which means they will be interpreted differently under different circumstances. This is fairly straightforward and “all in a day’s work” for our brains, but for a piece of software, it’s not quite as straightforward.
This is the key reason why building software that can interpret language correctly and reliably has become crucial to developing any kind of AI. Once companies achieve this, they will be in a position to open the AI development floodgates, letting their systems access and consume practically any kind of knowledge thrown at them.
Natural language processing, or simply NLP, is a vital component of this equation because it can leverage an entire world of language-based information. Language is at the centre of all facets of enterprise activity. This means that an AI approach cannot be considered complete and viable unless the maximum amount of value can be extracted from this kind of data.
A symbolic or knowledge-based approach, if you will, leverages a knowledge graph which may be seen as an open box – where the structure is created by data scientists with the purpose of representing the real world – a world where concepts are defined clearly and connected with each other through semantic relationships.
Given the power of NLP algorithms and knowledge graphs, it’s easy to read and learn from virtually any kind of text, straight ‘out of the box’, and gain an in-depth understanding of how data is going to be interpreted, where conclusions can be easily drawn from that interpretation.
Now, this is very similar to how people are able to create their own domain-oriented, specific knowledge – and this is what will enable AI projects to link the algorithmic results to explicit knowledge representations. In 2022, you can bet there will be a shift towards this type of AI approach, where both techniques will be combined. Enter the world of hybrid AI.
What hybrid AI does is that it takes advantage of different techniques to improve overall results while also tackling complex cognitive problems in a very effective way. Hybrid AI is also quickly becoming a very popular approach to natural language processing.
Bringing together the best of symbolic AI and machine learning (ML) models is the best way to unlock the full value of unstructured language data, and to do so in the speedy, accurate and scalable way that most businesses demand today.
The use of symbolic reasoning, knowledge and semantic understanding will produce far more accurate results than thought possible, in addition to creating a more effective and efficient AI environment. Not only that, but it will also reduce resource-intensive training, which otherwise requires an expensive high-speed data infrastructure.
Hybrid AI is a more holistic and comprehensive approach to sustainably come to terms with the full benefits and potential of AI-based solutions. In order to grasp the magnitude of this statement, we should understand two areas of AI:
Symbolic AI: In symbolic AI, data scientists attempt to link facts and events using logic rules, making the knowledge machine-readable and retrievable via semantic enrichment.
Non-symbolic AI (machine learning approach): This area refers to models in machine learning, deep learning, and neural networks, where extensive training data is used to arrive upon decisions and conclusions.
The traditional view is that symbolic AI can be “supplier” to non-symbolic AI, which in turn, does the bulk of the work. Or alternatively, a non-symbolic AI can provide input data for a symbolic AI. The symbolic AI can be used to generate training data for the machine learning model.
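As one possible illustration of symbolic AI acting as a supplier of training data, the sketch below uses hand-written keyword rules to label event-related queries. The rule table, label names and example sentences are all hypothetical, and no model is trained here; the resulting pairs could be passed to whichever classifier you prefer.

```python
from typing import List, Optional, Tuple

# Hypothetical labelling rules: (label, keywords that trigger it).
LABEL_RULES = [
    ("ticket_enquiry", ("ticket", "price")),
    ("location_enquiry", ("where", "venue")),
]


def label_with_rules(sentence: str) -> Optional[str]:
    """Return the first label whose keywords appear in the sentence,
    or None when no rule applies."""
    text = sentence.lower()
    for label, keywords in LABEL_RULES:
        if any(keyword in text for keyword in keywords):
            return label
    return None


def build_training_data(sentences: List[str]) -> List[Tuple[str, str]]:
    """Keep only the sentences the rules can label, as (text, label) pairs."""
    pairs = []
    for sentence in sentences:
        label = label_with_rules(sentence)
        if label is not None:
            pairs.append((sentence, label))
    return pairs


queries = [
    "How much is a ticket for Saturday?",
    "Where is the concert taking place?",
    "Is Jeff Beck playing this year?",
]
training_data = build_training_data(queries)
print(training_data)
```

Sentences that no rule covers are left out rather than guessed at, so the generated labels stay as trustworthy as the rules themselves.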
This can also be viewed from a different standpoint, where symbolic AI is an essential task which can create a lot of value on its own, rather than leaving most of the work to non-symbolic AI – i.e. through the structuring, preparation and enrichment of organisational data as well as knowledge (e.g. logic, facts, semantics, events, etc.) into a machine-readable form.
This preparation takes place in the form of a knowledge graph, which we briefly discussed earlier in the article. It’s probably fair to say that hybrid AI is more of a symbolic and non-symbolic AI combination than anything else. And the knowledge graph can potentially be a major asset for any enterprise.
Perhaps it would be better if we revise and somewhat refine our definition of hybrid AI at this point:
Hybrid AI is the unified, structured and thorough use of both symbolic and non-symbolic AI to capture, map, and structure, as well as make data or knowledge of an organisation available in an understandable, readable and ‘retrievable by machines’ format. In turn, this knowledge can be retrieved through natural language processing, which is the easiest access mode for people.
Fast Data Science - London
We can certainly give the above statement more weight through these practical use case examples:
In the retail industry, the product database of a fashion brand could represent symbolic AI. Facts like size, colour or compatibility/suitability with other products can be represented very easily when a user queries product data through chatbots or voice assistants.
In the autonomous vehicle sector, symbolic AI may specify through map data where stop signs, traffic lights or obstacles in an area may be. This factual data can facilitate better control of the self-driving vehicle.
In event management, symbolic AI may be used to represent an event database. For instance, if a specific band is playing at a concert, let’s say a Jeff Beck concert – if this fact is integrated into the database, possibly extended by a music genre too, the chatbot can easily recognise the meaning and context of queries related to “Jeff Beck”. It would not confuse this expression with an everyday person named Jeff or something else.
Some of the prime candidates for introducing hybrid AI are business problems where there isn’t enough data to train a large neural network, or where traditional machine learning can’t handle all the edge cases on its own. Hybrid AI can also help where a neural network approach would risk discrimination or problems due to lack of transparency, or would be prone to overfitting.
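The event example can be sketched as a tiny knowledge graph stored as subject-predicate-object triples. The triples, predicate names and helper functions below are assumptions made for illustration rather than a description of any particular product.

```python
# A toy knowledge graph as (subject, predicate, object) triples.
# The facts and predicate names are illustrative only.
TRIPLES = [
    ("Jeff Beck", "is_a", "artist"),
    ("Jeff Beck", "has_genre", "rock"),
    ("Jeff Beck concert", "is_a", "event"),
    ("Jeff Beck concert", "performed_by", "Jeff Beck"),
]


def find_entities(query: str, triples=TRIPLES) -> list:
    """Return the known entities mentioned in the query, longest names first,
    so that the full name 'Jeff Beck' is needed rather than just 'Jeff'."""
    entities = sorted({subject for subject, _, _ in triples}, key=len, reverse=True)
    text = query.lower()
    return [entity for entity in entities if entity.lower() in text]


def facts_about(entity: str, triples=TRIPLES) -> dict:
    """Collect every predicate and object stored for one entity."""
    return {pred: obj for subj, pred, obj in triples if subj == entity}


print(find_entities("When is Jeff Beck playing?"))
print(find_entities("Is my friend Jeff coming along?"))
print(facts_about("Jeff Beck"))
```

Because only full entity names from the graph are matched, a query about "my friend Jeff" returns nothing, while a query about "Jeff Beck" can be enriched with the stored genre and event facts.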
At Fast Data Science we are working on a project for identifying the risk of a clinical trial. The user uploads a PDF document to our platform which describes the plan for running a clinical trial, called the clinical trial protocol. A machine learning model is able to identify key attributes of the trial such as its location, duration, number of subjects, and some statistical parameters. The output of the machine learning model is then fed into a manually designed risk model which translates these parameters into a risk value which is then displayed to the user as a traffic light indicating high, medium or low risk.
It would have been more difficult to use a neural network alone to go directly from the text of the protocol to a risk value, because data is difficult to tag, and far more data would be needed for this approach. Furthermore, human intelligence is helpful to specify what is a sensible rule. If all the high-risk trials contained a particular feature, such as being located in a certain country, a traditional deep learning model might erroneously learn that country is a risk factor and end up discriminating accidentally.
With hybrid AI, machine learning can be used for the difficult part of the task, which is extracting information from raw text, but symbolic logic helps to convert the output of the machine learning model to something useful for the business.
In our example, symbolic logic can be used to manually specify that small numbers of participants are associated with a trial failing, which is common sense to a human and doesn’t need a neural network. In this way we can simplify the problem space for the neural network.
How hybrid AI can calculate clinical trial risk using a machine learning model which inputs into a symbolic AI
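The pipeline described above can be sketched roughly as follows. This is not the actual Fast Data Science risk model: the `TrialFeatures` fields stand in for whatever the machine learning model extracts, and every threshold and weight in `risk_score` is an invented placeholder. The only rule taken from the text is that a small number of participants is associated with higher risk.

```python
from dataclasses import dataclass


@dataclass
class TrialFeatures:
    """Attributes assumed to have been extracted from the protocol by an ML model."""
    num_subjects: int
    duration_months: int
    has_statistical_plan: bool  # hypothetical stand-in for "statistical parameters"


def risk_score(features: TrialFeatures) -> int:
    """Manually designed rules (illustrative thresholds) turning features into a score."""
    score = 0
    if features.num_subjects < 50:      # small trials are more likely to fail
        score += 2
    if features.duration_months > 36:   # assumption: very long trials add risk
        score += 1
    if not features.has_statistical_plan:
        score += 1
    return score


def traffic_light(score: int) -> str:
    """Map the numeric score to the high / medium / low indicator shown to the user."""
    if score >= 3:
        return "high"
    if score >= 1:
        return "medium"
    return "low"


small_long_trial = TrialFeatures(num_subjects=30, duration_months=48, has_statistical_plan=True)
large_short_trial = TrialFeatures(num_subjects=500, duration_months=12, has_statistical_plan=True)
print(traffic_light(risk_score(small_long_trial)))
print(traffic_light(risk_score(large_short_trial)))
```

Since the rules are written by hand, a feature such as the trial's country only influences the result if someone deliberately adds a rule for it, which avoids the accidental discrimination described above.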
Legal reasoning is an interesting challenge for natural language processing because legal documents are by their nature precise, information dense, and unambiguous. Depending on the legal system of a country, some areas of law may be more suited to symbolic logic than others. I imagine that statute law, which is designed to be unambiguous, is easier to translate into symbolic logic than case law (legal systems based on precedent, as found in common law jurisdictions such as Britain and the US).
Nils Holzenberger at Johns Hopkins University has succeeded in translating a large amount of the US tax code (which is statute law rather than case law) into symbolic logic in Prolog (a programming language used for logical reasoning).
So if we look at the second row of the below table from the US tax code:
US tax code title 26A-1A-I Section 1 (source: govinfo.gov)
the above row can be translated into Prolog as follows:
s1_a_ii(Taxinc,Tax) :- Taxinc =< 89150, 36900 < Taxinc, Tax is round(5535+(Taxinc-36900)*0.28).
Once the tax code is all in Prolog, questions can be put to the system, such as “How much must I pay in tax if I earned $49,000 last year, I am widowed, etc.?”, provided the question is also converted to Prolog.
The translation of the tax code into symbolic logic was a painstaking manual process. If a natural language model such as BERT can be adapted to reliably translate statute into symbolic logic, a large amount of the repetitive work of tax lawyers could potentially be automated. Holzenberger’s team and others have been working on models to interpret legal texts in natural language to feed into a symbolic logic model.
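For readers less familiar with Prolog, the same single rule can be mirrored in Python. This is only an illustrative translation of the one clause shown above, not part of Holzenberger's work, and it ignores the other conditions (such as filing status) that a full question would involve.

```python
import math
from typing import Optional


def s1_a_ii(taxinc: float) -> Optional[int]:
    """Python mirror of the Prolog clause s1_a_ii/2.

    The rule applies only when 36900 < taxinc <= 89150; otherwise the
    Prolog goal would fail, which is represented here by returning None.
    """
    if not (36900 < taxinc <= 89150):
        return None
    # Prolog's round/1 rounds halves away from zero; for the positive
    # amounts handled here, floor(x + 0.5) behaves the same way.
    return math.floor(5535 + (taxinc - 36900) * 0.28 + 0.5)


print(s1_a_ii(49000))
print(s1_a_ii(30000))
```

For a taxable income of $49,000 the rule gives 5535 + (49000 - 36900) × 0.28 = 8923, while an income of $30,000 falls outside this row of the table and so is handled by a different rule.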
A holistic hybrid AI approach offers the best of something every business wants: uninterrupted scalability. This would allow them to add new products or, say, events with ease. All they need to do is add the new specifics through a conversational AI platform, for example, and the core process will remain the same: a combination of symbolic and non-symbolic AI.
For the average user consuming hybrid AI, there’s clear value to be had, which can go well beyond just chatbots or voice assistants, for example:
Development of knowledge graph – As a starting point of any chatbot or voice assistant development, for instance, a development team should produce a bespoke knowledge graph. We believe it’s the data structure that will propel businesses into the future, proving to be the core of all future use cases utilising AI.
Process implementation – Organisations that refuse to embrace digitisation and organisational data preparation will be left behind. Therefore, a bespoke knowledge graph will become almost mandatory at some point. We implement organisational processes and workflows specific to your business, through which you can update your knowledge documentation regularly, both in the present and in the future.
Decades of AI and NLP knowhow – Collectively, our team leverages decades of experience around AI, natural language processing and knowledge graph development. The average business user and enterprises alike can benefit massively from this experience for their customised hybrid AI solution.
Convenience and practicality – Fast Data Science tends to all the minute details running in the background, while you focus on preparing and adding information as and when needed.
Holistic process – We like to accompany our users through every phase of the process. From knowledge preparation for the knowledge graph to designing and training machine learning models, all of our work is documented and supported.
Visualise the following scenario for a moment if you will:
A prospective customer gets in touch with an event management company because he/she wants to inquire about an upcoming concert. Naturally, the first thing that prospect would do is ask about any upcoming concerts in their locality – what we call the discovery phase. Next, the prospect would ask about a concert by referring to a specific band, artist or musician.
He/she may ask other questions as well such as the location and time of the concert. All this is taken into consideration when we prepare the knowledge graph. Next, the prospect may ask about ticket availability, whether the ticket has any specific categories (single, couple, adult, senior) or ticket classes (front row, standing area, VIP lounge) – which will also be considered when developing the knowledge graph.
Once all the required information has been gathered from the prospect, he/she will be redirected to a cart or ticket booking system. With all the given data considered, the prospect turned customer can now buy their desired ticket in an NLP-based dialog via the chatbot.
With all the above scenarios considered, one wonders: Why now? Why is the transition happening now? Why wasn’t AI capable of taking advantage of language-based knowledge in the past?
Well, different learning approaches lead to different solutions. In some cases, AI could do some or all of the above, but just because ML algorithms, for example, do well with certain needs and contexts does not mean that they are the go-to method. Unfortunately, this can be observed all too often when we talk about computers attempting to understand and process language. It’s only in the last few years that we’ve witnessed rather remarkable advancements in natural language processing (NLP) and natural language understanding (NLU), based on hybrid AI approaches.
Every business, company and enterprise must now embrace hybrid AI – because where organisations were previously throwing just one form of AI at a problem (with its limited toolsets), they can now utilise multiple, varying approaches.
Each approach may be used to target the problem from a unique angle, and through varying models, evaluate and solve the problem in a multi-contextual way. Since each of the methods can be evaluated independently, it’s easy to see which one will deliver the most optimal results.
Enterprises have already got a taste of what AI can do, witnessing its powerful applications, and this hybrid approach of doing things is going to be a prominent initiative when we talk all things technology in 2022. There are significant time and cost benefits to be had, not to mention faster deployment and results, while also seeing unmatched efficiency and accuracy across the board in analytical and operational processes.
To demonstrate the above with just one example – the annotation process is currently being undertaken by select industry experts only, largely due to the complexity and expense associated with training. However, by combining extensive knowledge repositories and graphs, this training can be greatly simplified, effectively ‘democratising’ the process itself within the knowledge workforce.
Naturally, research into all types of AI rarely comes to a standstill, if at all. But we’re definitely going to be seeing a keen focus on expanding the knowledge graph and automating ML along with other methods, because enterprises are now under pressure to quickly consume massive amounts of data and at a lower cost too.
As 2022 continues, we’re going to be seeing some very exciting and promising improvements in how organisations apply hybrid AI models to their core processes. Business automation is already catching on in the form of email management and search.
However, current keyword-based search engines, for example, can absorb and interpret entire documents with blazing speed, but they can extract only basic and largely non-contextual information. Similarly, automated email management systems are not quite capable of penetrating meaning beyond product names and other points of information or references. In the end, users are tasked with sorting through a long list of ‘hits’, trying to locate the primary pieces of knowledge. This inevitably slows down business processes, sets the clock back on swift decision-making, and ultimately, has an adverse impact on productivity and revenue.
Hybrid AI can change all that. Empowering natural language processing and natural language understanding tools with symbolic comprehension under a hybrid framework can give knowledge-based businesses and enterprises the ability to mimic human understanding, in order to comprehend entire documents across their automated processes.
Some key points we have discussed thus far:
AI is a very powerful tool which can work miracles for enterprise data operations, even though it is still in its infancy. Forward-facing organisations are already realising the limitations of single-mode AI models, and understand all too well that technology use needs to be more adaptive, more capable of getting to the depths of stored data, and become less costly, as well as a lot easier to use.
Once symbolic AI is introduced into business processes, the black box of AI is open, so to speak, allowing users to understand why machines act a certain way and what can be done to change that behaviour to get more desirable results. Additionally, this high visibility would allow operators to persistently monitor their processes, so that they can be further optimised and simplified.
Hybrid AI that’s based on symbolic AI, capable of understanding actual knowledge like people do instead of just learning patterns, is the most effective way for enterprises to fully utilise and benefit from the data they’ve been feverishly collecting over the years.
We now understand that hybrid AI combines different methods to improve overall results and tackle complex cognitive problems much more effectively. Hybrid AI is quickly becoming a popular approach for natural language processing, and bringing together the best of symbolic AI and machine learning models, it is the best way to unlock the value of unstructured language data – and it does so with exceptional speed, accuracy and scale which today’s businesses require.
Since hybrid AI combines symbolic AI and ML, it can effectively leverage the strengths of each method while still remaining explainable – ML can target certain aspects of a problem, for example, where explainability isn’t required, while symbolic AI can arrive upon conclusions to make decisions through a transparent and easily understandable process. With time moving forward, a hybrid approach to AI will only become more common.
When it comes to challenges in AI, understanding language remains one of the hardest. While ML can certainly support certain kinds of language-intensive applications, it can’t quite deliver optimal results.
A symbolic or knowledge-based approach, on the other hand, which is at the core of hybrid AI, makes use of a knowledge graph in order to embed knowledge. It is structured in a very similar way to how people build their own knowledge. Furthermore, it offers explainable AI, as the outcomes are directly connected with explicit knowledge representations.
Hybrid AI can also free up data scientists from cumbersome and tedious tasks such as data labelling. For example, an insurer with multiple medical claims may want to use natural language processing to automate coding so that the AI can detect and label the affected body parts automatically in an accident claim.
The hybrid AI system would capture the data in each claim and normalise it. For instance, if the right ankle is injured in an accident, symbolic AI can easily detect all synonyms, understand the underlying context and apply a code in regards to the body part involved. It’s a transparent process as it allows the insurer to see where the body part is coded with a snippet from the original report. There’s a huge efficiency gain to be had here although people will ultimately be making the final decision, of course.
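A simplified sketch of the symbolic part of such a coding step is shown below. The synonym table, the code names and the output format are all assumptions for illustration; a production system would rely on a proper medical vocabulary and on an NLP model for context.

```python
import re

# Hypothetical code table: body-part code -> terms that should map to it.
BODY_PART_SYNONYMS = {
    "ANKLE": ["ankle", "malleolus"],
    "KNEE": ["knee", "kneecap", "patella"],
    "WRIST": ["wrist", "carpus"],
}


def code_claim(report: str) -> list:
    """Find body-part mentions, normalise them to a code, and keep a snippet
    of the original report so a human can see why the code was applied."""
    findings = []
    for code, synonyms in BODY_PART_SYNONYMS.items():
        for synonym in synonyms:
            # Optionally capture a preceding 'left' or 'right'.
            pattern = rf"\b(?:(left|right)\s+)?{re.escape(synonym)}\b"
            for match in re.finditer(pattern, report, re.IGNORECASE):
                start = max(0, match.start() - 30)
                end = min(len(report), match.end() + 30)
                findings.append({
                    "code": code,
                    "side": (match.group(1) or "unspecified").lower(),
                    "snippet": report[start:end].strip(),
                })
    return findings


report = "The claimant slipped on ice and twisted her right ankle."
findings = code_claim(report)
print(findings)
```

Each finding carries the snippet it was derived from, which is what lets the insurer check the code against the original wording before a person makes the final decision.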
This kind of implementation will also help businesses understand why an AI system is behaving a certain way. If there are errors, for example, symbolic AI can provide a clear and transparent process to backtrack in order to identify the source of the ‘blunder’.
Fast Data Science is at the forefront of hybrid AI and natural language processing, helping businesses improve process efficiency, among other things. Get in touch with our team to learn more.
Sremac, Neuro-fuzzy inference systems approach to decision support system for economic order quantity, Ekonomska Istraživanja / Economic Research 32(1):1114-1137
Ananthaswamy, AI’s next big leap, Knowable Magazine (2020)
Holzenberger et al, Factoring Statutory Reasoning as Language Understanding Challenges (2021)
Holzenberger et al, A Dataset for Statutory Reasoning in Tax Law Entailment and Question Answering (2020)
Guest post by Essa Jabang, who works as a data and engineering consultant in our team at Fast Data Science and also runs his own company Taybull.