Introduction to a Question Answering System in the Legal Domain
Searching for legal-related information is a slow and tedious process. Even though we live in a world of vast amounts of data, very few technology products are able to extract the insights hidden in the documents. On top of that, the law is always growing and evolving, which makes providing jurisdictionally accurate responses even more challenging.
Saving time on finding the relevant legal information is crucial. Hence, we propose a solution to this problem: a legal Question Answering (QA) system based on deep learning technologies. Question answering is a computer science discipline within the fields of information retrieval and natural language processing (NLP) that is concerned with building systems that automatically answer questions posed by humans in natural language.
Such a system is currently under development in a collaborative European project, Lynx.
LYNX Project
Lynx is a European research project funded by the European Commission (EC). Its main objective is to create an ecosystem of cloud services to better manage compliance, based on a legal knowledge graph (LKG). It integrates and links heterogeneous data sources, including legislation, case law, standards, and private contracts. One of the outcomes of the project is a collection of natural legal language processing services, one of which is a Question Answering System in the legal domain.
For more information about the services, please refer to our paper, “Developing and Orchestrating a Portfolio of Natural Legal Language Processing and Document Curation Services”. This work was presented in June at the Natural Legal Language Processing (NLLP) workshop at the NAACL 2019 conference.
Cuatrecasas, a Real-World Practice
Cuatrecasas, a Spanish-Portuguese law firm, is one of the Lynx project partners. It is present in over ten countries and specializes in all areas of business law.
When processing inquiries (questions) from clients, lawyers spend a significant amount of time looking for related laws or articles in the legal corpora. One of the scenarios considered in the scope of the Lynx project is enabling the lawyers to find the information faster. To address this, a Question Answering System is implemented.
The Question Answering System consists of several modules:
- Query Formulation. Converts a natural language question into a query that the index of legal documents can process. As part of this step, irrelevant words are removed, and the query is enriched with meaningful synonyms and alternative forms using the PoolParty API.
- Candidate Answer Generation. Once the documents are retrieved from the indexer, they are processed to extract a relevant passage or an answer from them. This module employs Deep Learning techniques to handle the task. To train our model, we use the Stanford Question Answering Dataset (SQuAD).
- Answer Selection. The extracted passages and candidate answers are then scored by a pre-trained ranking model and ordered by relevance.
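The Query Formulation step can be illustrated with a minimal sketch. It is not the Lynx implementation: the stop-word list, the lemma table, and the synonym table are hypothetical stand-ins (the synonym table plays the role that a vocabulary service such as PoolParty would have), and `build_query` is a name chosen for illustration.

```python
import re

# Hypothetical stop-word list; a real system would use a fuller, language-specific one.
STOP_WORDS = {"what", "does", "the", "have", "a", "an", "of", "is", "do"}

# Hypothetical lemma table standing in for a proper lemmatizer.
LEMMAS = {"effects": "effect"}

# Hypothetical synonym table standing in for a vocabulary service.
SYNONYMS = {"unfair": ["violating"]}


def build_query(question: str) -> str:
    """Turn a natural-language question into a boolean query string."""
    tokens = re.findall(r"[a-z]+", question.lower())
    groups = []
    for token in tokens:
        if token in STOP_WORDS:
            continue  # drop words that carry no search value
        term = LEMMAS.get(token, token)
        # OR together the term and its synonyms, then AND the groups.
        alternatives = [term] + SYNONYMS.get(term, [])
        groups.append("(" + " OR ".join(alternatives) + ")")
    return " AND ".join(groups)


print(build_query("What effects does the unfair dismissal have?"))
# (effect) AND (unfair OR violating) AND (dismissal)
```

Grouping each term with its synonyms under OR and joining the groups with AND keeps the query strict about the concepts that must appear while staying tolerant of wording differences.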
The current state-of-the-art Question Answering Systems are very advanced and outperform humans on many datasets, such as SQuAD, which is not specific to any domain. However, there are always domain-specific difficulties. For instance, the challenges in the legal domain are:
- Adaptation of a SQuAD-trained model to the legal domain. Currently, there are no legal-specific datasets sufficient for training Deep Learning models. Hence, the performance of the system in the legal domain is worse than in a generic one.
- The multilingual aspect of the use case. The lawyer might have limited command of the language of the jurisdiction. Hence, the inquiries and the resulting answers are translated.
The complete workflow of the system is presented in the figure below:
Given these difficulties and the fact that mistakes can be very costly, the goal of the system is to assist, not replace, the lawyer in finding the relevant information faster. Even when the exact answer is not correct, the Question Answering System should be able to correctly identify the document and the specific paragraph containing the relevant information. The lawyer then reads and interprets the document for the client.
In the following example, we ask: What effects does the unfair dismissal have? The correct document is “Article 56. Inadmissible Dismissal”, which appears frequently among the returned answers.
Question: What effects does the unfair dismissal have?
Query: (effect) AND (unfair OR violating) AND (dismissal)
Answer, Score, Document title:
- Dismissals motivated by any of the reasons for discrimination prohibited by the Constitution or by Law, or violating the basic rights and public freedoms of the worker, shall be null and void., 0.9692, ‘Heading1_Chapter3_Section10_Article56’
- Due to mass dismissal based on economic, technical, organizational or production reasons, provided that such has been duly authorized in accordance with the provisions of this Law., 0.8717, ‘Heading1_Chapter3_Section10_Article50’
- The provisions contained in Law 30/1992 dated 26 November, on the Legal Regime of Public Administration and Ordinary Administrative Procedure, shall be applicable in what is not provided for by the present article, particularly as regards resources., 0.8279, ‘Heading1_Chapter3_Section10_Article50’
- What is set forth in the preceding letters shall be applicable unless the dismissal is declared admissible in these cases for reasons not related to the pregnancy or the exercise of the right to the leaves and permissions pointed out., 0.7620, ‘Heading1_Chapter3_Section10_Article56’
- Notice of dismissal must be given in writing to the worker, reflecting the facts that justify it and the date on which it will take effect., 0.7095, ‘Heading1_Chapter3_Section10_Article56’
Because the topics are closely related, passages from an incorrect document, “Article 50. Extinction by Willingness of the Worker”, are retrieved with high scores. Frequently, the main reason for a wrong answer is that the wrong document was retrieved in the first place, a task that depends purely on the indexer’s features. We believe that using semantic heuristics, such as expanding the query with legal vocabularies, will significantly improve the results in the future.
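Since the lawyer ultimately needs the right document, one way to read the example above is to aggregate the passage scores per document. The following is a minimal sketch rather than part of the described system: `Candidate` and `rank_documents` are hypothetical names, and summing the scores is an assumed heuristic. The scores and document titles are copied from the example.

```python
from collections import defaultdict
from typing import NamedTuple


class Candidate(NamedTuple):
    score: float
    document: str


# Scores and document titles from the example above (answer text omitted).
candidates = [
    Candidate(0.9692, "Heading1_Chapter3_Section10_Article56"),
    Candidate(0.8717, "Heading1_Chapter3_Section10_Article50"),
    Candidate(0.8279, "Heading1_Chapter3_Section10_Article50"),
    Candidate(0.7620, "Heading1_Chapter3_Section10_Article56"),
    Candidate(0.7095, "Heading1_Chapter3_Section10_Article56"),
]


def rank_documents(cands):
    """Sum passage scores per document and sort documents by the total."""
    totals = defaultdict(float)
    for cand in cands:
        totals[cand.document] += cand.score
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranked = rank_documents(candidates)
for document, total in ranked:
    print(f"{document}: {total:.4f}")
```

Under this heuristic, Article 56 ranks first because three of the five passages come from it, even though two passages from Article 50 individually score higher than two of its own.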
Conclusion
Legal technology products are an advancement that goes beyond traditional legal search. They give lawyers a better starting point by unlocking the information hidden in the data sources. Natural language queries are sometimes essential for users and can surface a wider range of relevant documents. Tools such as automatic Question Answering systems could save the most valuable resource in our lives: time.