Legaltech and Machine Learning — A Good Match

In the rush to automate everything a lawyer can do, some machine learning practitioners forget tacit knowledge, which is hard to transfer into the systems they build. As Michael Polanyi puts it, “we can know more than we can tell.” Riding a bike, speaking a language, or playing a musical instrument are tasks we can perform well, yet can’t easily describe or codify. Keeping Polanyi’s observation in mind, here is a list of tasks machine learning can help Legaltech with.


E-discovery

Discovery is an initial phase of litigation during which the parties in a dispute are required to provide each other with relevant information and documents. Spending on discovery in the US was approximately forty billion dollars in 2010. Sanctions for discovery violations have mushroomed over the last decade.

Most discovery work is done electronically; this is termed e-discovery. E-discovery can’t be done efficiently using simple keyword-based search methods. Whether a document is relevant depends on its semantics and context, not just on the words it contains.

A more efficient set of approaches to e-discovery, built on machine learning, is called technology-assisted review (TAR). In it, a human reviewer labels a small set of documents as relevant and another set as irrelevant, and feeds both to the system. An algorithm analyzes both sets of documents, finds correlations between their words, and determines which features affect relevance and how. It then classifies the remaining documents as relevant or irrelevant based on their similarity to both document sets.
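The labeling-then-classifying loop described above can be sketched in a few lines of Python. This is only an illustration, assuming a naive Bayes word-count model stands in for whatever classifier a real TAR product uses; the seed documents, the function names, and the `LABELS` tuple are all made up for the example.

```python
import math
import re
from collections import Counter

LABELS = ("relevant", "irrelevant")

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())

def train(labeled_docs):
    """Count words per label from (text, label) pairs supplied by the reviewer.

    Assumes the reviewer has labeled at least one document of each kind.
    """
    word_counts = {label: Counter() for label in LABELS}
    doc_counts = Counter()
    for text, label in labeled_docs:
        word_counts[label].update(tokenize(text))
        doc_counts[label] += 1
    vocab = set().union(*word_counts.values())
    return word_counts, doc_counts, vocab

def classify(text, model):
    """Return the label whose seed documents the text resembles most."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in LABELS:
        total_words = sum(word_counts[label].values())
        # Start from the share of seed documents carrying this label.
        score = math.log(doc_counts[label] / total_docs)
        for word in tokenize(text):
            if word in vocab:
                # Add-one smoothing keeps unseen words from zeroing the score.
                count = word_counts[label][word] + 1
                score += math.log(count / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical seed set labeled by a human reviewer.
seed_set = [
    ("The merger agreement requires board approval before closing", "relevant"),
    ("Counsel discussed the merger price with the board", "relevant"),
    ("Lunch menu for the office party on Friday", "irrelevant"),
    ("Reminder to submit parking permit forms", "irrelevant"),
]
model = train(seed_set)

# Hypothetical unreviewed documents.
for doc in ["Board minutes about the merger closing", "Friday office lunch reminder"]:
    print(classify(doc, model), "-", doc)
```

In practice the seed set would contain hundreds of documents, and the reviewer would typically correct the system's mistakes and retrain over several rounds.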

There are multiple open source tools for software developers that greatly simplify building a professional e-discovery system in legaltech. They include Apache OpenNLP and NLTK.

Legal research

Another time- and resource-intensive task lawyers engage in is finding relevant cases. This involves combing through huge volumes of case law. The search may return too many relevant cases, leaving the lawyer to decide which one best supports an argument. It may also return too few, which could indicate the lawyer is pursuing the wrong line of inquiry.

To mitigate this, natural language processing (NLP) and machine learning can be used to parse and analyze a legal question or document. The system then finds relevant cases and ranks them by relevancy score.
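As a rough sketch of what ranking by relevancy score can look like, the Python below scores each case against a question using TF-IDF weights and cosine similarity. That is a deliberately simple stand-in, not the method of any particular platform; the case summaries, the tiny `STOPWORDS` set, and the function names are invented for the example.

```python
import math
import re
from collections import Counter

# A tiny, hypothetical stopword list; real systems use much longer ones.
STOPWORDS = {"a", "an", "the", "for", "to", "at", "after", "was", "can"}

def tokenize(text):
    """Lowercase, split into words, and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def tfidf_vectors(texts):
    """Turn each text into a {word: weight} vector; rarer words weigh more."""
    tokenized = [tokenize(t) for t in texts]
    n = len(tokenized)
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    idf = {w: math.log((1 + n) / (1 + df)) + 1 for w, df in doc_freq.items()}
    return [{w: tf * idf[w] for w, tf in Counter(tokens).items()}
            for tokens in tokenized]

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def rank_cases(question, cases):
    """Return (case name, score) pairs, most relevant first."""
    names = list(cases)
    vectors = tfidf_vectors([question] + [cases[name] for name in names])
    query, case_vectors = vectors[0], vectors[1:]
    scored = [(name, cosine(query, vec)) for name, vec in zip(names, case_vectors)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical one-line case summaries.
cases = {
    "Case A": "Tenant sued landlord for failing to return the security deposit after the lease ended",
    "Case B": "Driver was found negligent after a collision at an intersection",
    "Case C": "Landlord evicted tenant for unpaid rent during the lease",
}
question = "Can a landlord keep a security deposit after the lease ends?"

for name, score in rank_cases(question, cases):
    print(f"{name}: {score:.2f}")
```

Word overlap misses synonyms and paraphrases ("ends" and "ended" do not match here), which is one reason production systems lean on richer representations such as word embeddings.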

The tools and methods used here are similar to the ones used in e-discovery. The difference is that case documents could be uniformly structured, making them easier to process and analyze than generic documents in e-discovery.

Some machine learning based legal research platforms, like ROSS Intelligence, use NLP to narrow down the jurisdiction and date range of relevant cases. They then use part-of-speech tags and word embeddings to score the relevant documents.

Free open source software, like Stanford’s Log-linear Part-Of-Speech Tagger, Apache OpenNLP, and NLTK, can be used as the backbone of a legal research platform.


Predicting odds of winning

One of the factors lawyers and clients consider before pursuing a case is the odds of winning. Typically, an experienced lawyer compares the potential case to similar cases litigated in the same court under similar circumstances and reviews their outcomes.

Given digitized case records and the documents and facts from a potential case, a machine learning algorithm can perform a similar analysis. For example, it can analyze existing insurance case law and predict decisions in tort liability or accident benefit cases.
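One simple way to mimic the lawyer's compare-to-similar-cases reasoning is a nearest-neighbor vote, sketched below in Python. It is an assumption-laden toy: the feature names (`court`, `claim`, `severity`), the past cases, and the choice of k are all hypothetical, and real systems would use far richer features and models.

```python
from collections import Counter

def similarity(case_a, case_b):
    """Fraction of shared features on which two cases agree."""
    shared = case_a.keys() & case_b.keys()
    if not shared:
        return 0.0
    return sum(case_a[key] == case_b[key] for key in shared) / len(shared)

def predict_outcome(new_case, past_cases, k=3):
    """Vote among the k most similar past cases.

    past_cases is a list of (features, outcome) pairs.
    Returns the winning outcome and its share of the k votes.
    """
    nearest = sorted(past_cases,
                     key=lambda case: similarity(new_case, case[0]),
                     reverse=True)[:k]
    votes = Counter(outcome for _, outcome in nearest)
    outcome, count = votes.most_common(1)[0]
    return outcome, count / len(nearest)

# Hypothetical digitized case records.
past_cases = [
    ({"court": "county", "claim": "tort", "severity": "high"}, "won"),
    ({"court": "county", "claim": "tort", "severity": "low"}, "lost"),
    ({"court": "county", "claim": "tort", "severity": "high"}, "won"),
    ({"court": "appeal", "claim": "benefits", "severity": "low"}, "lost"),
    ({"court": "appeal", "claim": "benefits", "severity": "high"}, "won"),
]
new_case = {"court": "county", "claim": "tort", "severity": "high"}

outcome, share = predict_outcome(new_case, past_cases)
print(f"Predicted outcome: {outcome} ({share:.0%} of similar cases)")
```

The share of votes is only a crude proxy for the odds of winning; it says nothing about how similar the "similar" cases really are.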

There have been a few academic studies on this topic. One predicted the outcomes of case-based legal arguments with a 91% success rate. Another well-known study predicted the outcomes of US Supreme Court decisions going back to 1816. The algorithm correctly predicted 70% of the court’s 28,000 decisions. It outperformed both the popular strategy of always guessing reversal, which was correct 63% of the time, and legal experts, who were correct 66% of the time. The algorithm used 16 features in making its predictions, including the justice, the term, the issue, and the court of origin.
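The comparison against "always guess reversal" is worth spelling out, because a model's accuracy only means something relative to such a baseline. The Python below shows the arithmetic on ten invented decisions; the numbers are toy values chosen for the example and are not taken from the study.

```python
def accuracy(predictions, actual):
    """Share of predictions that match the actual outcomes."""
    return sum(p == a for p, a in zip(predictions, actual)) / len(actual)

# Invented outcomes for ten decisions (not real data).
actual = ["reverse", "reverse", "affirm", "reverse", "affirm",
          "reverse", "affirm", "reverse", "reverse", "affirm"]

# Hypothetical model output for the same ten decisions.
model_predictions = ["reverse", "reverse", "affirm", "reverse", "reverse",
                     "reverse", "affirm", "affirm", "reverse", "affirm"]

# The naive baseline: always guess the most common outcome.
baseline_predictions = ["reverse"] * len(actual)

print("baseline:", accuracy(baseline_predictions, actual))
print("model:   ", accuracy(model_predictions, actual))
```

A model is only interesting if it clears the baseline by a meaningful margin on decisions it was not trained on.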

There have been several commercial applications in this space. Blue J Legal asks its clients questions about their tax filing and applies machine learning to the answers to “rapidly resolve in advance how courts will rule in new tax situations, based on their unique factual scenarios” using 26 different factors.

London law firm Hodge Jones & Allen uses models to assess the viability of its personal injury caseload. The features used in prediction include the claimant’s demographics, the nature and cause of the injury, and the quality of the defendant’s solicitors, among others.

Contract review

Another time-consuming and often tedious task for law departments is contract review. Analyzing contracts can identify risks, anomalies, and future financial obligations that could be costly to overlook.

Machine learning is a good candidate for this information retrieval and extraction task. A review system takes a list of clauses to require, accept, and reject in contracts. It then scans all contracts for variants of these clauses and brings them to the attention of the reviewer.
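Here is a minimal Python sketch of that scan, assuming fuzzy word-sequence matching from the standard library's `difflib` as a stand-in for a trained model. The policy clauses, the sample contract, the 0.6 threshold, and the function names are hypothetical. In this sketch only rejected clauses that appear and required clauses that are missing get flagged; accepted clauses pass silently.

```python
import re
from difflib import SequenceMatcher

def words(text):
    """Lowercase the text, strip punctuation, and split it into words."""
    return re.sub(r"[^a-z0-9 ]+", " ", text.lower()).split()

def best_match(clause, sentences):
    """Return the sentence closest to the clause and its similarity (0 to 1)."""
    clause_words = words(clause)
    best_sentence, best_score = None, 0.0
    for sentence in sentences:
        score = SequenceMatcher(None, clause_words, words(sentence)).ratio()
        if score > best_score:
            best_sentence, best_score = sentence, score
    return best_sentence, best_score

def review_contract(contract_text, policy, threshold=0.6):
    """Flag rejected clauses that appear and required clauses that are missing.

    policy maps a reference clause to "require", "accept", or "reject".
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.;])\s+", contract_text)
                 if s.strip()]
    findings = []
    for clause, rule in policy.items():
        sentence, score = best_match(clause, sentences)
        found = score >= threshold
        if rule == "reject" and found:
            findings.append(("rejected clause present", clause, sentence))
        elif rule == "require" and not found:
            findings.append(("required clause missing", clause, None))
    return findings

# Hypothetical review policy and contract text.
policy = {
    "This agreement shall be governed by the laws of the State of New York": "require",
    "The supplier may terminate this agreement at any time without notice": "reject",
}
contract = (
    "This Agreement is governed by the laws of the State of New York. "
    "The Supplier may terminate this Agreement at any time and without prior notice. "
    "Payment is due within thirty days of invoice."
)

for issue, clause, sentence in review_contract(contract, policy):
    print(f"{issue}: {sentence or clause}")
```

Word-level matching catches reworded variants of a clause but not ones that express the same obligation in entirely different language, which is where learned models earn their keep.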

Kira Systems and LawGeex provide commercial platforms for contract review and analysis.

Free open source software like Apache OpenNLP and NLTK can help data scientists and software engineers in building a commercial grade contract review system.

What machine learning cannot do for Legaltech

Implicit knowledge cannot be easily identified. To help us distinguish between suitable and non-suitable tasks for machine learning, Erik Brynjolfsson and Tom Mitchell identified eight key criteria for suitable tasks. They include tolerance for error, “no need for detailed explanation of how the decision was made”, and “no long chains of logic or reasoning that depend on diverse background knowledge or common sense.”

Keeping these criteria in mind, here are some examples of non-suitable machine learning tasks in Legaltech:

  1. Writing briefs
  2. Giving legal advice
  3. Negotiating deals

Where can Legaltech go from here?

Investments in Legaltech reached $1 billion in 2018, with about one third of that spent on artificial intelligence. This spending was largely driven by demand from lawyers to automate the mundane tasks they performed, so they could spend their time on more cognitively challenging work. Consumers of legal services also benefit from these new technologies. They are willing to pay for expensive legal advice, but not $200 an hour for routine work.

The scientific and technological building blocks needed to automate the long list of Legaltech tasks suitable for machine learning are available today. As more investment and entrepreneurial talent flow into this industry, that list will grow longer, and faster.
