eDiscovery AI Blog

Jim Sullivan

The Defensibility of Artificial Intelligence in eDiscovery

Let’s talk about defensibility! As everyone would expect, we are hearing some concerns about whether AI is defensible in predictive coding, and some people are hesitant to try it out. Let’s put it all out on the table.

First, I want to start with what I think are four very important points:

  1. We are talking about situations where you produce or withhold one or more documents without human review. If you are using human review, with AI only to QC or assist in that process, you really, really don’t have anything to be concerned about.
  2. Technology-assisted review (TAR) has been used for over a decade. There have been thousands (if not tens of thousands) of matters that have used TAR to produce documents without review, and there hasn’t been a single successful challenge. While it is important to be diligent and defensible, we have to look at the facts and realize that nobody is interested in having this fight.
  3. When TAR was first introduced, we were jumping from a human-only review to a computer review. That’s a big step. However, now we are jumping from old-computer review to new-computer review. That is a much smaller step. If nobody challenged your old process, they really aren’t going to want to challenge this one.
  4. We are assuming you are using best practices to validate your output before production.
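On that last point, one common best practice is to draw a random sample from the set the system marked not relevant (an "elusion" sample) and have humans check it for missed relevant documents before anything is withheld. A minimal sketch of that sampling step, with a hypothetical helper and made-up document IDs:

```python
import random

# Hypothetical sketch: draw a reproducible random "elusion" sample from the
# documents the system marked not relevant, so reviewers can check whether
# any relevant documents slipped through before production.

def draw_elusion_sample(not_relevant_doc_ids, sample_size, seed=42):
    """Return a reproducible random sample of document IDs to spot-check."""
    rng = random.Random(seed)  # fixed seed so the sample can be re-drawn later
    sample_size = min(sample_size, len(not_relevant_doc_ids))
    return rng.sample(sorted(not_relevant_doc_ids), sample_size)

# Made-up null set of 10,000 documents marked not relevant
null_set = [f"DOC-{i:05d}" for i in range(10_000)]
sample = draw_elusion_sample(null_set, sample_size=500)
print(len(sample))  # 500
```

The fixed seed matters for defensibility: anyone can re-draw the exact same sample and verify what was checked.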

Now that that is out of the way, how do we address defensibility? Let’s put ourselves in the shoes of someone challenging a TAR protocol. What arguments would you make?

Argument: Producing documents without review is not approved or defensible (sorry, but I had to start with an easy one).

Response: We have been producing documents without review for over a decade. THOUSANDS of cases. Producing documents without human review has been accepted for years by courts and government agencies. The FTC, DOJ, SEC, you name it. This was being used even before any court approved the practice. Since the Da Silva Moore ruling, the courts have affirmatively accepted producing documents without review, so long as it is done correctly. I don’t think anyone would seriously argue that computer-assisted review is not appropriate for electronic discovery.

Argument: AI review has not been approved by the courts.

Response: This is false. In Da Silva Moore, Judge Peck states “By computer-assisted coding, I mean tools (different vendors use different names) that use sophisticated algorithms to enable the computer to determine relevance, based on interaction with (i.e., training by) a human reviewer.” He further states “I may be less interested in the science behind the ‘black box’ of the vendor’s software than in whether it produced responsive documents with reasonably high recall and high precision.”
Let’s talk about tech for a minute: There has never been a single approved “TAR Algorithm.” Vendors and tools have used a variety of machine learning models since the beginning. Many of the most popular tools used Support Vector Machines (SVM) as their model of choice. Others used a Logistic Regression algorithm. Some vendors even tried to get away with using Latent Semantic Indexing (LSI) and calling it predictive coding (they were much more successful than I expected). AI tools today are using Large Language Models (LLMs).
These machine learning models do have differences, but LLMs are significantly better than the others by any measurable standard. I don’t know if I think “you can only use models that aren’t good” is a good legal argument.

Argument: The Da Silva Moore ruling states that training is based on interaction with a human reviewer. AI doesn’t work like this.

Response: This is a misunderstanding of AI. AI does work like this. How is the machine going to know what you are looking for if you don’t train it?
In traditional TAR, the system is trained by providing thousands of positive and negative examples to help the system differentiate between characteristics that make a document relevant to any given issue. It isn’t able to work if you can’t find enough training examples.
In AI-powered TAR, the system is trained by providing clear instructions to help the system understand what is relevant.
Let’s compare these two methods of training.
Suppose we are playing charades. I need to relay to you what qualifies as a “Relevant” document in my case. Which option would be more effective:

  • I can’t talk, write, or use any body language. My only method of communicating is by providing you with documents in two piles. One pile contains responsive documents and the other pile contains non-responsive documents. And you can’t read the words on the documents. You are limited to finding patterns in the words to determine which words and features are more prevalent in the pile of responsive examples vs the pile of non-responsive examples.
  • I can write down on paper what I’m looking for and hand you the piece of paper.

It isn’t that you aren’t training the system with a human reviewer. It’s just that the training is so much easier that it can be done in seconds instead of weeks.

Are you going to argue a system must be difficult to be defensible? It can’t be good if nobody bleeds? If a lawyer isn’t able to bill at least 60 hours to train it, it’s not acceptable?


If you have made it this far, you’re probably invested enough in this space to know that even though this is a fun exercise, none of the above arguments matter. The only thing that matters is how you validate the results and demonstrate high-quality output. If someone does a great job with validation and can show solid results, you probably aren’t going to win even if the training was done by a bunch of 3rd graders.

Let’s move on to arguments you would actually use to win this type of claim:

Argument: AI is an unknown technology and is susceptible to hallucination.

Response: Look me in the eye and tell me you know more about Logistic Regression algorithms than you do about Large Language Models. Do you think you can explain Logistic Regression to a judge but not LLMs? It doesn’t matter, so long as the results are good.
Hallucination is a real issue with AI, but it just isn’t a significant factor here. With eDiscovery AI, we are using LLMs to make one of four potential classifications on a document (Relevant, Not Relevant, Needs Further Review, and Tech Issue). Hallucinations are more common when you are generating content rather than classifying documents. In the worst-case scenario, a hallucination could only result in a miscategorization of a document. And because we are validating the results, we are able to confirm this isn’t an issue.
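To make the point concrete, here is a minimal sketch of what constrained classification looks like. This is an illustration, not how any particular product works: the prompt-building function, label names, and fallback routing are all hypothetical. The key idea is that the model is only ever asked for one of four labels, and anything else is rejected rather than trusted.

```python
# Hypothetical sketch of constrained classification: the model must answer
# with exactly one of four labels, and any unexpected output is routed to
# human review instead of silently entering the record.

ALLOWED_LABELS = {"Relevant", "Not Relevant", "Needs Further Review", "Tech Issue"}

def build_prompt(criteria, document_text):
    """Assemble a classification prompt (illustrative format only)."""
    return (
        "You are classifying a document for discovery.\n"
        f"Relevance criteria: {criteria}\n"
        f"Document:\n{document_text}\n"
        "Answer with exactly one label: Relevant, Not Relevant, "
        "Needs Further Review, or Tech Issue."
    )

def parse_label(model_output):
    """Accept only an allowed label; route anything else to a human."""
    label = model_output.strip()
    if label not in ALLOWED_LABELS:
        return "Needs Further Review"
    return label

print(parse_label("Relevant"))          # Relevant
print(parse_label("The sky is green"))  # Needs Further Review
```

A free-form hallucination ("The sky is green") can't mislabel a document here; the worst it can do is send the document to a reviewer.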


Argument: AI isn’t good enough at classifying documents to replace humans.

Response: Oh, my sweet summer child. Just wait until you see how good it is. I’ve seen many people defend 70% recall and 50% precision. I don’t think anyone will have trouble defending scores in the 90s.
This actually might open the door to the opposite effect. How are you going to defend 70% recall when tools like this exist?
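For readers less familiar with these metrics, the recall and precision figures above are straightforward ratios computed from validation counts. A quick illustration with made-up numbers:

```python
# Recall and precision as used in the validation figures above:
# recall = share of truly relevant documents the review actually found;
# precision = share of documents marked relevant that truly are relevant.

def recall(true_positives, false_negatives):
    return true_positives / (true_positives + false_negatives)

def precision(true_positives, false_positives):
    return true_positives / (true_positives + false_positives)

# Hypothetical validation counts: 900 relevant docs found, 100 missed,
# 100 non-relevant docs incorrectly marked relevant.
tp, fn, fp = 900, 100, 100
print(f"recall={recall(tp, fn):.0%}, precision={precision(tp, fp):.0%}")
# recall=90%, precision=90%
```

On these made-up counts, both scores land at 90%, i.e., the "scores in the 90s" range mentioned above, versus the 70% recall and 50% precision some have defended.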

What do you think? Is there anything I’m missing? How would you attack someone’s use of AI in document review?
