GenAI Review and Legal Standards for Acceptance

By Tony Reichenberger

Many eDiscovery professionals are still hesitant to adopt Gen AI review because there hasn’t yet been a clear court ruling explicitly approving its use. Attorneys tend to be risk-averse, and often prefer to wait until others take the first step, just as we saw with the early days of Technology Assisted Review (TAR).

However, this hesitation isn’t really justified. Gen AI meets the same standards that have long been accepted for other eDiscovery technologies, and it often produces even better results. In fact, federal judges are already beginning to treat AI as just another tool to help lawyers work through massive volumes of discovery documents, viewing it no differently than earlier technologies.

Using AI on Projects Requiring No Approval 

There are many different types of reviews, and various technologies can be used during them. Some situations, such as those with no subsequent productions where the focus is on preparing your side of the litigation for trial, require no approval to use Gen AI at all. Some examples of these kinds of use cases:

  • Productions received. When attorneys receive a production, they must often wade through a "document dump" to find the documents important to the matter. Using AI to target specific important documents and issues can go a long way toward saving time, cost, and effort, quickly surfacing the documents you want. AI is far more beneficial than TAR in this instance because most of the documents you are receiving have already been determined to be relevant and would already score highly; AI can find the best documents for you simply by being asked for exactly what you are seeking.
  • Deposition Preparation (Depo Prep). Depo prep can be a time-consuming, cumbersome process, and quite often attorneys are up against a deadline to locate and review the most significant documents prior to the deposition. The sooner you can find which documents are pertinent to a particular deponent and their role in the matter, the better you can organize which questions to ask, which issues to focus on, and how the witness fits into the overall case. Early Case Intelligence™ solutions such as eDiscovery AI's Insight ECI™ can provide a deeper understanding of an individual's role in a matter, identify documents where they were involved, and show who else they were communicating with regularly.
  • Early Investigations. Projects often relate to internal investigations, prior to litigation, where whether anything wrong occurred is still in doubt and internal legal departments simply want a grasp of what may have happened. Using AI to quickly identify important documents is hugely beneficial in instances like this. AI requires no training, so on projects like this it has a distinct leg up on TAR: it can find important documents based on instruction alone, rather than inferring the target from multiple ambiguous, similar training documents, as TAR must. Important documents can be pushed to the investigating attorney by AI rather than having attorneys scavenge for them. Because AI grasps grammar and context better than TAR and other applications, it can also surface the smoking-gun documents, such as one where someone briefly assents, sooner and more often. Using AI on investigations can be the difference between spending tens of thousands of dollars reviewing volumes of documents from a defensive position only to find the smoking gun too late, and settling early because you found the crucial document right away.
  • PII Identification and Redaction. Projects that contain Personally Identifiable Information (PII) or various forms of health information can be very time consuming, and the information is easily missed by human reviewers. AI features can help identify the documents requiring redaction and route them to redaction teams earlier in the process. Automated AI redaction features are also available that identify PII and redact it to the project's specifications.
  • PII Identification and Extraction. Data breaches can be large, costly projects. Notifying individuals that their private information may have been compromised is usually a two-step process: first identifying the documents containing the PII, then extracting that information to notify the individuals as required by law. The extraction step is extremely cumbersome if done manually and can take large review teams an exceptional amount of time to complete. That is problematic in another way: data breach projects often have hard deadlines, and if reviewers are not on pace to meet them, the only solution is to throw even more reviewers at the problem, compounding inconsistent results. AI can identify and extract PII from large volumes of documents in very little time, and it provides a fuller, more consistent, and more comprehensive result than human reviewers can.
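To make the two-step identify-and-extract workflow concrete, here is a deliberately simplified sketch. It uses regular expressions as a stand-in purely for illustration; actual Gen AI tools classify and extract PII with language models, not patterns, and the document text below is invented:

```python
import re

# Toy stand-ins for PII patterns. Pattern matching is used here only to
# illustrate the workflow; real AI tools use models, not regexes.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")

def identify(doc: str) -> bool:
    """Step 1: flag documents that contain PII at all."""
    return bool(SSN.search(doc) or EMAIL.search(doc))

def extract(doc: str) -> dict:
    """Step 2: pull the PII out of flagged documents to build notification lists."""
    return {"ssns": SSN.findall(doc), "emails": EMAIL.findall(doc)}

doc = "Re: claim for Jane Doe, SSN 123-45-6789, contact jane.doe@example.com"
if identify(doc):
    print(extract(doc))
    # {'ssns': ['123-45-6789'], 'emails': ['jane.doe@example.com']}
```

Separating identification from extraction mirrors how these projects are actually run: the cheap first pass culls the population, and the expensive second pass runs only on flagged documents.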

But many projects must adhere to conventional ESI and discovery requirements. The use of AI on these projects is relatively new, but how it is to be incorporated and used for ESI matters is not particularly controversial. To understand why, we must go back to when TAR was first introduced and approved.

The Current Standard: Da Silva Moore and Early TAR Cases in ESI Protocols

TAR gained wider acceptance in part because it became a central topic in negotiations within broader discussions of the ESI protocol. In large-scale matters, neither side was eager to incur the significant cost of having review attorneys manually tag vast volumes of documents. Likewise, receiving parties had no interest in wading through massive data sets in search of elusive but important relevant materials. TAR offered a solution: it reduced the number of documents requiring review and prioritized the most likely relevant ones based on scoring. To understand how TAR negotiations influence AI-driven review today, it's important to revisit how TAR first gained judicial approval.

In the early 2010s, the legal community embraced TAR's potential to reduce the size and cost of document review. But widespread adoption was slow—just as many hesitate with AI today—because no one wanted to be the first to defend its use in court. That changed with the landmark decision in Da Silva Moore v. Publicis Groupe & MSL Group, 868 F. Supp. 2d 137 (S.D.N.Y. 2012). In that case, Judge Andrew Peck explained that any challenge to the use of TAR would be analyzed based on the process used and the eventual results:

“[I]f the use of predictive coding is challenged…I will want to know what was done and why that produced defensible results. I may be less interested in the science behind the “black box” of the vendor’s software than in whether it produced responsive documents with reasonably high recall and high precision…Proof of a valid “process,” including quality control testing, also will be important.”  (Da Silva Moore at 144, citing Andrew Peck, Search, Forward, Law Tech. News, Oct. 2011, at 25, 29).

Thus, Judge Peck emphasized two key standards for evaluating TAR:

  1. The process and review methodology used, including quality control measures; and
  2. The results achieved and their defensibility.

Judge Peck’s decision in Da Silva Moore opened the door for widespread adoption by practitioners. It helped establish TAR as a near-standard practice across most vendor projects, and a practical necessity for large-scale matters.

These principles were reaffirmed in Rio Tinto PLC v. Vale S.A., 14-CV-3042 (S.D.N.Y. Mar. 3, 2015), where Judge Peck described it as “black letter law” that courts will permit TAR when proposed by the producing party. He also stressed the importance of cooperation and transparency, statistical validation of recall and precision, and a proportional approach tailored to the case’s size and complexity. Subsequent rulings have further reinforced this framework. In Winfield v. City of New York, No. 15-CV-05236 (S.D.N.Y. Nov. 27, 2017), the court approved continued use of TAR provided that the sampling methods and statistical metrics supported defensible recall and precision. Similarly, In re Domestic Airline Travel Antitrust Litigation, MDL No. 2656 (D.D.C. Sept. 13, 2018), emphasized that it is not the specific TAR tool or technology that matters, but the soundness of the underlying methodology—including vendor transparency, sampling rigor, and clearly defined benchmarks for recall and precision.

The TAR review process that eventually developed begins by isolating documents for review through methods like custodian identification, deduplication, email threading, de-NISTing, and keyword searches1. Documents lacking extracted text are excluded from TAR and typically reviewed manually. The remaining documents are indexed and reviewed using TAR, which prioritizes relevant material by assigning higher scores through training. TAR's effectiveness is measured using recall (how comprehensively the relevant documents are identified) and precision (how much of what is flagged as relevant actually is relevant). These metrics are assessed through random sampling, either of the overall document population (a control set) or of the documents TAR labeled irrelevant (an elusion set). Key negotiation points in a TAR protocol typically include sample size (usually targeting 95% confidence with a 2–5% margin of error), the sample set used (control vs. elusion), acceptable recall thresholds (commonly 70–75%), treatment of foreign language documents, and other procedural details.
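The sampling math behind these negotiation points is straightforward. A minimal sketch, using the standard sample-size formula and purely illustrative review counts:

```python
import math

def sample_size(margin_of_error: float, z: float = 1.96, p: float = 0.5) -> int:
    """Documents to sample for a given margin of error at ~95% confidence
    (z = 1.96), using the worst-case proportion p = 0.5."""
    return math.ceil(z**2 * p * (1 - p) / margin_of_error**2)

# The commonly negotiated 2% and 5% margins of error at 95% confidence:
print(sample_size(0.02))  # 2401
print(sample_size(0.05))  # 385

def recall_precision(true_pos: int, false_pos: int, false_neg: int):
    """Recall = share of all relevant documents the tool found; precision =
    share of documents flagged relevant that actually are."""
    recall = true_pos / (true_pos + false_neg)
    precision = true_pos / (true_pos + false_pos)
    return recall, precision

# Illustrative control-set tallies: 150 true positives, 50 false positives,
# 50 relevant documents the tool missed:
r, p = recall_precision(150, 50, 50)
print(f"recall={r:.0%} precision={p:.0%}")  # recall=75% precision=75%
```

This is why 385- and 2,401-document samples turn up so often in TAR protocols: they fall directly out of the 5% and 2% margin-of-error targets.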

One of the great things about Judge Peck's decision is that the methods and procedures he outlined for TAR can analyze the effectiveness of any review regardless of the technology employed2. The standard is the same whether it is keyword searches or Gen AI; indeed, it should surprise nobody that the methods to assess AI reviews are exactly the same as those used to assess TAR. Random sample control sets help quantify and monitor recall and precision. Quality control, meaning reviewing and analyzing results and assessing their quality, helps demonstrate defensibility. Elusion samples are conducted on null sets to ensure nothing relevant is missed.
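As a concrete illustration of an elusion test, the sketch below (with hypothetical numbers, not results from any actual matter) projects the relevant rate observed in a random sample of the null set onto the whole null set to estimate overall recall:

```python
def recall_from_elusion(produced_relevant: int,
                        null_set_size: int,
                        sample_size: int,
                        relevant_in_sample: int) -> float:
    """Estimate overall recall from an elusion sample: the relevant rate seen
    in a random sample of the null set (documents labeled irrelevant) is
    projected onto the whole null set to estimate how many relevant
    documents were missed."""
    elusion_rate = relevant_in_sample / sample_size
    est_missed = elusion_rate * null_set_size
    return produced_relevant / (produced_relevant + est_missed)

# Hypothetical numbers: 8,000 relevant documents identified, a 100,000-document
# null set, and 6 relevant documents turning up in a 385-document sample.
print(round(recall_from_elusion(8000, 100_000, 385, 6), 3))  # 0.837
```

A finding like this, roughly 84% estimated recall, is the kind of statistically grounded result that makes a review defensible under the Da Silva Moore framework.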

Recent Cases Reflective of Da Silva Moore and Rio Tinto

As AI applications continue to enter and expand across the legal technology market, more litigants are incorporating references to AI in their ESI protocol language. These references are generally framed in the same spirit of Judge Peck’s guidance in Da Silva Moore and Rio Tinto, emphasizing collaboration, statistical sampling, defensible workflows (such as quality control reviews), and overall transparency.

A notable example is EEOC v. Tesla, Inc., 727 F. Supp. 3d 875 (N.D. Cal. 2024). In that case, the parties submitted a stipulated discovery order that explicitly contemplated the potential use of AI in document review. While the mention was brief—almost an afterthought—it marked one of the first instances where AI was formally acknowledged in an ESI protocol. This approach reflects the growing consensus that AI is simply another review tool governed by the same principles Judge Peck articulated:

C. Technology Assisted Review/Predictive Coding.

The parties also recognize the availability of a variety of search tools and methodologies, including but not limited to Technology Assisted Review (TAR) and Gen AI tools. Tesla has notified plaintiff’s counsel that it may use TAR and/or Gen AI tools to further analyze documents for relevance after search terms are used to narrow the starting document universe to exclude documents not likely to be relevant. If the producing party intends to use TAR, Gen AI, or similar advanced analytics as a substitute for attorney responsiveness review, the parties agree to meet and confer in good faith to attempt to reach agreement about the technology and process that a producing party proposes to use to identify responsive ESI and a statistically sound methodology to determine the recall rate and other measures of the effectiveness of the tool and processes in identifying responsive documents. The producing party shall make disclosures regarding its tools and processes necessary to make the meet and confers meaningful and for the requesting party to negotiate on an informed basis.

If, prior to commencement of negotiations over search terms, a producing party intends, or is likely, to use both search terms and TAR (or similar advanced analytics), it shall notify the requesting party prior to commencement of search term negotiations. If a producing party decides to employ TAR or similar advanced analytics during, or after the conclusion of, negotiations over search terms, it shall promptly notify the requesting party before commencing any review.

The acknowledgement of Gen AI as being on the same level as TAR or similar advanced analytics implies that they are to be treated roughly the same in ESI contexts, per Da Silva Moore. Everything that follows is common language frequently seen in TAR provisions on ESI/review projects, including the need for transparency and cooperation, collaboration on search methodologies and validation metrics, and documentation of use. Although the order doesn't spell out recall or precision, its structure anticipates the need for statistical sampling, documentation of how AI/TAR was utilized, and disclosure of validation protocols.

A similar decision was made in The Estee Lauder Company Securities Litigation, Case Number 1:24-cv-00716, Document 1, Filed 02/01/24. In that matter, U.S. District Court Judge Arun Subramanian issued an order allowing the use of AI provided the parties disclosed the review application used and there was agreement by the parties on a search protocol.

The Future: Higher Standards of Discovery

Where AI-oriented ESI standards go from here will likely follow the same Da Silva Moore TAR route, just substituting Gen AI for machine-learning-based TAR. But whereas machine learning developed its own standards around what practical metrics could be attained, Gen AI will likely break that mold, establishing new, higher standards.

To put that into context, AI has advantages over machine learning/TAR that suggest it may be more suitable, and ultimately better, for review than TAR.

The quality standards AI can achieve far surpass those TAR generally uses. For TAR projects, a 70–75% recall mark is common, with as high a precision as can be achieved at that point as the goal. There is a typical trade-off between recall and precision: if you cast a wider net (increasing recall), you often collaterally capture more false positives (decreasing precision). A typical TAR project with a fully mature, well-trained classifier may look like the chart below, with the blue line showing recall, the orange line showing precision, the purple line the f-measure (the harmonic mean of recall and precision), and the vertical red line at a score of 45 showing where 75% recall is obtained. Generally speaking, 75% precision is a very good result at that point, and is actually better than what we see on most TAR projects.

With AI, however, the ability of Gen AI to capture a larger portion of the relevant population by instruction (as opposed to laborious training) is far better. It is not uncommon to get results like those below, where 75% recall occurs at a much higher score (in this case 84) with much greater precision; in fact, 90% recall and 90% precision are commonly achieved on many AI review projects:
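The recall and precision curves in charts like these are simply the two metrics computed at every score cutoff. A minimal sketch, using a toy scored set rather than data from any real project:

```python
def metrics_at_cutoff(scored_docs, cutoff):
    """scored_docs: list of (score, is_relevant) pairs from a validation
    sample. Returns (recall, precision) if every document scoring at or
    above `cutoff` is treated as relevant and produced."""
    total_relevant = sum(rel for _, rel in scored_docs)
    flagged = [(s, rel) for s, rel in scored_docs if s >= cutoff]
    true_pos = sum(rel for _, rel in flagged)
    return true_pos / total_relevant, true_pos / len(flagged)

# Toy data: (score, is_relevant) for seven sampled documents.
docs = [(95, True), (90, True), (88, True), (80, False),
        (60, True), (40, False), (20, False)]
r, p = metrics_at_cutoff(docs, 85)
print(f"recall={r:.0%} precision={p:.0%}")  # recall=75% precision=100%
```

Sweeping the cutoff from high scores to low traces out the full curves; the negotiation is essentially over which point on that curve is acceptable.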

What this means is that one of two things will occur: either review is considered complete when 75% recall is obtained, and additional relevant documents that could easily be included with little extra effort are discarded, or, more likely, the recall standard increases so long as precision remains high and little additional effort is needed. The plain language of FRCP 26(b)(2)(B) is instructive and effectively lays out the higher standard:

Specific Limitations on Electronically Stored Information. A party need not provide discovery of electronically stored information from sources that the party identifies as not reasonably accessible because of undue burden or cost. On motion to compel discovery or for a protective order, the party from whom discovery is sought must show that the information is not reasonably accessible because of undue burden or cost.

As there is little or no additional burden to attain the higher recall standard, that is what will likely result. As practitioners of law, this is a good thing: it means a more comprehensive result for our clients, a more transparent result for the court, and a better, more complete picture of the truth. It also means faster and cheaper results, quicker resolutions, and eased timelines.

There are still other advantages AI has over TAR that supplement better results for ESI review projects: 

  • Because TAR runs only on a document's extracted text, other file types such as images, audio files, and scanned-in documents must typically be reviewed manually. AI can run across images, audio files, and scanned documents without issue. AI can also analyze and classify handwritten notes, which TAR cannot do. It is therefore unnecessary to exclude from an AI review the document types commonly removed from a TAR review.
  • Foreign language documents can be included in TAR, but doing so requires either a separately trained TAR model or an adequately large volume of trained foreign language documents to find relevant material in that language. AI review takes your prompt, interprets and applies it as if written in the native language, classifies the documents as appropriate, and provides the results in their original language.

As AI technology continues to be adopted and these common-sense, practical applications become the norm, they will come to be expected on ESI reviews regardless of the technology used. Because machine learning can't accommodate them, using AI for review will become standard operating procedure.

Conclusion

AI technology is advancing and improving results around the world, including in the legal solutions marketplace. Based on established case law and recent cases incorporating those legal benchmarks into Gen AI use, the standards that have been in place since TAR first came on the market are easily applied to AI and will likely continue to be. AI is just another tool in the toolbox for practitioners to use on their ESI review projects. The qualifications for its use remain the same: collaboration with opposing counsel, defensibility and rigorous statistical sampling analysis, transparency, and proportionality.


1One item often discussed during negotiation is whether keywords will be used at all. One of the advantages of TAR is that, applied broadly, it is much more accurate than keywords, being more inclusive of relevant material and cutting out more false positives, which makes keyword filtering redundant, less effectual, and unnecessary. Nonetheless, there are cost considerations involved, as it can be expensive to process and host more data if keywords are not used. If irrelevant documents can easily be removed without detrimentally impacting the overall relevant document set, search terms are often agreed to by the parties in addition to TAR use.

2In fact, many of these methods are included in the Sedona Conference’s Commentary on Achieving Quality in the E-Discovery Process, 2009.  
