Predictive Coding: Come On In, The Water's Just Fine

Predictive coding and other forms of computer-assisted review have been touted by many as the solution to the spiraling costs of ediscovery. On February 24, 2012, in da Silva Moore v. Publicis Groupe, No. 11-CV-1279 (S.D.N.Y. Feb. 24, 2012), the Southern District of New York took a dive into the predictive coding pool and found the water just fine. In what appears to be the first judicial opinion approving the use of predictive coding in ediscovery, Magistrate Judge Andrew J. Peck invites the Bar to join him in these previously uncharted waters.

What predictive coding is, and what it is not.

In the most general terms, predictive coding is a process that combines human input and electronic "tools" to find relevant documents within a larger data or document set. Although the terms "predictive coding" and "computer-assisted review" may appear interchangeable at first glance, "computer-assisted" or "technology-assisted review" can actually encompass a number of different search technologies.[i] The general purpose of using any of these technologies in ediscovery is to locate relevant documents while saving time and money when compared with an exhaustive manual review.

When thinking about predictive coding, push any visions of 2001: A Space Odyssey and its computer villain HAL 9000 from your mind. As Magistrate Judge Peck cautions, "this is not a case of machine replacing humans." In a predictive coding process, lawyers can, and must, remain at the wheel:

"Unlike manual review, where the review is done by the most junior staff, computer-assisted coding involves a senior partner (or [small] team) who review and code a 'seed set' of documents. The computer identifies properties of those documents that it uses to code other documents. As the senior reviewer continues to code more sample documents, the computer predicts the reviewer's coding."[ii]

By way of example, the predictive coding protocol ordered in da Silva Moore directs the attorneys to first create a "seed set" of relevant and irrelevant documents using a variety of computer-assisted review tools, including classic keyword and Boolean (think "and/or") searching. After the seed set is created, it will be used to "train" the software and the true predictive coding process will begin. The attorneys must then review the software-suggested results, code that set, and repeat the predictive coding process. Notably, in da Silva Moore, "[a]ll of [the] review to create the seed set was done by senior attorneys (not paralegals, staff attorneys or junior associates.)"
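For readers who want a concrete picture of that loop, the following is a minimal Python sketch of the general idea only. It is not the protocol ordered in da Silva Moore and not any vendor's software. The sample documents, the function names, and the simple word-weighting model are all assumptions made for illustration, and the `reviewer` function is a hypothetical stand-in for the senior attorney's relevance call.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def train(coded_docs):
    """Learn a per-word weight from (text, is_relevant) pairs.

    Illustrative smoothed log-odds model: positive weights lean relevant,
    negative weights lean irrelevant. Real tools use other models.
    """
    relevant, irrelevant = Counter(), Counter()
    for text, is_relevant in coded_docs:
        (relevant if is_relevant else irrelevant).update(tokenize(text))
    vocab = set(relevant) | set(irrelevant)
    rel_total = sum(relevant.values()) + len(vocab)
    irr_total = sum(irrelevant.values()) + len(vocab)
    return {
        w: math.log((relevant[w] + 1) / rel_total)
        - math.log((irrelevant[w] + 1) / irr_total)
        for w in vocab
    }


def score(weights, text):
    """Higher score means the model predicts the document is more likely relevant."""
    return sum(weights.get(w, 0.0) for w in tokenize(text))


def predictive_coding(documents, seed_ids, reviewer, rounds=3, batch=2):
    """Seed, train, suggest, review, repeat."""
    # Step 1: the reviewer codes the seed set by hand.
    coded = {i: reviewer(documents[i]) for i in seed_ids}
    for _ in range(rounds):
        # Step 2: train on everything coded so far.
        weights = train([(documents[i], label) for i, label in coded.items()])
        uncoded = [i for i in documents if i not in coded]
        if not uncoded:
            break
        # Step 3: the software suggests its highest-scoring uncoded documents.
        uncoded.sort(key=lambda i: score(weights, documents[i]), reverse=True)
        # Step 4: the reviewer codes the suggested batch, and the loop repeats.
        for i in uncoded[:batch]:
            coded[i] = reviewer(documents[i])
    final_weights = train([(documents[i], label) for i, label in coded.items()])
    return coded, final_weights


# Hypothetical sample data, invented for this sketch.
documents = {
    1: "salary review denied after maternity leave",
    2: "lunch menu for friday",
    3: "promotion denied despite strong salary review",
    4: "office parking lot closed friday",
    5: "maternity leave policy and promotion timing",
    6: "friday lunch order parking validation",
}


def reviewer(text):
    # Stand-in for a human relevance decision; an assumption, not a real rule.
    return any(w in tokenize(text) for w in ("salary", "promotion", "maternity"))


coded, weights = predictive_coding(documents, seed_ids=[1, 2], reviewer=reviewer)
```

In this sketch every coding decision still comes from the reviewer; the software only decides which documents to put in front of the reviewer next, which is the sense in which the lawyers stay at the wheel.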

The da Silva Moore Case

In this putative class action, Monique da Silva Moore filed a Complaint against advertising conglomerate Publicis Groupe and its subsidiary MSL Group for gender discrimination and other related claims. During discovery, the parties endeavored to cull down approximately three million electronic documents and ultimately submitted a proposed protocol for searching electronically stored information ("ESI"). The plaintiffs reserved their rights to object to the use of predictive coding altogether, although their vendor is reported to have agreed "in general, that computer-assisted review works, and works better than most alternatives." Of course, the devil is in the details. The plaintiffs expressed concerns about the way in which the defendant planned to implement the predictive coding and sought clarification from the defendant and the Court.

In a detailed Opinion, the Court summarized computer-assisted review generally and offered a number of "lessons for the future." Citing various well-known scholars and studies, the Court also sought to debunk the "myth" that manual review is superior to computer-assisted review "as statistics clearly show that computerized searches are at least as accurate, if not more so, than manual review." In ordering the predictive coding protocol, the Court also rejected plaintiffs' argument that the acceptance of a predictive coding protocol was contrary to Federal Rule of Evidence 702. The Court reasoned that Rule 702 sets forth an admissibility standard for trial and that "the admissibility of specific emails at trial will depend upon each email itself . . . not how it was found during discovery." The Court concluded that Rule 702 and the Supreme Court's decision in Daubert v. Merrell Dow Pharmaceuticals, Inc.[iii] are "simply not applicable to how documents are searched and found in discovery."

Five Factors To Consider

The Court set forth five factors that it considered in determining that the use of predictive coding was appropriate in da Silva Moore: "(1) the parties' agreement, (2) the vast amount of ESI to be reviewed (over three million documents), (3) the superiority of computer-assisted review to the available alternatives (i.e. linear manual review or key word searches), (4) the need for cost effectiveness and proportionality under Rule 26(b)(2)(C), and (5) the transparent process proposed by [defendants]."

Lessons For Litigants And The Bar

The Court cautions that predictive coding is not a "Staples-Easy-Button" and might not be appropriate for every case. Indeed, the detailed protocol set forth in da Silva Moore could not have been easy or inexpensive to design or to complete. Although the Opinion stressed the parties' basic agreement to use predictive coding, recent court filings indicate that the plaintiffs have now filed objections to the protocol with presiding Judge Andrew L. Carter, Jr. Even assuming that Judge Carter affirms the Opinion, it raises several questions. If three million documents warrant a predictive coding process, what about a million documents, or far fewer? Will other courts follow the lead of Magistrate Judge Peck or will this case be an outlier? The true impact of da Silva Moore remains to be seen.

Nonetheless, the invitation to implement predictive coding in ediscovery remains open and is best summed up in Magistrate Judge Peck's own words: "[w]hat the Bar should take away from this Opinion is that computer-assisted review is an available tool and should be seriously considered for use in large-data-volume cases where it may save the producing party (or both parties) significant amounts of legal fees in document review. Counsel no longer have to worry about being the 'first' or 'guinea pig' for judicial acceptance of computer-assisted review."

[i] Maura R. Grossman and Gordon V. Cormack, Technology-Assisted Review In E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review, 17 Rich. J.L. & Tech. 11, 2 (2011) ("A technology-assisted review process may involve, in whole or in part, the use of one or more approaches including, but not limited to keyword search, Boolean search, conceptual search, clustering, machine learning, relevance ranking, and sampling.").

[ii] Andrew Peck, Search, Forward, L. Tech. News, Oct. 2011, at 25, 29.

[iii] 509 U.S. 579 (1993).

