BYU Law Launches First Legal Corpus Linguistics Technology Platform

August 28, 2020

BYU Law, a leading national law school focused on innovation, today announced the launch of its Law and Corpus Linguistics Technology Platform, including three new and historically significant corpora. This first-of-its-kind platform for the legal community significantly advances the field of law and corpus linguistics, a methodology that uses naturally occurring language in large collections of texts called “corpora” to help determine the meaning of words and phrases.

Timed to celebrate Constitution Day, which commemorates the signing of the Constitution, BYU Law’s first corpora to be released on the new platform include the Corpus of Founding Era American English, Corpus of Supreme Court of the United States and Corpus of Early Modern English. These resources are free and available to legal professionals, judges, scholars and the public on the Law & Corpus Linguistics website.

“On this important day in the history of the U.S. Constitution, we are thrilled to release the first corpora featuring all United States Supreme Court rulings (up to the most recent term) and a large number of founding era documents,” said Gordon Smith, Dean, BYU Law. “Inspired by the success of BYU linguistics professor Mark Davies, one of the world’s leading corpus linguists, we recognized opportunities for creating a platform that would significantly benefit scholars, judges and legal practitioners.”

Brigham Young University is home to many of the most important corpora in the world, developed by Professor Davies, including the Corpus of Contemporary American English (COCA), a 560 million word corpus meant to represent general American English, and the Corpus of Historical American English (COHA), a 400+ million word corpus of historical American English, which can help users identify semantic changes and see a word’s increasing or decreasing popularity since 1810. The new Law and Corpus Linguistics Technology Platform builds upon these foundational innovations.

BYU Law is committed to pioneering the study of law and corpus linguistics. In 2013, BYU Law offered the first course on law and corpus linguistics in the United States, taught by Stephen Mouritsen. In 2016, BYU Law organized the first academic conference on law and corpus linguistics in partnership with Georgetown Law, and BYU Law has continued to host an annual conference on the topic. Last year, BYU Law created two research fellowships dedicated to law and corpus linguistics. This fall, adjunct BYU Law professor and Associate Chief Justice of the Utah Supreme Court Thomas Lee is teaching a class on law and language with an emphasis on principles of corpus linguistics at Harvard Law School.

“Corpus Linguistics is new to the legal community, and it holds significant and largely unexplored value in the courtroom when evaluating ordinary meaning,” said Justice Lee. “When a case presents a problem of lexical ambiguity, corpus methods offer judges an approach that is empirical and transparent, rather than intuitive and opaque. Early judicial decisions employing this methodology have highlighted these benefits. It is exciting to hear of new developments that will make corpus linguistics more available and accessible to practicing attorneys, judges, students and scholars around the world.”

The new corpora will be especially useful to those who study the meaning of the Constitution. “The method of corpus linguistics, which employs large-scale data sets (corpora) that provide evidence of linguistic practice, provides an important tool for the recovery of the original public meaning of the constitutional text,” said Lawrence Solum, Professor of Law at Georgetown Law and an internationally recognized author and expert in constitutional theory. “The Georgetown Center for the Constitution co-sponsored the first conference on Law and Corpus Linguistics with BYU Law to further explore and exploit innovations surrounding this promising technology and approach to understanding the meaning of words.”

Designed specifically for lawyers and scholars, the new Law and Corpus Linguistics Technology Platform for linguistic analysis includes:

  • The Corpus of Founding Era American English, which contains over 140 million words, allowing the user to examine context to see how words from the Constitution were used at the time of the founding (1750 – 1799). This provides judges with evidence for the original meaning of the Constitution, even though we are more than 200 years removed from its ratification.
  • The Corpus of Early Modern English, which contains more than 40,000 texts from 1485 to 1800.
  • The Corpus of Supreme Court of the United States, which contains more than 130 million words from 32,000 Supreme Court documents.

The Law and Corpus Linguistics Technology Platform features a user-friendly interface offering the ability to search these groundbreaking corpora by terms and phrases with filters for year, primary author, genre (legal or non-legal document, court proceeding, speech, diary entry, novel, etc.) and source. The corpora also support collocation searches, which enable the user to gain powerful insights into word meanings and relationships between words.
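For readers unfamiliar with the term, a collocation search counts the words that appear near a chosen search word. The short Python sketch below illustrates the general idea only; it is not the platform's implementation. The `collocates` function, the `window` parameter and the three sample sentences are invented for this illustration and are not drawn from the BYU Law corpora.

```python
import re
from collections import Counter


def collocates(texts, node, window=4):
    """Count the words found within `window` tokens of `node` in each text."""
    node = node.lower()
    counts = Counter()
    for text in texts:
        # Naive tokenizer (an assumption): lowercase runs of letters and apostrophes.
        tokens = re.findall(r"[a-z']+", text.lower())
        for i, token in enumerate(tokens):
            if token != node:
                continue
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:  # skip the search word itself
                    counts[tokens[j]] += 1
    return counts


# Hypothetical sample sentences, made up for illustration only.
sample_texts = [
    "The militia shall bear arms in defence of the state.",
    "He refused to bear arms against his countrymen.",
    "The bear was seen near the river.",
]

top = collocates(sample_texts, "bear", window=2)
print(top.most_common(3))
```

In this toy sample, "arms" appears next to "bear" in two of the three sentences, so it ranks as the most frequent neighbor. At corpus scale, patterns of this kind are the evidence a researcher would weigh when asking how a word or phrase was ordinarily used.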