Harvard Opens Up Its Massive Caselaw Access Project

Almost exactly three years ago, we wrote about the launch of an ambitious project by Harvard Law School to scan all federal and state court cases and get them online (for free) in a machine-readable format (not just PDFs!), with open APIs for anyone to use. And earlier this week, case.law officially launched, with 6.4 million cases, some going back as far as 1658. There are still some limitations, some of them placed on the project by its funding partner, Ravel, which was acquired by LexisNexis last year (though the structure of the deal means some of these restrictions will likely ease over time).

Also, the focus right now is really on providing this setup as a tool for others to build on, rather than as a straight-up interface for anyone to use. As it stands, you can access the data either via the site’s API or through bulk downloads. Unfortunately, bulk downloads are part of what the Ravel/LexisNexis deal restricts: they’re available only for cases in Illinois and Arkansas, and only because both of those states already make their cases available online. Still, even with the Ravel/LexisNexis limitation, individual users can download up to 500 cases per day.

The real question is what others will build with the API. The site has launched with four sample applications that are all pretty cool.

  • H2O is a tool that law professors can use to easily create casebooks for students in various areas of law. Anything published on H2O gets a Creative Commons license and can then be shared widely. I wonder if professors like Eric Goldman, who offers an Internet Law Casebook, or James Grimmelmann, who has a different Internet Law Casebook, will eventually port them over to a platform like H2O.
  • A word cloud app that currently shows the “most used words” in California cases in various years. Here, for example, are the word clouds for California cases from 1871… and 2012. See if you can tell which one’s which.
   
  • Caselaw Limericks, which appears to randomly generate what it believes is a rhyming limerick from case law. Here’s what I got:

Her son Julius is a confirmed thief.
He did not turn over a new leaf.
The vessel, not.
the parking lot.
Respondent concedes this in its brief.

    The quality overall is… a bit mixed. But it’s fun.
  • And, finally, in time for Halloween, Witchcraft in Law, which totals up cases that cite “witchcraft” by state.
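Under the hood, a tally like Witchcraft in Law boils down to filtering cases for a term and grouping the matches by state. Here’s a minimal Python sketch of that idea. It assumes the cases have already been pulled down (via the API or a bulk download) into dictionaries with hypothetical `state` and `text` fields; those field names, the toy records, and the `tally_by_state` helper are all made up for illustration and are not the app’s actual code or the project’s data format.

```python
from collections import Counter

# Toy records for illustration only. The "state" and "text" field names are
# assumptions, not the Caselaw Access Project's real schema.
cases = [
    {"state": "State A", "text": "...the defendant was accused of witchcraft..."},
    {"state": "State B", "text": "...a dispute over a boundary line..."},
    {"state": "State A", "text": "...alleging WITCHCRAFT in the village..."},
    {"state": "State C", "text": "...no witchcraft was ever proven..."},
]

def tally_by_state(records, term):
    """Count records whose text mentions `term` (case-insensitive), grouped by state."""
    needle = term.lower()
    # Plain substring match: crude, but enough to show the filter-then-group idea.
    return Counter(r["state"] for r in records if needle in r["text"].lower())

counts = tally_by_state(cases, "witchcraft")
for state, n in counts.most_common():
    print(state, n)
```

A plain substring check will also match words like “witchcrafts,” and whatever the real app does to decide that a case “cites” witchcraft may well be more careful than this.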

Hopefully this inspires a lot more on the development side as well.


Techdirt.