Center for Legal Data Science

Useful Databases

The following is a list of freely accessible online databases that contain various legal data.

US Supreme Court Database

scdb.wustl.edu

Legacy Database (1791-1945)

Modern Database (1946-2022)

Information about every case decided by the US Supreme Court between 1791 and today.

Size by Database: Legacy Database Case Centered Data (19681); Legacy Database Justice Centered Data (172213); Modern Database Case Centered Data (13780); Modern Database Justice Centered Data (123447).
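As a rough plausibility check on the sizes listed above, the ratio of justice-centered to case-centered records can be computed. The sketch below assumes (this is not stated on this page) that the justice-centered files hold one record per participating justice per case, so the ratio would approximate the average number of participating justices per case. The dictionary keys are illustrative names, not file names from the database.

```python
# Record counts as listed in the size line above.
sizes = {
    "legacy": {"case_centered": 19681, "justice_centered": 172213},
    "modern": {"case_centered": 13780, "justice_centered": 123447},
}


def records_per_case(entry):
    """Ratio of justice-centered records to case-centered records."""
    return entry["justice_centered"] / entry["case_centered"]


# Assumption: one justice-centered record per justice per case, so this
# ratio approximates the average number of justices taking part in a case.
ratios = {name: round(records_per_case(entry), 2) for name, entry in sizes.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio} justice-centered records per case-centered record")
```

Both ratios come out slightly below nine, which would be consistent with a court that usually sits with nine justices, if the assumption about the record structure holds.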

Harold J. Spaeth, Lee Epstein, Andrew D. Martin, Jeffrey A. Segal, Theodore J. Ruger, and Sara C. Benesh. 2022 Supreme Court Database, Version 2022 Release 01, URL: http://Supremecourtdatabase.or.

SCD (Swiss Federal Supreme Court Dataset)

zenodo.org

The Swiss Federal Supreme Court Dataset (SCD) provides a record of all 116,000+ cases decided by the Swiss Federal Supreme Court since 2007. The SCD includes 31 variables that document basic case information, the court composition, the area of law, information about the appealed judgment, the parties, the case outcome, and citations and publication status. The dataset will be updated quarterly until at least 2025 to include the latest judgments and possible expansions. Size: 116,650 cases (expanding in future versions).

Geering Florian, Merane Jakob. (2023), Swiss Federal Supreme Court Dataset (SCD), Zenodo: https://doi.org/10.5281/zenodo.7793043.

Corpus des Deutschen Bundesrechts

Github

zenodo.org

The Corpus of German Federal Law (C-DBR) is a comprehensive collection of the consolidated versions of all laws and ordinances at the federal level. The dataset uses www.gesetze-im-internet.de of the Federal Ministry of Justice as its source. Size: 6687 federal laws and ordinances of the FRG.

Fobbe, Sean. (2023). Corpus des Deutschen Bundesrechts (C-DBR) (2023-01-05), Zenodo: https://doi.org/10.5281/zenodo.7494474.

Swiss Open Government Data

opendata.swiss

Opendata.swiss is a central portal for open, freely accessible data from the Swiss authorities (Open Government Data, OGD). The opendata.swiss portal guarantees users simple and secure central access to data of the federal government, cantons and municipalities. If there is a public interest, data from third parties – federally affiliated companies as well as private actors commissioned by the federal, cantonal or municipal authorities – may also be published. Size: 8792 datasets (as of 13 April 2023).

FSCS (Swiss court judgments)

Github

zenodo.org

Multilingual (German, French, and Italian), diachronic (2000-2020) corpus of 85,000 cases from the Federal Supreme Court of Switzerland (FSCS).

Niklaus Joel, Chalkidis Ilias and Stürmer Matthias (2021) “SwissJudgmentPrediction”. Proceedings of the 2021 Natural Legal Language Processing Workshop (NLLP), Zenodo: 10.5281/zenodo.5529712.

JRC-Acquis

JRC-Acquis Overview

Dataset on emm4u.eu

The Acquis Communautaire (AC) is the total body of European Union (EU) law applicable in the EU Member States. This collection of legislative texts changes continuously and currently comprises selected texts written between the 1950s and now. The EU has 27 Member States and 23 official languages. The Acquis Communautaire texts comprise material in these languages, with the exception of Irish translations. Size: 463792 texts.

Steinberger Ralf, Pouliquen Bruno, Widiger Anna, Ignat Camelia, Erjavec Tomaž, Tufis Dan, Varga Dániel (2006).

ECtHR

Github

Dropbox 

Judicial decisions of the European Court of Human Rights. Size: Test20 (Art. 2-8, 10-14, 18): 804 decisions; test_violations (Art. 2-8, 10-14, 18): 4054 decisions; Train (Art. 2-14, 18): 8441 decisions.

Medvedeva Masha, Vols Michel, Wieling Martijn (2019): Github: https://github.com/masha-medvedeva/ECtHR_crystal_ball.

ICT and ICJ Datasets

Overview and book excerpts

Datasets in R Data Format (Version 2.0.0 or later)

Codebook ICJ 

Codebook ICT

Trademark

ICT, ICJ and Trademark Datasets in: An Introduction to Empirical Legal Research.

ICJ: This dataset contains information on individual judges' votes in the International Court of Justice (n=1,560). There are 103 cases from 1947 to 2003. Information about the nature and outcome of the case as well as a number of measures describing the country of the applicant, respondent, and judge are provided. These data were extracted from a larger dataset compiled by Eric A. Posner and Miguel de Figueiredo and available (along with a codebook) on Erik Voeten’s website.

ICT: This dataset contains information on defendants brought up on charges in front of the International Criminal Tribunal for Rwanda (n=50), the International Criminal Tribunal for Yugoslavia (n=160), and the Special Court for Sierra Leone (n=8) from 1994 to 2010. Data are available on the number of guilty counts, type of guilty counts (war crimes, genocide, crimes against humanity), mitigating and aggravating factors at sentencing, and the length of sentence (both initially and on appeal). These data are excerpted from an ongoing database maintained by James David Meernik. The full database and accompanying codebook (excerpted, as applicable, below) are available at: http://www.psci.unt.edu/ meernik/International%20Criminal%20Tribunals%20Website.htm.

Trademark: These data are from a survey conducted by an expert witness in a trademark case brought in 2007 in the Southern District of New York. The Trademark Trial and Appeal Board denied Victoria’s Secret’s application to register the mark SO SEXY for its hair care products based on the objections of Sexy Hair Concepts, LLC. Victoria’s Secret appealed this decision, and one of its expert witnesses designed a survey to explore whether the word SEXY had attained a secondary meaning in relation to hair care products.

Size: ICJ dataset: 1560 individual judges' votes; ICT dataset: 218 defendants.

Epstein Lee, Martin Andrew D. (2014): An introduction to empirical legal research. Oxford University Press.

European Court of Human Rights Database (ECHRdb) Version 1.0

washington.edu

The ECHRdb is available as a set of downloadable raw data files in .csv, Excel and Stata formats, enabling researchers to analyze the complete set of variables included in the database using a wide variety of data analysis software. The database includes over 70 variables detailing judicial decision patterns (subject matter, violation rate, defendant country, etc.) and organization participation and effects (organization identification, participation rates, types of participation, amicus participation, domestic legal change, etc.). The data are available as three datasets: Comprehensive, Judgment Centered and Participation Centered. These three datasets enable researchers to examine all the variables (Comprehensive) or narrow their analysis to focus on the judgment level patterns (Judgment Centered) or organization participation patterns (Participation Centered). Released 2017. Size: 15147 judgments.
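Because the data are distributed as .csv files, a first look needs nothing beyond a CSV reader. The following is a minimal sketch, not the database's documented workflow: it computes a violation rate per defendant country from a judgment-level table. The column names `respondent` and `violation` and the four sample rows are invented for illustration; the real variable names should be taken from the ECHRdb codebook.

```python
import csv
import io
from collections import defaultdict

# Made-up stand-in for a judgment-level CSV export. The column names
# "respondent" and "violation" are hypothetical, not taken from the codebook.
SAMPLE_CSV = """respondent,violation
A,1
A,0
B,1
B,1
"""


def violation_rate_by_country(handle):
    """Return {respondent: share of judgments coded as a violation}."""
    totals = defaultdict(int)
    violations = defaultdict(int)
    for row in csv.DictReader(handle):
        totals[row["respondent"]] += 1
        violations[row["respondent"]] += int(row["violation"])
    return {country: violations[country] / totals[country] for country in totals}


rates = violation_rate_by_country(io.StringIO(SAMPLE_CSV))
print(rates)
```

For a downloaded file, the same function could be called with an open file handle, for example `open(path, newline="", encoding="utf-8")`, where `path` points to the local copy of the chosen dataset.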

R. Cichowski & E. Chrun (2017). European Court of Human Rights Database (ECHRdb), Version 1.0 Release 2017: http://depts.washington.edu/echrdb/.

The Swedish High Court (SeHC) Package

Github

The package 'sehc' primarily contains a database on the Swedish high courts, more specifically the Supreme Court (“Högsta domstolen” or “HD”) and the Supreme Administrative Court (“Högsta förvaltningsdomstolen”, previously “Regeringsrätten”, or “HFD”). It contains data both on the judgments (2482 cases) of the Supreme Court (presented, for example, in cases and opinions) and on the individual Justices who have served on the Supreme Court and the Supreme Administrative Court (presented in the table appointments). It also contains a number of handy functions for manipulating datasets and combining variables from multiple datasets to conduct common types of analysis.

Lindholm, Johan; Derlén, Mattias; Naurin, Daniel (2023): Swedish High Court Database (version 0.9.1, 1 May 2023). DOI: 10.5281/zenodo.7883860.

WCLD: Curated Large Dataset of Criminal Cases from Wisconsin Circuit Courts

Github

The WCLD is a curated large dataset of 1.5 million criminal cases from circuit courts in the U.S. state of Wisconsin. The creators used reliable public data from 1970 to 2020 to curate attributes like prior criminal counts and recidivism outcomes. The dataset contains a large number of samples from five racial groups, in addition to information like sex and age (at judgment and first offense). Other attributes in this dataset include neighborhood characteristics obtained from census data, detailed types of offense, charge severity, case decisions, sentence lengths, year of filing, etc. It also provides pseudo-identifiers for judge, county and zip code.

Elliott Ash, Naman Goel, Nianyun Li, Claudia Marangon, Peiyao Sun, WCLD: Curated Large Dataset of Criminal Cases from Wisconsin Circuit Courts, 37th Conference on Neural Information Processing Systems (NeurIPS 2023).