Court records
The Marshall Project scraped criminal case records from the Cuyahoga County Clerk of Courts’ Search Selection and Entry site, which provides web-based access to basic information about criminal cases. The site includes defendant information such as race and home address, along with case dockets that describe events such as sentencing and link to the original PDF filings. The scraper ran on and off from May 2021 through January 2022.
This data was loaded into a PostGIS database. Defendants’ home addresses were geocoded with geocod.io and joined with geographies from other sources, such as Cuyahoga County Board of Elections precincts (to compare with election results) and U.S. Census places (to compare with population demographics). These spatial joins form the bulk of the analysis used in this piece.
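The core operation in a spatial join of this kind is a point-in-polygon test: deciding which precinct or census place contains a geocoded address. In PostGIS this is usually expressed with a function such as ST_Contains. The pure-Python sketch below only illustrates the idea with a ray-casting test; the precinct names, coordinates, case identifiers and function names are hypothetical and are not taken from the project's database.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]   # (longitude, latitude)
Polygon = List[Point]         # boundary vertices in order, ring left open


def point_in_polygon(point: Point, polygon: Polygon) -> bool:
    """Ray casting: count the edges crossed by a ray running right from the point."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_precinct(point: Point, precincts: Dict[str, Polygon]) -> Optional[str]:
    """Return the name of the first precinct containing the point, or None."""
    for name, polygon in precincts.items():
        if point_in_polygon(point, polygon):
            return name
    return None


# Hypothetical precinct shapes and geocoded addresses, for illustration only.
precincts = {
    "PRECINCT-A": [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
    "PRECINCT-B": [(1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (1.0, 1.0)],
}
defendants = {
    "case-001": (0.25, 0.5),
    "case-002": (1.5, 0.5),
    "case-003": (5.0, 5.0),   # falls outside every precinct
}
joined = {case: assign_precinct(pt, precincts) for case, pt in defendants.items()}
```

A spatial database does the same work at scale, with spatial indexes and proper handling of coordinate systems and boundary cases that this sketch ignores.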
The court provided us with a list of case numbers spanning 2016 to 2021. We used this list to audit our database of scraped cases and ensure our scrape was a complete record of all cases seen by the court in this timeframe. More than 98% of our data matched the case numbers provided by the court. The handful of mismatches represented cases that had either been sealed or, we believe, recently expunged. Because the details of expunged cases are no longer open to the public, the court was not free to confirm the status of certain cases that our scraper captured but that were missing from the court’s list of case numbers.
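An audit like this amounts to comparing two sets of case numbers. The following is a minimal sketch, assuming both sources can be reduced to lists of case-number strings; the function name and the sample case numbers are made up for illustration and do not reflect the court's actual numbering.

```python
def audit_case_numbers(scraped, court_list):
    """Compare scraped case numbers with the court's list."""
    scraped, court_list = set(scraped), set(court_list)
    matched = scraped & court_list
    return {
        # Share of scraped cases that also appear on the court's list
        "match_rate": len(matched) / len(scraped) if scraped else 0.0,
        # Captured by the scraper but absent from the court's list
        "scraped_only": sorted(scraped - court_list),
        # On the court's list but never scraped
        "court_only": sorted(court_list - scraped),
    }


# Hypothetical case numbers, for illustration only.
scraped = ["CR-0001", "CR-0002", "CR-0003", "CR-0004"]
court_list = ["CR-0001", "CR-0002", "CR-0003", "CR-0005"]
report = audit_case_numbers(scraped, court_list)
```

The two difference lists are what a reviewer would then inspect by hand, for example to check whether an unmatched case appears to have been sealed or expunged.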
To calculate the disparity in outcomes for common charges like theft and drug possession, we used multiple techniques. One approach used a natural language classifier to determine the outcome of each case. Another used a simple flag indicating whether a case ended with the defendant being sent to prison, and applied more restrictive criteria, considering only cases with a single count of the charge. A third approach employed a dataset of cases from 2009 to 2019, obtained and processed by Lawstata, that includes a count of defendants’ prior cases. Using this dataset, we applied similar criteria and filtered on a maximum number of priors, looking at scenarios where the defendant had no prior cases as well as scenarios with a maximum of one or two prior cases. All techniques show similar variation between judges.
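The flag-based and priors-based approaches can be sketched as a filter followed by a per-judge rate. This is a simplified illustration, not the code used for the analysis: the field names (judge, charge, counts, priors, prison), the judge labels and the sample records are all hypothetical.

```python
from collections import defaultdict


def prison_rate_by_judge(cases, charge, max_priors=None):
    """Share of single-count cases for one charge that ended in prison, per judge."""
    totals = defaultdict(int)
    prison = defaultdict(int)
    for case in cases:
        # Restrictive criteria: the chosen charge, and exactly one count of it
        if case["charge"] != charge or case["counts"] != 1:
            continue
        # Optional cap on the defendant's number of prior cases
        if max_priors is not None and case["priors"] > max_priors:
            continue
        totals[case["judge"]] += 1
        prison[case["judge"]] += int(case["prison"])
    return {judge: prison[judge] / totals[judge] for judge in totals}


# Hypothetical records, for illustration only.
cases = [
    {"judge": "Judge A", "charge": "theft", "counts": 1, "priors": 0, "prison": True},
    {"judge": "Judge A", "charge": "theft", "counts": 1, "priors": 2, "prison": True},
    {"judge": "Judge A", "charge": "theft", "counts": 1, "priors": 0, "prison": False},
    {"judge": "Judge B", "charge": "theft", "counts": 1, "priors": 0, "prison": False},
    {"judge": "Judge B", "charge": "theft", "counts": 3, "priors": 0, "prison": True},
    {"judge": "Judge B", "charge": "drug possession", "counts": 1, "priors": 0, "prison": True},
]
rates_no_priors = prison_rate_by_judge(cases, "theft", max_priors=0)
rates_up_to_two = prison_rate_by_judge(cases, "theft", max_priors=2)
```

Running the same function with different values of max_priors mirrors the zero, one and two prior-case scenarios described above, so the per-judge rates can be compared across scenarios.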
Voting data
Voting data and precinct boundaries were obtained from the Cuyahoga County Board of Elections for the 2016, 2018 and 2020 general elections. Top-level voting figures such as who ran and total votes cast were also cross-referenced with Ballotpedia’s election results pages (2016, 2018, 2020).
To calculate the drop-off of people who voted for president but did not vote for judicial candidates, we calculated the average drop-off among judicial races with two or more candidates. Averaging accounts for varying participation between races and provides a measure of general voter participation in selecting judges. Uncontested races were excluded because races in which voters have only a single choice would distort the participation measure.
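One way to express this calculation: for each contested judicial race, drop-off is one minus the ratio of votes cast in that race to votes cast for president, and the reported figure is the mean across those races. The sketch below assumes that definition; the function name, data layout and vote totals are hypothetical.

```python
def average_judicial_dropoff(presidential_votes, judicial_races):
    """Mean drop-off across judicial races with two or more candidates."""
    dropoffs = []
    for race in judicial_races:
        # Skip uncontested races, where voters have only a single choice
        if len(race["candidates"]) < 2:
            continue
        judicial_votes = sum(race["candidates"].values())
        dropoffs.append(1 - judicial_votes / presidential_votes)
    if not dropoffs:
        raise ValueError("no contested judicial races to average")
    return sum(dropoffs) / len(dropoffs)


# Hypothetical vote totals, for illustration only.
presidential_votes = 1000
judicial_races = [
    {"seat": "Seat 1", "candidates": {"Candidate A": 300, "Candidate B": 300}},
    {"seat": "Seat 2", "candidates": {"Candidate C": 450, "Candidate D": 350}},
    {"seat": "Seat 3", "candidates": {"Candidate E": 700}},   # uncontested, excluded
]
avg_dropoff = average_judicial_dropoff(presidential_votes, judicial_races)
```

In this made-up example the two contested races have drop-offs of roughly 40% and 20%, so the average is about 30%; the uncontested third race does not affect the result.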
The 2020 county precinct map was used to identify the precinct where a defendant’s address was located.
Incarceration rates
To calculate the number of incarcerated people from Cuyahoga County, we used institutional census reports from the Ohio Department of Rehabilitation and Correction, which detail the gender and race of incarcerated people by the county where they were convicted. The latest data is from January 2021.
Other data sources
Demographic data is drawn from the American Community Survey’s five-year estimates for 2016-2020 to best match the 2016-2021 timeframe used throughout the analysis. To ascertain the adult population in Cuyahoga County voting precincts, we used 2020 decennial census data from the U.S. Census Bureau.
This story is published in partnership with The Marshall Project, a nonprofit newsroom covering the US criminal justice system.