Chapter 1 Preface

If you’ve read an article about crime or arrests in the United States in the last half century, in most cases it was referring to the FBI’s Uniform Crime Reporting Program Data, otherwise known as UCR data. UCR data is, with the exception of the more detailed data that only covers murders, a monthly count of crimes or arrests reported to a single police agency, which the FBI then gathers into one file that includes all reporting agencies. It is actually a collection of different datasets, all of which have information about crimes and arrests that occur in a particular jurisdiction. Think of your home town. This data will tell you, for a given month, how many crimes were reported in that city for a small number of crime categories, or how many people (broken down by age, sex, and race) were arrested for a larger set of crime categories. If the city has multiple police agencies then each agency will report the crimes and arrests under its jurisdiction, though the largest agency - usually the local police department - will cover the vast majority of crimes and arrests in that city.

This is a very broad measure of crime, and its uses in research - or for understanding crime at all - are fairly limited. Yet it has been for much of the last century - and will likely remain among researchers for at least the next decade - the most important crime data in the United States.1

UCR data is important for three reasons:

  1. The definitions are standard, and all agencies (tend to) follow them so you can compare across agencies and over time.2
  2. The data is available since 1960 (for most of the datasets) so there is a long period of available data.3
  3. The data is available for most of the 18,000 police agencies in the United States so you can compare across agencies.

More than with many other datasets, there will be times when using UCR data that you’ll think “that’s weird.” This book will cover this weirdness, explaining when we think it is just an odd - but acceptable - quirk of the data, and when it is a sign of a big problem in the data or in that particular variable that means we should avoid using it. For most of this book we’ll be discussing the caveats of the above reasons - or, more directly, why these assumptions are wrong - but these are the reasons why the data is so influential.

1.1 Goal of the book

By the end of each chapter you should have a firm grasp on the dataset that is covered and how to use it properly. However, this book can’t possibly cover every potential use case for the data so make sure to carefully examine the data yourself for your own particular use.

I get a lot of emails from people asking questions about this data so my own goal is to create a single place that answers as many questions as I can about the data. Again, this is among the most commonly used crime datasets and there are still many current papers published with incorrect information about the data (including such simple aspects like what geographic unit data is in and what time unit it is in). So hopefully this book will decrease the number of misconceptions about this data, increasing overall research quality.

Since manuals are boring, I’ll include graphs and images to try to alleviate the boredom. That said, I don’t think it’s possible to make it too fun, so sorry in advance. This book is a mix of facts about the data, such as how many years are available, and my opinions about it, such as whether it is reliable. In cases of fact I’ll just make the statement - e.g. “the offenses data is available since 1960.” In cases of opinion I’ll temper the statement by saying something like “in my opinion…” or “I think.”

1.2 Structure of the book

This book will be divided into ten chapters: this chapter, an intro chapter briefly summarizing each dataset and going over overall issues with UCR data, and seven chapters each covering one of the seven UCR datasets. The final chapter will cover county-level UCR data, a commonly used but highly flawed aggregation of UCR data that I recommend against using. Each chapter will follow the same format: we’ll start with a brief summary of the data such as when it first became available and how it can be used. Next we’ll look at how many agencies report their data to this dataset, often looking at how to measure this reporting rate a couple of different ways. Finally, we’ll cover the important variables included in the data and how to use them properly (including not using them at all) - this will be the bulk of each chapter.

1.3 Citing this book

If this book was useful in your research, please cite it. To cite this book, please use the below citation:

Kaplan J (2021). Uniform Crime Reporting (UCR) Program Data: A Practitioner’s Guide. https://ucrbook.com/.

BibTeX format:

@Manual{ucrbook,
  title = {Uniform Crime Reporting (UCR) Program Data: A Practitioner's Guide},
  author = {{Jacob Kaplan}},
  year = {2021},
  url = {https://ucrbook.com/},
}

1.4 Sources of UCR data

There are a few different sources of UCR data available today. First, and probably most commonly used, is the data put together by the National Archive of Criminal Justice Data (NACJD). This is a team out of the University of Michigan that manages a huge number of criminal justice datasets and makes them available to the public. If you have any questions about crime data - UCR or other crime data - I highly recommend you reach out to them for answers. Their collection of UCR data, along with excellent documentation, is available on their site here. One limitation to their data, however, is that each year of data is available as an individual file, meaning that you’ll need to concatenate each year together into a single file. Some years also have different column names (generally minor changes like spelling robbery “rob” one year and “robb” the next), which requires more work to standardize before you can concatenate. They also only have data through 2016, which means that the most recent years of data (UCR data is available through 2019) are, as of this writing, unavailable.
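The standardize-then-concatenate step described above can be sketched roughly as follows. This is a minimal illustration in Python using only the standard library, not NACJD’s actual file format or anyone’s real cleaning code: it assumes each year is a CSV file with a header row, and the column names `ori` and `robbery`, the `COLUMN_ALIASES` mapping, and the toy data are all hypothetical (only the “rob”/“robb” spellings come from the example in the text).

```python
import csv
import io

# Hypothetical mapping from year-specific spellings to one standard name.
# "rob" and "robb" come from the example in the text; using "robbery" as
# the standard name is an assumption made for this sketch.
COLUMN_ALIASES = {"rob": "robbery", "robb": "robbery"}


def standardize_row(row):
    """Rename any aliased columns in one row to their standard names."""
    return {COLUMN_ALIASES.get(name, name): value for name, value in row.items()}


def concatenate_years(yearly_files):
    """Read each year's CSV and stack all rows into one list.

    `yearly_files` maps a year to a file-like object with a header row.
    """
    combined = []
    for year, handle in sorted(yearly_files.items()):
        for row in csv.DictReader(handle):
            clean = standardize_row(row)
            clean["year"] = year  # record which file the row came from
            combined.append(clean)
    return combined


# Toy stand-ins for two yearly files with inconsistent column names.
file_2015 = io.StringIO("ori,rob\nAGENCY1,12\n")
file_2016 = io.StringIO("ori,robb\nAGENCY1,15\n")

rows = concatenate_years({2015: file_2015, 2016: file_2016})
for row in rows:
    print(row["year"], row["ori"], row["robbery"])
```

Renaming columns before stacking the rows means every year ends up with the same set of columns, so a spelling change between years does not silently create two partially empty columns.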

Next, and most usable for the general public - but limited for researchers - is the FBI’s official website Crime Data Explorer. On this site you can choose an agency and see annual crime data (remember, UCR data is monthly so this isn’t as detailed as it can be) for certain crimes (and not even all the crimes actually available in the data). This is okay for the general public but provides only a fraction of what is available in the actual data, so it is really not good for researchers.

Finally, I have my own collection of UCR data available publicly on openICPSR, a site which allows people to submit their data for public access. For each of these datasets I’ve taken the raw data from the FBI and read it into R. The one exception is the early years of homicide data, which come from NACJD since the FBI’s raw data for those years is wrong and can’t be parsed; later years of homicide data come from the FBI’s raw data. Since the data is only available from the FBI as fixed-width ASCII files, I created a setup file (we’ll explain exactly how reading in this kind of data works in the next chapter), read in the data, and then very lightly cleaned it (i.e. only removing extreme outliers like an agency having millions of arsons in a month). For each of these datasets I detail what I’ve done to the data and briefly summarize the data (i.e. a very short version of this book) on the data’s page on openICPSR. The main advantage is that all my data has standard variable names and, for data that is small enough, I provide the data as a single file that has all years. For large datasets like the arrest data I split the data into several files rather than providing everything in a single file. The downside is that I don’t provide documentation other than what’s on the openICPSR page and only provide data in R and Stata format. I also have a similar site to the FBI’s Crime Data Explorer but with more variables available - that site is available here.
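To give a rough idea of what reading a fixed-width file with a setup file involves, here is a minimal sketch in Python (standard library only). It is not the author’s R code, and the layout in `SETUP`, the column names, the `MAX_PLAUSIBLE_COUNT` cutoff, and the sample lines are all hypothetical; the FBI’s real files have many more columns at different positions. Dropping the whole row for an outlier is also an assumption made for brevity.

```python
# Hypothetical setup: column name -> (start, end) character positions,
# zero-based with an exclusive end. This is NOT the FBI's real layout.
SETUP = {
    "agency": (0, 7),
    "year": (7, 11),
    "month": (11, 13),
    "arson": (13, 21),
}

# Hypothetical cutoff for an implausible monthly count.
MAX_PLAUSIBLE_COUNT = 1_000_000


def parse_line(line, setup=SETUP):
    """Slice one fixed-width line into a dict of stripped strings."""
    return {name: line[start:end].strip() for name, (start, end) in setup.items()}


def read_fixed_width(lines, setup=SETUP):
    """Parse every non-blank line and convert the numeric columns to integers."""
    records = []
    for line in lines:
        if not line.strip():
            continue
        record = parse_line(line, setup)
        for name in ("year", "month", "arson"):
            record[name] = int(record[name])
        records.append(record)
    return records


def drop_extreme_outliers(records, column="arson", limit=MAX_PLAUSIBLE_COUNT):
    """Keep only records whose count is below the plausibility limit."""
    return [r for r in records if r[column] < limit]


# Two made-up lines: one ordinary month and one with millions of arsons.
raw_lines = [
    "AGENCY1201901       3",
    "AGENCY1201902 5000000",
]

records = read_fixed_width(raw_lines)
cleaned = drop_extreme_outliers(records)
print(len(records), len(cleaned))
```

The point of the setup mapping is that a fixed-width file has no delimiters, so the only way to know where one variable ends and the next begins is a separate description of the character positions.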

It’s worth mentioning a final source of UCR information. This is the annual Crime in the United States report released by the FBI each year around the start of October.4 As an example, here is the website for the 2019 report. This report contains summarized data which in most cases estimates missing data and provides information about national and subnational (though rarely city-level) crime. As with the FBI’s site, it is only a fraction of the true data available, so it is not a very useful source of crime data for quality research. Still, this is a very common source of information used by researchers.

1.4.1 Where to find the data used in this book

The data I am using in this book is the cleaned (we’ll discuss in more detail exactly what I did to clean each dataset in the dataset’s chapter, but the short answer is that I did very little) and concatenated data that I put together from the raw data that the FBI releases. That data is available on my website here. I am hosting this book through GitHub which has a maximum file size allowed that is far smaller than these data, so you’ll need to go to my site to download the data; it’s not available through this book’s GitHub repo.

1.6 How to contribute to this book

If you have any questions, suggestions (such as a topic to cover), or find any issues, please make a post on the Issues page for this book on GitHub. On this page you can create a new issue (which is basically just a post on this forum) with a title and a longer description of your issue. You’ll need a GitHub account to make a post. Posting here lets me track issues and respond to your message or alert you when the issue is closed (i.e. I’ve finished or denied the request). Issues are also public so you can see if someone has already posted something similar.

For more minor issues like typos or grammar mistakes, you can edit the book directly through its GitHub page. That’ll make an update for me to accept, which will change the book to include your edit. To do that, click the edit button at the top of the site - the button is highlighted in the below figure. You will need to make a GitHub account to make edits. When you click on that button you’ll be taken to a page that looks like a Word Doc where you can make edits. Make any edits you want and then scroll to the bottom of the page. There you can write a short (please, no more than a sentence or two) description of what you’ve done and then submit the changes for me to review.

Figure 1.1: The edit button for how to make edits of this book.

Please only use the above two methods to contribute or make suggestions about the book. Don’t email me. While it’s a bit more work for you to do it this way, since you’ll need to make a GitHub account if you don’t already have one, it helps me. I wrote this book, in part, to help my career so having evidence that people read it and are contributing to it is important to me. It’s a way to publicly measure the book’s impact.


  1. The FBI has said they will no longer accept UCR data after 2020, instead only accepting the more detailed National Incident-Based Reporting System (NIBRS) data. However, only about half of agencies reported NIBRS data in 2019 and this number decreases steadily for earlier years. This means that UCR data has the longevity that NIBRS doesn’t have, as most agencies have reported for decades, and will still be useful even though the data becomes increasingly outdated.↩︎

  2. We’ll see many examples of when agencies do not follow the definitions, which really limits this data.↩︎

  3. While the original UCR data was first reported in 1929, there is only machine-readable data since 1960.↩︎

  4. They also release a report about the first six months of the most recent year of data before the October release, but this is generally an estimate from a sample of agencies so is far less useful.↩︎

  5. This is far more likely to be the result of the government changing its site and forgetting to update the link than of anyone intentionally making the manual unavailable.↩︎