Introduction
On July 26, 2020, ProPublica released a dataset of police discipline records.
I wanted to take a look at the data and do some exploratory analysis.
library(tidyverse)
## ── Attaching packages ──────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.2 ✓ purrr 0.3.4
## ✓ tibble 3.0.3 ✓ dplyr 1.0.0
## ✓ tidyr 1.1.0 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.5.0
## Warning: package 'ggplot2' was built under R version 3.6.2
## Warning: package 'tibble' was built under R version 3.6.2
## Warning: package 'tidyr' was built under R version 3.6.2
## Warning: package 'purrr' was built under R version 3.6.2
## Warning: package 'dplyr' was built under R version 3.6.2
## ── Conflicts ─────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(gt)
## Warning: package 'gt' was built under R version 3.6.2
library(here)
## here() starts at /Users/samportnow/sam-portnow-website
library(fs)
## Warning: package 'fs' was built under R version 3.6.2
library(skimr)
## Warning: package 'skimr' was built under R version 3.6.2
library(janitor)
## Warning: package 'janitor' was built under R version 3.6.2
##
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
Read in Data
data <- read_csv(here::here('content', 'post_data', 'allegations_20200726939.csv'))
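Before exploring, it can help to sanity-check an import: confirm the dimensions, standardize the column names, and look for parsing problems. The sketch below does not touch the real file. It parses a tiny inline CSV whose column names and values are invented for illustration (`csv_text`, `sample_data`, and `sample_clean` are hypothetical names), so the same checks would need adapting to the actual allegations data.

```r
library(readr)
library(janitor)

# Hypothetical stand-in for the allegations CSV.
# Column names and values are made up for illustration only.
csv_text <- I("Unique ID,Year Received,Complaint Type\n1,2019,A\n2,2020,B\n")

# Declare column types explicitly so a surprising value shows up as a
# parsing problem instead of silently changing the column type.
sample_data <- read_csv(
  csv_text,
  col_types = cols(
    `Unique ID` = col_integer(),
    `Year Received` = col_integer(),
    `Complaint Type` = col_character()
  )
)

# Convert column names to snake_case so they are easier to type later.
sample_clean <- clean_names(sample_data)

# Basic checks: size of the table, the cleaned names, and any rows
# that readr could not parse with the declared types.
dim(sample_clean)
names(sample_clean)
problems(sample_data)
```

On the real data, the same three checks (`dim()`, `names()`, `problems()`) applied to `data` would be a reasonable first step before calling something like `skim()`.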
{{...