CCSO Bookings Data Link & Data Preview
Chicago Food Inspections Data Link & Data Preview
Urbana Rental Inspections Data Link & Data Preview
The plot function in R is quite powerful and flexible. We explain by example, but first, let’s read in the CCSO Bookings Data.
jail <- read.csv("https://uofi.box.com/shared/static/9elozjsg99bgcb7gb546wlfr3r2gc9b7.csv", stringsAsFactors = FALSE)
apply(jail,2,mode)
## BOOKING.DATE BOOKING.NUMBER BOOKING.TIME
## "character" "character" "character"
## CUSTODY.CLASS EMPLOYMENT.STATUS INCARCERATION.REASON
## "character" "character" "character"
## JACKET.NUMBER JACKET.TYPE PRISONER.TYPE
## "character" "character" "character"
## RELEASED.DATE RELEASED.REASON RELEASED.TIME
## "character" "character" "character"
## CHARGE.STATUTE CRIME.CODE STATUTE.TYPE
## "character" "character" "character"
## CITY RACE SEX
## "character" "character" "character"
## STATE ZIP.CODE CITIZENSHIP
## "character" "character" "character"
## MARITIAL.STATUS MILITARY OCCUPATION
## "character" "character" "character"
## SCHOOL ARREST.AGENCY Age.at.Arrest
## "character" "character" "character"
## Age.at.Release Booking.Date.Time Release.Date.Time
## "character" "character" "character"
## Days.in.Jail Hours Minutes
## "character" "character" "character"
## Seconds X
## "character" "character"
apply(jail,2,class)
## BOOKING.DATE BOOKING.NUMBER BOOKING.TIME
## "character" "character" "character"
## CUSTODY.CLASS EMPLOYMENT.STATUS INCARCERATION.REASON
## "character" "character" "character"
## JACKET.NUMBER JACKET.TYPE PRISONER.TYPE
## "character" "character" "character"
## RELEASED.DATE RELEASED.REASON RELEASED.TIME
## "character" "character" "character"
## CHARGE.STATUTE CRIME.CODE STATUTE.TYPE
## "character" "character" "character"
## CITY RACE SEX
## "character" "character" "character"
## STATE ZIP.CODE CITIZENSHIP
## "character" "character" "character"
## MARITIAL.STATUS MILITARY OCCUPATION
## "character" "character" "character"
## SCHOOL ARREST.AGENCY Age.at.Arrest
## "character" "character" "character"
## Age.at.Release Booking.Date.Time Release.Date.Time
## "character" "character" "character"
## Days.in.Jail Hours Minutes
## "character" "character" "character"
## Seconds X
## "character" "character"
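One thing worth noting about the two checks above: apply works on matrices, so it converts the data frame to a matrix before looking at any column. If even one column holds text, every cell becomes a character string, and every column then reports "character" whatever its stored type. The sketch below illustrates this on a small made-up data frame (df and its columns are hypothetical, not part of the bookings data) and compares it with sapply, which visits each column as its own vector.

```r
# Small hypothetical data frame with mixed column types
df <- data.frame(
  name = c("a", "b", "c"),
  age  = c(42L, 35L, 27L),
  days = c(58.5, 0.2, 3.0),
  stringsAsFactors = FALSE
)

# apply() first converts the data frame to a matrix; because one column
# is text, every cell is turned into a character string
via_apply <- apply(df, 2, class)

# sapply() looks at each column separately, so the stored types survive
via_sapply <- sapply(df, class)

print(via_apply)
print(via_sapply)
```

Running sapply(jail, class) on the bookings data would be one way to see which columns were actually read in as numbers and which as text.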
jail$days.in.jail <- as.numeric(jail$Days.in.Jail)
## Warning: NAs introduced by coercion
jail$hours.in.jail <- as.numeric(jail$Hours)
## Warning: NAs introduced by coercion
jail$minutes.in.jail <- as.numeric(jail$Minutes)
## Warning: NAs introduced by coercion
jail$seconds.in.jail <- as.numeric(jail$Seconds)
## Warning: NAs introduced by coercion
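The coercion warnings above mean that some entries could not be read as numbers and were replaced by NA. A minimal sketch of how we might inspect the offending entries is shown below; the vector raw is made up for illustration and only stands in for a column such as Days.in.Jail.

```r
# Hypothetical text column containing a few non-numeric entries
raw <- c("58", "0", "", "N/A", "12.5")

# Convert to numeric; suppressWarnings() only hides the coercion warning,
# the unparseable entries still become NA
num <- suppressWarnings(as.numeric(raw))

# Entries that were present in the text but did not survive the conversion
bad <- raw[is.na(num) & !is.na(raw)]

print(num)
print(unique(bad))
```

Looking at unique(bad) before overwriting a column helps decide whether the NAs stand for genuinely missing values or for a formatting problem that should be cleaned first.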
colnames(jail)
## [1] "BOOKING.DATE" "BOOKING.NUMBER" "BOOKING.TIME"
## [4] "CUSTODY.CLASS" "EMPLOYMENT.STATUS" "INCARCERATION.REASON"
## [7] "JACKET.NUMBER" "JACKET.TYPE" "PRISONER.TYPE"
## [10] "RELEASED.DATE" "RELEASED.REASON" "RELEASED.TIME"
## [13] "CHARGE.STATUTE" "CRIME.CODE" "STATUTE.TYPE"
## [16] "CITY" "RACE" "SEX"
## [19] "STATE" "ZIP.CODE" "CITIZENSHIP"
## [22] "MARITIAL.STATUS" "MILITARY" "OCCUPATION"
## [25] "SCHOOL" "ARREST.AGENCY" "Age.at.Arrest"
## [28] "Age.at.Release" "Booking.Date.Time" "Release.Date.Time"
## [31] "Days.in.Jail" "Hours" "Minutes"
## [34] "Seconds" "X" "days.in.jail"
## [37] "hours.in.jail" "minutes.in.jail" "seconds.in.jail"
jL <- jail[,-c(31:35)]
jail <- jL
dim(jail)
## [1] 67764 34
head(jail, 2)
## BOOKING.DATE BOOKING.NUMBER BOOKING.TIME CUSTODY.CLASS
## 1 1/1/2011 2.011e+11 1:05:19 Sentenced IDOC (CCSO ONLY)
## 2 1/1/2011 2.011e+11 1:05:19 Sentenced IDOC (CCSO ONLY)
## EMPLOYMENT.STATUS INCARCERATION.REASON JACKET.NUMBER JACKET.TYPE
## 1 Employed - Full Time Arrest - Without Warrant 31830 A
## 2 Employed - Full Time Arrest - Without Warrant 31830 A
## PRISONER.TYPE RELEASED.DATE
## 1 Misdemeanor Arraignment 2/28/2011
## 2 Misdemeanor Arraignment 2/28/2011
## RELEASED.REASON RELEASED.TIME
## 1 Sentenced (transfer) to State Corrections Y 1:17:41
## 2 Sentenced (transfer) to State Corrections Y 1:17:41
## CHARGE.STATUTE CRIME.CODE STATUTE.TYPE CITY RACE SEX
## 1 720-5/12-3 BATTERY NA RANTOUL White Male
## 2 730-5/5-6-4 PROBATION VIOLATION NA RANTOUL White Male
## STATE ZIP.CODE CITIZENSHIP MARITIAL.STATUS MILITARY
## 1 ILLINOIS 61866 US Single None
## 2 ILLINOIS 61866 US Single None
## OCCUPATION
## 1 SERVICE PERSONNEL(HOTEL,RESTAURANT,NIGHT CLUB)
## 2 SERVICE PERSONNEL(HOTEL,RESTAURANT,NIGHT CLUB)
## SCHOOL ARREST.AGENCY
## 1 Graduated from high school Champaign County Sherriff's Office
## 2 Graduated from high school Champaign County Sherriff's Office
## Age.at.Arrest Age.at.Release Booking.Date.Time Release.Date.Time
## 1 42 42 1/01/11 01:05:19 2/28/11 01:17:41
## 2 42 42 1/01/11 01:05:19 2/28/11 01:17:41
## days.in.jail hours.in.jail minutes.in.jail seconds.in.jail
## 1 58 0 12 22
## 2 58 0 12 22
colnames(jail) <- tolower(colnames(jail))
Typically we plot two numeric vectors
x<- jail$age.at.arrest
y<- jail$age.at.release
plot(x,y)
The two numeric vectors could be in a single matrix
mat<- matrix(c(jail$age.at.arrest,jail$age.at.release), ncol=2)
plot(mat)
We could create a time series plot (aka “index plot” aka “series plot”) using a single numeric vector
plot(jail$days.in.jail)
We could take advantage of factors in the data to plot a bar plot or box plots per level of the factor
jr<-factor(jail$race)
levels(jr)
## [1] "" "Asian/Pacific Islander"
## [3] "Black" "Hispanic"
## [5] "Native American" "Unknown"
## [7] "White" "White (Hispanic)"
plot(jr)
plot(jr, jail$days.in.jail)
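When plot receives a factor and a numeric vector, it draws one box plot per level. The numbers behind those boxes can be inspected without drawing anything by calling boxplot with plot = FALSE. The sketch below uses small made-up vectors (grp and val are hypothetical) rather than the bookings data.

```r
# Hypothetical grouping factor and numeric measurements
grp <- factor(c("x", "x", "x", "y", "y", "y", "y"))
val <- c(1, 2, 3, 10, 20, 30, 40)

# plot = FALSE returns the summary statistics instead of drawing the boxes
b <- boxplot(val ~ grp, plot = FALSE)

print(b$names)  # factor levels, one per box
print(b$n)      # number of observations in each level
print(b$stats)  # per level: lower whisker, lower hinge, median, upper hinge, upper whisker
```

Each column of b$stats corresponds to one box, which makes it easy to compare medians and spreads across levels numerically.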
Or we could use the table function to explicitly count the frequency of each level or category, then show a bar plot with barplot
sort(table(jail$crime.code), decreasing=TRUE)[1:5]
##
## OTHER TRAFFIC OFFENSES SUSPENDED OR REVOKED DRIVERS LICENSE
## 5983 5680
## OTHER CRIMINAL OFFENSES MISC JAIL CODE
## 5158 4686
## DOMESTIC BATTERY
## 4421
ccs <- names(sort(table(jail$crime.code), decreasing=TRUE)[1:5])
jcc <- jail[jail$crime.code %in% ccs, ]  # keep rows whose crime code is one of the top five
t1 <- table(jcc$crime.code, jcc$race)
barplot(colSums(t1))
barplot(rowSums(t1))
barplot(t1)
barplot(t1, beside = TRUE)
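Keeping only the rows that belong to a set of categories, as in the filtering step before these bar plots, is a place where == and %in% behave differently. The == operator compares position by position and recycles the shorter vector, while %in% tests each element against the whole set. The sketch below uses made-up vectors (codes and top are hypothetical) to show the difference.

```r
# Hypothetical category labels and the subset we want to keep
codes <- c("A", "B", "C", "A", "B", "C", "D", "A")
top   <- c("B", "A")

# == lines the vectors up element by element, recycling `top` as
# B, A, B, A, ... so a match only counts if it falls in the right position
by_equal <- codes[which(codes == top)]

# %in% asks, for each element of `codes`, whether it appears anywhere in `top`
by_in <- codes[codes %in% top]

print(by_equal)
print(by_in)
print(table(by_in))
```

Here the lengths divide evenly, so == gives no warning at all and still silently drops two of the five matching entries. For set membership, %in% is the safer choice.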
For data frames, we can leverage that structure to produce multiple plots with a single call to the plot function
plot(days.in.jail ~ age.at.arrest + age.at.release, data=jail)
jdf<-data.frame(jail$age.at.arrest, jail$age.at.release, jail$race)
plot(jdf)
One popular multivariate plot is produced by the pairs function
pairs(jdf)
But if we wanted to see the pairwise relationship between age at arrest and days in jail for each racial category (assuming we only had the Black and White categories), we could use a coplot
jdff <- data.frame(ageatarrest=jail$age.at.arrest, daysspent=jail$days.in.jail, race=jail$race)
jdff<-jdff[which(jdff$race=="Black" | jdff$race=="White"),]
jdff$race<-factor(jdff$race)
table(jdff$race)
##
## Black White
## 36319 25175
colnames(jdff)
## [1] "ageatarrest" "daysspent" "race"
coplot(daysspent ~ ageatarrest | race, data=jdff)
##
## Missing rows: 50854, 52604, 53780, 53781, 56394, 56646, 56647, 56648, 57044, 58304, 58305, 58306, 58307, 58308, 58309, 58310, 58311, 58312, 59835, 59836, 59837, 59838, 59839, 59911, 59912, 59913, 59914, 59915, 60118, 60119, 60120, 60123, 60124, 60125, 60126, 60328, 60343, 60344, 60345, 60912, 60913, 60914, 60915, 60916, 61199, 61200, 61201, 61370, 61371, 61372, 61373, 61374, 61375, 61386, 61387, 61388, 61491, 61492
There are various arguments that can be added to plots to improve visual clarity and to remove distracting elements. Here’s a short list.
Now let’s go back and add basic arguments that may help improve our interpretation of the plots.
plot(mat, xlab="Age at Arrest", ylab="Age at Release", main="A Basic Scatter Plot", pch="+")
plot(jail$days.in.jail, xlab="Index", ylab="Days in Jail", main="A Series Plot", type="l", lty=2, lwd=2)
plot(jr, jail$days.in.jail, main="Distributions of Days Spent in Jail Per Racial Group")