Amateur Urbanist Critique of Work-Live-Ride using 360k Fairfield County Parcels
Introduction
Connecticut has long had some of the highest incomes and home prices in the US. Although home prices have trailed the national rate of appreciation in recent decades, from a high starting point, housing has gotten a lot more expensive in absolute terms and is still out of reach for too many. High housing costs have constrained population growth in many towns, likely slowed income growth by impeding business formation, and contributed to the State’s high costs and taxes. Starting in the 1920s, many CT towns began requiring large minimum parcel sizes in zoning, among other regulations, often with exclusionary intent, as documented in On The Line: How Schooling, Housing, and Civil Rights Shaped Hartford and its Suburbs. The effects of these policies have become acute due to the demand surge caused by the COVID-19 pandemic, which gobbled up the residual excess supply left by the Global Financial Crisis. In 2021, a group called Desegregate CT began making a series of reform proposals, a central one being Work-Live-Ride (WLR, Bill #6831), which would allow towns to opt into “Transit Oriented Development”, permitting “by right” development at minimum densities, along with mixed-use development, near transit stations. The group consists of dozens of housing advocacy groups with backing from the Regional Plan Association (RPA), also a non-profit, whose largest contributors, Board Members, and often Executive leadership have come from the New York real estate development industry.
One of the Desegregate group’s main points has been that single family homes are the only housing allowed as of right on 90% of parcels in the State, with two-family housing allowed far less commonly (and anything denser almost not at all). Also, 80% of those single family parcels are zoned for 1+ acres, constraining housing supply growth increasingly over decades and driving up home ownership costs and sprawl. Finally, many towns have allowed only single family residential zoning around vital transit hubs. All of these points are unfortunately true, but in my opinion, there is a bait and switch in using the valid large lot issue as the reason we need the inflexible WLR, when zones near Metro-North (MNR) stations have much lower percentages of residential housing and very few large nearby parcels.
The Connecticut Parcel and CAMA data was first added to the CT Geodata Portal after laws requiring it passed in 2021, offering the opportunity to look at the exact polygon location data for almost 1.3 million parcels. This made me curious to investigate the lot sizes and housing density near stations, and to explore what it all would mean for my town of Greenwich. There are surely a lot of experts with more knowledge of these issues and stronger views on either side of mine. I agree with most of the sentiments of the pro-housing reform coalition, but hope, on the eve of the legislative session, this post injects what has seemed absent from the contentious discussions thus far. For full disclosure, I grew up in Stamford and live near a station in Greenwich (but not within a potential transit zone), and probably carry some of those biases against what might be done in a suburban location, but I have also yet to see a multi-family residential project in my town which I opposed (likely putting me among the most pro-housing residents in my community). There is a lot of data cleaning and coding in this post, so please feel free to skip ahead to Results and Analysis for the tables and charts of findings and parting thoughts.
Collecting, Loading and Cleaning Data
I downloaded the Parcel File Geodatabase and 2024 CAMA property assessment data for each city. There are also separate files for county layers, but I used the 2024 Basic Parcel Layer (including all counties) and filtered for Fairfield County plus the last few towns up the MNR line into New Haven County, the economic engine of the State. I got “unexpected geometry” warnings about interior rings, but discovered that setting type = 3 while importing converts the 3-dimensional XYZ Multi-Polygons to 2-dimensional XY Polygons for each parcel, which seems to work for my purposes. In the future, I would like to see if I can also look at the building footprints within the parcels, but this was enough for this post. This was my first attempt at using GIS data on this scale, so my knowledge is superficial, and suggestions are welcome.
Load and Clean FF Parcels
```r
# Basic Parcel Geodatabase
folder <- "~/Documents/Data/ct_state_data/"
file <- paste0(folder, "2024 Basic Parcel Layer.gdb")

# Reading and filtering for Fairfield County towns
ff_parcels <- read_sf(
  dsn = file,
  query = "SELECT * FROM \"Basic_Parcels_2024\" where Town_Name in
    ('Greenwich', 'Stamford', 'Darien', 'New Canaan', 'Westport', 'Norwalk',
     'Bridgeport', 'Shelton', 'West Haven', 'New Haven', 'East Haven',
     'Trumbull', 'Easton', 'Redding', 'Bethel', 'Brookfield', 'Danbury',
     'Newtown', 'New Fairfield', 'Ridgefield', 'Wilton', 'Weston',
     'Stratford', 'Fairfield', 'Monroe', 'Ridgefield', 'Orange', 'Milford',
     'Derby')",
  type = 3)

# Clean parcels
ff_parcels <- ff_parcels[sf::st_is_valid(ff_parcels), ]

# Fix Danbury links
ff_parcels[ff_parcels$Town_Name == "Danbury", ]$CAMA_Link <-
  paste0("18500-", ff_parcels[ff_parcels$Town_Name == "Danbury", ]$Parcel_ID)
```
There were several thousand invalid parcels, and the only way I could manipulate the data with the {sf} package was by removing them with sf::st_is_valid(). Also, Danbury’s disclosure was missing all property links, but I was fortunately able to extract and parse them from other fields. The polygon shapes can be seen inside the NAD83 Connecticut boundaries. I spent some time trying to convert this to a lon-lat Coordinate Reference System (CRS), but in the end discovered I could do everything I needed with the original CRS distances. Below is the Shape column of the parcels, in what I believe are called State Plane Coordinates.
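The units of this projected CRS are US survey feet, which can be converted to familiar units directly. As a quick sanity check on the bounding box printed below, here is a minimal base-R sketch; the conversion factor 1200/3937 is the exact US survey foot definition, and the mileage conversion ignores the ~2 ppm difference between survey and international feet:

```r
# Convert US survey feet (NAD83 / Connecticut ftUS) to meters and miles
ftus_to_m  <- function(ft) ft * 1200 / 3937
ftus_to_mi <- function(ft) ft / 5280  # ignoring the tiny survey-foot difference

# Width of the parcel bounding box (xmax - xmin from the Shape summary)
width_ft <- 1002166 - 730512.2
round(ftus_to_mi(width_ft), 1)  # roughly 51 miles across the study area
```

This is only a plausibility check that the coordinates really are in feet; all the distance filtering later works in these native units.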
```r
# Towns along the MNR line: property polygons
print(ff_parcels$Shape)
## Geometry set for 359565 features (with 10 geometries empty)
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: 730512.2 ymin: 554931.1 xmax: 1002166 ymax: 756258.3
## Projected CRS: NAD83 / Connecticut (ftUS)
## First 5 geometries:
## POLYGON ((868586.9 613438.8, 868509.1 613356.3,...
## POLYGON ((868361.3 613493.1, 868247.5 613380, 8...
## POLYGON ((868662.6 613522.2, 868586.9 613438.8,...
## POLYGON ((867755.4 613622.7, 867744.1 613443.5,...
## POLYGON ((867505.7 613668.3, 867622.2 613647, 8...
```
The Geodatabase unfortunately does not currently include the property assessment data, such as the “state use” codes (ie: Commercial, Residential (Single Family, Multi-family), Industrial), which are the only way I could sort residential properties and, within those, determine whether more than one family would be allowed in a location. The problem with the CAMA data is that it was not contributed by towns with consistent standards (as I have often found with data submitted by 169 often small towns probably lacking data infrastructure). While the data could be a lot cleaner and more uniform, my first recommendation would be to clean up the state use codes and include them in this GIS database, which would streamline things for those working with the raw data instead of the online Parcel Viewer.
Load and Clean CAMA Data
```r
file <- paste0(folder, "2024_Connecticut_Parcel_and_CAMA_Data_20250111.csv")
cama_data <- data.table::fread(file)
cama_data[, link := re2::re2_replace_all(link, " ", "")]
cama_data <- janitor::clean_names(cama_data)

# Fix some Darien links not matching ff_parcels data
cama_data[
  property_city == "Darien",
  link := data.table::fifelse(
    re2::re2_detect(link, ".*\\-\\d$"),
    re2::re2_extract_replace(link, "^(.*)\\-(\\d)$", "\\1-0\\2"),
    link
  )]

# Fix missing Bridgeport links by prepending "0"
cama_data[
  property_city == "Bridgeport",
  cama_site_link := paste0("0", cama_site_link)]

# Clean up any whitespace to make sure joins match
cama_data[, cama_site_link := trimws(cama_site_link)]
```
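The Darien fix above pads a single trailing digit after the last hyphen to two digits so the links match the parcel file. The same transformation can be expressed in base R without {re2}; the link values here are invented for illustration:

```r
# Pad a single trailing digit after the last hyphen with a leading zero,
# e.g. "1234-5" -> "1234-05"; links already ending in two digits are untouched
pad_link <- function(link) {
  ifelse(grepl("-\\d$", link),
         sub("^(.*)-(\\d)$", "\\1-0\\2", link),
         link)
}

pad_link(c("1234-5", "1234-45"))  # "1234-05" "1234-45"
```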
The full extent of the work-in-progress nature of this data can be seen in the {skimr} summary below. Some places where the state might clean up this dataset: eight towns haven’t populated the town_id field, and various other fields are incomplete. There are two zoning fields (zone and zone_description), but no uniform structure to them. An academic from Connecticut founded the National Zoning Atlas, an impressive data-oriented project which led to the initial recognition of the extent of the large lot zoning problem in the State. It doesn’t appear that this data can be downloaded and combined with other data, but tying the parcel data to minimum zoned lot sizes would be a big opportunity. The Zoning Atlas data also doesn’t include Coastal Overlay or other flood plain data, which seems like a significant factor in zoning for many parcels along the MNR line (if you look at the train map above).
For now, the most important field for this exercise is state_use, which is never NA, but in four towns had an empty string for all properties, and overall has over 1,600 unique categories just in this subset of towns. state_use for a single family home in most towns is “101”, but many others are coded simply “100” or start with “1” followed by other formats. With more time, I might be able to do a better job predicting state_use from state_use_description, but I leave that for another time. For this reason, all data summaries below should be viewed as approximate and by no means the final word on density or mix of land use near stations.
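Given that inconsistency, one hedged approach is a small classifier that treats codes of the “101”/“100”/leading-“1” variety as residential and empty strings as unknown. The rule below is my own rough assumption, not an official coding standard, so it would misclassify any towns using different schemes:

```r
# Rough residential flag from an inconsistent state_use code:
# treat codes starting with "1" (e.g. "100", "101", "1040") as residential,
# and empty strings as unknown (NA). This is an approximation only.
classify_state_use <- function(code) {
  code <- trimws(code)
  ifelse(code == "", NA, grepl("^1", code))
}

classify_state_use(c("101", "100", "1040", "200", ""))
# TRUE TRUE TRUE FALSE NA
```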
To get train station coordinates, I found the USDOT Intermodal Passenger Connectivity Database (IPCD) with 15,000 transportation hubs around the US. This offers the option to conduct future analyses around other transport hubs, like bus routes. It took a few tries to get the MNR lines; for example, Cos Cob is mistakenly listed in NY state, and some locations came up more than once if there was an Amtrak stop, MNR stop and/or bus stop at the same location. I generally prefer data.table, but it doesn’t work for data manipulation with sf objects. I was able to get away with it here by using only the X, Y coordinates, and then converting back to sf further downstream.
Load Train Station coordinates
```r
file <- paste0(
  folder,
  "NTAD_Intermodal_Passenger_Connectivity_Database/Intermodal_Passenger_Connectivity_Database_(IPCD).shp")

trains <- st_read(
  dsn = file,
  query = "select * from \"Intermodal_Passenger_Connectivity_Database_(IPCD)\"
           where METRO_AREA LIKE 'Bridgeport-Stamford-Norwalk CT' AND MODE_RAIL = 1")
## Reading query `select * from "Intermodal_Passenger_Connectivity_Database_(IPCD)"
## where METRO_AREA LIKE 'Bridgeport-Stamford-Norwalk CT' AND MODE_RAIL = 1'
## from data source `/Users/davidlucey/Documents/Data/ct_state_data/NTAD_Intermodal_Passenger_Connectivity_Database/Intermodal_Passenger_Connectivity_Database_(IPCD).shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 31 features and 52 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -73.62515 ymin: 41.02125 xmax: -73.13083 ymax: 41.3981
## Geodetic CRS: WGS 84

data.table::setDT(trains)

# Filter unique stations by POINT_ID and select columns
trains <- unique(trains, by = "POINT_ID")
trains <- trains[, .(X, Y, POINT_ID, ADDRESS, METRO_AREA, FAC_NAME, CITY, STATE, ZIPCODE)]
```
```r
DT::datatable(trains)
```
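The unique(trains, by = "POINT_ID") call above keeps the first row per station to drop the multi-mode duplicates. In base R the same dedup looks like the sketch below; the POINT_IDs and the pairing of rows are invented for illustration:

```r
# Keep the first row per POINT_ID; ids and duplicate rows are made up here
stations <- data.frame(
  POINT_ID = c(101, 101, 102, 103, 103),
  FAC_NAME = c("Cos Cob", "Cos Cob (bus)", "Riverside",
               "Old Greenwich", "Old Greenwich (Amtrak)"))

dedup <- stations[!duplicated(stations$POINT_ID), ]
nrow(dedup)  # 3 unique stations remain
```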
Code for Filtering and Aggregating Parcels within Ranges
This section has the code to load and find lots within progressively expanding radiuses of each station. My custom function, get_parcels_in_geo_range(), takes an IPCD point and filters parcels within a specified distance. Again, I’m relatively new to GIS data, so it took me a long time to figure out how to create the station {sf} data.frame as a lon-lat XY point with a CRS which could properly filter ff_parcels by distance with sf::st_is_within_distance(). First I had to load it as CRS 4326 (lon-lat), then transform it to 2234 for distance calculations, in accordance with my ff_parcels sf formatting. This took a long time to figure out, and I still don’t completely understand how it works.
Get Parcels in GEO Range
```r
get_parcels_in_geo_range <- function(point_id, dist_range) {
  # Uses trains, cama_data, and ff_parcels from the global env

  # Get train station
  train_station <- trains[trains$POINT_ID == point_id]
  train_station <- data.frame(lon = train_station$X, lat = train_station$Y)

  # Convert to sf: load as lon-lat (EPSG:4326), then project to EPSG:2234
  station <- sf::st_as_sf(train_station, coords = c("lon", "lat"), crs = 4326) %>%
    st_transform(crs = 2234)

  # Find lots with components within dist_range
  wd <- st_is_within_distance(
    ff_parcels, station, dist = units::set_units(dist_range, "mile"))

  # Drop any lots where none of the components are within dist_range
  parcels <- ff_parcels %>% filter(lengths(wd) > 0)

  # Convert to data.table and clean names
  data.table::setDT(parcels)
  parcels <- janitor::clean_names(parcels)

  # Keep lots with non-negative area
  parcels <- parcels[shape_area >= 0]

  # Copy cama_data to leave the original in place
  cama_data <- copy(cama_data)

  # Clean up any extra spaces in the CAMA link
  parcels[, cama_link := re2::re2_replace_all(cama_link, " ", "")]

  # Join data on "link" = "cama_link"
  parcels <- cama_data[parcels, on = c("link" = "cama_link")]

  return(parcels)
}
```
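Because the projected CRS is in feet, the within-distance test conceptually reduces to plain Euclidean distance against the station point. Here is a minimal base-R sketch of that idea, with made-up station and parcel coordinates (the real code uses sf::st_is_within_distance(), which also handles polygon geometries rather than single points):

```r
# Filter points within a radius (in miles) of a station, working directly
# in projected state-plane feet; all coordinates invented for illustration
station <- c(x = 868000, y = 613400)
parcels <- data.frame(
  x = c(868100, 870000, 880000),
  y = c(613500, 613400, 613400))

dist_ft  <- sqrt((parcels$x - station["x"])^2 + (parcels$y - station["y"])^2)
in_range <- dist_ft <= 0.5 * 5280  # half a mile = 2,640 ft

parcels[in_range, ]  # the first two parcels fall within half a mile
```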
In prepare_town_data_frame(), I had to use trial and error to figure out towns where the standard codes were not picking up residential properties. There were varying codes for single family homes, condominiums and apartment buildings, so I did my best to capture them all, as shown in the resi_codes vector below. Unfortunately, I had to leave the job of separating single- and multi-family properties for a later time. I went with number of dwellings, which doesn’t distinguish between single and multi-family, but does give an overall sense of density in those zones.
Prepare Town data.frame
```r
prepare_town_data_frame <- function(towns) {
  # Remove towns outside FF county without parcels
  towns <- towns[sapply(towns, nrow) > 20]

  # Convert list of towns to a single data.table
  towns_df <- rbindlist(towns, fill = TRUE, use.names = TRUE, idcol = "station_name")

  # Indicator for properties thought to be residential, derived by trial and error
  resi_codes <- c(800, 801, 802, 803, 805, 899, 100, 101, 102, 103, 122,
                  104, 105, 108, 109, 172, 1010, 1040, 1110, 1012, 1015,
                  1050, 1111)
  towns_df[, resi := state_use %in% as.character(resi_codes)]
  towns_df[, resi := fifelse(
    (resi == FALSE & state_use_description == "Residential"), TRUE, resi)]
  # "Commericial" (sic) kept as in the original code
  towns_df[, resi := fifelse(
    (resi == FALSE & state_use_description == "Commericial"), FALSE, resi)]
  towns_df[is.na(resi), resi := FALSE]

  # Drop empty links
  towns_df <- towns_df[!link %in% c("48620-", "77200-", "57600-",
                                    "68170-", "86370-", "52980-")]

  return(towns_df)
}
```
prepare_station_summary() summarizes all towns based on properties which were not identified, total residential properties, mean acres of residential dwellings, total number of parcels above 1/2 acre and 1 acre, and the total acres of residential land within the specified distance. I also calculated the percentage of residential land within the total area in the half mile radius, which for most stations was more than half.
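The acreage figures in the summary come from dividing shape_area (square feet) by 43,560, the number of square feet in an acre. A quick worked check of the conversions and thresholds used:

```r
# shape_area is in square (survey) feet; one acre = 43,560 sq ft
sq_ft_to_acres <- function(a) a / 43560

sq_ft_to_acres(43560)        # exactly 1 acre
sq_ft_to_acres(0.5 * 43560)  # the half-acre threshold, 0.5
round(sq_ft_to_acres(30000), 2)  # a 30,000 sq ft lot is about 0.69 acres
```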
Prepare Station Summary
```r
prepare_station_summary <- function(towns_df) {
  # Find unmatched links
  missing <- towns_df[is.na(pid), .(unmatched = .N), station_name]

  # Count total resi properties near the specified station_name
  total_resi <- towns_df[resi == TRUE, .(total_dwellings = .N), by = station_name]

  # Calc mean acres and counts of 1/2 and 1 acre parcels
  ff_summary <- dplyr::distinct(towns_df[resi == TRUE], shape, .keep_all = TRUE)[
    , .(
      avg_acres_dwelling = round(mean(shape_area) / 43560, 2),
      `total_half_ac+`   = sum(shape_area > 0.5 * 43560),
      `total_1_ac+`      = sum(shape_area > 43560)),
    station_name]

  # Count total acres and resi acres near the specified station_name and dist_range
  resi_land <- dplyr::distinct(towns_df[resi == TRUE], shape, .keep_all = TRUE)[
    , .(resi_ac = sum(shape_area) / 43560), station_name]
  total_land <- dplyr::distinct(towns_df, shape, .keep_all = TRUE)[
    , .(total_ac = round(sum(shape_area) / 43560, 1)), station_name]

  # Aggregate components into final table by station_name
  ff_summary <- merge(ff_summary, total_land, by = "station_name")
  ff_summary <- merge(ff_summary, resi_land, by = "station_name")
  ff_summary <- merge(ff_summary, total_resi, by = "station_name")
  ff_summary <- merge(ff_summary, missing, by = "station_name")

  # Add percentage residential and drop the raw acres
  ff_summary <- ff_summary[
    , pct_resi := round(resi_ac / total_ac, 1)][
    , resi_ac := NULL]

  return(ff_summary)
}
```
I wrote get_stations_summary() to iterate over all the train stations at a given dist_range in the selected trains data.frame.
Get Stations Summary
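The code for this step didn’t make it into the post, so here is a hedged sketch of how get_stations_summary() might be structured: iterate over station ids at one dist_range and bind the results. The stub below stands in for get_parcels_in_geo_range() so the example runs on its own; the real version would also pass the results through the prepare_*() functions defined above:

```r
# Stub standing in for get_parcels_in_geo_range(): returns a toy result
# per station id so the iteration pattern itself can be demonstrated
get_parcels_stub <- function(point_id, dist_range) {
  data.frame(station_name = as.character(point_id), n_parcels = point_id * 10L)
}

# Sketch: iterate over station ids at one dist_range, then stack the results
get_stations_summary <- function(point_ids, dist_range, fetch = get_parcels_stub) {
  towns <- lapply(point_ids, fetch, dist_range = dist_range)
  names(towns) <- as.character(point_ids)
  do.call(rbind, towns)  # one row (or block of rows) per station
}

summary_df <- get_stations_summary(c(1L, 2L, 3L), dist_range = 0.5)
summary_df$n_parcels  # 10 20 30
```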
Results and Analysis
Below is a summary table of parcels within 1/2 mile of each station, with several stations showing more or less than the roughly 500 acres expected in a 1/2 mile radius. This is probably because some stations are close to the water or include significant portions of highway, so they have a smaller area of land which may be developed for any purpose. Others may have a large property (ie: cemeteries, golf clubs, parks and schools) with an address within the radius but land extending outside it. Bethel’s numbers look off, but it is missing most “state use” codes, so it is difficult to accurately account for parcel usage. For a sense of the overall accuracy for a particular station, the “Unmatched Properties” column shows how many of the parcels were not classified by a “state use” code and were left out of the calculations. For example, I still have a lot of unclassified properties near Darien’s 33 West Ave station. The proposed legislation specifies a minimum density of 15 to 30 units per acre as of right within these zones, demonstrating how significant adopting WLR could be for residents of those neighborhoods. Overall, none of the 26 stations has an average residential lot size greater than 1 acre.
Looking at density per acre for the portion of each zone with residential state_use codes, seven out of 26 stations average more than 1 acre per dwelling in the first mile, with more beyond 1.5 miles, much smaller numbers than the 80% share in the state, but still a lot. All of the lower density stations are much further from Grand Central. I don’t know the circumstances at those stations, but it does support the claim that some towns may have allowed sparse density at these valuable locations. Two stations have average lot sizes well above 1 acre in the first 0.5 miles, but then drop below 1 acre further out. Please hover over the lines to better discern the station and the density at a given point, or select labels on the right to drill down on any location.
The next chart shows the number of 1+ acre lots by mile from stations: there are few within 1/2 mile of the stations closest to NYC in Greenwich, Stamford and Norwalk, for example, but lower densities much further up the line. While there are approximately 1,000 1+ acre lots near the 26 MNR stations, there are 10x that within 1.5 miles, and 25x within 3 miles. Once you get a few miles from stations, there is a lot more land in some locations, which is my point: the remedy seems to miss the opportunity to address the stated concern.
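The banding by mile in that chart can be reproduced with cut() over the station distances and a count of qualifying lots per band. The distances and acreages below are invented for illustration, not the study’s results:

```r
# Count 1+ acre lots by distance band from a station; all values made up
lots <- data.frame(
  dist_mi = c(0.3, 0.7, 1.2, 2.1, 2.8, 0.4),
  acres   = c(1.5, 0.2, 2.0, 1.1, 3.0, 0.8))

# Half-open bands (0, 0.5], (0.5, 1], (1, 1.5], (1.5, 3]
lots$band <- cut(lots$dist_mi, breaks = c(0, 0.5, 1, 1.5, 3),
                 labels = c("0-0.5", "0.5-1", "1-1.5", "1.5-3"))

table(lots$band[lots$acres >= 1])  # 1+ acre lots per band
```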