Quick Facts
- Category: Health & Medicine
- Published: 2026-05-19 16:21:02
Introduction
Recent research using 66 million GPS mobility records revealed a stark inequity in New York City's public transit system: predominantly white neighborhoods enjoy far better access to jobs, banks, healthcare, parks, and schools within a one-hour commute than Black and Hispanic communities do. This guide walks you through the methodology used in that PNAS Nexus study so that you can replicate the analysis or apply it to your own city. By following these steps, you’ll learn how to use geospatial big data to uncover systemic biases in transit infrastructure.

What You Need
- GPS mobility dataset – anonymized location pings (e.g., from mobile apps or transportation providers) covering at least one typical week. The study used 66 million records for NYC.
- Census tract shapefiles – boundaries for neighborhoods, along with demographic data (race/ethnicity percentages) from the U.S. Census Bureau or American Community Survey.
- Public transit schedule and network data – General Transit Feed Specification (GTFS) files for buses, subways, and trains in your study area.
- Destinations of interest – geocoded locations of jobs (e.g., employer data), banks, healthcare facilities, parks, and schools. These can be sourced from open data portals or commercial databases.
- Geographic Information System (GIS) software – QGIS (free) or ArcGIS; plus a programming environment like Python (with libraries: pandas, geopandas, networkx, osmnx) for large-scale processing.
- Computing resources – a machine with enough RAM (at least 16 GB) and storage to handle millions of GPS points.
Step-by-Step Instructions
- Step 1: Collect and Preprocess GPS Mobility Data
Obtain a representative sample of GPS pings from anonymous users. Clean the data by removing outliers (e.g., pings that imply implausible speeds or have poor positional accuracy), group the remaining pings into trips, and keep only trips that occurred on weekdays during typical commute hours (7–9 AM and 4–7 PM). Map each ping to the census tract that contains it. Then aggregate the trips into an origin-destination matrix: for each pair of tracts, count the trips that start in the first and end in the second. Summing across destinations gives the number of trips starting from each tract.
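The filtering and aggregation in Step 1 can be sketched as follows. This is a minimal, dependency-free illustration, not the study's pipeline: the trip records, tract IDs, and the `is_commute_trip` helper are hypothetical, and it assumes pings have already been grouped into trips with an origin and destination tract.

```python
from collections import Counter
from datetime import datetime

# Hypothetical trip records: (origin_tract, destination_tract, departure time).
trips = [
    ("A", "B", datetime(2024, 3, 4, 8, 15)),   # Monday, AM peak
    ("A", "B", datetime(2024, 3, 4, 17, 30)),  # Monday, PM peak
    ("A", "C", datetime(2024, 3, 5, 7, 5)),    # Tuesday, AM peak
    ("B", "A", datetime(2024, 3, 5, 12, 0)),   # Tuesday, midday: dropped
    ("C", "A", datetime(2024, 3, 9, 8, 0)),    # Saturday: dropped
]

def is_commute_trip(ts):
    """True for weekday departures in the 7-9 AM or 4-7 PM windows."""
    is_weekday = ts.weekday() < 5  # Monday=0 ... Friday=4
    in_peak = 7 <= ts.hour < 9 or 16 <= ts.hour < 19
    return is_weekday and in_peak

# Origin-destination matrix: trip counts keyed by (origin, destination).
od_matrix = Counter((o, d) for o, d, ts in trips if is_commute_trip(ts))

# Row totals: trips starting from each tract.
trips_from = Counter()
for (origin, _dest), n in od_matrix.items():
    trips_from[origin] += n

print(dict(od_matrix))
print(dict(trips_from))
```

With millions of records, the same logic would more likely be expressed as a pandas group-by, but the counting rule is identical.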
- Step 2: Build a Time-Distance Transit Network
Import GTFS files into your GIS or Python environment. Use a routing engine (e.g., OpenTripPlanner or GraphHopper) to model travel times between all pairs of census tracts via public transit. Note that osmnx with networkx models street networks, so it is suited to the walking legs to and from stops rather than to schedule-based transit routing on its own. Include walking time to/from stops, waiting time, and in-vehicle travel. Cap travel time at one hour, as in the original study.
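To make the travel-time idea in Step 2 concrete, here is a simplified sketch using Dijkstra's algorithm with a 60-minute cutoff. It assumes each edge carries a fixed average time (walk + wait + in-vehicle), which is a strong simplification: a real routing engine reads departure times from the GTFS schedule. The graph, the node names, and the `leg` and `travel_times` helpers are all hypothetical.

```python
import heapq

def leg(walk, wait, ride):
    """Total minutes for one transit leg: walking, waiting, and in-vehicle time."""
    return walk + wait + ride

# Hypothetical tract-to-tract network with static average leg times in minutes.
graph = {
    "A": {"B": leg(5, 4, 16), "C": leg(8, 6, 36)},   # 25 and 50 minutes
    "B": {"C": leg(3, 5, 12), "D": leg(5, 5, 30)},   # 20 and 40 minutes
    "C": {"D": leg(2, 3, 5)},                         # 10 minutes
    "D": {"E": leg(2, 3, 5)},                         # 10 minutes
}

def travel_times(graph, origin, cutoff=60.0):
    """Shortest travel time from origin to every node reachable within cutoff."""
    best = {origin: 0.0}
    queue = [(0.0, origin)]
    while queue:
        t, node = heapq.heappop(queue)
        if t > best.get(node, float("inf")):
            continue  # stale queue entry; a faster path was already found
        for nxt, minutes in graph.get(node, {}).items():
            new_t = t + minutes
            if new_t <= cutoff and new_t < best.get(nxt, float("inf")):
                best[nxt] = new_t
                heapq.heappush(queue, (new_t, nxt))
    return best

times_from_a = travel_times(graph, "A")
print(times_from_a)
```

Here tract D is reached through C in 55 minutes, while E would take 65 minutes and is therefore excluded by the one-hour cap.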
- Step 3: Geocode Destination Amenities
Compile geolocated lists of job sites (counts of employment per location), banks, healthcare facilities (hospitals, clinics), parks (public green spaces), and schools (K–12, universities). Standardize addresses using a geocoding service (e.g., Google Maps Geocoding API, Census Geocoder) and assign each amenity to its census tract.
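Assigning each amenity to a tract is a point-in-polygon problem. The sketch below uses a plain ray-casting test on toy rectangular tracts; the tract shapes, amenity coordinates, and function names are hypothetical. With real shapefiles, a spatial join such as geopandas `sjoin` would likely be the more practical choice.

```python
from collections import Counter

def point_in_polygon(x, y, polygon):
    """Ray casting: count how many polygon edges a ray going right from (x, y) crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical tract boundaries as lists of (x, y) vertices.
tracts = {
    "T1": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "T2": [(2, 0), (4, 0), (4, 2), (2, 2)],
}

# Hypothetical geocoded amenities: (type, x, y).
amenities = [("bank", 1.0, 1.0), ("school", 3.0, 0.5), ("park", 5.0, 5.0)]

def tract_of(x, y):
    """Return the ID of the tract containing the point, or None if outside all tracts."""
    for tract_id, polygon in tracts.items():
        if point_in_polygon(x, y, polygon):
            return tract_id
    return None

assigned = {(kind, x, y): tract_of(x, y) for kind, x, y in amenities}

# Amenity counts per (tract, type), skipping points outside the study area.
counts = Counter((t, kind) for (kind, _x, _y), t in assigned.items() if t is not None)
print(dict(counts))
```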
- Step 4: Calculate Accessible Destinations from Each Tract
For each census tract, query the transit network model to find all destinations reachable within 60 minutes. For each type of amenity, sum the total number of opportunities (e.g., total jobs, number of banks) within that one-hour travel time window. Store these accessibility scores per tract.
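The score described in Step 4 is a cumulative-opportunities measure: add up everything located in tracts that can be reached within the threshold. A minimal sketch, assuming a precomputed travel-time table and per-tract amenity counts (all values and names below are hypothetical):

```python
# Hypothetical transit travel times in minutes (origin tract -> destination tract).
travel_minutes = {
    "A": {"A": 0, "B": 25, "C": 45, "D": 70},
    "B": {"A": 25, "B": 0, "C": 20, "D": 40},
}

# Hypothetical amenity counts located in each destination tract.
opportunities = {
    "A": {"jobs": 100, "banks": 1},
    "B": {"jobs": 400, "banks": 2, "schools": 1},
    "C": {"jobs": 1500, "banks": 5, "parks": 1},
    "D": {"jobs": 900, "schools": 2},
}

def accessibility(origin, threshold=60):
    """Sum each amenity type over all destinations reachable within threshold minutes."""
    totals = {}
    for dest, minutes in travel_minutes[origin].items():
        if minutes <= threshold:
            for kind, count in opportunities[dest].items():
                totals[kind] = totals.get(kind, 0) + count
    return totals

scores = {origin: accessibility(origin) for origin in travel_minutes}
print(scores)
```

Tract A cannot reach D within an hour, so D's jobs and schools do not count toward A's score, while tract B reaches all four tracts.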
- Step 5: Segment Tracts by Dominant Race/Ethnicity
Overlay the accessibility results with demographic data from the census. Classify each tract as “Predominantly White” (>50% non-Hispanic white), “Predominantly Black” (>50% Black), or “Predominantly Hispanic” (>50% Hispanic). Tracts where no group exceeds 50% fall outside these three classes, so decide up front whether to drop them or report them separately. Alternatively, use continuous measures of racial composition to perform regression analysis.
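The majority rule in Step 5 can be written as a small function. This is an illustrative sketch: the share values and key names are hypothetical, and the "No majority" label for tracts where no group exceeds the cutoff is an assumption added here, not a category from the study.

```python
# Map from (hypothetical) census share keys to the labels used in Step 5.
LABELS = {
    "white_nh": "Predominantly White",
    "black": "Predominantly Black",
    "hispanic": "Predominantly Hispanic",
}

def classify_tract(shares, cutoff=0.5):
    """Return the label of the group whose population share exceeds the cutoff."""
    for key, label in LABELS.items():
        if shares.get(key, 0.0) > cutoff:
            return label
    return "No majority"  # assumption: keep these tracts in a separate bucket

# Hypothetical population shares per tract.
tract_shares = {
    "T1": {"white_nh": 0.72, "black": 0.08, "hispanic": 0.12},
    "T2": {"white_nh": 0.15, "black": 0.61, "hispanic": 0.20},
    "T3": {"white_nh": 0.20, "black": 0.18, "hispanic": 0.55},
    "T4": {"white_nh": 0.40, "black": 0.30, "hispanic": 0.25},
}

tract_class = {t: classify_tract(s) for t, s in tract_shares.items()}
print(tract_class)
```

Because the cutoff is strictly above 50%, at most one group can qualify per tract, so the order of the checks does not affect the result.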
- Step 6: Compare and Visualize Disparities
Calculate the average accessibility score for each amenity type (jobs, banks, and so on) within each demographic group. Use bar charts or choropleth maps to display the differences. As the original study found, white-majority tracts consistently have higher counts of reachable job sites, banks, healthcare, parks, and schools than Black- or Hispanic-majority tracts. Perform a statistical test (e.g., a t-test when comparing two groups, or ANOVA across all three) to check whether the differences are statistically significant.
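As a rough illustration of the comparison in Step 6, the sketch below computes group means and Welch's t statistic (a t-test variant that does not assume equal variances) using only the standard library. The accessibility values are invented for the example and are not results from the study; obtaining a p-value from the statistic would require a t-distribution routine such as the one in SciPy.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical reachable-job counts (in thousands) per tract, by group.
white_tracts = [120, 130, 110, 140]
black_tracts = [80, 90, 70, 100]

def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    va, vb = variance(a) / len(a), variance(b) / len(b)  # squared standard errors
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

group_means = {"white": mean(white_tracts), "black": mean(black_tracts)}
t_stat, dof = welch_t(white_tracts, black_tracts)

print(group_means)
print(round(t_stat, 3), round(dof, 1))
```

Accessibility counts are often skewed across tracts, so it may be worth checking the distribution before relying on a t-test alone.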
Tips for Accurate Analysis
- Consider temporal variations: Transit schedules change on weekends and evenings. Use consistent time windows (e.g., 8 AM departure) across all comparisons.
- Account for data representativeness: GPS mobility data may underrepresent low-income individuals who lack smartphones. Combine with census commute data to validate results.
- Normalize by population: Raw counts of destinations favor denser areas. Normalize by population or area to avoid confounding density with equity.
- Use open data: Many cities provide free GTFS feeds and open government data. Check portals like NYC Open Data or the U.S. Department of Transportation’s Bureau of Transportation Statistics.
- Replicate sensitivity analyses: Vary the travel time threshold (e.g., 45 vs. 60 minutes) to see if disparities persist.
- Interpret with caution: Correlation does not imply causation. The study reveals a pattern, but further investigation is needed to understand policy levers (e.g., bus route prioritization, new subway lines).
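The sensitivity check suggested in the tips can be sketched by recomputing reachable opportunities at several thresholds and watching how the gap between groups changes. All travel times, job counts, and tract names below are hypothetical, and a single tract stands in for each group to keep the example short.

```python
# Hypothetical job counts at three destination tracts.
jobs = {"D1": 1000, "D2": 500, "D3": 2000}

# Hypothetical transit travel times in minutes from one tract per group.
minutes = {
    "white_tract": {"D1": 20, "D2": 40, "D3": 50},
    "black_tract": {"D1": 50, "D2": 35, "D3": 65},
}

def reachable_jobs(origin, threshold):
    """Total jobs in destinations reachable from origin within threshold minutes."""
    return sum(jobs[d] for d, t in minutes[origin].items() if t <= threshold)

# Recompute the comparison for each travel-time threshold.
results = {}
for threshold in (45, 60):
    w = reachable_jobs("white_tract", threshold)
    b = reachable_jobs("black_tract", threshold)
    results[threshold] = (w, b)
    print(f"{threshold} min: white={w}, black={b}, ratio={w / b:.2f}")
```

In this toy data the gap narrows from 45 to 60 minutes but does not close, which is the kind of pattern a sensitivity analysis is meant to surface.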