This project improves on and continues work done during Winter 2025 as part of a graduate statistical consulting course at Portland State University, in collaboration with TREC (Transportation Research and Education Center). The goal was to compare two datasets, PORTAL and INRIX, using the Maximum Mean Discrepancy (MMD) statistic to measure distributional similarity.

Executive Summary

Understanding the key differences between data collection approaches and the datasets they produce is crucial for transportation planning and traffic management. This project compares travel-time data from two prominent sources, PORTAL and INRIX, using the Maximum Mean Discrepancy (MMD) statistic. Both datasets record travel times on highway segments, but they collect those readings differently: PORTAL relies on fixed-point highway sensors, while INRIX uses OEM probe data from moving vehicles. Analyzing the travel-time distributions from the two sources lets us identify meaningful differences and similarities that may affect how each is used in various applications.

The Data

Data were collected from two sources: PORTAL, public data managed by the Transportation Research and Education Center (TREC) at Portland State University (PSU), and INRIX, a commercial provider. The PORTAL data is aggregated from sensors maintained by the Oregon Department of Transportation (ODOT) and the Washington State Department of Transportation (WSDOT). The INRIX data is collected from GPS-enabled vehicles, mobile devices, and other third-party sensors.

The Analysis

An unbiased estimator of MMD with the Radial Basis Function (RBF) kernel was applied to several views of the travel-time data to investigate whether, and in what ways, the two distributions differ. The analysis was restricted to a subset of 15-minute-interval travel-time readings from 2019 through 2024 on I-5, I-205, and SR-14 in the Portland, Oregon–Vancouver, Washington metropolitan area.
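For reference, the standard unbiased estimator of squared MMD between samples x_1, …, x_m and y_1, …, y_n, with the RBF kernel of bandwidth σ, is shown below, followed by a minimal plain-Python sketch of it. This is an illustration, not the project's code: the names `rbf_kernel` and `mmd2_unbiased` are hypothetical, and the bandwidth `sigma` is left as a free parameter because the value used in the report is not stated here. Each sample is treated as a sequence of floats, so it could be a single reading or a vector of readings (for example, one day of 15-minute intervals); that framing is also an assumption.

$$
\widehat{\mathrm{MMD}}^2_u = \frac{1}{m(m-1)}\sum_{i \ne j} k(x_i, x_j) + \frac{1}{n(n-1)}\sum_{i \ne j} k(y_i, y_j) - \frac{2}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n} k(x_i, y_j),
\qquad k(x, y) = \exp\!\left(-\frac{\lVert x - y \rVert^2}{2\sigma^2}\right)
$$

```python
import math
from typing import Sequence

# A sample is any sequence of floats (hypothetical framing: one reading or one day's vector).
Vector = Sequence[float]


def rbf_kernel(x: Vector, y: Vector, sigma: float) -> float:
    """Gaussian (RBF) kernel exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd2_unbiased(xs: Sequence[Vector], ys: Sequence[Vector], sigma: float) -> float:
    """Unbiased estimate of squared MMD between samples xs and ys."""
    m, n = len(xs), len(ys)
    if m < 2 or n < 2:
        raise ValueError("each sample needs at least two observations")
    # Within-sample sums skip the diagonal (i == j); this is what removes the bias.
    k_xx = sum(rbf_kernel(xs[i], xs[j], sigma)
               for i in range(m) for j in range(m) if i != j)
    k_yy = sum(rbf_kernel(ys[i], ys[j], sigma)
               for i in range(n) for j in range(n) if i != j)
    # The cross-sample sum uses every (x, y) pair.
    k_xy = sum(rbf_kernel(x, y, sigma) for x in xs for y in ys)
    return k_xx / (m * (m - 1)) + k_yy / (n * (n - 1)) - 2.0 * k_xy / (m * n)
```

Because the diagonal terms are excluded, the estimate can be slightly negative when both samples come from the same distribution; values well above zero point to a difference. The cost grows quadratically in the sample sizes, so large samples would call for vectorized code.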

Missing Intervals

The datasets differed substantially in completeness: PORTAL contained notable gaps at several stations, while INRIX had comparatively few missing intervals. To keep these differences from disproportionately influencing the distributional comparisons, three complementary strategies were used to handle missing readings: standardization with zero-filling, masking during computation, and a combined standardized-masking approach. Comparing results across these methods helped separate effects due to missingness from genuine distributional differences.
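The three missing-interval strategies can be sketched roughly as follows. The exact implementation is not described on this page, so everything here is an assumption: `None` marks a missing interval, standardization uses the population standard deviation of observed values only, and the masked distance is rescaled to the full vector length so that pairs with different overlaps stay comparable. All function names are hypothetical.

```python
import math
from typing import Optional, Sequence

# Hypothetical encoding: None marks a missing 15-minute interval.
Reading = Optional[float]


def standardize(series: Sequence[Reading]) -> list[Reading]:
    """Z-score a series using only its observed values; gaps stay None."""
    observed = [v for v in series if v is not None]
    if not observed:
        raise ValueError("series has no observed readings")
    mean = sum(observed) / len(observed)
    var = sum((v - mean) ** 2 for v in observed) / len(observed)
    std = math.sqrt(var) or 1.0  # avoid dividing by zero for a constant series
    return [None if v is None else (v - mean) / std for v in series]


def zero_fill(vec: Sequence[Reading]) -> list[float]:
    """After standardization, 0.0 is the observed mean, so gaps become 'average'."""
    return [0.0 if v is None else v for v in vec]


def masked_sq_dist(x: Sequence[Reading], y: Sequence[Reading]) -> float:
    """Squared distance over intervals observed in BOTH equal-length vectors,
    rescaled to the full length (an assumed normalization)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if not pairs:
        raise ValueError("no commonly observed intervals")
    partial = sum((a - b) ** 2 for a, b in pairs)
    return partial * len(x) / len(pairs)


def masked_rbf(x: Sequence[Reading], y: Sequence[Reading], sigma: float) -> float:
    """RBF kernel built on the masked distance instead of the full one."""
    return math.exp(-masked_sq_dist(x, y) / (2.0 * sigma ** 2))
```

Under these assumptions, zero-filled vectors can be passed to an ordinary RBF kernel, while the masking and combined variants use `masked_rbf` on raw or standardized series, respectively.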

Key Findings

Results indicate that the travel-time readings from PORTAL and INRIX appear to come from different distributions, as measured by the MMD statistic. This suggests that the two data sources may capture different aspects of traffic conditions, potentially because of differences in data collection methods and sensor types. However, the experiments also revealed a trend toward increasing similarity between the datasets over time, which may reflect improvements in data collection, sensor coverage, or processing techniques.
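A conclusion that two samples come from different distributions usually rests on a significance test of the MMD estimate. This page does not say which test the report used. A permutation test is one standard choice and is sketched below under that assumption. The `statistic` argument would typically be an MMD estimator with a fixed kernel bandwidth; `permutation_p_value` is a hypothetical name.

```python
import random
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def permutation_p_value(
    xs: Sequence[T],
    ys: Sequence[T],
    statistic: Callable[[Sequence[T], Sequence[T]], float],
    n_permutations: int = 1000,
    seed: int = 0,
) -> float:
    """Permutation-test p-value for H0: xs and ys share one distribution."""
    rng = random.Random(seed)
    observed = statistic(xs, ys)
    pooled = list(xs) + list(ys)
    m = len(xs)
    exceed = 0
    for _ in range(n_permutations):
        # Under H0 the labels are exchangeable, so reshuffle and re-split.
        rng.shuffle(pooled)
        if statistic(pooled[:m], pooled[m:]) >= observed:
            exceed += 1
    # The +1 terms count the observed split itself, so p is never exactly 0.
    return (exceed + 1) / (n_permutations + 1)
```

A p-value below a chosen level (say 0.05) would support the conclusion that the distributions differ. Because every permutation recomputes the statistic, the test's cost is the MMD cost multiplied by `n_permutations`.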

Code and Report

Code located on GitHub: whitham-powell/TREC-PORTALvsINRIX-MMD

Link to the final report: Measuring Distributional Similarity via Maximum Mean Discrepancy (PDF)

E Whitham-Powell

Summer 2025 Graduate Research Assistantship
