Washington D.C. Bikeshare 2011-2012

Capstone Project of CMU's Foundations of Data Science Certificate
Timeline: August 2025 - October 2025
Skillset: Data Analysis
Tools: R

About the Dataset

The dataset has a combination of quantitative and qualitative variables including:

Weather Conditions:

  • Quantitative: Temperature, Windspeed, Humidity
  • Qualitative: 4 types of weather conditions (sunny, misty, light rain/snow, heavy rain/snow)

Time of the ride:

  • Quantitative: Year, Month, Day, Hour
  • Qualitative: Workday, Holiday

Ride numbers:

  • Registered/Casual

Research Questions

  • What is the general trend of bike-share rides over time? Does it generally increase, decrease, or stay the same?
  • What are some patterns in the bike-share rides? Do they vary between day types, hours, weather, seasons, and years?

Methodology

Preliminary Analysis: Weather Conditions & Day Types

Categorical Variables

Day Types:

  • Significant difference between holidays and non-holidays
  • Different, but less so, between workdays and non-workdays

Weather Types:

  • Each weather condition is significantly different from the others
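
The write-up does not name the tests behind these comparisons. As a minimal sketch, assuming a Welch t-test for the two-level holiday split and a one-way ANOVA with Tukey HSD for the weather types, the comparisons could look like this in R. The data frame `toy` and its columns are simulated and hypothetical, not the bikeshare data.

```r
# Toy data only: simulated counts, NOT the bikeshare dataset.
set.seed(1)
n <- 300
toy <- data.frame(
  holiday = factor(sample(c("no", "yes"), n, replace = TRUE, prob = c(0.85, 0.15))),
  weather = factor(sample(c("clear", "mist", "light_precip"), n, replace = TRUE))
)
# Arbitrary simulated effects: fewer rides in worse weather and on holidays.
base <- c(clear = 200, mist = 170, light_precip = 110)[as.character(toy$weather)]
toy$count <- rpois(n, lambda = base - 40 * (toy$holiday == "yes"))

# Two groups (assumed test): Welch two-sample t-test.
holiday_test <- t.test(count ~ holiday, data = toy)

# More than two groups (assumed test): one-way ANOVA, then Tukey HSD
# to compare every pair of weather types.
weather_fit   <- aov(count ~ weather, data = toy)
weather_pairs <- TukeyHSD(weather_fit)

holiday_test$p.value
weather_pairs$weather  # one row per pair: diff, lwr, upr, p adj
```

With three weather levels, the Tukey table has three rows, one per pair, which is the kind of output needed to check whether each condition differs from the others.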

Quantitative Weather Variables: Humidity, Temperature, Temperature Feel

  • Negative trend between Humidity & Rides
  • Positive trend between Temperature & Rides
  • Positive trend between Windspeed & Rides, though with a flatter slope
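
One way to read trends like these is to fit a separate simple regression of rides on each weather variable and compare the slopes. The sketch below assumes that approach. The data are simulated with arbitrary effects, and the names `toy` and `slopes` are hypothetical.

```r
# Toy data only: simulated with arbitrary effects, NOT the bikeshare dataset.
set.seed(2)
n <- 300
toy <- data.frame(
  temp      = runif(n, 0, 35),
  humidity  = runif(n, 20, 100),
  windspeed = runif(n, 0, 40)
)
toy$count <- 100 + 6 * toy$temp - 1.5 * toy$humidity + rnorm(n, sd = 30)

# One simple regression per predictor; keep only the slope.
slopes <- sapply(c("temp", "humidity", "windspeed"), function(v) {
  fit <- lm(reformulate(v, response = "count"), data = toy)
  unname(coef(fit)[2])
})

slopes  # named vector: the sign gives the direction of each trend
```

Because the predictors are on different scales, comparing how steep the slopes are is easier after standardizing each predictor with `scale()`.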

Dataset Cleaning

Correlation Graph

Temp and TempFeel ("feels like" temperature) are highly correlated, so I removed TempFeel from the final dataset.
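
A minimal sketch of this check, assuming a plain Pearson correlation matrix and a hypothetical cutoff of 0.9 (the write-up does not state a threshold). The data frame `wx` is simulated so that TempFeel is nearly a copy of Temp.

```r
# Toy data only: TempFeel is simulated as Temp plus small noise.
set.seed(3)
n <- 200
wx <- data.frame(
  Temp      = runif(n, 0, 35),
  Humidity  = runif(n, 20, 100),
  Windspeed = runif(n, 0, 40)
)
wx$TempFeel <- wx$Temp + rnorm(n, sd = 1.5)

cor_mat <- cor(wx)  # Pearson correlations between all numeric columns

# Flag each pair once (upper triangle) if |r| exceeds the assumed 0.9 cutoff.
high <- which(abs(cor_mat) > 0.9 & upper.tri(cor_mat), arr.ind = TRUE)
flagged <- data.frame(
  var1 = rownames(cor_mat)[high[, "row"]],
  var2 = colnames(cor_mat)[high[, "col"]]
)

# Drop the redundant column, keeping Temp.
wx_reduced <- wx[, setdiff(names(wx), "TempFeel")]
```

Keeping only one of two nearly collinear predictors avoids unstable coefficient estimates in the regression that follows.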

Before:
Raw ride information

After:
Date information is summarized into categorical variables: day type, season, and time of day. The total number of rides in a day is normalized through a square-root transformation.
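
The cleaning step can be sketched as below. The 4-hour time-of-day blocks follow the ranges quoted in the results (00-03, 08-11, 16-19, and so on). The column names, the rule for deriving day type from the holiday and workday flags, and the six example rows are assumptions for illustration.

```r
# Hypothetical raw rows; column names are assumptions, not the real schema.
raw <- data.frame(
  hour    = c(2, 9, 13, 17, 22, 6),
  holiday = c(0, 0, 0, 0, 1, 0),
  workday = c(1, 1, 0, 1, 0, 0),
  count   = c(16, 225, 144, 400, 49, 9)
)

# Time of day: 4-hour blocks [0,4), [4,8), ..., [20,24).
raw$time_of_day <- cut(
  raw$hour,
  breaks = seq(0, 24, by = 4),
  right  = FALSE,
  labels = c("00-03", "04-07", "08-11", "12-15", "16-19", "20-23")
)

# Day type (assumed rule): holiday flag wins, then workday, else weekend.
raw$day_type <- factor(
  ifelse(raw$holiday == 1, "holiday",
         ifelse(raw$workday == 1, "weekday", "weekend")),
  levels = c("weekday", "weekend", "holiday")
)

# Square-root transform to reduce the right skew of the ride counts.
raw$sqrt_count <- sqrt(raw$count)
```

Putting "weekday" first in `levels` makes it the reference level, so later model coefficients read as differences from a weekday.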

Ride Prediction through Linear Regression Analysis

The model's adjusted R-squared is 0.66669, so it explains roughly two-thirds of the variance in the square-root ride count, a reasonably good fit overall. Below is an overview of the model output: most variables are highly significant predictors of total rides (sqrt). Temperature and Time of Day, which have the largest coefficients, are the strongest variables.

Year

  • Significant rideshare growth from 2011 to 2012

Weather

  • Relative to Clear, Sunny weather, small drop for mist/cloudy weather
  • Big drop for light rain/light snow
  • Significant negative impact for heavy rain/snow/ice pellet days

Day Type

  • Weekend rides are not significantly different from weekday rides
  • Holidays have fewer rides than weekdays and weekends

Season

  • Fall has the most rides, followed by spring and summer; winter has the fewest

Time of Day

  • Compared to midnight (00-03), late afternoon (16-19) has the most rides (evening commute peak), followed by morning (08-11) (morning commute)
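
A minimal sketch of how a model of this shape can be fit and read in R, assuming `lm()` on the square-root response with factor predictors. The data are simulated with arbitrary effects, so the numbers it prints are not the project's results. Only three of the time-of-day levels are included to keep the example short.

```r
# Toy data only: simulated with arbitrary effects, NOT the bikeshare dataset.
set.seed(4)
n <- 600
toy <- data.frame(
  year        = factor(sample(c(2011, 2012), n, replace = TRUE)),
  temp        = runif(n, 0, 35),
  time_of_day = factor(sample(c("00-03", "08-11", "16-19"), n, replace = TRUE),
                       levels = c("00-03", "08-11", "16-19"))  # 00-03 is the reference
)
toy$sqrt_count <- 5 + 2 * (toy$year == "2012") + 0.2 * toy$temp +
  c(0, 6, 9)[as.integer(toy$time_of_day)] + rnorm(n, sd = 2)

fit <- lm(sqrt_count ~ year + temp + time_of_day, data = toy)

adj_r2     <- summary(fit)$adj.r.squared
coef_table <- coef(summary(fit))  # estimate, std. error, t value, p-value

# Factor coefficients are differences from the reference level
# (year 2011, time of day 00-03).
coef_table

# Predictions are on the sqrt scale; square them to get back to rides.
new_obs <- data.frame(
  year        = factor("2012", levels = levels(toy$year)),
  temp        = 25,
  time_of_day = factor("16-19", levels = levels(toy$time_of_day))
)
pred_rides <- unname(predict(fit, newdata = new_obs))^2
```

Each time-of-day coefficient is read relative to the 00-03 reference, which is why the results above are phrased as "compared to midnight".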

Variable Importance Analysis using Decision Tree Model

Level 1 - Time of Day

  • Time of day is the primary driver: most rides are between 08 and 23

Level 2 - Temperature

  • For daytime/evening, temperature is next
  • On cool days, late evening (8–11pm) lifts demand
  • Warmer days have significantly more rides than cooler days

Level 3 - Year

  • Lastly, the year is the deciding variable
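
The write-up does not say which tree implementation was used. As a sketch, assuming the `rpart` package (a recommended package that ships with most R installations), a regression tree and its variable importance can be obtained as follows. The data are simulated so that time of day drives the first split; they are not the bikeshare data.

```r
library(rpart)  # assumed package; the original does not name one

# Toy data only: simulated with arbitrary effects, NOT the bikeshare dataset.
set.seed(5)
n <- 800
toy <- data.frame(
  time_of_day = factor(sample(c("00-03", "04-07", "08-11", "12-15", "16-19", "20-23"),
                              n, replace = TRUE)),
  temp        = runif(n, 0, 35),
  year        = factor(sample(c(2011, 2012), n, replace = TRUE)),
  humidity    = runif(n, 20, 100)
)
night <- toy$time_of_day %in% c("00-03", "04-07")
toy$sqrt_count <- ifelse(night, 4, 12 + 0.2 * toy$temp) +
  1.5 * (toy$year == "2012") + rnorm(n, sd = 1.5)

# method = "anova" fits a regression tree for a numeric response.
tree <- rpart(sqrt_count ~ time_of_day + temp + year + humidity,
              data = toy, method = "anova")

importance <- tree$variable.importance       # named vector, largest first
root_var   <- as.character(tree$frame$var[1])  # variable used at the first split

importance
root_var
```

The first row of `tree$frame` is the root node, so `root_var` corresponds to "Level 1" in the summary above, and deeper splits can be inspected with `print(tree)`.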

Conclusion

Time of Day

  • Time of day dominates. Overnight (00–07) has the fewest riders, the evening commute (16–19) is the peak, followed by morning, mid-day, and late evening (08–11, 12–15, 20–23).
  • Holidays have the fewest rides. However, weekends aren’t much different from weekdays after accounting for the other factors.

Weather

  • Warmth increases rides; bad weather suppresses them. Temperature has a large positive effect.
  • Humidity, wind, and precipitation (rain/snow categories) pull rides down, with precipitation having the largest effect.
  • Even after accounting for temperature, Fall has the highest baseline; Spring/Summer/Winter are lower (capturing remaining seasonal structure).

Year

  • Growth over time: 2012 > 2011.