# Load libraries
import os
import matplotlib.pyplot as plt
import geopandas as gpd
import pandas as pd
import rioxarray as rioxr
The 2017 Thomas Fire started on December 4, 2017 and was not finally declared extinguished until January 12, 2018. It burned 281,893 acres of Santa Barbara and Ventura county, destroyed 1,063 structures, and resulted in the deaths of 1 civilian and 1 firefighter. The fire had far reaching impacts on the surrounding communities both during and after the fire eventually burned out. This analysis attempts to demonstrate the far reaching impacts of the blaze by using Python to visualize the fire’s burn scar, and how air quality was affected before, during, and after the fire.
More information about this anlysis, the code behind it, and data files can be found on github.
Learning Objectives
- Manipulating raster data with rioxarray
- Manipulating vector data with geopandas
- Manipluating data to create accurate and appealing visulizations with matplotlib.pyplot
- Creating an effective and efficent workflow
About the Data
Historic California Fires GeoDatabase: This dataset includes information about all recorded wildifires within the state of California that, according to CalFire, burn “≥10 acres timber, ≥50 acres brush, ≥300 acres grass, damages or destroys three or more structures or does $300,000 worth of damage, or results in loss of life.”. Accessed: November 21, 2024
Landsat Data: A simplified collection of bands (red, green, blue, near-infrared and shortwave infrared) from the Landsat Collection 2 Level-2 atmosperically corrected surface reflectance data, collected by the Landsat 8 satellite. Accessed: November 21, 2024
Air Quality Index (AQI) Data: Air quality index data from the US Environmental Protection Agency (EPA).
Import Libraries
You will need to begin your analysis by loading relevant Python packages.
Load Data
Thomas Fire Boundary
The perimeter of the Thomas Fire was aquired by filtering the Historic California Fires GeoDatabase by year (2017) and fire name (Thomas). This code can be found at the github repository linked above.
# Load Thomas Fire boundary data
= gpd.read_file(os.path.join('data', 'thomas_fire.shp')) thomas_fire
Landsat Data
The landsat
data used for this analysis was pre-cleaned by our course instructor to include only the Santa Barbara and Ventura County areas.
# Load landsat data
= rioxr.open_rasterio(os.path.join('data', 'landsat8-2018-01-26-sb-simplified.nc')) landsat
# Display landsat contents
landsat
<xarray.Dataset> Dimensions: (y: 731, x: 870, band: 1) Coordinates: * y (y) float64 3.952e+06 3.952e+06 ... 3.756e+06 3.755e+06 * x (x) float64 1.213e+05 1.216e+05 ... 3.557e+05 3.559e+05 * band (band) int64 1 spatial_ref int64 0 Data variables: red (band, y, x) float64 ... green (band, y, x) float64 ... blue (band, y, x) float64 ... nir08 (band, y, x) float64 ... swir22 (band, y, x) float64 ...
- y: 731
- x: 870
- band: 1
- y(y)float643.952e+06 3.952e+06 ... 3.755e+06
- axis :
- Y
- crs :
- EPSG:32611
- long_name :
- y coordinate of projection
- resolution :
- -30
- standard_name :
- projection_y_coordinate
- units :
- metre
- _FillValue :
- nan
array([3952395., 3952125., 3951855., ..., 3755835., 3755565., 3755295.])
- x(x)float641.213e+05 1.216e+05 ... 3.559e+05
- axis :
- X
- crs :
- EPSG:32611
- long_name :
- x coordinate of projection
- resolution :
- 30
- standard_name :
- projection_x_coordinate
- units :
- metre
- _FillValue :
- nan
array([121305., 121575., 121845., ..., 355395., 355665., 355935.])
- band(band)int641
array([1])
- spatial_ref()int640
- crs_wkt :
- PROJCS["WGS 84 / UTM zone 11N",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",-117],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",0],UNIT["metre",1],AXIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["EPSG","32611"]]
- semi_major_axis :
- 6378137.0
- semi_minor_axis :
- 6356752.314245179
- inverse_flattening :
- 298.257223563
- reference_ellipsoid_name :
- WGS 84
- longitude_of_prime_meridian :
- 0.0
- prime_meridian_name :
- Greenwich
- geographic_crs_name :
- WGS 84
- horizontal_datum_name :
- World Geodetic System 1984
- projected_crs_name :
- WGS 84 / UTM zone 11N
- grid_mapping_name :
- transverse_mercator
- latitude_of_projection_origin :
- 0.0
- longitude_of_central_meridian :
- -117.0
- false_easting :
- 500000.0
- false_northing :
- 0.0
- scale_factor_at_central_meridian :
- 0.9996
- spatial_ref :
- PROJCS["WGS 84 / UTM zone 11N",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",-117],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",0],UNIT["metre",1],AXIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["EPSG","32611"]]
- GeoTransform :
- 121170.0 270.0 0.0 3952530.0 0.0 -270.0
array(0)
- red(band, y, x)float64...
- add_offset :
- 0.0
- coordinates :
- time
- scale_factor :
- 1.0
- _FillValue :
- 0.0
[635970 values with dtype=float64]
- green(band, y, x)float64...
- add_offset :
- 0.0
- coordinates :
- time
- scale_factor :
- 1.0
- _FillValue :
- 0.0
[635970 values with dtype=float64]
- blue(band, y, x)float64...
- add_offset :
- 0.0
- coordinates :
- time
- scale_factor :
- 1.0
- _FillValue :
- 0.0
[635970 values with dtype=float64]
- nir08(band, y, x)float64...
- add_offset :
- 0.0
- coordinates :
- time
- scale_factor :
- 1.0
- _FillValue :
- 0.0
[635970 values with dtype=float64]
- swir22(band, y, x)float64...
- add_offset :
- 0.0
- coordinates :
- time
- scale_factor :
- 1.0
- _FillValue :
- 0.0
[635970 values with dtype=float64]
The landsat
data has three dimensions (x, y, band), and five variables (red, green, blue, nir08, swir22). The band dimension contains no information and is making the dataset needlessly three dimensional. We will remove the band to make the data 2-dimensional and easier to work with.
# Remove band dimension
# Remove coordinates associated to band
= landsat.squeeze()
landsat = landsat.drop_vars('band')
landsat print(landsat.dims, landsat.coords)
Frozen({'y': 731, 'x': 870}) Coordinates:
* y (y) float64 3.952e+06 3.952e+06 ... 3.756e+06 3.755e+06
* x (x) float64 1.213e+05 1.216e+05 ... 3.557e+05 3.559e+05
spatial_ref int64 0
True Color Image
WE will first be creating a true color image of Santa Barbara County. This means creating a map from the visible bands of light (Red, Green, and Blue), which will result in an image that is similar to what we can see with our own eyes in the real world. This can help us easily interpret large land and atmospheric features.
# Plot
= plt.subplots(figsize=(10, 8))
fig, ax
# Create a true color (RGB) image of Santa Barbara County
# Select the "red", "green", and "blue" variables
'red', 'green', 'blue']].to_array().plot.imshow()
landsat[[
# Add title
"True Color Image of Santa Barbara County and Surrounding Areas")
ax.set_title(
# Remove axes ticks
ax.set_xticks([])
ax.set_yticks([])
# Remove axes labels
"")
ax.set_xlabel("")
ax.set_ylabel(
plt.show()
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
Yikes! We have a white map, with no colors! Did we do something wrong?
Lets try setting the robust parameter to true and see if that helps.
# Plot
= plt.subplots(figsize=(10, 8))
fig, ax
# Create a true color (RGB) image of Santa Barbara County
# Set the `robust` parameter to `True`
'red', 'green', 'blue']].to_array().plot.imshow(robust=True)
landsat[[
# Add title
"True Color Image of Santa Barbara County and Surrounding Areas")
ax.set_title(
# Remove axes ticks
ax.set_xticks([])
ax.set_yticks([])
# Remove axes labels
"")
ax.set_xlabel("")
ax.set_ylabel(
plt.show()
That is much better!
The cloud RGB values are outliers compared to the values of the rest of the image. This causes the rest of the values to be squished when plotting which results in not being able to see anything in the resulting image. The robust parameter pulls weight away from these outliers, allowing the rest of the image to scale properly.
False Color Image
The purpose of a false color image is to assign bands of light not visible to the human eye to the visible spectrum, where we can create an image that allows a burn scar to stand out amongst healthy vegetation. Using near infrared (NIR) bands of light is an excellent way to map vegetation because healthy vegeation tends to reflect higher levels of NIR bands than dead or dying vegetation. In this false color image we have assigned the shortwave infrared band to the red channel, which means bare ground and dead vegetation is going to appear as red on our map. We have assigned the NIR band to the green channel, which means healthy vegetation will appear as green on our map.
# Plot
= plt.subplots(figsize=(10, 8))
fig, ax
# Create a false color image of Santa Barbara County
'swir22', 'nir08', 'red']].to_array().plot.imshow(robust=True)
landsat[[
# Add title
"False Color Image of Santa Barbara County and Surrounding Areas")
ax.set_title(
# Remove axes ticks
ax.set_xticks([])
ax.set_yticks([])
# Remove axes labels
"")
ax.set_xlabel("")
ax.set_ylabel(
plt.show()
False Color Map of the 2017 Thomas Fire
Now that we have a false color image of Santa Barbara county and the surrounding area, we need to add the boundary of the 2017 Thomas Fire to see if it lines up with what we think the false color image is telling us about where there is burned vegetation.
But first we must check that the CRS of the thomas fire shapefile matches the CRS of the landsat
data.
# Confirm the CRS of the data match
if thomas_fire.crs == landsat.rio.crs:
print("The CRS match.")
else:
= thomas_fire.to_crs(landsat.rio.crs)
thomas_fire print("The CRS match.")
The CRS match.
Now that we have confirmed the CRS match, we can add the Thomas Fire boundary to our false color image.
# Plot
= plt.subplots(figsize=(10, 8))
fig, ax
# Add landsat data
'swir22', 'nir08', 'red']].to_array().plot.imshow(ax=ax,
landsat[[=True)
robust
# Add Thomas Fire Perimeter
=ax,
thomas_fire.geometry.boundary.plot(ax='darkred',
color=1.5,
linewidth="Fire Perimeter")
label
# Add legend
='small')
ax.legend(fontsize
# Add title
"False Color Image of 2017 Thomas Fire Burn Scar")
ax.set_title(
# Remove axes ticks
ax.set_xticks([])
ax.set_yticks([])
# Remove axes labels
"")
ax.set_xlabel("")
ax.set_ylabel(
# Add access date
0.1, 0.05,
plt.figtext("Accessed: November 21, 2024",
=8,
fontsize="grey")
color
plt.show()
Visualizing AQI during the 2017 Thomas Fire in Santa Barbara County
In the following analysis we will use Air Quality Index (AQI) from the Environmental Protection Agency (EPA) to visualize the impact on Santa Barbara County air quality by the 2017 Thomas Fire.
We first load the daily AQI data from 2017 and 2018 and combine the two data frames into a single dataframe. This will allow us to more easily visulaize our data. After we have combined our data frames, we can filter for Santa Barbara County and keep only the columns we need in our resulting data frame.
# Read in data
= pd.read_csv('data/daily_aqi_by_county_2017.csv',
aqi_17 =['Date'], # Set the index to be the `Date` column
index_col=['Date']) # Update the `Date` column to a `pandas.datetime` object
parse_dates= pd.read_csv('data/daily_aqi_by_county_2018.csv',
aqi_18 =['Date'],
index_col=['Date'])
parse_dates
# Combine the `aqi_17` and `aqi_18` dataframes into a single dataframe
= pd.concat([aqi_17, aqi_18])
aqi
# Simplify column names
= (aqi.columns
aqi.columns str.lower()
.str.replace(' ','_')
.
)
# Select data only from Santa Barbara County
= (aqi[aqi['county_name'] == 'Santa Barbara']
aqi_sb # Remove the `state_name`, `county_name`, `state_code`, and `county_code` columns
= ['state_name', 'county_name', 'state_code', 'county_code'])
.drop(columns )
We can now calculate a 5 day rolling average of our air quality data in Santa Barbara County and add it as a new column in our aqi_sb
data frame.
# Calculate AQI rolling average over 5 days and add to a new column
'five_day_average'] = aqi_sb['aqi'].rolling('5D').mean() aqi_sb[
Create a plot of our AQI five day rolling average and the unmanipulated AQI.
# Line plot
=['aqi', 'five_day_average'],
(aqi_sb.plot(y='AQI Pre and Post the Thomas Fire in December 2017',
title ='Month, Year',
xlabel='Air Quality Index (AQI)',
ylabel= {'aqi':'#9ED8DB',
color 'five_day_average': '#467599'
}'AQI','Five Day Rolling Average'])
).legend([ )
Higher AQI indicates poor air quality and we can clearly see a spike in AQI during December 2017 when the Thomas Fire was burning. The five day rolling average helps to smooth our data and eliminate the showiness of outlier AQI data points. Even with this smoothing, there is a distinguishable spike in AQI in December 2017. This figure also indicates there was not a long term impact to AQI after the Thomas Fire was extinguished.
References
CalFire. Updated April 2023. https://www.fire.ca.gov/what-we-do/fire-resource-assessment-program/fire-perimeters. Accessed: November 16, 2024
Landsat Collection 2 Level-2, Microsoft Open Source, Matt McFarland, Rob Emanuele, Dan Morris, & Tom Augspurger. (2022). microsoft/PlanetaryComputer: October 2022 (2022.10.28). Zenodo. https://planetarycomputer.microsoft.com/dataset/landsat-c2-l2. Accessed: November 16, 2024
U.S. Enivornmental Protection Agency. (2024). Air Quality Index Daily Values Report: July 2024 (2024.07.23). https://www.epa.gov/outdoor-air-quality-data/air-quality-index-daily-values-report. Accessed: October 22, 2024