electric-sector-wildfire

Code, source data, and workpapers associated with "Dynamic Grid Management Technologies Reduce Electric-Power Sector Wildfire Adaptation Costs"

Detailed documentation of the analysis code is provided below and is being updated over time.

Overview

This repository contains the analysis code and data inputs needed to replicate the cost-effectiveness estimates of wildfire adaptation measures in the electric-power sector. The code is organized into the following sections:

  1. Setup
  2. Load and clean data
  3. Estimate risk models
  4. Estimate structures burned
  5. Analyze cost-effectiveness

Setup

The analysis is written in R and run using RStudio.

Master file

The first script to run is 1 master. This script loads the required packages and specifies the relevant file paths, color schemes, and the main plotting theme. Further down, the full sequence of scripts is listed in commented-out form. It is recommended to open and run each of these scripts individually; some can take a significant amount of time to run.

This file also contains a switch named SWITCH_NEW_LOAD. When set to TRUE, processes that only need to be run once are executed. By default the value is set to FALSE so the user does not have to re-run the initial steps that take a significant amount of time.
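As a rough illustration (not the repository's exact code), the switch gates expensive one-time steps in this fashion; run_one_time_step is a hypothetical placeholder:

  # Hypothetical illustration; the actual gated steps live in the individual scripts
  SWITCH_NEW_LOAD <- FALSE  # set to TRUE to rebuild from raw inputs
  if (SWITCH_NEW_LOAD) {
    heavy_output <- run_one_time_step()           # hypothetical slow step
    save(heavy_output, file = "heavy_output.RData")
  } else {
    load("heavy_output.RData")                    # re-use the saved result
  }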

Miscellaneous functions

The next script is 1 misc_functions. This script defines miscellaneous helper functions for regression formatting and confidence intervals. It also provides a crosswalk from CAL FIRE regions to counties.

Initiate spatial data

The next script, 1 iniate_spatial_boundaries, imports the base GIS spatial files, such as county, utility, and state boundaries, as well as the GIS data on distribution circuits. This script only needs to be run once.

Load and clean data

This next section of code imports datasets on ignitions, vegetation, weather, distribution circuit characteristics, and circuit-level mitigation measures. The final script in this section compiles these different data sources into one dataset for the regression and cost-effectiveness analysis.

Conductor covariates

This script 2a load_conductor_covariates imports data on the age, length, and wind-speed rating of distribution circuits.

Hardening data

This script 2b load_hardening_data imports circuit-year data on vegetation management and system hardening. Weekly project progress reports provided by the utility are used to convert the circuit-year data to the circuit-day level. For one year, vegetation management is reported in different units: the utility reports trees worked rather than miles of vegetation management completed. A crosswalk is created using data from prior years in which both miles of vegetation management and trees worked are provided at the circuit level.

Weather data

This script 2c load_weather_covariates imports gridded weather data from GridMET. It first creates a crosswalk that intersects the locations of distribution circuits with the grid cells in which the raster weather data are provided. This step only needs to be run once, or it can be pre-loaded when SWITCH_NEW_LOAD is set to FALSE. When running from scratch (SWITCH_NEW_LOAD set to TRUE), the script uses parallel processing to reduce processing time. It automatically detects the total number of cores available on the user's machine and uses all of them:

  # Cores for parallel processing
  no_cores <- detectCores()
  registerDoParallel(no_cores)

This approach to parallel processing is used several times throughout the analysis. Be aware that these steps will use all available CPU cores on the machine. To avoid this, the user can set no_cores to a value below the machine's total core count, at the cost of slower computation.
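For example, to leave one core free for other work, the two lines above could be replaced with:

  # Leave one core free (a common, conservative choice)
  no_cores <- max(1, detectCores() - 1)
  registerDoParallel(no_cores)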

The next section of this script loops through each weather variable (e.g., vapor pressure deficit, wind speed) and each year of weather data to calculate an average daily value across the grid cells that a given distribution circuit intersects.

Load vegetation

This script 2d load_vegetation imports data from USGS LANDFIRE on tree canopy height. The tree canopy data is provided in a raster format. This script calculates average and max canopy height for each distribution circuit.

Load ignitions

This script 2e load_ignitions imports data from the California Public Utilities Commission and data produced from wildfire mitigation plans on powerline-caused ignitions. The annual files containing ignition data have changed formats over time, so data cleaning is required when compiling the different sources.

Some years include the name of the distribution circuit associated with an ignition in addition to the ignition's latitude and longitude. In other cases, only the lat/long is provided to identify the location. In those cases, this script finds the nearest distribution circuit to the point location; if no distribution circuit is in close proximity, the ignition is discarded.
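A minimal sketch of this kind of nearest-feature match using the sf package (object names and the distance threshold are illustrative; the actual script may differ):

  library(sf)
  # ignitions and circuits are sf objects in a projected CRS (units: meters)
  nearest <- st_nearest_feature(ignitions, circuits)                  # index of closest circuit
  dist_m  <- st_distance(ignitions, circuits[nearest, ], by_element = TRUE)
  ignitions$circuit_id <- circuits$circuit_id[nearest]
  ignitions <- ignitions[as.numeric(dist_m) < 500, ]                  # drop ignitions too far away (threshold illustrative)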

Another step of this script is to identify which ignitions occurred when fast-trip settings were enabled. Lists of these ignitions are provided for 2021, 2022, and 2023 based on documents filed in the wildfire mitigation plan proceedings.

Load PSPS and fast-trip

The next script 2f load_PSPS.R imports the workbook that contains data on PSPS events. The names of circuits that experienced PSPS events appear to be manually entered into the source data, so some data cleaning is needed to match the names of de-energized circuits with the GIS data on distribution circuits.

Data on fast-trip events is loaded from monthly reports on fast-trip events provided to the CPUC. Some fast-trip data is also sourced from WMP filings.

Because the PSPS and fast-trip data include a start time and a restoration time, the script pro-rates the outage duration across circuit-days when an outage extends past midnight into one or more additional days.
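A simplified sketch of splitting an outage interval at midnight using lubridate (the production code may handle multi-day spans differently):

  library(lubridate)
  start    <- ymd_hm("2022-09-05 22:00")
  restore  <- ymd_hm("2022-09-06 03:00")
  midnight <- ceiling_date(start, "day")
  # Hours allocated to each circuit-day
  hours_day1 <- as.numeric(difftime(midnight, start, units = "hours"))    # 2 hours
  hours_day2 <- as.numeric(difftime(restore, midnight, units = "hours"))  # 3 hours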

PSPS events can occur at a more granular spatial and temporal resolution than our unit of analysis, which is a circuit-day. In some cases, we observe ignitions on circuit-days when PSPS events occur. This script attempts to identify ignitions that occur outside of PSPS windows by using the timestamp of the ignition event and the PSPS event.

Compile dataset

The last script 2g compile_dataset.R in this section compiles all the various datasets described above into a single dataset for analysis.

As is the case with the PSPS event data, some cleaning of circuit names is required when merging the ignition data with the GIS data. This is needed because it appears some circuit names are manually entered into the source data, creating some typos and some inconsistencies in circuit name conventions.

The script also transforms the hardening and vegetation management data to a cumulative basis anchored to 2018. For example, if a circuit received 10 miles of undergrounding in 2019 and 5 miles in 2020, its cumulative value at the end of 2020 is 15 miles.
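In dplyr terms, this transformation is essentially a grouped cumulative sum (column names here are illustrative assumptions):

  library(dplyr)
  hardening <- hardening %>%
    filter(year >= 2018) %>%
    group_by(circuit_id) %>%
    arrange(year, .by_group = TRUE) %>%
    mutate(ug_miles_cum = cumsum(ug_miles_new))  # e.g., 10 in 2019 + 5 in 2020 = 15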

The units of some treatment variables are transformed in this script. For example, the length of the circuit is measured in hundreds of miles rather than miles.

Two versions of the main analysis dataset are exported in this step. One of them, "regression_dataset_full.RData", includes additional columns that are not critical to the remaining analysis. The other one, "regression_dataset_clean_full.RData", drops some of the unneeded columns for faster loading and computation.

Estimate risk models

This section of the analysis primarily addresses the statistical models used to predict baseline ignition probability and to estimate the mitigation effectiveness coefficients.

Ignition risk model

The first script here, 3a risk_score.R, trains and tests the random forest model that predicts ignitions at the circuit-day level.

The first part of the script cleans the main analysis dataset and prepares it for training and testing. The user has the option to focus on equipment-caused ignitions if interested, but the primary workflow of this analysis focuses only on vegetation-caused ignitions (see discussion in the paper).

Various hyperparameters are tuned, and the user can specify alternative inputs if desired. Down-sampling is performed to create better balance between positive and negative ignition events. The default approach uses 3-repeat 10-fold cross-validation, and the random forest model considers 3, 6, or 9 features at each split. The hyperparameter values that yield the best performance, measured by AUC, are ultimately selected.
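With the caret package, the setup described above would look roughly like the following (the outcome and dataset names are assumptions, not the repository's exact code):

  library(caret)
  # ignition must be a two-level factor (e.g., "yes"/"no") for classProbs to work
  ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3,  # 3-repeat 10-fold CV
                       sampling = "down",                                # down-sample the majority class
                       classProbs = TRUE,
                       summaryFunction = twoClassSummary)                # report ROC/AUC
  rf_fit <- train(ignition ~ ., data = train_data,                      # names are illustrative
                  method = "rf", metric = "ROC",
                  tuneGrid = expand.grid(mtry = c(3, 6, 9)),             # 3, 6, or 9 features per split
                  trControl = ctrl)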

The user has the option to re-run the random forest model or to load the existing model object, since computation time can be long here.

Performance statistics are generated after testing the model on the testing data. The model is then evaluated across all circuit-days. Various plots are generated, including the confusion matrix and feature importance.

Fast-trip enablement

The next script 3b high_risk_days.R determines the circuit-days on which fast-trip settings are enabled. It uses publicly available data on the utility's fire potential index (FPI) during ignition events. The script trains a model on a subset of those ignition events, along with detailed fire weather covariates, to predict whether or not the utility's FPI was at level three (R3) or greater. R3 or greater is the criterion for enabling fast-trip settings, though in some cases the utility may enable fast-trip settings at a lower level if certain conditions are met.

The model is then evaluated on all circuit-days, even prior to 2021 when fast-trip settings were not deployed yet. The regression model described next will use these historical days when the fire potential index was R3 or greater to measure the effectiveness of fast-trip settings.

Matching and logistic regression

The next script, 3c, implements the matching procedure and fits the logistic regression model to estimate mitigation effectiveness.

The first step identifies circuits that received high or moderate levels of enhanced vegetation management (the code calls these "doses" in the style of the Callaway and Sant'Anna 2022 difference-in-differences model). The next step prepares the regression dataset in the same fashion as in the ignition probability prediction model. Other data preparation steps include merging in the FPI data and the predicted baseline ignition probability from the random forest model. Indicators are then constructed to reflect when fast-trip settings are enabled. Recall from the paper that some circuits were initially piloted with fast-trip settings in July 2021, and the remaining HFTD circuits had fast-trip settings enabled in 2022.

Various treatment variables are also constructed, including indicator variables for whether the circuit is in the high or the moderate vegetation management tranche.

Next, a logistic regression model is specified prior to any matching. This regression result is shown in the first column of the primary regression table in the main text of the paper, and it is used to illustrate the potential bias from differences in baseline risk prior to matching.

The following step implements the matching technique, in which circuits are matched to their nearest neighbors on the basis of average predicted ignition probability. The user can pick the number of matches (the default is n = 1). Robustness results with n = 2 are shown in the supplementary regression tables; in other words, each treated circuit is matched to its two nearest neighbors in terms of average predicted ignition probability. The user can also change the caliper size of the match (the default is std = 0.1), meaning a match is only successful if the nearest neighbor's average ignition probability is within 0.1 standard deviations of the sample's ignition probability. The matching uses replacement, so a control circuit can be matched to more than one treated circuit.
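With the MatchIt package, a nearest-neighbor match of this form could be expressed as follows (the treatment indicator and matching-score column names are illustrative):

  library(MatchIt)
  m <- matchit(treated ~ avg_pred_ignition_prob, data = circuits_df,
               method  = "nearest",
               ratio   = 1,      # n = 1 nearest neighbor (use 2 for the robustness check)
               caliper = 0.1,    # in standard-deviation units of the matching variable
               replace = TRUE)   # controls can be re-used across treated circuits
  matched_df <- match.data(m)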

Once the control circuits have been matched to the treated circuits, the logistic regression model is run through several iterations. This includes a version where only circuits in the high vegetation management (and their matched control counterparts) are considered and a version where only the moderate vegetation management group is considered. The results for the high vegetation management group are shown in the second column of the regression table in the main text. The next set of regression models subset the sample to high-risk days (FPI of R3 or greater). These results are shown in the third column of the regression table in the main text.

All of the regression models are stored and saved in the rscore_models.RData object and the regression dataset and matched groups are saved in the reg_matched_data.RData object.

The remaining steps in the script transform the coefficients into incidence rates and transform the standard errors into confidence intervals using heteroskedasticity-robust standard errors. Performance statistics are also computed to assess the goodness-of-fit of the logistic regression models (area under the receiver operating characteristic curve is used to assess performance).
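This kind of transformation typically combines the sandwich and lmtest packages; a generic sketch, where logit_model stands in for the fitted glm object:

  library(sandwich); library(lmtest)
  rob <- coeftest(logit_model, vcov = vcovHC(logit_model, type = "HC1"))
  # Incidence rate ratios and 95% confidence intervals
  irr   <- exp(rob[, "Estimate"])
  ci_lo <- exp(rob[, "Estimate"] - 1.96 * rob[, "Std. Error"])
  ci_hi <- exp(rob[, "Estimate"] + 1.96 * rob[, "Std. Error"])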

The script prints the results in a format that is friendly for Overleaf/LaTeX. If a simple text table is desired, the user can change the argument type = 'latex' to type = 'text' in the stargazer command.
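For example, assuming the fitted models are stored in a list named reg_models:

  stargazer(reg_models, type = "text")   # instead of type = "latex"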

Further below in the script the regression results are re-estimated for robustness, including adding regional fixed effects and matching on two control circuits (n = 2) in the matching technique.

Estimate structures burned

This section provides documentation for estimating the count of structures burned for each HFTD distribution circuit.

Intersect structures

The first of two scripts, 4a intersect_structures.R, estimates the number of structures potentially impacted surrounding HFTD distribution circuits. The script begins by loading the GIS circuit data and subsetting it to circuits with at least 10 miles in the HFTD. Using the function simulateIgnitions, the script randomly drops 5 ignition points along each distribution circuit to assess potential structure impacts. These ignition locations are saved as ignitions_pyrologix.RData.

The next step uses the function intersectHousing to draw buffers around each simulated ignition point and intersect the buffers with structure locations and building loss factors. Building loss factors are sourced from the Conditional Risk to Potential Structures dataset, which uses simulated flame lengths at each pixel to assess the likelihood of a structure being lost if a fire were to occur in that pixel. The building loss factors do not take into account structure-specific defensive measures or construction materials. More on the methodology can be found in the paper.
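The buffer-and-intersect step can be sketched with the sf package (the buffer distance and object names are illustrative assumptions):

  library(sf)
  # sim_ignitions and structures are sf objects in a projected CRS (units: meters)
  buffers  <- st_buffer(sim_ignitions, dist = 1000)   # buffer around each simulated ignition
  hits     <- st_intersects(buffers, structures)      # structures falling inside each buffer
  n_struct <- lengths(hits)                           # potential structure count per ignition point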

Because this analysis step is computationally intensive, the intersectHousing function was run in four chunks across distribution circuits. Each output is saved in the sub-folder labeled Missoula Fire Lab.

Estimate structures

The second script in this section 4b estimate_structures.R uses empirical data on wildfire events in the utility's service territory to predict the probability that a given ignition will spread into a wildfire of a given size (e.g., less than 10 acres, greater than 10,000 acres). To do so, a random forest model is trained where the target variable is the wildfire size class. See the methods section in the paper for more detail.

Analyze cost-effectiveness

The last section of the analysis code addresses cost-effectiveness of mitigation measures and generates many of the plots shown in the paper.

Cost-effectiveness

The first script of this section, 5a cost_effectiveness.R, contains the primary analysis that generates the cost-effectiveness bar chart in the main text of the paper. The first part of the script specifies key parameter values, including the cost of capital, discount rate, asset life of the underground line, and so forth. Value of lost load (VOLL) parameters are also specified at the beginning of this script.

The next part merges in the estimated structure losses for each circuit-day. Outage information for PSPS and fast-trip is tabulated to determine the share of customer classes (e.g., residential, industrial) that are typically impacted by an outage event.

Cost parameters for the mitigation measures are specified next. For more information about the sources for these cost parameters, please see the methods section of the paper.

The getUnits function creates a time-series dataset of the mitigation measures that extends into the future as far out as the asset's lifetime. In other words, to effectively model the costs and risk reduction of undergrounding, which can have an asset life of four or five decades, the analysis must create a circuit-day dataset for each circuit up to 50 years beyond 2023. Needless to say, this creates a very large dataset in which hundreds of distribution circuits have daily observations for 50 years. For cost accounting, this step also depreciates the undergrounding asset over its useful life. Straight-line (linear) depreciation is assumed. This is important because the electric utility earns a regulated rate of return that is charged to ratepayers on the non-depreciated component of the undergrounding asset.
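The straight-line depreciation and return-on-rate-base logic reduces to simple arithmetic; a sketch with illustrative parameter values (not the repository's calibrated inputs):

  capex    <- 3e6                               # illustrative undergrounding investment ($)
  life     <- 50                                # asset life (years)
  r_wacc   <- 0.08                              # authorized rate of return
  yr       <- 1:life
  book_val <- capex * (1 - (yr - 1) / life)     # undepreciated balance at the start of each year
  ret_yr   <- r_wacc * book_val                 # annual return earned on the remaining rate base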

The getWeatherTimeSeries function then randomly draws weather observations from the historical dataset to populate the expanded time-series dataset created by getUnits. Here, the user can specify the rate at which fire weather increases in future years due to climate change using the risk_by_2050 argument. The default value is risk_by_2050 = 0.5, which indicates that in the year 2050 only the 50th percentile of fire-risk weather is used when randomly sampling, with the percentile increasing linearly from 0 to 50 between 2024 and 2050. The arguments ignition_switch and combined_risk_switch determine how the percentiles are assessed. If ignition_switch is used, percentiles are based on ignition probability. If neither ignition_switch nor combined_risk_switch is used, percentiles are based on the probability of a wildfire exceeding 10 acres. If combined_risk_switch is used, percentiles are based on the product of ignition probability and the probability of a wildfire exceeding 10 acres. More detail can be found in the methods section of the paper. This function is run over several iterations using different asset lifetimes (30-, 40-, and 50-year underground asset lifetimes) and different climate change modifiers (25th, 50th, and 75th percentiles). Given the computation time of this procedure, intermediate products are saved in the Intermediate Climatology sub-folder.

The constructDataset function is an intermediate step that combines the time-series dataset from getUnits with the sampled weather data from getWeatherTimeSeries.

The estimateIgnitions procedure then uses the regression coefficients estimated earlier to predict the count of ignitions under different mitigation deployments. The procedure first estimates ignitions assuming no mitigations are deployed as a baseline. It then estimates ignitions separately when vegetation management is deployed, fast-trip settings are enabled, PSPS is activated, and undergrounding is deployed. Structure losses associated with each ignition are calculated for each circuit-day by multiplying the estimated probability of each wildfire size class by the average size in acres of that class and by the structure losses per acre obtained in the previous section of the analysis. See the methods in the paper for more detail on the calculation. Avoided ignitions are calculated as the difference between baseline estimated ignitions and estimated ignitions with the respective mitigation measure deployed. Avoided structure losses are calculated as avoided ignitions multiplied by the expected structure losses for that circuit-day. Avoided ignitions and structure losses for vegetation management are depreciated over the lifetime of the vegetation management asset, since vegetation grows back over time. Finally, avoided ignitions and structure losses are summarized at the annual level for the cost-effectiveness assessment.
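Conceptually, the expected structure loss for a circuit-day is a probability-weighted sum over wildfire size classes (the variable names here are illustrative):

  # p_class: predicted probability of each size class; acres_class: average acres per class
  expected_struct_loss <- sum(p_class * acres_class * struct_loss_per_acre)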

The getAnnualInvestment function summarizes the circuit-day time-series data on vegetation management and undergrounding to an annual basis to align with the estimated ignitions and structure losses.

The calculateCosts function calculates the total costs of each mitigation measure under various cost parameterizations. A description of the key parameters/arguments to this function follows:

  • r_disc is the real social discount rate
  • r_wacc is the authorized rate of return the utility earns on capital investment
  • c_veg is the cost per mile of vegetation management
  • c_ug is the cost per mile of undergrounding
  • c_epss_hr is the cost to the utility of fast-trip outages per customer-hour of outage (think of these as patrol and re-energization costs the utility incurs)
  • c_psps_hr is the equivalent of c_epss_hr for PSPS
  • c_routine_veg is the per-mile cost of routine vegetation management work (this is used as an avoided cost for undergrounding)
  • maint_veg and maint_ug are the operations and maintenance costs expressed as a share of per-mile costs
  • share_overhead_ug is the ratio of overhead line length that underground lines replace (a mile of overhead line is shorter than a replacement mile of underground line because overhead lines can more easily avoid surface and below-surface obstacles)
  • year_pv is the base year to calculate present value for net present value calculations
  • lost_load contains the value of lost load assumptions by customer class
  • outages is a dataframe containing the customer-hours of fast-trip and PSPS outages for reliability cost calculations

The first part of the function calculates reliability costs using lost_load and outages. The next part calculates undergrounding costs, which includes both the total capital investment and the rate of return earned on the remaining useful component of the undergrounding asset each year. The components of these costs that accrue in future years are discounted.

Maintenance costs for both vegetation management and undergrounding are calculated over their lifetimes and discounted as well. Additionally, avoided routine vegetation management costs are calculated for underground circuit-miles.

The following part of the procedure discounts the avoided ignitions and avoided structure losses that occur in future years. Lastly, discounted costs, discounted avoided ignitions, discounted avoided structure losses, and reliability costs are summed across their lifetimes to put them on a 2023 NPV basis. The cost-effectiveness of each measure is then calculated, with the numerator being the mitigation cost, including reliability impacts and avoided routine vegetation management costs, and the denominator being either avoided ignitions or avoided structure losses. This is done for each mitigation measure, as well as for a combined approach where PSPS and fast-trip are considered together. The results of this procedure are shown in the main cost-effectiveness bar chart in the paper.
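The discounting step follows the standard present-value formula; in R terms (variable names illustrative):

  # Discount an annual stream back to the base year (year_pv = 2023)
  pv <- sum(annual_values / (1 + r_disc)^(years - year_pv))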

The final piece of this script re-runs the calculateCosts function with different sensitivities. These sensitivities make up the upper and lower ends of the whiskers shown in the cost-effectiveness bar plot. This last step takes a significant amount of time to run because the dataset is quite large, spanning up to 50 years into the future, and many iterations are required to test sensitivities related to the discount rate, future climate change impacts, per-mile costs, and so forth.

Underground curve

The next component of the cost-effectiveness section, 5b underground_curve.R, takes a similar approach to the previous cost-effectiveness script. However, in this case, rather than creating point estimates for the cost-effectiveness bar chart using the actual amount of undergrounding deployed by the utility, this analysis constructs counterfactuals where zero to all overhead HFTD circuit-miles are converted to underground, and cost-effectiveness estimates are produced along a curve for each of these points. This script produces the results necessary to generate the curves in the final plot of the main text of the paper.

The mergeStructuresReshape function merges in potential structure losses for each circuit-day with the main time-series datasets that contain sampled weather days at various levels of climate change modifiers.

The estimateUndergroundingBenefit function calculates the avoided structures burned for each distribution circuit individually, assuming it were placed underground. As part of this calculation, avoided outage impacts in terms of customer-hours are assessed for each distribution circuit. Next, circuits are ranked in terms of the cost-effectiveness of undergrounding.

The estimateUndergroundingCost function then calculates the costs of undergrounding each of these distribution circuits. As a part of this analysis, we assume fast-trip outages and PSPS outages increase in step with climate change trajectories. See methods of the paper for more detail. Thus, undergrounding will avoid increased fast-trip and PSPS outages in future years.

The final function, costEffectiveness, combines the avoided structures for each distribution circuit with the estimated costs (and avoided costs) of undergrounding that distribution circuit. Circuits are then ranked from most to least cost-effective to generate the curves.

Next, the monteCarlo function re-runs this analysis 200 times, with each draw including a different discount rate, value of lost load, effectiveness of fast-trip settings, undergrounding cost per-mile, PSPS and fast-trip hourly cost to the utility, and climate change modifier. The results of the Monte Carlo analysis generate the shaded confidence intervals seen in the curves in the main text of the paper.
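A stylized sketch of the parameter draws (the distributions and ranges here are illustrative assumptions, not the repository's calibrated values):

  set.seed(1)
  n_draws <- 200
  draws <- data.frame(
    r_disc       = runif(n_draws, 0.02, 0.07),                            # discount rate
    c_ug         = runif(n_draws, 2e6, 5e6),                              # undergrounding $/mile
    risk_by_2050 = sample(c(0.25, 0.50, 0.75), n_draws, replace = TRUE)   # climate modifier
  )
  # Each row of draws is then passed through the cost-effectiveness pipeline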

Main figures

This script draws on many of the analysis outputs to create the main plots relied upon in the paper.

Other plots

This script generates the plots shown in the supplementary material and appendices.
