Augmented Synthetic Control Method

The Augmented Synthetic Control Method is due to Ben-Michael, Feller & Rothstein [BMFR21] and adapts the Synthetic Control Method in an effort to adjust for poor pre-treatment fit.

The authors do this by adding a correction term to the Synthetic Control Method estimate. The term is the imbalance, between the treated unit and the synthetic control, in a particular function of the pre-treatment outcomes. In the Ridge Augmented Synthetic Control Method this function is linear in the pre-treatment outcomes and is fit by ridge regression of the control units' post-treatment outcomes on their pre-treatment outcomes.

In particular, the method constructs a vector of weights \(w = (w_1, w_2, \dots, w_k)\) such that

\[w = w_\mathrm{scm} + w_\mathrm{aug},\]

where \(w_\mathrm{scm}\) are the weights obtained from the standard Synthetic Control Method and \(w_\mathrm{aug}\) are augmentations that are included when the treated unit lies outside the convex hull defined by the control units. The weights may be negative or larger than 1; the degree of extrapolation is controlled by a ridge parameter \(\lambda\).

In general, the pre-treatment fit obtained with these weights is at least as good as that obtained with the standard Synthetic Control Method.
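The decomposition above can be made concrete with a minimal NumPy sketch. It assumes the closed-form ridge augmentation described in [BMFR21], with the pre-treatment outcomes of the controls stacked in a matrix `X0` (one row per control unit) and those of the treated unit in a vector `X1`. The function name `ridge_augmented_weights`, the toy data and the SCM weights `w_scm` are illustrative assumptions and not part of pysyncon; the sketch also omits any centering or covariate handling.

```python
import numpy as np


def ridge_augmented_weights(X0, X1, w_scm, lam):
    """Return w_scm + w_aug for a ridge parameter lam > 0 (illustrative only).

    X0: (n_controls, n_pre) pre-treatment outcomes of the control units.
    X1: (n_pre,) pre-treatment outcomes of the treated unit.
    """
    # Pre-treatment misfit left over by the plain SCM weights.
    imbalance = X1 - X0.T @ w_scm
    n_pre = X0.shape[1]
    # Ridge solve: (X0' X0 + lam * I)^{-1} imbalance
    ridge_solution = np.linalg.solve(X0.T @ X0 + lam * np.eye(n_pre), imbalance)
    w_aug = X0 @ ridge_solution
    return w_scm + w_aug


# Toy data: the treated unit (4, 5) lies outside the convex hull of the controls.
X0 = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]])
X1 = np.array([4.0, 5.0])
w_scm = np.array([0.0, 0.0, 1.0])  # hypothetical SCM weights (non-negative, sum to 1)

for lam in (0.1, 1.0, 100.0):
    w = ridge_augmented_weights(X0, X1, w_scm, lam)
    misfit = np.linalg.norm(X1 - X0.T @ w)
    print(f"lambda={lam}: weights={np.round(w, 3)}, pre-treatment misfit={misfit:.3f}")
```

In this sketch, smaller values of \(\lambda\) allow more extrapolation, so the weights move further outside [0, 1] and the pre-treatment misfit shrinks, while large values pull the weights back towards \(w_\mathrm{scm}\).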

The AugSynth class

The AugSynth class implements the Ridge Augmented Synthetic Control Method. The expected way to use the class is to first create a Dataprep object that defines the study data and then use it as input to an AugSynth object. See the examples folder of the repository for examples illustrating usage.

The implementation is based on the same method in the R augsynth package and aims to produce results that can be reconciled with that package.

class pysyncon.AugSynth

Implementation of the augmented synthetic control method due to Ben-Michael, Feller & Rothstein [BMFR21].

The implementation follows the augsynth R package with the option progfunc="Ridge".

att(time_period: Iterable | Series | dict, Z0: DataFrame | None = None, Z1: Series | None = None) dict[str, float]

Computes the average treatment effect on the treated unit (ATT) over the chosen time period, together with its standard error.

Parameters:
  • time_period (Iterable | pandas.Series | dict) – Time period to compute the ATT over.

  • Z0 (pandas.DataFrame, shape (n, c), optional) – The matrix of the time series of the outcome variable for the control units. If no dataprep is set, then this must be supplied along with Z1, by default None.

  • Z1 (pandas.Series, shape (n, 1), optional) – The matrix of the time series of the outcome variable for the treated unit. If no dataprep is set, then this must be supplied along with Z0, by default None.

Returns:

A dictionary with the ATT value and the standard error of the ATT.

Return type:

dict

Raises:
  • ValueError – If there is no weight matrix available

  • ValueError – If there is no Dataprep object set or (Z0, Z1) is not supplied
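As a rough illustration of what the returned dictionary summarises, the sketch below computes a mean gap and a standard error with NumPy. It assumes that the ATT is the mean of the treated-minus-synthetic gaps over the chosen periods and that the standard error is the sample standard deviation of those gaps divided by the square root of the number of periods. The function name `att_sketch`, the dictionary keys and the numbers are illustrative, and the library's exact computation may differ.

```python
import numpy as np


def att_sketch(Z0, Z1, w, periods):
    """Mean gap and its standard error over the selected rows (illustrative)."""
    # Gap = treated outcome minus weighted combination of control outcomes.
    gaps = Z1[periods] - Z0[periods] @ w
    att = gaps.mean()
    se = gaps.std(ddof=1) / np.sqrt(len(gaps))
    return {"att": float(att), "se": float(se)}


# Toy data: 4 time periods (rows), 2 control units (columns).
Z0 = np.array([[1.0, 2.0], [1.5, 2.5], [2.0, 3.0], [2.5, 3.5]])
Z1 = np.array([1.5, 2.0, 3.5, 4.5])
w = np.array([0.5, 0.5])  # hypothetical weights

# Treat the last two rows as the post-treatment periods.
result = att_sketch(Z0, Z1, w, periods=[2, 3])
print(result)
```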

fit(dataprep: Dataprep, lambda_: float | None = None) None

Fit the model/calculate the weights.

Parameters:
  • dataprep (Dataprep) – Dataprep object containing data to model.

  • lambda_ (float, optional) – Ridge parameter to use. If not supplied, then it is obtained by cross-validation, by default None.

gaps_plot(time_period: Iterable | Series | dict | None = None, treatment_time: int | None = None, grid: bool = True, Z0: DataFrame | None = None, Z1: Series | None = None) None

Plots the gap between the treated unit and the synthetic unit over time.

Parameters:
  • time_period (Iterable | pandas.Series | dict, optional) – Time range to plot. If none is supplied, then the time range used is the time period over which the optimisation happens, by default None

  • treatment_time (int, optional) – If supplied, plot a vertical line at the time period in which the treatment occurred, by default None

  • grid (bool, optional) – Whether or not to plot a grid, by default True

  • Z0 (pandas.DataFrame, shape (n, c), optional) – The matrix of the time series of the outcome variable for the control units. If no dataprep is set, then this must be supplied along with Z1, by default None.

  • Z1 (pandas.Series, shape (n, 1), optional) – The matrix of the time series of the outcome variable for the treated unit. If no dataprep is set, then this must be supplied along with Z0, by default None.

Raises:
  • ValueError – If there is no weight matrix available

  • ValueError – If there is no Dataprep object set or (Z0, Z1) is not supplied
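The series being plotted can be sketched with pandas as follows. This assumes the gap is the treated outcome minus the synthetic outcome, with `Z0` holding one column per control unit and one row per time period; the unit labels, years and weights are made up for illustration, and no plotting is done here.

```python
import pandas as pd

years = [2000, 2001, 2002, 2003]

# Outcome time series: one column per control unit, one row per year.
Z0 = pd.DataFrame({"A": [1.0, 1.5, 2.0, 2.5], "B": [2.0, 2.5, 3.0, 3.5]}, index=years)
# Outcome time series of the treated unit.
Z1 = pd.Series([1.5, 2.0, 3.5, 4.5], index=years)
# Hypothetical weights, indexed by control unit so that dot() aligns on labels.
w = pd.Series({"A": 0.5, "B": 0.5})

synthetic = Z0.dot(w)   # outcome of the synthetic unit in each year
gaps = Z1 - synthetic   # treated minus synthetic
print(gaps)
```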

mae(Z0: DataFrame | None = None, Z1: Series | None = None) float

Returns the mean absolute error in the fit of the synthetic control versus the treated unit over the optimization time-period.

Parameters:
  • Z0 (pandas.DataFrame, shape (n, c), optional) – The matrix of the time series of the outcome variable for the control units. If no dataprep is set, then this must be supplied along with Z1, by default None.

  • Z1 (pandas.Series, shape (n, 1), optional) – The matrix of the time series of the outcome variable for the treated unit. If no dataprep is set, then this must be supplied along with Z0, by default None.

Returns:

Mean absolute error

Return type:

float

Raises:
  • ValueError – If the fit method has not been run (no weights available).

  • ValueError – If there is no Dataprep object set or (Z0, Z1) is not supplied

mape(Z0: DataFrame | None = None, Z1: Series | None = None) float

Returns the mean absolute percentage error in the fit of the synthetic control versus the treated unit over the optimization time-period.

Parameters:
  • Z0 (pandas.DataFrame, shape (n, c), optional) – The matrix of the time series of the outcome variable for the control units. If no dataprep is set, then this must be supplied along with Z1, by default None.

  • Z1 (pandas.Series, shape (n, 1), optional) – The matrix of the time series of the outcome variable for the treated unit. If no dataprep is set, then this must be supplied along with Z0, by default None.

Returns:

Mean absolute percentage error

Return type:

float

Raises:
  • ValueError – If the fit method has not been run (no weights available).

  • ValueError – If there is no Dataprep object set or (Z0, Z1) is not supplied

mspe(Z0: DataFrame | None = None, Z1: Series | None = None) float

Returns the mean square prediction error in the fit of the synthetic control versus the treated unit over the optimization time-period.

Parameters:
  • Z0 (pandas.DataFrame, shape (n, c), optional) – The matrix of the time series of the outcome variable for the control units. If no dataprep is set, then this must be supplied along with Z1, by default None.

  • Z1 (pandas.Series, shape (n, 1), optional) – The matrix of the time series of the outcome variable for the treated unit. If no dataprep is set, then this must be supplied along with Z0, by default None.

Returns:

Mean square prediction error

Return type:

float

Raises:
  • ValueError – If the fit method has not been run (no weights available).

  • ValueError – If there is no Dataprep object set or (Z0, Z1) is not supplied
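The three fit metrics (mae, mape and mspe) differ only in how the residual between the treated unit and the synthetic control is aggregated. The sketch below shows the usual textbook definitions with NumPy; the helper name `fit_errors` and the data are illustrative, and it is an assumption that the library uses exactly these formulas (in particular that the percentage error is taken relative to the treated unit's outcome).

```python
import numpy as np


def fit_errors(Z0, Z1, w):
    """Textbook MAE, MAPE and MSPE of the synthetic control (illustrative)."""
    residual = Z1 - Z0 @ w
    return {
        "mae": float(np.mean(np.abs(residual))),
        # Relative to the treated outcome; undefined if Z1 contains zeros.
        "mape": float(np.mean(np.abs(residual / Z1))),
        "mspe": float(np.mean(residual ** 2)),
    }


# Toy data: 3 time periods (rows), 2 control units (columns).
Z0 = np.array([[1.0, 3.0], [2.0, 4.0], [4.0, 4.0]])
Z1 = np.array([2.0, 4.0, 5.0])
w = np.array([0.5, 0.5])  # hypothetical weights

errors = fit_errors(Z0, Z1, w)
print(errors)
```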

path_plot(time_period: Iterable | Series | dict | None = None, treatment_time: int | None = None, grid: bool = True, Z0: DataFrame | None = None, Z1: Series | None = None) None

Plot the outcome variable over time for the treated unit and the synthetic control.

Parameters:
  • time_period (Iterable | pandas.Series | dict, optional) – Time range to plot. If none is supplied, then the time range used is the time period over which the optimisation happens, by default None

  • treatment_time (int, optional) – If supplied, plot a vertical line at the time period in which the treatment occurred, by default None

  • grid (bool, optional) – Whether or not to plot a grid, by default True

  • Z0 (pandas.DataFrame, shape (n, c), optional) – The matrix of the time series of the outcome variable for the control units. If no dataprep is set, then this must be supplied along with Z1, by default None.

  • Z1 (pandas.Series, shape (n, 1), optional) – The matrix of the time series of the outcome variable for the treated unit. If no dataprep is set, then this must be supplied along with Z0, by default None.

Raises:
  • ValueError – If there is no weight matrix available

  • ValueError – If there is no Dataprep object set or (Z0, Z1) is not supplied

summary(round: int = 3, X0: DataFrame | None = None, X1: Series | None = None) DataFrame

Generates a pandas.DataFrame with summary data. The first column shows the mean value of each predictor over the time period time_predictors_prior for the treated unit, and the second column shows the same for the synthetic unit. A final column, 'sample mean', shows the mean value of each predictor over the time period time_predictors_prior across all the control units, i.e. the value for a synthetic control in which all the weights are equal.

Parameters:
  • round (int, optional) – Round the table values to the given number of places, by default 3

  • X0 (pd.DataFrame, shape (n_cov, n_controls), optional) – Matrix with each column corresponding to a control unit and each row is a covariate. If no dataprep is set, then this must be supplied along with X1, by default None.

  • X1 (pandas.Series, shape (n_cov, 1), optional) – Column vector giving the covariate values for the treated unit. If no dataprep is set, then this must be supplied along with X0, by default None.

Returns:

Summary data.

Return type:

pandas.DataFrame

Raises:
  • ValueError – If there is no weight matrix available

  • ValueError – If there is no Dataprep object set or (X0, X1) is not supplied
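The layout of the summary table can be sketched with pandas as below. It assumes `X0` and `X1` already hold the predictor means over time_predictors_prior, as described in the parameter list. The column labels "treated" and "synthetic", the predictor names and all values are illustrative assumptions; only the 'sample mean' label is taken from the description above.

```python
import pandas as pd

predictors = ["gdp", "invest"]

# One row per covariate, one column per control unit.
X0 = pd.DataFrame({"A": [1.0, 10.0], "B": [3.0, 30.0]}, index=predictors)
# Covariate values for the treated unit.
X1 = pd.Series([2.6, 24.0], index=predictors)
# Hypothetical weights, indexed by control unit.
w = pd.Series({"A": 0.25, "B": 0.75})

summary = pd.DataFrame(
    {
        "treated": X1,
        "synthetic": X0.dot(w),          # weighted combination of the controls
        "sample mean": X0.mean(axis=1),  # equal-weight combination of the controls
    }
).round(3)
print(summary)
```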

weights(round: int = 3, threshold: float | None = None) Series

Return a pandas.Series of the weights for each control unit.

Parameters:
  • round (int, optional) – Round the weights to given number of places, by default 3

  • threshold (float, optional) – If supplied, will only show weights above this value, by default None

Returns:

The weights computed

Return type:

pandas.Series

Raises:

ValueError – If there is no weight matrix available
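The effect of the round and threshold parameters can be sketched with pandas as follows. The helper `show_weights` and the weight values are hypothetical, and it is an assumption that filtering is applied to the unrounded weights and uses a strict "greater than" comparison; the library may order or define these steps differently.

```python
import pandas as pd


def show_weights(w, round_=3, threshold=None):
    """Filter weights by a threshold, then round them (illustrative only)."""
    if threshold is not None:
        w = w[w > threshold]  # keep only weights above the threshold
    return w.round(round_)


# Hypothetical augmented weights: note that one of them is negative.
w = pd.Series({"A": 0.6204, "B": -0.0512, "C": 0.4308})

shown = show_weights(w, round_=2, threshold=0.1)
print(shown)
```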