Synthetic Control Method

Overview

The synthetic control method is due to Abadie and Gardeazabal [AG03] (also see Abadie, Diamond and Hainmueller [ADH07] [ADH15]). This method constructs a weighted combination of the control units that most resembles the selected characteristics of the treated unit in a time period prior to the treatment time. This so-constructed “synthetic control unit” can then be compared with the treated unit to investigate the causal effect of the treatment.

Details

In particular, this method constructs a vector of non-negative weights \(w = (w_1, w_2, \dots, w_k)\), where \(k\) is the number of control units, whose entries sum to 1 and which minimizes

\[\|x_1-X_0w^T\|_V,\]

where

  • \(\|A\|_V=\sqrt{A^TVA}\), where \(V\) is a diagonal matrix with non-negative entries that captures the relationship between the outcome variable and the predictors,

  • \(X_0\) is a matrix of the values for the control units of the chosen statistic for the chosen predictors over the selected (pre-intervention) time-period (each column corresponds to a control),

  • \(x_1\) is a (column) vector of the corresponding values for the treated unit.
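As a rough illustration of the quantity being minimized, the sketch below evaluates \(\|x_1-X_0w^T\|_V\) for one fixed choice of weights. The data, the helper names `v_norm` and `synthetic`, and the chosen \(V\) are made up for the example and are not part of the package.

```python
import math

def v_norm(a, v_diag):
    """Return sqrt(a^T V a) for a diagonal V given by its diagonal entries."""
    return math.sqrt(sum(v * x * x for v, x in zip(v_diag, a)))

def synthetic(X0, w):
    """Return X0 applied to w; X0 is a list of rows, one column per control."""
    return [sum(x * wj for x, wj in zip(row, w)) for row in X0]

# Hypothetical data: 2 predictors (rows) and 3 control units (columns).
X0 = [[1.0, 2.0, 3.0],
      [4.0, 0.0, 2.0]]
x1 = [2.0, 2.0]          # predictor values for the treated unit
v_diag = [0.5, 0.5]      # diagonal of V, non-negative entries
w = [0.5, 0.0, 0.5]      # non-negative weights that sum to 1

residual = [a - b for a, b in zip(x1, synthetic(X0, w))]
loss = v_norm(residual, v_diag)
print(residual, loss)
```

Here the synthetic unit matches the first predictor exactly and misses the second by 1, so the loss is \(\sqrt{0.5}\).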

The matrix \(V\) can be supplied; otherwise it is part of the optimization problem: it is chosen so that the weights \(w\) that solve the problem above for that \(V\) minimize the quantity

\[\|z_1-Z_0w^T\|,\]

where

  • \(Z_0\) is a matrix of the values of the outcome variable for the control units over the (pre-intervention) time-period (each column corresponds to a control),

  • \(z_1\) is a (column) vector of the corresponding values for the treated unit.
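The nested structure (an inner problem for \(w\) given \(V\), and an outer problem for \(V\)) can be sketched with a brute-force grid search. This is only a toy illustration under strong assumptions: two control units, two predictors, made-up data and hypothetical helper names. The package itself uses numerical optimizers rather than grids, and the sketch minimizes squared norms, which have the same minimizers as the norms.

```python
def combine(M, w):
    """Apply weights w to the columns (control units) of M."""
    return [sum(m * wj for m, wj in zip(row, w)) for row in M]

def weighted_sq_loss(target, M, w, row_weights):
    """Sum over rows of row_weight * (target - M w)^2."""
    fitted = combine(M, w)
    return sum(c * (t - f) ** 2 for c, t, f in zip(row_weights, target, fitted))

def inner_weights(x1, X0, v_diag, steps=100):
    """Inner problem: best w = (a, 1 - a) on a grid, for a fixed diagonal V."""
    candidates = ([i / steps, 1 - i / steps] for i in range(steps + 1))
    return min(candidates, key=lambda w: weighted_sq_loss(x1, X0, w, v_diag))

def fit(x1, X0, z1, Z0, v_steps=10):
    """Outer problem: pick the V whose inner solution best fits the outcome."""
    best = None
    for i in range(v_steps + 1):
        v_diag = [i / v_steps, 1 - i / v_steps]
        w = inner_weights(x1, X0, v_diag)
        outer = weighted_sq_loss(z1, Z0, w, [1.0] * len(z1))
        if best is None or outer < best[0]:
            best = (outer, v_diag, w)
    return best

# Hypothetical data: rows of X0 are predictors, rows of Z0 are time periods,
# and columns are the two control units.
X0 = [[1.0, 3.0], [10.0, 2.0]]
x1 = [2.0, 4.0]
Z0 = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
z1 = [1.75, 3.5, 5.25]

outer_loss, best_v, best_w = fit(x1, X0, z1, Z0)
print(best_v, best_w, outer_loss)
```

In this toy data the outcome series of the treated unit is reproduced exactly by the weights (0.25, 0.75), which the inner problem only returns when \(V\) places enough weight on the second predictor.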

The Synth class

The Synth class implements the synthetic control method. The expected way to use the class is to first create a Dataprep object that defines the study data and then use it as input to a Synth object. See the examples folder of the repository for examples illustrating usage.

The implementation is based on the same method in the R Synth package and aims to produce results that can be reconciled with that package.

class pysyncon.Synth

Implementation of the synthetic control method due to Abadie & Gardeazabal [AG03].

att(time_period: Iterable | Series | dict, Z0: DataFrame | None = None, Z1: Series | None = None) dict[str, float]

Computes the average treatment effect on the treated unit (ATT) and its standard error over the chosen time-period.

Parameters:
  • time_period (Iterable | pandas.Series | dict) – Time period to compute the ATT over.

  • Z0 (pandas.DataFrame, shape (n, c), optional) – The matrix of the time series of the outcome variable for the control units. If no dataprep is set, then this must be supplied along with Z1, by default None.

  • Z1 (pandas.Series, shape (n, 1), optional) – The matrix of the time series of the outcome variable for the treated unit. If no dataprep is set, then this must be supplied along with Z0, by default None.

Returns:

A dictionary with the ATT value and the standard error of the ATT.

Return type:

dict

Raises:
  • ValueError – If there is no weight matrix available

  • ValueError – If there is no Dataprep object set or (Z0, Z1) is not supplied
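For intuition, the ATT can be thought of as the mean gap between the treated unit and the synthetic control over the chosen periods. The sketch below is a stand-alone illustration, not the package's code: the dictionary keys, the use of the sample standard deviation divided by the square root of the number of periods for the standard error, and the data are all assumptions made for the example.

```python
import math
import statistics

def att_sketch(z1, Z0, w):
    """Mean gap between treated and synthetic outcomes, with a standard error.

    Assumption: standard error = sample std of the gaps / sqrt(n periods).
    """
    synthetic = [sum(z * wj for z, wj in zip(row, w)) for row in Z0]
    gaps = [t - s for t, s in zip(z1, synthetic)]
    se = statistics.stdev(gaps) / math.sqrt(len(gaps))
    return {"att": statistics.fmean(gaps), "se": se}

# Hypothetical post-treatment outcomes: 3 periods (rows), 2 controls (columns).
Z0 = [[10.0, 20.0], [12.0, 22.0], [14.0, 24.0]]
z1 = [16.0, 19.0, 22.0]
w = [0.5, 0.5]

result = att_sketch(z1, Z0, w)
print(result)
```

With these numbers the gaps are 1, 2 and 3, so the mean gap is 2.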

confidence_interval(alpha: float, time_periods: list, tol: float, pre_periods: list | None = None, dataprep: Dataprep | None = None, X0: DataFrame | None = None, X1: Series | None = None, Z0: DataFrame | None = None, Z1: Series | None = None, custom_V: ndarray | None = None, optim_method: Literal['Nelder-Mead', 'Powell', 'CG', 'BFGS', 'L-BFGS-B', 'TNC', 'COBYLA', 'trust-constr'] | None = None, optim_initial: Literal['equal', 'ols'] | None = None, optim_options: dict | None = None, method: Literal['conformal'] = 'conformal', max_iter: int = 50, step_sz: float | None = None, step_sz_div: float = 20.0, verbose: bool = True) DataFrame

Confidence intervals obtained from test-inversion, where the p-values are obtained by adjusted refits of the data following Chernozhukov et al. [VCZ21].

Parameters:
  • alpha (float) – The required significance level, e.g. alpha = 0.05 will yield a confidence level of 100 * (1 - alpha) = 95%.

  • time_periods (list) – The time-periods to calculate confidence intervals for.

  • tol (float) – The tolerance (accuracy) required when calculating the lower/upper cut-off point of the confidence interval. The search will try to reach this tolerance but will not exceed max_iter iterations in doing so.

  • pre_periods (Optional[list], optional) – The time-periods to use for the optimization when refitting the data with the adjusted outcomes, by default None.

  • dataprep (Optional[Dataprep], optional) – Dataprep object defining the study data, if this is not supplied then either self.dataprep must be set or else (X0, X1, Z0, Z1) must all be supplied, by default None.

  • X0 (pd.DataFrame, shape (m, c), optional) – Matrix with each column corresponding to a control unit and each row to a covariate; if this is not supplied then either dataprep must be supplied or self.dataprep must be set, by default None.

  • X1 (pandas.Series, shape (m, 1), optional) – Column vector giving the covariate values for the treated unit; if this is not supplied then either dataprep must be supplied or self.dataprep must be set, by default None.

  • Z0 (pandas.DataFrame, shape (n, c), optional) – A matrix of the time series of the outcome variable, with each column corresponding to a control unit and each row to a time step; the columns correspond with the columns of X0. If this is not supplied then either dataprep must be supplied or self.dataprep must be set, by default None.

  • Z1 (pandas.Series, shape (n, 1), optional) – Column vector giving the outcome variable values over time for the treated unit; if this is not supplied then either dataprep must be supplied or self.dataprep must be set, by default None.

  • custom_V (numpy.ndarray, shape (c, c), optional) – Provide a V matrix (using the notation of the Abadie, Diamond & Hainmueller paper); the optimisation problem will then be solved only for the weight matrix W. This is the same argument as in the fit method, by default None.

  • optim_method (str, optional) –

    Optimisation method to use for the outer optimisation, can be any of the valid options for scipy minimize that do not require a jacobian matrix, namely

    • 'Nelder-Mead'

    • 'Powell'

    • 'CG'

    • 'BFGS'

    • 'L-BFGS-B'

    • 'TNC'

    • 'COBYLA'

    • 'trust-constr'

    This is the same argument as in the fit method, by default 'Nelder-Mead'.

  • optim_initial (str, optional) –

    Starting value for the outer optimisation, possible starting values are

    • 'equal', where the weights are all equal,

    • 'ols', which uses a starting value obtained by fitting a regression.

    This is the same argument as in the fit method, by default 'equal'.

  • optim_options (dict, optional) – Options to provide to the outer part of the optimisation; valid options are any option that can be provided to scipy minimize for the given optimisation method. This is the same argument as in the fit method, by default {'maxiter': 1000}.

  • method (str, optional) – The type of method to use when computing the confidence intervals, currently only conformal inference (conformal) is implemented, by default “conformal”.

  • max_iter (int, optional) – Maximum number of times to re-fit the data when trying to locate the lower/upper cut-off point and when binary searching for the cut-off point, by default 50.

  • step_sz (Optional[float], optional) – Step size to use when searching for an interval that contains the lower or upper cut-off point of the confidence interval, by default None.

  • step_sz_div (float, optional) – Alternative way to define step size: it is the fraction that defines step-size in terms of the standard deviation of the att, i.e. if step_sz_div=20.0 then the step size used will be (att +/- 2.5 * std(att)) / 20.0, by default 20.0.

  • verbose (bool, optional) – Print output, by default True.

Returns:

A pandas.DataFrame indexed by the requested time-periods (time_periods), with three columns: value, the calculated treatment effect; lower_ci, the lower end of the confidence interval; and upper_ci, the upper end of the confidence interval.

Return type:

pd.DataFrame

Raises:
  • ValueError – if no Dataprep object is available (dataprep is not supplied and self.dataprep is not set) and (X0, X1, Z0, Z1) are not all supplied.

  • TypeError – if (\(X1\), \(Z1\)) are not of type pandas.Series.

  • ValueError – if dataprep is not set and pre_periods is not set.

  • ValueError – if an invalid option for method is given, currently only conformal is supported.
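The roles of step_sz, tol and max_iter in a test-inversion search can be illustrated with a generic bracket-then-bisect routine. This is a sketch under stated assumptions, not the package's implementation: the function `upper_cutoff` and the toy `toy_p_value` are hypothetical, max_iter is applied to each phase separately here, and in the real method each p-value evaluation involves refitting the data with adjusted outcomes.

```python
def upper_cutoff(p_value, start, alpha, step_sz, tol, max_iter=50):
    """Find the largest theta (within tol) for which p_value(theta) >= alpha.

    Phase 1 steps outwards from `start` until the hypothesis is rejected.
    Phase 2 bisects the resulting interval until it is narrower than tol.
    """
    lo, hi = start, start + step_sz
    for _ in range(max_iter):
        if p_value(hi) < alpha:
            break
        lo, hi = hi, hi + step_sz
    else:
        raise RuntimeError("no bracketing interval found within max_iter steps")

    n_iter = 0
    while hi - lo > tol and n_iter < max_iter:
        mid = (lo + hi) / 2
        if p_value(mid) >= alpha:
            lo = mid      # mid is still accepted, move the lower end up
        else:
            hi = mid      # mid is rejected, move the upper end down
        n_iter += 1
    return lo

def toy_p_value(theta):
    """Hypothetical p-value curve that peaks at theta = 2 and decays linearly."""
    return max(0.0, 1.0 - abs(theta - 2.0) / 4.0)

cutoff = upper_cutoff(toy_p_value, start=2.0, alpha=0.05, step_sz=0.5, tol=1e-3)
print(cutoff)
```

For the toy curve the p-value drops below 0.05 at 5.8, and the search returns a value within tol of that point. The lower cut-off would be found the same way with a negative step.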

fit(dataprep: Dataprep | None = None, X0: DataFrame | None = None, X1: Series | None = None, Z0: DataFrame | None = None, Z1: Series | None = None, custom_V: ndarray | None = None, optim_method: Literal['Nelder-Mead', 'Powell', 'CG', 'BFGS', 'L-BFGS-B', 'TNC', 'COBYLA', 'trust-constr'] = 'Nelder-Mead', optim_initial: Literal['equal', 'ols'] = 'equal', optim_options: dict = {'maxiter': 1000}) None

Fit the model/calculate the weights. Either a Dataprep object should be provided or otherwise matrices (\(X_0\), \(X_1\), \(Z_0\), \(Z_1\)) should be provided (using the notation of Abadie & Gardeazabal [AG03]).

Parameters:
  • dataprep (Dataprep, optional) – Dataprep object containing data to model, by default None.

  • X0 (pd.DataFrame, shape (m, c), optional) – Matrix with each column corresponding to a control unit and each row to a covariate, by default None.

  • X1 (pandas.Series, shape (m, 1), optional) – Column vector giving the covariate values for the treated unit, by default None.

  • Z0 (pandas.DataFrame, shape (n, c), optional) – A matrix of the time series of the outcome variable, with each column corresponding to a control unit and each row to a time step; the columns correspond with the columns of X0, by default None.

  • Z1 (pandas.Series, shape (n, 1), optional) – Column vector giving the outcome variable values over time for the treated unit, by default None.

  • custom_V (numpy.ndarray, shape (c, c), optional) – Provide a V matrix (using the notation of the Abadie, Diamond & Hainmueller paper); the optimisation problem will then be solved only for the weight matrix W, by default None.

  • optim_method (str, optional) –

    Optimisation method to use for the outer optimisation, can be any of the valid options for scipy minimize that do not require a jacobian matrix, namely

    • 'Nelder-Mead'

    • 'Powell'

    • 'CG'

    • 'BFGS'

    • 'L-BFGS-B'

    • 'TNC'

    • 'COBYLA'

    • 'trust-constr'

    By default 'Nelder-Mead'.

  • optim_initial (str, optional) –

    Starting value for the outer optimisation, possible starting values are

    • 'equal', where the weights are all equal,

    • 'ols', which uses a starting value obtained by fitting a regression.

    By default 'equal'.

  • optim_options (dict, optional) – Options to provide to the outer part of the optimisation; valid options are any option that can be provided to scipy minimize for the given optimisation method, by default {'maxiter': 1000}.

Returns:

None

Return type:

NoneType

Raises:
  • ValueError – if neither a Dataprep object nor all of (\(X_0\), \(X_1\), \(Z_0\), \(Z_1\)) are supplied.

  • TypeError – if (\(X1\), \(Z1\)) are not of type pandas.Series.

  • ValueError – if optim_initial=ols and there is collinearity in the data.

  • ValueError – if optim_initial is not one of ‘equal’ or ‘ols’.

gaps_plot(time_period: Iterable | Series | dict | None = None, treatment_time: int | None = None, grid: bool = True, Z0: DataFrame | None = None, Z1: Series | None = None) None

Plots the gap between the treated unit and the synthetic unit over time.

Parameters:
  • time_period (Iterable | pandas.Series | dict, optional) – Time range to plot; if none is supplied then the time range used is the time period over which the optimisation happens, by default None

  • treatment_time (int, optional) – If supplied, plot a vertical line at the time period in which the treatment occurred, by default None

  • grid (bool, optional) – Whether or not to plot a grid, by default True

  • Z0 (pandas.DataFrame, shape (n, c), optional) – The matrix of the time series of the outcome variable for the control units. If no dataprep is set, then this must be supplied along with Z1, by default None.

  • Z1 (pandas.Series, shape (n, 1), optional) – The matrix of the time series of the outcome variable for the treated unit. If no dataprep is set, then this must be supplied along with Z0, by default None.

Raises:
  • ValueError – If there is no weight matrix available

  • ValueError – If there is no Dataprep object set or (Z0, Z1) is not supplied

mae(Z0: DataFrame | None = None, Z1: Series | None = None) float

Returns the mean absolute error in the fit of the synthetic control versus the treated unit over the optimization time-period.

Parameters:
  • Z0 (pandas.DataFrame, shape (n, c), optional) – The matrix of the time series of the outcome variable for the control units. If no dataprep is set, then this must be supplied along with Z1, by default None.

  • Z1 (pandas.Series, shape (n, 1), optional) – The matrix of the time series of the outcome variable for the treated unit. If no dataprep is set, then this must be supplied along with Z0, by default None.

Returns:

Mean absolute error

Return type:

float

Raises:
  • ValueError – If the fit method has not been run (no weights available).

  • ValueError – If there is no Dataprep object set or (Z0, Z1) is not supplied

mape(Z0: DataFrame | None = None, Z1: Series | None = None) float

Returns the mean absolute percentage error in the fit of the synthetic control versus the treated unit over the optimization time-period.

Parameters:
  • Z0 (pandas.DataFrame, shape (n, c), optional) – The matrix of the time series of the outcome variable for the control units. If no dataprep is set, then this must be supplied along with Z1, by default None.

  • Z1 (pandas.Series, shape (n, 1), optional) – The matrix of the time series of the outcome variable for the treated unit. If no dataprep is set, then this must be supplied along with Z0, by default None.

Returns:

Mean absolute percentage error

Return type:

float

Raises:
  • ValueError – If the fit method has not been run (no weights available).

  • ValueError – If there is no Dataprep object set or (Z0, Z1) is not supplied

mspe(Z0: DataFrame | None = None, Z1: Series | None = None) float

Returns the mean square prediction error in the fit of the synthetic control versus the treated unit over the optimization time-period.

Parameters:
  • Z0 (pandas.DataFrame, shape (n, c), optional) – The matrix of the time series of the outcome variable for the control units. If no dataprep is set, then this must be supplied along with Z1, by default None.

  • Z1 (pandas.Series, shape (n, 1), optional) – The matrix of the time series of the outcome variable for the treated unit. If no dataprep is set, then this must be supplied along with Z0, by default None.

Returns:

Mean square prediction error

Return type:

float

Raises:
  • ValueError – If the fit method has not been run (no weights available).

  • ValueError – If there is no Dataprep object set or (Z0, Z1) is not supplied
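The three fit measures (mae, mape and mspe) are all summaries of the same gaps between the treated unit and the synthetic control. The sketch below shows one plausible way to compute them from plain lists. It is an illustration only: the helper name and data are made up, and whether the package reports MAPE as a fraction (as here) or multiplied by 100 is an assumption.

```python
def fit_errors(z1, Z0, w):
    """Return MAE, MAPE and MSPE of the synthetic control against z1."""
    synthetic = [sum(z * wj for z, wj in zip(row, w)) for row in Z0]
    gaps = [t - s for t, s in zip(z1, synthetic)]
    n = len(gaps)
    return {
        "mae": sum(abs(g) for g in gaps) / n,
        # Assumption: MAPE is relative to the treated outcome, as a fraction.
        "mape": sum(abs(g / t) for g, t in zip(gaps, z1)) / n,
        "mspe": sum(g * g for g in gaps) / n,
    }

# Hypothetical outcomes: 2 time periods (rows), 2 control units (columns).
Z0 = [[8.0, 10.0], [20.0, 24.0]]
z1 = [10.0, 20.0]
w = [0.5, 0.5]

errors = fit_errors(z1, Z0, w)
print(errors)
```

Here the gaps are 1 and -2, giving an MAE of 1.5, a MAPE of 0.1 and an MSPE of 2.5.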

path_plot(time_period: Iterable | Series | dict | None = None, treatment_time: int | None = None, grid: bool = True, Z0: DataFrame | None = None, Z1: Series | None = None) None

Plot the outcome variable over time for the treated unit and the synthetic control.

Parameters:
  • time_period (Iterable | pandas.Series | dict, optional) – Time range to plot; if none is supplied then the time range used is the time period over which the optimisation happens, by default None

  • treatment_time (int, optional) – If supplied, plot a vertical line at the time period in which the treatment occurred, by default None

  • grid (bool, optional) – Whether or not to plot a grid, by default True

  • Z0 (pandas.DataFrame, shape (n, c), optional) – The matrix of the time series of the outcome variable for the control units. If no dataprep is set, then this must be supplied along with Z1, by default None.

  • Z1 (pandas.Series, shape (n, 1), optional) – The matrix of the time series of the outcome variable for the treated unit. If no dataprep is set, then this must be supplied along with Z0, by default None.

Raises:
  • ValueError – If there is no weight matrix available

  • ValueError – If there is no Dataprep object set or (Z0, Z1) is not supplied

summary(round: int = 3, X0: DataFrame | None = None, X1: Series | None = None) DataFrame

Generates a pandas.DataFrame with summary data. The first column shows the values of the V matrix for each predictor. The next columns show the mean value of each predictor over the time period time_predictors_prior for the treated unit and for the synthetic unit. The final column, ‘sample mean’, shows the mean value of each predictor over the same period across all the control units, i.e. the value for a synthetic control in which all the weights are equal.

Parameters:
  • round (int, optional) – Round the numbers to given number of places, by default 3

  • X0 (pd.DataFrame, shape (n_cov, n_controls), optional) – Matrix with each column corresponding to a control unit and each row is a covariate. If no dataprep is set, then this must be supplied along with X1, by default None.

  • X1 (pandas.Series, shape (n_cov, 1), optional) – Column vector giving the covariate values for the treated unit. If no dataprep is set, then this must be supplied along with X0, by default None.

Returns:

Summary data.

Return type:

pandas.DataFrame

Raises:
  • ValueError – If there is no V matrix available

  • ValueError – If there is no Dataprep object set or (X0, X1) is not supplied

  • ValueError – If there is no weight matrix available

weights(round: int = 3, threshold: float | None = None) Series

Return a pandas.Series of the weights for each control unit.

Parameters:
  • round (int, optional) – Round the weights to given number of places, by default 3

  • threshold (float, optional) – If supplied, will only show weights above this value, by default None

Returns:

The weights computed

Return type:

pandas.Series

Raises:

ValueError – If there is no weight matrix available
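The effect of the round and threshold arguments can be mimicked on a plain dictionary. This is a sketch only: the helper name, the unit names, the order of rounding and filtering, and the strict comparison at the threshold are assumptions, and the real method returns a pandas.Series.

```python
def show_weights(weights, round_to=3, threshold=None):
    """Round each weight and optionally keep only those above the threshold."""
    rounded = {unit: round(w, round_to) for unit, w in weights.items()}
    if threshold is None:
        return rounded
    # Assumption: "above" is read as strictly greater than the threshold.
    return {unit: w for unit, w in rounded.items() if w > threshold}

# Hypothetical fitted weights for three control units.
raw = {"A": 0.61234, "B": 0.38721, "C": 0.00045}

all_weights = show_weights(raw)
main_weights = show_weights(raw, threshold=0.01)
print(all_weights)
print(main_weights)
```

Filtering by a small threshold is a convenient way to list only the control units that contribute materially to the synthetic control.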