Penalized Synthetic Control Method
The penalized synthetic control method is due to Abadie & L’Hour [ALHour21].
This version of the synthetic control method adds a penalization term to the loss function that reduces interpolation bias. It does this by penalizing the pairwise discrepancies between the treated unit and each unit that contributes to the synthetic control.
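As a rough guide, the penalized problem of Abadie & L'Hour can be written as follows, using the notation of the `fit` method documented on this page: `X0` has one column per control unit, `X1` holds the treated unit's covariates, `V` is the matrix passed as `custom_V` (identity by default) and λ is `lambda_`. Writing the norm as V-weighted is our reading of how `custom_V` enters, so treat this as a sketch rather than the library's exact formulation:

```latex
\min_{W \in \mathbb{R}^m} \;
  \lVert X_1 - X_0 W \rVert_V^2
  + \lambda \sum_{j=1}^{m} W_j \, \lVert X_1 - X_{0j} \rVert_V^2
\quad \text{subject to} \quad
  W_j \ge 0, \;\; \sum_{j=1}^{m} W_j = 1,
\qquad \text{where } \lVert x \rVert_V^2 = x^\top V x
```

Here X_{0j} is column j of X_0. The first term is the usual synthetic control fit; the second charges each control unit for its distance to the treated unit in proportion to its weight. With λ = 0 the problem reduces to the standard synthetic control problem, while as λ grows the penalty pushes all the weight onto the single control unit closest to the treated unit (nearest-neighbour matching).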
The PenalizedSynth class
The PenalizedSynth class implements the penalized synthetic control method. The expected way to use the class is to first create a Dataprep object that defines the study data and then use it as input to a PenalizedSynth object. See the examples folder of the repository for examples illustrating usage.
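To make the matrix shapes used throughout this page concrete, here is a minimal sketch of the kind of matrices a Dataprep object prepares: covariate matrices X0/X1 (one row per predictor, averaged over a pre-treatment window) and outcome matrices Z0/Z1 (one row per time period, one column per unit). The helper `make_matrices` and all of its arguments are hypothetical and only cover a plain mean over the predictor window; it is not the Dataprep API, which offers more options.

```python
import pandas as pd

def make_matrices(df, unit, time, outcome, predictors, treated, controls,
                  predictor_times, outcome_times):
    """Hypothetical helper: build (X0, X1, Z0, Z1) from a long panel DataFrame."""
    # Mean of each predictor per unit over the predictor window; transpose so that
    # rows are predictors and columns are units.
    X = df[df[time].isin(predictor_times)].groupby(unit)[predictors].mean().T
    # Outcome time series: rows are time periods, columns are units.
    Z = df[df[time].isin(outcome_times)].pivot(index=time, columns=unit, values=outcome)
    return X[controls], X[treated], Z[controls], Z[treated]

# Toy panel: unit "A" is treated, "B" and "C" are controls.
panel = pd.DataFrame({
    "unit": ["A"] * 4 + ["B"] * 4 + ["C"] * 4,
    "year": [2000, 2001, 2002, 2003] * 3,
    "y": [1, 2, 3, 4, 2, 3, 4, 5, 0, 1, 2, 3],
    "p": [10, 20, 30, 40, 1, 2, 3, 4, 5, 5, 5, 5],
})
X0, X1, Z0, Z1 = make_matrices(panel, "unit", "year", "y", ["p"], "A", ["B", "C"],
                               predictor_times=[2000, 2001],
                               outcome_times=[2000, 2001, 2002])
print(X0.shape, Z0.shape)  # (1, 2) (3, 2)
```

X0 is (covariates × controls) and Z0 is (time periods × controls), matching the shapes documented for `fit` and the error/plot methods.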
- class pysyncon.PenalizedSynth
Implementation of the penalized synthetic control method due to Abadie & L’Hour [ALHour21].
- att(time_period: Iterable | Series | dict, Z0: DataFrame | None = None, Z1: Series | None = None) → dict[str, float]
Computes the average treatment effect on the treated unit (ATT) over the chosen time period, together with its standard error.
- Parameters:
time_period (Iterable | pandas.Series | dict) – Time period to compute the ATT over.
Z0 (pandas.DataFrame, shape (n, c), optional) – The matrix of the time series of the outcome variable for the control units. If no dataprep is set, then this must be supplied along with Z1, by default None.
Z1 (pandas.Series, shape (n, 1), optional) – The matrix of the time series of the outcome variable for the treated unit. If no dataprep is set, then this must be supplied along with Z0, by default None.
- Returns:
A dictionary with the ATT value and the standard error of the ATT.
- Return type:
dict
- Raises:
ValueError – If there is no weight matrix available
ValueError – If there is no Dataprep object set or (Z0, Z1) is not supplied
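A minimal sketch of the ATT computation, assuming the ATT is the mean gap between the treated outcome Z1 and the synthetic outcome Z0 @ w over `time_period`, and assuming the standard error is the sample standard deviation of the per-period gaps divided by the square root of their number. The function `att_sketch`, the weight vector `w` and the dictionary keys `"att"` and `"se"` are illustrative assumptions; the library's exact standard-error formula may differ.

```python
import numpy as np
import pandas as pd

def att_sketch(Z0: pd.DataFrame, Z1: pd.Series, w, time_period) -> dict:
    """Average treatment effect on the treated and an assumed standard error."""
    # Per-period gap between the treated outcome and the synthetic outcome Z0 @ w
    gaps = (Z1 - Z0 @ np.asarray(w, dtype=float)).loc[list(time_period)]
    n = len(gaps)
    # Assumed SE: sample standard deviation of the gaps over sqrt(number of periods)
    return {"att": float(gaps.mean()), "se": float(gaps.std(ddof=1) / np.sqrt(n))}

years = [1990, 1991, 1992]
Z0 = pd.DataFrame({"a": [8.0, 9.0, 10.0], "b": [10.0, 11.0, 12.0]}, index=years)
Z1 = pd.Series([10.0, 12.0, 14.0], index=years)
# Gaps in 1991 and 1992 are 2 and 3, so the ATT over those two periods is 2.5
result = att_sketch(Z0, Z1, [0.5, 0.5], time_period=[1991, 1992])
print(result)
```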
- fit(dataprep: Dataprep | None = None, X0: DataFrame | None = None, X1: Series | DataFrame | None = None, lambda_: float | None = 0.01, custom_V: ndarray | None = None) → None
Fit the model, i.e. calculate the weights.
- Parameters:
dataprep (Dataprep, optional) – Dataprep object containing data to model, by default None.
X0 (pandas.DataFrame, shape (c, m), optional) – Matrix with each column corresponding to a control unit and each row to a covariate, by default None.
X1 (pandas.Series, shape (c, 1), optional) – Column vector giving the covariate values for the treated unit, by default None.
lambda_ (float, optional) – Penalization parameter λ that weights the pairwise-discrepancy penalty term, by default 0.01
custom_V (numpy.ndarray, shape (c, c), optional) – The V matrix in the notation of the Abadie, Diamond & Hainmueller paper (denoted Γ in the Abadie & L’Hour paper). If not provided, the identity matrix is used, giving equal importance to all covariates.
- Returns:
None
- Return type:
NoneType
- Raises:
ValueError – if neither a Dataprep object nor all of (X0, X1) are supplied.
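The optimisation performed by `fit` can be sketched as follows. This is not the library's solver: it is a self-contained projected-gradient method for the penalized objective ||X1 − X0 W||²_V + λ Σ_j W_j ||X1 − X0j||²_V with W constrained to the unit simplex, where `V` plays the role of `custom_V` (identity by default, assumed symmetric) and `lambda_` the role of λ. The function names `project_to_simplex` and `penalized_weights` are hypothetical.

```python
import numpy as np

def project_to_simplex(v: np.ndarray) -> np.ndarray:
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]                      # sort in descending order
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - (css - 1.0) / k > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def penalized_weights(X0, X1, lambda_=0.01, V=None, n_iter=5000):
    """Projected-gradient sketch of the penalized synthetic control weights.

    X0: (c, m) covariates of the m control units, one column per unit.
    X1: (c,) covariates of the treated unit.
    V:  (c, c) symmetric positive semi-definite weight matrix (identity if None).
    """
    X0 = np.asarray(X0, dtype=float)
    X1 = np.asarray(X1, dtype=float).ravel()
    c, m = X0.shape
    V = np.eye(c) if V is None else np.asarray(V, dtype=float)
    diffs = X1[:, None] - X0                        # column j is X1 - X0j
    d = np.einsum("ij,ik,kj->j", diffs, V, diffs)   # d_j = ||X1 - X0j||_V^2
    # Step size 1 / L, with L the Lipschitz constant of the fit term's gradient
    step = 1.0 / (2.0 * np.linalg.norm(X0.T @ V @ X0, 2) + 1e-12)
    w = np.full(m, 1.0 / m)                         # start from equal weights
    for _ in range(n_iter):
        grad = -2.0 * X0.T @ V @ (X1 - X0 @ w) + lambda_ * d
        w = project_to_simplex(w - step * grad)
    return w

X0 = np.array([[1.0, 5.0, 9.0],
               [2.0, 6.0, 10.0]])  # 2 covariates x 3 control units
X1 = np.array([5.0, 6.0])          # identical to the 2nd control unit
w_pen = penalized_weights(X0, X1, lambda_=0.01)
w_unpen = penalized_weights(X0, X1, lambda_=0.0)
print(np.round(w_pen, 3))    # [0. 1. 0.]
print(np.round(w_unpen, 3))  # [0.333 0.333 0.333]
```

In this toy example the treated unit coincides with the second control but is also the midpoint of the other two, so without the penalty every mixture that reproduces X1 fits perfectly and the sketch stays at its equal-weight starting point. With a positive `lambda_`, the pairwise-discrepancy term rules out interpolating between distant units and selects the single closest control.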
- gaps_plot(time_period: Iterable | Series | dict | None = None, treatment_time: int | None = None, grid: bool = True, Z0: DataFrame | None = None, Z1: Series | None = None) → None
Plots the gap between the treated unit and the synthetic unit over time.
- Parameters:
time_period (Iterable | pandas.Series | dict, optional) – Time range to plot. If none is supplied, the time period over which the optimisation happens is used, by default None
treatment_time (int, optional) – If supplied, plot a vertical line at the time period in which the treatment occurred, by default None
grid (bool, optional) – Whether or not to plot a grid, by default True
Z0 (pandas.DataFrame, shape (n, c), optional) – The matrix of the time series of the outcome variable for the control units. If no dataprep is set, then this must be supplied along with Z1, by default None.
Z1 (pandas.Series, shape (n, 1), optional) – The matrix of the time series of the outcome variable for the treated unit. If no dataprep is set, then this must be supplied along with Z0, by default None.
- Raises:
ValueError – If there is no weight matrix available
ValueError – If there is no Dataprep object set or (Z0, Z1) is not supplied
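A sketch of what `gaps_plot` draws, assuming the gap in each period is the treated outcome Z1 minus the synthetic outcome Z0 @ w. The helper names `gaps_series` and `plot_gaps` are hypothetical, and the horizontal zero line is our addition. matplotlib is imported inside the plotting function so that the gap computation itself needs only pandas and NumPy.

```python
import numpy as np
import pandas as pd

def gaps_series(Z0: pd.DataFrame, Z1: pd.Series, w, time_period=None) -> pd.Series:
    """Treated outcome minus synthetic outcome, optionally restricted to time_period."""
    gaps = Z1 - Z0 @ np.asarray(w, dtype=float)
    return gaps if time_period is None else gaps.loc[list(time_period)]

def plot_gaps(gaps: pd.Series, treatment_time=None, grid=True):
    """Line plot of the gaps with a zero line and an optional treatment marker."""
    import matplotlib.pyplot as plt  # only needed when actually plotting

    _, ax = plt.subplots()
    ax.plot(gaps.index, gaps.values)
    ax.axhline(0.0, color="black", linewidth=0.8)   # zero-gap reference line
    if treatment_time is not None:
        ax.axvline(treatment_time, linestyle="--")  # period in which treatment occurred
    ax.grid(grid)
    return ax

years = [1990, 1991, 1992]
Z0 = pd.DataFrame({"a": [8.0, 9.0, 10.0], "b": [10.0, 11.0, 12.0]}, index=years)
Z1 = pd.Series([10.0, 12.0, 14.0], index=years)
gaps = gaps_series(Z0, Z1, [0.5, 0.5])
print(gaps.tolist())  # [1.0, 2.0, 3.0]
```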
- mae(Z0: DataFrame | None = None, Z1: Series | None = None) → float
Returns the mean absolute error in the fit of the synthetic control versus the treated unit over the optimization time-period.
- Parameters:
Z0 (pandas.DataFrame, shape (n, c), optional) – The matrix of the time series of the outcome variable for the control units. If no dataprep is set, then this must be supplied along with Z1, by default None.
Z1 (pandas.Series, shape (n, 1), optional) – The matrix of the time series of the outcome variable for the treated unit. If no dataprep is set, then this must be supplied along with Z0, by default None.
- Returns:
Mean absolute error
- Return type:
float
- Raises:
ValueError – If the fit method has not been run (no weights available).
ValueError – If there is no Dataprep object set or (Z0, Z1) is not supplied
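As a sketch, assuming the error in each period is Z1 minus the synthetic outcome Z0 @ w and that Z0/Z1 cover the optimization time period, the mean absolute error averages the absolute errors. The names `mae_sketch` and `w` are illustrative.

```python
import numpy as np
import pandas as pd

def mae_sketch(Z0: pd.DataFrame, Z1: pd.Series, w) -> float:
    """Mean absolute error between the treated and synthetic outcome paths."""
    errors = Z1 - Z0 @ np.asarray(w, dtype=float)
    return float(errors.abs().mean())

years = [2000, 2001, 2002]
Z0 = pd.DataFrame({"a": [8.0, 20.0, 40.0], "b": [10.0, 24.0, 50.0]}, index=years)
Z1 = pd.Series([10.0, 20.0, 50.0], index=years)
# Errors are 1, -2 and 5, so the MAE is 8/3
print(mae_sketch(Z0, Z1, [0.5, 0.5]))
```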
- mape(Z0: DataFrame | None = None, Z1: Series | None = None) → float
Returns the mean absolute percentage error in the fit of the synthetic control versus the treated unit over the optimization time-period.
- Parameters:
Z0 (pandas.DataFrame, shape (n, c), optional) – The matrix of the time series of the outcome variable for the control units. If no dataprep is set, then this must be supplied along with Z1, by default None.
Z1 (pandas.Series, shape (n, 1), optional) – The matrix of the time series of the outcome variable for the treated unit. If no dataprep is set, then this must be supplied along with Z0, by default None.
- Returns:
Mean absolute percentage error
- Return type:
float
- Raises:
ValueError – If the fit method has not been run (no weights available).
ValueError – If there is no Dataprep object set or (Z0, Z1) is not supplied
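A sketch of the mean absolute percentage error, assuming each period's error (Z1 minus Z0 @ w) is divided by the treated outcome Z1 and the result is expressed as a fraction; whether the library multiplies by 100 is an assumption we do not settle here. `mape_sketch` is an illustrative name.

```python
import numpy as np
import pandas as pd

def mape_sketch(Z0: pd.DataFrame, Z1: pd.Series, w) -> float:
    """Mean absolute percentage error, returned as a fraction (not multiplied by 100)."""
    errors = Z1 - Z0 @ np.asarray(w, dtype=float)
    return float((errors / Z1).abs().mean())  # divides by zero if Z1 contains zeros

years = [2000, 2001, 2002]
Z0 = pd.DataFrame({"a": [8.0, 20.0, 40.0], "b": [10.0, 24.0, 50.0]}, index=years)
Z1 = pd.Series([10.0, 20.0, 50.0], index=years)
# Relative errors are 1/10, -2/20 and 5/50, each 0.1 in absolute value
print(mape_sketch(Z0, Z1, [0.5, 0.5]))
```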
- mspe(Z0: DataFrame | None = None, Z1: Series | None = None) → float
Returns the mean square prediction error in the fit of the synthetic control versus the treated unit over the optimization time-period.
- Parameters:
Z0 (pandas.DataFrame, shape (n, c), optional) – The matrix of the time series of the outcome variable for the control units. If no dataprep is set, then this must be supplied along with Z1, by default None.
Z1 (pandas.Series, shape (n, 1), optional) – The matrix of the time series of the outcome variable for the treated unit. If no dataprep is set, then this must be supplied along with Z0, by default None.
- Returns:
Mean square prediction error
- Return type:
float
- Raises:
ValueError – If the fit method has not been run (no weights available).
ValueError – If there is no Dataprep object set or (Z0, Z1) is not supplied
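A sketch of the mean square prediction error, assuming it is the average of the squared per-period errors Z1 minus Z0 @ w over the rows of Z0/Z1. Because errors are squared, a single large miss dominates this measure more than it does the MAE. `mspe_sketch` is an illustrative name.

```python
import numpy as np
import pandas as pd

def mspe_sketch(Z0: pd.DataFrame, Z1: pd.Series, w) -> float:
    """Mean of the squared errors between the treated and synthetic outcome paths."""
    errors = Z1 - Z0 @ np.asarray(w, dtype=float)
    return float((errors ** 2).mean())

years = [2000, 2001, 2002]
Z0 = pd.DataFrame({"a": [8.0, 20.0, 40.0], "b": [10.0, 24.0, 50.0]}, index=years)
Z1 = pd.Series([10.0, 20.0, 50.0], index=years)
# Errors are 1, -2 and 5, so the MSPE is (1 + 4 + 25) / 3 = 10
print(mspe_sketch(Z0, Z1, [0.5, 0.5]))
```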
- path_plot(time_period: Iterable | Series | dict | None = None, treatment_time: int | None = None, grid: bool = True, Z0: DataFrame | None = None, Z1: Series | None = None) → None
Plot the outcome variable over time for the treated unit and the synthetic control.
- Parameters:
time_period (Iterable | pandas.Series | dict, optional) – Time range to plot. If none is supplied, the time period over which the optimisation happens is used, by default None
treatment_time (int, optional) – If supplied, plot a vertical line at the time period in which the treatment occurred, by default None
grid (bool, optional) – Whether or not to plot a grid, by default True
Z0 (pandas.DataFrame, shape (n, c), optional) – The matrix of the time series of the outcome variable for the control units. If no dataprep is set, then this must be supplied along with Z1, by default None.
Z1 (pandas.Series, shape (n, 1), optional) – The matrix of the time series of the outcome variable for the treated unit. If no dataprep is set, then this must be supplied along with Z0, by default None.
- Raises:
ValueError – If there is no weight matrix available
ValueError – If there is no Dataprep object set or (Z0, Z1) is not supplied
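The two series that `path_plot` compares can be sketched as a single DataFrame, assuming the synthetic path is Z0 @ w. The helper `paths_frame` and the column names `"treated"` and `"synthetic"` are hypothetical.

```python
import numpy as np
import pandas as pd

def paths_frame(Z0: pd.DataFrame, Z1: pd.Series, w, time_period=None) -> pd.DataFrame:
    """Treated and synthetic outcome paths side by side, one row per time period."""
    paths = pd.DataFrame({"treated": Z1, "synthetic": Z0 @ np.asarray(w, dtype=float)})
    return paths if time_period is None else paths.loc[list(time_period)]

years = [1990, 1991, 1992]
Z0 = pd.DataFrame({"a": [8.0, 9.0, 10.0], "b": [10.0, 11.0, 12.0]}, index=years)
Z1 = pd.Series([10.0, 12.0, 14.0], index=years)
paths = paths_frame(Z0, Z1, [0.5, 0.5])
print(paths["synthetic"].tolist())  # [9.0, 10.0, 11.0]
```

With matplotlib installed, `paths.plot(grid=True)` returns an Axes on which `ax.axvline(treatment_time, linestyle="--")` can mark the treatment period.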
- summary(round: int = 3, X0: DataFrame | None = None, X1: Series | None = None) → DataFrame
Generates a pandas.DataFrame with summary data. The first column shows the mean value of each predictor over the time period time_predictors_prior for the treated unit, the second column shows the same for the synthetic unit, and a final column ‘sample mean’ shows the mean value of each predictor over time_predictors_prior across all the control units, i.e. the values for a synthetic control in which all the weights are equal.
- Parameters:
round (int, optional) – Round the table values to the given number of places, by default 3
X0 (pandas.DataFrame, shape (n_cov, n_controls), optional) – Matrix with each column corresponding to a control unit and each row to a covariate. If no dataprep is set, then this must be supplied along with X1, by default None.
X1 (pandas.Series, shape (n_cov, 1), optional) – Column vector giving the covariate values for the treated unit. If no dataprep is set, then this must be supplied along with X0, by default None.
- Returns:
Summary data.
- Return type:
pandas.DataFrame
- Raises:
ValueError – If there is no weight matrix available
ValueError – If there is no Dataprep object set or (X0, X1) is not supplied
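A sketch of the summary table described above, assuming the synthetic column is the weighted combination X0 @ w of the control units' predictor values and the sample mean is the unweighted average across controls. The helper `summary_sketch`, the column names `"treated"` and `"synthetic"`, and the `decimals` argument (standing in for `round`, renamed here to avoid shadowing the built-in) are assumptions; only the ‘sample mean’ column name comes from the text.

```python
import numpy as np
import pandas as pd

def summary_sketch(X0: pd.DataFrame, X1: pd.Series, w, decimals: int = 3) -> pd.DataFrame:
    """Predictor values for the treated unit, the synthetic unit and the control average."""
    return pd.DataFrame({
        "treated": X1,
        "synthetic": X0 @ np.asarray(w, dtype=float),  # weighted combination of controls
        "sample mean": X0.mean(axis=1),                 # equal weight on every control
    }).round(decimals)

X0 = pd.DataFrame({"B": [1.0, 10.0], "C": [2.0, 20.0], "D": [6.0, 30.0]},
                  index=["gdp", "school"])
X1 = pd.Series([3.0, 25.0], index=["gdp", "school"])
table = summary_sketch(X0, X1, [0.0, 0.5, 0.5])
print(table)
```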
- weights(round: int = 3, threshold: float | None = None) → Series
Returns a pandas.Series of the weights for each control unit.
- Parameters:
round (int, optional) – Round the weights to given number of places, by default 3
threshold (float, optional) – If supplied, will only show weights above this value, by default None
- Returns:
The weights computed
- Return type:
pandas.Series
- Raises:
ValueError – If there is no weight matrix available
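A sketch of the weights output, assuming rounding is applied before the threshold filter and that "above this value" means a strict greater-than comparison; both points, the helper `weights_sketch`, and the `decimals` argument (standing in for `round`) are assumptions rather than documented behaviour.

```python
import pandas as pd

def weights_sketch(w, controls, decimals: int = 3, threshold=None) -> pd.Series:
    """Control-unit weights as a Series, optionally keeping only those above threshold."""
    weights = pd.Series(w, index=controls).round(decimals)
    # Assumption: threshold is applied after rounding, with a strict ">" comparison
    return weights if threshold is None else weights[weights > threshold]

w = [0.0004, 0.6666667, 0.3329333]
print(weights_sketch(w, ["B", "C", "D"], threshold=0.01))
```

Filtering with a small threshold is a convenient way to list only the control units that actually contribute to the synthetic control.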