Dataprep class
This class and its API are based on the similarly named function in the R Synth package.
The Dataprep class defines all the information necessary for the synthetic
control study. It takes as arguments a pandas.DataFrame foo containing
the panel data, a list of predictors, special predictors, the statistical operation to
apply to the predictors over the selected time frame, the dependent variable,
the columns denoting the unit labels and the time period, the labels denoting the control units,
the label denoting the treated unit, the time period over which to carry out the optimisation
procedure, and the time period over which to apply the statistical operation to the
predictors. See below for further details about each individual argument, and see
the examples folder of the repository for how this class is set up in three real research contexts.
The principal difference between the function signature here and the one in
the R Synth package is that the two arguments unit.variable
and unit.names.variable of that package are
consolidated here into a single argument, unit_variable, since it is unnecessary to have
both.
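As a minimal sketch of how the arguments listed above fit together, the snippet below builds a toy panel and passes it to the constructor using the keyword names from the class signature documented on this page. The column names (region, year, gdp, invest) and all data values are invented for illustration, and the import is guarded so the snippet still runs where pysyncon is not installed.

```python
import pandas as pd

# Toy long-format panel: one row per (unit, year). All values are invented.
foo = pd.DataFrame(
    {
        "region": ["A"] * 4 + ["B"] * 4 + ["C"] * 4,
        "year": [2000, 2001, 2002, 2003] * 3,
        "gdp": [10.0, 11.0, 12.0, 9.0, 20.0, 21.0, 22.0, 23.0, 5.0, 6.0, 7.0, 8.0],
        "invest": [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 0.5, 0.6, 0.7, 0.8],
    }
)

# Keyword names follow the documented signature; the values are assumptions.
dataprep_kwargs = dict(
    foo=foo,
    predictors=["invest"],
    predictors_op="mean",
    dependent="gdp",
    unit_variable="region",  # a single column holds the unit labels
    time_variable="year",
    treatment_identifier="A",
    controls_identifier=["B", "C"],
    time_predictors_prior=range(2000, 2002),
    time_optimize_ssr=range(2000, 2003),
    # Each special predictor is a (column, time-range, operator) triple.
    special_predictors=[("gdp", [2001], "mean")],
)

try:
    from pysyncon import Dataprep
except ImportError:
    Dataprep = None  # pysyncon is not installed in this environment

# Construct the object only when the library is available.
dataprep = Dataprep(**dataprep_kwargs) if Dataprep is not None else None
```

Note that the treated unit and the control units are all identified by their labels in the single unit_variable column.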
- class pysyncon.Dataprep(foo: pd.DataFrame, predictors: Axes, predictors_op: PredictorsOp_t, dependent: Any, unit_variable: Any, time_variable: Any, treatment_identifier: Any | list | tuple, controls_identifier: list | tuple, time_predictors_prior: IsinArg_t, time_optimize_ssr: IsinArg_t, special_predictors: Iterable[SpecialPredictor_t] | None = None)
Helper class that takes in the panel data and all necessary information needed to describe the study setup. It is used to automatically generate the matrices needed for the optimisation methods, plots of the results etc.
- Parameters:
foo (pandas.DataFrame) – A pandas DataFrame containing the panel data where the columns are predictor/outcome variables and each row is a time-step for some unit
predictors (Axes) – The columns of foo to use as predictors
predictors_op ("mean" | "std" | "median" | "sum" | "count" | "max" | "min" | "var") – The statistical operation to use on the predictors - the time range that the operation is applied to is time_predictors_prior
dependent (Any) – The column of foo to use as the dependent variable
unit_variable (Any) – The column of foo that contains the unit labels
time_variable (Any) – The column of foo that contains the time period
treatment_identifier (Any) – The unit label that denotes the treated unit
controls_identifier (Iterable) – The unit labels denoting the control units
time_predictors_prior (Iterable) – The time range over which to apply the statistical operation to the predictors (see predictors_op argument)
time_optimize_ssr (Iterable) – The time range over which the loss function should be minimised
special_predictors (Iterable[SpecialPredictor_t], optional) – An iterable of special predictors, which are additional predictors that should be aggregated over a custom time period with an indicated statistical operator. In particular, a special predictor consists of a triple of:
column: the column of foo containing the predictor to use
time-range: the time range to apply operator over - it should have the same type as time_predictors_prior or time_optimize_ssr
operator: the statistical operator to apply to column - it should have the same type as predictors_op
By default None.
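To make predictors_op, time_predictors_prior and the special-predictor triple more concrete, the following sketch reproduces the described aggregation in plain pandas. It is an illustration of the computation as documented above, not pysyncon's internal code; the helper aggregate_predictor, its default column names, and the toy data are all hypothetical.

```python
import pandas as pd

# Toy panel with invented values: two units observed over three years.
foo = pd.DataFrame(
    {
        "region": ["A", "A", "A", "B", "B", "B"],
        "year": [2000, 2001, 2002] * 2,
        "invest": [1.0, 3.0, 5.0, 2.0, 4.0, 6.0],
        "gdp": [10.0, 11.0, 12.0, 20.0, 22.0, 24.0],
    }
)


def aggregate_predictor(df, column, time_range, op,
                        unit_variable="region", time_variable="year"):
    """Hypothetical helper: apply `op` to `column` per unit over `time_range`."""
    in_range = df[time_variable].isin(time_range)  # keep only the chosen periods
    return df.loc[in_range].groupby(unit_variable)[column].agg(op)


# Ordinary predictor: predictors_op="mean" over time_predictors_prior.
ordinary = aggregate_predictor(foo, "invest", range(2000, 2002), "mean")
# A -> 2.0, B -> 3.0

# Special predictor ("gdp", [2002], "mean"): its own time range and operator.
special = aggregate_predictor(foo, "gdp", [2002], "mean")
# A -> 12.0, B -> 24.0
```

The only difference between the two calls is where the time range and operator come from: shared arguments for ordinary predictors, and the triple itself for special predictors.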
- Raises:
TypeError – if foo is not of type pandas.DataFrame
ValueError – if any element of predictors is not a column of foo
ValueError – if predictors_op is not one of “mean”, “std”, “median”, “sum”, “count”, “max”, “min” or “var”
ValueError – if dependent is not a column of foo
ValueError – if unit_variable is not a column of foo
ValueError – if time_variable is not a column of foo
ValueError – if treatment_identifier is not present in foo[unit_variable]
TypeError – if controls_identifier is not of type Iterable
ValueError – if treatment_identifier is in the list of controls
ValueError – if any of the controls is not in foo[unit_variable]
ValueError – if any element of special_predictors is not an Iterable of length 3
ValueError – if a predictor in an element of special_predictors is not a column of foo
ValueError – if one of the operators in an element of special_predictors is not one of “mean”, “std”, “median”, “sum”, “count”, “max”, “min” or “var”
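The checks listed above can be approximated by a small stand-alone validator. This is a hedged sketch covering only some of the documented conditions; check_study_setup and outcome are hypothetical helpers written for this illustration and are not part of pysyncon, whose actual checks and error messages may differ.

```python
import pandas as pd

ALLOWED_OPS = {"mean", "std", "median", "sum", "count", "max", "min", "var"}


def check_study_setup(foo, predictors, predictors_op, unit_variable,
                      treatment_identifier, controls_identifier):
    """Hypothetical validator mirroring part of the documented Raises list."""
    if not isinstance(foo, pd.DataFrame):
        raise TypeError("foo must be of type pandas.DataFrame")
    for predictor in predictors:
        if predictor not in foo.columns:
            raise ValueError(f"predictor {predictor!r} is not a column of foo")
    if predictors_op not in ALLOWED_OPS:
        raise ValueError(f"predictors_op must be one of {sorted(ALLOWED_OPS)}")
    if unit_variable not in foo.columns:
        raise ValueError("unit_variable is not a column of foo")
    units = set(foo[unit_variable])
    if treatment_identifier not in units:
        raise ValueError("treatment_identifier is not present in foo[unit_variable]")
    if treatment_identifier in controls_identifier:
        raise ValueError("treatment_identifier is in the list of controls")
    missing = [c for c in controls_identifier if c not in units]
    if missing:
        raise ValueError(f"controls not in foo[unit_variable]: {missing}")


# A valid toy setup (invented values) used as the baseline for the checks.
valid = dict(
    foo=pd.DataFrame({"region": ["A", "B", "C"], "invest": [1.0, 2.0, 3.0]}),
    predictors=["invest"],
    predictors_op="mean",
    unit_variable="region",
    treatment_identifier="A",
    controls_identifier=["B", "C"],
)


def outcome(**overrides):
    """Return "ok" or the name of the exception raised for a modified setup."""
    try:
        check_study_setup(**{**valid, **overrides})
    except (TypeError, ValueError) as exc:
        return type(exc).__name__
    return "ok"
```

For example, outcome(controls_identifier=["A", "B"]) reports a ValueError because the treated unit also appears among the controls.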