`Dataprep` class

This class and its API are based on the similarly named function in the R Synth package.

The Dataprep class defines all the information needed for a synthetic control study. It takes the following arguments: a pandas.DataFrame foo containing the panel data, a list of predictors, an optional list of special predictors, the statistical operation to apply to the predictors over the selected time frame, the dependent variable, the column containing the unit labels, the column containing the time periods, the labels of the control units, the label of the treated unit, the time period over which to carry out the optimisation and the time period over which to apply the statistical operation to the predictors. See below for details of each argument, and see the examples folder of the repository for how this class is set up in three real research contexts.
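To make the idea of "a statistical operation applied to the predictors over a time frame" concrete, here is a minimal plain-pandas sketch. It is an illustration only, not pysyncon's internal code; the column names (unit, year, gdp) and all values are invented for the example.

```python
import pandas as pd

# Toy panel data: one row per (unit, year); all values are made up.
foo = pd.DataFrame(
    {
        "unit": ["A", "A", "A", "B", "B", "B"],
        "year": [2000, 2001, 2002, 2000, 2001, 2002],
        "gdp": [1.0, 2.0, 3.0, 4.0, 6.0, 8.0],
    }
)

# Keep only the rows inside the time range that plays the role of
# time_predictors_prior (here the years 2000 and 2001).
in_range = foo["year"].isin(range(2000, 2002))

# Apply the operation that plays the role of predictors_op ("mean")
# to the predictor column, separately for every unit.
aggregated = foo.loc[in_range].groupby("unit")[["gdp"]].agg("mean")
print(aggregated)
```

The result has one row per unit and one column per predictor, which is the kind of summary a synthetic control method compares between the treated unit and the controls.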

The principal difference between the function signature here and the one in the R Synth package is that the two arguments unit.variable and unit.names.variable in that package are consolidated here into a single argument, unit_variable, since having both is unnecessary.

class pysyncon.Dataprep(foo: pd.DataFrame, predictors: Axes, predictors_op: PredictorsOp_t, dependent: Any, unit_variable: Any, time_variable: Any, treatment_identifier: Any | list | tuple, controls_identifier: list | tuple, time_predictors_prior: IsinArg_t, time_optimize_ssr: IsinArg_t, special_predictors: Iterable[SpecialPredictor_t] | None = None)

Helper class that takes in the panel data and all necessary information needed to describe the study setup. It is used to automatically generate the matrices needed for the optimisation methods, plots of the results etc.

Parameters:

foo (pandas.DataFrame) – A pandas DataFrame containing the panel data where the columns are predictor/outcome variables and each row is a time-step for some unit
predictors (Axes) – The columns of foo to use as predictors
predictors_op ("mean" | "std" | "median" | "sum" | "count" | "max" | "min" | "var") – The statistical operation to use on the predictors - the time range that the operation is applied to is time_predictors_prior
dependent (Any) – The column of foo to use as the dependent variable
unit_variable (Any) – The column of foo that contains the unit labels
time_variable (Any) – The column of foo that contains the time period
treatment_identifier (Any) – The unit label that denotes the treated unit
controls_identifier (Iterable) – The unit labels denoting the control units
time_predictors_prior (Iterable) – The time range over which to apply the statistical operation to the predictors (see predictors_op argument)
time_optimize_ssr (Iterable) – The time range over which the loss function should be minimised
special_predictors (Iterable[SpecialPredictor_t], optional) –
An iterable of special predictors, which are additional predictors that are aggregated over a custom time period using a specified statistical operator. Each special predictor is a triple of:
- column: the column of foo containing the predictor to use,
- time-range: the time range to apply operator over - it should have the same type as time_predictors_prior or time_optimize_ssr
- operator: the statistical operator to apply to column - it should have the same type as predictors_op
by default None
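As a rough sketch of how these parameters fit together, the following builds a small artificial panel and passes it to the constructor using the signature shown above. The data, the column names (unit, year, gdp, invest) and the unit labels are all invented for illustration, and the import is guarded only so that the sketch still runs where pysyncon is not installed.

```python
import pandas as pd

# Toy panel: three units observed over 2000-2005; all numbers are made up.
years = list(range(2000, 2006))
rows = []
for unit, base in [("treated", 10.0), ("control_1", 8.0), ("control_2", 12.0)]:
    for i, year in enumerate(years):
        rows.append(
            {"unit": unit, "year": year, "gdp": base + i, "invest": base / 2 + 0.5 * i}
        )
foo = pd.DataFrame(rows)

# Arguments named exactly as in the class signature above.
dataprep_kwargs = dict(
    foo=foo,
    predictors=["invest"],
    predictors_op="mean",
    dependent="gdp",
    unit_variable="unit",
    time_variable="year",
    treatment_identifier="treated",
    controls_identifier=["control_1", "control_2"],
    time_predictors_prior=range(2000, 2003),
    time_optimize_ssr=range(2000, 2004),
    # Each special predictor is a (column, time-range, operator) triple.
    special_predictors=[("gdp", range(2001, 2003), "mean")],
)

try:
    from pysyncon import Dataprep
except ImportError:
    Dataprep = None  # lets the sketch run without pysyncon installed

dataprep = Dataprep(**dataprep_kwargs) if Dataprep is not None else None
```

Note that the treated unit's label does not appear in controls_identifier, and that every column name passed in is a column of foo; both conditions are checked by the constructor, as listed under Raises.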

Raises:

TypeError – if foo is not of type pandas.DataFrame
ValueError – if any element of predictors is not a column of foo
ValueError – if predictors_op is not one of “mean”, “std”, “median”, “sum”, “count”, “max”, “min” or “var”.
ValueError – if dependent is not a column of foo
ValueError – if unit_variable is not a column of foo
ValueError – if time_variable is not a column of foo
ValueError – if treatment_identifier is not present in foo[unit_variable]
TypeError – if controls_identifier is not of type Iterable
ValueError – if treatment_identifier is in the list of controls
ValueError – if any of the controls is not in foo[unit_variable]
ValueError – if any element of special_predictors is not an Iterable of length 3
ValueError – if a predictor in an element of special_predictors is not a column of foo
ValueError – if one of the operators in an element of special_predictors is not one of “mean”, “std”, “median”, “sum”, “count”, “max”, “min” or “var”.
