Sample data generation
The package provides a method for generating fake data for testing purposes.
Linear Factor model
Let \(Y_{tj}^N\) (resp. \(Y_{tj}^I\)) denote the outcome for unit \(j\) at time \(t\)
in the absence of treatment (resp. in the presence of treatment). The LinearFactorModel
class generates sample potential outcomes data according to a linear factor model:
\[Y_{tj}^N = \theta_t^T Z_j + \lambda_t^T \mu_j + \epsilon_{tj}, \qquad Y_{tj}^I = \delta_t + Y_{tj}^N,\]
where \(Z_j\) denotes a vector of observable covariates, \(\mu_j\) is a vector of unobservable covariates and \(\epsilon_{tj}\) are mean-zero normal shocks. The \(\delta_t\) denote the treatment effects, and the remaining variables (\(\theta_t\) and \(\lambda_t\)) are model parameters.
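The sketch below simulates a model of this form, \(Y_{tj}^N = \theta_t \cdot Z_j + \lambda_t \cdot \mu_j + \epsilon_{tj}\) and \(Y_{tj}^I = Y_{tj}^N + \delta_t\), using only the Python standard library. It is not the pysyncon implementation: the function name `simulate_linear_factor`, the parameter names `theta` and `lam`, the choice of distributions (standard normal covariates and shocks, uniform parameters and effects) and the zero effect before the intervention are all assumptions made for illustration.

```python
import random


def simulate_linear_factor(n_units, n_observable, n_unobservable,
                           n_periods_pre, n_periods_post, seed=None):
    """Illustrative simulation; returns (Y_N, Y_I) as lists indexed [t][j]."""
    rng = random.Random(seed)
    n_periods = n_periods_pre + n_periods_post

    def normal_vec(n):
        # Assumed distribution: standard normal draws
        return [rng.gauss(0, 1) for _ in range(n)]

    def uniform_vec(n):
        # Assumed distribution: uniform draws on [0, 10]
        return [rng.uniform(0, 10) for _ in range(n)]

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    Z = [normal_vec(n_observable) for _ in range(n_units)]       # observable covariates Z_j
    mu = [normal_vec(n_unobservable) for _ in range(n_units)]    # unobservable covariates mu_j
    theta = [uniform_vec(n_observable) for _ in range(n_periods)]   # parameters theta_t
    lam = [uniform_vec(n_unobservable) for _ in range(n_periods)]   # parameters lambda_t

    # Assumption: no effect before the intervention, uniform(0, 20) effect after it
    delta = [0.0] * n_periods_pre + [rng.uniform(0, 20) for _ in range(n_periods_post)]

    Y_N, Y_I = [], []
    for t in range(n_periods):
        # Untreated outcome: factor terms plus a mean-zero normal shock
        row = [dot(theta[t], Z[j]) + dot(lam[t], mu[j]) + rng.gauss(0, 1)
               for j in range(n_units)]
        Y_N.append(row)
        # Treated outcome: untreated outcome shifted by the period's effect
        Y_I.append([y + delta[t] for y in row])
    return Y_N, Y_I


Y_N, Y_I = simulate_linear_factor(n_units=5, n_observable=2, n_unobservable=3,
                                  n_periods_pre=4, n_periods_post=2, seed=0)
print(len(Y_N), len(Y_N[0]))  # number of periods, number of units
```

Passing the same `seed` twice reproduces the same draws, which is the property that makes generated data useful for tests.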
- class pysyncon.generator.LinearFactorModel(observed_dist: tuple[int] = (0, 1), observed_params_dist: tuple[int] = (0, 10), unobserved_dist: tuple[int] = (0, 1), unobserved_params_dist: tuple[int] = (0, 10), effect_dist: tuple[int] = (0, 20), shocks_dist: tuple[int] = (0, 1), seed: int | None = None, rng: Generator | None = None)
Generates potential outcomes following a linear factor model
- generate(n_units: int, n_observable: int, n_unobservable: int, n_periods_pre: int, n_periods_post: int) → tuple[DataFrame, Series, DataFrame, Series]
Generate the matrices (\(X_0\), \(X_1\), \(Z_0\), \(Z_1\)) that can be used as input to a synthetic control method (using the notation of Abadie & Gardeazabal [AG03]).
- Parameters:
n_units (int) – Number of units in the model
n_observable (int) – Number of observable covariates in the model
n_unobservable (int) – Number of unobservable covariates in the model
n_periods_pre (int) – Number of time periods prior to the intervention
n_periods_post (int) – Number of time periods post the intervention
- Returns:
Returns a tuple of 4 pandas objects: \(X_0\) a pandas DataFrame of shape (n_periods_pre + n_periods_post, n_units - 1), \(X_1\) a pandas Series of shape (n_periods_pre + n_periods_post, 1), \(Z_0\) a pandas DataFrame of shape (n_observable, n_units - 1), \(Z_1\) a pandas Series of shape (n_observable, 1).
- Return type:
tuple[pandas.DataFrame, pandas.Series, pandas.DataFrame, pandas.Series]
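A minimal usage sketch based on the signatures above. The sizes and the seed are arbitrary, and `documented_sizes` is a hypothetical helper that only restates the Returns description so it can be compared with what the call produces. The import is guarded so that the snippet still runs where pysyncon is not installed.

```python
def documented_sizes(n_units, n_observable, n_periods_pre, n_periods_post):
    """Hypothetical helper: sizes as stated in the Returns description."""
    n_periods = n_periods_pre + n_periods_post
    return {
        "X0": (n_periods, n_units - 1),
        "X1": n_periods,
        "Z0": (n_observable, n_units - 1),
        "Z1": n_observable,
    }


# Arbitrary example sizes, passed by keyword to match the documented parameters
sizes = dict(n_units=10, n_observable=3, n_unobservable=2,
             n_periods_pre=20, n_periods_post=5)
expected = documented_sizes(sizes["n_units"], sizes["n_observable"],
                            sizes["n_periods_pre"], sizes["n_periods_post"])

try:
    from pysyncon.generator import LinearFactorModel
except ImportError:
    LinearFactorModel = None  # pysyncon not available; skip the call below

if LinearFactorModel is not None:
    model = LinearFactorModel(seed=1234)  # fixed seed for repeatable test data
    X0, X1, Z0, Z1 = model.generate(**sizes)
    print("documented:", expected)
    print("returned:  ", X0.shape, len(X1), Z0.shape, len(Z1))
```

Printing both lets the documented sizes be checked against the objects actually returned, rather than taking either on trust.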