Sample data generation

The package provides a generator of synthetic data for testing purposes.

Linear factor model

Let \(Y_{jt}^N\) (resp. \(Y_{jt}^I\)) denote the outcome for unit \(j\) at time \(t\) in the absence of treatment (resp. in the presence of treatment). The LinearFactorModel class generates sample potential-outcomes data according to a linear factor model:

\[\begin{split}Y_{jt}^N &= \theta_t^T Z_j + \lambda_t^T \mu_j + \epsilon_{tj},\\ Y_{jt}^I &= Y_{jt}^N + \delta_t,\end{split}\]

where \(Z_j\) is a vector of observable covariates, \(\mu_j\) is a vector of unobservable covariates, and \(\epsilon_{tj}\) are mean-zero normal shocks. The term \(\delta_t\) is the treatment effect at time \(t\), and the remaining variables, \(\theta_t\) and \(\lambda_t\), are model parameters.
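The two equations above can be simulated directly with NumPy. The following is a minimal sketch, not the package's implementation: the sizes are hypothetical, and drawing the covariates, parameters and effects from uniform distributions (with bounds that mirror the constructor defaults) is an assumption, since only the shocks are stated to be normal.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
n_units, n_observable, n_unobservable, n_periods = 5, 3, 2, 12

rng = np.random.default_rng(seed=0)

# Unit-level covariates: Z_j (observable) and mu_j (unobservable).
# Assumption: uniform draws; the text does not state these distributions.
Z = rng.uniform(0, 1, size=(n_units, n_observable))
mu = rng.uniform(0, 1, size=(n_units, n_unobservable))

# Time-varying parameters theta_t and lambda_t (assumed uniform).
theta = rng.uniform(0, 10, size=(n_periods, n_observable))
lam = rng.uniform(0, 10, size=(n_periods, n_unobservable))

# Mean-zero normal shocks epsilon_{tj}, one per (period, unit) pair.
eps = rng.normal(0, 1, size=(n_periods, n_units))

# Treatment effects delta_t, one per period (assumed uniform).
delta = rng.uniform(0, 20, size=n_periods)

# Y^N_{jt} = theta_t^T Z_j + lambda_t^T mu_j + epsilon_{tj}
# Row t, column j of each matrix product is the inner product for (t, j).
Y_N = theta @ Z.T + lam @ mu.T + eps  # shape (n_periods, n_units)

# Y^I_{jt} = Y^N_{jt} + delta_t, broadcasting delta across units.
Y_I = Y_N + delta[:, None]
```

Each column of Y_N and Y_I is the outcome path of one unit, so the difference between the two matrices is the same in every column and equals \(\delta_t\).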

class pysyncon.generator.LinearFactorModel(observed_dist: tuple[int] = (0, 1), observed_params_dist: tuple[int] = (0, 10), unobserved_dist: tuple[int] = (0, 1), unobserved_params_dist: tuple[int] = (0, 10), effect_dist: tuple[int] = (0, 20), shocks_dist: tuple[int] = (0, 1), seed: int | None = None, rng: Generator | None = None)

Generates potential outcomes following a linear factor model

generate(n_units: int, n_observable: int, n_unobservable: int, n_periods_pre: int, n_periods_post: int) → tuple[DataFrame, Series, DataFrame, Series]

Generate the matrices (\(X_0\), \(X_1\), \(Z_0\), \(Z_1\)) that can be used as input to a synthetic control method (using the notation of Abadie & Gardeazabal [AG03]).

Parameters:
  • n_units (int) – Number of units in the model

  • n_observable (int) – Number of observable covariates in the model

  • n_unobservable (int) – Number of unobservable covariates in the model

  • n_periods_pre (int) – Number of time periods prior to the intervention

  • n_periods_post (int) – Number of time periods after the intervention

Returns:

A tuple of four pandas objects: \(X_0\), a DataFrame of shape (n_periods_pre + n_periods_post, n_units - 1); \(X_1\), a Series of length n_periods_pre + n_periods_post; \(Z_0\), a DataFrame of shape (n_observable, n_units - 1); and \(Z_1\), a Series of length n_observable.

Return type:

tuple[pandas.DataFrame, pandas.Series, pandas.DataFrame, pandas.Series]
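To make the returned shapes concrete, the sketch below builds four pandas objects with the same layout from placeholder arrays. It does not call the package. The unit labels, the random placeholder data and the choice of the first unit as the treated one are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sizes, chosen only for illustration.
n_units, n_observable, n_periods = 5, 3, 12
rng = np.random.default_rng(seed=0)

# Placeholder data standing in for simulated outcomes and covariates.
outcomes = rng.normal(size=(n_periods, n_units))       # rows: periods, columns: units
covariates = rng.normal(size=(n_observable, n_units))  # rows: covariates, columns: units

units = [f"unit_{j}" for j in range(n_units)]  # hypothetical labels
outcomes_df = pd.DataFrame(outcomes, columns=units)
covariates_df = pd.DataFrame(covariates, columns=units)

treated = units[0]  # assumption: the first unit is the treated one

# Control units go into the DataFrames, the treated unit into the Series.
X0 = outcomes_df.drop(columns=treated)    # shape (n_periods, n_units - 1)
X1 = outcomes_df[treated]                 # Series of length n_periods
Z0 = covariates_df.drop(columns=treated)  # shape (n_observable, n_units - 1)
Z1 = covariates_df[treated]               # Series of length n_observable

for name, obj in [("X0", X0), ("X1", X1), ("Z0", Z0), ("Z1", Z1)]:
    print(name, type(obj).__name__, obj.shape)
```

A pandas Series is one-dimensional, so its shape is reported as a one-element tuple rather than as a column matrix.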