Simulate a complete training dataset that can stand in for a variety of applications. The arguments control the range of observed dates, the distribution and mean of the Output values, and the ratio of missing data.

Usage

simu_db(
  start_date = "2022-01-01",
  end_date = "2023-01-01",
  by = "day",
  output_distrib = "Gaussian",
  ratio_missing = 0.5,
  mean = 50,
  var = 10,
  range_unif = c(0, 100)
)

Arguments

start_date

A date, indicating the starting time of observations. Default is '2022-01-01'.

end_date

A date, indicating the ending time of observations. Default is '2023-01-01'.

by

A number or a character string, indicating the time period between two consecutive observations. Possible values are 'day', 'week', 'month', 'year', or any arbitrary number. See the documentation of 'seq()' for additional information if necessary. Default is 'day'.
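
Since this argument points to 'seq()', the sketch below shows how the same values of 'by' behave when passed to base R's 'seq()' on dates. It only illustrates 'seq()' itself; that simu_db() forwards 'by' to 'seq()' in exactly this way is an assumption, and the variable names are illustrative.

```r
# Default observation window from the Usage section
start <- as.Date("2022-01-01")
end <- as.Date("2023-01-01")

# Character values of `by` step by calendar unit
daily <- seq(start, end, by = "day")
weekly <- seq(start, end, by = "week")
monthly <- seq(start, end, by = "month")

# A numeric `by` is interpreted as a number of days for Date objects
every_10_days <- seq(start, end, by = 10)

# Number of observation dates produced by each step
lengths(list(day = daily, week = weekly, month = monthly, ten = every_10_days))
```

A coarser 'by' shrinks the grid of observation dates, so the default window gives 366 daily dates but only 13 monthly ones.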

output_distrib

A character string, indicating the distribution of Output values. Possible values: 'Gaussian' (default), 'Uniform'.

ratio_missing

A number, between 0 and 1, indicating the ratio of missing values in the dataset. Default is 0.5.
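
To make the meaning of a missing-data ratio concrete, here is a minimal base R sketch that masks a fixed share of a vector with NA. This page does not state how simu_db() represents missing values (NA entries or dropped rows), so the masking approach and all names below are assumptions for illustration only.

```r
set.seed(42)

# Hypothetical series of 366 daily values
n <- 366
values <- runif(n, min = 0, max = 100)

# Replace a share of the values, chosen at random, with NA
ratio_missing <- 0.3
n_missing <- round(ratio_missing * n)
values[sample(n, n_missing)] <- NA

# Observed proportion of missing values, close to ratio_missing
mean(is.na(values))
```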

mean

A number, indicating the mean value of the Gaussian distribution. Default is 50.

var

A number, indicating the variance of the Gaussian distribution. Default is 10.
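
Note that 'var' is a variance, whereas base R's 'rnorm()' expects a standard deviation. The sketch below shows the conversion for the default values. Whether simu_db() draws its values with 'rnorm()' is an assumption; the point is only that a variance of 10 corresponds to a standard deviation of about 3.16.

```r
set.seed(1)

# Defaults from the Usage section
mean_value <- 50
variance <- 10

# rnorm() is parameterised by the standard deviation, not the variance
std_dev <- sqrt(variance)
x <- rnorm(1e5, mean = mean_value, sd = std_dev)

# Empirical mean and variance should be close to 50 and 10
c(mean = mean(x), var = var(x))
```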

range_unif

A vector of two numbers, indicating the lower and upper bounds of the Uniform distribution. Default is c(0, 100).

Value

A full dataset of synthetic data.

Examples


## Generate a dataset with Gaussian measurements
data <- simu_db(output_distrib = 'Gaussian')

## Generate a dataset with Uniform measurements and 30% missing data
data <- simu_db(output_distrib = 'Uniform', ratio_missing = 0.3)