impute calculates imputation values for missing data depending on the selected method.

impute(
  data,
  sample,
  grouping,
  intensity_log2,
  condition,
  comparison = comparison,
  missingness = missingness,
  noise = NULL,
  method = "ludovic",
  skip_log2_transform_error = FALSE,
  retain_columns = NULL
)

Arguments

data

a data frame that is ideally the output from the assign_missingness function. It should contain at least the input variables. For each "reference_vs_treatment" comparison, both the reference and the treatment condition should be present. That means the reference condition should be duplicated once for every treatment.

sample

a character column in the data data frame that contains the sample names.

grouping

a character column in the data data frame that contains the precursor or peptide identifiers.

intensity_log2

a numeric column in the data data frame that contains the intensity values.

condition

a character or numeric column in the data data frame that contains the conditions.

comparison

a character column in the data data frame that contains the comparisons of treatment/reference pairs. This is an output of the assign_missingness function.

missingness

a character column in the data data frame that contains the missingness type of the data. The missingness type determines how values for imputation are sampled. This should at least contain "MAR" or "MNAR". Missingness assigned as NA will not be imputed.

noise

a numeric column in the data data frame that contains the noise value for the precursor/peptide. It is only required if method = "noise". Note: Noise values need to be log2 transformed.

method

a character value that specifies the method to be used for imputation. For method = "ludovic", MNAR missingness is sampled from a normal distribution centred on a value that is 3 log2 units lower than the lowest intensity value recorded for the precursor/peptide, with a spread equal to the mean standard deviation for the precursor/peptide. For method = "noise", MNAR missingness is sampled from a normal distribution centred on the mean noise for the precursor/peptide, with a spread equal to the mean standard deviation (from each condition) for the precursor/peptide. Both methods impute MAR data using the mean and variance of the condition with the missing data.

skip_log2_transform_error

a logical value that determines if a check is performed to validate that input values are log2 transformed. If input values are > 40, the check fails and an error is returned.

retain_columns

a vector that indicates columns that should be retained from the input data frame. By default no additional columns are retained (retain_columns = NULL). Specific columns can be retained by providing their names (not in quotation marks, just like other column names, but in a vector).
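
The sampling rules described for the method argument can be illustrated in base R. This is a rough sketch under stated assumptions, not the package's implementation: the intensities are made up, the "mean standard deviation" is taken here to be the mean of the per-condition standard deviations, and the number of MNAR draws is arbitrary.

```r
set.seed(42) # reproducible draws

# Hypothetical log2 intensities for one peptide in two conditions
reference <- c(16.8, 17.0, 17.1, 16.9)
treatment <- c(15.8, NA, 16.1, 15.9) # one value missing

# MAR: draw from a normal distribution defined by the mean and
# standard deviation of the condition that has the missing value
mar_draw <- rnorm(
  n = sum(is.na(treatment)),
  mean = mean(treatment, na.rm = TRUE),
  sd = sd(treatment, na.rm = TRUE)
)

# MNAR, "ludovic"-style: centre the distribution 3 log2 units below the
# lowest observed intensity of the peptide.
# Assumption: the spread is the mean of the per-condition standard deviations.
observed <- c(reference, treatment)
mnar_centre <- min(observed, na.rm = TRUE) - 3
mean_sd <- mean(c(sd(reference), sd(treatment, na.rm = TRUE)))
mnar_draw <- rnorm(n = 4, mean = mnar_centre, sd = mean_sd)
```

For method = "noise", the same sketch would apply with mnar_centre replaced by the mean of the (log2 transformed) noise values for the peptide.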

Value

A data frame that contains an imputed_intensity and imputed column in addition to the required input columns. The imputed column indicates if a value was imputed. The imputed_intensity column contains imputed intensity values for previously missing intensities.

Examples

set.seed(123) # Makes example reproducible

# Create example data
data <- create_synthetic_data(
  n_proteins = 10,
  frac_change = 0.5,
  n_replicates = 4,
  n_conditions = 2,
  method = "effect_random",
  additional_metadata = FALSE
)

head(data, n = 24)
#> # A tibble: 24 × 8
#>    protein   peptide     condition sample peptide_intensi… change change_peptide
#>    <chr>     <chr>       <chr>     <chr>             <dbl> <lgl>  <lgl>
#>  1 protein_1 peptide_1_1 conditio… sampl…             16.8 TRUE   TRUE
#>  2 protein_1 peptide_1_1 conditio… sampl…             17.0 TRUE   TRUE
#>  3 protein_1 peptide_1_1 conditio… sampl…             17.0 TRUE   TRUE
#>  4 protein_1 peptide_1_1 conditio… sampl…             17.0 TRUE   TRUE
#>  5 protein_1 peptide_1_1 conditio… sampl…             15.8 TRUE   TRUE
#>  6 protein_1 peptide_1_1 conditio… sampl…             15.9 TRUE   TRUE
#>  7 protein_1 peptide_1_1 conditio… sampl…             16.1 TRUE   TRUE
#>  8 protein_1 peptide_1_1 conditio… sampl…             15.9 TRUE   TRUE
#>  9 protein_1 peptide_1_2 conditio… sampl…             12.6 TRUE   FALSE
#> 10 protein_1 peptide_1_2 conditio… sampl…             12.7 TRUE   FALSE
#> # … with 14 more rows, and 1 more variable: peptide_intensity_missing <dbl>

# Assign missingness information
data_missing <- assign_missingness(
  data,
  sample = sample,
  condition = condition,
  grouping = peptide,
  intensity = peptide_intensity_missing,
  ref_condition = "all",
  retain_columns = c(protein, peptide_intensity)
)
#> "all" was provided as reference condition. All pairwise comparisons are
#> created from the conditions and assigned their missingness. The
#> created comparisons are:
#> condition_1_vs_condition_2
head(data_missing, n = 24)
#> # A tibble: 24 × 8
#>    protein peptide_intensi… sample condition peptide peptide_intensi… comparison
#>    <chr>              <dbl> <chr>  <chr>     <chr>              <dbl> <chr>
#>  1 protei…             16.8 sampl… conditio… peptid…             16.8 condition…
#>  2 protei…             17.0 sampl… conditio… peptid…             17.0 condition…
#>  3 protei…             17.0 sampl… conditio… peptid…             17.0 condition…
#>  4 protei…             17.0 sampl… conditio… peptid…             17.0 condition…
#>  5 protei…             15.8 sampl… conditio… peptid…             15.8 condition…
#>  6 protei…             15.9 sampl… conditio… peptid…             15.9 condition…
#>  7 protei…             16.1 sampl… conditio… peptid…             16.1 condition…
#>  8 protei…             15.9 sampl… conditio… peptid…             15.9 condition…
#>  9 protei…             12.6 sampl… conditio… peptid…             NA   condition…
#> 10 protei…             12.7 sampl… conditio… peptid…             NA   condition…
#> # … with 14 more rows, and 1 more variable: missingness <chr>

# Perform imputation
data_imputed <- impute(
  data_missing,
  sample = sample,
  grouping = peptide,
  intensity_log2 = peptide_intensity_missing,
  condition = condition,
  comparison = comparison,
  missingness = missingness,
  method = "ludovic",
  retain_columns = c(protein, peptide_intensity)
)

head(data_imputed, n = 24)
#> # A tibble: 24 × 10
#>    protein peptide_intensi… sample condition peptide peptide_intensi… comparison
#>    <chr>              <dbl> <chr>  <chr>     <chr>              <dbl> <chr>
#>  1 protei…             16.8 sampl… conditio… peptid…             16.8 condition…
#>  2 protei…             17.0 sampl… conditio… peptid…             17.0 condition…
#>  3 protei…             17.0 sampl… conditio… peptid…             17.0 condition…
#>  4 protei…             17.0 sampl… conditio… peptid…             17.0 condition…
#>  5 protei…             15.8 sampl… conditio… peptid…             15.8 condition…
#>  6 protei…             15.9 sampl… conditio… peptid…             15.9 condition…
#>  7 protei…             16.1 sampl… conditio… peptid…             16.1 condition…
#>  8 protei…             15.9 sampl… conditio… peptid…             15.9 condition…
#>  9 protei…             12.6 sampl… conditio… peptid…             NA   condition…
#> 10 protei…             12.7 sampl… conditio… peptid…             NA   condition…
#> # … with 14 more rows, and 3 more variables: missingness <chr>,
#> #   imputed_intensity <dbl>, imputed <lgl>
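
The log2 validation described for skip_log2_transform_error can be pictured with a small stand-alone check. This is only a sketch: check_log2 is a hypothetical helper that is not part of the package, and it assumes that a single value above 40 is enough to fail the check and that setting the skip flag to TRUE bypasses it.

```r
# Hypothetical helper (not a package function): stop if the values
# look like raw intensities rather than log2 transformed ones.
check_log2 <- function(x, skip = FALSE) {
  # Assumption: any non-missing value above 40 fails the check
  if (!skip && any(x > 40, na.rm = TRUE)) {
    stop("Values above 40 found; input does not look log2 transformed.")
  }
  invisible(TRUE)
}

raw <- c(120000, 95000, NA) # made-up raw intensities

check_log2(log2(raw)) # passes silently

# Raw intensities trigger the error; capture it instead of stopping
result <- tryCatch(check_log2(raw), error = function(e) "error")

check_log2(raw, skip = TRUE) # check bypassed
```

Raw intensities from a mass spectrometer are typically far larger than 40, which is why a simple threshold of this kind can catch untransformed input before it is imputed.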