impute calculates imputation values for missing data, depending on the selected
method.
impute(
data,
sample,
grouping,
intensity_log2,
condition,
comparison = comparison,
missingness = missingness,
noise = NULL,
method = "ludovic",
skip_log2_transform_error = FALSE,
retain_columns = NULL
)
data: a data frame that is ideally the output of the assign_missingness function.
It should contain at least the input variables. For each "reference_vs_treatment" comparison,
there should be a pair of the reference and the treatment condition. This means that the
reference condition should be duplicated once for every treatment.
sample: a character column in the data data frame that contains the sample names.
grouping: a character column in the data data frame that contains the precursor or
peptide identifiers.
intensity_log2: a numeric column in the data data frame that contains the log2-transformed
intensity values.
condition: a character or numeric column in the data data frame that contains the
conditions.
comparison: a character column in the data data frame that contains the
comparisons of treatment/reference pairs. This is an output of the assign_missingness
function.
missingness: a character column in the data data frame that contains the
missingness type of the data. The missingness type determines how values for imputation are
sampled. This should at least contain "MAR" or "MNAR". Missingness assigned as NA will not be imputed.
noise: a numeric column in the data data frame that contains the noise value for
the precursor/peptide. This is only required if method = "noise". Note: noise values need to
be log2-transformed.
method: a character value that specifies the method to be used for imputation. For
method = "ludovic", MNAR missingness is sampled from a normal distribution centered on a
value that is three log2 units lower than the lowest intensity value recorded for the
precursor/peptide, with a spread equal to the mean standard deviation of the
precursor/peptide. For method = "noise", MNAR missingness is sampled from a normal
distribution centered on the mean noise for the precursor/peptide, with a spread equal to the
mean of the per-condition standard deviations of the precursor/peptide. Both methods impute
MAR data using the mean and variance of the condition with the missing data.
skip_log2_transform_error: a logical value that determines whether the check validating that input values are log2-transformed is skipped. If input values are greater than 40, the check fails and an error is returned.
retain_columns: a vector that indicates columns that should be retained from the input
data frame. By default, no additional columns are retained (retain_columns = NULL). Specific
columns can be retained by providing their names (without quotation marks, like other
column names, but in a vector).
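The sampling rules described for method (and the log2 check behind skip_log2_transform_error) can be sketched in base R for a single precursor/peptide within one comparison. This is a minimal, hypothetical illustration: impute_peptide_sketch() is not part of the package, and it assumes log2 intensities, one missingness label per row, rnorm() draws as described above, and a check that fails if any value exceeds 40. The package's own implementation may differ in grouping and edge-case handling.

```r
# impute_peptide_sketch() is a hypothetical helper, NOT the package's code.
# It imputes one precursor/peptide within a single comparison, assuming
# log2-transformed intensities and one missingness label per row.
impute_peptide_sketch <- function(intensity, condition, missingness,
                                  method = c("ludovic", "noise"),
                                  noise = NULL) {
  method <- match.arg(method)

  # Assumed version of the log2 check, using the documented threshold of 40
  if (any(intensity > 40, na.rm = TRUE)) {
    stop("Intensities appear not to be log2 transformed.")
  }

  # MNAR spread: mean of the per-condition standard deviations
  mnar_sd <- mean(tapply(intensity, condition, sd, na.rm = TRUE), na.rm = TRUE)

  # MNAR center: lowest observed value minus 3 ("ludovic") or mean noise ("noise")
  mnar_mean <- if (method == "ludovic") {
    min(intensity, na.rm = TRUE) - 3
  } else {
    mean(noise, na.rm = TRUE)
  }

  imputed <- intensity
  # Only missing values with an assigned missingness type are imputed
  for (i in which(is.na(intensity) & !is.na(missingness))) {
    if (missingness[i] == "MNAR") {
      imputed[i] <- rnorm(1, mean = mnar_mean, sd = mnar_sd)
    } else if (missingness[i] == "MAR") {
      # MAR: mean and variance of the condition containing the missing value
      same_condition <- intensity[condition == condition[i]]
      imputed[i] <- rnorm(1,
        mean = mean(same_condition, na.rm = TRUE),
        sd = sqrt(var(same_condition, na.rm = TRUE))
      )
    }
  }
  imputed
}

# Example: one MAR gap, one MNAR gap and one gap without a missingness label
set.seed(123)
intensity <- c(20.1, 20.4, 19.8, NA, 15.2, 15.6, NA, NA)
condition <- rep(c("treatment", "reference"), each = 4)
missingness <- c(NA, NA, NA, "MAR", NA, NA, "MNAR", NA)
result <- impute_peptide_sketch(intensity, condition, missingness)
result
```

Observed values pass through unchanged, and the last value stays NA because its missingness is NA. Passing method = "noise" together with a noise vector only changes the center of the MNAR draws.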
Value: a data frame that contains an imputed_intensity and an imputed column in
addition to the required input columns. The imputed column indicates whether a value was
imputed. The imputed_intensity column contains the imputed intensity values for previously
missing intensities.
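The relationship between the two added columns can be sketched as follows, assuming that imputed is TRUE exactly where the input intensity was missing and a value was filled in. add_imputation_columns() is a hypothetical helper for illustration only, not part of the package.

```r
# add_imputation_columns() is hypothetical, for illustration only.
# intensity: original log2 intensities (NA = missing)
# draws: imputed values (NA where nothing was imputed)
add_imputation_columns <- function(intensity, draws) {
  # Keep observed values; fill missing ones from the draws
  imputed_intensity <- ifelse(is.na(intensity), draws, intensity)
  data.frame(
    intensity = intensity,
    imputed_intensity = imputed_intensity,
    # TRUE only for previously missing values that received an imputed value
    imputed = is.na(intensity) & !is.na(imputed_intensity)
  )
}

out <- add_imputation_columns(
  intensity = c(17.0, NA, 16.1, NA),
  draws = c(NA, 12.9, NA, NA) # last value: missingness was NA, so not imputed
)
out
```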
set.seed(123) # Makes example reproducible
# Create example data
data <- create_synthetic_data(
n_proteins = 10,
frac_change = 0.5,
n_replicates = 4,
n_conditions = 2,
method = "effect_random",
additional_metadata = FALSE
)
head(data, n = 24)
#> # A tibble: 24 × 8
#> protein peptide condition sample peptide_intensity change change_peptide
#> <chr> <chr> <chr> <chr> <dbl> <lgl> <lgl>
#> 1 protein_1 peptide_1… conditio… sampl… 16.8 TRUE TRUE
#> 2 protein_1 peptide_1… conditio… sampl… 17.0 TRUE TRUE
#> 3 protein_1 peptide_1… conditio… sampl… 17.0 TRUE TRUE
#> 4 protein_1 peptide_1… conditio… sampl… 17.0 TRUE TRUE
#> 5 protein_1 peptide_1… conditio… sampl… 15.8 TRUE TRUE
#> 6 protein_1 peptide_1… conditio… sampl… 15.9 TRUE TRUE
#> 7 protein_1 peptide_1… conditio… sampl… 16.1 TRUE TRUE
#> 8 protein_1 peptide_1… conditio… sampl… 15.9 TRUE TRUE
#> 9 protein_1 peptide_1… conditio… sampl… 12.6 TRUE FALSE
#> 10 protein_1 peptide_1… conditio… sampl… 12.7 TRUE FALSE
#> # ℹ 14 more rows
#> # ℹ 1 more variable: peptide_intensity_missing <dbl>
# Assign missingness information
data_missing <- assign_missingness(
data,
sample = sample,
condition = condition,
grouping = peptide,
intensity = peptide_intensity_missing,
ref_condition = "all",
retain_columns = c(protein, peptide_intensity)
)
#> "all" was provided as reference condition. All pairwise comparisons are
#> created from the conditions and assigned their missingness. The created
#> comparisons are:
#> condition_1_vs_condition_2
head(data_missing, n = 24)
#> # A tibble: 24 × 8
#> protein peptide_intensity sample condition peptide peptide_intensity_mi…¹
#> <chr> <dbl> <chr> <chr> <chr> <dbl>
#> 1 protein_1 16.8 sample_1 conditio… peptid… 16.8
#> 2 protein_1 17.0 sample_2 conditio… peptid… 17.0
#> 3 protein_1 17.0 sample_3 conditio… peptid… 17.0
#> 4 protein_1 17.0 sample_4 conditio… peptid… 17.0
#> 5 protein_1 15.8 sample_5 conditio… peptid… 15.8
#> 6 protein_1 15.9 sample_6 conditio… peptid… 15.9
#> 7 protein_1 16.1 sample_7 conditio… peptid… 16.1
#> 8 protein_1 15.9 sample_8 conditio… peptid… 15.9
#> 9 protein_1 12.6 sample_1 conditio… peptid… NA
#> 10 protein_1 12.7 sample_2 conditio… peptid… NA
#> # ℹ 14 more rows
#> # ℹ abbreviated name: ¹peptide_intensity_missing
#> # ℹ 2 more variables: comparison <chr>, missingness <chr>
# Perform imputation
data_imputed <- impute(
data_missing,
sample = sample,
grouping = peptide,
intensity_log2 = peptide_intensity_missing,
condition = condition,
comparison = comparison,
missingness = missingness,
method = "ludovic",
retain_columns = c(protein, peptide_intensity)
)
head(data_imputed, n = 24)
#> # A tibble: 24 × 10
#> protein peptide_intensity sample peptide peptide_intensity_mi…¹ condition
#> <chr> <dbl> <chr> <chr> <dbl> <chr>
#> 1 protein_1 16.8 sample_1 peptid… 16.8 conditio…
#> 2 protein_1 17.0 sample_2 peptid… 17.0 conditio…
#> 3 protein_1 17.0 sample_3 peptid… 17.0 conditio…
#> 4 protein_1 17.0 sample_4 peptid… 17.0 conditio…
#> 5 protein_1 15.8 sample_5 peptid… 15.8 conditio…
#> 6 protein_1 15.9 sample_6 peptid… 15.9 conditio…
#> 7 protein_1 16.1 sample_7 peptid… 16.1 conditio…
#> 8 protein_1 15.9 sample_8 peptid… 15.9 conditio…
#> 9 protein_1 12.6 sample_1 peptid… NA conditio…
#> 10 protein_1 12.7 sample_2 peptid… NA conditio…
#> # ℹ 14 more rows
#> # ℹ abbreviated name: ¹peptide_intensity_missing
#> # ℹ 4 more variables: comparison <chr>, missingness <chr>,
#> # imputed_intensity <dbl>, imputed <lgl>
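As described for the data argument, the input must contain the reference condition once per comparison. For a design with one reference and two treatments, that layout can be sketched in base R as below. This only illustrates the structure that assign_missingness is expected to produce; the column names and the comparison strings (for example treatment_1_vs_reference) are assumptions, not the function's exact output.

```r
# Hypothetical single-copy data: one peptide, one reference, two treatments
one_copy <- data.frame(
  sample = paste0("sample_", 1:6),
  peptide = "peptide_1",
  condition = rep(c("reference", "treatment_1", "treatment_2"), each = 2),
  intensity = c(16.8, 17.0, 15.8, 15.9, NA, NA),
  stringsAsFactors = FALSE
)

treatments <- setdiff(unique(one_copy$condition), "reference")

# One block per comparison: the treatment rows plus a copy of the reference rows
per_comparison <- lapply(treatments, function(trt) {
  pair <- one_copy[one_copy$condition %in% c(trt, "reference"), ]
  pair$comparison <- paste0(trt, "_vs_reference") # assumed naming scheme
  pair
})
layout <- do.call(rbind, per_comparison)
rownames(layout) <- NULL

# The reference samples now appear once per treatment
table(layout$comparison, layout$condition)
```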