impute calculates imputation values for missing data, depending on the selected method.
impute(
  data,
  sample,
  grouping,
  intensity_log2,
  condition,
  comparison = comparison,
  missingness = missingness,
  noise = NULL,
  method = "ludovic",
  skip_log2_transform_error = FALSE,
  retain_columns = NULL
)
data: a data frame that is ideally the output of the assign_missingness function. It should contain at least the input variables. For each "reference_vs_treatment" comparison, it should contain the pair of the reference and the treatment condition. This means that the reference condition is duplicated once for every treatment.
sample: a character column in data that contains the sample names.
grouping: a character column in data that contains the precursor or peptide identifiers.
intensity_log2: a numeric column in data that contains the log2 transformed intensity values.
condition: a character or numeric column in data that contains the conditions.
comparison: a character column in data that contains the comparisons of treatment/reference pairs. This is an output of the assign_missingness function.
missingness: a character column in data that contains the missingness type of the data, which determines how values for imputation are sampled. It should contain at least "MAR" or "MNAR". Values whose missingness is NA are not imputed.
noise: a numeric column in data that contains the noise value for the precursor/peptide. It is only required if method = "noise". Note: noise values need to be log2 transformed.
method: a character value that specifies the imputation method. For method = "ludovic", MNAR missingness is sampled from a normal distribution centered on a value that is three (log2) units lower than the lowest intensity value recorded for the precursor/peptide, with a standard deviation equal to the mean standard deviation of the precursor/peptide. For method = "noise", MNAR missingness is sampled from a normal distribution centered on the mean noise of the precursor/peptide, with a standard deviation equal to the mean of the per-condition standard deviations of the precursor/peptide. Both methods impute MAR data using the mean and variance of the condition with the missing data.
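skip_log2_transform_error: a logical value that determines whether a check is performed to validate that the input values are log2 transformed. By default (FALSE) the check is performed: if input values are greater than 40, the check fails and an error is returned. Set to TRUE to skip the check.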
retain_columns: a vector that indicates columns that should be retained from the input data frame. By default, no additional columns are retained (retain_columns = NULL). Specific columns can be retained by providing their names in a vector, without quotation marks, just like other column names (e.g. retain_columns = c(protein, peptide_intensity)).
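The rules described for method, missingness, noise and skip_log2_transform_error can be sketched in base R for a single precursor/peptide within a single comparison. This is a minimal illustration under stated assumptions, not the package's implementation: impute_one_peptide is a hypothetical helper, missingness is given per row, and the "mean standard deviation" is assumed to be the average of the per-condition standard deviations of the observed values.

```r
# Hypothetical helper (not the package's internal code) sketching the
# imputation rules for ONE precursor/peptide within ONE comparison.
# Assumes intensities (and noise) are already on the log2 scale.
impute_one_peptide <- function(intensity_log2, condition, missingness,
                               method = c("ludovic", "noise"),
                               noise = NULL,
                               skip_log2_transform_error = FALSE) {
  method <- match.arg(method)

  # Log2 check: values above 40 suggest untransformed intensities.
  if (!skip_log2_transform_error &&
      max(intensity_log2, na.rm = TRUE) > 40) {
    stop("Input intensities seem not to be log2 transformed (values > 40).")
  }
  if (method == "noise" && is.null(noise)) {
    stop('`noise` is required for method = "noise".')
  }

  # Per-condition mean and standard deviation of the observed values.
  by_cond <- split(intensity_log2, condition)
  cond_mean <- vapply(by_cond, mean, numeric(1), na.rm = TRUE)
  cond_sd <- vapply(by_cond, sd, numeric(1), na.rm = TRUE)
  # Assumption: "mean standard deviation" = average of per-condition SDs.
  mean_sd <- mean(cond_sd, na.rm = TRUE)

  # The centre of the MNAR distribution depends on the method.
  mnar_centre <- if (method == "ludovic") {
    min(intensity_log2, na.rm = TRUE) - 3  # three log2 units below minimum
  } else {
    mean(noise, na.rm = TRUE)              # mean log2 noise of the peptide
  }

  imputed <- intensity_log2
  for (i in which(is.na(intensity_log2))) {
    type <- missingness[i]
    if (is.na(type)) next                  # NA missingness: not imputed
    if (type == "MNAR") {
      imputed[i] <- rnorm(1, mean = mnar_centre, sd = mean_sd)
    } else if (type == "MAR") {
      # MAR: draw from the condition that contains the missing value.
      cond <- as.character(condition[i])
      imputed[i] <- rnorm(1, mean = cond_mean[[cond]], sd = cond_sd[[cond]])
    }
  }
  imputed
}

# Usage with made-up values: condition_2 has three MNAR values.
set.seed(123)
intensity <- c(16.8, 17.0, 17.0, 16.9, NA, NA, NA, 15.9)
condition <- rep(c("condition_1", "condition_2"), each = 4)
missingness <- rep(c("complete", "MNAR"), each = 4)
impute_one_peptide(intensity, condition, missingness, method = "ludovic")
```

Averaging the per-condition standard deviations gives the MNAR draws a usable spread even when the condition with missing data has too few observed values for its own standard deviation, as for condition_2 in this usage example.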
A data frame that contains an imputed_intensity and an imputed column in addition to the required input columns. The imputed column indicates whether a value was imputed. The imputed_intensity column contains imputed intensity values for previously missing intensities.
set.seed(123) # Makes example reproducible
# Create example data
data <- create_synthetic_data(
  n_proteins = 10,
  frac_change = 0.5,
  n_replicates = 4,
  n_conditions = 2,
  method = "effect_random",
  additional_metadata = FALSE
)
head(data, n = 24)
#> # A tibble: 24 × 8
#> protein peptide condition sample peptide_intensity change change_peptide
#> <chr> <chr> <chr> <chr> <dbl> <lgl> <lgl>
#> 1 protein_1 peptide_1… conditio… sampl… 16.8 TRUE TRUE
#> 2 protein_1 peptide_1… conditio… sampl… 17.0 TRUE TRUE
#> 3 protein_1 peptide_1… conditio… sampl… 17.0 TRUE TRUE
#> 4 protein_1 peptide_1… conditio… sampl… 17.0 TRUE TRUE
#> 5 protein_1 peptide_1… conditio… sampl… 15.8 TRUE TRUE
#> 6 protein_1 peptide_1… conditio… sampl… 15.9 TRUE TRUE
#> 7 protein_1 peptide_1… conditio… sampl… 16.1 TRUE TRUE
#> 8 protein_1 peptide_1… conditio… sampl… 15.9 TRUE TRUE
#> 9 protein_1 peptide_1… conditio… sampl… 12.6 TRUE FALSE
#> 10 protein_1 peptide_1… conditio… sampl… 12.7 TRUE FALSE
#> # ℹ 14 more rows
#> # ℹ 1 more variable: peptide_intensity_missing <dbl>
# Assign missingness information
data_missing <- assign_missingness(
  data,
  sample = sample,
  condition = condition,
  grouping = peptide,
  intensity = peptide_intensity_missing,
  ref_condition = "all",
  retain_columns = c(protein, peptide_intensity)
)
#> "all" was provided as reference condition. All pairwise comparisons are
#> created from the conditions and assigned their missingness. The created
#> comparisons are:
#> condition_1_vs_condition_2
head(data_missing, n = 24)
#> # A tibble: 24 × 8
#> protein peptide_intensity sample condition peptide peptide_intensity_mi…¹
#> <chr> <dbl> <chr> <chr> <chr> <dbl>
#> 1 protein_1 16.8 sample_1 conditio… peptid… 16.8
#> 2 protein_1 17.0 sample_2 conditio… peptid… 17.0
#> 3 protein_1 17.0 sample_3 conditio… peptid… 17.0
#> 4 protein_1 17.0 sample_4 conditio… peptid… 17.0
#> 5 protein_1 15.8 sample_5 conditio… peptid… 15.8
#> 6 protein_1 15.9 sample_6 conditio… peptid… 15.9
#> 7 protein_1 16.1 sample_7 conditio… peptid… 16.1
#> 8 protein_1 15.9 sample_8 conditio… peptid… 15.9
#> 9 protein_1 12.6 sample_1 conditio… peptid… NA
#> 10 protein_1 12.7 sample_2 conditio… peptid… NA
#> # ℹ 14 more rows
#> # ℹ abbreviated name: ¹peptide_intensity_missing
#> # ℹ 2 more variables: comparison <chr>, missingness <chr>
# Perform imputation
data_imputed <- impute(
  data_missing,
  sample = sample,
  grouping = peptide,
  intensity_log2 = peptide_intensity_missing,
  condition = condition,
  comparison = comparison,
  missingness = missingness,
  method = "ludovic",
  retain_columns = c(protein, peptide_intensity)
)
head(data_imputed, n = 24)
#> # A tibble: 24 × 10
#> protein peptide_intensity sample peptide peptide_intensity_mi…¹ condition
#> <chr> <dbl> <chr> <chr> <dbl> <chr>
#> 1 protein_1 16.8 sample_1 peptid… 16.8 conditio…
#> 2 protein_1 17.0 sample_2 peptid… 17.0 conditio…
#> 3 protein_1 17.0 sample_3 peptid… 17.0 conditio…
#> 4 protein_1 17.0 sample_4 peptid… 17.0 conditio…
#> 5 protein_1 15.8 sample_5 peptid… 15.8 conditio…
#> 6 protein_1 15.9 sample_6 peptid… 15.9 conditio…
#> 7 protein_1 16.1 sample_7 peptid… 16.1 conditio…
#> 8 protein_1 15.9 sample_8 peptid… 15.9 conditio…
#> 9 protein_1 12.6 sample_1 peptid… NA conditio…
#> 10 protein_1 12.7 sample_2 peptid… NA conditio…
#> # ℹ 14 more rows
#> # ℹ abbreviated name: ¹peptide_intensity_missing
#> # ℹ 4 more variables: comparison <chr>, missingness <chr>,
#> # imputed_intensity <dbl>, imputed <lgl>
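The example above has only two conditions. With more treatments, the data argument expects the reference condition to be repeated for every comparison. The following base-R sketch builds that layout for a hypothetical peptide with one reference and two treatments; the comparison labels are constructed ad hoc in the "reference_vs_treatment" form named in the description of data, whereas in practice they come from assign_missingness.

```r
# Hypothetical layout: one peptide, a reference condition and two
# treatments, two replicates each (intensity values are not needed here).
reference <- "condition_1"
treatments <- c("condition_2", "condition_3")
n_rep <- 2

# One block of rows per comparison; the reference rows are repeated in
# every block, i.e. duplicated once per treatment.
blocks <- lapply(treatments, function(treatment) {
  conds <- rep(c(reference, treatment), each = n_rep)
  data.frame(
    peptide = "peptide_1",
    sample = paste0(conds, "_rep", rep(seq_len(n_rep), times = 2)),
    condition = conds,
    # Ad hoc label for illustration only.
    comparison = paste0(reference, "_vs_", treatment),
    stringsAsFactors = FALSE
  )
})
paired <- do.call(rbind, blocks)
print(paired)

# Each reference sample appears once per comparison.
print(table(paired$sample))
```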