Data filtering based on coefficients of variation (CV)

Filters the input data based on the coefficients of variation (CV) of precursor, peptide or protein intensities. Use the function to ensure that only robust measurements and quantifications enter the data analysis, ideally after inspecting the raw values (quality control) and applying median normalisation. The function calculates the CV of each precursor, peptide or protein within each condition and keeps only those whose CV is below the cutoff in at least the user-defined number of conditions (min_conditions); all others are removed. Because this required number of conditions is fixed and does not depend on the number of conditions in which a precursor, peptide or protein was actually detected, the function may bias the result towards complete data.
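The filtering logic described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper cv_filter_sketch and the column names group, condition and log2_intensity are hypothetical, and the sketch assumes that the CV is computed as sd / mean on back-transformed (linear) intensities.

```r
# Hypothetical helper illustrating the described logic; NOT the package code.
# Assumption: CV = sd / mean on linear-scale intensities (2^log2_intensity).
cv_filter_sketch <- function(df, cv_limit = 0.25, min_conditions) {
  # Back-transform log2 intensities to the linear scale
  df$intensity <- 2^df$log2_intensity

  # CV per group and condition; rows with missing intensities are dropped
  cvs <- aggregate(
    intensity ~ group + condition,
    data = df,
    FUN = function(x) sd(x) / mean(x)
  )

  # A condition passes if its CV is defined and below the cutoff
  cvs$pass <- !is.na(cvs$intensity) & cvs$intensity < cv_limit

  # Count passing conditions per group and keep groups with enough of them
  n_pass <- tapply(cvs$pass, cvs$group, sum)
  keep <- names(n_pass)[n_pass >= min_conditions]

  df[df$group %in% keep, c("group", "condition", "log2_intensity")]
}

# Toy data: three peptides, two conditions, three replicates each
toy <- data.frame(
  group = rep(c("pep_A", "pep_B", "pep_C"), each = 6),
  condition = rep(rep(c("ctrl", "treated"), each = 3), times = 3),
  log2_intensity = c(
    20.0, 20.1, 19.9, 21.0, 21.1, 20.9, # pep_A: low CV in both conditions
    20.0, 22.0, 18.0, 21.0, 21.1, 20.9, # pep_B: high CV in ctrl
    20.0, 20.1, 19.9, NA, NA, NA        # pep_C: not detected in treated
  )
)

kept <- cv_filter_sketch(toy, cv_limit = 0.25, min_conditions = 2)
unique(kept$group)
#> [1] "pep_A"
```

In this toy data pep_C is removed although its only measured condition is precise: with min_conditions = 2 it can never pass, because it was detected in one condition only. This is the completeness bias mentioned above.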

filter_cv(
  data,
  grouping,
  condition,
  log2_intensity,
  cv_limit = 0.25,
  min_conditions,
  silent = FALSE
)

Arguments

data: a data frame that contains at least the input variables.
grouping: a character column in the input data frame that contains the grouping variable, which can be precursors, peptides or proteins.
condition: a character or numeric column in the input data frame that contains information on the sample condition.
log2_intensity: a numeric column in the input data frame that contains log2 transformed intensities.
cv_limit: optional, a numeric value that specifies the CV cutoff that will be applied. Default is 0.25.
min_conditions: a numeric value that specifies the minimum number of conditions in which the CV of a group must be below the cutoff for the group to be retained.
silent: a logical value that specifies whether the message reporting the number of filtered out groups should be suppressed. Default is FALSE.
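Since log2_intensity holds log2 transformed values while cv_limit is a plain ratio, it can help to see how much the scale matters. The sketch below compares a CV taken directly on log2 values with one taken on back-transformed intensities. It does not claim which of the two filter_cv uses internally; the values and variable names are made up for illustration.

```r
# Three replicate measurements on the log2 scale (made-up values)
log2_values <- c(20.0, 20.5, 19.5)

# CV computed directly on the log2 values
cv_log <- sd(log2_values) / mean(log2_values)

# CV computed on back-transformed (linear) intensities
linear <- 2^log2_values
cv_linear <- sd(linear) / mean(linear)

# Shift every log2 value by a constant, i.e. scale the linear
# intensities by a constant factor
shifted <- log2_values + 5
cv_log_shifted <- sd(shifted) / mean(shifted)
cv_linear_shifted <- sd(2^shifted) / mean(2^shifted)

# The linear-scale CV is unchanged by the scaling,
# the log-scale CV is not
isTRUE(all.equal(cv_linear, cv_linear_shifted))
#> [1] TRUE
cv_log_shifted < cv_log
#> [1] TRUE

# With a cutoff of 0.25 the two scales disagree for these replicates
c(log_scale = cv_log < 0.25, linear_scale = cv_linear < 0.25)
```

For these values the log-scale CV is 0.025 while the linear-scale CV is roughly 0.34, so the same replicates would pass or fail a 0.25 cutoff depending on the scale used.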

Value

The CV filtered data frame.

Examples

set.seed(123) # Makes example reproducible

# Create synthetic data
data <- create_synthetic_data(
  n_proteins = 50,
  frac_change = 0.05,
  n_replicates = 3,
  n_conditions = 2,
  method = "effect_random",
  additional_metadata = FALSE
)

# Filter peptides by their coefficients of variation
data_filtered <- filter_cv(
  data = data,
  grouping = peptide,
  condition = condition,
  log2_intensity = peptide_intensity_missing,
  cv_limit = 0.25,
  min_conditions = 2
)
#> 704 groups of 704 were filtered out. 0% of data remains.