Label-free protein quantification — calculate_protein_abundance

Determines relative protein abundances from ion (precursor) quantification. By default, only proteins with at least three unique peptides are quantified. The threshold is set with min_n_peptides and is applied to each sample independently.
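The per-sample peptide rule can be pictured with a few lines of base R. This is only a sketch of the filtering idea, not the package's internal code: the toy data frame and the names n_peptides and kept are hypothetical, while the column names mirror the arguments documented on this page.

```r
# Hypothetical toy input: P1 has three unique peptides (four precursors),
# P2 has only two peptides.
toy <- data.frame(
  sample     = c(rep("S1", 4), rep("S2", 4), rep("S1", 2), rep("S2", 2)),
  protein_id = c(rep("P1", 8), rep("P2", 4)),
  precursor  = c(rep(c("A1", "A2", "B1", "C1"), 2), rep(c("E1", "F1"), 2)),
  peptide    = c(rep(c("A", "A", "B", "C"), 2), rep(c("E", "F"), 2))
)

# Count unique peptides for every sample/protein combination.
n_peptides <- aggregate(
  peptide ~ sample + protein_id,
  data = toy,
  FUN = function(x) length(unique(x))
)
names(n_peptides)[names(n_peptides) == "peptide"] <- "n_peptides"

# Keep only combinations that reach the threshold (assumed here to be 3).
min_n_peptides <- 3
kept <- n_peptides[n_peptides$n_peptides >= min_n_peptides, ]
kept
```

In this toy case only P1 passes, in both samples. Because the count is made per sample, a protein could in principle pass in one sample and fail in another.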

calculate_protein_abundance(
  data,
  sample,
  protein_id,
  precursor,
  peptide,
  intensity_log2,
  min_n_peptides = 3,
  method = "sum",
  for_plot = FALSE,
  retain_columns = NULL
)

Arguments

data: a data frame that contains at least the input variables.
sample: a character column in the input data frame (data) that contains the sample name.
protein_id: a character column in the input data frame (data) that contains the protein accession numbers.
precursor: a character column in the input data frame (data) that contains precursors.
peptide: a character column in the input data frame (data) that contains peptide sequences. This column is used to filter for proteins with at least min_n_peptides unique peptides (3 by default). Because one peptide can give rise to several precursors, this can correspond to more than three precursors. Quantification itself is done at the precursor level.
intensity_log2: a numeric column in the input data frame (data) that contains log2 transformed precursor intensities.
min_n_peptides: an integer specifying the minimum number of unique peptides required for a protein to be included in the analysis. The default value is 3, which means proteins with fewer than three unique peptides are excluded.
method: a character value specifying the method used to calculate protein quantities. "sum" takes the sum of all precursor intensities as the protein abundance. "iq" performs protein quantification based on a maximal peptide ratio extraction algorithm adapted from the MaxLFQ algorithm of the MaxQuant software, using functions from the iq package (doi:10.1093/bioinformatics/btz961). Default is "sum".
for_plot: a logical value indicating whether the result should be only protein intensities or protein intensities together with precursor intensities that can be used for plotting using peptide_profile_plot(). Default is FALSE.
retain_columns: a vector of columns to retain from the input data frame. Default is retain_columns = NULL, which retains no additional columns. Specific columns can be retained by providing their names in a vector, unquoted like other column names.
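For method = "sum", the example output on this page is consistent with summing precursor intensities on the linear scale and reporting the result in log2. The base R sketch below reproduces that arithmetic for protein P1 in sample S1, using the precursor values printed in the Examples section. It is an inference from that output, not the package's internal code, and the variable names are hypothetical.

```r
# log2 precursor intensities of P1 in sample S1, copied from the
# printed example data on this page.
intensity_log2 <- c(17.47465, 13.69787, 16.66303, 16.90101, 18.17304, 18.10713)

# Back-transform to the linear scale, sum, and return to log2.
protein_log2 <- log2(sum(2^intensity_log2))
round(protein_log2, 1)
#> [1] 19.9
```

The result matches the protein intensity reported for P1 in S1. Summing the log2 values directly would instead give a value near 101.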

Value

If for_plot = FALSE, a data frame of protein abundances is returned. If for_plot = TRUE, precursor intensities are returned in the same data frame alongside the protein abundances. The latter output is ideal for plotting with peptide_profile_plot() and can be filtered to include only protein abundances.
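One way to reduce the for_plot = TRUE output to protein abundances is to keep the rows whose precursor column carries the label "protein_intensity", as seen in the example output on this page. The sketch below uses a small hand-built data frame as a hypothetical stand-in for the real return value, so it runs without the package.

```r
# Hypothetical stand-in for a result obtained with for_plot = TRUE.
complete_abundances <- data.frame(
  sample     = c("S1", "S2", "S1", "S1"),
  protein_id = "P1",
  intensity  = c(19.9, 24.1, 17.5, 13.7),
  precursor  = c("protein_intensity", "protein_intensity", "A1", "A2"),
  peptide    = c(NA, NA, "A", "A")
)

# Keep only the protein-level rows and drop the precursor-level columns.
protein_only <- complete_abundances[
  complete_abundances$precursor == "protein_intensity",
  c("sample", "protein_id", "intensity")
]
protein_only
```

The same filter could be written with dplyr::filter() if the rest of the analysis uses the tidyverse.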

Examples

# \donttest{
# Create example data (rnorm() is not seeded, so values differ between runs)
data <- data.frame(
  sample = c(
    rep("S1", 6),
    rep("S2", 6),
    rep("S1", 2),
    rep("S2", 2)
  ),
  protein_id = c(
    rep("P1", 12),
    rep("P2", 4)
  ),
  precursor = c(
    rep(c("A1", "A2", "B1", "B2", "C1", "D1"), 2),
    rep(c("E1", "F1"), 2)
  ),
  peptide = c(
    rep(c("A", "A", "B", "B", "C", "D"), 2),
    rep(c("E", "F"), 2)
  ),
  intensity = c(
    rnorm(n = 6, mean = 15, sd = 2),
    rnorm(n = 6, mean = 21, sd = 1),
    rnorm(n = 2, mean = 15, sd = 1),
    rnorm(n = 2, mean = 15, sd = 2)
  )
)

data
#>    sample protein_id precursor peptide intensity
#> 1      S1         P1        A1       A  17.47465
#> 2      S1         P1        A2       A  13.69787
#> 3      S1         P1        B1       B  16.66303
#> 4      S1         P1        B2       B  16.90101
#> 5      S1         P1        C1       C  18.17304
#> 6      S1         P1        D1       D  18.10713
#> 7      S2         P1        A1       A  22.14845
#> 8      S2         P1        A2       A  19.71673
#> 9      S2         P1        B1       B  21.58946
#> 10     S2         P1        B2       B  20.70163
#> 11     S2         P1        C1       C  21.05378
#> 12     S2         P1        D1       D  22.20131
#> 13     S1         P2        E1       E  15.78655
#> 14     S1         P2        F1       F  13.21559
#> 15     S2         P2        E1       E  15.72799
#> 16     S2         P2        F1       F  14.49363

# Calculate protein abundances
# P2 has only two peptides (E and F) and is therefore excluded
protein_abundance <- calculate_protein_abundance(
  data,
  sample = sample,
  protein_id = protein_id,
  precursor = precursor,
  peptide = peptide,
  intensity_log2 = intensity,
  method = "sum",
  for_plot = FALSE
)

protein_abundance
#> # A tibble: 2 × 3
#>   sample protein_id intensity
#>   <chr>  <chr>          <dbl>
#> 1 S1     P1              19.9
#> 2 S2     P1              24.1

# Calculate protein abundances and retain precursor
# abundances that can be used in a peptide profile plot
complete_abundances <- calculate_protein_abundance(
  data,
  sample = sample,
  protein_id = protein_id,
  precursor = precursor,
  peptide = peptide,
  intensity_log2 = intensity,
  method = "sum",
  for_plot = TRUE
)

complete_abundances
#> # A tibble: 14 × 5
#>    sample protein_id intensity precursor         peptide
#>    <chr>  <chr>          <dbl> <chr>             <chr>  
#>  1 S1     P1              19.9 protein_intensity NA     
#>  2 S2     P1              24.1 protein_intensity NA     
#>  3 S1     P1              17.5 A1                A      
#>  4 S1     P1              13.7 A2                A      
#>  5 S1     P1              16.7 B1                B      
#>  6 S1     P1              16.9 B2                B      
#>  7 S1     P1              18.2 C1                C      
#>  8 S1     P1              18.1 D1                D      
#>  9 S2     P1              22.1 A1                A      
#> 10 S2     P1              19.7 A2                A      
#> 11 S2     P1              21.6 B1                B      
#> 12 S2     P1              20.7 B2                B      
#> 13 S2     P1              21.1 C1                C      
#> 14 S2     P1              22.2 D1                D      
# }