R/calculate_protein_abundance.R
calculate_protein_abundance.Rd
Determines relative protein abundances from ion quantification. By default, only proteins with at least three peptides (min_n_peptides = 3) are considered for quantification. The minimum-peptide rule is applied to each sample independently.
calculate_protein_abundance(
  data,
  sample,
  protein_id,
  precursor,
  peptide,
  intensity_log2,
  min_n_peptides = 3,
  method = "sum",
  for_plot = FALSE,
  retain_columns = NULL
)
data
a data frame that contains at least the input variables.
sample
a character column in the input data frame that contains the sample names.
protein_id
a character column in the input data frame that contains the protein accession numbers.
precursor
a character column in the input data frame that contains precursors.
peptide
a character column in the input data frame that contains peptide sequences. This column is needed to filter for proteins with at least min_n_peptides unique peptides (three by default). Three unique peptides can correspond to more than three precursors. The quantification itself is done on the precursor level.
intensity_log2
a numeric column in the input data frame that contains log2 transformed precursor intensities.
min_n_peptides
an integer specifying the minimum number of unique peptides required for a protein to be included in the analysis. The default value is 3, which means proteins with fewer than three unique peptides are excluded.
method
a character value specifying the method used to calculate protein quantities. One option is "sum", which takes the sum of all precursor intensities as the protein abundance. The other option is "iq", which performs protein quantification based on a maximal peptide ratio extraction algorithm adapted from the MaxLFQ algorithm of the MaxQuant software. Functions from the iq package (doi:10.1093/bioinformatics/btz961) are used. Default is "sum".
for_plot
a logical value indicating whether the result should contain only protein intensities, or protein intensities together with precursor intensities that can be used for plotting with peptide_profile_plot(). Default is FALSE.
retain_columns
a vector indicating which additional columns should be retained from the input data frame. By default no additional columns are retained (retain_columns = NULL). Specific columns can be retained by providing their names (not in quotation marks, just like other column names, but in a vector).
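The per-sample peptide filter and the "sum" method can be illustrated with a minimal base R sketch. It assumes that "sum" adds precursor intensities on the linear scale and then log2 transforms the result. That assumption is consistent with the example output on this page, but the sketch is not a copy of the package's implementation. The names example_data, groups, kept, and manual_sum are illustrative only, and the intensities are copied from the printed example data.

```r
# Example data copied from the printed output on this page
example_data <- data.frame(
  sample = c(rep("S1", 6), rep("S2", 6), rep("S1", 2), rep("S2", 2)),
  protein_id = c(rep("P1", 12), rep("P2", 4)),
  peptide = c(rep(c("A", "A", "B", "B", "C", "D"), 2), rep(c("E", "F"), 2)),
  intensity = c(
    17.47465, 13.69787, 16.66303, 16.90101, 18.17304, 18.10713,
    22.14845, 19.71673, 21.58946, 20.70163, 21.05378, 22.20131,
    15.78655, 13.21559, 15.72799, 14.49363
  )
)

min_n_peptides <- 3

# One group per sample/protein combination, so the rule is applied per sample
groups <- split(
  example_data,
  list(example_data$sample, example_data$protein_id),
  drop = TRUE
)

# Keep groups with enough unique peptides (not precursors)
kept <- Filter(
  function(g) length(unique(g$peptide)) >= min_n_peptides,
  groups
)

# Assumed "sum" method: sum on the linear scale, then return to log2
manual_sum <- sapply(kept, function(g) log2(sum(2^g$intensity)))
print(manual_sum)
```

P1 has four unique peptides (six precursors) in each sample and is kept, while P2 has only two peptides and is dropped. The resulting values for P1 should be close to those in the example output.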
If for_plot = FALSE, a data frame of protein abundances is returned. If for_plot = TRUE, precursor intensities are returned in the same data frame as well. The latter output is ideal for plotting with peptide_profile_plot() and can be filtered to only include protein abundances.
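Filtering the for_plot = TRUE output down to protein abundances could look like the following sketch. It assumes, based on the example output on this page, that protein-level rows are marked with "protein_intensity" in the precursor column. The small data frame is a hand-built stand-in for the real output, and protein_only is an illustrative name.

```r
# Hand-built stand-in for a few rows of the for_plot = TRUE output
complete_abundances <- data.frame(
  sample = c("S1", "S2", "S1", "S1"),
  protein_id = "P1",
  intensity = c(19.9, 24.1, 17.5, 13.7),
  precursor = c("protein_intensity", "protein_intensity", "A1", "A2"),
  peptide = c(NA, NA, "A", "A")
)

# Keep only the protein-level rows and drop the precursor-specific columns
protein_only <- complete_abundances[
  complete_abundances$precursor == "protein_intensity",
  c("sample", "protein_id", "intensity")
]
print(protein_only)
```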
# \donttest{
# Create example data
data <- data.frame(
  sample = c(
    rep("S1", 6),
    rep("S2", 6),
    rep("S1", 2),
    rep("S2", 2)
  ),
  protein_id = c(
    rep("P1", 12),
    rep("P2", 4)
  ),
  precursor = c(
    rep(c("A1", "A2", "B1", "B2", "C1", "D1"), 2),
    rep(c("E1", "F1"), 2)
  ),
  peptide = c(
    rep(c("A", "A", "B", "B", "C", "D"), 2),
    rep(c("E", "F"), 2)
  ),
  intensity = c(
    rnorm(n = 6, mean = 15, sd = 2),
    rnorm(n = 6, mean = 21, sd = 1),
    rnorm(n = 2, mean = 15, sd = 1),
    rnorm(n = 2, mean = 15, sd = 2)
  )
)
data
#>    sample protein_id precursor peptide intensity
#> 1      S1         P1        A1       A  17.47465
#> 2      S1         P1        A2       A  13.69787
#> 3      S1         P1        B1       B  16.66303
#> 4      S1         P1        B2       B  16.90101
#> 5      S1         P1        C1       C  18.17304
#> 6      S1         P1        D1       D  18.10713
#> 7      S2         P1        A1       A  22.14845
#> 8      S2         P1        A2       A  19.71673
#> 9      S2         P1        B1       B  21.58946
#> 10     S2         P1        B2       B  20.70163
#> 11     S2         P1        C1       C  21.05378
#> 12     S2         P1        D1       D  22.20131
#> 13     S1         P2        E1       E  15.78655
#> 14     S1         P2        F1       F  13.21559
#> 15     S2         P2        E1       E  15.72799
#> 16     S2         P2        F1       F  14.49363
# Calculate protein abundances
protein_abundance <- calculate_protein_abundance(
  data,
  sample = sample,
  protein_id = protein_id,
  precursor = precursor,
  peptide = peptide,
  intensity_log2 = intensity,
  method = "sum",
  for_plot = FALSE
)
protein_abundance
#> # A tibble: 2 × 3
#>   sample protein_id intensity
#>   <chr>  <chr>          <dbl>
#> 1 S1     P1              19.9
#> 2 S2     P1              24.1
# Calculate protein abundances and retain precursor
# abundances that can be used in a peptide profile plot
complete_abundances <- calculate_protein_abundance(
  data,
  sample = sample,
  protein_id = protein_id,
  precursor = precursor,
  peptide = peptide,
  intensity_log2 = intensity,
  method = "sum",
  for_plot = TRUE
)
complete_abundances
#> # A tibble: 14 × 5
#>    sample protein_id intensity precursor         peptide
#>    <chr>  <chr>          <dbl> <chr>             <chr>
#>  1 S1     P1              19.9 protein_intensity NA
#>  2 S2     P1              24.1 protein_intensity NA
#>  3 S1     P1              17.5 A1                A
#>  4 S1     P1              13.7 A2                A
#>  5 S1     P1              16.7 B1                B
#>  6 S1     P1              16.9 B2                B
#>  7 S1     P1              18.2 C1                C
#>  8 S1     P1              18.1 D1                D
#>  9 S2     P1              22.1 A1                A
#> 10 S2     P1              19.7 A2                A
#> 11 S2     P1              21.6 B1                B
#> 12 S2     P1              20.7 B2                B
#> 13 S2     P1              21.1 C1                C
#> 14 S2     P1              22.2 D1                D
# }