R/calculate_protein_abundance.R
Determines relative protein abundances from precursor ion quantification. Only proteins with at least min_n_peptides peptides (three by default) are considered for quantification. This rule is applied to each sample independently, so a protein can be quantified in one sample and excluded in another.
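The peptide-count rule can be sketched in base R as below. This is a minimal illustration under assumptions, not the function's actual code: the helper name filter_min_peptides() is hypothetical, and it assumes columns named sample, protein_id and peptide, as in the example further down. Distinct peptides are counted per sample and protein, so two precursors of the same peptide count once, and each sample is evaluated on its own.

```r
# Hypothetical helper (not part of the package): keep a protein in a sample
# only if it has at least `min_n_peptides` distinct peptides in that sample.
filter_min_peptides <- function(df, min_n_peptides = 3) {
  # One row per distinct peptide within each sample/protein pair, so several
  # precursors of the same peptide are counted only once
  distinct <- unique(df[, c("sample", "protein_id", "peptide")])
  counts <- as.data.frame(
    table(sample = distinct$sample, protein_id = distinct$protein_id),
    stringsAsFactors = FALSE
  )
  keep <- counts[counts$Freq >= min_n_peptides, c("sample", "protein_id")]
  # Inner join: all precursor rows of a retained sample/protein pair are kept
  merge(df, keep, by = c("sample", "protein_id"))
}

# P1 has three distinct peptides in S1 but only one (two precursors) in S2
toy <- data.frame(
  sample     = c("S1", "S1", "S1", "S2", "S2"),
  protein_id = "P1",
  precursor  = c("A1", "B1", "C1", "A1", "A2"),
  peptide    = c("A", "B", "C", "A", "A")
)
kept <- filter_min_peptides(toy, min_n_peptides = 3)
print(kept)  # only the three S1 rows remain
```

In S2 the protein has two precursors but just one peptide, which is why it is dropped there while being kept in S1.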
calculate_protein_abundance(
  data,
  sample,
  protein_id,
  precursor,
  peptide,
  intensity_log2,
  min_n_peptides = 3,
  method = "sum",
  for_plot = FALSE,
  retain_columns = NULL
)
data: a data frame that contains at least the input variables.
sample: a character column in data that contains the sample names.
protein_id: a character column in data that contains the protein accession numbers.
precursor: a character column in data that contains the precursors.
peptide: a character column in data that contains the peptide sequences. This column is needed to filter for proteins with at least min_n_peptides unique peptides (three by default). Because a peptide can be measured as several precursors (for example, in different charge states), this can correspond to more precursors than peptides. The quantification itself is done at the precursor level.
intensity_log2: a numeric column in data that contains log2-transformed precursor intensities.
min_n_peptides: an integer specifying the minimum number of unique peptides required for a protein to be included in the analysis. The default, 3, excludes proteins with fewer than three unique peptides.
method: a character value specifying the method used to calculate protein abundances. Possible options include "sum", which takes the sum of all precursor intensities as the protein abundance, and "iq", which performs protein quantification based on a maximal peptide ratio extraction algorithm adapted from the MaxLFQ algorithm of the MaxQuant software. For "iq", functions from the iq package (doi:10.1093/bioinformatics/btz961) are used. Default is "sum".
for_plot: a logical value indicating whether the result should contain only protein intensities or protein intensities together with precursor intensities that can be used for plotting with peptide_profile_plot(). Default is FALSE.
retain_columns: a vector indicating which additional columns should be retained from the input data frame. By default, no additional columns are retained (retain_columns = NULL). Specific columns can be retained by providing their names in a vector, unquoted, just like the other column arguments.
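The "sum" option can be illustrated with a short base R sketch. It rests on an assumption drawn from the example output on this page rather than from the package source: precursor intensities are back-transformed from log2, summed per sample and protein, and log2-transformed again. For S1/P1 in the example, log2 of the summed back-transformed intensities is about 19.93, consistent with the printed 19.9. The helper name sum_protein_abundance() is hypothetical, and the peptide-count filter is left out.

```r
# Hypothetical sketch of the "sum" method (assumption: summation happens on the
# linear scale and the result is reported on the log2 scale).
sum_protein_abundance <- function(df) {
  # Undo the log2 transformation so that intensities can be added
  df$linear <- 2^df$intensity_log2
  totals <- aggregate(linear ~ sample + protein_id, data = df, FUN = sum)
  # Report the protein abundance on the log2 scale again
  totals$intensity_log2 <- log2(totals$linear)
  totals[, c("sample", "protein_id", "intensity_log2")]
}

toy <- data.frame(
  sample         = c("S1", "S1", "S2"),
  protein_id     = "P1",
  intensity_log2 = c(10, 10, 12)
)
result <- sum_protein_abundance(toy)
print(result)
# S1: log2(2^10 + 2^10) = 11; S2: log2(2^12) = 12
```

Summing on the linear scale means the most intense precursors dominate the result; summing log2 values directly would instead behave like a product of intensities.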
If for_plot = FALSE, a data frame of protein abundances is returned. If for_plot = TRUE, precursor intensities are returned together with the protein abundances in one data frame. The latter output is ideal for plotting with peptide_profile_plot() and can be filtered to include only protein abundances.
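The combined output can be reduced to protein-level rows with an ordinary filter. The sketch below rebuilds a few rows by hand in the layout printed in the example on this page, where protein rows carry the label "protein_intensity" in the precursor column and NA in the peptide column. That label is taken from the example output, and base R subset() is just one possible way to filter.

```r
# A few rows in the for_plot = TRUE layout, values copied from the example output
combined <- data.frame(
  sample     = c("S1", "S2", "S1", "S1"),
  protein_id = "P1",
  intensity  = c(19.9, 24.1, 17.5, 13.7),
  precursor  = c("protein_intensity", "protein_intensity", "A1", "A2"),
  peptide    = c(NA, NA, "A", "A")
)

# Protein abundances only, with the same columns as the for_plot = FALSE example
protein_only <- subset(
  combined,
  precursor == "protein_intensity",
  select = c(sample, protein_id, intensity)
)
# Precursor intensities only
precursor_only <- subset(combined, precursor != "protein_intensity")
print(protein_only)
```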
# \donttest{
# Create example data. The log2 intensities are drawn with rnorm(), so the
# printed values below will differ from run to run.
data <- data.frame(
  sample = c(
    rep("S1", 6),
    rep("S2", 6),
    rep("S1", 2),
    rep("S2", 2)
  ),
  protein_id = c(
    rep("P1", 12),
    rep("P2", 4)
  ),
  precursor = c(
    rep(c("A1", "A2", "B1", "B2", "C1", "D1"), 2),
    rep(c("E1", "F1"), 2)
  ),
  peptide = c(
    rep(c("A", "A", "B", "B", "C", "D"), 2),
    rep(c("E", "F"), 2)
  ),
  intensity = c(
    rnorm(n = 6, mean = 15, sd = 2),
    rnorm(n = 6, mean = 21, sd = 1),
    rnorm(n = 2, mean = 15, sd = 1),
    rnorm(n = 2, mean = 15, sd = 2)
  )
)
data
#>    sample protein_id precursor peptide intensity
#> 1      S1         P1        A1       A  17.47465
#> 2      S1         P1        A2       A  13.69787
#> 3      S1         P1        B1       B  16.66303
#> 4      S1         P1        B2       B  16.90101
#> 5      S1         P1        C1       C  18.17304
#> 6      S1         P1        D1       D  18.10713
#> 7      S2         P1        A1       A  22.14845
#> 8      S2         P1        A2       A  19.71673
#> 9      S2         P1        B1       B  21.58946
#> 10     S2         P1        B2       B  20.70163
#> 11     S2         P1        C1       C  21.05378
#> 12     S2         P1        D1       D  22.20131
#> 13     S1         P2        E1       E  15.78655
#> 14     S1         P2        F1       F  13.21559
#> 15     S2         P2        E1       E  15.72799
#> 16     S2         P2        F1       F  14.49363
# Calculate protein abundances. P2 has only two peptides per sample, which is
# below the default min_n_peptides = 3, so only P1 is quantified.
protein_abundance <- calculate_protein_abundance(
  data,
  sample = sample,
  protein_id = protein_id,
  precursor = precursor,
  peptide = peptide,
  intensity_log2 = intensity,
  method = "sum",
  for_plot = FALSE
)
protein_abundance
#> # A tibble: 2 × 3
#>   sample protein_id intensity
#>   <chr>  <chr>          <dbl>
#> 1 S1     P1              19.9
#> 2 S2     P1              24.1
# Calculate protein abundances and retain precursor
# abundances that can be used in a peptide profile plot
complete_abundances <- calculate_protein_abundance(
  data,
  sample = sample,
  protein_id = protein_id,
  precursor = precursor,
  peptide = peptide,
  intensity_log2 = intensity,
  method = "sum",
  for_plot = TRUE
)
complete_abundances
#> # A tibble: 14 × 5
#>    sample protein_id intensity precursor         peptide
#>    <chr>  <chr>          <dbl> <chr>             <chr>
#>  1 S1     P1              19.9 protein_intensity NA
#>  2 S2     P1              24.1 protein_intensity NA
#>  3 S1     P1              17.5 A1                A
#>  4 S1     P1              13.7 A2                A
#>  5 S1     P1              16.7 B1                B
#>  6 S1     P1              16.9 B2                B
#>  7 S1     P1              18.2 C1                C
#>  8 S1     P1              18.1 D1                D
#>  9 S2     P1              22.1 A1                A
#> 10 S2     P1              19.7 A2                A
#> 11 S2     P1              21.6 B1                B
#> 12 S2     P1              20.7 B2                B
#> 13 S2     P1              21.1 C1                C
#> 14 S2     P1              22.2 D1                D
# }