Calculate scores for each amino acid position in a protein sequence

Calculates a score for each amino acid position in a protein sequence. Each peptide is scored as the product of -log10(adjusted p-value) and the absolute log2(fold change). All peptides are then aligned along the sequence of the corresponding protein, and the score of each amino acid position is the average of the scores of the peptides covering it. In a limited proteolysis coupled to mass spectrometry (LiP-MS) experiment, the score helps to prioritize and narrow down structurally affected regions.
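The calculation described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the function's actual implementation: the data frame `peptides` and the names `per_residue` and `aa_scores` are hypothetical, and the input values are the first three peptides of the example at the end of this page.

```r
# Hypothetical toy input: three overlapping peptides of one protein
peptides <- data.frame(
  protein = rep("protein_1", 3),
  diff = c(2, -3, 1),
  adj_pval = c(0.001, 0.01, 0.2),
  start = c(1, 3, 5),
  end = c(6, 8, 10)
)

# Per-peptide score: -log10(adjusted p-value) * |log2(fold change)|
peptides$score <- -log10(peptides$adj_pval) * abs(peptides$diff)

# Expand each peptide into one row per residue it covers
per_residue <- do.call(rbind, lapply(seq_len(nrow(peptides)), function(i) {
  data.frame(
    protein = peptides$protein[i],
    residue = seq(peptides$start[i], peptides$end[i]),
    score = peptides$score[i]
  )
}))

# Average the scores of all peptides covering the same residue
aa_scores <- aggregate(score ~ protein + residue, data = per_residue, FUN = mean)
names(aa_scores)[names(aa_scores) == "score"] <- "amino_acid_score"
aa_scores <- aa_scores[order(aa_scores$protein, aa_scores$residue), ]

aa_scores
```

Residues 1 to 4 are covered only by peptides that each score 6, so their average is 6. Residue 5 is covered by all three peptides, so its score is the mean of 6, 6 and about 0.699.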

calculate_aa_scores(
  data,
  protein,
  diff = diff,
  adj_pval = adj_pval,
  start_position,
  end_position,
  retain_columns = NULL
)

Arguments

data: a data frame containing at least the input columns.
protein: a character column in the data frame containing the protein identifier or name.
diff: a numeric column in the data frame containing the log2 fold change.
adj_pval: a numeric column in the data frame containing the adjusted p-value.
start_position: a numeric column in the data frame containing the start position of a peptide or precursor.
end_position: a numeric column in the data frame containing the end position of a peptide or precursor.
retain_columns: a vector of columns to retain from the input data frame. By default no additional columns are retained (retain_columns = NULL). Specific columns can be retained by providing their names in a vector (not in quotation marks, just like other column names).

Value

A data frame that contains the aggregated scores per amino acid position, which can be used to draw a fingerprint for each individual protein.
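One possible way to draw such a fingerprint is sketched below with base R graphics. This is only an illustration: the data frame `scores` is typed in by hand from the first ten rows of the example output at the end of this page, and the choice of a vertical-bar plot is an assumption, not something the function prescribes.

```r
# First ten rows of the example output on this page, entered by hand
scores <- data.frame(
  pg_protein_accessions = "protein_1",
  residue = 1:10,
  amino_acid_score = c(6, 6, 6, 6, 4.23, 4.23, 3.35, 3.35, 0.699, 1.65)
)

# One plot per protein: a vertical bar of the score at each residue position
for (prot in unique(scores$pg_protein_accessions)) {
  one <- scores[scores$pg_protein_accessions == prot, ]
  plot(
    one$residue, one$amino_acid_score,
    type = "h", lwd = 3,
    xlab = "Residue", ylab = "Amino acid score", main = prot
  )
}
```

Regions with consistently high bars are the ones to inspect first.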

Author

Patrick Stalder

Examples


data <- data.frame(
  pg_protein_accessions = c(rep("protein_1", 10)),
  diff = c(2, -3, 1, 2, 3, -3, 5, 1, -0.5, 2),
  adj_pval = c(0.001, 0.01, 0.2, 0.05, 0.002, 0.5, 0.4, 0.7, 0.001, 0.02),
  start = c(1, 3, 5, 10, 15, 25, 28, 30, 41, 51),
  end = c(6, 8, 10, 16, 23, 35, 35, 35, 48, 55)
)
calculate_aa_scores(
  data,
  protein = pg_protein_accessions,
  diff = diff,
  adj_pval = adj_pval,
  start_position = start,
  end_position = end
)
#> # A tibble: 47 × 3
#> # Groups:   pg_protein_accessions, residue [47]
#>    pg_protein_accessions residue amino_acid_score
#>    <chr>                   <int>            <dbl>
#>  1 protein_1                   1            6    
#>  2 protein_1                   2            6    
#>  3 protein_1                   3            6    
#>  4 protein_1                   4            6    
#>  5 protein_1                   5            4.23 
#>  6 protein_1                   6            4.23 
#>  7 protein_1                   7            3.35 
#>  8 protein_1                   8            3.35 
#>  9 protein_1                   9            0.699
#> 10 protein_1                  10            1.65 
#> # ℹ 37 more rows