Maps peptides onto a PDB structure or AlphaFold prediction — map_peptides_on

Peptides are mapped onto PDB structures or AlphaFold predictions based on their positions. This is done by replacing the B-factor values in the structure file with values that highlight peptides, protein regions or amino acids when the structure is coloured by B-factor. Besides simple highlighting, a continuous variable associated with them, such as a fold change, can be mapped onto the structure as a colour gradient.
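To make the B-factor replacement concrete, here is a minimal sketch for a single record of a fixed-width ".pdb" file, where columns 61 to 66 of an ATOM line hold the B-factor (in ".cif" files the same information lives in the `_atom_site.B_iso_or_equiv` column). The helper `set_b_factor` and the example coordinates are hypothetical and only illustrate the idea; this is not the package's internal code.

```r
# Build one ATOM record with exact PDB column widths (66 characters up to the B-factor).
atom_line <- sprintf(
  "%-6s%5d %-4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f",
  "ATOM", 1L, " N", "", "MET", "A", 1L, "", 27.340, 24.430, 2.614, 1.00, 9.67
)

# Hypothetical helper: overwrite the B-factor field (columns 61-66) with a new value.
set_b_factor <- function(line, value) {
  substr(line, 61, 66) <- sprintf("%6.2f", value)
  line
}

new_line <- set_b_factor(atom_line, 75)
substr(new_line, 61, 66)
```

Colouring the structure by B-factor in a molecular viewer then reflects the substituted value instead of the original temperature factor.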

map_peptides_on_structure(
  peptide_data,
  uniprot_id,
  pdb_id,
  chain,
  auth_seq_id,
  map_value,
  file_format = ".cif",
  scale_per_structure = TRUE,
  export_location = NULL,
  structure_file = NULL,
  show_progress = TRUE
)

Arguments

peptide_data: a data frame that contains the input columns to this function. If structure or prediction files should be fetched automatically, please provide column names to the following arguments: uniprot_id, pdb_id, chain, auth_seq_id, map_value. If no PDB structure is available for a protein, the pdb_id and chain columns should contain NA at these positions. If a structure or prediction file is provided in the structure_file argument, this data frame should only contain information associated with the provided structure. For a user-provided structure, column names should be provided to the following arguments: uniprot_id, chain, auth_seq_id, map_value.
uniprot_id: a character column in the peptide_data data frame that contains UniProt identifiers for a corresponding peptide, protein region or amino acid.
pdb_id: a character column in the peptide_data data frame that contains PDB identifiers for structures in which a corresponding peptide, protein region or amino acid is found. If a protein prediction should be fetched from AlphaFold, this column should contain NA. This column is not required if a structure or prediction file is provided in the structure_file argument.
chain: a character column in the peptide_data data frame that contains the name of the chain from the PDB structure in which the peptide, protein region or amino acid is found. If a protein prediction should be fetched from AlphaFold, this column should contain NA. If an AlphaFold prediction is provided to the structure_file argument, the chain should be provided as usual (all AlphaFold predictions contain only chain A). Important: please provide the author-defined chain identifiers for both ".cif" and ".pdb" files. When the output of the find_peptide_in_structure function is used as the input for this function, this corresponds to the auth_asym_id column.
auth_seq_id: optional, a character (or numeric) column in the peptide_data data frame that contains semicolon-separated positions of peptides, protein regions or amino acids in the corresponding PDB structure or AlphaFold prediction. This information can be obtained from the find_peptide_in_structure function. The corresponding column in the output is called auth_seq_id. In case of AlphaFold predictions, UniProt positions should be used. If single positions and not stretches of amino acids are provided, the column can be numeric and does not need to contain the semicolon separator.
map_value: a numeric column in the peptide_data data frame that contains a value associated with each peptide, protein region or amino acid. If one start to end position pair has multiple different map values, the maximum will be used. This value will be displayed as a colour gradient when mapped onto the structure. The value can for example be the fold change, p-value or score associated with each peptide, protein region or amino acid (selection). If the selections should be displayed with just one colour, the value in this column should be the same for every selection. For the mapping, values are scaled between 50 and 100. Regions in the structure to which no selection maps receive a value of 0. If an amino acid position is associated with multiple mapped values, e.g. from different peptides, the maximum mapped value will be displayed.
file_format: a character vector containing the file format of the structure that will be fetched from the database for the PDB identifiers provided in the pdb_id column. This can be either ".cif" or ".pdb". The default is ".cif". We recommend using ".cif" files since a ".cif" file is available for every structure, whereas a ".pdb" file is not. Fetching and mapping onto ".cif" files takes longer than for ".pdb" files. If a structure file is provided in the structure_file argument, the file format is detected automatically and does not need to be provided.
scale_per_structure: a logical value that specifies if scaling should be performed for each structure independently (TRUE) or over the whole data set (FALSE). The default is TRUE, which scales the scores of each structure independently so that each structure has a score range from 50 to 100.
export_location: optional, a character argument specifying the path to the location in which the fetched and altered structure files should be saved. If left empty, they will be saved in the current working directory. The location should be provided in the following format "folderA/folderB".
structure_file: optional, a character argument specifying the path to the location and name of a structure file in ".cif" or ".pdb" format. If a structure is provided the peptide_data data frame should only contain mapping information for this structure.
show_progress: a logical value; if TRUE, a progress bar is shown (default is TRUE).
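The auth_seq_id and map_value descriptions above imply two steps: semicolon-separated positions are expanded to one entry per residue, and a residue covered by several selections keeps the maximum value. A minimal base R sketch of that logic follows, using made-up toy data; it illustrates the documented behaviour and is not the function's actual implementation.

```r
# Toy input: three selections, two of which overlap at position 12.
toy <- data.frame(
  auth_seq_id = c("10;11;12", "12;13", "20"),
  map_value = c(1.5, 3, 2),
  stringsAsFactors = FALSE
)

# Split the semicolon-separated positions into one row per residue.
positions <- strsplit(as.character(toy$auth_seq_id), ";", fixed = TRUE)
per_residue <- data.frame(
  position = as.integer(unlist(positions)),
  map_value = rep(toy$map_value, lengths(positions))
)

# Resolve overlapping selections by keeping the maximum value per position.
per_residue <- aggregate(map_value ~ position, data = per_residue, FUN = max)
per_residue
```

Position 12 is covered by the first two selections and ends up with the larger of their two values.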

Value

The function exports a modified ".pdb" or ".cif" structure file. B-factors have been replaced with scaled (50-100) values provided in the map_value column.
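The difference between scale_per_structure = TRUE and FALSE can be illustrated with a simple min-max rescaling to the 50-100 range. The exact formula used by the function is not stated here, so the helper `scale_50_100` below is an assumption, as is its handling of constant input (all values set to 100); the structure labels are placeholders.

```r
# Hypothetical helper: linear min-max rescaling to the range 50-100.
scale_50_100 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  if (rng[1] == rng[2]) {
    # Assumption: identical values all receive the upper bound.
    return(rep(100, length(x)))
  }
  50 + (x - rng[1]) / (rng[2] - rng[1]) * 50
}

toy <- data.frame(
  pdb_id = c("struct_A", "struct_A", "struct_A", "struct_B", "struct_B"),
  map_value = c(1, 2, 3, 10, 20)
)

# Analogous to scale_per_structure = TRUE: each structure spans 50 to 100 on its own.
toy$scaled_per_structure <- ave(toy$map_value, toy$pdb_id, FUN = scale_50_100)

# Analogous to scale_per_structure = FALSE: one common scale across all structures.
toy$scaled_global <- scale_50_100(toy$map_value)
toy
```

Scaling per structure maximises contrast within each file, whereas a global scale keeps colours comparable between structures.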

Examples

# \donttest{
# Load libraries
library(dplyr)

# Create example data
peptide_data <- data.frame(
  uniprot_id = c("P0A8T7", "P0A8T7", "P60906"),
  peptide_sequence = c(
    "SGIVSFGKETKGKRRLVITPVDGSDPYEEMIPKWRQLNV",
    "NVFEGERVER",
    "AIGEVTDVVEKE"
  ),
  start = c(1160, 1197, 55),
  end = c(1198, 1206, 66),
  map_value = c(70, 100, 100)
)

# Find peptide positions in structures
positions_structure <- find_peptide_in_structure(
  peptide_data = peptide_data,
  peptide = peptide_sequence,
  start = start,
  end = end,
  uniprot_id = uniprot_id,
  retain_columns = c(map_value)) %>%
  filter(pdb_ids %in% c("6UU2", "2EL9"))
#> [2/6] Extract experimental conditions ... 
#> DONE (0.02s)
#> [3/6] Extracting polymer information: 
#> -> 1/6 UniProt IDs ... 
#> DONE (0.4s)
#> -> 2/6 UniProt alignment ... 
#> DONE (0.4s)
#> -> 3/6 Ligand binding sites ... 
#> DONE (2.87s)
#> -> 4/6 Modified monomers ... 
#> DONE (0.13s)
#> -> 5/6 Secondary structure ... 
#> DONE (0.75s)
#> -> 6/6 Unmodeled residues ... 
#> DONE (0.16s)
#> [4/6] Correct author sequence positions for some PDB IDs ... 
#> None to correct(0.22s)
#> [5/6] Extract non-polymer information ... 
#> DONE (0.01s)
#> [6/6] Combine information ... 
#> DONE (0.44s)

# Map peptides on structures
# You can set the output location with the
# export_location argument. Here the files are
# saved in the working directory.
map_peptides_on_structure(
  peptide_data = positions_structure,
  uniprot_id = uniprot_id,
  pdb_id = pdb_ids,
  chain = auth_asym_id,
  auth_seq_id = auth_seq_id,
  map_value = map_value,
  file_format = ".pdb",
  export_location = getwd()
)
#> The following structures were not fetched, likely because no ".pdb"
#> file is available. Try using the ".cif" format for these.6UU2_P0A8T7

# }
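As described for the pdb_id and chain arguments, an AlphaFold prediction is requested by setting both columns to NA and giving UniProt positions in auth_seq_id. The sketch below prepares such an input for the second example peptide (positions 1197 to 1206). The call itself is guarded so it only runs interactively; it assumes the protti package, which provides this function, is installed and that network access is available. Whether a prediction exists for this accession has not been verified here.

```r
# Input for an AlphaFold mapping: pdb_id and chain are NA,
# auth_seq_id holds semicolon-separated UniProt positions.
alphafold_data <- data.frame(
  uniprot_id = "P0A8T7",
  pdb_id = NA_character_,
  chain = NA_character_,
  auth_seq_id = paste(1197:1206, collapse = ";"),
  map_value = 100,
  stringsAsFactors = FALSE
)

# Guarded call: requires protti and an internet connection, so it is skipped otherwise.
if (interactive() && requireNamespace("protti", quietly = TRUE)) {
  protti::map_peptides_on_structure(
    peptide_data = alphafold_data,
    uniprot_id = uniprot_id,
    pdb_id = pdb_id,
    chain = chain,
    auth_seq_id = auth_seq_id,
    map_value = map_value,
    export_location = tempdir()
  )
}
```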