Fetches atom-level data for AlphaFold predictions, either for selected proteins or for whole organisms.

fetch_alphafold_prediction(
  uniprot_ids = NULL,
  organism_name = NULL,
  version = "v4",
  timeout = 3600,
  max_tries = 5,
  return_data_frame = FALSE,
  show_progress = TRUE
)

Arguments

uniprot_ids

optional, a character vector of UniProt identifiers for which predictions should be fetched. This argument is mutually exclusive with the organism_name argument.

organism_name

optional, a character value providing the name of an organism for which all available AlphaFold predictions should be retrieved. The name should be the capitalised scientific species name (e.g. "Homo sapiens"). Note: Some organisms have a large number of predictions, which can take considerable time and memory to fetch. Make sure that your system can handle fetching predictions for these organisms. This argument is mutually exclusive with the uniprot_ids argument.
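
The mutual exclusivity of uniprot_ids and organism_name can be illustrated with a small check. This is only a sketch: check_input is a hypothetical helper that is not part of the package, and the function itself may validate its input differently.

```r
# Hypothetical helper: exactly one of the two arguments must be supplied.
check_input <- function(uniprot_ids = NULL, organism_name = NULL) {
  has_ids <- !is.null(uniprot_ids)
  has_organism <- !is.null(organism_name)
  # Both supplied or both missing is treated as an error in this sketch.
  if (has_ids == has_organism) {
    stop("Provide either uniprot_ids or organism_name, but not both.")
  }
  if (has_ids) "uniprot_ids" else "organism_name"
}

route_ids <- check_input(uniprot_ids = c("F4HVG8", "O15552"))
route_organism <- check_input(organism_name = "Homo sapiens")
```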

version

a character value that specifies the AlphaFold version that should be used. The version is regularly updated by the database. We always try to make the current version the default version. Available versions can be found here: https://ftp.ebi.ac.uk/pub/databases/alphafold/

timeout

a numeric value specifying the time in seconds until the download of an organism archive times out. The default is 3600 seconds.
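
For context, base R downloads via download.file honour the global "timeout" option, which defaults to 60 seconds and is usually too short for large archives. The sketch below shows one way to raise that option temporarily. with_timeout_option is a hypothetical helper, and whether the package applies the timeout argument in this way is an assumption.

```r
# Hypothetical helper: evaluate `expr` while a temporary timeout is active.
with_timeout_option <- function(seconds, expr) {
  old <- options(timeout = seconds)
  # Restore the previous setting when the function exits, even on error.
  on.exit(options(old), add = TRUE)
  # `expr` is evaluated lazily, so it runs here under the new timeout.
  force(expr)
}

timeout_before <- getOption("timeout")
timeout_inside <- with_timeout_option(3600, getOption("timeout"))
timeout_after <- getOption("timeout")
```

In real use, the second argument would be the download call itself rather than getOption("timeout").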

max_tries

a numeric value that specifies the number of times the function tries to download the data in case an error occurs. The default is 5. This only applies if uniprot_ids were provided.
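
The retry behaviour controlled by max_tries can be pictured as a loop that repeats a failing call until it succeeds or the limit is reached. The following is a minimal sketch under that assumption; with_retries and flaky are hypothetical names, and the package's own retry logic may differ.

```r
# Hypothetical helper: call `f` until it succeeds or `max_tries` is reached.
with_retries <- function(f, max_tries = 5) {
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
  }
  stop("All ", max_tries, " attempts failed: ", conditionMessage(result))
}

# Toy function that fails twice before it succeeds.
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("temporary failure")
  "ok"
}

outcome <- with_retries(flaky, max_tries = 5)
```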

return_data_frame

a logical value that specifies whether a data frame is returned instead of a list. It is recommended to use this only if information for a few proteins is retrieved. Default is FALSE.
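
To illustrate the trade-off between the two return types, the sketch below uses a toy list in place of real output. It assumes that each list element is a data frame for one protein with identical columns; the element names and values are made up for illustration.

```r
# Toy stand-in for the list output: one data frame per protein (assumed layout).
predictions <- list(
  F4HVG8 = data.frame(
    label_id = 1:2,
    type_symbol = c("N", "C"),
    uniprot_id = "F4HVG8"
  ),
  O15552 = data.frame(
    label_id = 1:3,
    type_symbol = c("N", "C", "C"),
    uniprot_id = "O15552"
  )
)

# A list lets you inspect one protein without touching the others.
n_atoms_single <- nrow(predictions[["O15552"]])

# Combining into one data frame is convenient when the total size is small.
combined <- do.call(rbind, predictions)
rownames(combined) <- NULL
```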

show_progress

a logical value that specifies whether a progress bar is shown. Default is TRUE.

Value

A list that contains atom-level data for AlphaFold predictions. If return_data_frame is TRUE, a data frame with this information is returned instead. The data frame contains the following columns:

  • label_id: Uniquely identifies every atom in the prediction following the standardised convention for mmCIF files.

  • type_symbol: The code used to identify the atom species representing this atom type. This code is the element symbol.

  • label_atom_id: Uniquely identifies every atom for the given residue following the standardised convention for mmCIF files.

  • label_comp_id: A chemical identifier for the residue. This is the three-letter code for the amino acid.

  • label_asym_id: Chain identifier following the standardised convention for mmCIF files. Since every prediction only contains one protein, this is always "A".

  • label_seq_id: Uniquely and sequentially identifies residues for each protein. The numbering corresponds to the UniProt amino acid positions.

  • x: The x coordinate of the atom.

  • y: The y coordinate of the atom.

  • z: The z coordinate of the atom.

  • prediction_score: Contains the prediction score for each residue.

  • auth_seq_id: Same as label_seq_id, but of type character.

  • auth_comp_id: Same as label_comp_id.

  • auth_asym_id: Same as label_asym_id.

  • uniprot_id: The UniProt identifier of the predicted protein.

  • score_quality: An annotation of the quality of the prediction score.
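
Because the data are atom-level while prediction_score is given per residue, it is often useful to reduce the table to one row per residue before summarising. The sketch below does this on a toy data frame that uses a subset of the columns listed above. All values, including the identifier "P00000", are made-up placeholders.

```r
# Toy data with a subset of the documented columns; values are made up.
atoms <- data.frame(
  label_atom_id = c("N", "CA", "C", "N", "CA", "C"),
  label_comp_id = c("MET", "MET", "MET", "LEU", "LEU", "LEU"),
  label_seq_id = c(1, 1, 1, 2, 2, 2),
  prediction_score = c(40, 40, 40, 92, 92, 92),
  uniprot_id = "P00000"
)

# Keep one row per residue by selecting the alpha carbon atom.
residues <- atoms[atoms$label_atom_id == "CA", ]

# Mean prediction score per protein, computed over residues.
mean_scores <- aggregate(prediction_score ~ uniprot_id, data = residues, FUN = mean)
```

Filtering on the alpha carbon avoids weighting residues by their number of atoms when averaging.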

Examples

# \donttest{
alphafold <- fetch_alphafold_prediction(
  uniprot_ids = c("F4HVG8", "O15552"),
  return_data_frame = TRUE
)

head(alphafold, n = 10)
#> # A tibble: 10 × 15
#>    label_id type_symbol label_atom_id label_comp_id label_asym_id label_seq_id
#>       <dbl> <chr>       <chr>         <chr>         <chr>                <dbl>
#>  1        1 N           N             MET           A                        1
#>  2        2 C           CA            MET           A                        1
#>  3        3 C           C             MET           A                        1
#>  4        4 C           CB            MET           A                        1
#>  5        5 O           O             MET           A                        1
#>  6        6 C           CG            MET           A                        1
#>  7        7 S           SD            MET           A                        1
#>  8        8 C           CE            MET           A                        1
#>  9        9 N           N             LEU           A                        2
#> 10       10 C           CA            LEU           A                        2
#> # ℹ 9 more variables: x <dbl>, y <dbl>, z <dbl>, prediction_score <dbl>,
#> #   auth_seq_id <chr>, auth_comp_id <chr>, auth_asym_id <chr>,
#> #   uniprot_id <chr>, score_quality <chr>
# }