Fetch PDB structure atom data from RCSB — fetch_pdb_structure

Fetches atom data for one or more PDB structures from RCSB. To retrieve metadata about PDB structures instead, use the function fetch_pdb(). The information retrieved is based on the .cif file of the structure, which may differ from the .pdb file.

fetch_pdb_structure(pdb_ids, return_data_frame = FALSE, show_progress = TRUE)

Arguments

pdb_ids: a character vector of PDB identifiers.
return_data_frame: a logical value that indicates if a data frame instead of a list is returned. This is recommended only when a small number of PDB structures is retrieved. Default is FALSE.
show_progress: a logical value that indicates if a progress bar will be shown. Default is TRUE.
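
To illustrate the difference between the two return shapes, the following sketch builds a small mock list by hand and stacks it into a single data frame. No request to RCSB is made. It assumes that the list holds one data frame per PDB ID, named by that ID; this shape and all values in mock_list are assumptions for illustration and are not confirmed by this page.

```r
# Mock stand-in for a list result: one data frame per PDB ID (assumed shape).
mock_list <- list(
  "6HG1" = data.frame(label_id = 1:2, type_symbol = c("N", "C"), pdb_id = "6HG1"),
  "1E9I" = data.frame(label_id = 1:3, type_symbol = c("N", "C", "O"), pdb_id = "1E9I")
)

# Stack the per-structure tables into one data frame.
combined <- do.call(rbind, mock_list)
rownames(combined) <- NULL

# Count atoms per structure using the pdb_id column.
atoms_per_structure <- table(combined$pdb_id)
```

Keeping the list form may be preferable for many structures, since each element can be processed and discarded separately rather than holding one large table in memory.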

Value

A list that contains atom data for each PDB structure provided. If return_data_frame is TRUE, a data frame with this information is returned instead. The data frame contains the following columns:

label_id: Uniquely identifies every atom in the structure following the standardised convention for mmCIF files. Example values: "5", "C12", "Ca3g28", "Fe3+17", "H*251", "boron2a", "C a phe 83 a 0", "Zn Zn 301 A 0".
type_symbol: The code used to identify the atom species representing this atom type. Normally this code is the element symbol. The code may be composed of any character except an underscore with the additional proviso that digits designate an oxidation state and must be followed by a + or - character. Example values: "C", "Cu2+", "H(SDS)", "dummy", "FeNi".
label_atom_id: Uniquely identifies every atom for the given residue following the standardised convention for mmCIF files. Example values: "CA", "HB1", "CB", "N"
label_comp_id: A chemical identifier for the residue. For protein polymer entities, this is the three-letter code for the amino acid. For nucleic acid polymer entities, this is the one-letter code for the base. Example values: "ala", "val", "A", "C".
label_asym_id: Chain identifier following the standardised convention for mmCIF files. Example values: "1", "A", "2B3".
entity_id: Records details about the molecular entities that are present in the crystallographic structure. Usually all different types of molecular entities such as polymer entities, non-polymer entities or water molecules are numbered once for each structure. Each type of non-polymer entity has its own number. Thus, the highest number in this column represents the number of different molecule types in the structure.
label_seq_id: Uniquely and sequentially identifies residues for each label_asym_id. This is always a number and the sequence of numbers always progresses in increasing numerical order.
x: The x coordinate of the atom.
y: The y coordinate of the atom.
z: The z coordinate of the atom.
site_occupancy: The fraction of the atom type present at this site.
b_iso_or_equivalent: Contains the B-factor or isotropic atomic displacement factor for each atom.
formal_charge: The net integer charge assigned to this atom. This is the formal charge assignment normally found in chemical diagrams. It is currently only assigned in a small subset of structures.
auth_seq_id: An alternative residue identifier (label_seq_id) provided by the author of the structure in order to match the identification used in the publication that describes the structure. This does not need to be numeric and is therefore of type character.
auth_comp_id: An alternative chemical identifier (label_comp_id) provided by the author of the structure in order to match the identification used in the publication that describes the structure.
auth_asym_id: An alternative chain identifier (label_asym_id) provided by the author of the structure in order to match the identification used in the publication that describes the structure.
pdb_model_number: The PDB model number.
pdb_id: The Protein Data Bank identifier for the structure.
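
As a minimal sketch of how these columns could be used together, the code below works on a hand-made mock table rather than on real output from RCSB. The object atoms and all of its values are invented for illustration; only the column names are taken from the list above.

```r
# Mock atom table with a subset of the documented columns (values invented).
atoms <- data.frame(
  label_id = 1:6,
  type_symbol = c("N", "C", "C", "N", "C", "O"),
  label_atom_id = c("N", "CA", "C", "N", "CA", "O"),
  label_comp_id = c("ALA", "ALA", "ALA", "LYS", "LYS", "HOH"),
  label_asym_id = c("A", "A", "A", "A", "A", "B"),
  entity_id = c(1, 1, 1, 1, 1, 2),
  x = c(0, 1, 2, 3, 4, 9),
  y = c(0, 0, 0, 0, 4, 9),
  z = c(0, 0, 0, 0, 0, 9),
  pdb_id = "MOCK"
)

# Keep only the alpha-carbon atoms of chain A.
ca <- atoms[atoms$label_atom_id == "CA" & atoms$label_asym_id == "A", ]

# Euclidean distance between consecutive alpha carbons, in the units of x, y and z.
ca_dist <- sqrt(diff(ca$x)^2 + diff(ca$y)^2 + diff(ca$z)^2)

# Highest entity_id, which the description above equates with the number of molecule types.
n_entities <- max(atoms$entity_id)
```

For data covering several structures, the same filtering would likely need to be done per pdb_id (and per pdb_model_number where multiple models exist) so that atoms from different structures are not mixed.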

Examples

# \donttest{
pdb_structure <- fetch_pdb_structure(
  pdb_ids = c("6HG1", "1E9I", "6D3Q", "4JHW"),
  return_data_frame = TRUE
)

head(pdb_structure, n = 10)
#> # A tibble: 10 × 18
#>    label_id type_symbol label_atom_id label_comp_id label_asym_id entity_id
#>       <dbl> <chr>       <chr>         <chr>         <chr>             <dbl>
#>  1        1 N           N             ALA           A                     1
#>  2        2 C           CA            ALA           A                     1
#>  3        3 C           C             ALA           A                     1
#>  4        4 O           O             ALA           A                     1
#>  5        5 C           CB            ALA           A                     1
#>  6        6 H           HA            ALA           A                     1
#>  7        7 H           HB1           ALA           A                     1
#>  8        8 H           HB2           ALA           A                     1
#>  9        9 H           HB3           ALA           A                     1
#> 10       10 N           N             LYS           A                     1
#> # ℹ 12 more variables: label_seq_id <dbl>, x <dbl>, y <dbl>, z <dbl>,
#> #   site_occupancy <dbl>, b_iso_or_equivalent <dbl>, formal_charge <chr>,
#> #   auth_seq_id <chr>, auth_comp_id <chr>, auth_asym_id <chr>,
#> #   pdb_model_number <dbl>, pdb_id <chr>
# }