Information on metal-binding proteins is extracted from UniProt data retrieved with fetch_uniprot. The function extracts ChEBI IDs, potential sub-IDs for metal cations, binding site locations in the protein and the sub-ID evidence level (based on the presence of the metal as a cofactor).

extract_metal_binders(
  data,
  protein_id = id,
  feature_metal_binding = feature_metal_binding,
  chebi_cofactor = chebi_cofactor,
  chebi_catalytic_activity = chebi_catalytic_activity,
  chebi_data = NULL,
  chebi_relation_data = NULL
)

Arguments

data

a data frame containing at least the input columns.

protein_id

a character column in the data data frame that contains the protein identifiers.

feature_metal_binding

a character column in the data data frame that contains the feature metal binding information from UniProt.

chebi_cofactor

a character column in the data data frame that contains the ChEBI cofactor information from UniProt.

chebi_catalytic_activity

a character column in the data data frame that contains the ChEBI catalytic activity information from UniProt.

chebi_data

optional, a data frame that can be manually obtained with fetch_chebi(). If not provided, it is fetched within the function. If the function is run many times, providing the data frame is recommended to save time.

chebi_relation_data

optional, a data frame that can be manually obtained with fetch_chebi(relation = TRUE). If not provided, it is fetched within the function. If the function is run many times, providing the data frame is recommended to save time.

Value

A data frame containing information on protein metal binding state. It contains the following types of columns (the naming might vary based on the input):

  • protein_id: UniProt protein identifier.

  • source: The source of the information, can be either feature_metal_binding, chebi_cofactor or chebi_catalytic_activity.

  • ids: ChEBI ID assigned to the protein and binding site based on the metal name in the metal_type column. These are general IDs that have sub-IDs, so they describe the type of metal ion bound to the protein in general terms.

  • metal_position: Amino acid position within the protein that is involved in metal binding.

  • metal_type: Metal name extracted from the feature_metal_binding information. This name is used as the search pattern when the split_metal_name helper function, called within this function, assigns a ChEBI ID.

  • sub_ids: ChEBI IDs that are sub-IDs (incoming) of the ID in the ids column. They describe the potential nature of the metal ion more specifically.

  • main_id_name: Official ChEBI name associated with the ID in the ids column.

  • multi_evidence: If there is overlapping information in feature_metal_binding and chebi_cofactor or chebi_catalytic_activity, only feature_metal_binding is retained and multi_evidence is TRUE.

  • sub_id_name: Official ChEBI name associated with the ID in the sub_ids column.
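
Because every general ID is paired with each of its sub-IDs, one binding position usually spans several rows. The sketch below shows one possible way to condense such a result with base R. It does not call extract_metal_binders(); metal_info is a hand-typed stand-in holding six rows copied from the example output on this page, and the names binding_sites and candidates are introduced here for illustration only.

```r
# Toy stand-in for a result of extract_metal_binders(): six rows hand-typed
# from the example output on this page (nothing is fetched from UniProt or ChEBI).
metal_info <- data.frame(
  id = rep("Q03640", 6),
  source = rep("feature_metal_binding", 6),
  ids = c("39123", "39123", "39123", "39124", "39123", "39123"),
  metal_position = c(1150, 1150, 1150, 1150, 1156, 1156),
  metal_type = rep("Calcium", 6),
  sub_ids = c("39123", "29108", "39099", "39124", "39123", "29108"),
  stringsAsFactors = FALSE
)

# One row per protein, binding position and metal name
binding_sites <- unique(metal_info[, c("id", "metal_position", "metal_type")])

# All candidate sub-IDs per binding site, collapsed into one string
candidates <- aggregate(
  sub_ids ~ id + metal_position,
  data = metal_info,
  FUN = function(x) paste(sort(unique(x)), collapse = ";")
)

print(binding_sites)
print(candidates)
```

With a real result, the grouping columns would need to match the actual column names, which depend on the input (for example id instead of protein_id).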

Examples

# \donttest{
# Create example data
data <- fetch_uniprot(
  uniprot_ids = c("Q03640", "Q03778", "P22276"),
  columns = c(
    "feature(METAL BINDING)",
    "chebi(Cofactor)",
    "chebi(Catalytic activity)"
  )
)

# Extract metal binding information
metal_info <- extract_metal_binders(
  data = data,
  protein_id = id,
  feature_metal_binding = feature_metal_binding,
  chebi_cofactor = chebi_cofactor,
  chebi_catalytic_activity = chebi_catalytic_activity
)

metal_info
#> # A tibble: 54 × 9
#> # Groups:   id [3]
#>    id     source           ids   metal_position metal_type sub_ids main_id_name
#>    <chr>  <chr>            <chr>          <dbl> <chr>      <chr>   <chr>
#>  1 Q03640 feature_metal_b… 39123           1150 Calcium    39123   calcium cati…
#>  2 Q03640 feature_metal_b… 39123           1150 Calcium    29108   calcium cati…
#>  3 Q03640 feature_metal_b… 39123           1150 Calcium    39099   calcium cati…
#>  4 Q03640 feature_metal_b… 39124           1150 Calcium    39124   calcium ion
#>  5 Q03640 feature_metal_b… 39124           1150 Calcium    39123   calcium ion
#>  6 Q03640 feature_metal_b… 39124           1150 Calcium    29108   calcium ion
#>  7 Q03640 feature_metal_b… 39124           1150 Calcium    39099   calcium ion
#>  8 Q03640 feature_metal_b… 39123           1156 Calcium    39123   calcium cati…
#>  9 Q03640 feature_metal_b… 39123           1156 Calcium    29108   calcium cati…
#> 10 Q03640 feature_metal_b… 39123           1156 Calcium    39099   calcium cati…
#> # … with 44 more rows, and 2 more variables: multi_evidence <lgl>,
#> #   sub_id_name <chr>
# }