Fetches protein metadata from UniProt.

fetch_uniprot(
  uniprot_ids,
  columns = c("protein names", "length", "sequence", "genes", "database(GeneID)",
    "database(String)", "go(molecular function)", "go(biological process)",
    "go(cellular compartment)", "interactor", "feature(ACTIVE SITE)",
    "feature(BINDING SITE)", "feature(METAL BINDING)", "chebi(Cofactor)",
    "chebi(Catalytic activity)", "database(PDB)"),
  batchsize = 200,
  show_progress = TRUE
)

Arguments

uniprot_ids

a character vector of UniProt accession numbers.

columns

a character vector of metadata columns to import from UniProt. The full list of available columns is given in the UniProt documentation.

batchsize

a numeric value that specifies the number of proteins processed in a single query. Default is 200.

show_progress

a logical value that determines if a progress bar will be shown. Default is TRUE.
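To make the effect of batchsize concrete, the base R sketch below splits a vector of IDs into chunks that would each be sent as one query. The helper split_into_batches is hypothetical and not part of the package, the IDs are placeholder strings rather than real accessions, and the actual batching inside fetch_uniprot may work differently.

```r
# Hypothetical helper (not part of the package): split IDs into batches
# of at most `batchsize` elements, one batch per query.
split_into_batches <- function(ids, batchsize = 200) {
  # Batch index per ID: 1 for the first `batchsize` IDs, 2 for the next, ...
  batch_index <- ceiling(seq_along(ids) / batchsize)
  unname(split(ids, batch_index))
}

# Placeholder strings standing in for accession numbers (assumption).
ids <- sprintf("ID%04d", 1:450)
batches <- split_into_batches(ids, batchsize = 200)

# Number of IDs in each batch
lengths(batches)
```

With 450 IDs and the default batchsize of 200, this yields three batches, the last one being smaller than the others.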

Value

A data frame that contains all protein metadata specified in columns for the proteins provided. If an input ID is invalid but contains a valid UniProt ID, the valid portion is fetched and the original input ID is saved in a column called input_id.
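The handling of partially valid IDs can be pictured with the following base R sketch. It uses the accession number pattern published by UniProt to pull a valid accession out of a longer string. The helper extract_accession and the example inputs are hypothetical, and this is only an illustration of the idea, not necessarily how fetch_uniprot implements it.

```r
# Regular expression for UniProt accession numbers, as published by UniProt.
uniprot_pattern <- paste0(
  "[OPQ][0-9][A-Z0-9]{3}[0-9]|",
  "[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
)

# Hypothetical helper (not part of the package): return the first
# accession-like substring of each input, or NA if there is none.
# Assumes a character vector without missing values.
extract_accession <- function(ids) {
  m <- regexpr(uniprot_pattern, ids)
  found <- m != -1
  out <- rep(NA_character_, length(ids))
  # regmatches() returns matches only for the elements that matched
  out[found] <- regmatches(ids, m)
  out
}

# Made-up inputs: one clean ID, one with a trailing suffix, one invalid.
input_id <- c("P36578", "Q00796_suffix", "not_an_id")
result <- data.frame(
  input_id = input_id,
  id = extract_accession(input_id),
  stringsAsFactors = FALSE
)
result
```

Here the second input would be resolved to Q00796 while the original string stays available in input_id, and the third input yields no accession at all.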

Examples

# \donttest{
fetch_uniprot(c("P36578", "O43324", "Q00796"))
#> # A tibble: 3 × 17
#>   id     protein_names   length sequence  genes database_gene_id database_string
#>   <chr>  <chr>            <dbl> <chr>     <chr> <chr>            <chr>
#> 1 P36578 60S ribosomal …    427 MACARPLI… RPL4… 6124;            9606.ENSP00000…
#> 2 Q00796 Sorbitol dehyd…    357 MAAAAKPN… SORD  6652;            9606.ENSP00000…
#> 3 O43324 Eukaryotic tra…    174 MAAAAELS… EEF1… 9521;            9606.ENSP00000…
#> # … with 10 more variables: go_molecular_function <chr>,
#> #   go_biological_process <chr>, go_cellular_compartment <chr>,
#> #   interactor <chr>, feature_active_site <lgl>, feature_binding_site <chr>,
#> #   feature_metal_binding <chr>, chebi_cofactor <chr>,
#> #   chebi_catalytic_activity <chr>, database_pdb <chr>
# }