Fetches either domain-level information (including, e.g., gene ontology annotations) or residue-level information from the InterPro database.

fetch_interpro(
  uniprot_ids = NULL,
  return_residue_info = FALSE,
  manual_query = NULL,
  page_size = 200,
  max_tries = 3,
  timeout = 20,
  show_progress = TRUE
)

Arguments

uniprot_ids

a character vector of UniProt accession numbers.

return_residue_info

a logical value that specifies whether residue-level information should be returned instead of domain-level information. The default is FALSE, which returns domain-level information.

manual_query

optional, a character value that is a custom query to the InterPro database. This query is pasted after "https://www.ebi.ac.uk/interpro/api/" and before "&page_size=200". The raw data of the query is returned as a list.
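
As a rough illustration of the pasting step described above, the sketch below assembles a request URL from the two fixed parts named in the text. The query string and the variable names are hypothetical and serve only as an example; they are not taken from this documentation, so check the InterPro API documentation for valid queries. Because the appended part starts with "&", the query presumably needs to contain its own "?".

```r
# Fixed parts as stated in the argument description.
base_url <- "https://www.ebi.ac.uk/interpro/api/"
suffix <- "&page_size=200"

# Hypothetical query, used here only to illustrate how the URL is assembled.
manual_query <- "entry/interpro/protein/uniprot/P36578?"

# The query is pasted between the base URL and the page size suffix.
request_url <- paste0(base_url, manual_query, suffix)
print(request_url)
```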

page_size

a numeric value that specifies the number of entries that should be retrieved per page of a request. The function iterates through all pages regardless, but this parameter allows you to fine-tune the number of iterations and thus the number of requests to the database. The default is 200.
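
To make the trade-off concrete, the following sketch estimates how many requests are needed for a given number of entries, assuming one request per page. The helper requests_needed() and the entry counts are hypothetical and not part of the package.

```r
# Hypothetical helper: number of pages (and thus requests) needed to
# retrieve n_entries when each page holds page_size entries.
requests_needed <- function(n_entries, page_size = 200) {
  ceiling(n_entries / page_size)
}

requests_needed(1000, page_size = 200)
#> [1] 5
requests_needed(1000, page_size = 50)
#> [1] 20
```

A larger page size means fewer but larger requests, whereas a smaller page size means more but smaller requests.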

max_tries

a numeric value that specifies the number of times the function tries to download the data in case an error occurs. The default is 3.
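
The general idea of retrying a failed download can be sketched in base R as follows. This is not the package's implementation; with_retries() and flaky_download() are hypothetical names, and the failing download is simulated rather than sent to the database.

```r
# Hypothetical helper: call fun() up to max_tries times and return the
# first result that does not raise an error.
with_retries <- function(fun, max_tries = 3) {
  last_error <- NULL
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(fun(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    last_error <- result
  }
  stop("All ", max_tries, " tries failed: ", conditionMessage(last_error))
}

# Simulated download that fails twice before it succeeds.
n_calls <- 0
flaky_download <- function() {
  n_calls <<- n_calls + 1
  if (n_calls < 3) stop("temporary failure")
  "data"
}

result <- with_retries(flaky_download, max_tries = 3)
print(result)
```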

timeout

a numeric value that specifies the maximum request time in seconds per try. The default is 20.

show_progress

a logical value that determines whether a progress bar is shown. The default is TRUE.

Value

A data frame that contains either domain-level or residue-level information for the provided UniProt IDs.

Examples

# \donttest{
uniprot_ids <- c("P36578", "O43324", "Q00796", "O32583")

domain_info <- fetch_interpro(uniprot_ids = uniprot_ids)
#> Fetching InterPro Domains ■■■■■■■■■                         25% (1/4) ETA:  4s

head(domain_info)
#> # A tibble: 6 × 13
#>   identifier identifier_name        identifier_source_da…¹ identifier_type go_id
#>   <chr>      <chr>                  <chr>                  <chr>           <chr>
#> 1 IPR002136  Large ribosomal subun… interpro               family          GO:0…
#> 2 IPR002136  Large ribosomal subun… interpro               family          GO:0…
#> 3 IPR002136  Large ribosomal subun… interpro               family          GO:0…
#> 4 IPR013000  Large ribosomal subun… interpro               conserved_site  NA   
#> 5 IPR023574  Large ribosomal subun… interpro               homologous_sup… GO:0…
#> 6 IPR023574  Large ribosomal subun… interpro               homologous_sup… GO:0…
#> # ℹ abbreviated name: ¹​identifier_source_database
#> # ℹ 8 more variables: go_name <chr>, go_code <chr>, go_type <chr>, start <int>,
#> #   end <int>, dc_status <chr>, representative <lgl>, accession <chr>

residue_info <- fetch_interpro(
  uniprot_ids = uniprot_ids,
  return_residue_info = TRUE
)

head(residue_info)
#>   accession start end residues     fragment_description source_database
#> 1    P36578    NA  NA     <NA>                     <NA>            <NA>
#> 2    O43324    70  70        I putative MetRS interface             cdd
#> 3    O43324    97  97        D putative MetRS interface             cdd
#> 4    O43324   100 100        S putative MetRS interface             cdd
#> 5    O43324   101 101        Y putative MetRS interface             cdd
#> 6    O43324   103 103        E putative MetRS interface             cdd
#>   source_accession source_name
#> 1             <NA>        <NA>
#> 2          cd10305 GST_C_AIMP3
#> 3          cd10305 GST_C_AIMP3
#> 4          cd10305 GST_C_AIMP3
#> 5          cd10305 GST_C_AIMP3
#> 6          cd10305 GST_C_AIMP3
# }
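
In the residue-level output above, the first accession has NA in every annotation column. Assuming such rows mark accessions for which no residue-level information was returned, they can be separated from annotated rows as sketched below. The data frame is a small mock built from the first rows of the printed output, so the code runs without querying the database.

```r
# Mock of the first rows of the residue-level output shown above.
residue_info <- data.frame(
  accession = c("P36578", "O43324", "O43324"),
  start = c(NA, 70, 97),
  end = c(NA, 70, 97),
  residues = c(NA, "I", "D"),
  stringsAsFactors = FALSE
)

# Keep only rows that carry a residue-level annotation.
annotated <- residue_info[!is.na(residue_info$start), ]

# Accessions that appear only in rows without any annotation.
without_info <- setdiff(residue_info$accession, annotated$accession)
print(without_info)
```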