Analyses the enrichment of KEGG pathways among significant proteins relative to all detected proteins. Fisher's exact test is used to assess the significance of the enrichment.
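To make the test concrete, the following is a minimal sketch of Fisher's exact test for a single pathway in base R. All counts are made up for illustration, and the layout of the contingency table is an assumption; the function's internal implementation may differ.

```r
# Hypothetical counts for one pathway (illustration only)
n_detected <- 1588        # all detected proteins
n_significant <- 300      # significant proteins among them
n_in_pathway <- 12        # detected proteins annotated to the pathway
n_sig_in_pathway <- 7     # significant proteins annotated to the pathway

# 2 x 2 contingency table: pathway membership versus significance
# (matrix() fills column by column)
contingency <- matrix(
  c(
    n_sig_in_pathway,
    n_significant - n_sig_in_pathway,
    n_in_pathway - n_sig_in_pathway,
    n_detected - n_significant - (n_in_pathway - n_sig_in_pathway)
  ),
  nrow = 2,
  dimnames = list(
    pathway = c("in_pathway", "not_in_pathway"),
    significance = c("significant", "not_significant")
  )
)

# Fisher's exact test on the table
result <- fisher.test(contingency)
result$p.value

# Number of significant proteins expected in the pathway if
# significance were independent of pathway membership
n_expected <- n_significant * n_in_pathway / n_detected
n_expected
```

Comparing the observed count with the expected count shows whether the pathway holds more or fewer significant proteins than chance would suggest.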

calculate_kegg_enrichment(
  data,
  protein_id,
  is_significant,
  pathway_id = pathway_id,
  pathway_name = pathway_name,
  plot = TRUE,
  plot_cutoff = "adj_pval top10"
)

Arguments

data

a data frame that contains at least the input variables.

protein_id

a character column in the data data frame that contains the protein accession numbers.

is_significant

a logical column in the data data frame that indicates if the corresponding protein has a significantly changing peptide. The input data frame may contain peptide level information with significance information. The function is able to extract protein level information from this.
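As an illustration of how protein level information could be derived from peptide level input, the sketch below marks a protein as significant when any of its peptides is significant. The identifiers are hypothetical and the use of aggregate() is an assumption made for this example, not necessarily how the function does it.

```r
# Hypothetical peptide-level input; all identifiers are made up
peptides <- data.frame(
  protein_id = c("P1", "P1", "P2", "P2", "P3"),
  peptide = c("pepA", "pepB", "pepC", "pepD", "pepE"),
  is_significant = c(FALSE, TRUE, FALSE, FALSE, TRUE)
)

# Collapse to one row per protein: a protein counts as significant
# if at least one of its peptides is significant
protein_level <- aggregate(
  is_significant ~ protein_id,
  data = peptides,
  FUN = any
)

protein_level
```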

pathway_id

a character column in the data data frame that contains KEGG pathway identifiers. These can be obtained from KEGG using fetch_kegg.

pathway_name

a character column in the data data frame that contains KEGG pathway names. These can be obtained from KEGG using fetch_kegg.

plot

a logical value indicating whether the result should be plotted or returned as a table.

plot_cutoff

a character value indicating if the plot should contain the top 10 most significant pathways (by p-value or adjusted p-value), or if a significance cutoff should be used to determine the number of pathways in the plot. This information should be provided with the type first, followed by the threshold, separated by a space. Examples are plot_cutoff = "adj_pval top10", plot_cutoff = "pval 0.05" or plot_cutoff = "adj_pval 0.01". The threshold can be chosen freely.
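The expected format of plot_cutoff can be illustrated with a small parser. parse_plot_cutoff() is a hypothetical helper written for this example and is not part of the package; it only shows how a string of the form "type threshold" splits into its two parts.

```r
# Hypothetical helper (not part of the package): split a plot_cutoff
# string into the p-value type and the threshold
parse_plot_cutoff <- function(plot_cutoff) {
  parts <- strsplit(plot_cutoff, " ", fixed = TRUE)[[1]]

  if (length(parts) != 2 || !parts[1] %in% c("pval", "adj_pval")) {
    stop("plot_cutoff must look like 'adj_pval top10' or 'pval 0.05'")
  }

  if (grepl("^top", parts[2])) {
    # "top10" style: keep the N most significant entries
    list(
      type = parts[1],
      mode = "top",
      value = as.numeric(sub("^top", "", parts[2]))
    )
  } else {
    # numeric style: keep entries below the significance cutoff
    list(
      type = parts[1],
      mode = "threshold",
      value = as.numeric(parts[2])
    )
  }
}

top_cutoff <- parse_plot_cutoff("adj_pval top10")
threshold_cutoff <- parse_plot_cutoff("pval 0.05")

str(top_cutoff)
str(threshold_cutoff)
```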

Value

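A bar plot displaying negative log10 adjusted p-values for the top 10 enriched pathways (with the default plot_cutoff). Bars are coloured according to the direction of the enrichment. If plot = FALSE, a data frame is returned instead; it includes the p-value (pval), the adjusted p-value (adj_pval) and the direction of the enrichment for each pathway.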
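The quantity shown on the plot can be reproduced from the returned table. The sketch below uses pathway identifiers and p-values copied from the first rows of the example output on this page and draws them with base graphics, which is only a stand-in for the plot the function itself produces.

```r
# Values copied from the first three rows of the example output
kegg <- data.frame(
  pathway_id = c("path:eco00740", "path:eco01250", "path:eco02030"),
  pval = c(0.00342, 0.0101, 0.0209),
  adj_pval = c(0.414, 0.613, 0.651)
)

# Negative log10 transformation: smaller adjusted p-values give longer bars
kegg$neg_log10_adj_pval <- -log10(kegg$adj_pval)

# Simple horizontal bar plot (base graphics stand-in)
barplot(
  kegg$neg_log10_adj_pval,
  names.arg = kegg$pathway_id,
  horiz = TRUE,
  las = 1,
  xlab = "-log10(adj_pval)"
)
```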

Examples

# \donttest{
# Load libraries
library(dplyr)

set.seed(123) # Makes example reproducible

# Create example data
data <- fetch_kegg(species = "eco") %>%
  group_by(uniprot_id) %>%
  mutate(significant = rep(sample(
    x = c(TRUE, FALSE),
    size = 1,
    replace = TRUE,
    prob = c(0.2, 0.8)
  ),
  n = n()
  ))

# Plot KEGG enrichment
calculate_kegg_enrichment(
  data,
  protein_id = uniprot_id,
  is_significant = significant,
  pathway_id = pathway_id,
  pathway_name = pathway_name,
  plot = TRUE,
  plot_cutoff = "pval 0.05"
)

# Calculate KEGG enrichment
kegg <- calculate_kegg_enrichment(
  data,
  protein_id = uniprot_id,
  is_significant = significant,
  pathway_id = pathway_id,
  pathway_name = pathway_name,
  plot = FALSE
)

head(kegg, n = 10)
#> # A tibble: 10 × 10
#>    pathway_id    pathway_name    pval adj_pval n_detected_prot… n_detected_prot…
#>    <chr>         <chr>          <dbl>    <dbl>            <int>            <dbl>
#>  1 path:eco00740 Riboflavin … 0.00342    0.414             1588               12
#>  2 path:eco01250 Biosynthesi… 0.0101     0.613             1588               44
#>  3 path:eco02030 Bacterial c… 0.0209     0.651             1588               20
#>  4 path:eco00541 O-Antigen n… 0.0215     0.651             1588               21
#>  5 path:eco00040 Pentose and… 0.0324     0.765             1588               37
#>  6 path:eco00590 Arachidonic… 0.0379     0.765             1588                2
#>  7 path:eco01230 Biosynthesi… 0.0870     1                 1588              117
#>  8 path:eco00730 Thiamine me… 0.0907     1                 1588               15
#>  9 path:eco02026 Biofilm for… 0.115      1                 1588               53
#> 10 path:eco00540 Lipopolysac… 0.149      1                 1588               38
#> # … with 4 more variables: n_significant_proteins <int>,
#> #   n_significant_proteins_in_pathway <dbl>, n_proteins_expected <dbl>,
#> #   direction <chr>
# }