Perform KEGG pathway enrichment analysis — calculate_kegg_enrichment

Analyses the enrichment of KEGG pathways among the significant proteins relative to all detected proteins. For each pathway, a Fisher's exact test is performed to test the significance of the enrichment.
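As a rough illustration of the test described above, the sketch below runs a Fisher's exact test for a single pathway using base R. All counts are invented for illustration, and the construction of the contingency table and the expected count are assumptions about how such a test is typically set up, not a description of the function's internals.

```r
# Hypothetical counts, invented for illustration only
n_detected <- 1000       # all detected proteins
n_significant <- 200     # detected proteins flagged as significant
n_in_pathway <- 50       # detected proteins annotated to the pathway
n_sig_in_pathway <- 20   # significant proteins annotated to the pathway

# 2 x 2 contingency table: significance (rows) vs. pathway membership (columns).
# matrix() fills column by column.
contingency <- matrix(
  c(
    n_sig_in_pathway,                  # significant, in pathway
    n_in_pathway - n_sig_in_pathway,   # not significant, in pathway
    n_significant - n_sig_in_pathway,  # significant, not in pathway
    n_detected - n_significant - n_in_pathway + n_sig_in_pathway # neither
  ),
  nrow = 2,
  dimnames = list(
    significance = c("significant", "not_significant"),
    pathway = c("in_pathway", "not_in_pathway")
  )
)

# Fisher's exact test on the table
test_result <- fisher.test(contingency)
test_result$p.value

# Assumption: number of significant pathway proteins expected by chance
n_expected <- n_significant * n_in_pathway / n_detected
n_expected
```

With these made-up counts, more significant proteins fall into the pathway than expected by chance, which is the situation an enrichment test is meant to detect.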

calculate_kegg_enrichment(
  data,
  protein_id,
  is_significant,
  pathway_id = pathway_id,
  pathway_name = pathway_name,
  plot = TRUE,
  plot_cutoff = "adj_pval top10"
)

Arguments

data: a data frame that contains at least the input variables.
protein_id: a character column in the input data frame that contains the protein accession numbers.
is_significant: a logical column in the input data frame that indicates whether the corresponding protein has a significantly changing peptide. The input data frame may contain peptide-level significance information; the function extracts the protein-level information from it.
pathway_id: a character column in the input data frame that contains KEGG pathway identifiers. These can be obtained from KEGG using fetch_kegg.
pathway_name: a character column in the input data frame that contains KEGG pathway names. These can be obtained from KEGG using fetch_kegg.
plot: a logical value indicating whether the result should be plotted (TRUE) or returned as a table (FALSE).
plot_cutoff: a character value indicating whether the plot should contain the top 10 most significant pathways (by p-value or adjusted p-value), or whether a significance cutoff should be used to determine the number of pathways in the plot. Provide the type first, followed by the threshold, separated by a space. Examples are plot_cutoff = "adj_pval top10", plot_cutoff = "pval 0.05" or plot_cutoff = "adj_pval 0.01". The threshold can be chosen freely.
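To make the "type, space, threshold" format of plot_cutoff concrete, the sketch below splits such a string into its two parts. The helper parse_plot_cutoff is hypothetical and written only for this illustration; it is not part of the package and does not claim to reproduce how the function parses the argument.

```r
# Hypothetical helper, for illustration only (not part of the package)
parse_plot_cutoff <- function(plot_cutoff) {
  # Split "type threshold" at the single space
  parts <- strsplit(plot_cutoff, " ", fixed = TRUE)[[1]]
  stopifnot(length(parts) == 2, parts[1] %in% c("pval", "adj_pval"))

  if (parts[2] == "top10") {
    # Keep the 10 most significant pathways, no numeric threshold
    list(type = parts[1], top10 = TRUE, threshold = NA_real_)
  } else {
    # Keep every pathway below the numeric threshold
    list(type = parts[1], top10 = FALSE, threshold = as.numeric(parts[2]))
  }
}

parse_plot_cutoff("adj_pval top10")
parse_plot_cutoff("pval 0.05")
```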

Value

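If plot = TRUE, a bar plot displaying negative log10 adjusted p-values for the pathways selected by plot_cutoff (by default, the top 10 enriched pathways). Bars are coloured according to the direction of the enrichment. If plot = FALSE, a data frame containing the enrichment results is returned.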
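The quantity shown on the plot, the negative log10 of an adjusted p-value, can be sketched in base R as follows. The p-values and pathway names are invented, and the Benjamini-Hochberg method is an assumption made for this example; the page does not state which adjustment method the function uses.

```r
# Hypothetical raw p-values for four pathways (invented for illustration)
pvals <- c(pathway_a = 0.0004, pathway_b = 0.003, pathway_c = 0.02, pathway_d = 0.6)

# Assumption: Benjamini-Hochberg correction for multiple testing
adj_pvals <- p.adjust(pvals, method = "BH")

# Bar heights: larger values correspond to smaller adjusted p-values
bar_heights <- -log10(adj_pvals)
bar_heights

# Pathways that would pass a cutoff such as "adj_pval 0.01"
names(adj_pvals)[adj_pvals < 0.01]
```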

Examples

# \donttest{
# Load libraries
library(dplyr)

set.seed(123) # Makes example reproducible

# Create example data
kegg_data <- fetch_kegg(species = "eco")

if (!is.null(kegg_data)) { # only proceed if information was retrieved
  data <- kegg_data %>%
    group_by(uniprot_id) %>%
    mutate(significant = rep(
      sample(
        x = c(TRUE, FALSE),
        size = 1,
        replace = TRUE,
        prob = c(0.2, 0.8)
      ),
      times = n() # repeat the single draw for every row of this protein
    ))

  # Plot KEGG enrichment
  calculate_kegg_enrichment(
    data,
    protein_id = uniprot_id,
    is_significant = significant,
    pathway_id = pathway_id,
    pathway_name = pathway_name,
    plot = TRUE,
    plot_cutoff = "pval 0.05"
  )

  # Calculate KEGG enrichment
  kegg <- calculate_kegg_enrichment(
    data,
    protein_id = uniprot_id,
    is_significant = significant,
    pathway_id = pathway_id,
    pathway_name = pathway_name,
    plot = FALSE
  )

  head(kegg, n = 10)
}
#> # A tibble: 10 × 10
#>    pathway_id pathway_name                     pval adj_pval n_detected_proteins
#>    <chr>      <chr>                           <dbl>    <dbl>               <int>
#>  1 eco00550   Peptidoglycan biosynthesis    0.00837    0.917                1634
#>  2 eco00480   Glutathione metabolism        0.0157     0.917                1634
#>  3 eco02030   Bacterial chemotaxis          0.0210     0.917                1634
#>  4 eco03070   Bacterial secretion system    0.0309     1                    1634
#>  5 eco00240   Pyrimidine metabolism         0.0580     1                    1634
#>  6 eco00230   Purine metabolism             0.0590     1                    1634
#>  7 eco01230   Biosynthesis of amino acids   0.0638     1                    1634
#>  8 eco01232   Nucleotide metabolism         0.0762     1                    1634
#>  9 eco00520   Amino sugar and nucleotide s… 0.0869     1                    1634
#> 10 eco00730   Thiamine metabolism           0.0906     1                    1634
#> # ℹ 5 more variables: n_detected_proteins_in_pathway <int>,
#> #   n_significant_proteins <int>, n_significant_proteins_in_pathway <int>,
#> #   n_proteins_expected <dbl>, direction <chr>
# }
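The is_significant argument accepts peptide-level input. As a minimal sketch of what a protein-level summary of such data could look like, the base R example below marks a protein as significant if any of its peptides is significant. The data frame and its column names are invented, and the "any peptide" rule is an assumption based on the argument description, not a statement about the function's internals.

```r
# Hypothetical peptide-level data (invented for illustration)
peptides <- data.frame(
  protein_id = c("P1", "P1", "P2", "P2", "P3"),
  peptide = c("AAK", "LLR", "GGK", "TTR", "VVK"),
  is_significant = c(FALSE, TRUE, FALSE, FALSE, TRUE)
)

# Assumption: a protein is significant if at least one of its peptides is
protein_level <- aggregate(
  is_significant ~ protein_id,
  data = peptides,
  FUN = any
)

protein_level
```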