R/calculate_kegg_enrichment.R
Analyses whether KEGG pathways are enriched among the significant proteins compared to all detected proteins. A Fisher's exact test is performed to test the significance of the enrichment.
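To illustrate the kind of test described above, here is a minimal sketch of a Fisher's exact test for a single pathway using base R. The counts are made up for illustration, and the layout of the contingency table is an assumption about how such a test is typically set up, not a description of the function's internals.

```r
# Hypothetical counts for a single pathway (illustrative numbers only)
n_detected <- 1000       # all detected proteins
n_significant <- 200     # significant proteins among them
n_in_pathway <- 50       # detected proteins annotated to the pathway
n_sig_in_pathway <- 20   # significant proteins annotated to the pathway

# 2 x 2 contingency table: pathway membership (rows) vs. significance (columns)
contingency <- matrix(
  c(
    n_sig_in_pathway,
    n_significant - n_sig_in_pathway,
    n_in_pathway - n_sig_in_pathway,
    n_detected - n_significant - n_in_pathway + n_sig_in_pathway
  ),
  nrow = 2,
  dimnames = list(
    pathway = c("in_pathway", "not_in_pathway"),
    significance = c("significant", "not_significant")
  )
)

# Fisher's exact test on the table; the p-value measures the association
# between pathway membership and significance
fisher_result <- fisher.test(contingency)
fisher_result$p.value
```

With these numbers, 10 significant proteins would be expected in the pathway by chance (50 * 200 / 1000), whereas 20 are observed.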
calculate_kegg_enrichment(
  data,
  protein_id,
  is_significant,
  pathway_id = pathway_id,
  pathway_name = pathway_name,
  plot = TRUE,
  plot_cutoff = "adj_pval top10"
)
a data frame that contains at least the input variables.
a character column in the input data frame that contains the protein accession numbers.
a logical column in the input data frame that indicates whether the corresponding protein has a significantly changing peptide. The input data frame may contain peptide-level significance information, from which the function extracts protein-level information.
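The following sketch shows one plausible way to collapse peptide-level significance to protein level: a protein counts as significant if any of its peptides is significant. The data frame peptides and its columns are hypothetical, and the aggregation rule is an assumption based on the description above rather than the function's confirmed implementation. Base R is used here to keep the example self-contained.

```r
# Hypothetical peptide-level input: several rows per protein
peptides <- data.frame(
  protein_id = c("P1", "P1", "P2", "P2", "P3"),
  peptide = c("pepA", "pepB", "pepC", "pepD", "pepE"),
  is_significant = c(TRUE, FALSE, FALSE, FALSE, TRUE)
)

# Collapse to one row per protein: TRUE if any peptide of the protein is significant
protein_level <- aggregate(
  is_significant ~ protein_id,
  data = peptides,
  FUN = any
)

protein_level
```

Here P1 and P3 end up significant because each has at least one significant peptide, while P2 does not.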
a character column in the input data frame that contains KEGG pathway identifiers. These can be obtained from KEGG using fetch_kegg.
a character column in the input data frame that contains KEGG pathway names. These can be obtained from KEGG using fetch_kegg.
a logical value indicating whether the result should be plotted or returned as a table.
a character value indicating whether the plot should contain the top 10 most significant pathways (by p-value or adjusted p-value), or whether a significance cutoff should be used to determine the number of pathways in the plot. This information should be provided with the type first, followed by the threshold, separated by a space. Examples are plot_cutoff = "adj_pval top10", plot_cutoff = "pval 0.05" or plot_cutoff = "adj_pval 0.01". The threshold can be chosen freely.
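As a rough illustration of the "type, space, threshold" format, the sketch below splits such a string into its parts. The helper parse_plot_cutoff is hypothetical and not part of the package. Accepting any number after "top" is also an assumption, since only "top10" is documented.

```r
# Hypothetical helper: split a plot_cutoff string into type and threshold
parse_plot_cutoff <- function(plot_cutoff) {
  parts <- strsplit(plot_cutoff, " ", fixed = TRUE)[[1]]

  # Expect exactly two parts, with a known p-value type first
  stopifnot(length(parts) == 2, parts[1] %in% c("pval", "adj_pval"))

  if (grepl("^top", parts[2])) {
    # "top10": keep the N most significant pathways
    list(type = parts[1], mode = "top", value = as.numeric(sub("^top", "", parts[2])))
  } else {
    # "0.05": keep pathways below a significance threshold
    list(type = parts[1], mode = "threshold", value = as.numeric(parts[2]))
  }
}

parse_plot_cutoff("adj_pval top10")
parse_plot_cutoff("pval 0.05")
```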
A bar plot displaying negative log10 adjusted p-values for the top 10 enriched pathways. Bars are coloured according to the direction of the enrichment. If plot = FALSE, a data frame is returned.
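The sketch below shows how the plotted quantities could be derived from a result table. The column names follow the returned data frame shown in the example output, but the values are made up. The rule used for the direction (observed versus expected number of significant proteins in the pathway) and the labels "Up" and "Down" are assumptions for illustration.

```r
# Hypothetical enrichment results (illustrative values only)
enrichment <- data.frame(
  pathway_name = c("Pathway A", "Pathway B", "Pathway C"),
  adj_pval = c(0.001, 0.01, 0.5),
  n_significant_proteins_in_pathway = c(20, 2, 9),
  n_proteins_expected = c(10, 8, 10)
)

# Bar height: negative log10 of the adjusted p-value (larger = more significant)
enrichment$neg_log10_adj_pval <- -log10(enrichment$adj_pval)

# Assumed direction rule: more significant proteins than expected means "Up"
enrichment$direction <- ifelse(
  enrichment$n_significant_proteins_in_pathway > enrichment$n_proteins_expected,
  "Up",
  "Down"
)

enrichment
```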
# \donttest{
# Load libraries
library(dplyr)
library(protti) # provides fetch_kegg() and calculate_kegg_enrichment()
set.seed(123) # Makes example reproducible
# Create example data
kegg_data <- fetch_kegg(species = "eco")
if (!is.null(kegg_data)) { # only proceed if information was retrieved
  data <- kegg_data %>%
    group_by(uniprot_id) %>%
    # Draw one random significance flag per protein and repeat it for all of its rows
    mutate(significant = rep(
      sample(
        x = c(TRUE, FALSE),
        size = 1,
        replace = TRUE,
        prob = c(0.2, 0.8)
      ),
      times = n()
    ))
  # Plot KEGG enrichment
  calculate_kegg_enrichment(
    data,
    protein_id = uniprot_id,
    is_significant = significant,
    pathway_id = pathway_id,
    pathway_name = pathway_name,
    plot = TRUE,
    plot_cutoff = "pval 0.05"
  )
  # Calculate KEGG enrichment
  kegg <- calculate_kegg_enrichment(
    data,
    protein_id = uniprot_id,
    is_significant = significant,
    pathway_id = pathway_id,
    pathway_name = pathway_name,
    plot = FALSE
  )
  head(kegg, n = 10)
}
#> # A tibble: 10 × 10
#>    pathway_id pathway_name                     pval adj_pval n_detected_proteins
#>    <chr>      <chr>                           <dbl>    <dbl>               <int>
#>  1 eco00740   Riboflavin metabolism          0.0234        1                1627
#>  2 eco01230   Biosynthesis of amino acids    0.0271        1                1627
#>  3 eco01210   2-Oxocarboxylic acid metaboli… 0.0686        1                1627
#>  4 eco00040   Pentose and glucuronate inter… 0.0955        1                1627
#>  5 eco00592   alpha-Linolenic acid metaboli… 0.0986        1                1627
#>  6 eco00360   Phenylalanine metabolism       0.113         1                1627
#>  7 eco00330   Arginine and proline metaboli… 0.127         1                1627
#>  8 eco01503   Cationic antimicrobial peptid… 0.133         1                1627
#>  9 eco00670   One carbon pool by folate      0.147         1                1627
#> 10 eco00130   Ubiquinone and other terpenoi… 0.151         1                1627
#> # ℹ 5 more variables: n_detected_proteins_in_pathway <int>,
#> # n_significant_proteins <int>, n_significant_proteins_in_pathway <int>,
#> # n_proteins_expected <dbl>, direction <chr>
# }