R/calculate_kegg_enrichment.R
calculate_kegg_enrichment.Rd
Analyses whether KEGG pathways are enriched among the significant proteins compared to all detected proteins. Fisher's exact test is used to assess the significance of the enrichment.
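The test described above can be illustrated for a single pathway with base R. This is a minimal sketch, not the package's internal code: the counts are made up, the variable names are hypothetical, and the "Up"/"Down" labels for the direction are an assumption. The expected count corresponds in spirit to the n_proteins_expected column of the returned table.

```r
# Hypothetical counts for one pathway (illustrative numbers, not real data)
n_detected <- 1000       # all detected proteins
n_significant <- 200     # significant proteins among them
n_in_pathway <- 50       # detected proteins annotated to the pathway
n_sig_in_pathway <- 20   # significant proteins annotated to the pathway

# 2 x 2 contingency table; matrix() fills column by column
contingency <- matrix(
  c(
    n_sig_in_pathway,                                   # significant, in pathway
    n_significant - n_sig_in_pathway,                   # significant, not in pathway
    n_in_pathway - n_sig_in_pathway,                    # not significant, in pathway
    n_detected - n_significant - (n_in_pathway - n_sig_in_pathway)
  ),
  nrow = 2,
  dimnames = list(
    pathway = c("in_pathway", "not_in_pathway"),
    significance = c("significant", "not_significant")
  )
)

# Fisher's exact test on the contingency table
test <- stats::fisher.test(contingency)

# Number of significant proteins expected in the pathway by chance
n_expected <- n_significant * n_in_pathway / n_detected

# Assumed labelling: more significant proteins than expected counts as "Up"
direction <- if (n_sig_in_pathway > n_expected) "Up" else "Down"

print(contingency)
print(test$p.value)
```

When many pathways are tested, the p-values would normally be corrected for multiple testing, for example with stats::p.adjust(); which correction method the function applies is not stated here.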
calculate_kegg_enrichment(
  data,
  protein_id,
  is_significant,
  pathway_id = pathway_id,
  pathway_name = pathway_name,
  plot = TRUE,
  plot_cutoff = "adj_pval top10"
)
data: a data frame that contains at least the input variables.
protein_id: a character column in the input data frame that contains the protein accession numbers.
is_significant: a logical column in the input data frame that indicates whether the corresponding protein has a significantly changing peptide. The input data frame may contain peptide-level information with significance information. The function extracts protein-level information from this.
pathway_id: a character column in the input data frame that contains KEGG pathway identifiers. These can be obtained from KEGG using fetch_kegg.
pathway_name: a character column in the input data frame that contains KEGG pathway names. These can be obtained from KEGG using fetch_kegg.
plot: a logical value indicating whether the result should be plotted or returned as a table.
plot_cutoff: a character value indicating whether the plot should contain the top 10 most significant pathways (by p-value or adjusted p-value), or whether a significance cutoff should be used to determine the number of pathways in the plot. Provide the type first, followed by the threshold, separated by a space. Examples are plot_cutoff = "adj_pval top10", plot_cutoff = "pval 0.05" or plot_cutoff = "adj_pval 0.01". The threshold can be chosen freely.
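The "type threshold" format of plot_cutoff can be illustrated with a small parser. This is a sketch of how such a string could be interpreted, not the function's actual implementation; parse_plot_cutoff is a hypothetical helper, and the accepted values are taken only from the examples given in the argument description.

```r
# Hypothetical helper: split a plot_cutoff string into its type and threshold
parse_plot_cutoff <- function(plot_cutoff) {
  parts <- strsplit(plot_cutoff, " ", fixed = TRUE)[[1]]

  # Expect exactly two parts: the type and the threshold
  stopifnot(length(parts) == 2, parts[1] %in% c("pval", "adj_pval"))

  type <- parts[1]
  threshold <- parts[2]

  if (threshold == "top10") {
    # Keep the 10 most significant pathways
    list(type = type, top_n = 10L, cutoff = NA_real_)
  } else {
    # Otherwise treat the threshold as a numeric significance cutoff
    cutoff <- as.numeric(threshold)
    stopifnot(!is.na(cutoff))
    list(type = type, top_n = NA_integer_, cutoff = cutoff)
  }
}

str(parse_plot_cutoff("adj_pval top10"))
str(parse_plot_cutoff("pval 0.05"))
```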
A bar plot displaying negative log10 adjusted p-values for the top 10 enriched pathways. Bars are coloured according to the direction of the enrichment. If plot = FALSE, a data frame is returned.
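The values shown in such a plot can be derived from the returned table. The sketch below uses a small made-up data frame with the adj_pval and direction columns and base R graphics; the pathway names, the "Up"/"Down" labels and the colours are assumptions for illustration, and the function's own plot may look different.

```r
# Hypothetical enrichment result, shaped like the table returned with plot = FALSE
result <- data.frame(
  pathway_name = c("Pathway A", "Pathway B", "Pathway C"),
  adj_pval = c(0.001, 0.2, 0.03),
  direction = c("Up", "Down", "Up")
)

# Bar height: negative log10 of the adjusted p-value
result$neg_log10_adj_pval <- -log10(result$adj_pval)

# Keep at most the 10 most significant pathways, most significant first
plot_data <- head(result[order(result$adj_pval), ], n = 10)

# Colour bars by direction of enrichment (colour choice is arbitrary)
bar_colours <- ifelse(plot_data$direction == "Up", "firebrick", "steelblue")

# Horizontal bar plot; rev() puts the most significant pathway at the top
graphics::barplot(
  height = rev(plot_data$neg_log10_adj_pval),
  names.arg = rev(plot_data$pathway_name),
  col = rev(bar_colours),
  horiz = TRUE,
  las = 1,
  xlab = "-log10 adjusted p-value"
)
```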
# \donttest{
# Load libraries
library(dplyr)

set.seed(123) # Makes example reproducible

# Create example data
kegg_data <- fetch_kegg(species = "eco")

if (!is.null(kegg_data)) { # only proceed if information was retrieved
  data <- kegg_data %>%
    group_by(uniprot_id) %>%
    mutate(significant = rep(
      sample(
        x = c(TRUE, FALSE),
        size = 1,
        replace = TRUE,
        prob = c(0.2, 0.8)
      ),
      n = n()
    ))

  # Plot KEGG enrichment
  calculate_kegg_enrichment(
    data,
    protein_id = uniprot_id,
    is_significant = significant,
    pathway_id = pathway_id,
    pathway_name = pathway_name,
    plot = TRUE,
    plot_cutoff = "pval 0.05"
  )

  # Calculate KEGG enrichment
  kegg <- calculate_kegg_enrichment(
    data,
    protein_id = uniprot_id,
    is_significant = significant,
    pathway_id = pathway_id,
    pathway_name = pathway_name,
    plot = FALSE
  )

  head(kegg, n = 10)
}
#> # A tibble: 10 × 10
#>    pathway_id pathway_name                     pval adj_pval n_detected_proteins
#>    <chr>      <chr>                           <dbl>    <dbl>               <int>
#>  1 eco00550   Peptidoglycan biosynthesis    0.00837    0.917                1634
#>  2 eco00480   Glutathione metabolism        0.0157     0.917                1634
#>  3 eco02030   Bacterial chemotaxis          0.0210     0.917                1634
#>  4 eco03070   Bacterial secretion system    0.0309     1                    1634
#>  5 eco00240   Pyrimidine metabolism         0.0580     1                    1634
#>  6 eco00230   Purine metabolism             0.0590     1                    1634
#>  7 eco01230   Biosynthesis of amino acids   0.0638     1                    1634
#>  8 eco01232   Nucleotide metabolism         0.0762     1                    1634
#>  9 eco00520   Amino sugar and nucleotide s… 0.0869     1                    1634
#> 10 eco00730   Thiamine metabolism           0.0906     1                    1634
#> # ℹ 5 more variables: n_detected_proteins_in_pathway <int>,
#> # n_significant_proteins <int>, n_significant_proteins_in_pathway <int>,
#> # n_proteins_expected <dbl>, direction <chr>
# }