3 Pathway Analysis

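The mapa package offers two key functions for pathway enrichment:

  • enrich_pathway(): Over-Representation Analysis (ORA)
  • do_gsea(): Gene Set Enrichment Analysis (GSEA)

Both functions support multiple databases and accept gene or metabolite data (metabolite analysis currently supports ORA only). Because the workflow is unified, you can analyze several databases in a single function call instead of running each one separately.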

library(mapa)
Important

Prerequisites: Before running enrichment analysis, ensure your data has been properly preprocessed using the convert_id() function as described in Chapter 2 - Data Input and Preprocessing. The variable_info used in this chapter should be the output from the ID conversion step.

Alternatively, you can load our example data by following the steps in Section 2.1.2.

3.1 Gene-based Enrichment

3.1.1 Supported Databases and Keytypes

The mapa package builds on the enrichment functions of the clusterProfiler package. The following table shows which databases you can use for your organism and the supported key types:

| Database | Supported Organism | Keytype Options |
|---|---|---|
| Gene Ontology (GO) | Any organism with an OrgDb object (an organism annotation package from Bioconductor, or an OrgDb object retrieved from AnnotationHub; see Section 2.2.1) | Any keyType supported by your OrgDb object |
| KEGG | All KEGG organisms | "kegg", "ncbi-geneid", "ncbi-proteinid", "uniprot" |
| Reactome | human, rat, mouse, celegans, yeast, zebrafish, fly, bovine, canine, chicken | ENTREZID only |
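KEGG and Reactome identify organisms differently: in the examples in this chapter, kegg.organism takes a KEGG organism code ("mmu") while reactome.organism takes a common name ("mouse"). The hypothetical lookup below (not part of mapa) pairs each Reactome organism name from the table with its standard KEGG organism code. Treat it as a convenience sketch and confirm codes against the KEGG organism list for your species.

```r
# Hypothetical lookup (not part of mapa): Reactome organism names from the
# table above mapped to standard KEGG organism codes.
reactome_to_kegg <- c(
  human     = "hsa",
  rat       = "rno",
  mouse     = "mmu",
  celegans  = "cel",
  yeast     = "sce",
  zebrafish = "dre",
  fly       = "dme",
  bovine    = "bta",
  canine    = "cfa",
  chicken   = "gga"
)

# Return the KEGG code for a single Reactome organism name, or stop if unknown
kegg_code_for <- function(reactome_organism) {
  stopifnot(length(reactome_organism) == 1)
  code <- unname(reactome_to_kegg[reactome_organism])
  if (is.na(code)) {
    stop("No KEGG code recorded for '", reactome_organism, "'")
  }
  code
}

# Example: the pair of values used for mouse in this chapter
kegg_code_for("mouse")  # "mmu"
```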
Tip
  1. To check the available GO keytypes for your organism:

    For model organisms with Bioconductor annotation packages

    ## For model organisms with Bioconductor annotation packages
    library(org.Mm.eg.db)
    AnnotationDbi::keytypes(org.Mm.eg.db)
    # [1] "ACCNUM" "ALIAS" "ENSEMBL" "ENSEMBLPROT" "ENSEMBLTRANS" "ENTREZID" "ENZYME"  "EVIDENCE" "EVIDENCEALL" "GENENAME"    
    # [11] "GENETYPE" "GO" "GOALL" "IPI" "MAP" "OMIM" "ONTOLOGY" "ONTOLOGYALL" "PATH" "PFAM"        
    # [21] "PMID" "PROSITE" "REFSEQ" "SYMBOL" "UCSCKG" "UNIPROT" 

    For non-model organisms with annotation OrgDb retrieved from AnnotationHub

    # variable_info is the output from the ID conversion step in Chapter 2 - Data Input and Preprocessing
    AnnotationDbi::keytypes(variable_info$orgdb)
  2. For a thorough understanding of enrichment analysis concepts and methods, we recommend the book Biomedical Knowledge Mining using GOSemSim and clusterProfiler. It covers the underlying methods that MAPA uses and will help you make informed decisions about your analysis parameters.

3.1.2 Basic Usage

Use enrich_pathway() for Over-Representation Analysis.

library(org.Mm.eg.db)
    
gene_enriched_pathways <- 
  enrich_pathway(
    variable_info = variable_info,
    query_type = "gene",
    database = c("go", "kegg", "reactome"),
    # GO parameters
    go.orgdb = org.Mm.eg.db, # or "org.Mm.eg.db" (in quotes) if you did not load `org.Mm.eg.db`
    go.keytype = "ENTREZID",
    go.ont = "ALL",
    # KEGG parameters
    kegg.organism = "mmu",
    kegg.keytype = "kegg",
    # Reactome parameters
    reactome.organism = "mouse",
    # Statistical parameters
    pvalueCutoff = 0.05,
    pAdjustMethod = "BH"
  )

# GO database...
# KEGG database...
# Reading KEGG annotation online: "https://rest.kegg.jp/link/mmu/pathway"...
# Reading KEGG annotation online: "https://rest.kegg.jp/list/pathway/mmu"...
# Reactome database...
# Done.

The do_gsea() function performs Gene Set Enrichment Analysis (GSEA) on a gene list ranked by fold change or another metric, selected with order_by:

library(org.Mm.eg.db)
    
gsea_pathways <- 
  do_gsea(
    variable_info = variable_info,
    query_type = "gene",
    order_by = "fc",              # Column to rank genes by
    database = c("go", "kegg", "reactome"),
    # Database parameters
    go.orgdb = org.Mm.eg.db, # or "org.Mm.eg.db" (in quotes) if you did not load `org.Mm.eg.db`
    go.ont = "ALL",
    go.keytype = "ENTREZID",
    kegg.organism = "mmu",
    kegg.keytype = "kegg",
    reactome.organism = "mouse",
    # GSEA parameters
    pvalueCutoff = 0.05,
    pAdjustMethod = "BH"
  )
Note
  • Model organisms: pass variable_info directly, and pass the annotation package (e.g., org.Mm.eg.db) as go.orgdb.
  • Non-model organisms: pass variable_info$data as variable_info, and variable_info$orgdb as go.orgdb.
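Unlike ORA, GSEA uses every gene's score rather than a cut-off gene list. clusterProfiler's GSEA functions expect the ranked list as a numeric vector named by gene ID and sorted in decreasing order. The sketch below builds such a vector from a small, made-up table with an ENTREZID column and an fc column; how do_gsea() constructs its list internally from order_by is an assumption here, not something this chapter documents.

```r
# Made-up input: one row per gene, with a gene ID and a ranking metric (fc)
genes <- data.frame(
  ENTREZID = c("g1", "g2", "g3", "g4", "g5"),
  fc       = c(-0.8, 2.1, 0.3, -1.9, 1.4),
  stringsAsFactors = FALSE
)

# GSEA needs one score per gene, so make sure IDs are unique
stopifnot(!anyDuplicated(genes$ENTREZID))

# Named numeric vector, sorted from most positive (top of the list)
# to most negative (bottom of the list)
gene_list <- sort(setNames(genes$fc, genes$ENTREZID), decreasing = TRUE)

names(gene_list)  # "g2" "g5" "g3" "g1" "g4"
```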

3.2 Metabolite-based Enrichment

3.2.1 Supported Databases and Keytypes

| Database | Supported Organism | Keytype Options |
|---|---|---|
| KEGG | All KEGG organisms | KEGG compound IDs |
| SMPDB | Human only | HMDB IDs |

3.2.2 Basic Usage

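For metabolite data, make sure your variable information table (met_variable_info in the example below) contains HMDB IDs and/or KEGG compound IDs. In the database argument, "hmdb" selects the SMPDB pathways (queried with HMDB IDs) and "metkegg" selects the KEGG pathways (queried with KEGG compound IDs).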

met_enriched_pathways <- 
  enrich_pathway(
    variable_info = met_variable_info,
    query_type = "metabolite",
    database = c("hmdb", "metkegg"),
    met_organism = "hsa",
    save_to_local = TRUE,
    pvalueCutoff = 0.05,
    pAdjustMethod = "BH"
  )
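Because the metabolite analysis matches pathways by identifier, a quick format check on the ID columns can catch problems early. The sketch below assumes hypothetical column names hmdb_id and kegg_id (adapt them to your own met_variable_info) and checks the usual formats: "HMDB" followed by 7 digits (older accessions use 5) and "C" followed by 5 digits for KEGG compounds.

```r
# Made-up metabolite table. The column names hmdb_id and kegg_id are
# hypothetical, and the IDs are placeholders chosen only for their format.
met_table <- data.frame(
  variable_id = c("m1", "m2", "m3"),
  hmdb_id     = c("HMDB0000001", NA, "not_an_id"),
  kegg_id     = c(NA, "C00002", NA),
  stringsAsFactors = FALSE
)

# HMDB accessions: "HMDB" + 7 digits (older accessions use 5 digits)
is_hmdb <- function(x) !is.na(x) & grepl("^HMDB([0-9]{5}|[0-9]{7})$", x)
# KEGG compound IDs: "C" + 5 digits
is_kegg_compound <- function(x) !is.na(x) & grepl("^C[0-9]{5}$", x)

has_hmdb <- is_hmdb(met_table$hmdb_id)
has_kegg <- is_kegg_compound(met_table$kegg_id)

# Metabolites with neither ID cannot be matched by either database here
met_table$variable_id[!(has_hmdb | has_kegg)]  # "m3"
```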

3.3 Results Interpretation

After running the analysis, print the result object to view a summary of the enrichment results:

gene_enriched_pathways
# -------------------- 
# Analysis method: enrich_pathway 
# -------------------- 
# -----------Variable information------------
# 66  features/markers in total
# -----------Enrichment results and modules of genes------------
# -----------GO------------
# 197 GO terms with p.adjust < 0.05 
# No GO modules
# -----------KEGG------------
# 30 KEGG pathways with p.adjust < 0.05 
# No KEGG modules
# -----------Reactome------------
# 16 Reactome pathways with p.adjust < 0.05 
# No Reactome modules
# -----------Enrichment results and modules of metabolites------------
# -----------HMDB------------
# No HMDB results
# No HMDB modules
# -----------KEGG Metabolite------------
# No KEGG metabolite results
# No KEGG modules
# -----------Functional modules------------
# No Functional modules
# -----------LLM module interpretation------------
# No LLM module interpretation results
# -------------------- 
# Processing information
# 1 processings in total
# enrich_pathway ---------- 
#   Package    Function.used                Time
# 1    mapa enrich_pathway() 2025-09-20 12:26:40

Results for each database are stored in separate slots of the result object; access them with the @ operator:

head(gene_enriched_pathways@enrichment_go_result@result)
#            ONTOLOGY         ID                                         Description GeneRatio
# GO:0015980       BP GO:0015980 energy derivation by oxidation of organic compounds     12/63
# GO:0009060       BP GO:0009060                                 aerobic respiration      9/63
# GO:0045333       BP GO:0045333                                cellular respiration      9/63
# GO:0003012       BP GO:0003012                               muscle system process     10/63
# GO:0006941       BP GO:0006941                         striated muscle contraction      7/63
# GO:0006936       BP GO:0006936                                  muscle contraction      8/63
#              BgRatio RichFactor FoldEnrichment    zScore       pvalue     p_adjust       qvalue
# GO:0015980 380/28928 0.03157895      14.500251 12.376048 3.244556e-11 3.912934e-08 2.759580e-08
# GO:0009060 206/28928 0.04368932      20.061026 12.826517 6.707680e-10 4.044731e-07 2.852529e-07
# GO:0045333 271/28928 0.03321033      15.249341 11.010328 7.404865e-09 2.976756e-06 2.099344e-06
# GO:0003012 460/28928 0.02173913       9.982057  9.072196 5.645295e-08 1.702057e-05 1.200368e-05
# GO:0006941 181/28928 0.03867403      17.758134 10.565860 1.375932e-07 3.318748e-05 2.340533e-05
# GO:0006936 329/28928 0.02431611      11.165340  8.663245 5.781200e-07 1.065983e-04 7.517800e-05
#                                                                               geneID Count
# GO:0015980 66128/66043/94044/12833/66576/12867/78920/16828/78330/407785/102093/19045    12
# GO:0009060                    66128/66043/94044/12833/66576/12867/78920/78330/407785     9
# GO:0045333                    66128/66043/94044/12833/66576/12867/78920/78330/407785     9
# GO:0003012             11640/11464/12313/226594/12833/26399/17885/407785/56012/74166    10
# GO:0006941                                11464/12313/226594/26399/17885/56012/74166     7
# GO:0006936                         11464/12313/226594/26399/17885/407785/56012/74166     8
head(gsea_pathways@enrichment_kegg_result@result)
#                ID                                 Description setSize enrichmentScore       NES
# mmu05322 mmu05322                Systemic lupus erythematosus      17       0.4629405  2.010801
# mmu04664 mmu04664             Fc epsilon RI signaling pathway      16       0.4603239  1.973294
# mmu04141 mmu04141 Protein processing in endoplasmic reticulum      44      -0.4963270 -1.685713
# mmu05014 mmu05014               Amyotrophic lateral sclerosis      68      -0.4359574 -1.547397
#                pvalue   p_adjust     qvalue rank                    leading_edge
# mmu05322 0.0005485788 0.04951072 0.04812464 1343 tags=100%, list=54%, signal=46%
# mmu04664 0.0008427356 0.04951072 0.04812464 1349 tags=100%, list=54%, signal=46%
# mmu04141 0.0005266381 0.04951072 0.04812464  991  tags=77%, list=40%, signal=47%
# mmu05014 0.0007372888 0.04951072 0.04812464  878  tags=63%, list=35%, signal=42%
#                                                                                                                                                                                                                                                                            core_enrichment
# mmu05322                                                                                                                                                                           14960/14969/14961/12262/13035/326619/15078/109711/15270/26914/60595/12268/20821/20641/67332/50909/20823
# mmu04664                                                                                                                                                                                  19354/20963/234779/18783/22324/18707/16331/11651/18750/26417/17096/14784/19353/16653/22325/26416
# mmu04141                                                               19089/27061/68292/67075/69276/26408/26965/320011/20014/12333/27054/20224/56228/216440/14827/23802/22027/100037258/99683/50907/110379/103963/81500/22393/50527/66967/66212/12304/12330/81489/71853/14376/18453/67397
# mmu05014 26408/69654/53319/17184/71844/227197/107939/56480/227699/59288/225887/68342/59015/433702/17274/53598/53857/234865/108989/230908/269966/53379/56717/67665/69912/19069/56208/16573/237782/74764/110379/70699/19132/226977/214585/12864/67680/445007/11744/19172/11750/225326/103468
#          Count
# mmu05322    17
# mmu04664    16
# mmu04141    34
# mmu05014    43

Key columns in enrichment results:

Core Information:

  • ID: Pathway identifier (e.g., GO:0042060 for GO, hsa04060 for KEGG, R-HSA-5669034 for Reactome)
  • Description: Pathway name
  • ONTOLOGY: GO sub-ontology (GO only: BP = biological process, CC = cellular component, MF = molecular function)
  • category/subcategory: KEGG pathway categories (for KEGG only)

Statistical Measures:

  • pvalue: Raw p-value from the hypergeometric test (equivalent to a one-sided Fisher’s exact test)
  • p_adjust: Adjusted p-value after multiple testing correction (BH method by default)
  • qvalue: Q-value for FDR control. For more information, see ?qvalue::qvalue
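To make the pvalue and p_adjust columns concrete, the sketch below recomputes the ORA test in base R for the counts of the first GO row above (12 of 63 annotated input genes fall in a pathway of 380 genes, out of 28,928 background genes). It shows that the hypergeometric upper tail equals a one-sided Fisher's exact test, then applies BH correction with p.adjust() to a made-up vector of p-values. This is the standard ORA calculation, offered as an illustration rather than as mapa's internal code.

```r
# Counts for one pathway, taken from the first GO row above
k <- 12     # input genes annotated to the pathway (Count)
n <- 63     # input genes annotated to any term (GeneRatio denominator)
M <- 380    # background genes annotated to the pathway (BgRatio numerator)
N <- 28928  # background universe size (BgRatio denominator)

# ORA p-value: probability of seeing k or more pathway genes among n draws
p_hyper <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)

# Same test as a one-sided Fisher's exact test on the 2 x 2 table
# rows: input / not input; columns: in pathway / not in pathway
tab <- matrix(c(k, M - k, n - k, N - M - n + k), nrow = 2)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value
all.equal(p_hyper, p_fisher)  # TRUE

# Benjamini-Hochberg adjustment (pAdjustMethod = "BH") of made-up p-values
raw_p <- c(0.01, 0.04, 0.03, 0.20)
p.adjust(raw_p, method = "BH")  # about 0.040 0.053 0.053 0.200
```

Note how BH can give two different raw p-values the same adjusted value: the adjustment enforces that adjusted p-values never decrease as raw p-values increase.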

Gene Mapping:

  • GeneRatio: Ratio of input genes annotated to this pathway (format: “genes_in_pathway/annotated_input_genes”; the denominator counts only input genes with annotations in the database, so it can be smaller than your total number of genes)
  • BgRatio: Ratio of all genes annotated to this pathway in the background universe (format: “pathway_genes/universe_genes”)
  • Count: Number of input genes annotated to this pathway (the numerator of GeneRatio)
  • geneID: Gene IDs that overlap between your gene list and the pathway (separated by “/”)
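GeneRatio and BgRatio are stored as text, and geneID packs several IDs into one string. The base R sketch below converts the ratios to numbers and splits geneID, using the values from the first GO row above; the helper ratio_to_numeric() is hypothetical, written only for this example.

```r
# Values copied from the first GO row above
gene_ratio <- "12/63"
bg_ratio   <- "380/28928"
gene_id    <- "66128/66043/94044/12833/66576/12867/78920/16828/78330/407785/102093/19045"

# Hypothetical helper: turn "a/b" strings into the numeric value a / b
ratio_to_numeric <- function(x) {
  parts <- strsplit(x, "/", fixed = TRUE)
  vapply(parts, function(p) as.numeric(p[1]) / as.numeric(p[2]), numeric(1))
}

ratio_to_numeric(c(gene_ratio, bg_ratio))  # 12/63 and 380/28928

# Split geneID into individual IDs; there is one per gene in Count
genes_in_pathway <- strsplit(gene_id, "/", fixed = TRUE)[[1]]
length(genes_in_pathway)  # 12
```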

Enrichment Metrics:

  • RichFactor: Number of input genes annotated to the pathway divided by the total number of genes annotated to it (Count divided by the numerator of BgRatio)
  • FoldEnrichment: Enrichment fold change (GeneRatio divided by BgRatio)
  • zScore: Number of standard deviations by which the observed overlap differs from the overlap expected by chance. A larger absolute z-score means the observed enrichment is more extreme relative to chance.
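The enrichment metrics follow directly from the counts in GeneRatio and BgRatio. The sketch below recomputes them for the first GO row above. The RichFactor and FoldEnrichment definitions match the descriptions in this list; the zScore formula (observed minus expected overlap, divided by the hypergeometric standard deviation) is inferred rather than taken from the clusterProfiler documentation, although it reproduces the value in the output.

```r
# Counts from the first GO row above: GeneRatio 12/63, BgRatio 380/28928
k <- 12; n <- 63; M <- 380; N <- 28928

rich_factor     <- k / M              # Count / pathway size
fold_enrichment <- (k / n) / (M / N)  # GeneRatio / BgRatio

# zScore (inferred formula): observed overlap minus its hypergeometric
# expectation, divided by the hypergeometric standard deviation
expected <- n * M / N
sd_hyper <- sqrt(n * (M / N) * (1 - M / N) * (N - n) / (N - 1))
z_score  <- (k - expected) / sd_hyper

# Compare with the output above: 0.03157895, 14.500251, 12.376048
c(rich_factor, fold_enrichment, z_score)
```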

For GSEA results, additional columns specific to ranked list analysis include:

  • setSize: Total number of genes in the gene set/pathway
  • enrichmentScore: Degree to which the pathway's genes are overrepresented at the top (positive) or bottom (negative) of the ranked list
  • NES: Normalized Enrichment Score - main metric for interpretation
    • Positive NES: pathway enriched in upregulated genes (pathway activation)
    • Negative NES: pathway enriched in downregulated genes (pathway suppression)
  • rank: Position in the ranked list at which the running enrichment score reaches its maximum
  • leading_edge: Summary statistics (tags, list, signal) for the leading-edge subset of genes
  • core_enrichment: Core enriched genes that contribute most to the enrichment signal
  • Count: Number of core enriched genes
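To split GSEA results into activated and suppressed pathways, filter on the sign of NES. The sketch below copies the four KEGG rows shown above into a data frame (rather than reading gsea_pathways directly) and also extracts the tags percentage from leading_edge. For these rows, tags matches Count divided by setSize, that is, the share of the pathway's genes that are core-enriched.

```r
# Rows copied from the KEGG GSEA output above
gsea_res <- data.frame(
  ID           = c("mmu05322", "mmu04664", "mmu04141", "mmu05014"),
  setSize      = c(17, 16, 44, 68),
  NES          = c(2.010801, 1.973294, -1.685713, -1.547397),
  leading_edge = c("tags=100%, list=54%, signal=46%",
                   "tags=100%, list=54%, signal=46%",
                   "tags=77%, list=40%, signal=47%",
                   "tags=63%, list=35%, signal=42%"),
  Count        = c(17, 16, 34, 43),
  stringsAsFactors = FALSE
)

# Positive NES: enriched among top-ranked genes; negative NES: bottom-ranked
activated  <- gsea_res$ID[gsea_res$NES > 0]  # "mmu05322" "mmu04664"
suppressed <- gsea_res$ID[gsea_res$NES < 0]  # "mmu04141" "mmu05014"

# Pull the "tags" percentage out of leading_edge
gsea_res$tags <- as.numeric(sub("^tags=([0-9.]+)%.*$", "\\1", gsea_res$leading_edge))

# For these rows, tags equals the core-enriched share of the pathway
round(100 * gsea_res$Count / gsea_res$setSize)  # 100 100 77 63
```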

For metabolite ORA results, the result structure differs from gene-based analysis:

Pathway Information:

  • pathway_id: Pathway identifier (e.g., SMP0000028, a Small Molecule Pathway Database ID)
  • pathway_name: Name of the metabolic pathway (e.g., “Caffeine Metabolism”)
  • describtion: Detailed description of the pathway from the corresponding database
  • pathway_class: Classification of pathway (e.g., “Metabolic;primary_pathway”, “Disease;primary_pathway”)

Statistical Measures:

  • p_value: Raw p-value from the enrichment test
  • p_adjust: P-value adjusted for multiple testing

Metabolite Mapping:

  • all_id: All metabolite IDs in this pathway (separated by “;”)
  • all_number: Total number of metabolites in the pathway
  • mapped_id: Your input metabolite IDs that map to this pathway (separated by “;”)
  • mapped_number: Number of your metabolites that map to this pathway
  • mapped_percentage: Percentage of pathway metabolites covered by your input data
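The mapping columns are related: mapped_number counts the IDs in mapped_id, all_number counts those in all_id, and mapped_percentage is their ratio. The sketch below shows this on one made-up result row with placeholder IDs, assuming mapped_percentage is expressed on a 0 to 100 scale.

```r
# One made-up result row; the IDs are placeholders, not real pathway members
met_row <- data.frame(
  all_id    = "M1;M2;M3;M4;M5;M6;M7;M8",
  mapped_id = "M2;M5",
  stringsAsFactors = FALSE
)

all_ids    <- strsplit(met_row$all_id, ";", fixed = TRUE)[[1]]
mapped_ids <- strsplit(met_row$mapped_id, ";", fixed = TRUE)[[1]]

all_number    <- length(all_ids)     # 8
mapped_number <- length(mapped_ids)  # 2

# Share of the pathway's metabolites present in your data (0 to 100 scale assumed)
mapped_percentage <- 100 * mapped_number / all_number  # 25
```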

The enrichment result object is structured specifically for MAPA's downstream similarity analysis and module identification, so results from other tools cannot be used in its place. Complete the steps above before proceeding.