American Association for Cancer Research (AACR) 2025
April 21, 2025
Authors: Maximilien Colange; Guillaume Appe; Lea Meunier; Solene Weill; Akpeli Nordor; Abdelkader Behdenna
Abstract
We introduce InMoose, an open-source Python environment for omic data analysis. Owing to its wide adoption, Python has become a de facto standard in fields increasingly important for bioinformatics pipelines, such as data science, machine learning, and AI. As a general-purpose language, Python is also recognized for its versatility and scalability. InMoose aims to bring state-of-the-art tools, historically written in R, to the Python ecosystem. Our intent is to provide drop-in replacements for R tools, so our approach focuses on faithfulness to the outcomes of the original tools.
The first development phase has focused on bulk transcriptomic data, with current capabilities encompassing data simulation, batch effect correction, differential expression analysis, and meta-analysis. InMoose offers Python implementations of several state-of-the-art tools originally written in R:
• ComBat and ComBat-Seq (batch effect correction)
• edgeR, DESeq2, limma (differential gene expression analysis)
• splatter (RNA-Seq data simulation)
To our knowledge, InMoose is the sole Python implementation of ComBat-Seq, edgeR and limma. InMoose also offers original features:
• a quality control report for cohorts built through the batch effect correction features;
• a differential gene expression meta-analysis module.
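To give a sense of what a differential expression meta-analysis combines, here is a minimal sketch of a fixed-effect, inverse-variance-weighted combination of per-cohort log fold changes for a single gene. This is one common textbook approach, not necessarily the method InMoose implements; the function name `fixed_effect_meta` and the input values are hypothetical and do not come from the InMoose API.

```python
import math


def fixed_effect_meta(log_fcs, std_errs):
    """Combine per-cohort log fold changes for one gene.

    Hypothetical helper (not part of InMoose): uses inverse-variance
    weights, so cohorts with smaller standard errors count more.
    Returns (combined log fold change, combined standard error,
    two-sided p-value under a normal approximation).
    """
    if not log_fcs or len(log_fcs) != len(std_errs):
        raise ValueError("need one standard error per log fold change")
    if any(se <= 0 for se in std_errs):
        raise ValueError("standard errors must be positive")

    weights = [1.0 / se ** 2 for se in std_errs]
    total_weight = sum(weights)

    combined = sum(w * x for w, x in zip(weights, log_fcs)) / total_weight
    combined_se = math.sqrt(1.0 / total_weight)

    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    z = combined / combined_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return combined, combined_se, p_value


# Illustrative values only: one gene measured in three cohorts.
lfc, se, p = fixed_effect_meta([1.2, 0.8, 1.0], [0.4, 0.5, 0.3])
print(f"combined log2FC={lfc:.3f}, SE={se:.3f}, p={p:.2e}")
```

In practice the per-cohort log fold changes and standard errors would come from a differential expression tool run on each cohort separately.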
We present the range of capabilities of InMoose, illustrating them with an example workflow. We also compare InMoose with the original R implementations and with alternative Python implementations when available. Our experiments show that the results of InMoose are very similar, if not identical, to those of the original R tools. This positions InMoose as a key tool to bridge the R and Python ecosystems and to ensure reproducibility and comparability between R-based and Python-based bioinformatics pipelines.
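One simple way to quantify how closely a Python port agrees with its R original is to compare their outputs value by value. The sketch below is an assumption about how such a check could look, not the evaluation protocol used by the authors: the helper `compare_results` is hypothetical, and the two lists stand in for, say, log fold changes exported from an R run and from the corresponding Python run.

```python
import math


def compare_results(reference, candidate):
    """Return (max absolute difference, Pearson correlation).

    Hypothetical helper: `reference` and `candidate` are equal-length
    sequences of numbers, e.g. per-gene statistics from two tools.
    """
    n = len(reference)
    if n != len(candidate) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")

    max_abs_diff = max(abs(r - c) for r, c in zip(reference, candidate))

    mean_r = sum(reference) / n
    mean_c = sum(candidate) / n
    cov = sum((r - mean_r) * (c - mean_c) for r, c in zip(reference, candidate))
    var_r = sum((r - mean_r) ** 2 for r in reference)
    var_c = sum((c - mean_c) ** 2 for c in candidate)
    if var_r == 0 or var_c == 0:
        raise ValueError("correlation is undefined for constant input")

    pearson = cov / math.sqrt(var_r * var_c)
    return max_abs_diff, pearson


# Placeholder values, not real results from either tool.
r_output = [0.51, -1.20, 2.34, 0.02]
py_output = [0.51, -1.20, 2.34, 0.02 + 1e-9]
diff, corr = compare_results(r_output, py_output)
print(f"max |difference| = {diff:.2e}, Pearson r = {corr:.6f}")
```

A maximum absolute difference near floating-point precision indicates effectively identical results, while the correlation summarizes overall agreement when small numerical deviations are present.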
We have emphasized making InMoose easy to use and open source in order to reach as many bioinformaticians as possible. Since the first version of InMoose was made public, multiple users have shared their experience with us and even contributed code enhancements, which led us to update our software and to plan the release of more bioinformatics tools in Python in the future.
Python has become the language of choice for machine learning and AI, and streamlining ML- and AI-powered pipelines for biomolecular data has become an essential step in unlocking the full potential of biomolecular analysis. Single-cell omic data is well connected to AI/ML tools through scanpy and the scverse ecosystem, but no similar initiative exists for bulk transcriptomic data. Omicverse aggregates existing Python tools behind a single entry point, but has not endeavored to port R capabilities to the Python world. InMoose addresses this gap, which hinders the smooth integration of bulk transcriptomic data with ML and AI tools.