Genular FAQ

Genular is a database and analytics platform that unites large volumes of publicly available single-cell RNA sequencing (scRNA-seq) data with comprehensive multi-domain biological knowledge. By combining information from numerous sources (e.g., NCBI Gene, Human Protein Atlas, STRING, and UniProt), Genular lets you explore gene and protein function within specific cells or tissues in a straightforward, data-driven way.

1) What exactly is Genular?
Genular is a gene-cell database and analytics platform integrating:
  • Extensive single-cell RNA-seq experiments (covering over 125 million cells across diverse tissues and conditions).
  • Critical information from 16+ external databases (protein details, disease links, pathways, ontologies, etc.).
  • Cell Significance Index (CSI) and its intuitive, cross-context Relative CSI (rCSI) score to highlight functionally prominent gene expression.
It helps you discover how genes behave and define cellular identity across tissues, diseases, and detailed cell types with unprecedented depth and scale.

2) What is the Cell Significance Index (CSI)?

The CSI is a specialized metric that pinpoints how uniquely and significantly a gene is expressed in a specific cell type compared to all other cell types within the same biological context (e.g., within a single tissue).

How it's derived: To calculate the CSI, we employ a robust statistical pipeline that analyzes data from many independent experiments. In simple terms, for a given gene in a `target cell type`, we compare its expression profile to its expression in `all other cell types` from the same tissue or disease state. After ensuring data quality and applying several statistical tests, we integrate the results to produce a final score.

A higher CSI means the gene is a more statistically significant and specific marker for that cell, indicating it likely plays a key role in that cell's unique identity or function within its environment.
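The one-vs-rest comparison described above can be illustrated with a small sketch. It is not Genular's actual pipeline, whose statistical tests this FAQ does not specify. As a stand-in it assumes a rank-based effect size (Cliff's delta), and the function name `one_vs_rest_delta` and the toy expression values are hypothetical.

```python
def one_vs_rest_delta(expression_by_cell_type, target):
    """Cliff's delta between the target cell type and all other cell types.

    Returns a value in [-1, 1]: +1 if every target value exceeds every
    non-target value, -1 for the opposite, 0 when there is no tendency.
    Illustrative stand-in only, not Genular's CSI formula.
    """
    target_values = expression_by_cell_type[target]
    # Pool the values of every other cell type from the same context.
    rest_values = [
        value
        for cell_type, values in expression_by_cell_type.items()
        if cell_type != target
        for value in values
    ]
    if not target_values or not rest_values:
        raise ValueError("need values for the target and at least one other cell type")

    # Count pairwise wins and losses of target values against the rest.
    greater = sum(1 for t in target_values for r in rest_values if t > r)
    less = sum(1 for t in target_values for r in rest_values if t < r)
    return (greater - less) / (len(target_values) * len(rest_values))


# Toy expression values for one gene in one tissue (made up for illustration).
toy_expression = {
    "memory T cell": [5.1, 4.8, 6.0],
    "B cell": [0.2, 0.0, 0.5],
    "macrophage": [0.1, 0.3],
}

delta = one_vs_rest_delta(toy_expression, "memory T cell")
```

In this toy data every memory T cell value is higher than every other value, so the result sits at the top of the range. A real pipeline would repeat such a comparison across many independent experiments and combine it with significance testing before producing a final score.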

2.1) What are the different significance scores (rCSI, PRS, pAdjVal, deltaVal) and what do they mean?

The platform provides several complementary scores to help you understand a gene's significance from different angles.

  • rCSI (Relative CSI): This score answers the question, "How impactful is this gene compared to the top-performing gene in this context?"

    How it's derived: The rCSI is calculated by comparing a gene's significance score to the most significant gene found in that same context. This rescales the value to an intuitive 0-100 range.

    Use Case: Its 0-100 scale is ideal for comparing the magnitude of a gene's effect across different tissues or conditions. A score of 100.0 means the gene is the most significant marker in that specific context.


  • PRS (Percentile Rank Score): This score answers the question, "Where does this gene rank against all other genes in this same context?"

    How it's derived: This score is determined by ranking a gene against all other significant genes within the same context. A score of 99.5 means the gene's effect is greater than 99.5% of all other significant genes in that context.

    Use Case: It's best for understanding a gene's relative importance or rank across different contexts, even if the absolute effect sizes differ.


  • pAdjVal (Adjusted P-Value): This score answers the question, "How statistically confident are we that this gene's expression pattern is not just random noise?"

    How it's derived: This value comes from rigorous statistical testing that has been corrected to account for the millions of comparisons being made across the dataset. A low value indicates high confidence in the result.

    Use Case: It is the primary measure of statistical confidence. A low adjusted p-value (e.g., < 0.05) gives you strong confidence, but a result with a higher value might still be biologically interesting, and may warrant further investigation, if the effect size (`deltaVal`) is strong.


  • deltaVal (Effect Size): This score answers the question, "How different and specific is this gene's expression in this cell type compared to all others?"

    How it's derived: This is a measure of effect size that quantifies how consistently a gene's expression is higher or lower in the target cell type compared to others. It ranges from -1 to +1.

    Use Case: It measures the specificity and direction of the expression. It is the core component that determines whether a gene is a positive or negative marker for a cell type.
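To make the rescaling and ranking ideas above concrete, here is a minimal sketch. It makes several assumptions that this FAQ does not confirm: that the underlying significance scores are non-negative, that rCSI is a simple ratio to the top score, that PRS counts the share of other genes with a lower score, and that the multiple-testing correction is Benjamini-Hochberg. All function names, gene names, and values are hypothetical.

```python
def relative_csi(scores):
    """Rescale non-negative scores to 0-100 relative to the top score in the context."""
    top = max(scores.values())
    if top <= 0:
        raise ValueError("the top score must be positive to rescale")
    return {gene: 100.0 * score / top for gene, score in scores.items()}


def percentile_rank(scores):
    """Percentage of the other genes in the context that each gene outscores."""
    if len(scores) < 2:
        raise ValueError("need at least two genes to rank")
    values = list(scores.values())
    others = len(values) - 1
    return {
        gene: 100.0 * sum(1 for v in values if v < score) / others
        for gene, score in scores.items()
    }


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (one common multiple-testing correction)."""
    genes = sorted(p_values, key=p_values.get)
    total = len(genes)
    adjusted = {}
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(total, 0, -1):
        gene = genes[rank - 1]
        running_min = min(running_min, p_values[gene] * total / rank)
        adjusted[gene] = running_min
    return adjusted


# Placeholder genes and made-up values for a single context.
toy_scores = {"GENE_A": 8.0, "GENE_B": 4.0, "GENE_C": 2.0, "GENE_D": 1.0}
toy_p_values = {"GENE_A": 0.01, "GENE_B": 0.04, "GENE_C": 0.03, "GENE_D": 0.5}

rcsi = relative_csi(toy_scores)
prs = percentile_rank(toy_scores)
p_adj = benjamini_hochberg(toy_p_values)
```

The top-scoring gene ends up with an rCSI of 100 and the others scale down in proportion, while PRS depends only on rank order. That difference is why the two scores can tell different stories about the same gene.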

2.2) When should I use the Z-score (csi_z), Median Expression (csi_m), or Weighted Expression (csi_w) based scores?

Each of the three CSI calculation methods provides a different "lens" to view gene significance, allowing you to answer different biological questions. Choosing the right one depends on your analytical goal.

  • csi_z (Specificity): This score emphasizes how unique and specific a gene's expression is to a particular cell type. A high `csi_z` score is excellent for identifying defining marker genes that distinguish one cell from its neighbors, even if the absolute expression isn't the highest.
    Use this to answer: "What genes make this cell type unique?"

  • csi_m (Abundance): This score emphasizes the raw abundance of a gene's transcripts. A high `csi_m` score identifies the "workhorse" genes that are the most actively expressed in a cell, indicating a central role in its function.
    Use this to answer: "What are the most active genes in this cell?"

  • csi_w (Overall Impact): This score is a composite metric that considers expression level, prevalence within the cell type, and the abundance of that cell type in the tissue. It emphasizes a gene's total biological footprint on the entire system.
    Use this to answer: "Which genes have the greatest overall impact on this tissue?"
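As a rough illustration of switching between these lenses, the sketch below ranks the same genes by each score in turn. The field names `csi_z`, `csi_m`, and `csi_w` come from this FAQ. The record layout, the `gene` key, the `top_genes` helper, and all values are assumptions made up for the example, not Genular's actual data format.

```python
SUPPORTED_LENSES = ("csi_z", "csi_m", "csi_w")


def top_genes(records, lens, limit=3):
    """Return gene names sorted by the chosen score, highest first."""
    if lens not in SUPPORTED_LENSES:
        raise ValueError(f"unknown lens: {lens}")
    ranked = sorted(records, key=lambda record: record[lens], reverse=True)
    return [record["gene"] for record in ranked[:limit]]


# Placeholder records for one cell type (values are invented).
toy_records = [
    {"gene": "GENE_A", "csi_z": 9.1, "csi_m": 2.0, "csi_w": 3.5},
    {"gene": "GENE_B", "csi_z": 1.2, "csi_m": 8.7, "csi_w": 4.0},
    {"gene": "GENE_C", "csi_z": 3.3, "csi_m": 5.5, "csi_w": 9.9},
]

most_specific = top_genes(toy_records, "csi_z")
most_abundant = top_genes(toy_records, "csi_m")
highest_impact = top_genes(toy_records, "csi_w")
```

The same three genes come out in a different order under each lens, which is the practical reason to pick the score that matches your question before ranking.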

3) How does Genular help researchers today?

Genular helps researchers save time and uncover new insights by connecting massive amounts of data in one place. Here's how:

  • Find what makes a cell unique. You can quickly spot the key genes that act as specific markers for any cell type, like memory T cells or a particular kind of neuron, in any health or disease context.
  • Get the full picture at once. Instead of jumping between a dozen different websites, you can look up a gene and immediately see its expression patterns, significance scores, protein interactions, and disease links.
  • Build and test new ideas faster. By comparing how a gene's significance changes between healthy and diseased tissues, you can easily form and validate new hypotheses about what drives a specific biological process.
  • Prepare for the future of research. Genular provides the clean, contextualized data needed to power next-generation AI models. This helps bridge the gap from analyzing what cells are doing now to predicting how they'll respond to new drugs or genetic changes.

4) Can you give some simple use cases for Genular?

Here are a few common questions Genular can help you answer instantly:

  • "What makes a memory T-cell remember?"

    You can find genes that are uniquely significant in long-lived memory T-cells compared to their short-lived counterparts. This helps pinpoint the core genes responsible for maintaining long-term immunity.

  • "Why is a lung macrophage different from a liver macrophage?"

    Compare macrophages from different organs to see which genes are specifically important in each location. This can reveal genes that are critical for tissue-specific functions, like fighting infections in the lung.

  • "Is my gene a good drug target?"

    Look up your gene and compare its significance in cancerous cells versus healthy cells. If it's only important in cancer cells, it becomes a much stronger and safer candidate for a new therapy or diagnostic test.

  • "How does a simple cell become a specialized one?"

    Follow a cell's journey as it matures, for example from a monocyte into a macrophage. By tracking how the significance of key genes shifts over time, you can identify the critical turning points that lock in a cell's final function.
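Several of these use cases come down to comparing a gene's score between two contexts. The sketch below shows one way that comparison might look once you have rCSI values for the same cell type in a disease context and a healthy context. The thresholds, the helper name `context_specific_genes`, the dictionary layout, and the values are all hypothetical choices for illustration.

```python
def context_specific_genes(disease_rcsi, healthy_rcsi, min_disease=80.0, max_healthy=20.0):
    """Genes with a high rCSI in the disease context and a low rCSI in the healthy one.

    Genes missing from the healthy context are skipped rather than treated
    as zero, because absence of a score is not evidence of low significance.
    """
    return sorted(
        gene
        for gene, score in disease_rcsi.items()
        if gene in healthy_rcsi
        and score >= min_disease
        and healthy_rcsi[gene] <= max_healthy
    )


# Placeholder rCSI values (0-100) for the same cell type in two contexts.
toy_disease = {"GENE_A": 95.0, "GENE_B": 88.0, "GENE_C": 30.0, "GENE_D": 90.0}
toy_healthy = {"GENE_A": 10.0, "GENE_B": 75.0, "GENE_C": 5.0}

candidates = context_specific_genes(toy_disease, toy_healthy)
```

Only the gene that is prominent in the disease context and not in the healthy one passes the filter. Any shortlist produced this way is a starting point for follow-up checks, such as the adjusted p-value and effect size, rather than a final answer.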

5) Which databases are integrated into Genular?
Genular unifies data from over 16 major public resources and ontologies, including:
  • NCBI Gene: For reference gene information and functional summaries.
  • Human Protein Atlas (HPA): For protein expression, localization, and pathology data.
  • STRING: For known and predicted protein-protein interaction networks.
  • UniProt: For comprehensive protein sequence and functional annotations.
  • Gene Ontology (GO): For functional annotation of genes based on biological process, molecular function, and cellular component.
  • Reactome: For curated biological pathways.
  • Disease Ontology (DOID) & UBERON anatomy ontology: For standardized disease and anatomy terms, enabling consistent data aggregation and querying.
By merging these, Genular offers a cross-referenced view of each gene's role within its broader biological context.

6) What is the long-term vision for Genular in the context of AI and predictive biology?

Genular provides a powerful foundation for understanding cellular states. Our long-term vision is to evolve Genular into a comprehensive Virtual Cellular System (VCS) design platform.

Why is this important for the future of biology? Imagine being able to:

  • Predict perturbation outcomes: Use integrated AI models to simulate how cells would respond if you overexpressed a gene, knocked one out, or introduced a therapeutic compound. Genular would provide the deep biological context to interpret these predictions.
  • Design functional cellular systems: Specify a desired cellular function (e.g., enhanced secretion of a therapeutic protein). The platform could then help define the "gene significance blueprint" and suggest genetic or environmental modifications to achieve this state computationally.
  • Accelerate therapeutic discovery & hypothesis testing: Rapidly test complex biological hypotheses about disease mechanisms or drug actions computationally, prioritizing the most promising avenues for subsequent lab experiments.

The Cell Significance Index (CSI) and the deeply integrated, harmonized data within Genular are critical first steps. They provide the quantitative, context-rich understanding of cellular identity necessary to train and interpret the AI models that will power these future capabilities.

7) You mentioned a "CSI-Former." What is that and why is it exciting?

While Genular uses the Cell Significance Index (CSI) for deep analysis of existing single-cell data, the "gene significance landscapes" it uncovers represent a rich, new form of biological information. We envision a future AI model, which we tentatively call a "CSI-Former," that would be trained to learn directly from these CSI profiles across many cell types and conditions.

Why is this a potentially transformative idea? A CSI-Former could:

  • Learn the "rules of gene significance": Move beyond simple co-expression to understand how the *functional prominence* (high CSI) of one gene or pathway influences the likely significance of others within a specific cellular context.
  • Predict shifts in cellular identity: Model how a cell's entire CSI profile – its unique "functional fingerprint" – changes during differentiation, disease progression, or in response to alterations in key regulatory genes.
  • Identify core functional "significance modules": Discover groups of genes that consistently gain or lose significance together, representing robust, co-regulated functional units that define key aspects of cellular behavior.

Essentially, a CSI-Former would learn the design principles of cells based on which genes are most functionally defining, offering a powerful new way to understand and potentially engineer cellular systems. Genular is building the foundational dataset that would make such a model possible.

8) Tissue & Disease Cell Statistics