Skip to content

Neighborhood & Niche Analysis

Characterizing cellular microenvironments from spatial transcriptomics data.


Overview

Spatial transcriptomics platforms (Xenium, CosMx) provide not just cell types, but the context in which those cells reside. A Niche is a recurring microenvironment defined by its cell type composition - for example, a "Tumor-Immune Interface" or a "Vascular Niche."

This method is based on the neighborhood analysis framework originally introduced by He et al. (Nature Biotechnology, 2022) for the CosMx SMI platform.

High-plex multi-omic analysis in FFPE tissue at single-cellular and subcellular resolution by spatial molecular imaging

This vignette demonstrates the standard SpatialCore workflow:

  1. Neighborhood Profiling: Quantify the local composition around every cell.
  2. Niche Identification: Cluster these profiles to find recurring archetypes.

Workflow

We use the Xenium Human Liver Cancer dataset for this demonstration (10x Genomics Dataset). It contains 162,155 cells with 64 cell types annotated via CellTypist.

First, we define the "neighborhood" of each cell. SpatialCore supports both k-Nearest Neighbors (k-NN) and fixed radius definitions.

import scanpy as sc
import spatialcore as spc

# Load data
adata = sc.read_h5ad("liver_cancer.h5ad")

# Compute neighborhood profiles (k=50)
spc.spatial.compute_neighborhood_profile(
    adata,
    celltype_column="cell_type",
    method="knn",
    k=50
)
  • Why k=50? For subcellular resolution data (Xenium/CosMx), larger neighborhoods (50-100) capture the broader tissue context better than small local neighborhoods (15-30).
  • Note: k excludes the center cell. Missing labels or empty neighborhoods raise an error. For radius-based neighborhoods, increase radius, switch to knn, or pre-filter isolated cells before profiling.

We then cluster these neighborhood vectors to find recurring patterns (niches).

# Identify 10 niche archetypes
spc.spatial.identify_niches(
    adata,
    n_niches=10,
    method="kmeans",
    random_state=42
)

Results

The identified niches map to distinct, biologically coherent regions of the tissue. Notice how Niche 10 (yellow) highlights the interface between the tumor and the immune-rich regions.

Spatial Niche Distribution

What defines these niches? We can examine the cell type composition of each cluster.

Niche Heatmap

Niche Dominant Composition Biological Interpretation
Niche 9 71% Hepatocytes Healthy Parenchyma
Niche 10 45% Kupffer + 21% Hepatocytes Immune-Parenchyma Interface
Niche 7 19% Cholangiocytes Bile Duct / Portal Triad
Niche 2 59% Hepatocytes Tumor Core

Projecting the neighborhood profiles (not expression) into UMAP space reveals the continuous nature of tissue microenvironments.

Niche UMAP


Validation: Python vs R

A core mission of SpatialCore is exact cross-language reproducibility. We benchmarked this Python implementation against an equivalent R workflow.

The R comparison uses the following implementation (FNN + ClusterR):

library(FNN)
library(ClusterR)

# 1. Find k nearest neighbors for each cell (exclude self)
k <- 50
neighbor_idx <- FNN::knn.index(spatial_coords, k = k + 1)

# 2. Build cell-type composition matrix
profile_matrix <- matrix(0, nrow = n_cells, ncol = n_celltypes)
for (i in seq_len(n_cells)) {
    neighbors <- neighbor_idx[i, ]
    neighbors <- neighbors[neighbors != i]
    if (length(neighbors) != k) {
        stop(paste0("Expected ", k, " neighbors excluding self for cell ", i))
    }
    neighbor_types <- cell_types[neighbors]
    for (ct in neighbor_types) {
        idx <- celltype_to_idx[[ct]]
        profile_matrix[i, idx] <- profile_matrix[i, idx] + 1
    }
}
row_sums <- rowSums(profile_matrix)
if (any(row_sums == 0)) {
    stop("Empty neighborhoods detected. Increase radius, switch to knn, or pre-filter isolated cells before profiling.")
}
profile_matrix <- profile_matrix / row_sums

# 3. Cluster with kmeans++
set.seed(42)
km_result <- ClusterR::KMeans_rcpp(
    profile_matrix,
    clusters = 10,
    num_init = 10,
    initializer = "kmeans++"
)
niche_labels <- paste0("niche_", km_result$clusters)
Metric Value Interpretation
NMI 0.769 High agreement
ARI 0.618 Moderate-High agreement

The diagonal structure confirms that Python and R identify the same biological structures, though random seed differences in k-means initialization cause minor label swaps.

Confusion Matrix

Side-by-side comparison of spatial niche assignments demonstrates strong biological concordance between implementations.

Python Implementation R Implementation
SpatialCore (Python) Seurat (R)
SpatialCore (Python) Seurat (R)

Best Practices

  • Choosing k:
    • 15-30: Subcellular interactions (contact-dependent).
    • 50-100: Tissue architecture (domains/niches).
  • Choosing n_niches:
    • Start with 8-12 for most tissues.
    • Use MiniBatchKMeans for datasets >100k cells for speed.
  • Quality Control:
    • Empty neighborhoods are errors. Increase radius, switch to knn, or pre-filter isolated cells before profiling.
    • Ensure niche sizes are balanced (avoid single-cell niches).

Citation

If you use this workflow, please cite:

@software{spatialcore,
  title = {SpatialCore: Standardized spatial statistics for computational biology},
  url = {https://github.com/mcap91/SpatialCore}
}