Neighborhood & Niche Analysis

Characterizing cellular microenvironments from spatial transcriptomics data.

Overview

Spatial transcriptomics platforms (Xenium, CosMx) capture not just which cell types are present, but the spatial context in which those cells reside. A niche is a recurring microenvironment defined by its cell type composition, for example a "Tumor-Immune Interface" or a "Vascular Niche."

This method is based on the neighborhood analysis framework introduced for the CosMx SMI platform by He et al. (Nature Biotechnology, 2022), "High-plex multi-omic analysis in FFPE tissue at single-cellular and subcellular resolution by spatial molecular imaging."

This vignette demonstrates the standard SpatialCore workflow:

Neighborhood Profiling: Quantify the local composition around every cell.
Niche Identification: Cluster these profiles to find recurring archetypes.

Workflow

We use the Xenium Human Liver Cancer dataset (10x Genomics) for this demonstration. It contains 162,155 cells, with 64 cell types annotated via CellTypist.

1. Compute Neighborhoods

First, we define the "neighborhood" of each cell. SpatialCore supports both k-Nearest Neighbors (k-NN) and fixed radius definitions.

import scanpy as sc
import spatialcore as spc

# Load data
adata = sc.read_h5ad("liver_cancer.h5ad")

# Compute neighborhood profiles (k=50)
spc.spatial.compute_neighborhood_profile(
    adata,
    celltype_column="cell_type",
    method="knn",
    k=50
)

Why k=50? For subcellular resolution data (Xenium/CosMx), larger neighborhoods (k = 50-100) capture broader tissue context, whereas small neighborhoods (k = 15-30) capture only a cell's immediate surroundings.
Note: k excludes the center cell. Missing cell type labels or empty neighborhoods raise an error. Empty neighborhoods arise with radius-based definitions when a cell has no neighbors within the radius; if that happens, increase the radius, switch to knn, or pre-filter isolated cells before profiling.
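To make the profiling step concrete, here is a minimal sketch of a k-NN neighborhood profile built with NumPy and SciPy. It is not SpatialCore's implementation: the function name knn_neighborhood_profile and the synthetic coordinates and labels are assumptions made for illustration only.

```python
import numpy as np
from scipy.spatial import cKDTree


def knn_neighborhood_profile(coords, labels, k):
    """Fraction of each cell type among the k nearest neighbors of every cell.

    The center cell is excluded from its own neighborhood.
    """
    coords = np.asarray(coords, dtype=float)
    labels = np.asarray(labels)
    n_cells = coords.shape[0]
    if not 0 < k < n_cells:
        raise ValueError("k must be between 1 and n_cells - 1")

    # Encode cell types as integer codes 0..n_types-1
    categories, codes = np.unique(labels, return_inverse=True)

    # Query k + 1 neighbors: each cell is normally its own nearest neighbor
    _, idx = cKDTree(coords).query(coords, k=k + 1)

    # Mask out self; if ties at identical coordinates pushed self out of the
    # result, drop the farthest neighbor instead so every row keeps k entries
    is_self = idx == np.arange(n_cells)[:, None]
    is_self[~is_self.any(axis=1), -1] = True
    neighbor_codes = codes[idx[~is_self].reshape(n_cells, k)]

    # Count neighbor cell types per row, then normalize to fractions
    profiles = np.zeros((n_cells, len(categories)))
    rows = np.repeat(np.arange(n_cells), k)
    np.add.at(profiles, (rows, neighbor_codes.ravel()), 1)
    return profiles / k, categories


# Synthetic example (hypothetical data, not the liver dataset)
rng = np.random.default_rng(0)
coords = rng.uniform(0, 1000, size=(500, 2))
labels = rng.choice(["Hepatocyte", "Kupffer", "Cholangiocyte"], size=500)
profiles, categories = knn_neighborhood_profile(coords, labels, k=50)
```

Each row of profiles sums to 1, so cells can be compared by composition regardless of k.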

2. Identify Niches

We then cluster these neighborhood vectors to find recurring patterns (niches).

# Identify 10 niche archetypes
spc.spatial.identify_niches(
    adata,
    n_niches=10,
    method="kmeans",
    random_state=42
)
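As a rough illustration of what the clustering step does, the sketch below runs scikit-learn's KMeans on synthetic composition vectors. The archetypes, the number of clusters, and the niche_<n> naming are assumptions for this example; how identify_niches works internally is not shown in this vignette.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Three synthetic neighborhood archetypes over four cell types (hypothetical)
archetypes = np.array([
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.4, 0.4],
])
# Sample 200 noisy composition vectors around each archetype; rows sum to 1
profiles = np.vstack([rng.dirichlet(a * 50, size=200) for a in archetypes])

# Cluster the composition vectors; each cluster is a candidate niche
km = KMeans(n_clusters=3, n_init=10, random_state=42)
niche_ids = km.fit_predict(profiles)
niche_labels = np.array([f"niche_{i + 1}" for i in niche_ids])

# Each centroid is the average composition of the cells in that niche
centroids = km.cluster_centers_
```

Because each centroid is a mean of rows that sum to 1, it can be read directly as the average cell type composition of that niche. For larger datasets, scikit-learn's MiniBatchKMeans exposes the same fit_predict interface.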

Results

Spatial Distribution

The identified niches map to distinct, biologically coherent regions of the tissue. Notice how Niche 10 (yellow) highlights the interface between the tumor and the immune-rich regions.

Niche Composition

What defines these niches? We can examine the cell type composition of each cluster.

Niche	Dominant Composition	Biological Interpretation
Niche 9	71% Hepatocytes	Healthy Parenchyma
Niche 10	45% Kupffer + 21% Hepatocytes	Immune-Parenchyma Interface
Niche 7	19% Cholangiocytes	Bile Duct / Portal Triad
Niche 2	59% Hepatocytes	Tumor Core
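A composition table of this kind can be derived with a pandas cross-tabulation. The sketch below uses a toy table; the column name "niche" is an assumption (only "cell_type" appears in this vignette), and the values are invented for illustration.

```python
import pandas as pd

# Toy per-cell annotations (hypothetical values)
obs = pd.DataFrame({
    "niche": ["niche_1"] * 4 + ["niche_2"] * 4,
    "cell_type": [
        "Hepatocyte", "Hepatocyte", "Hepatocyte", "Kupffer",
        "Kupffer", "Kupffer", "Hepatocyte", "Cholangiocyte",
    ],
})

# Percentage of each cell type within each niche (rows sum to 100)
composition = pd.crosstab(obs["niche"], obs["cell_type"], normalize="index") * 100

# Most abundant cell type per niche
dominant = composition.idxmax(axis=1)
```

Here niche_1 is 75% Hepatocyte and niche_2 is 50% Kupffer, so dominant maps them to "Hepatocyte" and "Kupffer" respectively.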

Manifold Projection (UMAP)

Projecting the neighborhood profiles (not expression) into UMAP space reveals the continuous nature of tissue microenvironments.

Validation: Python vs R

A core mission of SpatialCore is exact cross-language reproducibility. We benchmarked this Python implementation against an equivalent R workflow.

R Implementation

The R comparison uses the following implementation (FNN + ClusterR):

library(FNN)
library(ClusterR)

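# 1. Find k nearest neighbors for each cell (exclude self)
# Querying the data against itself returns each cell as its own nearest
# neighbor, so request k + 1 neighbors and drop self in the loop below.
k <- 50
neighbor_idx <- FNN::get.knnx(spatial_coords, spatial_coords, k = k + 1)$nn.index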

# 2. Build cell-type composition matrix
profile_matrix <- matrix(0, nrow = n_cells, ncol = n_celltypes)
for (i in seq_len(n_cells)) {
    neighbors <- neighbor_idx[i, ]
    neighbors <- neighbors[neighbors != i]
    if (length(neighbors) != k) {
        stop(paste0("Expected ", k, " neighbors excluding self for cell ", i))
    }
    neighbor_types <- cell_types[neighbors]
    for (ct in neighbor_types) {
        idx <- celltype_to_idx[[ct]]
        profile_matrix[i, idx] <- profile_matrix[i, idx] + 1
    }
}
row_sums <- rowSums(profile_matrix)
if (any(row_sums == 0)) {
    stop("Empty neighborhoods detected. Increase radius, switch to knn, or pre-filter isolated cells before profiling.")
}
profile_matrix <- profile_matrix / row_sums

# 3. Cluster with kmeans++
set.seed(42)
km_result <- ClusterR::KMeans_rcpp(
    profile_matrix,
    clusters = 10,
    num_init = 10,
    initializer = "kmeans++"
)
niche_labels <- paste0("niche_", km_result$clusters)

Metric	Value	Interpretation
NMI	0.769	High agreement
ARI	0.618	Moderate-High agreement
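Both metrics are available in scikit-learn. The sketch below uses small made-up label vectors (not the benchmark data) to show that NMI and ARI ignore how clusters are numbered and drop below 1 only when cells are actually grouped differently.

```python
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

# Hypothetical niche assignments for nine cells
python_labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]
r_permuted = [2, 2, 2, 0, 0, 0, 1, 1, 1]   # same grouping, different numbering
r_noisy = [2, 2, 0, 0, 0, 0, 1, 1, 1]      # one cell grouped differently

# Identical partitions score 1 regardless of label names
ari_permuted = adjusted_rand_score(python_labels, r_permuted)
nmi_permuted = normalized_mutual_info_score(python_labels, r_permuted)

# A real disagreement lowers both scores
ari_noisy = adjusted_rand_score(python_labels, r_noisy)
nmi_noisy = normalized_mutual_info_score(python_labels, r_noisy)
```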

Confusion Matrix

The diagonal structure confirms that Python and R identify the same biological structures. Because k-means initialization differs between the two implementations, niche numbering is not guaranteed to match and some cells are assigned to different niches.
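When cluster numbering differs between two runs, a confusion matrix only looks diagonal after the labels are matched. One common way to do this, sketched here as an assumption rather than as the procedure used for the figure, is to solve an assignment problem on the contingency table with SciPy. The helper name align_labels and the toy labels are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def align_labels(reference, other):
    """Rename the labels in `other` to best match `reference`.

    Returns the renamed labels and the contingency table
    (rows: reference labels, columns: original `other` labels).
    """
    reference = np.asarray(reference)
    other = np.asarray(other)
    ref_cats, ref_codes = np.unique(reference, return_inverse=True)
    oth_cats, oth_codes = np.unique(other, return_inverse=True)

    # Count how many cells fall into each (reference, other) label pair
    table = np.zeros((len(ref_cats), len(oth_cats)), dtype=int)
    np.add.at(table, (ref_codes, oth_codes), 1)

    # One-to-one matching that maximizes the total number of agreeing cells
    rows, cols = linear_sum_assignment(table, maximize=True)
    mapping = {oth_cats[c]: ref_cats[r] for r, c in zip(rows, cols)}

    # Labels without a match keep their original name
    aligned = np.array([mapping.get(x, x) for x in other])
    return aligned, table


python_niches = ["niche_1", "niche_1", "niche_2", "niche_2", "niche_3", "niche_3"]
r_niches = ["niche_3", "niche_3", "niche_1", "niche_1", "niche_2", "niche_1"]

aligned, table = align_labels(python_niches, r_niches)
n_agree = int((aligned == np.array(python_niches)).sum())
```

In this toy case five of the six cells agree after matching; the remaining cell is a genuine disagreement rather than a naming difference.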

Visual Comparison

Side-by-side comparison of spatial niche assignments demonstrates strong biological concordance between implementations.

Python Implementation	R Implementation

SpatialCore (Python)	Seurat (R)

Best Practices

Choosing k:
- 15-30: Local cell-cell interactions (contact-dependent).
- 50-100: Tissue architecture (domains/niches).
Choosing n_niches:
- Start with 8-12 for most tissues.
- Use MiniBatchKMeans for datasets >100k cells for speed.
Quality Control:
- Empty neighborhoods are errors. Increase radius, switch to knn, or pre-filter isolated cells before profiling.
- Ensure niche sizes are balanced (avoid single-cell niches).
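The two quality-control checks above can be sketched with NumPy and SciPy. The helpers isolated_cells and undersized_niches, the radius, and the size threshold are hypothetical and chosen for illustration; they are not part of the SpatialCore API.

```python
import numpy as np
from scipy.spatial import cKDTree


def isolated_cells(coords, radius):
    """Indices of cells with no other cell within `radius`."""
    coords = np.asarray(coords, dtype=float)
    tree = cKDTree(coords)
    # Counts include the query cell itself, so subtract 1
    counts = tree.query_ball_point(coords, r=radius, return_length=True)
    return np.flatnonzero(np.asarray(counts) - 1 == 0)


def undersized_niches(labels, min_cells):
    """Niches with fewer than `min_cells` cells, mapped to their size."""
    ids, counts = np.unique(np.asarray(labels), return_counts=True)
    return {str(i): int(c) for i, c in zip(ids, counts) if c < min_cells}


# Three cells close together and one far away (toy coordinates)
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [50.0, 50.0]])
isolated = isolated_cells(coords, radius=5.0)

# One niche containing a single cell
labels = ["niche_1", "niche_1", "niche_1", "niche_2"]
small = undersized_niches(labels, min_cells=2)
```

Cells flagged by isolated_cells are the ones that would produce empty radius-based neighborhoods; niches flagged by undersized_niches are candidates for lowering n_niches.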

Citation

If you use this workflow, please cite:

@software{spatialcore,
  title = {SpatialCore: Standardized spatial statistics for computational biology},
  url = {https://github.com/mcap91/SpatialCore}
}