Tutorial 6: SEPAR Analysis on Human Colorectal Cancer Visium HD Dataset
This tutorial demonstrates SEPAR application to human colorectal cancer Visium HD high-resolution spatial transcriptomics data, showcasing GPU-accelerated computation and spatial pattern recognition in cancer tissue.
Dataset Information
Dataset: 10X Genomics Human Colorectal Cancer Visium HD (P2CRC sample)
Resolution: 8μm, near single-cell level
Scale: 545,913 spots, requiring GPU acceleration
Source: https://www.10xgenomics.com/products/visium-hd-spatial-gene-expression/dataset-human-crc
import pandas as pd
import numpy as np
import scanpy as sc
import anndata as ad
import os
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap
from sklearn.metrics import adjusted_rand_score
from sklearn.metrics.cluster import normalized_mutual_info_score as nmi
from numpy.linalg import norm
import time
import gc
# import pickle
from datetime import datetime
from SEPAR_model import SEPAR
GPU acceleration available with CuPy
Data Loading and SEPAR Initialization
# Setup analysis paths
savefile = '/home/lzhang/SEPAR/new_out/crc_out/visiumHD/'
result_file = '/home/lzhang/SEPAR/new_out/crc_out/visiumHD/'
filepath = '/scratch/lzhang/dataset_human_crc/visium_hd_adapted_008um.h5ad'
output_dir = '/scratch/lzhang/SEPARout/crcVisiumHD/'
# Load Visium HD dataset
adata = sc.read_h5ad(filepath)
adata.var_names_make_unique()
adata.obsm['spatial'] = adata.obsm['spatial'].astype(float)
loc = adata.obsm['spatial']
print(f"Dataset shape: {adata.shape}")
print(f"Spatial coordinates shape: {loc.shape}")
print(f"Resolution: 8μm per spot")
# Initialize SEPAR with GPU acceleration
separ = SEPAR(adata, n_cluster=16, use_gpu=True, dtype=np.float32)
Dataset shape: (545913, 18085)
Spatial coordinates shape: (545913, 2)
Resolution: 8μm per spot
Using GPU acceleration with device: <CUDA Device 0>
Data Preprocessing and Quality Control
# Preprocess with stringent filtering
separ.preprocess(min_cells=50)
print(f"After filtering: {separ.adata.shape}")
# Compute spatial adjacency matrix
separ.compute_graph(adjacency_method='kdtree')
# Select spatially variable genes using Moran's I
separ.select_morani(nslt=3000, morans_method='vectorized')
print("Spatial gene selection completed")
After filtering: (545913, 16946)
WARNING: adata.X seems to be already log-transformed.
After filtering: (545913, 16946)
Auto-calculated radius: 39.99
Computing spatial adjacency using KDTree...
KDTree computation completed in 0.43 seconds

Computing Moran's I for gene selection...
Computing Moran's I for 16946 genes using vectorized approach...
Selected top 3000 genes with highest Moran's I scores
Top 5 Moran's I scores: [0.83368266 0.75054314 0.73968617 0.68365797 0.67907498]
Gene selection based on Moran's I completed
Spatial gene selection completed
For large-scale Visium HD datasets (>100K spots), SEPAR provides optimized computational methods:
adjacency_method='kdtree': KDTree-based neighbor search for efficient spatial graph constructionmorans_method='vectorized': Vectorized Moran’s I computation for faster spatial gene selection
# Compute weighted adjacency with cosine similarity
separ.compute_weight(metric='cosine', weight_method='optimized')
print("Weighted adjacency computation completed")
# Optional: Save checkpoint before intensive computation
# save_checkpoint(separ, "weighted_adj")
Estimating sigma parameter...
Estimated sigma: 0.6617 (Time: 0.37s)
Computing weights for 2177356 adjacency pairs...
Computing weights: 100%|███████████████████████████████████████████████| 2177356/2177356 [00:39<00:00, 54983.39it/s]
Weighted adjacency computation completed
weight_method='optimized': Sparse matrix operations with sampling-based parameter estimation
Running SEPAR Algorithm on High-Resolution Data
Execute the core SEPAR algorithm with parameters optimized for Visium HD data:
separ.separ_algorithm(
r=30,
alpha=0.3,
)
# Save checkpoint after algorithm completion
# save_checkpoint(separ, "r30alpha03")
# Alternative: Load from existing checkpoint
# separ = load_checkpoint('/scratch/lzhang/SEPARout/crcVisiumHD/checkpoints/r30alpha03_0716_2230.pkl')
Starting SEPAR algorithm...
Using parameters: alpha=0.3, beta=0.02, gamma=0.3
Initializing factor matrices...
SEPAR iterations: 100%|███████████████████████████████████████████████████████████| 100/100 [03:42<00:00, 2.22s/it]
SEPAR algorithm completed in 617.46 seconds
separ.clustering(
n_cluster=12,
N1=10,
N2=9
)
separ.labelres = separ.labelres + 1
clusters = separ.adata.obs['clustering'].astype(int)
fig, ax = plt.subplots(figsize=(7, 5), dpi=100)
scatter = ax.scatter(separ.loc[:, 0], -separ.loc[:, 1],
c=separ.labelres, s=0.1, cmap='tab20')
plt.title("SEPAR Spatial Domains", fontsize=14)
plt.axis('off')
# Add legend
legend = ax.legend(*scatter.legend_elements(),
title="Clusters",
bbox_to_anchor=(1.05, 0.5),
loc='center left')
plt.subplots_adjust(right=0.75)
plt.show()

pattern_specific_features = separ.identify_pattern_specific_genes(
n_patterns=30,
threshold=0.3
)
# Visualize all spatial patterns
sim_slt = separ.sim_res(separ.Wpn, separ.Hpn, separ.Xt.T)
sim_argsort = np.argsort(-sim_slt)
num_patterns = 30
plt.figure(dpi=200, figsize=(24, 6))
for i in range(30):
ii = sim_argsort[i]
plt.subplot(3, np.int32(num_patterns/3), i + 1)
plt.scatter(separ.loc[:, 0], -separ.loc[:, 1],
c=separ.Wpn[:, ii].reshape(-1, 1),
s=0.1, cmap='Reds')
plt.axis('off')
plt.title(f'Pattern {i + 1}: {np.int32(separ.genes_per_pattern[ii])} genes',
fontsize=12)
plt.tight_layout()
plt.show()
