Module: reconstruct

File: src/spatial_graph_algorithms/reconstruct/ Status: MDS — stable. STRND — stable.

Purpose

Recover 2-D or 3-D node coordinates using only the edge topology. No ground truth positions are used during reconstruction — only the adjacency matrix.

The public entry point is reconstruct(sg, method, dim, seed). It returns a copy of sg with reconstructed_positions set.

Internal Structure

reconstruct/
├── __init__.py   — reconstruct() dispatcher
├── mds.py        — run_mds()
├── landmark_mds.py — run_landmark_mds()
├── strnd.py      — run_strnd()
└── quality.py    — cpd(), knn_preservation(), distortion()

quality.py is a peer of the method files — it measures results, it does not produce them.

MDS (`mds.py`)

Algorithm: all-pairs shortest-path distances → sklearn metric MDS.

scipy.sparse.csgraph.shortest_path computes D[i,j] = hop distance between every pair.
Disconnected pairs receive 2 × max_finite (not inf) so MDS does not crash.
sklearn.manifold.MDS(dissimilarity='precomputed') embeds the distance matrix.

When it works well: clean connected graphs, fast, no optional dependencies. When it struggles: noisy graphs with many false edges — short-circuit paths distort distances.

Landmark MDS (`landmark_mds.py`)

Algorithm: landmark shortest-path distances → metric MDS on landmarks → triangulation.

Randomly select n_landmarks nodes using the reconstruction seed.
Compute shortest-path distances from each landmark to every node with scipy.sparse.csgraph.shortest_path(indices=landmarks).
Embed the landmark distance matrix with sklearn.manifold.MDS.
Triangulate all nodes from squared distances to the landmark set.

n_landmarks is required: call reconstruct(sg, method="landmark_mds", n_landmarks=64, seed=42). Use weighted=True to use adjacency values as shortest-path edge weights.

When it works well: larger graphs where full all-pairs MDS is too expensive. When it struggles: too few landmarks or landmark choices that undersample the graph geometry.

STRND (`strnd.py`)

Algorithm: Node2Vec biased random walks → high-dim embedding → UMAP reduction.

This is an exact port of the NSC pipeline (compute_pecanpy_embeddings_from_df):

Export the adjacency as a CSV edge list (temp file).
pecanpy.PreComp computes Node2Vec embeddings (default: 64-dim, p=q=1).
umap.UMAP reduces embeddings to dim coordinates.

Why temp file instead of in-memory? pecanpy's PreComp mode reads from disk — it is the fastest pecanpy mode and matches how the NSC codebase uses it.

Compatibility shims applied at import time (in _ensure_strnd_deps()): - scipy.linalg.triu restored — removed in newer scipy, still imported by gensim. - sklearn.utils.validation.check_array force_all_finite kwarg dropped — renamed in sklearn 1.6+.

These shims are lazy: they only run the first time run_strnd is called.

When it works well: larger graphs (>200 nodes), noisy data, when spatial structure is global rather than just local neighbourhoods. .

Quality Metrics (`quality.py`)

All three metrics compare sg.positions (ground truth) to sg.reconstructed_positions. All are rotation-, reflection-, and translation-invariant.

Function	What it measures	Range
`cpd`	Pearson correlation of all pairwise distances	[-1, 1] — higher is better
`knn_preservation`	Mean k-NN neighbourhood overlap	[0, 1] — higher is better
`distortion`	Median relative pairwise distance error	[0, ∞) — lower is better

These are called by metrics.evaluate() — you generally do not call them directly.

How to Add a New Reconstruction Method

Create reconstruct/<method>.py with a function:

def run_<method>(
    adjacency: scipy.sparse.spmatrix,
    dim: int = 2,
    random_state: int | None = None,
    # add method-specific params here
) -> np.ndarray:   # shape (n_nodes, dim)

Add a branch in reconstruct/__init__.py::reconstruct():

elif method == "<method>":
    coords = run_<method>(sg.adjacency_matrix, dim=dim, random_state=seed)

If the method requires optional dependencies, use the lazy import pattern from strnd.py.
Add a test in tests/test_reconstruction_report.py.

Contract: - Must return a float array of shape (n_nodes, dim). - Must be deterministic when random_state is set. - Must not modify the input adjacency matrix. - Must handle disconnected graphs without crashing (fill disconnected pairs with a safe value).

How to Add a New Quality Metric

Add a function to reconstruct/quality.py:

def my_metric(
    true_positions: np.ndarray,
    recon_positions: np.ndarray,
) -> float:
    """One-line description. Range and interpretation."""
    ...

Add it to the evaluate() result dict in metrics/__init__.py.
Document the range and what "good" looks like in docs/api/metrics.md.

Tests

tests/test_reconstruction_report.py — shape checks for MDS, Landmark MDS, and STRND
tests/test_verify_report.py         — reconstruction quality CSV columns
tests/test_examples_verify.py       — end-to-end pipeline