Skip to content

Module: reconstruct

File: src/spatial_graph_algorithms/reconstruct/ Status: MDS — stable. STRND — stable.


Purpose

Recover 2-D or 3-D node coordinates using only the edge topology. No ground truth positions are used during reconstruction — only the adjacency matrix.

The public entry point is reconstruct(sg, method, dim, seed). It returns a copy of sg with reconstructed_positions set.


Internal Structure

reconstruct/
├── __init__.py   — reconstruct() dispatcher
├── mds.py        — run_mds()
├── landmark_mds.py — run_landmark_mds()
├── strnd.py      — run_strnd()
└── quality.py    — cpd(), knn_preservation(), distortion()

quality.py is a peer of the method files — it measures results, it does not produce them.


MDS (mds.py)

Algorithm: all-pairs shortest-path distances → sklearn metric MDS.

  1. scipy.sparse.csgraph.shortest_path computes D[i,j] = hop distance between every pair.
  2. Disconnected pairs receive 2 × max_finite (not inf) so MDS does not crash.
  3. sklearn.manifold.MDS(dissimilarity='precomputed') embeds the distance matrix.

When it works well: clean connected graphs, fast, no optional dependencies. When it struggles: noisy graphs with many false edges — short-circuit paths distort distances.


Landmark MDS (landmark_mds.py)

Algorithm: landmark shortest-path distances → metric MDS on landmarks → triangulation.

  1. Randomly select n_landmarks nodes using the reconstruction seed.
  2. Compute shortest-path distances from each landmark to every node with scipy.sparse.csgraph.shortest_path(indices=landmarks).
  3. Embed the landmark distance matrix with sklearn.manifold.MDS.
  4. Triangulate all nodes from squared distances to the landmark set.

n_landmarks is required: call reconstruct(sg, method="landmark_mds", n_landmarks=64, seed=42). Use weighted=True to use adjacency values as shortest-path edge weights.

When it works well: larger graphs where full all-pairs MDS is too expensive. When it struggles: too few landmarks or landmark choices that undersample the graph geometry.


STRND (strnd.py)

Algorithm: Node2Vec biased random walks → high-dim embedding → UMAP reduction.

This is an exact port of the NSC pipeline (compute_pecanpy_embeddings_from_df):

  1. Export the adjacency as a CSV edge list (temp file).
  2. pecanpy.PreComp computes Node2Vec embeddings (default: 64-dim, p=q=1).
  3. umap.UMAP reduces embeddings to dim coordinates.

Why temp file instead of in-memory? pecanpy's PreComp mode reads from disk — it is the fastest pecanpy mode and matches how the NSC codebase uses it.

Compatibility shims applied at import time (in _ensure_strnd_deps()): - scipy.linalg.triu restored — removed in newer scipy, still imported by gensim. - sklearn.utils.validation.check_array force_all_finite kwarg dropped — renamed in sklearn 1.6+.

These shims are lazy: they only run the first time run_strnd is called.

When it works well: larger graphs (>200 nodes), noisy data, when spatial structure is global rather than just local neighbourhoods. .


Quality Metrics (quality.py)

All three metrics compare sg.positions (ground truth) to sg.reconstructed_positions. All are rotation-, reflection-, and translation-invariant.

Function What it measures Range
cpd Pearson correlation of all pairwise distances [-1, 1] — higher is better
knn_preservation Mean k-NN neighbourhood overlap [0, 1] — higher is better
distortion Median relative pairwise distance error [0, ∞) — lower is better

These are called by metrics.evaluate() — you generally do not call them directly.


How to Add a New Reconstruction Method

  1. Create reconstruct/<method>.py with a function:
    def run_<method>(
        adjacency: scipy.sparse.spmatrix,
        dim: int = 2,
        random_state: int | None = None,
        # add method-specific params here
    ) -> np.ndarray:   # shape (n_nodes, dim)
    
  2. Add a branch in reconstruct/__init__.py::reconstruct():
    elif method == "<method>":
        coords = run_<method>(sg.adjacency_matrix, dim=dim, random_state=seed)
    
  3. If the method requires optional dependencies, use the lazy import pattern from strnd.py.
  4. Add a test in tests/test_reconstruction_report.py.

Contract: - Must return a float array of shape (n_nodes, dim). - Must be deterministic when random_state is set. - Must not modify the input adjacency matrix. - Must handle disconnected graphs without crashing (fill disconnected pairs with a safe value).


How to Add a New Quality Metric

  1. Add a function to reconstruct/quality.py:
    def my_metric(
        true_positions: np.ndarray,
        recon_positions: np.ndarray,
    ) -> float:
        """One-line description. Range and interpretation."""
        ...
    
  2. Add it to the evaluate() result dict in metrics/__init__.py.
  3. Document the range and what "good" looks like in docs/api/metrics.md.

Tests

tests/test_reconstruction_report.py — shape checks for MDS, Landmark MDS, and STRND
tests/test_verify_report.py         — reconstruction quality CSV columns
tests/test_examples_verify.py       — end-to-end pipeline