Module: reconstruct
File: src/spatial_graph_algorithms/reconstruct/
Status: MDS — stable. STRND — stable.
Purpose
Recover 2-D or 3-D node coordinates using only the edge topology. No ground truth positions are used during reconstruction — only the adjacency matrix.
The public entry point is reconstruct(sg, method, dim, seed).
It returns a copy of sg with reconstructed_positions set.
Internal Structure
reconstruct/
├── __init__.py — reconstruct() dispatcher
├── mds.py — run_mds()
├── landmark_mds.py — run_landmark_mds()
├── strnd.py — run_strnd()
└── quality.py — cpd(), knn_preservation(), distortion()
quality.py is a peer of the method files — it measures results, it does not produce them.
MDS (mds.py)
Algorithm: all-pairs shortest-path distances → sklearn metric MDS.
scipy.sparse.csgraph.shortest_pathcomputesD[i,j]= hop distance between every pair.- Disconnected pairs receive
2 × max_finite(notinf) so MDS does not crash. sklearn.manifold.MDS(dissimilarity='precomputed')embeds the distance matrix.
When it works well: clean connected graphs, fast, no optional dependencies. When it struggles: noisy graphs with many false edges — short-circuit paths distort distances.
Landmark MDS (landmark_mds.py)
Algorithm: landmark shortest-path distances → metric MDS on landmarks → triangulation.
- Randomly select
n_landmarksnodes using the reconstruction seed. - Compute shortest-path distances from each landmark to every node with
scipy.sparse.csgraph.shortest_path(indices=landmarks). - Embed the landmark distance matrix with
sklearn.manifold.MDS. - Triangulate all nodes from squared distances to the landmark set.
n_landmarks is required: call
reconstruct(sg, method="landmark_mds", n_landmarks=64, seed=42).
Use weighted=True to use adjacency values as shortest-path edge weights.
When it works well: larger graphs where full all-pairs MDS is too expensive. When it struggles: too few landmarks or landmark choices that undersample the graph geometry.
STRND (strnd.py)
Algorithm: Node2Vec biased random walks → high-dim embedding → UMAP reduction.
This is an exact port of the NSC pipeline (compute_pecanpy_embeddings_from_df):
- Export the adjacency as a CSV edge list (temp file).
pecanpy.PreCompcomputes Node2Vec embeddings (default: 64-dim, p=q=1).umap.UMAPreduces embeddings todimcoordinates.
Why temp file instead of in-memory? pecanpy's PreComp mode reads from disk —
it is the fastest pecanpy mode and matches how the NSC codebase uses it.
Compatibility shims applied at import time (in _ensure_strnd_deps()):
- scipy.linalg.triu restored — removed in newer scipy, still imported by gensim.
- sklearn.utils.validation.check_array force_all_finite kwarg dropped — renamed in sklearn 1.6+.
These shims are lazy: they only run the first time run_strnd is called.
When it works well: larger graphs (>200 nodes), noisy data, when spatial structure is global rather than just local neighbourhoods. .
Quality Metrics (quality.py)
All three metrics compare sg.positions (ground truth) to sg.reconstructed_positions.
All are rotation-, reflection-, and translation-invariant.
| Function | What it measures | Range |
|---|---|---|
cpd |
Pearson correlation of all pairwise distances | [-1, 1] — higher is better |
knn_preservation |
Mean k-NN neighbourhood overlap | [0, 1] — higher is better |
distortion |
Median relative pairwise distance error | [0, ∞) — lower is better |
These are called by metrics.evaluate() — you generally do not call them directly.
How to Add a New Reconstruction Method
- Create
reconstruct/<method>.pywith a function: - Add a branch in
reconstruct/__init__.py::reconstruct(): - If the method requires optional dependencies, use the lazy import pattern from
strnd.py. - Add a test in
tests/test_reconstruction_report.py.
Contract:
- Must return a float array of shape (n_nodes, dim).
- Must be deterministic when random_state is set.
- Must not modify the input adjacency matrix.
- Must handle disconnected graphs without crashing (fill disconnected pairs with a safe value).
How to Add a New Quality Metric
- Add a function to
reconstruct/quality.py: - Add it to the
evaluate()result dict inmetrics/__init__.py. - Document the range and what "good" looks like in
docs/api/metrics.md.