
Module: metrics

File: src/spatial_graph_algorithms/metrics/
Status: Stable.


Purpose

Three responsibilities, kept in separate files:

  1. Graph structure properties (graph_properties.py) — degree, density, transitivity. These work on any SpatialGraph, with or without positions.

  2. Graph report (report.py) — GraphReport and graph_report(). Pre-reconstruction characterisation: topology, spatial geometry, and false-edge statistics in one object. Works on any SpatialGraph; spatial section is populated only when positions is set.

  3. Reconstruction quality (__init__.py, delegates to reconstruct/quality.py) — CPD, KNN preservation, distortion. These require both positions and reconstructed_positions to be set.

The unified entry point evaluate() combines responsibilities 1 and 3 and optionally writes a CSV row. quality_table() wraps evaluate() for side-by-side comparison of multiple reconstructions. graph_report() is the entry point for responsibility 2.
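To make the "optionally writes a CSV row" behaviour concrete, here is a minimal standard-library sketch of appending one metrics dict per run to a CSV file. The helper name append_metrics_row and the example values are hypothetical; the argument that evaluate() actually uses for CSV output is not shown on this page.

```python
import csv
import os
import tempfile


def append_metrics_row(path, metrics):
    """Append one metrics dict as a CSV row; write the header if the file is new or empty."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(metrics))
        if is_new:
            writer.writeheader()
        # None values (e.g. distortion when it was not computed) become empty cells.
        writer.writerow(metrics)


# Placeholder numbers, only to exercise the helper.
csv_path = os.path.join(tempfile.mkdtemp(), "metrics.csv")
append_metrics_row(csv_path, {"n_nodes": 4, "n_edges": 3, "cpd": 0.97, "distortion": None})
append_metrics_row(csv_path, {"n_nodes": 4, "n_edges": 3, "cpd": 0.91, "distortion": None})

with open(csv_path, newline="") as fh:
    rows = list(csv.DictReader(fh))
```

Appending one row per call keeps runs comparable in a single file, as long as every call passes the same keys in the same order.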


What evaluate() Returns

{
    # graph structure (always present)
    "n_nodes": int,
    "n_edges": int,
    "min_degree": float,
    "max_degree": float,
    "mean_degree": float,
    "degree_std": float,
    "density": float,
    "transitivity": float,         # global clustering coefficient
    "largest_component_fraction": float,

    # reconstruction quality (None unless both positions and reconstructed_positions are set)
    "cpd": float | None,
    "knn": float | None,
    "distortion": float | None,   # [0, 1]; None unless compute_distortion=True
}
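For orientation, the graph-structure keys can be reproduced from a plain edge list. The following is a pure-Python sketch, not the code in graph_properties.py: the function name structure_summary is hypothetical, nodes are assumed to be numbered 0..n_nodes-1 with n_nodes >= 1, the graph is assumed simple and undirected, and degree_std is assumed to be a population standard deviation.

```python
from collections import defaultdict
from statistics import mean, pstdev


def structure_summary(n_nodes, edges):
    """Structure metrics for a simple undirected graph on nodes 0..n_nodes-1."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    degrees = [len(adj[i]) for i in range(n_nodes)]
    n_edges = sum(degrees) // 2

    # Largest connected component, found with an iterative depth-first search.
    seen, largest = set(), 0
    for start in range(n_nodes):
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for nb in adj[node] - seen:
                seen.add(nb)
                stack.append(nb)
        largest = max(largest, size)

    # Transitivity = closed triples / all connected triples.
    # Each triangle contributes one closed triple per corner, i.e. three in total.
    triples = sum(d * (d - 1) // 2 for d in degrees)
    closed = sum(
        1
        for u in range(n_nodes)
        for v in adj[u]
        for w in adj[u]
        if v < w and w in adj[v]
    )

    return {
        "n_nodes": n_nodes,
        "n_edges": n_edges,
        "min_degree": min(degrees),
        "max_degree": max(degrees),
        "mean_degree": mean(degrees),
        "degree_std": pstdev(degrees),  # assumption: population std
        "density": 2 * n_edges / (n_nodes * (n_nodes - 1)) if n_nodes > 1 else 0.0,
        "transitivity": closed / triples if triples else 0.0,
        "largest_component_fraction": largest / n_nodes,
    }


# A triangle (0, 1, 2), a pendant node 3 attached to 2, and an isolated node 4.
summary = structure_summary(5, [(0, 1), (1, 2), (0, 2), (2, 3)])
```

In this example the isolated node lowers largest_component_fraction to 0.8, and the pendant edge lowers transitivity to 0.6 because it adds open triples without adding a triangle.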

Metric Interpretation

Metric      Range   Excellent  Acceptable  Poor
CPD         [0, 1]  > 0.95     > 0.85      < 0.70
KNN         [0, 1]  > 0.70     > 0.50      < 0.30
Distortion  [0, 1]  < 0.05     < 0.20      > 0.40
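The bands in the table can be applied mechanically. The helper below is a hypothetical convenience, not part of the module; it copies the thresholds from the table and returns "unrated" for values that fall in the gap the table leaves between Acceptable and Poor.

```python
# (excellent, acceptable, poor, higher_is_better), copied from the table above.
THRESHOLDS = {
    "cpd": (0.95, 0.85, 0.70, True),
    "knn": (0.70, 0.50, 0.30, True),
    "distortion": (0.05, 0.20, 0.40, False),
}


def rate(metric, value):
    """Map a metric value to 'excellent', 'acceptable', 'poor' or 'unrated'."""
    excellent, acceptable, poor, higher_is_better = THRESHOLDS[metric]
    if not higher_is_better:
        # Negate everything so the same "greater is better" comparisons apply.
        value, excellent, acceptable, poor = -value, -excellent, -acceptable, -poor
    if value > excellent:
        return "excellent"
    if value > acceptable:
        return "acceptable"
    if value < poor:
        return "poor"
    return "unrated"  # between the Acceptable and Poor bands


ratings = {
    "cpd": rate("cpd", 0.97),
    "knn": rate("knn", 0.60),
    "distortion": rate("distortion", 0.50),
}
```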

All three are rotation-, reflection-, and translation-invariant — they measure structural fidelity, not absolute coordinate agreement.

CPD is the most interpretable: 1.0 means all inter-node distances were perfectly preserved up to overall scale, since CPD is a correlation. It is sensitive to global layout errors.
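A small sketch of the idea behind CPD, assuming it is a Pearson correlation between the original and reconstructed pairwise distances; the exact definition in reconstruct/quality.py may differ, for example in how the score is mapped to [0, 1]. The function names here are hypothetical. A rotated and translated copy scores 1.0 (up to rounding), while swapping the positions of two nodes lowers the score.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Euclidean distance for every unordered pair of points, in a fixed order."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]


def pearson(xs, ys):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def cpd_sketch(original, reconstructed):
    return pearson(pairwise_distances(original), pairwise_distances(reconstructed))


original = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0), (0.0, 3.0)]

# Rotate by 90 degrees and translate: every pairwise distance is unchanged.
rotated = [(-y + 5.0, x - 2.0) for x, y in original]

# Swap the positions of nodes 0 and 2: the global layout no longer matches.
swapped = [original[2], original[1], original[0], original[3]]

cpd_rotated = cpd_sketch(original, rotated)
cpd_swapped = cpd_sketch(original, swapped)
```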

KNN preservation is more sensitive to local neighbourhood recovery — it catches reconstruction methods that get the global shape right but scramble local clusters.
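The following sketch shows one plausible reading of KNN preservation: the average fraction of each node's k nearest neighbours that are the same in the original and the reconstruction. This definition and the function names are assumptions, not the library's code. The example has two clusters on a line; the scrambled reconstruction keeps both clusters in place but swaps two nodes inside each.

```python
import math


def knn_sets(points, k):
    """For each point, the index set of its k nearest other points."""
    result = []
    for i, p in enumerate(points):
        others = (j for j in range(len(points)) if j != i)
        ranked = sorted(others, key=lambda j: math.dist(p, points[j]))
        result.append(set(ranked[:k]))
    return result


def knn_preservation_sketch(original, reconstructed, k):
    before = knn_sets(original, k)
    after = knn_sets(reconstructed, k)
    shared = sum(len(a & b) for a, b in zip(before, after))
    return shared / (k * len(original))


xs = [0.0, 1.0, 3.0, 6.0, 20.0, 21.0, 23.0, 26.0]
original = [(x, 0.0) for x in xs]

# Mirror and shift: neighbourhoods are untouched.
mirrored = [(100.0 - x, 0.0) for x in xs]

# Swap nodes 0 and 3, and nodes 4 and 7: same footprint, scrambled interiors.
scrambled_xs = [6.0, 1.0, 3.0, 0.0, 26.0, 21.0, 23.0, 20.0]
scrambled = [(x, 0.0) for x in scrambled_xs]

knn_mirrored = knn_preservation_sketch(original, mirrored, k=1)    # 1.0
knn_scrambled = knn_preservation_sketch(original, scrambled, k=1)  # 0.25
```

With k=1 only nodes 2 and 6 keep their nearest neighbour in the scrambled layout, so 2 of 8 neighbourhoods survive even though both clusters occupy exactly the same positions as before.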

Distortion penalises scale errors that CPD (a correlation) does not. The reconstruction is scale-aligned to the ground truth before scoring, so the result is always in [0, 1].
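The two ingredients named above can be illustrated separately: a correlation is blind to uniform scaling, and a least-squares factor can bring reconstructed distances onto the ground-truth scale. The sketch below is not the library's distortion formula, which is not shown on this page; best_scale and mean_relative_error are hypothetical helpers chosen for illustration.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    return [math.dist(p, q) for p, q in combinations(points, 2)]


def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def best_scale(d_true, d_rec):
    """Least-squares s minimising sum((d_true - s * d_rec) ** 2)."""
    return sum(t * r for t, r in zip(d_true, d_rec)) / sum(r * r for r in d_rec)


def mean_relative_error(d_true, d_rec):
    return sum(abs(t - r) / t for t, r in zip(d_true, d_rec)) / len(d_true)


original = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
tripled = [(3 * x, 3 * y) for x, y in original]  # correct shape, wrong scale

d_true = pairwise_distances(original)
d_rec = pairwise_distances(tripled)

corr = pearson(d_true, d_rec)                    # correlation ignores the scale error
raw_error = mean_relative_error(d_true, d_rec)   # every distance is off by 200%
scale = best_scale(d_true, d_rec)                # factor that undoes the tripling
aligned_error = mean_relative_error(d_true, [scale * r for r in d_rec])
```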


quality_table()

quality_table(reconstructions, *, k_neighbors=15) is the recommended way to compare multiple reconstructions in one call:

from spatial_graph_algorithms.metrics import quality_table

qt = quality_table({"MDS": sg_mds, "STRND": sg_strnd})
# Returns a DataFrame indexed by method with columns CPD, KNN, Distortion.

It always computes distortion, whereas evaluate() computes it only when compute_distortion=True.
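Conceptually, the wrapper calls evaluate() once per method and keeps the three quality columns. The sketch below shows that pattern with a stand-in evaluate function and plain dicts instead of a DataFrame, so it runs without the package installed. Passing k_neighbors through to evaluate() is an assumption, and the numbers are placeholders rather than real results.

```python
def quality_table_sketch(reconstructions, evaluate_fn, *, k_neighbors=15):
    """One row per method with the CPD, KNN and Distortion columns."""
    rows = {}
    for method, sg in reconstructions.items():
        # Distortion is always requested here, mirroring the behaviour described above.
        result = evaluate_fn(sg, k_neighbors=k_neighbors, compute_distortion=True)
        rows[method] = {
            "CPD": result["cpd"],
            "KNN": result["knn"],
            "Distortion": result["distortion"],
        }
    return rows


def fake_evaluate(sg, *, k_neighbors, compute_distortion):
    """Stand-in for evaluate(): returns canned values stored on the fake graph."""
    out = {"cpd": sg["cpd"], "knn": sg["knn"], "distortion": None}
    if compute_distortion:
        out["distortion"] = sg["distortion"]
    return out


# Placeholder "graphs": dicts holding made-up scores, only to exercise the wrapper.
fake_graphs = {
    "MDS": {"cpd": 0.90, "knn": 0.55, "distortion": 0.15},
    "STRND": {"cpd": 0.96, "knn": 0.72, "distortion": 0.04},
}
table = quality_table_sketch(fake_graphs, fake_evaluate)
```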


GraphReport

graph_report(sg) returns a GraphReport object — the recommended entry point for understanding any SpatialGraph before reconstruction.

from spatial_graph_algorithms.metrics import graph_report

r = graph_report(sg)
r                              # styled HTML table in Jupyter
r.n_connected_components       # topology, always available
r.edge_length_stats            # spatial, None when positions absent
r.diameter                     # on-demand, O(n²), cached after first access
r.plot_degree_distribution()   # returns matplotlib Figure
r.to_dict()                    # flat dict for CSV / pandas pipelines

Always computed (topology): n_nodes, n_edges, density, mean_degree, min_degree, max_degree, degree_std, n_connected_components, largest_component_fraction, transitivity, average_clustering_coefficient, assortativity.

Computed when positions is set (spatial): edge_length_stats (mean/median/std/min/max), spatial_extent (bounding box + area/volume), local_spatial_density, false_edge_fraction.

On-demand (lazy, cached): diameter, average_path_length, betweenness_centrality_stats.
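The on-demand, cached behaviour can be pictured with functools.cached_property. TinyReport below is a hypothetical stand-in, not GraphReport's actual implementation; it computes the diameter of a connected graph with one breadth-first search per node and counts the searches to show that the second access does no work.

```python
from collections import deque
from functools import cached_property


class TinyReport:
    """Hypothetical stand-in for a report with one lazy, cached attribute."""

    def __init__(self, adjacency):
        self.adjacency = adjacency  # dict: node -> list of neighbours
        self.bfs_runs = 0           # how many searches have been executed

    def _eccentricity(self, source):
        """Largest hop distance from source (assumes a connected graph)."""
        self.bfs_runs += 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nb in self.adjacency[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        return max(dist.values())

    @cached_property
    def diameter(self):
        # Runs only on first access; the result is then stored on the instance.
        return max(self._eccentricity(node) for node in self.adjacency)


path_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
report = TinyReport(path_graph)

runs_before = report.bfs_runs   # nothing computed yet
first = report.diameter         # triggers one search per node
second = report.diameter        # served from the cache
```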

graph_report and evaluate are complementary, not redundant:

                 graph_report()              evaluate()
When to use      Before reconstruction       After reconstruction
Inputs needed    Any SpatialGraph            reconstructed_positions must be set
Spatial metrics  Edge lengths, bounding box  CPD, KNN, distortion

Design Decisions

Why is evaluate() in metrics/__init__.py rather than a standalone function?

It is the public face of the module. Users write

    from spatial_graph_algorithms.metrics import evaluate

and should not need to know about the internal split between graph_properties.py and reconstruct/quality.py.

Why does evaluate() import networkx lazily (import networkx as nx inside the function)?

transitivity and largest_component_fraction require constructing a NetworkX graph, which can be slow for large graphs. Importing networkx inside the function means that cost is paid at the call site rather than at module import time. (In practice, sg.graph is cached, so repeated calls do not rebuild the graph.)


How to Add a New Graph Property

  1. Add a function to metrics/graph_properties.py:
    def my_property(sg: SpatialGraph) -> float:
        """One-line description. What range, what edge cases."""
        ...
    
  2. Add it to the results dict in metrics/__init__.py::evaluate().
  3. Export it from metrics/__init__.py::__all__.
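As a worked instance of the three steps above, here is a sketch of a new property. FakeSpatialGraph is a hypothetical stand-in that exposes only n_nodes and an edge list; the real SpatialGraph attributes may be named differently, so treat the attribute access as an assumption. The property name isolated_node_fraction is also invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class FakeSpatialGraph:
    """Hypothetical stand-in exposing only what this example needs."""
    n_nodes: int
    edges: list = field(default_factory=list)  # list of (u, v) pairs


# Step 1: the property function (would live in metrics/graph_properties.py).
def isolated_node_fraction(sg) -> float:
    """Fraction of nodes with no incident edge. Range [0, 1]; 0.0 for an empty graph."""
    if sg.n_nodes == 0:
        return 0.0
    touched = {node for edge in sg.edges for node in edge}
    return (sg.n_nodes - len(touched)) / sg.n_nodes


# Step 2: inside evaluate(), add the value to the results dict, for example
#     results["isolated_node_fraction"] = isolated_node_fraction(sg)
# Step 3: add "isolated_node_fraction" to __all__ in metrics/__init__.py.

sg = FakeSpatialGraph(n_nodes=5, edges=[(0, 1), (1, 2)])
fraction = isolated_node_fraction(sg)  # nodes 3 and 4 have no edges
```

Stating the range and the empty-graph behaviour in the docstring follows the template in step 1.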

How to Add a New Reconstruction Metric

See docs/modules/reconstruct.md — quality metrics live in reconstruct/quality.py and are imported into metrics/__init__.py::evaluate().


Tests

tests/test_graph_properties.py   — degree_sequence, degree_distribution, graph_summary
tests/test_graph_report.py       — GraphReport, graph_report() (27 tests)
tests/test_verify_report.py      — evaluate() called via run_report