Module: compare

File: src/spatial_graph_algorithms/compare/
Status: Experimental


Purpose

compare orchestrates metric-first comparative studies. It is designed for questions such as:

  • How do mds, strnd, and landmark_mds perform on the same graph?
  • How does one reconstruction method behave as false-edge noise increases?
  • How sensitive are results to graph-generation mode, shape, or seed?

It returns a ComparisonResult — a thin wrapper around a tidy pandas.DataFrame with built-in helpers for summarising, ranking, and plotting. The raw DataFrame is always accessible via .df for full pandas flexibility.


Typical Usage

from spatial_graph_algorithms.compare import parameter_grid, run_comparison

graphs = parameter_grid(
    base={"n": 500, "dim": 2, "shape": "square", "k": 8},
    vary={
        "mode": ["knn", "delaunay_corrected"],
        "false_edges_fraction": [0.0, 0.05, 0.10],
    },
)

reconstructions = parameter_grid(
    cases=[
        {"method": "mds"},
        {"method": "strnd"},
        {"method": "landmark_mds", "n_landmarks": 32},
        {"method": "landmark_mds", "n_landmarks": 64},
    ],
)

results = run_comparison(
    graph_specs=graphs,
    reconstruction_specs=reconstructions,
    seeds=[1, 2, 3],
)

# Summarise: mean CPD and KNN per (graph condition, method)
results.summary()

# Rank: best method per graph type
results.best(metric="cpd")

# Visualise: grouped bar chart
results.plot(metric="cpd", by="method", hue="graph_label")

# Access the raw tidy DataFrame at any time
results.df.head()

Preview the planned work before running it:

from spatial_graph_algorithms.compare import dry_run_comparison

plan = dry_run_comparison(
    graph_specs=graphs,
    reconstruction_specs=reconstructions,
    seeds=[1, 2, 3],
)

plan[["graph_label", "reconstruction_label", "seed", "method"]]

Save and reload results:

from spatial_graph_algorithms.compare import ComparisonResult

results.save("results/study.csv")
reloaded = ComparisonResult.load("results/study.csv")
reloaded.summary()

Avoiding Unwanted Combinations

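Use grouped grids when different graph modes need different parameters. Each group expands only its own parameters, which keeps the study explicit and avoids the invalid combinations a single cartesian product would create, such as epsilon values paired with knn graphs.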

graphs = parameter_grid(
    groups=[
        {
            "base": {"n": 500, "dim": 2, "shape": "square", "mode": "knn"},
            "vary": {"k": [4, 8], "false_edges_fraction": [0.0, 0.10]},
        },
        {
            "base": {"n": 500, "dim": 2, "shape": "square", "mode": "epsilon"},
            "vary": {"epsilon": [0.10, 0.20], "false_edges_fraction": [0.0, 0.10]},
        },
        {
            "base": {"n": 500, "dim": 2, "shape": "square", "mode": "delaunay_corrected"},
            "vary": {"false_edges_fraction": [0.0, 0.10]},
        },
    ],
)

For smaller studies, you can instead expand a single grid, filter it with where=, and pass drop_none=True to remove inactive parameters after filtering.
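As a rough illustration of the filter-then-drop idea, the plain-Python sketch below uses None to mark a parameter that a mode does not use, filters the expanded grid with a predicate, and then strips None-valued keys. It assumes that "inactive" means None-valued for drop_none=True and that where= acts like a per-spec predicate; neither detail is confirmed here, so check parameter_grid's docstring for the accepted form of where=. The is_valid() predicate is hypothetical.

```python
from itertools import product

# Full cartesian grid in which None marks a parameter that a mode does not use.
vary = {
    "mode": ["knn", "epsilon"],
    "k": [8, None],
    "epsilon": [0.10, None],
}
keys = list(vary)
grid = [dict(zip(keys, values)) for values in product(*vary.values())]  # 8 specs


def is_valid(spec):
    # Hypothetical filter: knn uses k only, epsilon uses epsilon only.
    if spec["mode"] == "knn":
        return spec["k"] is not None and spec["epsilon"] is None
    return spec["epsilon"] is not None and spec["k"] is None


filtered = [spec for spec in grid if is_valid(spec)]

# Analogue of drop_none=True: remove keys whose value is None after filtering.
cleaned = [{k: v for k, v in spec.items() if v is not None} for spec in filtered]
print(cleaned)  # [{'mode': 'knn', 'k': 8}, {'mode': 'epsilon', 'epsilon': 0.1}]
```

Filtering first and dropping afterwards matters: the predicate needs to see the None placeholders to tell valid specs from invalid ones.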


Output: ComparisonResult

run_comparison() returns a ComparisonResult with one row per combination of graph_spec × seed × reconstruction_spec.
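The table size can be predicted by multiplying the three axes. The sketch below rebuilds the Typical Usage study by hand with itertools.product, assuming parameter_grid expands base/vary into the full cartesian product of the vary values and that cases= yields one spec per case. Under those assumptions, 2 modes × 3 noise levels give 6 graph specs, and 6 graph specs × 3 seeds × 4 reconstruction specs give 72 rows.

```python
from itertools import product

# Hand-built stand-ins for the Typical Usage grids (plain dicts, no library calls).
base = {"n": 500, "dim": 2, "shape": "square", "k": 8}
graph_specs = [
    {**base, "mode": mode, "false_edges_fraction": fraction}
    for mode, fraction in product(["knn", "delaunay_corrected"], [0.0, 0.05, 0.10])
]
reconstruction_specs = [
    {"method": "mds"},
    {"method": "strnd"},
    {"method": "landmark_mds", "n_landmarks": 32},
    {"method": "landmark_mds", "n_landmarks": 64},
]
seeds = [1, 2, 3]

# One row per graph_spec × seed × reconstruction_spec.
planned = list(product(graph_specs, seeds, reconstruction_specs))
print(len(graph_specs), len(reconstruction_specs), len(planned))  # 6 4 72
```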

Built-in analysis methods

| Method or attribute | What it does |
|---|---|
| .summary(by=..., metrics=...) | Mean of each metric, grouped by ["graph_label", "method"] by default |
| .best(metric=..., by=...) | Best method per group, ranked by one metric |
| .plot(metric=..., by=..., hue=...) | Grouped bar chart; returns a matplotlib.figure.Figure |
| .save(path) | Write to CSV or Parquet, inferred from the file extension |
| .load(path) | Class method (ComparisonResult.load); restore from CSV or Parquet |
| .df | The raw pandas.DataFrame, for full pandas access |
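For readers who prefer plain pandas, the sketch below shows roughly what the default .summary() aggregation looks like on .df, and how that summary can be pivoted into the method-by-graph shape a grouped bar chart uses. The rows are placeholders, not real results, and the sketch assumes .summary() is a plain per-group mean, as the table above describes.

```python
import pandas as pd

# Placeholder rows shaped like a few ComparisonResult.df columns; the numbers are
# made up for illustration and are not benchmark results.
df = pd.DataFrame({
    "graph_label": ["knn", "knn", "knn", "knn", "epsilon", "epsilon"],
    "method": ["mds", "mds", "strnd", "strnd", "mds", "mds"],
    "seed": [1, 2, 1, 2, 1, 2],
    "cpd": [0.90, 0.80, 0.70, 0.60, 0.50, 0.40],
    "knn": [0.60, 0.40, 0.30, 0.10, 0.20, 0.20],
})

# Plain-pandas analogue of the default .summary(): mean metrics per (graph_label, method).
summary = df.groupby(["graph_label", "method"])[["cpd", "knn"]].mean()

# Methods as rows, graph labels as columns; table.plot.bar() would draw grouped
# bars with matplotlib (not called here). Missing combinations become NaN.
table = summary["cpd"].unstack("graph_label")
print(summary)
print(table)
```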

DataFrame columns

  • graph_label, reconstruction_label, seed, method
  • graph_<param>: graph-generation parameters, one column each, prefixed with graph_
  • recon_<param>: reconstruction parameters, one column each, prefixed with recon_
  • status and error
  • generation_seconds and reconstruction_seconds
  • graph metrics from metrics.evaluate()
  • reconstruction quality metrics: cpd, knn, and optional distortion
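Because each parameter family shares a prefix, pandas' DataFrame.filter(regex=...) can pull out one family of columns. The sketch below builds a single placeholder row following the convention above, with parameter names borrowed from this page's examples. Note that graph_label also matches ^graph_, so it is excluded explicitly.

```python
import pandas as pd

# One placeholder row following the graph_/recon_ prefix convention.
row = {"graph_label": "g0", "reconstruction_label": "r0", "seed": 1, "method": "landmark_mds"}
row.update({f"graph_{name}": value for name, value in {"n": 500, "mode": "knn", "k": 8}.items()})
row.update({f"recon_{name}": value for name, value in {"n_landmarks": 32}.items()})
df = pd.DataFrame([row])

# graph_label also matches ^graph_, so drop it to keep only generation parameters.
graph_cols = [c for c in df.filter(regex=r"^graph_").columns if c != "graph_label"]
recon_cols = df.filter(regex=r"^recon_").columns.tolist()
print(graph_cols)  # ['graph_n', 'graph_mode', 'graph_k']
print(recon_cols)  # ['recon_n_landmarks']
```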

If graph generation or reconstruction fails, the failure is recorded in that row (see the status and error columns) and the study continues with the remaining combinations.
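To inspect failures before averaging, filter on the error column. The sketch below uses placeholder rows and assumes error is empty (None or NaN) for successful runs; the status values shown ("ok", "failed") are hypothetical, so the filter keys off error rather than status. isna() and notna() also treat the empty cells produced by a CSV round-trip as missing.

```python
import pandas as pd

# Placeholder rows; the status strings are hypothetical, so filtering uses error.
df = pd.DataFrame({
    "graph_label": ["g0", "g0", "g1"],
    "method": ["mds", "strnd", "mds"],
    "seed": [1, 1, 1],
    "status": ["ok", "failed", "ok"],
    "error": [None, "example error message", None],
    "cpd": [0.9, None, 0.7],
})

# Rows that recorded an error, and rows that completed without one.
failed = df[df["error"].notna()]
succeeded = df[df["error"].isna()]

print(failed[["graph_label", "method", "seed", "error"]])
print(succeeded["cpd"].mean())
```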


Relationship to verify

Use verify.run_report() when you want a complete artifact directory with CSVs and plots for one run.

Use compare.run_comparison() when you want a single in-memory table for many parameter combinations.