Skip to content

Module: denoise

File: src/spatial_graph_algorithms/denoise/__init__.py Status: Implemented — scorers, filters, one-call denoising, and ground-truth evaluation.


Purpose

Remove likely-false edges from a SpatialGraph without knowing the ground truth. The input is a noisy graph; the output is a SpatialGraph with suspect edges removed.

This module is the complement of the false-edge injection in simulate — the goal is to recover an edge set closer to the true proximity graph.


API

from spatial_graph_algorithms.denoise import denoise

sg_clean = denoise(sg_noisy, method="jaccard", fraction_to_remove=0.05)

For a two-step workflow, score first and pass the same method name to the filterer. Score polarity is resolved from SCORE_POLARITY; user code should not track score direction manually.

from spatial_graph_algorithms.denoise import EdgeFilterer, score_edges
from spatial_graph_algorithms.metrics import evaluate_denoising

scores = score_edges(sg_noisy, method="jaccard")
sg_clean = EdgeFilterer().by_fraction(
    sg_noisy, scores, fraction=0.10, method="jaccard"
)
report = evaluate_denoising(sg_clean, scores=scores, method="jaccard")

Scoring Methods

Each method returns a dict[tuple[int, int], float] keyed by canonical edge tuples (min(u, v), max(u, v)). Polarity (high or low score = suspect) is registered in SCORE_POLARITY and resolved automatically by all filterers.

Method Polarity How it works Cost
betweenness high = suspect Edge betweenness centrality via NetworkX O(VE) — slow on large graphs
jaccard low = suspect Jaccard neighbourhood overlap; unipartite only O(E·k)
square_bipartite low = suspect 4-cycle count; bipartite only O(E·k²)
community low = suspect Louvain intra-community consistency (igraph, n_runs=5) O(E log V) per run
random_walk low = suspect Stochastic walk visitation frequency with bridge rejection O(n · walks · steps); allocates n×n matrix
degree high = suspect max(deg_u, deg_v) from CSR indptr; no graph construction O(n + E)

The degree scorer reads node degrees directly from np.diff(adj.indptr) — the fastest possible path since the CSR row-pointer array encodes degree implicitly.


Filtering Strategies

All strategies accept the same (sg, scores, *, method) signature and return a new SpatialGraph with edge_metadata["is_removed"] set.

Strategy Description
by_fraction(fraction=0.05) Remove the worst fraction of scored edges
by_threshold(threshold=t) Remove edges above/below an explicit score value
by_percentile(percentile=5.0) Remove edges outside a percentile band
by_optimal_f1(sg, scores) Oracle: sweep thresholds to maximise F1 vs ground truth

by_optimal_f1 is a ground-truth oracle and only works on simulated graphs with edge_metadata["is_false"]. Use it to establish the upper bound for a scorer.


Implementation Notes

  • SpatialGraph is never mutated — all filterers return a new object via sg.replace().
  • Removed edges are recorded in edge_metadata["is_removed"]; the column is created from scratch if edge_metadata was None.
  • Score polarity lives in SCORE_POLARITY; consumers must not pass a direction flag.
  • jaccard raises ValueError on bipartite graphs; square_bipartite raises on unipartite.

File Structure

denoise/
└── __init__.py          — SCORE_POLARITY, EdgeScorer, EdgeFilterer, score_edges(), denoise()