Module: denoise
File: src/spatial_graph_algorithms/denoise/__init__.py
Status: Implemented — scorers, filters, one-call denoising, and ground-truth evaluation.
Purpose
Remove likely-false edges from a SpatialGraph without knowing the ground truth.
The input is a noisy graph; the output is a SpatialGraph with suspect edges removed.
This module is the complement of the false-edge injection in simulate —
the goal is to recover an edge set closer to the true proximity graph.
API
from spatial_graph_algorithms.denoise import denoise
sg_clean = denoise(sg_noisy, method="jaccard", fraction_to_remove=0.05)
For a two-step workflow, score first and pass the same method name to the
filterer. Score polarity is resolved from SCORE_POLARITY; user code should
not track score direction manually.
from spatial_graph_algorithms.denoise import EdgeFilterer, score_edges
from spatial_graph_algorithms.metrics import evaluate_denoising
scores = score_edges(sg_noisy, method="jaccard")
sg_clean = EdgeFilterer().by_fraction(
sg_noisy, scores, fraction=0.10, method="jaccard"
)
report = evaluate_denoising(sg_clean, scores=scores, method="jaccard")
Scoring Methods
Each method returns a dict[tuple[int, int], float] keyed by canonical edge
tuples (min(u, v), max(u, v)). Polarity (high or low score = suspect) is
registered in SCORE_POLARITY and resolved automatically by all filterers.
| Method | Polarity | How it works | Cost |
|---|---|---|---|
betweenness |
high = suspect | Edge betweenness centrality via NetworkX | O(VE) — slow on large graphs |
jaccard |
low = suspect | Jaccard neighbourhood overlap; unipartite only | O(E·k) |
square_bipartite |
low = suspect | 4-cycle count; bipartite only | O(E·k²) |
community |
low = suspect | Louvain intra-community consistency (igraph, n_runs=5) | O(E log V) per run |
random_walk |
low = suspect | Stochastic walk visitation frequency with bridge rejection | O(n · walks · steps); allocates n×n matrix |
degree |
high = suspect | max(deg_u, deg_v) from CSR indptr; no graph construction |
O(n + E) |
The degree scorer reads node degrees directly from np.diff(adj.indptr) — the
fastest possible path since the CSR row-pointer array encodes degree implicitly.
Filtering Strategies
All strategies accept the same (sg, scores, *, method) signature and return
a new SpatialGraph with edge_metadata["is_removed"] set.
| Strategy | Description |
|---|---|
by_fraction(fraction=0.05) |
Remove the worst fraction of scored edges |
by_threshold(threshold=t) |
Remove edges above/below an explicit score value |
by_percentile(percentile=5.0) |
Remove edges outside a percentile band |
by_optimal_f1(sg, scores) |
Oracle: sweep thresholds to maximise F1 vs ground truth |
by_optimal_f1 is a ground-truth oracle and only works on simulated graphs
with edge_metadata["is_false"]. Use it to establish the upper bound for a scorer.
Implementation Notes
SpatialGraphis never mutated — all filterers return a new object viasg.replace().- Removed edges are recorded in
edge_metadata["is_removed"]; the column is created from scratch ifedge_metadatawasNone. - Score polarity lives in
SCORE_POLARITY; consumers must not pass a direction flag. jaccardraisesValueErroron bipartite graphs;square_bipartiteraises on unipartite.