Module: denoise

File: src/spatial_graph_algorithms/denoise/__init__.py Status: Implemented — scorers, filters, one-call denoising, and ground-truth evaluation.

Purpose

Remove likely-false edges from a SpatialGraph without knowing the ground truth. The input is a noisy graph; the output is a SpatialGraph with suspect edges removed.

This module is the complement of the false-edge injection in simulate — the goal is to recover an edge set closer to the true proximity graph.

API

from spatial_graph_algorithms.denoise import denoise

sg_clean = denoise(sg_noisy, method="jaccard", fraction_to_remove=0.05)

For a two-step workflow, score first and pass the same method name to the filterer. Score polarity is resolved from SCORE_POLARITY; user code should not track score direction manually.

from spatial_graph_algorithms.denoise import EdgeFilterer, score_edges
from spatial_graph_algorithms.metrics import evaluate_denoising

scores = score_edges(sg_noisy, method="jaccard")
sg_clean = EdgeFilterer().by_fraction(
    sg_noisy, scores, fraction=0.10, method="jaccard"
)
report = evaluate_denoising(sg_clean, scores=scores, method="jaccard")

Scoring Methods

Each method returns a dict[tuple[int, int], float] keyed by canonical edge tuples (min(u, v), max(u, v)). Polarity (high or low score = suspect) is registered in SCORE_POLARITY and resolved automatically by all filterers.

Method	Polarity	How it works	Cost
`betweenness`	high = suspect	Edge betweenness centrality via NetworkX	O(VE) — slow on large graphs
`jaccard`	low = suspect	Jaccard neighbourhood overlap; unipartite only	O(E·k)
`square_bipartite`	low = suspect	4-cycle count; bipartite only	O(E·k²)
`community`	low = suspect	Louvain intra-community consistency (igraph, n_runs=5)	O(E log V) per run
`random_walk`	low = suspect	Stochastic walk visitation frequency with bridge rejection	O(n · walks · steps); allocates n×n matrix
`degree`	high = suspect	`max(deg_u, deg_v)` from CSR indptr; no graph construction	O(n + E)

The degree scorer reads node degrees directly from np.diff(adj.indptr) — the fastest possible path since the CSR row-pointer array encodes degree implicitly.

Filtering Strategies

All strategies accept the same (sg, scores, *, method) signature and return a new SpatialGraph with edge_metadata["is_removed"] set.

Strategy	Description
`by_fraction(fraction=0.05)`	Remove the worst fraction of scored edges
`by_threshold(threshold=t)`	Remove edges above/below an explicit score value
`by_percentile(percentile=5.0)`	Remove edges outside a percentile band
`by_optimal_f1(sg, scores)`	Oracle: sweep thresholds to maximise F1 vs ground truth

by_optimal_f1 is a ground-truth oracle and only works on simulated graphs with edge_metadata["is_false"]. Use it to establish the upper bound for a scorer.

Implementation Notes

SpatialGraph is never mutated — all filterers return a new object via sg.replace().
Removed edges are recorded in edge_metadata["is_removed"]; the column is created from scratch if edge_metadata was None.
Score polarity lives in SCORE_POLARITY; consumers must not pass a direction flag.
jaccard raises ValueError on bipartite graphs; square_bipartite raises on unipartite.

File Structure

denoise/
└── __init__.py          — SCORE_POLARITY, EdgeScorer, EdgeFilterer, score_edges(), denoise()