Module: network (SpatialGraph)
File: src/spatial_graph_algorithms/network.py
Status: Stable — core schema. Breaking changes require a major version bump.
Purpose
SpatialGraph is the single shared data object for the entire package.
Every pipeline function takes one as input and returns one as output.
Its job is to hold adjacency, positions, metadata, and reconstructed coordinates
in one place, with a small set of invariants enforced at construction time.
Loading Real Data
Use :func:`spatial_graph_algorithms.load_edge_list` to load an observational edge list.
It accepts a CSV path or a pandas DataFrame — whichever you already have.
No ground-truth positions, false-edge labels, or simulation metadata are set, so
has_ground_truth is False and positions, edge_metadata, and node_metadata
are all None.
From a CSV file:
```python
from spatial_graph_algorithms import load_edge_list

sg = load_edge_list("my_experiment.csv")  # default column names
sg = load_edge_list("edges.csv", source_col="cell_a", target_col="cell_b")  # custom names
```
From a pandas DataFrame (in-memory edge list):
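Node identifiers must be integers; `load_edge_list` raises a `ValueError` otherwise.
If your IDs are strings or another type, convert them before loading. Encode both columns against one shared vocabulary: encoding each column separately can give the same ID different integers in `source` and `target`.

```python
import pandas as pd
# Factorize both columns together so each cell ID maps to exactly one integer.
codes, uniques = pd.factorize(pd.concat([df["source"], df["target"]]))
df["source"], df["target"] = codes[: len(df)], codes[len(df):]
# uniques[i] is the original ID of integer node i
```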
```python
import pandas as pd
from spatial_graph_algorithms import load_edge_list

df = pd.DataFrame({
    "source": [0, 1, 2],
    "target": [1, 2, 0],
})
sg = load_edge_list(df)

# Custom column names work the same way
df2 = pd.DataFrame({"u": [0, 1], "v": [1, 2]})
sg2 = load_edge_list(df2, source_col="u", target_col="v")

print(sg)
# SpatialGraph(nodes=3, edges=3, has_ground_truth=False, reconstructed=False)
print(sg.has_ground_truth)  # False
print(sg.positions)         # None
print(sg.edge_metadata)     # None (no is_false column)
print(sg.node_metadata)     # None (no simulation params)
print(sg.node_id_map)       # {0: 0, 1: 1, 2: 2}
```
Once loaded, pass sg to reconstruct() and metrics.graph_properties() as normal.
Functions that require ground truth (e.g. evaluate() comparing reconstructed vs. true
positions) will raise a clear error if has_ground_truth is False.
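One way such a clear error could be produced is a small guard called at the top of every ground-truth-only function. This is a hypothetical sketch: `FakeGraph`, `require_ground_truth`, this `evaluate` signature, and the choice of `ValueError` are all assumptions, not the package's API.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(eq=False)
class FakeGraph:  # hypothetical stand-in exposing only what the guard needs
    positions: Optional[Sequence] = None

    @property
    def has_ground_truth(self) -> bool:
        return self.positions is not None


def require_ground_truth(sg, caller: str) -> None:
    """Hypothetical guard: fail fast with an actionable message.
    ValueError is an assumed exception type."""
    if not sg.has_ground_truth:
        raise ValueError(
            f"{caller}() needs ground-truth positions, but has_ground_truth is False. "
            "Graphs loaded with load_edge_list() carry no true positions."
        )


def evaluate(sg) -> str:  # hypothetical signature, for illustration only
    require_ground_truth(sg, "evaluate")
    return "ok"  # real comparison of reconstructed vs. true positions omitted
```

Checking once at entry keeps the failure close to the caller's mistake instead of surfacing later as an obscure `None`-related error deep inside the metric code.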
Design Decisions
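Why a dataclass rather than a hand-written class?
A dataclass makes all fields visible at a glance and avoids a boilerplate `__init__`.
With `eq=False`, `==` between two graphs falls back to object identity. The generated `__eq__` would instead compare every field, including large sparse matrices, element by element; that is costly and, for arrays and sparse matrices, ends in a `ValueError` because the element-wise result has no single truth value.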
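The sketch below contrasts the default `eq=True` with `eq=False`. Small dense NumPy arrays stand in for the sparse matrices, and `EagerGraph` and `SafeGraph` are hypothetical classes, not part of the package.

```python
from dataclasses import dataclass

import numpy as np


@dataclass  # eq=True by default: generates an __eq__ that compares the fields
class EagerGraph:
    adjacency: np.ndarray
    positions: np.ndarray


@dataclass(eq=False)  # keeps object identity semantics for ==
class SafeGraph:
    adjacency: np.ndarray
    positions: np.ndarray


def make_arrays():
    return np.eye(3), np.zeros((3, 2))


a, b = EagerGraph(*make_arrays()), EagerGraph(*make_arrays())
try:
    a == b
    eager_result = "no error"
except ValueError:
    # element-wise array comparison cannot be reduced to a single bool
    eager_result = "ValueError"

s1, s2 = SafeGraph(*make_arrays()), SafeGraph(*make_arrays())
safe_equal = s1 == s2  # False: different objects, arrays never compared
safe_self = s1 == s1   # True: same object
```

If a content comparison is ever needed, an explicit helper (for example one that checks `(a != b).nnz == 0` on the adjacency) makes the cost visible at the call site.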
Why is adjacency stored as CSR?
CSR is the standard format for row-wise sparse operations such as shortest paths and degree sums. The setter normalises any input format to CSR automatically, so callers don't need to know.
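A minimal sketch of the normalisation the setter is described as performing, assuming SciPy sparse matrices. `normalise_adjacency` is a hypothetical helper, not the package's setter; it enforces the square shape and the symmetric, zero-diagonal, duplicate-free CSR form listed under Invariants.

```python
import numpy as np
import scipy.sparse as sp


def normalise_adjacency(adj) -> sp.csr_matrix:
    """Hypothetical helper: coerce any square matrix-like input to symmetric
    CSR with a zero diagonal and no duplicate entries."""
    m = sp.csr_matrix(adj)  # accepts dense arrays and any scipy sparse format
    if m.shape[0] != m.shape[1]:
        raise ValueError(f"adjacency must be square, got shape {m.shape}")
    m.sum_duplicates()                                # merge repeated (i, j) entries
    m = m.maximum(m.T).tocsr()                        # keep an edge if either direction exists
    m = (sp.triu(m, k=1) + sp.tril(m, k=-1)).tocsr()  # drop the diagonal (self-loops)
    m.eliminate_zeros()                               # remove any explicit zeros
    return m
```

For example, `normalise_adjacency(np.array([[1, 1, 0], [0, 0, 1], [0, 0, 0]]))` drops the self-loop on node 0 and mirrors the two directed entries, giving a CSR matrix with four stored entries (two undirected edges).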
Why does the constructor extract the LCC by default?
Disconnected graphs break shortest-path reconstruction: distances between nodes in different components are infinite.
`keep_lcc=True` is a safe default: it discards every node outside the largest connected component and emits a warning when it does so.
Set `keep_lcc=False` when you know the graph is connected or want to preserve all nodes.
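A sketch of what largest-component extraction can look like with `scipy.sparse.csgraph.connected_components`. `extract_lcc` and its return tuple are hypothetical; the real `_extract_lcc` may differ, but this version keeps the `node_id_map` invariant (original ID to a unique index in `0..k-1`) and warns when nodes are dropped.

```python
import warnings

import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import connected_components


def extract_lcc(adj: sp.csr_matrix, node_ids: list):
    """Hypothetical sketch of keep_lcc=True: keep only the largest connected
    component and re-index the surviving nodes to 0..k-1."""
    n_comp, labels = connected_components(adj, directed=False)
    sizes = np.bincount(labels)                       # nodes per component
    keep = np.flatnonzero(labels == sizes.argmax())   # indices in the largest one
    dropped = adj.shape[0] - keep.size
    if dropped:
        warnings.warn(f"keep_lcc=True dropped {dropped} node(s) outside the largest component")
    sub = adj[keep][:, keep].tocsr()                  # slice rows, then columns
    node_id_map = {node_ids[i]: new for new, i in enumerate(keep)}
    fraction = keep.size / adj.shape[0]               # largest_component_fraction
    return sub, node_id_map, n_comp, fraction
```

The same `connected_components` call yields both the component count and the largest-component fraction, which is why those two values can be cached together.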
Why is nothing mutated in place?
Functions that transform a graph (e.g. reconstruct) return a .copy() with the new
field set. This prevents silent state aliasing bugs in pipeline code.
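The copy-then-set pattern can be sketched with a stripped-down, hypothetical `MiniGraph`; the field name `reconstructed_positions` and this `reconstruct` body are assumptions for illustration, not the package's code.

```python
from copy import deepcopy
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass(eq=False)
class MiniGraph:  # hypothetical stand-in for SpatialGraph
    n_nodes: int
    reconstructed_positions: Optional[np.ndarray] = None  # hypothetical field name

    def copy(self) -> "MiniGraph":
        return deepcopy(self)  # the copy shares no array buffers with the original


def reconstruct(sg: MiniGraph) -> MiniGraph:  # hypothetical pipeline step
    out = sg.copy()                                            # never modify the caller's object
    out.reconstructed_positions = np.zeros((sg.n_nodes, 2))    # placeholder result
    return out
```

Because `reconstruct` returns a new object, the caller's `sg` still reports no reconstruction afterwards, so comparing "before" and "after" in a pipeline cannot be corrupted by aliasing.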
Invariants
These hold after __post_init__ and must be preserved by any code that modifies the object:
- `adjacency_matrix` is symmetric, CSR, with a zero diagonal and no duplicate entries.
- If `positions` is set, `positions.shape[0] == adjacency_matrix.shape[0]`.
- `node_id_map` maps every original ID to a unique integer in `0..n-1`.
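A hypothetical checker for the invariants listed above, useful in tests or while developing new constructors. `invariant_violations` is not part of the package; it takes the three relevant values directly rather than a `SpatialGraph`.

```python
import numpy as np
import scipy.sparse as sp


def invariant_violations(adj, positions=None, node_id_map=None) -> list:
    """Hypothetical checker: return human-readable invariant violations
    (an empty list means every invariant holds)."""
    problems = []
    if not (sp.issparse(adj) and adj.format == "csr"):
        problems.append("adjacency_matrix is not CSR")
        adj = sp.csr_matrix(adj)
    n = adj.shape[0]
    if adj.shape[0] != adj.shape[1]:
        return problems + [f"adjacency_matrix is not square: {adj.shape}"]
    if (adj != adj.T).nnz:
        problems.append("adjacency_matrix is not symmetric")
    if np.any(adj.diagonal() != 0):
        problems.append("adjacency_matrix has a non-zero diagonal")
    merged = adj.copy()
    merged.sum_duplicates()              # nnz shrinks only if duplicates existed
    if merged.nnz != adj.nnz:
        problems.append("adjacency_matrix has duplicate entries")
    if positions is not None and positions.shape[0] != n:
        problems.append("positions.shape[0] != adjacency_matrix.shape[0]")
    if node_id_map is not None:
        values = list(node_id_map.values())
        if len(set(values)) != len(values) or any(not 0 <= v < n for v in values):
            problems.append("node_id_map values are not unique integers in 0..n-1")
    return problems
```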
How to Extend
Adding a new field:
Add it to the dataclass with Optional[...] = None. Update copy() to deep-copy it.
If it should survive LCC extraction, add the slice logic to _extract_lcc.
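A minimal sketch of both steps on a stripped-down, hypothetical `MiniGraph` with an existing `positions` field and a new hypothetical `node_weights` field. `_slice_nodes` stands in for the slicing `_extract_lcc` would need; it is not the package's code.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass(eq=False)
class MiniGraph:  # hypothetical stand-in for SpatialGraph
    positions: Optional[np.ndarray] = None
    node_weights: Optional[np.ndarray] = None  # the new field: Optional, default None

    def copy(self) -> "MiniGraph":
        # copy every array field so the new object shares no buffers
        return MiniGraph(
            positions=None if self.positions is None else self.positions.copy(),
            node_weights=None if self.node_weights is None else self.node_weights.copy(),
        )

    def _slice_nodes(self, keep: np.ndarray) -> None:
        # per-node fields must be sliced together whenever nodes are dropped,
        # e.g. during LCC extraction, or their rows stop lining up
        if self.positions is not None:
            self.positions = self.positions[keep]
        if self.node_weights is not None:
            self.node_weights = self.node_weights[keep]
```

Forgetting either step leaves a field that silently shares memory with the original or has the wrong number of rows after LCC extraction.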
Adding a new constructor:
Follow the pattern of from_edge_list and from_positions — they are @classmethod
methods that build the adjacency matrix and call cls(...). Always pass keep_lcc
through **kwargs so callers can control LCC behaviour.
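The constructor pattern can be sketched as follows. `MiniGraph` and `from_pairs` are hypothetical, and exposing `keep_lcc` as a plain dataclass field is an assumption made only to keep the example self-contained.

```python
from dataclasses import dataclass

import numpy as np
import scipy.sparse as sp


@dataclass(eq=False)
class MiniGraph:  # hypothetical stand-in for SpatialGraph
    adjacency_matrix: sp.csr_matrix
    keep_lcc: bool = True  # assumption: a plain field here, for illustration

    @classmethod
    def from_pairs(cls, pairs, n_nodes: int, **kwargs) -> "MiniGraph":
        """Hypothetical constructor following the from_edge_list pattern:
        build the adjacency matrix, then delegate to cls(...)."""
        rows, cols = np.asarray(pairs, dtype=np.int64).T
        adj = sp.coo_matrix((np.ones(rows.size), (rows, cols)), shape=(n_nodes, n_nodes)).tocsr()
        adj = adj.maximum(adj.T).tocsr()   # treat the pairs as undirected edges
        return cls(adj, **kwargs)          # keep_lcc (and any other option) passes through
```

Forwarding `**kwargs` means new options added to the main constructor become available through every alternate constructor without touching them.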
Do not:
- Add algorithm logic to network.py. It is a data container only.
- Change the adjacency setter to accept non-square inputs.
- Store mutable defaults in field declarations.
Computed Properties
The following are always available as read-only properties — no imports needed:
| Property | Type | Notes |
|---|---|---|
| `n_nodes` | `int` | `adjacency_matrix.shape[0]` |
| `n_edges` | `int` | `adjacency_matrix.nnz // 2` |
| `edge_density` | `float` | `2m / (n(n-1))` |
| `degree_sequence` | `np.ndarray` | cached, invalidated on adjacency change |
| `mean_degree` | `float` | mean of `degree_sequence` |
| `degree_distribution` | `np.ndarray` | counts per degree value |
| `n_connected_components` | `int` | lazy, cached; always 1 for default `keep_lcc=True` graphs |
| `largest_component_fraction` | `float` | lazy, cached alongside `n_connected_components` |
| `false_edge_fraction` | `float \| None` | reads `edge_metadata["is_false"]`; `None` when absent |
| `has_ground_truth` | `bool` | `positions is not None` |
| `graph` | `nx.Graph` | lazy NetworkX view, cached |
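The formulas in the table can be re-derived directly from a CSR matrix that already satisfies the invariants. `graph_summary` is a hypothetical helper, and reading degree as the number of stored entries per row (rather than a sum of edge weights) is an assumption.

```python
import numpy as np
import scipy.sparse as sp


def graph_summary(adj: sp.csr_matrix) -> dict:
    """Hypothetical re-derivation of the table's formulas for a symmetric,
    zero-diagonal, duplicate-free CSR adjacency matrix."""
    n = adj.shape[0]
    m = adj.nnz // 2                # each undirected edge is stored as (i, j) and (j, i)
    degrees = np.diff(adj.indptr)   # stored entries per row = unweighted degree
    return {
        "n_nodes": n,
        "n_edges": m,
        "edge_density": 2 * m / (n * (n - 1)) if n > 1 else 0.0,
        "degree_sequence": degrees,
        "mean_degree": float(degrees.mean()) if n else 0.0,
        "degree_distribution": np.bincount(degrees),  # index k = number of degree-k nodes
    }
```

For the path graph 0-1-2 this gives 3 nodes, 2 edges, density 2/3, degree sequence `[1, 2, 1]`, and degree distribution `[0, 2, 1]`.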
n_connected_components and largest_component_fraction are computed together in one
scipy call and both cached. For graphs built with keep_lcc=True (the default), the
call happens during construction and costs nothing at property access time.
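The "one call, two cached values, invalidated on adjacency change" behaviour can be sketched as below. `ComponentCache` is a hypothetical class, not the package's implementation; the `scipy_calls` counter exists only to make the caching observable.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import connected_components


class ComponentCache:  # hypothetical stand-in, not the package's class
    def __init__(self, adj):
        self.scipy_calls = 0
        self.adjacency_matrix = adj       # goes through the setter below

    @property
    def adjacency_matrix(self) -> sp.csr_matrix:
        return self._adj

    @adjacency_matrix.setter
    def adjacency_matrix(self, adj) -> None:
        self._adj = sp.csr_matrix(adj)
        self._components = None           # invalidate: adjacency changed

    def _components_pair(self):
        if self._components is None:      # one scipy call fills both values
            self.scipy_calls += 1
            n_comp, labels = connected_components(self._adj, directed=False)
            largest = np.bincount(labels).max()
            self._components = (int(n_comp), float(largest / self._adj.shape[0]))
        return self._components

    @property
    def n_connected_components(self) -> int:
        return self._components_pair()[0]

    @property
    def largest_component_fraction(self) -> float:
        return self._components_pair()[1]
```

Reading both properties costs a single `connected_components` call; assigning a new adjacency matrix clears the cache so the next read recomputes.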
Common Mistakes
| Mistake | Why it breaks | Fix |
|---|---|---|
| `sg.adjacency_matrix = new_adj` without going through the setter | Bypasses normalisation | Use the property setter |
| `sg2 = sg; sg2.positions = ...` | Aliases the object and mutates the original | Use `sg.copy()` |
| Storing a dense array in `adjacency_matrix` | Breaks the `nnz`-based edge count | Convert to sparse first |
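Why the dense-array mistake breaks the edge count, assuming `n_edges` is computed as `adjacency_matrix.nnz // 2` as the computed-properties table states:

```python
import numpy as np
import scipy.sparse as sp

dense = np.array([[0, 1, 1],
                  [1, 0, 0],
                  [1, 0, 0]])  # two undirected edges: 0-1 and 0-2

# A dense ndarray has no .nnz attribute, so an nnz-based edge count fails on it.
dense_has_nnz = hasattr(dense, "nnz")

# Converting first restores the cheap count: each edge is stored twice.
adj = sp.csr_matrix(dense)
n_edges = adj.nnz // 2
```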
Tests
tests/test_network.py — construction, LCC extraction, from_edge_list, from_positions,
to_edge_dataframe, to_positions_dataframe, copy.