Skip to content

Module: network (SpatialGraph)

File: src/spatial_graph_algorithms/network.py Status: Stable — core schema. Breaking changes require a major version bump.


Purpose

SpatialGraph is the single shared data object for the entire package. Every pipeline function takes one as input and returns one as output. Its job is to hold adjacency, positions, metadata, and reconstructed coordinates in one place, with a small set of invariants enforced at construction time.


Loading Real Data

Use :func:spatial_graph_algorithms.load_edge_list to load an observational edge list. It accepts a CSV path or a pandas DataFrame — whichever you already have. No ground-truth positions, false-edge labels, or simulation metadata are set, so has_ground_truth is False and positions, edge_metadata, and node_metadata are all None.

From a CSV file:

from spatial_graph_algorithms import load_edge_list

sg = load_edge_list("my_experiment.csv")                           # default column names
sg = load_edge_list("edges.csv", source_col="cell_a", target_col="cell_b")  # custom names

From a pandas DataFrame (in-memory edge list):

Node identifiers must be integers — a ValueError is raised otherwise. If your IDs are strings or other types, convert them before loading:

# encode string cell IDs to integers first
df["source"] = df["source"].astype("category").cat.codes
df["target"] = df["target"].astype("category").cat.codes
import pandas as pd
from spatial_graph_algorithms import load_edge_list

df = pd.DataFrame({
    "source": [0, 1, 2],
    "target": [1, 2, 0],
})
sg = load_edge_list(df)

# Custom column names work the same way
df2 = pd.DataFrame({"u": [0, 1], "v": [1, 2]})
sg2 = load_edge_list(df2, source_col="u", target_col="v")
print(sg)
# SpatialGraph(nodes=3, edges=3, has_ground_truth=False, reconstructed=False)

print(sg.has_ground_truth)   # False
print(sg.positions)          # None
print(sg.edge_metadata)      # None — no is_false column
print(sg.node_metadata)      # None — no simulation params
print(sg.node_id_map)        # {0: 0, 1: 1, 2: 2}

Once loaded, pass sg to reconstruct() and metrics.graph_properties() as normal. Functions that require ground truth (e.g. evaluate() comparing reconstructed vs. true positions) will raise a clear error if has_ground_truth is False.


Design Decisions

Why a dataclass, not a class? A dataclass makes all fields visible at a glance and avoids boilerplate __init__. The eq=False override prevents accidental equality comparisons on large sparse matrices.

Why is adjacency stored as CSR? CSR is the standard format for row-wise sparse operations (shortest paths, degree sums). The setter normalises any input format to CSR automatically so callers don't need to know.

Why does the constructor extract the LCC by default? Disconnected graphs cause undefined behaviour in shortest-path reconstruction. keep_lcc=True is a safe default that silently discards isolated nodes with a warning. Set keep_lcc=False when you know the graph is connected or want to preserve all nodes.

Why is nothing mutated in place? Functions that transform a graph (e.g. reconstruct) return a .copy() with the new field set. This prevents silent state aliasing bugs in pipeline code.


Invariants

These hold after __post_init__ and must be preserved by any code that modifies the object:

  1. adjacency_matrix is symmetric, CSR, with zero diagonal and no duplicate entries.
  2. If positions is set, positions.shape[0] == adjacency_matrix.shape[0].
  3. node_id_map maps every original ID to a unique integer in 0..n-1.

How to Extend

Adding a new field: Add it to the dataclass with Optional[...] = None. Update copy() to deep-copy it. If it should survive LCC extraction, add the slice logic to _extract_lcc.

Adding a new constructor: Follow the pattern of from_edge_list and from_positions — they are @classmethod methods that build the adjacency matrix and call cls(...). Always pass keep_lcc through **kwargs so callers can control LCC behaviour.

Do not: - Add algorithm logic to network.py. It is a data container only. - Change the adjacency setter to accept non-square inputs. - Store mutable defaults in field declarations.


Computed Properties

The following are always available as read-only properties — no imports needed:

Property Type Notes
n_nodes int adjacency_matrix.shape[0]
n_edges int adjacency_matrix.nnz // 2
edge_density float 2m / (n(n-1))
degree_sequence np.ndarray cached, invalidated on adjacency change
mean_degree float mean of degree_sequence
degree_distribution np.ndarray counts per degree value
n_connected_components int lazy, cached; always 1 for default keep_lcc=True graphs
largest_component_fraction float lazy, cached alongside n_connected_components
false_edge_fraction float \| None reads edge_metadata["is_false"]; None when absent
has_ground_truth bool positions is not None
graph nx.Graph lazy NetworkX view, cached

n_connected_components and largest_component_fraction are computed together in one scipy call and both cached. For graphs built with keep_lcc=True (the default), the call happens during construction and costs nothing at property access time.


Common Mistakes

Mistake Why it breaks Fix
sg.adjacency_matrix = new_adj without going through the setter Bypasses normalisation Use the property setter
sg2 = sg; sg2.positions = ... Aliases the object, mutates the original Use sg.copy()
Storing a dense array in adjacency_matrix Breaks nnz-based edge count Convert to sparse first

Tests

tests/test_network.py — construction, LCC extraction, from_edge_list, from_positions, to_edge_dataframe, to_positions_dataframe, copy.