Skip to content

SpatialGraph

The central data container for the entire pipeline. Every function accepts and returns SpatialGraph objects.

Constructor patterns

Method Use when
SpatialGraph(adjacency_matrix, ...) You already have a scipy sparse matrix
SpatialGraph.from_edge_list(path_or_df) Loading from a CSV edge list
SpatialGraph.from_positions(array) Starting from a point cloud with no edges

Key attributes

Attribute Type Description
adjacency_matrix scipy.sparse.csr_matrix Symmetric adjacency, zero diagonal
positions np.ndarray or None Ground-truth coordinates (n, dim)
reconstructed_positions np.ndarray or None Set by reconstruct()
edge_metadata pd.DataFrame or None Includes is_false column when simulated
node_metadata pd.DataFrame or None Per-node attributes
graph nx.Graph Lazy NetworkX view (cached)

API Reference

spatial_graph_algorithms.network.SpatialGraph dataclass

Core graph and coordinate container for spatial network analysis.

All pipeline functions accept and return SpatialGraph objects. The adjacency matrix is stored internally as a CSR sparse matrix with the diagonal forced to zero. When keep_lcc is True (the default), only the largest connected component is retained on construction.

Parameters:

Name Type Description Default
positions ndarray

Ground-truth node coordinates, shape (n_nodes, dim).

None
reconstructed_positions ndarray

Coordinates recovered by a reconstruction algorithm, same shape as positions. Set by :func:spatial_graph_algorithms.reconstruct.reconstruct.

None
node_metadata DataFrame

Per-node attributes. Row i corresponds to node i.

None
edge_metadata DataFrame

Per-edge attributes with columns source, target, and optionally is_false (bool) when the graph was created by :func:spatial_graph_algorithms.simulate.generate.

None
node_id_map dict

Mapping from original node identifiers to integer indices 0..n-1. Built automatically when using :meth:from_edge_list.

None
keep_lcc bool

If True (default), only the largest connected component is kept on construction. Set to False when you know the graph is connected or want to preserve all nodes.

True

Attributes:

Name Type Description
adjacency_matrix csr_matrix

Symmetric, unweighted adjacency matrix. Any format accepted by scipy.sparse.csr_matrix is converted automatically. Self-loops are removed silently.

n_nodes int

Number of nodes (always cheap — direct from adjacency shape).

n_edges int

Number of undirected edges (always cheap — nnz // 2).

edge_density float

Fraction of all possible edges present.

degree_sequence ndarray

Per-node degree array; computed once on first access and cached. Cache is cleared automatically when the adjacency matrix changes.

mean_degree float

Mean of :attr:degree_sequence.

degree_distribution ndarray

Counts per degree value; derived from :attr:degree_sequence.

mean_shortest_path float or None

Mean pairwise shortest-path length. None until populated by a graph-structure computation (see :mod:spatial_graph_algorithms.metrics).

shortest_path_distribution ndarray or None

Histogram of pairwise shortest-path lengths. None until populated alongside :attr:mean_shortest_path.

Examples:

Build from a scipy sparse matrix:

>>> import scipy.sparse
>>> adj = scipy.sparse.eye(3, format="csr") * 0  # 3 isolated nodes
>>> sn = SpatialGraph(adjacency_matrix=adj, keep_lcc=False)
>>> sn.adjacency_matrix.shape
(3, 3)
Source code in src/spatial_graph_algorithms/network.py
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
@dataclass(eq=False)
class SpatialGraph:
    """Core graph and coordinate container for spatial network analysis.

    All pipeline functions accept and return ``SpatialGraph`` objects.  The
    adjacency matrix is stored internally as a CSR sparse matrix with the
    diagonal forced to zero.  When *keep_lcc* is ``True`` (the default), only
    the largest connected component is retained on construction.

    Parameters
    ----------
    positions : np.ndarray, optional
        Ground-truth node coordinates, shape ``(n_nodes, dim)``.
    reconstructed_positions : np.ndarray, optional
        Coordinates recovered by a reconstruction algorithm, same shape as
        *positions*.  Set by :func:`spatial_graph_algorithms.reconstruct.reconstruct`.
    node_metadata : pd.DataFrame, optional
        Per-node attributes.  Row *i* corresponds to node *i*.
    edge_metadata : pd.DataFrame, optional
        Per-edge attributes with columns ``source``, ``target``, and optionally
        ``is_false`` (bool) when the graph was created by
        :func:`spatial_graph_algorithms.simulate.generate`.
    node_id_map : dict, optional
        Mapping from original node identifiers to integer indices ``0..n-1``.
        Built automatically when using :meth:`from_edge_list`.
    keep_lcc : bool
        If ``True`` (default), only the largest connected component is kept on
        construction.  Set to ``False`` when you know the graph is connected or
        want to preserve all nodes.

    Attributes
    ----------
    adjacency_matrix : scipy.sparse.csr_matrix
        Symmetric, unweighted adjacency matrix.  Any format accepted by
        ``scipy.sparse.csr_matrix`` is converted automatically.  Self-loops are
        removed silently.
    n_nodes : int
        Number of nodes (always cheap — direct from adjacency shape).
    n_edges : int
        Number of undirected edges (always cheap — ``nnz // 2``).
    edge_density : float
        Fraction of all possible edges present.
    degree_sequence : np.ndarray
        Per-node degree array; computed once on first access and cached.
        Cache is cleared automatically when the adjacency matrix changes.
    mean_degree : float
        Mean of :attr:`degree_sequence`.
    degree_distribution : np.ndarray
        Counts per degree value; derived from :attr:`degree_sequence`.
    mean_shortest_path : float or None
        Mean pairwise shortest-path length.  ``None`` until populated by a
        graph-structure computation (see :mod:`spatial_graph_algorithms.metrics`).
    shortest_path_distribution : np.ndarray or None
        Histogram of pairwise shortest-path lengths.  ``None`` until populated
        alongside :attr:`mean_shortest_path`.

    Examples
    --------
    Build from a scipy sparse matrix:

    >>> import scipy.sparse
    >>> adj = scipy.sparse.eye(3, format="csr") * 0  # 3 isolated nodes
    >>> sn = SpatialGraph(adjacency_matrix=adj, keep_lcc=False)
    >>> sn.adjacency_matrix.shape
    (3, 3)
    """

    adjacency_matrix: scipy.sparse.csr_matrix
    positions: np.ndarray | None = None
    reconstructed_positions: np.ndarray | None = None
    node_metadata: pd.DataFrame | None = None
    edge_metadata: pd.DataFrame | None = None
    node_id_map: dict[Any, int] | None = None
    source_indices: np.ndarray | None = None
    keep_lcc: bool = True
    _nx_cache: nx.Graph | None = None
    _degree_cache: np.ndarray | None = None
    _mean_shortest_path_cache: float | None = None
    _shortest_path_dist_cache: np.ndarray | None = None
    _n_components_cache: int | None = None
    _lcc_fraction_cache: float | None = None

    def __post_init__(self) -> None:
        # Move to backing field once, then route all future writes via property setter.
        initial_adj = self._normalize_adjacency(self.adjacency_matrix)
        object.__setattr__(self, "_adjacency_matrix", initial_adj)
        self.__dict__.pop("adjacency_matrix", None)

        if self.node_id_map is None:
            n = self._adjacency_matrix.shape[0]
            self.node_id_map = {i: i for i in range(n)}

        if (
            self.positions is not None
            and self.positions.shape[0] != self._adjacency_matrix.shape[0]
        ):
            raise ValueError(
                "positions row count must match adjacency_matrix shape[0] "
                f"({self.positions.shape[0]} != {self._adjacency_matrix.shape[0]})"
            )

        if self.keep_lcc:
            self._extract_lcc()

    @staticmethod
    def _normalize_adjacency(value: scipy.sparse.csr_matrix) -> scipy.sparse.csr_matrix:
        adj = scipy.sparse.csr_matrix(value).copy()
        adj.setdiag(0)
        adj.eliminate_zeros()
        return adj

    @property
    def adjacency_matrix(self) -> scipy.sparse.csr_matrix:
        """CSR adjacency matrix with zero diagonal and no duplicate entries."""
        return self._adjacency_matrix

    @adjacency_matrix.setter
    def adjacency_matrix(self, value: scipy.sparse.csr_matrix) -> None:
        self._adjacency_matrix = self._normalize_adjacency(value)
        self._nx_cache = None
        self._degree_cache = None
        self._mean_shortest_path_cache = None
        self._shortest_path_dist_cache = None
        self._n_components_cache = None
        self._lcc_fraction_cache = None

    @property
    def graph(self) -> nx.Graph:
        """NetworkX undirected graph built lazily from the adjacency matrix.

        The result is cached; assigning a new *adjacency_matrix* invalidates
        the cache automatically.

        Returns
        -------
        nx.Graph
            Undirected graph with integer node labels ``0..n-1``.
        """
        if self._nx_cache is None:
            self._nx_cache = nx.from_scipy_sparse_array(self._adjacency_matrix)
        return self._nx_cache

    @property
    def n_nodes(self) -> int:
        """Number of nodes."""
        return self._adjacency_matrix.shape[0]

    @property
    def n_edges(self) -> int:
        """Number of undirected edges (no self-loops)."""
        return self._adjacency_matrix.nnz // 2

    @property
    def edge_density(self) -> float:
        """Fraction of all possible edges that are present.

        Returns 0.0 for graphs with fewer than 2 nodes.
        """
        n = self.n_nodes
        return 2.0 * self.n_edges / (n * (n - 1)) if n > 1 else 0.0

    @property
    def degree_sequence(self) -> np.ndarray:
        """Degree of every node as a 1-D integer array, computed once and cached.

        Returns
        -------
        np.ndarray
            Shape ``(n_nodes,)``, ``dtype=int``.
        """
        if self._degree_cache is None:
            self._degree_cache = (
                np.asarray(self._adjacency_matrix.sum(axis=1)).ravel().astype(int)
            )
        return self._degree_cache

    @property
    def mean_degree(self) -> float:
        """Mean node degree."""
        return float(self.degree_sequence.mean())

    @property
    def degree_distribution(self) -> np.ndarray:
        """Counts per degree value: index *k* gives the number of degree-*k* nodes.

        Returns
        -------
        np.ndarray
            Shape ``(max_degree + 1,)``, integer counts.
        """
        return np.bincount(self.degree_sequence)

    @property
    def mean_shortest_path(self) -> float | None:
        """Mean pairwise shortest-path length, or ``None`` if not yet computed.

        Populated automatically when a graph-structure computation (e.g. in
        :mod:`spatial_graph_algorithms.metrics`) runs and caches the result, or
        by direct assignment to ``sn._mean_shortest_path_cache``.

        TODO: implement lazy auto-computation via
        ``scipy.sparse.csgraph.shortest_path`` (O(n²)–O(n³); too expensive to
        trigger silently on large graphs).
        """
        return self._mean_shortest_path_cache

    @property
    def shortest_path_distribution(self) -> np.ndarray | None:
        """Histogram of pairwise shortest-path lengths, or ``None`` if not yet computed.

        Index *k* gives the count of node pairs at shortest-path distance *k*.
        Populated alongside :attr:`mean_shortest_path`.

        TODO: implement together with ``mean_shortest_path``.
        """
        return self._shortest_path_dist_cache

    def _extract_lcc(self) -> None:
        n_components, labels = scipy.sparse.csgraph.connected_components(
            self._adjacency_matrix, directed=False
        )
        if n_components <= 1:
            # Cache the result we just computed — free since we already have it.
            self._n_components_cache = int(n_components)
            n = self._adjacency_matrix.shape[0]
            self._lcc_fraction_cache = 1.0 if n > 0 else 0.0
            return

        counts = np.bincount(labels)
        lcc_label = int(np.argmax(counts))
        keep_idx = np.where(labels == lcc_label)[0]

        warnings.warn(
            f"Found {n_components} connected components; "
            f"keeping largest with {len(keep_idx)} nodes.",
            UserWarning,
            stacklevel=2,
        )

        self._adjacency_matrix = self._adjacency_matrix[keep_idx][:, keep_idx].tocsr()
        self._nx_cache = None
        self._degree_cache = None
        self._mean_shortest_path_cache = None
        self._shortest_path_dist_cache = None
        # After extraction the graph is connected by construction.
        self._n_components_cache = 1
        self._lcc_fraction_cache = 1.0

        if self.positions is not None:
            self.positions = self.positions[keep_idx]
        if self.reconstructed_positions is not None:
            self.reconstructed_positions = self.reconstructed_positions[keep_idx]
        if self.node_metadata is not None:
            self.node_metadata = self.node_metadata.iloc[keep_idx].reset_index(drop=True)

        if self.node_id_map is not None:
            inverse = {v: k for k, v in self.node_id_map.items()}
            kept_original = [inverse[int(i)] for i in keep_idx]
            self.node_id_map = {orig: new_i for new_i, orig in enumerate(kept_original)}

        # Compose with any prior filtering so source_indices always points to the
        # root graph, regardless of how many extraction steps have been chained.
        n_before = labels.shape[0]
        base = self.source_indices if self.source_indices is not None else np.arange(n_before)
        self.source_indices = base[keep_idx]

        if (
            self.edge_metadata is not None
            and isinstance(self.edge_metadata, pd.DataFrame)
            and {"source", "target"}.issubset(self.edge_metadata.columns)
        ):
            keep_set = set(keep_idx.tolist())
            mask = (
                self.edge_metadata["source"].isin(keep_set)
                & self.edge_metadata["target"].isin(keep_set)
            )
            remap = {int(old): new for new, old in enumerate(keep_idx)}
            meta = self.edge_metadata.loc[mask].copy()
            meta["source"] = meta["source"].map(remap)
            meta["target"] = meta["target"].map(remap)
            self.edge_metadata = meta.reset_index(drop=True)

    @classmethod
    def from_edge_list(
        cls,
        source: str | pd.DataFrame,
        source_col: str = "source",
        target_col: str = "target",
        **kwargs: Any,
    ) -> SpatialGraph:
        """Construct a SpatialGraph from a CSV edge list or a DataFrame.

        This is the entry point for **observational (real) data**.  Only the
        topology is known, so :attr:`positions`, :attr:`edge_metadata`, and
        :attr:`node_metadata` are all ``None`` after construction.  Use
        :attr:`has_ground_truth` to check this programmatically.

        Node identifiers in the edge list can be arbitrary (strings, integers,
        etc.).  They are remapped to a contiguous integer range ``0..n-1`` and
        the mapping is stored in :attr:`node_id_map`.

        Parameters
        ----------
        source : str or pd.DataFrame
            Path to a CSV file or a DataFrame with at least two columns
            containing source and target node identifiers.
        source_col : str
            Column name for the source node.  Default is ``"source"``.
        target_col : str
            Column name for the target node.  Default is ``"target"``.
        **kwargs
            Additional keyword arguments forwarded to the constructor (e.g.
            ``keep_lcc``).

        Returns
        -------
        SpatialGraph
            A new instance with *adjacency_matrix* and *node_id_map* populated.
            *positions*, *edge_metadata*, and *node_metadata* are all ``None``
            — no ground truth is available.

        Raises
        ------
        KeyError
            If *source_col* or *target_col* are not present in the data.
        ValueError
            If any node identifier is not an integer.  Convert non-integer IDs
            before calling this method, e.g.
            ``df["source"] = df["source"].astype(int)``.

        Examples
        --------
        >>> import pandas as pd
        >>> df = pd.DataFrame({"source": [0, 1, 2], "target": [1, 2, 0]})
        >>> sn = SpatialGraph.from_edge_list(df)
        >>> sn.has_ground_truth
        False
        >>> sn = SpatialGraph.from_edge_list("edges.csv", source_col="u", target_col="v")
        """
        if isinstance(source, str):
            df = pd.read_csv(source)
        else:
            df = source

        all_ids = set(df[source_col].tolist()).union(df[target_col].tolist())
        non_int = [v for v in all_ids if not isinstance(v, (int, np.integer))]
        if non_int:
            sample = non_int[:5]
            raise ValueError(
                f"Node identifiers must be integers. "
                f"Found non-integer values: {sample!r}. "
                "Convert them first, e.g.: "
                f"df['{source_col}'] = df['{source_col}'].astype(int)"
            )

        node_ids = sorted(all_ids)
        id_map = {orig: i for i, orig in enumerate(node_ids)}

        n = len(node_ids)
        rows = np.array([id_map[v] for v in df[source_col]], dtype=int)
        cols = np.array([id_map[v] for v in df[target_col]], dtype=int)
        data = np.ones(len(rows), dtype=float)

        adj = scipy.sparse.csr_matrix(
            (
                np.concatenate([data, data]),
                (np.concatenate([rows, cols]), np.concatenate([cols, rows])),
            ),
            shape=(n, n),
        )

        return cls(adjacency_matrix=adj, node_id_map=id_map, **kwargs)

    @classmethod
    def from_positions(
        cls,
        source: str | pd.DataFrame | np.ndarray,
        **kwargs: Any,
    ) -> SpatialGraph:
        """Construct a SpatialGraph from a point-cloud (no edges).

        The adjacency matrix is initialised as an empty *n × n* matrix.  Use
        :func:`spatial_graph_algorithms.simulate.generate` if you want edges built
        automatically from the positions.

        Parameters
        ----------
        source : str, pd.DataFrame, or np.ndarray
            Node coordinates.  A CSV path or DataFrame is converted to a float
            array; numeric columns are used as spatial dimensions.
        **kwargs
            Additional keyword arguments forwarded to the constructor.  Note
            that *keep_lcc* defaults to ``False`` here because an empty graph
            has no connected components.

        Returns
        -------
        SpatialGraph
            A new instance with *positions* set and an empty adjacency matrix.

        Examples
        --------
        >>> import numpy as np
        >>> pts = np.random.default_rng(0).random((50, 2))
        >>> sn = SpatialGraph.from_positions(pts)
        >>> sn.positions.shape
        (50, 2)
        """
        if isinstance(source, str):
            positions = pd.read_csv(source).values.astype(float)
        elif isinstance(source, pd.DataFrame):
            positions = source.values.astype(float)
        else:
            positions = np.asarray(source, dtype=float)

        n = positions.shape[0]
        adj = scipy.sparse.csr_matrix((n, n))
        id_map = {i: i for i in range(n)}

        keep_lcc = kwargs.pop("keep_lcc", False)
        return cls(
            adjacency_matrix=adj,
            positions=positions,
            node_id_map=id_map,
            keep_lcc=keep_lcc,
            **kwargs,
        )

    def to_edge_dataframe(self) -> pd.DataFrame:
        """Export unique edges as a DataFrame.

        Returns
        -------
        pd.DataFrame
            Two-column DataFrame with integer columns ``source`` and ``target``.
            Only the upper-triangle edges are returned (each edge once).
        """
        coo = self._adjacency_matrix.tocoo()
        mask = coo.row < coo.col
        return pd.DataFrame({"source": coo.row[mask], "target": coo.col[mask]})

    def to_positions_dataframe(self) -> pd.DataFrame:
        """Export node positions as a DataFrame.

        Column names are ``x``, ``y`` for 2-D and ``x``, ``y``, ``z`` for 3-D.
        Higher-dimensional arrays use ``x0``, ``x1``, ... notation.

        Returns
        -------
        pd.DataFrame
            One row per node, one column per spatial dimension.

        Raises
        ------
        ValueError
            If :attr:`positions` is ``None``.
        """
        if self.positions is None:
            raise ValueError("positions is None — nothing to export")

        dim = self.positions.shape[1]
        if dim == 2:
            cols = ["x", "y"]
        elif dim == 3:
            cols = ["x", "y", "z"]
        else:
            cols = [f"x{i}" for i in range(dim)]

        return pd.DataFrame(self.positions, columns=cols)

    @property
    def n_connected_components(self) -> int:
        """Number of connected components in the graph.

        Computed lazily on first access and cached.  For graphs constructed
        with ``keep_lcc=True`` (the default) this is always 1 after
        construction; graphs loaded with ``keep_lcc=False`` may have more.
        """
        if self._n_components_cache is None:
            self._compute_components()
        return self._n_components_cache  # type: ignore[return-value]

    @property
    def largest_component_fraction(self) -> float:
        """Fraction of nodes in the largest connected component.

        1.0 when the graph is fully connected.  Computed together with
        :attr:`n_connected_components` and cached on first access.
        """
        if self._lcc_fraction_cache is None:
            self._compute_components()
        return self._lcc_fraction_cache  # type: ignore[return-value]

    def _compute_components(self) -> None:
        n_components, labels = scipy.sparse.csgraph.connected_components(
            self._adjacency_matrix, directed=False
        )
        self._n_components_cache = int(n_components)
        n = self._adjacency_matrix.shape[0]
        if n > 0 and n_components > 0:
            lcc_size = int(np.bincount(labels).max())
            self._lcc_fraction_cache = lcc_size / n
        else:
            self._lcc_fraction_cache = 0.0

    @property
    def false_edge_fraction(self) -> float | None:
        """Fraction of edges marked as false (injected noise), or ``None``.

        Returns ``None`` when :attr:`edge_metadata` is absent or has no
        ``is_false`` column (e.g. real-data graphs loaded via
        :meth:`from_edge_list`).

        Examples
        --------
        >>> from spatial_graph_algorithms.simulate import generate
        >>> sg = generate(n=100, false_edges_fraction=0.10, seed=0)
        >>> sg.false_edge_fraction > 0
        True
        """
        if self.edge_metadata is None or "is_false" not in self.edge_metadata.columns:
            return None
        return float(self.edge_metadata["is_false"].mean())

    @property
    def has_ground_truth(self) -> bool:
        """True when positions were set at construction (simulated or labelled data).

        Returns ``False`` for graphs loaded from a raw edge list where no
        ground-truth coordinates or false-edge labels are available.  Pipeline
        functions that require ground truth (e.g. :func:`metrics.evaluate` with
        ``positions`` comparison) can check this flag before running.

        Examples
        --------
        >>> from spatial_graph_algorithms.simulate import generate
        >>> sn = generate(n=100, seed=0)
        >>> sn.has_ground_truth
        True
        >>> sn_real = SpatialGraph.from_edge_list(sn.to_edge_dataframe())
        >>> sn_real.has_ground_truth
        False
        """
        return self.positions is not None

    def __repr__(self) -> str:
        gt = self.has_ground_truth
        rec = self.reconstructed_positions is not None
        return (
            f"SpatialGraph(nodes={self.n_nodes}, edges={self.n_edges}, "
            f"has_ground_truth={gt}, reconstructed={rec})"
        )

    def graph_report(
        self,
        *,
        plot: bool = True,
        figsize: tuple = (4.5, 3.0),
        with_all_pairs: bool = False,
        bins: int = 30,
        max_edge_sample: int | None = 50_000,
        max_pair_sample: int = 200_000,
        seed: int | None = None,
    ) -> str:
        """Return a human-readable summary of the graph and optionally show plots.

        When *plot* is ``True``, two figures are displayed:

        - **Degree distribution** — histogram of node degrees with mean and
          median marked.
        - **Edge length histogram** — requires *positions*.  With
          *with_all_pairs* enabled, overlays the all-pairs distance
          distribution as a reference.

        Parameters
        ----------
        plot : bool
            If ``True`` (default), render the degree distribution and, when
            positions are available, the edge-length histogram.
        figsize : tuple of float
            Figure size ``(width, height)`` in inches, applied to both plots.
        with_all_pairs : bool
            When ``True``, overlay the all-pairs distance reference line on
            the edge-length histogram.  Ignored when *plot* is ``False`` or
            positions are unavailable.
        bins : int
            Number of bins for the edge-length histogram.
        max_edge_sample : int or None
            Maximum edges used for the edge-length histogram; larger graphs
            are sampled.  ``None`` disables sampling.
        max_pair_sample : int
            Maximum all-pair distances sampled for the reference line.
        seed : int or None
            Random seed for reproducible sampling.

        Returns
        -------
        str
            Multi-line summary of basic graph statistics.
        """
        n = self._adjacency_matrix.shape[0]
        e = self._adjacency_matrix.nnz // 2
        density = e / (n * (n - 1) / 2) if n > 1 else 0
        gt = self.has_ground_truth
        rec = self.reconstructed_positions is not None
        bipartite = nx.is_bipartite(self.graph)

        if plot:
            from spatial_graph_algorithms.plot import (
                plot_degree_distribution,
                plot_edge_length_histogram,
            )
            plot_degree_distribution(self, figsize=figsize)
            if self.positions is not None:
                plot_edge_length_histogram(
                    self,
                    figsize=figsize,
                    bins=bins,
                    with_all_pairs=with_all_pairs,
                    max_edge_sample=max_edge_sample,
                    max_pair_sample=max_pair_sample,
                    seed=seed,
                )

        return (
            f"SpatialGraph Report:\n"
            f"Nodes: {n}\n"
            f"Edges: {e}\n"
            f"Density: {density:.4f}\n"
            f"Bipartite: {bipartite}\n"
            f"Has Ground Truth: {gt}\n"
            f"Has Reconstructed Positions: {rec}"
        )

    def copy(self) -> SpatialGraph:
        """Return a deep copy of this SpatialGraph.

        The copy is constructed with ``keep_lcc=False`` because LCC extraction
        has already been applied to the source object.

        Returns
        -------
        SpatialGraph
            Independent copy with all arrays and DataFrames deep-copied.
        """
        return SpatialGraph(
            adjacency_matrix=self._adjacency_matrix.copy(),
            positions=None if self.positions is None else self.positions.copy(),
            reconstructed_positions=(
                None
                if self.reconstructed_positions is None
                else self.reconstructed_positions.copy()
            ),
            node_metadata=(
                None if self.node_metadata is None else self.node_metadata.copy(deep=True)
            ),
            edge_metadata=(
                None if self.edge_metadata is None else self.edge_metadata.copy(deep=True)
            ),
            node_id_map=None if self.node_id_map is None else dict(self.node_id_map),
            source_indices=None if self.source_indices is None else self.source_indices.copy(),
            keep_lcc=False,
        )

    def subgraph(self, node_indices: np.ndarray | list[int]) -> SpatialGraph:
        """Return a new SpatialGraph restricted to *node_indices*.

        All metadata fields are sliced consistently and edge indices are
        remapped to the new contiguous range ``0..len(node_indices)-1``.
        This covers both one-off node selection and repeated LCC re-extraction
        after pipeline steps that change graph topology.

        Parameters
        ----------
        node_indices : array-like of int
            Indices of nodes to keep, in any order.  Duplicates are silently
            deduplicated; order is preserved after deduplication.

        Returns
        -------
        SpatialGraph
            New instance with ``keep_lcc=False``.  ``source_indices`` is
            composed with the parent's ``source_indices`` so it always refers
            back to the original graph's node numbering.

        Examples
        --------
        >>> from spatial_graph_algorithms.simulate import generate
        >>> sg = generate(n=100, seed=0)
        >>> sg_sub = sg.subgraph(range(20))
        >>> sg_sub.n_nodes
        20
        """
        keep = np.asarray(list(dict.fromkeys(int(i) for i in node_indices)), dtype=np.intp)
        remap = {int(old): new for new, old in enumerate(keep)}

        new_adj = self._adjacency_matrix[keep][:, keep].tocsr()

        new_positions = None if self.positions is None else self.positions[keep]
        new_rec = (
            None
            if self.reconstructed_positions is None
            else self.reconstructed_positions[keep]
        )
        new_node_meta = (
            None
            if self.node_metadata is None
            else self.node_metadata.iloc[keep].reset_index(drop=True)
        )

        new_edge_meta = None
        if (
            self.edge_metadata is not None
            and isinstance(self.edge_metadata, pd.DataFrame)
            and {"source", "target"}.issubset(self.edge_metadata.columns)
        ):
            keep_set = set(keep.tolist())
            mask = (
                self.edge_metadata["source"].isin(keep_set)
                & self.edge_metadata["target"].isin(keep_set)
            )
            meta = self.edge_metadata.loc[mask].copy()
            meta["source"] = meta["source"].map(remap)
            meta["target"] = meta["target"].map(remap)
            new_edge_meta = meta.reset_index(drop=True)

        new_node_id_map = None
        if self.node_id_map is not None:
            inverse = {v: k for k, v in self.node_id_map.items()}
            new_node_id_map = {
                inverse[int(old)]: new for new, old in enumerate(keep)
            }

        base = self.source_indices if self.source_indices is not None else np.arange(
            self._adjacency_matrix.shape[0]
        )
        new_source_indices = base[keep]

        return SpatialGraph(
            adjacency_matrix=new_adj,
            positions=new_positions,
            reconstructed_positions=new_rec,
            node_metadata=new_node_meta,
            edge_metadata=new_edge_meta,
            node_id_map=new_node_id_map,
            source_indices=new_source_indices,
            keep_lcc=False,
        )

    def lcc_subgraph(self) -> SpatialGraph:
        """Return a new SpatialGraph containing only the largest connected component.

        Unlike the ``keep_lcc=True`` constructor option, this method can be
        called at any point in the pipeline — after denoising, BFS sampling,
        or manual edge filtering — without discarding metadata.

        Returns
        -------
        SpatialGraph
            New instance restricted to the LCC, with all metadata fields
            filtered and remapped consistently.

        Examples
        --------
        >>> from spatial_graph_algorithms.simulate import generate
        >>> sg = generate(n=100, seed=0, keep_lcc=False)
        >>> sg_lcc = sg.lcc_subgraph()
        >>> sg_lcc.n_connected_components
        1
        """
        n_components, labels = scipy.sparse.csgraph.connected_components(
            self._adjacency_matrix, directed=False
        )
        if n_components <= 1:
            return self.copy()
        lcc_label = int(np.argmax(np.bincount(labels)))
        keep_idx = np.where(labels == lcc_label)[0]
        return self.subgraph(keep_idx)

    def replace(self, **changes: Any) -> SpatialGraph:
        """Return a new SpatialGraph with selected fields replaced.

        Unlike :meth:`copy`, the adjacency matrix is shared by reference when
        not in *changes*, making pipeline steps that only attach positions or
        metadata essentially free in terms of adjacency allocation.  The
        NetworkX cache is also shared when the adjacency matrix is unchanged.

        Passing a ``adjacency_matrix`` that is a different object from the
        current one emits :class:`AdjacencyReplaceWarning` to surface the
        allocation cost on large graphs.

        Parameters
        ----------
        **changes
            Fields to override.  Valid keys: ``adjacency_matrix``,
            ``positions``, ``reconstructed_positions``, ``node_metadata``,
            ``edge_metadata``, ``node_id_map``, ``source_indices``.

        Returns
        -------
        SpatialGraph
            New instance.  Shares ``_adjacency_matrix`` and ``_nx_cache``
            with the original when ``adjacency_matrix`` is not in *changes*.

        Warns
        -----
        AdjacencyReplaceWarning
            When *adjacency_matrix* is replaced with a different object.

        Examples
        --------
        >>> sn_rec = sn.replace(reconstructed_positions=coords)
        >>> sn_rec.adjacency_matrix is sn.adjacency_matrix
        True
        """
        new_adj = changes.pop("adjacency_matrix", _MISSING)

        if new_adj is not _MISSING and new_adj is not self._adjacency_matrix:
            warnings.warn(
                f"replace() is allocating a new adjacency_matrix copy "
                f"({self._adjacency_matrix.shape[0]} nodes). "
                "Reuse the existing matrix if the topology has not changed.",
                AdjacencyReplaceWarning,
                stacklevel=2,
            )
            return SpatialGraph(
                adjacency_matrix=new_adj,
                positions=changes.get("positions", self.positions),
                reconstructed_positions=changes.get(
                    "reconstructed_positions", self.reconstructed_positions
                ),
                node_metadata=changes.get("node_metadata", self.node_metadata),
                edge_metadata=changes.get("edge_metadata", self.edge_metadata),
                node_id_map=changes.get("node_id_map", self.node_id_map),
                source_indices=changes.get("source_indices", self.source_indices),
                keep_lcc=False,
            )

        # Fast path: adjacency unchanged — share reference, bypass __post_init__
        inst = object.__new__(SpatialGraph)
        object.__setattr__(inst, "_adjacency_matrix", self._adjacency_matrix)
        inst._nx_cache = self._nx_cache
        inst._degree_cache = self._degree_cache
        inst._mean_shortest_path_cache = self._mean_shortest_path_cache
        inst._shortest_path_dist_cache = self._shortest_path_dist_cache
        inst._n_components_cache = self._n_components_cache
        inst._lcc_fraction_cache = self._lcc_fraction_cache
        inst.positions = changes.get("positions", self.positions)
        inst.reconstructed_positions = changes.get(
            "reconstructed_positions", self.reconstructed_positions
        )
        inst.node_metadata = changes.get("node_metadata", self.node_metadata)
        inst.edge_metadata = changes.get("edge_metadata", self.edge_metadata)
        inst.node_id_map = changes.get("node_id_map", self.node_id_map)
        inst.source_indices = changes.get("source_indices", self.source_indices)
        inst.keep_lcc = False
        return inst

Attributes

adjacency_matrix property writable

CSR adjacency matrix with zero diagonal and no duplicate entries.

graph property

NetworkX undirected graph built lazily from the adjacency matrix.

The result is cached; assigning a new adjacency_matrix invalidates the cache automatically.

Returns:

Type Description
Graph

Undirected graph with integer node labels 0..n-1.

n_nodes property

Number of nodes.

n_edges property

Number of undirected edges (no self-loops).

edge_density property

Fraction of all possible edges that are present.

Returns 0.0 for graphs with fewer than 2 nodes.

degree_sequence property

Degree of every node as a 1-D integer array, computed once and cached.

Returns:

Type Description
ndarray

Shape (n_nodes,), dtype=int.

mean_degree property

Mean node degree.

degree_distribution property

Counts per degree value: index k gives the number of degree-k nodes.

Returns:

Type Description
ndarray

Shape (max_degree + 1,), integer counts.

mean_shortest_path property

Mean pairwise shortest-path length, or None if not yet computed.

Populated automatically when a graph-structure computation (e.g. in :mod:spatial_graph_algorithms.metrics) runs and caches the result, or by direct assignment to sn._mean_shortest_path_cache.

TODO: implement lazy auto-computation via scipy.sparse.csgraph.shortest_path (O(n²)–O(n³); too expensive to trigger silently on large graphs).

shortest_path_distribution property

Histogram of pairwise shortest-path lengths, or None if not yet computed.

Index k gives the count of node pairs at shortest-path distance k. Populated alongside :attr:mean_shortest_path.

TODO: implement together with mean_shortest_path.

n_connected_components property

Number of connected components in the graph.

Computed lazily on first access and cached. For graphs constructed with keep_lcc=True (the default) this is always 1 after construction; graphs loaded with keep_lcc=False may have more.

largest_component_fraction property

Fraction of nodes in the largest connected component.

1.0 when the graph is fully connected. Computed together with :attr:n_connected_components and cached on first access.

false_edge_fraction property

Fraction of edges marked as false (injected noise), or None.

Returns None when :attr:edge_metadata is absent or has no is_false column (e.g. real-data graphs loaded via :meth:from_edge_list).

Examples:

>>> from spatial_graph_algorithms.simulate import generate
>>> sg = generate(n=100, false_edges_fraction=0.10, seed=0)
>>> sg.false_edge_fraction > 0
True

has_ground_truth property

True when positions were set at construction (simulated or labelled data).

Returns False for graphs loaded from a raw edge list where no ground-truth coordinates or false-edge labels are available. Pipeline functions that require ground truth (e.g. :func:metrics.evaluate with positions comparison) can check this flag before running.

Examples:

>>> from spatial_graph_algorithms.simulate import generate
>>> sn = generate(n=100, seed=0)
>>> sn.has_ground_truth
True
>>> sn_real = SpatialGraph.from_edge_list(sn.to_edge_dataframe())
>>> sn_real.has_ground_truth
False

Functions

from_edge_list(source, source_col='source', target_col='target', **kwargs) classmethod

Construct a SpatialGraph from a CSV edge list or a DataFrame.

This is the entry point for observational (real) data. Only the topology is known, so :attr:positions, :attr:edge_metadata, and :attr:node_metadata are all None after construction. Use :attr:has_ground_truth to check this programmatically.

Node identifiers in the edge list can be arbitrary (strings, integers, etc.). They are remapped to a contiguous integer range 0..n-1 and the mapping is stored in :attr:node_id_map.

Parameters:

Name Type Description Default
source str or DataFrame

Path to a CSV file or a DataFrame with at least two columns containing source and target node identifiers.

required
source_col str

Column name for the source node. Default is "source".

'source'
target_col str

Column name for the target node. Default is "target".

'target'
**kwargs Any

Additional keyword arguments forwarded to the constructor (e.g. keep_lcc).

{}

Returns:

Type Description
SpatialGraph

A new instance with adjacency_matrix and node_id_map populated. positions, edge_metadata, and node_metadata are all None — no ground truth is available.

Raises:

Type Description
KeyError

If source_col or target_col are not present in the data.

ValueError

If any node identifier is not an integer. Convert non-integer IDs before calling this method, e.g. df["source"] = df["source"].astype(int).

Examples:

>>> import pandas as pd
>>> df = pd.DataFrame({"source": [0, 1, 2], "target": [1, 2, 0]})
>>> sn = SpatialGraph.from_edge_list(df)
>>> sn.has_ground_truth
False
>>> sn = SpatialGraph.from_edge_list("edges.csv", source_col="u", target_col="v")
Source code in src/spatial_graph_algorithms/network.py
@classmethod
def from_edge_list(
    cls,
    source: str | pd.DataFrame,
    source_col: str = "source",
    target_col: str = "target",
    **kwargs: Any,
) -> SpatialGraph:
    """Construct a SpatialGraph from a CSV edge list or a DataFrame.

    This is the entry point for **observational (real) data**.  Only the
    topology is known, so :attr:`positions`, :attr:`edge_metadata`, and
    :attr:`node_metadata` are all ``None`` after construction.  Use
    :attr:`has_ground_truth` to check this programmatically.

    Node identifiers in the edge list can be arbitrary (strings, integers,
    etc.).  They are remapped to a contiguous integer range ``0..n-1`` and
    the mapping is stored in :attr:`node_id_map`.

    Parameters
    ----------
    source : str or pd.DataFrame
        Path to a CSV file or a DataFrame with at least two columns
        containing source and target node identifiers.
    source_col : str
        Column name for the source node.  Default is ``"source"``.
    target_col : str
        Column name for the target node.  Default is ``"target"``.
    **kwargs
        Additional keyword arguments forwarded to the constructor (e.g.
        ``keep_lcc``).

    Returns
    -------
    SpatialGraph
        A new instance with *adjacency_matrix* and *node_id_map* populated.
        *positions*, *edge_metadata*, and *node_metadata* are all ``None``
        — no ground truth is available.

    Raises
    ------
    KeyError
        If *source_col* or *target_col* are not present in the data.
    ValueError
        If any node identifier is not an integer.  Convert non-integer IDs
        before calling this method, e.g.
        ``df["source"] = df["source"].astype(int)``.

    Examples
    --------
    >>> import pandas as pd
    >>> df = pd.DataFrame({"source": [0, 1, 2], "target": [1, 2, 0]})
    >>> sn = SpatialGraph.from_edge_list(df)
    >>> sn.has_ground_truth
    False
    >>> sn = SpatialGraph.from_edge_list("edges.csv", source_col="u", target_col="v")
    """
    if isinstance(source, str):
        df = pd.read_csv(source)
    else:
        df = source

    all_ids = set(df[source_col].tolist()).union(df[target_col].tolist())
    non_int = [v for v in all_ids if not isinstance(v, (int, np.integer))]
    if non_int:
        sample = non_int[:5]
        raise ValueError(
            f"Node identifiers must be integers. "
            f"Found non-integer values: {sample!r}. "
            "Convert them first, e.g.: "
            f"df['{source_col}'] = df['{source_col}'].astype(int)"
        )

    node_ids = sorted(all_ids)
    id_map = {orig: i for i, orig in enumerate(node_ids)}

    n = len(node_ids)
    rows = np.array([id_map[v] for v in df[source_col]], dtype=int)
    cols = np.array([id_map[v] for v in df[target_col]], dtype=int)
    data = np.ones(len(rows), dtype=float)

    adj = scipy.sparse.csr_matrix(
        (
            np.concatenate([data, data]),
            (np.concatenate([rows, cols]), np.concatenate([cols, rows])),
        ),
        shape=(n, n),
    )

    return cls(adjacency_matrix=adj, node_id_map=id_map, **kwargs)

from_positions(source, **kwargs) classmethod

Construct a SpatialGraph from a point-cloud (no edges).

The adjacency matrix is initialised as an empty n × n matrix. Use :func:spatial_graph_algorithms.simulate.generate if you want edges built automatically from the positions.

Parameters:

Name Type Description Default
source str, pd.DataFrame, or np.ndarray

Node coordinates. A CSV path or DataFrame is converted to a float array; numeric columns are used as spatial dimensions.

required
**kwargs Any

Additional keyword arguments forwarded to the constructor. Note that keep_lcc defaults to False here because an empty graph has no connected components.

{}

Returns:

Type Description
SpatialGraph

A new instance with positions set and an empty adjacency matrix.

Examples:

>>> import numpy as np
>>> pts = np.random.default_rng(0).random((50, 2))
>>> sn = SpatialGraph.from_positions(pts)
>>> sn.positions.shape
(50, 2)
Source code in src/spatial_graph_algorithms/network.py
@classmethod
def from_positions(
    cls,
    source: str | pd.DataFrame | np.ndarray,
    **kwargs: Any,
) -> SpatialGraph:
    """Construct a SpatialGraph from a point-cloud (no edges).

    The adjacency matrix is initialised as an empty *n × n* matrix.  Use
    :func:`spatial_graph_algorithms.simulate.generate` if you want edges built
    automatically from the positions.

    Parameters
    ----------
    source : str, pd.DataFrame, or np.ndarray
        Node coordinates.  A CSV path or DataFrame is converted to a float
        array; numeric columns are used as spatial dimensions.
    **kwargs
        Additional keyword arguments forwarded to the constructor.  Note
        that *keep_lcc* defaults to ``False`` here because an empty graph
        has no connected components.

    Returns
    -------
    SpatialGraph
        A new instance with *positions* set and an empty adjacency matrix.

    Examples
    --------
    >>> import numpy as np
    >>> pts = np.random.default_rng(0).random((50, 2))
    >>> sn = SpatialGraph.from_positions(pts)
    >>> sn.positions.shape
    (50, 2)
    """
    if isinstance(source, str):
        positions = pd.read_csv(source).values.astype(float)
    elif isinstance(source, pd.DataFrame):
        positions = source.values.astype(float)
    else:
        positions = np.asarray(source, dtype=float)

    n = positions.shape[0]
    adj = scipy.sparse.csr_matrix((n, n))
    id_map = {i: i for i in range(n)}

    keep_lcc = kwargs.pop("keep_lcc", False)
    return cls(
        adjacency_matrix=adj,
        positions=positions,
        node_id_map=id_map,
        keep_lcc=keep_lcc,
        **kwargs,
    )

to_edge_dataframe()

Export unique edges as a DataFrame.

Returns:

Type Description
DataFrame

Two-column DataFrame with integer columns source and target. Only the upper-triangle edges are returned (each edge once).

Source code in src/spatial_graph_algorithms/network.py
def to_edge_dataframe(self) -> pd.DataFrame:
    """Export unique edges as a DataFrame.

    Returns
    -------
    pd.DataFrame
        Two-column DataFrame with integer columns ``source`` and ``target``.
        Only the upper-triangle edges are returned (each edge once).
    """
    coo = self._adjacency_matrix.tocoo()
    mask = coo.row < coo.col
    return pd.DataFrame({"source": coo.row[mask], "target": coo.col[mask]})

to_positions_dataframe()

Export node positions as a DataFrame.

Column names are x, y for 2-D and x, y, z for 3-D. Higher-dimensional arrays use x0, x1, ... notation.

Returns:

Type Description
DataFrame

One row per node, one column per spatial dimension.

Raises:

Type Description
ValueError

If :attr:positions is None.

Source code in src/spatial_graph_algorithms/network.py
def to_positions_dataframe(self) -> pd.DataFrame:
    """Export node positions as a DataFrame.

    Column names are ``x``, ``y`` for 2-D and ``x``, ``y``, ``z`` for 3-D.
    Higher-dimensional arrays use ``x0``, ``x1``, ... notation.

    Returns
    -------
    pd.DataFrame
        One row per node, one column per spatial dimension.

    Raises
    ------
    ValueError
        If :attr:`positions` is ``None``.
    """
    if self.positions is None:
        raise ValueError("positions is None — nothing to export")

    dim = self.positions.shape[1]
    if dim == 2:
        cols = ["x", "y"]
    elif dim == 3:
        cols = ["x", "y", "z"]
    else:
        cols = [f"x{i}" for i in range(dim)]

    return pd.DataFrame(self.positions, columns=cols)

graph_report(*, plot=True, figsize=(4.5, 3.0), with_all_pairs=False, bins=30, max_edge_sample=50000, max_pair_sample=200000, seed=None)

Return a human-readable summary of the graph and optionally show plots.

When plot is True, two figures are displayed:

  • Degree distribution — histogram of node degrees with mean and median marked.
  • Edge length histogram — requires positions. With with_all_pairs enabled, overlays the all-pairs distance distribution as a reference.

Parameters:

Name Type Description Default
plot bool

If True (default), render the degree distribution and, when positions are available, the edge-length histogram.

True
figsize tuple of float

Figure size (width, height) in inches, applied to both plots.

(4.5, 3.0)
with_all_pairs bool

When True, overlay the all-pairs distance reference line on the edge-length histogram. Ignored when plot is False or positions are unavailable.

False
bins int

Number of bins for the edge-length histogram.

30
max_edge_sample int or None

Maximum edges used for the edge-length histogram; larger graphs are sampled. None disables sampling.

50000
max_pair_sample int

Maximum all-pair distances sampled for the reference line.

200000
seed int or None

Random seed for reproducible sampling.

None

Returns:

Type Description
str

Multi-line summary of basic graph statistics.

Source code in src/spatial_graph_algorithms/network.py
def graph_report(
    self,
    *,
    plot: bool = True,
    figsize: tuple = (4.5, 3.0),
    with_all_pairs: bool = False,
    bins: int = 30,
    max_edge_sample: int | None = 50_000,
    max_pair_sample: int = 200_000,
    seed: int | None = None,
) -> str:
    """Return a human-readable summary of the graph and optionally show plots.

    When *plot* is ``True``, two figures are displayed:

    - **Degree distribution** — histogram of node degrees with mean and
      median marked.
    - **Edge length histogram** — requires *positions*.  With
      *with_all_pairs* enabled, overlays the all-pairs distance
      distribution as a reference.

    Parameters
    ----------
    plot : bool
        If ``True`` (default), render the degree distribution and, when
        positions are available, the edge-length histogram.
    figsize : tuple of float
        Figure size ``(width, height)`` in inches, applied to both plots.
    with_all_pairs : bool
        When ``True``, overlay the all-pairs distance reference line on
        the edge-length histogram.  Ignored when *plot* is ``False`` or
        positions are unavailable.
    bins : int
        Number of bins for the edge-length histogram.
    max_edge_sample : int or None
        Maximum edges used for the edge-length histogram; larger graphs
        are sampled.  ``None`` disables sampling.
    max_pair_sample : int
        Maximum all-pair distances sampled for the reference line.
    seed : int or None
        Random seed for reproducible sampling.

    Returns
    -------
    str
        Multi-line summary of basic graph statistics.
    """
    n = self._adjacency_matrix.shape[0]
    e = self._adjacency_matrix.nnz // 2
    density = e / (n * (n - 1) / 2) if n > 1 else 0
    gt = self.has_ground_truth
    rec = self.reconstructed_positions is not None
    bipartite = nx.is_bipartite(self.graph)

    if plot:
        from spatial_graph_algorithms.plot import (
            plot_degree_distribution,
            plot_edge_length_histogram,
        )
        plot_degree_distribution(self, figsize=figsize)
        if self.positions is not None:
            plot_edge_length_histogram(
                self,
                figsize=figsize,
                bins=bins,
                with_all_pairs=with_all_pairs,
                max_edge_sample=max_edge_sample,
                max_pair_sample=max_pair_sample,
                seed=seed,
            )

    return (
        f"SpatialGraph Report:\n"
        f"Nodes: {n}\n"
        f"Edges: {e}\n"
        f"Density: {density:.4f}\n"
        f"Bipartite: {bipartite}\n"
        f"Has Ground Truth: {gt}\n"
        f"Has Reconstructed Positions: {rec}"
    )

copy()

Return a deep copy of this SpatialGraph.

The copy is constructed with keep_lcc=False because LCC extraction has already been applied to the source object.

Returns:

Type Description
SpatialGraph

Independent copy with all arrays and DataFrames deep-copied.

Source code in src/spatial_graph_algorithms/network.py
def copy(self) -> SpatialGraph:
    """Return a deep copy of this SpatialGraph.

    The copy is constructed with ``keep_lcc=False`` because LCC extraction
    has already been applied to the source object.

    Returns
    -------
    SpatialGraph
        Independent copy with all arrays and DataFrames deep-copied.
    """
    return SpatialGraph(
        adjacency_matrix=self._adjacency_matrix.copy(),
        positions=None if self.positions is None else self.positions.copy(),
        reconstructed_positions=(
            None
            if self.reconstructed_positions is None
            else self.reconstructed_positions.copy()
        ),
        node_metadata=(
            None if self.node_metadata is None else self.node_metadata.copy(deep=True)
        ),
        edge_metadata=(
            None if self.edge_metadata is None else self.edge_metadata.copy(deep=True)
        ),
        node_id_map=None if self.node_id_map is None else dict(self.node_id_map),
        source_indices=None if self.source_indices is None else self.source_indices.copy(),
        keep_lcc=False,
    )

subgraph(node_indices)

Return a new SpatialGraph restricted to node_indices.

All metadata fields are sliced consistently and edge indices are remapped to the new contiguous range 0..len(node_indices)-1. This covers both one-off node selection and repeated LCC re-extraction after pipeline steps that change graph topology.

Parameters:

Name Type Description Default
node_indices array-like of int

Indices of nodes to keep, in any order. Duplicates are silently deduplicated; order is preserved after deduplication.

required

Returns:

Type Description
SpatialGraph

New instance with keep_lcc=False. source_indices is composed with the parent's source_indices so it always refers back to the original graph's node numbering.

Examples:

>>> from spatial_graph_algorithms.simulate import generate
>>> sg = generate(n=100, seed=0)
>>> sg_sub = sg.subgraph(range(20))
>>> sg_sub.n_nodes
20
Source code in src/spatial_graph_algorithms/network.py
def subgraph(self, node_indices: np.ndarray | list[int]) -> SpatialGraph:
    """Return a new SpatialGraph restricted to *node_indices*.

    All metadata fields are sliced consistently and edge indices are
    remapped to the new contiguous range ``0..len(node_indices)-1``.
    This covers both one-off node selection and repeated LCC re-extraction
    after pipeline steps that change graph topology.

    Parameters
    ----------
    node_indices : array-like of int
        Indices of nodes to keep, in any order.  Duplicates are silently
        deduplicated; order is preserved after deduplication.

    Returns
    -------
    SpatialGraph
        New instance with ``keep_lcc=False``.  ``source_indices`` is
        composed with the parent's ``source_indices`` so it always refers
        back to the original graph's node numbering.

    Examples
    --------
    >>> from spatial_graph_algorithms.simulate import generate
    >>> sg = generate(n=100, seed=0)
    >>> sg_sub = sg.subgraph(range(20))
    >>> sg_sub.n_nodes
    20
    """
    keep = np.asarray(list(dict.fromkeys(int(i) for i in node_indices)), dtype=np.intp)
    remap = {int(old): new for new, old in enumerate(keep)}

    new_adj = self._adjacency_matrix[keep][:, keep].tocsr()

    new_positions = None if self.positions is None else self.positions[keep]
    new_rec = (
        None
        if self.reconstructed_positions is None
        else self.reconstructed_positions[keep]
    )
    new_node_meta = (
        None
        if self.node_metadata is None
        else self.node_metadata.iloc[keep].reset_index(drop=True)
    )

    new_edge_meta = None
    if (
        self.edge_metadata is not None
        and isinstance(self.edge_metadata, pd.DataFrame)
        and {"source", "target"}.issubset(self.edge_metadata.columns)
    ):
        keep_set = set(keep.tolist())
        mask = (
            self.edge_metadata["source"].isin(keep_set)
            & self.edge_metadata["target"].isin(keep_set)
        )
        meta = self.edge_metadata.loc[mask].copy()
        meta["source"] = meta["source"].map(remap)
        meta["target"] = meta["target"].map(remap)
        new_edge_meta = meta.reset_index(drop=True)

    new_node_id_map = None
    if self.node_id_map is not None:
        inverse = {v: k for k, v in self.node_id_map.items()}
        new_node_id_map = {
            inverse[int(old)]: new for new, old in enumerate(keep)
        }

    base = self.source_indices if self.source_indices is not None else np.arange(
        self._adjacency_matrix.shape[0]
    )
    new_source_indices = base[keep]

    return SpatialGraph(
        adjacency_matrix=new_adj,
        positions=new_positions,
        reconstructed_positions=new_rec,
        node_metadata=new_node_meta,
        edge_metadata=new_edge_meta,
        node_id_map=new_node_id_map,
        source_indices=new_source_indices,
        keep_lcc=False,
    )

lcc_subgraph()

Return a new SpatialGraph containing only the largest connected component.

Unlike the keep_lcc=True constructor option, this method can be called at any point in the pipeline — after denoising, BFS sampling, or manual edge filtering — without discarding metadata.

Returns:

Type Description
SpatialGraph

New instance restricted to the LCC, with all metadata fields filtered and remapped consistently.

Examples:

>>> from spatial_graph_algorithms.simulate import generate
>>> sg = generate(n=100, seed=0, keep_lcc=False)
>>> sg_lcc = sg.lcc_subgraph()
>>> sg_lcc.n_connected_components
1
Source code in src/spatial_graph_algorithms/network.py
def lcc_subgraph(self) -> SpatialGraph:
    """Return a new SpatialGraph containing only the largest connected component.

    Unlike the ``keep_lcc=True`` constructor option, this method can be
    called at any point in the pipeline — after denoising, BFS sampling,
    or manual edge filtering — without discarding metadata.

    Returns
    -------
    SpatialGraph
        New instance restricted to the LCC, with all metadata fields
        filtered and remapped consistently.

    Examples
    --------
    >>> from spatial_graph_algorithms.simulate import generate
    >>> sg = generate(n=100, seed=0, keep_lcc=False)
    >>> sg_lcc = sg.lcc_subgraph()
    >>> sg_lcc.n_connected_components
    1
    """
    n_components, labels = scipy.sparse.csgraph.connected_components(
        self._adjacency_matrix, directed=False
    )
    if n_components <= 1:
        return self.copy()
    lcc_label = int(np.argmax(np.bincount(labels)))
    keep_idx = np.where(labels == lcc_label)[0]
    return self.subgraph(keep_idx)

replace(**changes)

Return a new SpatialGraph with selected fields replaced.

Unlike :meth:copy, the adjacency matrix is shared by reference when not in changes, making pipeline steps that only attach positions or metadata essentially free in terms of adjacency allocation. The NetworkX cache is also shared when the adjacency matrix is unchanged.

Passing a adjacency_matrix that is a different object from the current one emits :class:AdjacencyReplaceWarning to surface the allocation cost on large graphs.

Parameters:

Name Type Description Default
**changes Any

Fields to override. Valid keys: adjacency_matrix, positions, reconstructed_positions, node_metadata, edge_metadata, node_id_map, source_indices.

{}

Returns:

Type Description
SpatialGraph

New instance. Shares _adjacency_matrix and _nx_cache with the original when adjacency_matrix is not in changes.

Warns:

Type Description
AdjacencyReplaceWarning

When adjacency_matrix is replaced with a different object.

Examples:

>>> sn_rec = sn.replace(reconstructed_positions=coords)
>>> sn_rec.adjacency_matrix is sn.adjacency_matrix
True
Source code in src/spatial_graph_algorithms/network.py
def replace(self, **changes: Any) -> SpatialGraph:
    """Return a new SpatialGraph with selected fields replaced.

    Unlike :meth:`copy`, the adjacency matrix is shared by reference when
    not in *changes*, making pipeline steps that only attach positions or
    metadata essentially free in terms of adjacency allocation.  The
    NetworkX cache is also shared when the adjacency matrix is unchanged.

    Passing a ``adjacency_matrix`` that is a different object from the
    current one emits :class:`AdjacencyReplaceWarning` to surface the
    allocation cost on large graphs.

    Parameters
    ----------
    **changes
        Fields to override.  Valid keys: ``adjacency_matrix``,
        ``positions``, ``reconstructed_positions``, ``node_metadata``,
        ``edge_metadata``, ``node_id_map``, ``source_indices``.

    Returns
    -------
    SpatialGraph
        New instance.  Shares ``_adjacency_matrix`` and ``_nx_cache``
        with the original when ``adjacency_matrix`` is not in *changes*.

    Warns
    -----
    AdjacencyReplaceWarning
        When *adjacency_matrix* is replaced with a different object.

    Examples
    --------
    >>> sn_rec = sn.replace(reconstructed_positions=coords)
    >>> sn_rec.adjacency_matrix is sn.adjacency_matrix
    True
    """
    new_adj = changes.pop("adjacency_matrix", _MISSING)

    if new_adj is not _MISSING and new_adj is not self._adjacency_matrix:
        warnings.warn(
            f"replace() is allocating a new adjacency_matrix copy "
            f"({self._adjacency_matrix.shape[0]} nodes). "
            "Reuse the existing matrix if the topology has not changed.",
            AdjacencyReplaceWarning,
            stacklevel=2,
        )
        return SpatialGraph(
            adjacency_matrix=new_adj,
            positions=changes.get("positions", self.positions),
            reconstructed_positions=changes.get(
                "reconstructed_positions", self.reconstructed_positions
            ),
            node_metadata=changes.get("node_metadata", self.node_metadata),
            edge_metadata=changes.get("edge_metadata", self.edge_metadata),
            node_id_map=changes.get("node_id_map", self.node_id_map),
            source_indices=changes.get("source_indices", self.source_indices),
            keep_lcc=False,
        )

    # Fast path: adjacency unchanged — share reference, bypass __post_init__
    inst = object.__new__(SpatialGraph)
    object.__setattr__(inst, "_adjacency_matrix", self._adjacency_matrix)
    inst._nx_cache = self._nx_cache
    inst._degree_cache = self._degree_cache
    inst._mean_shortest_path_cache = self._mean_shortest_path_cache
    inst._shortest_path_dist_cache = self._shortest_path_dist_cache
    inst._n_components_cache = self._n_components_cache
    inst._lcc_fraction_cache = self._lcc_fraction_cache
    inst.positions = changes.get("positions", self.positions)
    inst.reconstructed_positions = changes.get(
        "reconstructed_positions", self.reconstructed_positions
    )
    inst.node_metadata = changes.get("node_metadata", self.node_metadata)
    inst.edge_metadata = changes.get("edge_metadata", self.edge_metadata)
    inst.node_id_map = changes.get("node_id_map", self.node_id_map)
    inst.source_indices = changes.get("source_indices", self.source_indices)
    inst.keep_lcc = False
    return inst

spatial_graph_algorithms.network.to_igraph(sn)

Convert a SpatialGraph to an undirected igraph.Graph.

Parameters:

Name Type Description Default
sn SpatialGraph

Source graph. Only the adjacency matrix is used; metadata is not transferred.

required

Returns:

Type Description
Graph

Undirected unweighted graph with n vertices and the same edge set as sn.adjacency_matrix.

Examples:

>>> from spatial_graph_algorithms.simulate import generate
>>> from spatial_graph_algorithms.network import to_igraph
>>> sn = generate(n=100, seed=0)
>>> g = to_igraph(sn)
>>> g.vcount()
100
Source code in src/spatial_graph_algorithms/network.py
def to_igraph(sn: SpatialGraph) -> igraph.Graph:
    """Convert a SpatialGraph to an undirected igraph.Graph.

    Parameters
    ----------
    sn : SpatialGraph
        Source graph.  Only the adjacency matrix is used; metadata is not
        transferred.

    Returns
    -------
    igraph.Graph
        Undirected unweighted graph with ``n`` vertices and the same edge set as
        *sn.adjacency_matrix*.

    Examples
    --------
    >>> from spatial_graph_algorithms.simulate import generate
    >>> from spatial_graph_algorithms.network import to_igraph
    >>> sn = generate(n=100, seed=0)
    >>> g = to_igraph(sn)
    >>> g.vcount()
    100
    """
    adj = sn._adjacency_matrix.tocoo()
    edges = [(int(r), int(c)) for r, c in zip(adj.row, adj.col) if r < c]
    return igraph.Graph(n=adj.shape[0], edges=edges)