SpatialGraph

The central data container for the entire pipeline. Every function accepts and returns SpatialGraph objects.

Constructor patterns

Method	Use when
`SpatialGraph(adjacency_matrix, ...)`	You already have a scipy sparse matrix
`SpatialGraph.from_edge_list(path_or_df)`	Loading from a CSV edge list
`SpatialGraph.from_positions(array)`	Starting from a point cloud with no edges

Key attributes

Attribute	Type	Description
`adjacency_matrix`	`scipy.sparse.csr_matrix`	Symmetric adjacency, zero diagonal
`positions`	`np.ndarray` or `None`	Ground-truth coordinates `(n, dim)`
`reconstructed_positions`	`np.ndarray` or `None`	Set by `reconstruct()`
`edge_metadata`	`pd.DataFrame` or `None`	Includes `is_false` column when simulated
`node_metadata`	`pd.DataFrame` or `None`	Per-node attributes
`graph`	`nx.Graph`	Lazy NetworkX view (cached)

API Reference

`spatial_graph_algorithms.network.SpatialGraph` `dataclass`

Core graph and coordinate container for spatial network analysis.

All pipeline functions accept and return SpatialGraph objects. The adjacency matrix is stored internally as a CSR sparse matrix with the diagonal forced to zero. When keep_lcc is True (the default), only the largest connected component is retained on construction.

Parameters:

Name	Type	Description	Default
`positions`	`ndarray`	Ground-truth node coordinates, shape `(n_nodes, dim)`.	`None`
`reconstructed_positions`	`ndarray`	Coordinates recovered by a reconstruction algorithm, same shape as positions. Set by :func:`spatial_graph_algorithms.reconstruct.reconstruct`.	`None`
`node_metadata`	`DataFrame`	Per-node attributes. Row i corresponds to node i.	`None`
`edge_metadata`	`DataFrame`	Per-edge attributes with columns `source`, `target`, and optionally `is_false` (bool) when the graph was created by :func:`spatial_graph_algorithms.simulate.generate`.	`None`
`node_id_map`	`dict`	Mapping from original node identifiers to integer indices `0..n-1`. Built automatically when using :meth:`from_edge_list`.	`None`
`keep_lcc`	`bool`	If `True` (default), only the largest connected component is kept on construction. Set to `False` when you know the graph is connected or want to preserve all nodes.	`True`

Attributes:

Name	Type	Description
`adjacency_matrix`	`csr_matrix`	Symmetric, unweighted adjacency matrix. Any format accepted by `scipy.sparse.csr_matrix` is converted automatically. Self-loops are removed silently.
`n_nodes`	`int`	Number of nodes (always cheap — direct from adjacency shape).
`n_edges`	`int`	Number of undirected edges (always cheap — `nnz // 2`).
`edge_density`	`float`	Fraction of all possible edges present.
`degree_sequence`	`ndarray`	Per-node degree array; computed once on first access and cached. Cache is cleared automatically when the adjacency matrix changes.
`mean_degree`	`float`	Mean of :attr:`degree_sequence`.
`degree_distribution`	`ndarray`	Counts per degree value; derived from :attr:`degree_sequence`.
`mean_shortest_path`	`float or None`	Mean pairwise shortest-path length. `None` until populated by a graph-structure computation (see :mod:`spatial_graph_algorithms.metrics`).
`shortest_path_distribution`	`ndarray or None`	Histogram of pairwise shortest-path lengths. `None` until populated alongside :attr:`mean_shortest_path`.

Examples:

Build from a scipy sparse matrix:

>>> import scipy.sparse
>>> adj = scipy.sparse.eye(3, format="csr") * 0  # 3 isolated nodes
>>> sn = SpatialGraph(adjacency_matrix=adj, keep_lcc=False)
>>> sn.adjacency_matrix.shape
(3, 3)

Source code in src/spatial_graph_algorithms/network.py

@dataclass(eq=False)
class SpatialGraph:
    """Core graph and coordinate container for spatial network analysis.

    All pipeline functions accept and return ``SpatialGraph`` objects.  The
    adjacency matrix is stored internally as a CSR sparse matrix with the
    diagonal forced to zero.  When *keep_lcc* is ``True`` (the default), only
    the largest connected component is retained on construction.

    Parameters
    ----------
    positions : np.ndarray, optional
        Ground-truth node coordinates, shape ``(n_nodes, dim)``.
    reconstructed_positions : np.ndarray, optional
        Coordinates recovered by a reconstruction algorithm, same shape as
        *positions*.  Set by :func:`spatial_graph_algorithms.reconstruct.reconstruct`.
    node_metadata : pd.DataFrame, optional
        Per-node attributes.  Row *i* corresponds to node *i*.
    edge_metadata : pd.DataFrame, optional
        Per-edge attributes with columns ``source``, ``target``, and optionally
        ``is_false`` (bool) when the graph was created by
        :func:`spatial_graph_algorithms.simulate.generate`.
    node_id_map : dict, optional
        Mapping from original node identifiers to integer indices ``0..n-1``.
        Built automatically when using :meth:`from_edge_list`.
    keep_lcc : bool
        If ``True`` (default), only the largest connected component is kept on
        construction.  Set to ``False`` when you know the graph is connected or
        want to preserve all nodes.

    Attributes
    ----------
    adjacency_matrix : scipy.sparse.csr_matrix
        Symmetric, unweighted adjacency matrix.  Any format accepted by
        ``scipy.sparse.csr_matrix`` is converted automatically.  Self-loops are
        removed silently.
    n_nodes : int
        Number of nodes (always cheap — direct from adjacency shape).
    n_edges : int
        Number of undirected edges (always cheap — ``nnz // 2``).
    edge_density : float
        Fraction of all possible edges present.
    degree_sequence : np.ndarray
        Per-node degree array; computed once on first access and cached.
        Cache is cleared automatically when the adjacency matrix changes.
    mean_degree : float
        Mean of :attr:`degree_sequence`.
    degree_distribution : np.ndarray
        Counts per degree value; derived from :attr:`degree_sequence`.
    mean_shortest_path : float or None
        Mean pairwise shortest-path length.  ``None`` until populated by a
        graph-structure computation (see :mod:`spatial_graph_algorithms.metrics`).
    shortest_path_distribution : np.ndarray or None
        Histogram of pairwise shortest-path lengths.  ``None`` until populated
        alongside :attr:`mean_shortest_path`.

    Examples
    --------
    Build from a scipy sparse matrix:

    >>> import scipy.sparse
    >>> adj = scipy.sparse.eye(3, format="csr") * 0  # 3 isolated nodes
    >>> sn = SpatialGraph(adjacency_matrix=adj, keep_lcc=False)
    >>> sn.adjacency_matrix.shape
    (3, 3)
    """

    adjacency_matrix: scipy.sparse.csr_matrix
    positions: np.ndarray | None = None
    reconstructed_positions: np.ndarray | None = None
    node_metadata: pd.DataFrame | None = None
    edge_metadata: pd.DataFrame | None = None
    node_id_map: dict[Any, int] | None = None
    source_indices: np.ndarray | None = None
    keep_lcc: bool = True
    _nx_cache: nx.Graph | None = None
    _degree_cache: np.ndarray | None = None
    _mean_shortest_path_cache: float | None = None
    _shortest_path_dist_cache: np.ndarray | None = None
    _n_components_cache: int | None = None
    _lcc_fraction_cache: float | None = None

    def __post_init__(self) -> None:
        # Move to backing field once, then route all future writes via property setter.
        initial_adj = self._normalize_adjacency(self.adjacency_matrix)
        object.__setattr__(self, "_adjacency_matrix", initial_adj)
        self.__dict__.pop("adjacency_matrix", None)

        if self.node_id_map is None:
            n = self._adjacency_matrix.shape[0]
            self.node_id_map = {i: i for i in range(n)}

        if (
            self.positions is not None
            and self.positions.shape[0] != self._adjacency_matrix.shape[0]
        ):
            raise ValueError(
                "positions row count must match adjacency_matrix shape[0] "
                f"({self.positions.shape[0]} != {self._adjacency_matrix.shape[0]})"
            )

        if self.keep_lcc:
            self._extract_lcc()

    @staticmethod
    def _normalize_adjacency(value: scipy.sparse.csr_matrix) -> scipy.sparse.csr_matrix:
        adj = scipy.sparse.csr_matrix(value).copy()
        adj.setdiag(0)
        adj.eliminate_zeros()
        return adj

    @property
    def adjacency_matrix(self) -> scipy.sparse.csr_matrix:
        """CSR adjacency matrix with zero diagonal and no duplicate entries."""
        return self._adjacency_matrix

    @adjacency_matrix.setter
    def adjacency_matrix(self, value: scipy.sparse.csr_matrix) -> None:
        self._adjacency_matrix = self._normalize_adjacency(value)
        self._nx_cache = None
        self._degree_cache = None
        self._mean_shortest_path_cache = None
        self._shortest_path_dist_cache = None
        self._n_components_cache = None
        self._lcc_fraction_cache = None

    @property
    def graph(self) -> nx.Graph:
        """NetworkX undirected graph built lazily from the adjacency matrix.

        The result is cached; assigning a new *adjacency_matrix* invalidates
        the cache automatically.

        Returns
        -------
        nx.Graph
            Undirected graph with integer node labels ``0..n-1``.
        """
        if self._nx_cache is None:
            self._nx_cache = nx.from_scipy_sparse_array(self._adjacency_matrix)
        return self._nx_cache

    @property
    def n_nodes(self) -> int:
        """Number of nodes."""
        return self._adjacency_matrix.shape[0]

    @property
    def n_edges(self) -> int:
        """Number of undirected edges (no self-loops)."""
        return self._adjacency_matrix.nnz // 2

    @property
    def edge_density(self) -> float:
        """Fraction of all possible edges that are present.

        Returns 0.0 for graphs with fewer than 2 nodes.
        """
        n = self.n_nodes
        return 2.0 * self.n_edges / (n * (n - 1)) if n > 1 else 0.0

    @property
    def degree_sequence(self) -> np.ndarray:
        """Degree of every node as a 1-D integer array, computed once and cached.

        Returns
        -------
        np.ndarray
            Shape ``(n_nodes,)``, ``dtype=int``.
        """
        if self._degree_cache is None:
            self._degree_cache = (
                np.asarray(self._adjacency_matrix.sum(axis=1)).ravel().astype(int)
            )
        return self._degree_cache

    @property
    def mean_degree(self) -> float:
        """Mean node degree."""
        return float(self.degree_sequence.mean())

    @property
    def degree_distribution(self) -> np.ndarray:
        """Counts per degree value: index *k* gives the number of degree-*k* nodes.

        Returns
        -------
        np.ndarray
            Shape ``(max_degree + 1,)``, integer counts.
        """
        return np.bincount(self.degree_sequence)

    @property
    def mean_shortest_path(self) -> float | None:
        """Mean pairwise shortest-path length, or ``None`` if not yet computed.

        Populated automatically when a graph-structure computation (e.g. in
        :mod:`spatial_graph_algorithms.metrics`) runs and caches the result, or
        by direct assignment to ``sn._mean_shortest_path_cache``.

        TODO: implement lazy auto-computation via
        ``scipy.sparse.csgraph.shortest_path`` (O(n²)–O(n³); too expensive to
        trigger silently on large graphs).
        """
        return self._mean_shortest_path_cache

    @property
    def shortest_path_distribution(self) -> np.ndarray | None:
        """Histogram of pairwise shortest-path lengths, or ``None`` if not yet computed.

        Index *k* gives the count of node pairs at shortest-path distance *k*.
        Populated alongside :attr:`mean_shortest_path`.

        TODO: implement together with ``mean_shortest_path``.
        """
        return self._shortest_path_dist_cache

    def _extract_lcc(self) -> None:
        n_components, labels = scipy.sparse.csgraph.connected_components(
            self._adjacency_matrix, directed=False
        )
        if n_components <= 1:
            # Cache the result we just computed — free since we already have it.
            self._n_components_cache = int(n_components)
            n = self._adjacency_matrix.shape[0]
            self._lcc_fraction_cache = 1.0 if n > 0 else 0.0
            return

        counts = np.bincount(labels)
        lcc_label = int(np.argmax(counts))
        keep_idx = np.where(labels == lcc_label)[0]

        warnings.warn(
            f"Found {n_components} connected components; "
            f"keeping largest with {len(keep_idx)} nodes.",
            UserWarning,
            stacklevel=2,
        )

        self._adjacency_matrix = self._adjacency_matrix[keep_idx][:, keep_idx].tocsr()
        self._nx_cache = None
        self._degree_cache = None
        self._mean_shortest_path_cache = None
        self._shortest_path_dist_cache = None
        # After extraction the graph is connected by construction.
        self._n_components_cache = 1
        self._lcc_fraction_cache = 1.0

        if self.positions is not None:
            self.positions = self.positions[keep_idx]
        if self.reconstructed_positions is not None:
            self.reconstructed_positions = self.reconstructed_positions[keep_idx]
        if self.node_metadata is not None:
            self.node_metadata = self.node_metadata.iloc[keep_idx].reset_index(drop=True)

        if self.node_id_map is not None:
            inverse = {v: k for k, v in self.node_id_map.items()}
            kept_original = [inverse[int(i)] for i in keep_idx]
            self.node_id_map = {orig: new_i for new_i, orig in enumerate(kept_original)}

        # Compose with any prior filtering so source_indices always points to the
        # root graph, regardless of how many extraction steps have been chained.
        n_before = labels.shape[0]
        base = self.source_indices if self.source_indices is not None else np.arange(n_before)
        self.source_indices = base[keep_idx]

        if (
            self.edge_metadata is not None
            and isinstance(self.edge_metadata, pd.DataFrame)
            and {"source", "target"}.issubset(self.edge_metadata.columns)
        ):
            keep_set = set(keep_idx.tolist())
            mask = (
                self.edge_metadata["source"].isin(keep_set)
                & self.edge_metadata["target"].isin(keep_set)
            )
            remap = {int(old): new for new, old in enumerate(keep_idx)}
            meta = self.edge_metadata.loc[mask].copy()
            meta["source"] = meta["source"].map(remap)
            meta["target"] = meta["target"].map(remap)
            self.edge_metadata = meta.reset_index(drop=True)

    @classmethod
    def from_edge_list(
        cls,
        source: str | pd.DataFrame,
        source_col: str = "source",
        target_col: str = "target",
        **kwargs: Any,
    ) -> SpatialGraph:
        """Construct a SpatialGraph from a CSV edge list or a DataFrame.

        This is the entry point for **observational (real) data**.  Only the
        topology is known, so :attr:`positions`, :attr:`edge_metadata`, and
        :attr:`node_metadata` are all ``None`` after construction.  Use
        :attr:`has_ground_truth` to check this programmatically.

        Node identifiers in the edge list can be arbitrary (strings, integers,
        etc.).  They are remapped to a contiguous integer range ``0..n-1`` and
        the mapping is stored in :attr:`node_id_map`.

        Parameters
        ----------
        source : str or pd.DataFrame
            Path to a CSV file or a DataFrame with at least two columns
            containing source and target node identifiers.
        source_col : str
            Column name for the source node.  Default is ``"source"``.
        target_col : str
            Column name for the target node.  Default is ``"target"``.
        **kwargs
            Additional keyword arguments forwarded to the constructor (e.g.
            ``keep_lcc``).

        Returns
        -------
        SpatialGraph
            A new instance with *adjacency_matrix* and *node_id_map* populated.
            *positions*, *edge_metadata*, and *node_metadata* are all ``None``
            — no ground truth is available.

        Raises
        ------
        KeyError
            If *source_col* or *target_col* are not present in the data.
        ValueError
            If any node identifier is not an integer.  Convert non-integer IDs
            before calling this method, e.g.
            ``df["source"] = df["source"].astype(int)``.

        Examples
        --------
        >>> import pandas as pd
        >>> df = pd.DataFrame({"source": [0, 1, 2], "target": [1, 2, 0]})
        >>> sn = SpatialGraph.from_edge_list(df)
        >>> sn.has_ground_truth
        False
        >>> sn = SpatialGraph.from_edge_list("edges.csv", source_col="u", target_col="v")
        """
        if isinstance(source, str):
            df = pd.read_csv(source)
        else:
            df = source

        all_ids = set(df[source_col].tolist()).union(df[target_col].tolist())
        non_int = [v for v in all_ids if not isinstance(v, (int, np.integer))]
        if non_int:
            sample = non_int[:5]
            raise ValueError(
                f"Node identifiers must be integers. "
                f"Found non-integer values: {sample!r}. "
                "Convert them first, e.g.: "
                f"df['{source_col}'] = df['{source_col}'].astype(int)"
            )

        node_ids = sorted(all_ids)
        id_map = {orig: i for i, orig in enumerate(node_ids)}

        n = len(node_ids)
        rows = np.array([id_map[v] for v in df[source_col]], dtype=int)
        cols = np.array([id_map[v] for v in df[target_col]], dtype=int)
        data = np.ones(len(rows), dtype=float)

        adj = scipy.sparse.csr_matrix(
            (
                np.concatenate([data, data]),
                (np.concatenate([rows, cols]), np.concatenate([cols, rows])),
            ),
            shape=(n, n),
        )

        return cls(adjacency_matrix=adj, node_id_map=id_map, **kwargs)

    @classmethod
    def from_positions(
        cls,
        source: str | pd.DataFrame | np.ndarray,
        **kwargs: Any,
    ) -> SpatialGraph:
        """Construct a SpatialGraph from a point-cloud (no edges).

        The adjacency matrix is initialised as an empty *n × n* matrix.  Use
        :func:`spatial_graph_algorithms.simulate.generate` if you want edges built
        automatically from the positions.

        Parameters
        ----------
        source : str, pd.DataFrame, or np.ndarray
            Node coordinates.  A CSV path or DataFrame is converted to a float
            array; numeric columns are used as spatial dimensions.
        **kwargs
            Additional keyword arguments forwarded to the constructor.  Note
            that *keep_lcc* defaults to ``False`` here because an empty graph
            has no connected components.

        Returns
        -------
        SpatialGraph
            A new instance with *positions* set and an empty adjacency matrix.

        Examples
        --------
        >>> import numpy as np
        >>> pts = np.random.default_rng(0).random((50, 2))
        >>> sn = SpatialGraph.from_positions(pts)
        >>> sn.positions.shape
        (50, 2)
        """
        if isinstance(source, str):
            positions = pd.read_csv(source).values.astype(float)
        elif isinstance(source, pd.DataFrame):
            positions = source.values.astype(float)
        else:
            positions = np.asarray(source, dtype=float)

        n = positions.shape[0]
        adj = scipy.sparse.csr_matrix((n, n))
        id_map = {i: i for i in range(n)}

        keep_lcc = kwargs.pop("keep_lcc", False)
        return cls(
            adjacency_matrix=adj,
            positions=positions,
            node_id_map=id_map,
            keep_lcc=keep_lcc,
            **kwargs,
        )

    def to_edge_dataframe(self) -> pd.DataFrame:
        """Export unique edges as a DataFrame.

        Returns
        -------
        pd.DataFrame
            Two-column DataFrame with integer columns ``source`` and ``target``.
            Only the upper-triangle edges are returned (each edge once).
        """
        coo = self._adjacency_matrix.tocoo()
        mask = coo.row < coo.col
        return pd.DataFrame({"source": coo.row[mask], "target": coo.col[mask]})

    def to_positions_dataframe(self) -> pd.DataFrame:
        """Export node positions as a DataFrame.

        Column names are ``x``, ``y`` for 2-D and ``x``, ``y``, ``z`` for 3-D.
        Higher-dimensional arrays use ``x0``, ``x1``, ... notation.

        Returns
        -------
        pd.DataFrame
            One row per node, one column per spatial dimension.

        Raises
        ------
        ValueError
            If :attr:`positions` is ``None``.
        """
        if self.positions is None:
            raise ValueError("positions is None — nothing to export")

        dim = self.positions.shape[1]
        if dim == 2:
            cols = ["x", "y"]
        elif dim == 3:
            cols = ["x", "y", "z"]
        else:
            cols = [f"x{i}" for i in range(dim)]

        return pd.DataFrame(self.positions, columns=cols)

    @property
    def n_connected_components(self) -> int:
        """Number of connected components in the graph.

        Computed lazily on first access and cached.  For graphs constructed
        with ``keep_lcc=True`` (the default) this is always 1 after
        construction; graphs loaded with ``keep_lcc=False`` may have more.
        """
        if self._n_components_cache is None:
            self._compute_components()
        return self._n_components_cache  # type: ignore[return-value]

    @property
    def largest_component_fraction(self) -> float:
        """Fraction of nodes in the largest connected component.

        1.0 when the graph is fully connected.  Computed together with
        :attr:`n_connected_components` and cached on first access.
        """
        if self._lcc_fraction_cache is None:
            self._compute_components()
        return self._lcc_fraction_cache  # type: ignore[return-value]

    def _compute_components(self) -> None:
        n_components, labels = scipy.sparse.csgraph.connected_components(
            self._adjacency_matrix, directed=False
        )
        self._n_components_cache = int(n_components)
        n = self._adjacency_matrix.shape[0]
        if n > 0 and n_components > 0:
            lcc_size = int(np.bincount(labels).max())
            self._lcc_fraction_cache = lcc_size / n
        else:
            self._lcc_fraction_cache = 0.0

    @property
    def false_edge_fraction(self) -> float | None:
        """Fraction of edges marked as false (injected noise), or ``None``.

        Returns ``None`` when :attr:`edge_metadata` is absent or has no
        ``is_false`` column (e.g. real-data graphs loaded via
        :meth:`from_edge_list`).

        Examples
        --------
        >>> from spatial_graph_algorithms.simulate import generate
        >>> sg = generate(n=100, false_edges_fraction=0.10, seed=0)
        >>> sg.false_edge_fraction > 0
        True
        """
        if self.edge_metadata is None or "is_false" not in self.edge_metadata.columns:
            return None
        return float(self.edge_metadata["is_false"].mean())

    @property
    def has_ground_truth(self) -> bool:
        """True when positions were set at construction (simulated or labelled data).

        Returns ``False`` for graphs loaded from a raw edge list where no
        ground-truth coordinates or false-edge labels are available.  Pipeline
        functions that require ground truth (e.g. :func:`metrics.evaluate` with
        ``positions`` comparison) can check this flag before running.

        Examples
        --------
        >>> from spatial_graph_algorithms.simulate import generate
        >>> sn = generate(n=100, seed=0)
        >>> sn.has_ground_truth
        True
        >>> sn_real = SpatialGraph.from_edge_list(sn.to_edge_dataframe())
        >>> sn_real.has_ground_truth
        False
        """
        return self.positions is not None

    def __repr__(self) -> str:
        gt = self.has_ground_truth
        rec = self.reconstructed_positions is not None
        return (
            f"SpatialGraph(nodes={self.n_nodes}, edges={self.n_edges}, "
            f"has_ground_truth={gt}, reconstructed={rec})"
        )

    def graph_report(
        self,
        *,
        plot: bool = True,
        figsize: tuple = (4.5, 3.0),
        with_all_pairs: bool = False,
        bins: int = 30,
        max_edge_sample: int | None = 50_000,
        max_pair_sample: int = 200_000,
        seed: int | None = None,
    ) -> str:
        """Return a human-readable summary of the graph and optionally show plots.

        When *plot* is ``True``, two figures are displayed:

        - **Degree distribution** — histogram of node degrees with mean and
          median marked.
        - **Edge length histogram** — requires *positions*.  With
          *with_all_pairs* enabled, overlays the all-pairs distance
          distribution as a reference.

        Parameters
        ----------
        plot : bool
            If ``True`` (default), render the degree distribution and, when
            positions are available, the edge-length histogram.
        figsize : tuple of float
            Figure size ``(width, height)`` in inches, applied to both plots.
        with_all_pairs : bool
            When ``True``, overlay the all-pairs distance reference line on
            the edge-length histogram.  Ignored when *plot* is ``False`` or
            positions are unavailable.
        bins : int
            Number of bins for the edge-length histogram.
        max_edge_sample : int or None
            Maximum edges used for the edge-length histogram; larger graphs
            are sampled.  ``None`` disables sampling.
        max_pair_sample : int
            Maximum all-pair distances sampled for the reference line.
        seed : int or None
            Random seed for reproducible sampling.

        Returns
        -------
        str
            Multi-line summary of basic graph statistics.
        """
        n = self._adjacency_matrix.shape[0]
        e = self._adjacency_matrix.nnz // 2
        density = e / (n * (n - 1) / 2) if n > 1 else 0
        gt = self.has_ground_truth
        rec = self.reconstructed_positions is not None
        bipartite = nx.is_bipartite(self.graph)

        if plot:
            from spatial_graph_algorithms.plot import (
                plot_degree_distribution,
                plot_edge_length_histogram,
            )
            plot_degree_distribution(self, figsize=figsize)
            if self.positions is not None:
                plot_edge_length_histogram(
                    self,
                    figsize=figsize,
                    bins=bins,
                    with_all_pairs=with_all_pairs,
                    max_edge_sample=max_edge_sample,
                    max_pair_sample=max_pair_sample,
                    seed=seed,
                )

        return (
            f"SpatialGraph Report:\n"
            f"Nodes: {n}\n"
            f"Edges: {e}\n"
            f"Density: {density:.4f}\n"
            f"Bipartite: {bipartite}\n"
            f"Has Ground Truth: {gt}\n"
            f"Has Reconstructed Positions: {rec}"
        )

    def copy(self) -> SpatialGraph:
        """Return a deep copy of this SpatialGraph.

        The copy is constructed with ``keep_lcc=False`` because LCC extraction
        has already been applied to the source object.

        Returns
        -------
        SpatialGraph
            Independent copy with all arrays and DataFrames deep-copied.
        """
        return SpatialGraph(
            adjacency_matrix=self._adjacency_matrix.copy(),
            positions=None if self.positions is None else self.positions.copy(),
            reconstructed_positions=(
                None
                if self.reconstructed_positions is None
                else self.reconstructed_positions.copy()
            ),
            node_metadata=(
                None if self.node_metadata is None else self.node_metadata.copy(deep=True)
            ),
            edge_metadata=(
                None if self.edge_metadata is None else self.edge_metadata.copy(deep=True)
            ),
            node_id_map=None if self.node_id_map is None else dict(self.node_id_map),
            source_indices=None if self.source_indices is None else self.source_indices.copy(),
            keep_lcc=False,
        )

    def subgraph(self, node_indices: np.ndarray | list[int]) -> SpatialGraph:
        """Return a new SpatialGraph restricted to *node_indices*.

        All metadata fields are sliced consistently and edge indices are
        remapped to the new contiguous range ``0..len(node_indices)-1``.
        This covers both one-off node selection and repeated LCC re-extraction
        after pipeline steps that change graph topology.

        Parameters
        ----------
        node_indices : array-like of int
            Indices of nodes to keep, in any order.  Duplicates are silently
            deduplicated; order is preserved after deduplication.

        Returns
        -------
        SpatialGraph
            New instance with ``keep_lcc=False``.  ``source_indices`` is
            composed with the parent's ``source_indices`` so it always refers
            back to the original graph's node numbering.

        Examples
        --------
        >>> from spatial_graph_algorithms.simulate import generate
        >>> sg = generate(n=100, seed=0)
        >>> sg_sub = sg.subgraph(range(20))
        >>> sg_sub.n_nodes
        20
        """
        keep = np.asarray(list(dict.fromkeys(int(i) for i in node_indices)), dtype=np.intp)
        remap = {int(old): new for new, old in enumerate(keep)}

        new_adj = self._adjacency_matrix[keep][:, keep].tocsr()

        new_positions = None if self.positions is None else self.positions[keep]
        new_rec = (
            None
            if self.reconstructed_positions is None
            else self.reconstructed_positions[keep]
        )
        new_node_meta = (
            None
            if self.node_metadata is None
            else self.node_metadata.iloc[keep].reset_index(drop=True)
        )

        new_edge_meta = None
        if (
            self.edge_metadata is not None
            and isinstance(self.edge_metadata, pd.DataFrame)
            and {"source", "target"}.issubset(self.edge_metadata.columns)
        ):
            keep_set = set(keep.tolist())
            mask = (
                self.edge_metadata["source"].isin(keep_set)
                & self.edge_metadata["target"].isin(keep_set)
            )
            meta = self.edge_metadata.loc[mask].copy()
            meta["source"] = meta["source"].map(remap)
            meta["target"] = meta["target"].map(remap)
            new_edge_meta = meta.reset_index(drop=True)

        new_node_id_map = None
        if self.node_id_map is not None:
            inverse = {v: k for k, v in self.node_id_map.items()}
            new_node_id_map = {
                inverse[int(old)]: new for new, old in enumerate(keep)
            }

        base = self.source_indices if self.source_indices is not None else np.arange(
            self._adjacency_matrix.shape[0]
        )
        new_source_indices = base[keep]

        return SpatialGraph(
            adjacency_matrix=new_adj,
            positions=new_positions,
            reconstructed_positions=new_rec,
            node_metadata=new_node_meta,
            edge_metadata=new_edge_meta,
            node_id_map=new_node_id_map,
            source_indices=new_source_indices,
            keep_lcc=False,
        )

    def lcc_subgraph(self) -> SpatialGraph:
        """Return a new SpatialGraph containing only the largest connected component.

        Unlike the ``keep_lcc=True`` constructor option, this method can be
        called at any point in the pipeline — after denoising, BFS sampling,
        or manual edge filtering — without discarding metadata.

        Returns
        -------
        SpatialGraph
            New instance restricted to the LCC, with all metadata fields
            filtered and remapped consistently.

        Examples
        --------
        >>> from spatial_graph_algorithms.simulate import generate
        >>> sg = generate(n=100, seed=0, keep_lcc=False)
        >>> sg_lcc = sg.lcc_subgraph()
        >>> sg_lcc.n_connected_components
        1
        """
        n_components, labels = scipy.sparse.csgraph.connected_components(
            self._adjacency_matrix, directed=False
        )
        if n_components <= 1:
            return self.copy()
        lcc_label = int(np.argmax(np.bincount(labels)))
        keep_idx = np.where(labels == lcc_label)[0]
        return self.subgraph(keep_idx)

    def replace(self, **changes: Any) -> SpatialGraph:
        """Return a new SpatialGraph with selected fields replaced.

        Unlike :meth:`copy`, the adjacency matrix is shared by reference when
        not in *changes*, making pipeline steps that only attach positions or
        metadata essentially free in terms of adjacency allocation.  The
        NetworkX cache is also shared when the adjacency matrix is unchanged.

        Passing a ``adjacency_matrix`` that is a different object from the
        current one emits :class:`AdjacencyReplaceWarning` to surface the
        allocation cost on large graphs.

        Parameters
        ----------
        **changes
            Fields to override.  Valid keys: ``adjacency_matrix``,
            ``positions``, ``reconstructed_positions``, ``node_metadata``,
            ``edge_metadata``, ``node_id_map``, ``source_indices``.

        Returns
        -------
        SpatialGraph
            New instance.  Shares ``_adjacency_matrix`` and ``_nx_cache``
            with the original when ``adjacency_matrix`` is not in *changes*.

        Warns
        -----
        AdjacencyReplaceWarning
            When *adjacency_matrix* is replaced with a different object.

        Examples
        --------
        >>> sn_rec = sn.replace(reconstructed_positions=coords)
        >>> sn_rec.adjacency_matrix is sn.adjacency_matrix
        True
        """
        new_adj = changes.pop("adjacency_matrix", _MISSING)

        if new_adj is not _MISSING and new_adj is not self._adjacency_matrix:
            warnings.warn(
                f"replace() is allocating a new adjacency_matrix copy "
                f"({self._adjacency_matrix.shape[0]} nodes). "
                "Reuse the existing matrix if the topology has not changed.",
                AdjacencyReplaceWarning,
                stacklevel=2,
            )
            return SpatialGraph(
                adjacency_matrix=new_adj,
                positions=changes.get("positions", self.positions),
                reconstructed_positions=changes.get(
                    "reconstructed_positions", self.reconstructed_positions
                ),
                node_metadata=changes.get("node_metadata", self.node_metadata),
                edge_metadata=changes.get("edge_metadata", self.edge_metadata),
                node_id_map=changes.get("node_id_map", self.node_id_map),
                source_indices=changes.get("source_indices", self.source_indices),
                keep_lcc=False,
            )

        # Fast path: adjacency unchanged — share reference, bypass __post_init__
        inst = object.__new__(SpatialGraph)
        object.__setattr__(inst, "_adjacency_matrix", self._adjacency_matrix)
        inst._nx_cache = self._nx_cache
        inst._degree_cache = self._degree_cache
        inst._mean_shortest_path_cache = self._mean_shortest_path_cache
        inst._shortest_path_dist_cache = self._shortest_path_dist_cache
        inst._n_components_cache = self._n_components_cache
        inst._lcc_fraction_cache = self._lcc_fraction_cache
        inst.positions = changes.get("positions", self.positions)
        inst.reconstructed_positions = changes.get(
            "reconstructed_positions", self.reconstructed_positions
        )
        inst.node_metadata = changes.get("node_metadata", self.node_metadata)
        inst.edge_metadata = changes.get("edge_metadata", self.edge_metadata)
        inst.node_id_map = changes.get("node_id_map", self.node_id_map)
        inst.source_indices = changes.get("source_indices", self.source_indices)
        inst.keep_lcc = False
        return inst

Attributes

`adjacency_matrix` `property` `writable`

CSR adjacency matrix with zero diagonal and no duplicate entries.

`graph` `property`

NetworkX undirected graph built lazily from the adjacency matrix.

The result is cached; assigning a new adjacency_matrix invalidates the cache automatically.

Returns:

Type	Description
`Graph`	Undirected graph with integer node labels `0..n-1`.

`n_nodes` `property`

Number of nodes.

`n_edges` `property`

Number of undirected edges (no self-loops).

`edge_density` `property`

Fraction of all possible edges that are present.

Returns 0.0 for graphs with fewer than 2 nodes.

`degree_sequence` `property`

Degree of every node as a 1-D integer array, computed once and cached.

Returns:

Type	Description
`ndarray`	Shape `(n_nodes,)`, `dtype=int`.

`mean_degree` `property`

Mean node degree.

`degree_distribution` `property`

Counts per degree value: index k gives the number of degree-k nodes.

Returns:

Type	Description
`ndarray`	Shape `(max_degree + 1,)`, integer counts.

`mean_shortest_path` `property`

Mean pairwise shortest-path length, or None if not yet computed.

Populated automatically when a graph-structure computation (e.g. in :mod:spatial_graph_algorithms.metrics) runs and caches the result, or by direct assignment to sn._mean_shortest_path_cache.

TODO: implement lazy auto-computation via scipy.sparse.csgraph.shortest_path (O(n²)–O(n³); too expensive to trigger silently on large graphs).

`shortest_path_distribution` `property`

Histogram of pairwise shortest-path lengths, or None if not yet computed.

Index k gives the count of node pairs at shortest-path distance k. Populated alongside :attr:mean_shortest_path.

TODO: implement together with mean_shortest_path.

`n_connected_components` `property`

Number of connected components in the graph.

Computed lazily on first access and cached. For graphs constructed with keep_lcc=True (the default) this is always 1 after construction; graphs loaded with keep_lcc=False may have more.

`largest_component_fraction` `property`

Fraction of nodes in the largest connected component.

1.0 when the graph is fully connected. Computed together with :attr:n_connected_components and cached on first access.

`false_edge_fraction` `property`

Fraction of edges marked as false (injected noise), or None.

Returns None when :attr:edge_metadata is absent or has no is_false column (e.g. real-data graphs loaded via :meth:from_edge_list).

Examples:

>>> from spatial_graph_algorithms.simulate import generate
>>> sg = generate(n=100, false_edges_fraction=0.10, seed=0)
>>> sg.false_edge_fraction > 0
True

`has_ground_truth` `property`

True when positions were set at construction (simulated or labelled data).

Returns False for graphs loaded from a raw edge list where no ground-truth coordinates or false-edge labels are available. Pipeline functions that require ground truth (e.g. :func:metrics.evaluate with positions comparison) can check this flag before running.

Examples:

>>> from spatial_graph_algorithms.simulate import generate
>>> sn = generate(n=100, seed=0)
>>> sn.has_ground_truth
True
>>> sn_real = SpatialGraph.from_edge_list(sn.to_edge_dataframe())
>>> sn_real.has_ground_truth
False

Functions

`from_edge_list(source, source_col='source', target_col='target', **kwargs)` `classmethod`

Construct a SpatialGraph from a CSV edge list or a DataFrame.

This is the entry point for observational (real) data. Only the topology is known, so :attr:positions, :attr:edge_metadata, and :attr:node_metadata are all None after construction. Use :attr:has_ground_truth to check this programmatically.

Node identifiers in the edge list can be arbitrary (strings, integers, etc.). They are remapped to a contiguous integer range 0..n-1 and the mapping is stored in :attr:node_id_map.

Parameters:

Name	Type	Description	Default
`source`	`str or DataFrame`	Path to a CSV file or a DataFrame with at least two columns containing source and target node identifiers.	required
`source_col`	`str`	Column name for the source node. Default is `"source"`.	`'source'`
`target_col`	`str`	Column name for the target node. Default is `"target"`.	`'target'`
`**kwargs`	`Any`	Additional keyword arguments forwarded to the constructor (e.g. `keep_lcc`).	`{}`

Returns:

Type	Description
`SpatialGraph`	A new instance with adjacency_matrix and node_id_map populated. positions, edge_metadata, and node_metadata are all `None` — no ground truth is available.

Raises:

Type	Description
`KeyError`	If source_col or target_col are not present in the data.
`ValueError`	If any node identifier is not an integer. Convert non-integer IDs before calling this method, e.g. `df["source"] = df["source"].astype(int)`.

Examples:

>>> import pandas as pd
>>> df = pd.DataFrame({"source": [0, 1, 2], "target": [1, 2, 0]})
>>> sn = SpatialGraph.from_edge_list(df)
>>> sn.has_ground_truth
False
>>> sn = SpatialGraph.from_edge_list("edges.csv", source_col="u", target_col="v")

Source code in src/spatial_graph_algorithms/network.py

@classmethod
def from_edge_list(
    cls,
    source: str | pd.DataFrame,
    source_col: str = "source",
    target_col: str = "target",
    **kwargs: Any,
) -> SpatialGraph:
    """Construct a SpatialGraph from a CSV edge list or a DataFrame.

    This is the entry point for **observational (real) data**.  Only the
    topology is known, so :attr:`positions`, :attr:`edge_metadata`, and
    :attr:`node_metadata` are all ``None`` after construction.  Use
    :attr:`has_ground_truth` to check this programmatically.

    Node identifiers in the edge list can be arbitrary (strings, integers,
    etc.).  They are remapped to a contiguous integer range ``0..n-1`` and
    the mapping is stored in :attr:`node_id_map`.

    Parameters
    ----------
    source : str or pd.DataFrame
        Path to a CSV file or a DataFrame with at least two columns
        containing source and target node identifiers.
    source_col : str
        Column name for the source node.  Default is ``"source"``.
    target_col : str
        Column name for the target node.  Default is ``"target"``.
    **kwargs
        Additional keyword arguments forwarded to the constructor (e.g.
        ``keep_lcc``).

    Returns
    -------
    SpatialGraph
        A new instance with *adjacency_matrix* and *node_id_map* populated.
        *positions*, *edge_metadata*, and *node_metadata* are all ``None``
        — no ground truth is available.

    Raises
    ------
    KeyError
        If *source_col* or *target_col* are not present in the data.
    ValueError
        If any node identifier is not an integer.  Convert non-integer IDs
        before calling this method, e.g.
        ``df["source"] = df["source"].astype(int)``.

    Examples
    --------
    >>> import pandas as pd
    >>> df = pd.DataFrame({"source": [0, 1, 2], "target": [1, 2, 0]})
    >>> sn = SpatialGraph.from_edge_list(df)
    >>> sn.has_ground_truth
    False
    >>> sn = SpatialGraph.from_edge_list("edges.csv", source_col="u", target_col="v")
    """
    if isinstance(source, str):
        df = pd.read_csv(source)
    else:
        df = source

    all_ids = set(df[source_col].tolist()).union(df[target_col].tolist())
    non_int = [v for v in all_ids if not isinstance(v, (int, np.integer))]
    if non_int:
        sample = non_int[:5]
        raise ValueError(
            f"Node identifiers must be integers. "
            f"Found non-integer values: {sample!r}. "
            "Convert them first, e.g.: "
            f"df['{source_col}'] = df['{source_col}'].astype(int)"
        )

    node_ids = sorted(all_ids)
    id_map = {orig: i for i, orig in enumerate(node_ids)}

    n = len(node_ids)
    rows = np.array([id_map[v] for v in df[source_col]], dtype=int)
    cols = np.array([id_map[v] for v in df[target_col]], dtype=int)
    data = np.ones(len(rows), dtype=float)

    adj = scipy.sparse.csr_matrix(
        (
            np.concatenate([data, data]),
            (np.concatenate([rows, cols]), np.concatenate([cols, rows])),
        ),
        shape=(n, n),
    )

    return cls(adjacency_matrix=adj, node_id_map=id_map, **kwargs)

`from_positions(source, **kwargs)` `classmethod`

Construct a SpatialGraph from a point-cloud (no edges).

The adjacency matrix is initialised as an empty n × n matrix. Use :func:spatial_graph_algorithms.simulate.generate if you want edges built automatically from the positions.

Parameters:

Name	Type	Description	Default
`source`	`str, pd.DataFrame, or np.ndarray`	Node coordinates. A CSV path or DataFrame is converted to a float array; numeric columns are used as spatial dimensions.	required
`**kwargs`	`Any`	Additional keyword arguments forwarded to the constructor. Note that keep_lcc defaults to `False` here because an empty graph has no connected components.	`{}`

Returns:

Type	Description
`SpatialGraph`	A new instance with positions set and an empty adjacency matrix.

Examples:

>>> import numpy as np
>>> pts = np.random.default_rng(0).random((50, 2))
>>> sn = SpatialGraph.from_positions(pts)
>>> sn.positions.shape
(50, 2)

Source code in src/spatial_graph_algorithms/network.py

@classmethod
def from_positions(
    cls,
    source: str | pd.DataFrame | np.ndarray,
    **kwargs: Any,
) -> SpatialGraph:
    """Construct a SpatialGraph from a point-cloud (no edges).

    The adjacency matrix is initialised as an empty *n × n* matrix.  Use
    :func:`spatial_graph_algorithms.simulate.generate` if you want edges built
    automatically from the positions.

    Parameters
    ----------
    source : str, pd.DataFrame, or np.ndarray
        Node coordinates.  A CSV path or DataFrame is converted to a float
        array; numeric columns are used as spatial dimensions.
    **kwargs
        Additional keyword arguments forwarded to the constructor.  Note
        that *keep_lcc* defaults to ``False`` here because an empty graph
        has no connected components.

    Returns
    -------
    SpatialGraph
        A new instance with *positions* set and an empty adjacency matrix.

    Examples
    --------
    >>> import numpy as np
    >>> pts = np.random.default_rng(0).random((50, 2))
    >>> sn = SpatialGraph.from_positions(pts)
    >>> sn.positions.shape
    (50, 2)
    """
    if isinstance(source, str):
        positions = pd.read_csv(source).values.astype(float)
    elif isinstance(source, pd.DataFrame):
        positions = source.values.astype(float)
    else:
        positions = np.asarray(source, dtype=float)

    n = positions.shape[0]
    adj = scipy.sparse.csr_matrix((n, n))
    id_map = {i: i for i in range(n)}

    keep_lcc = kwargs.pop("keep_lcc", False)
    return cls(
        adjacency_matrix=adj,
        positions=positions,
        node_id_map=id_map,
        keep_lcc=keep_lcc,
        **kwargs,
    )

`to_edge_dataframe()`

Export unique edges as a DataFrame.

Returns:

Type	Description
`DataFrame`	Two-column DataFrame with integer columns `source` and `target`. Only the upper-triangle edges are returned (each edge once).

Source code in src/spatial_graph_algorithms/network.py

def to_edge_dataframe(self) -> pd.DataFrame:
    """Export unique edges as a DataFrame.

    Returns
    -------
    pd.DataFrame
        Two-column DataFrame with integer columns ``source`` and ``target``.
        Only the upper-triangle edges are returned (each edge once).
    """
    coo = self._adjacency_matrix.tocoo()
    mask = coo.row < coo.col
    return pd.DataFrame({"source": coo.row[mask], "target": coo.col[mask]})

`to_positions_dataframe()`

Export node positions as a DataFrame.

Column names are x, y for 2-D and x, y, z for 3-D. Higher-dimensional arrays use x0, x1, ... notation.

Returns:

Type	Description
`DataFrame`	One row per node, one column per spatial dimension.

Raises:

Type	Description
`ValueError`	If :attr:`positions` is `None`.

Source code in src/spatial_graph_algorithms/network.py

def to_positions_dataframe(self) -> pd.DataFrame:
    """Export node positions as a DataFrame.

    Column names are ``x``, ``y`` for 2-D and ``x``, ``y``, ``z`` for 3-D.
    Higher-dimensional arrays use ``x0``, ``x1``, ... notation.

    Returns
    -------
    pd.DataFrame
        One row per node, one column per spatial dimension.

    Raises
    ------
    ValueError
        If :attr:`positions` is ``None``.
    """
    if self.positions is None:
        raise ValueError("positions is None — nothing to export")

    dim = self.positions.shape[1]
    if dim == 2:
        cols = ["x", "y"]
    elif dim == 3:
        cols = ["x", "y", "z"]
    else:
        cols = [f"x{i}" for i in range(dim)]

    return pd.DataFrame(self.positions, columns=cols)

`graph_report(*, plot=True, figsize=(4.5, 3.0), with_all_pairs=False, bins=30, max_edge_sample=50000, max_pair_sample=200000, seed=None)`

Return a human-readable summary of the graph and optionally show plots.

When plot is True, two figures are displayed:

Degree distribution — histogram of node degrees with mean and median marked.
Edge length histogram — requires positions. With with_all_pairs enabled, overlays the all-pairs distance distribution as a reference.

Parameters:

Name	Type	Description	Default
`plot`	`bool`	If `True` (default), render the degree distribution and, when positions are available, the edge-length histogram.	`True`
`figsize`	`tuple of float`	Figure size `(width, height)` in inches, applied to both plots.	`(4.5, 3.0)`
`with_all_pairs`	`bool`	When `True`, overlay the all-pairs distance reference line on the edge-length histogram. Ignored when plot is `False` or positions are unavailable.	`False`
`bins`	`int`	Number of bins for the edge-length histogram.	`30`
`max_edge_sample`	`int or None`	Maximum edges used for the edge-length histogram; larger graphs are sampled. `None` disables sampling.	`50000`
`max_pair_sample`	`int`	Maximum all-pair distances sampled for the reference line.	`200000`
`seed`	`int or None`	Random seed for reproducible sampling.	`None`

Returns:

Type	Description
`str`	Multi-line summary of basic graph statistics.

Source code in src/spatial_graph_algorithms/network.py

def graph_report(
    self,
    *,
    plot: bool = True,
    figsize: tuple = (4.5, 3.0),
    with_all_pairs: bool = False,
    bins: int = 30,
    max_edge_sample: int | None = 50_000,
    max_pair_sample: int = 200_000,
    seed: int | None = None,
) -> str:
    """Return a human-readable summary of the graph and optionally show plots.

    When *plot* is ``True``, two figures are displayed:

    - **Degree distribution** — histogram of node degrees with mean and
      median marked.
    - **Edge length histogram** — requires *positions*.  With
      *with_all_pairs* enabled, overlays the all-pairs distance
      distribution as a reference.

    Parameters
    ----------
    plot : bool
        If ``True`` (default), render the degree distribution and, when
        positions are available, the edge-length histogram.
    figsize : tuple of float
        Figure size ``(width, height)`` in inches, applied to both plots.
    with_all_pairs : bool
        When ``True``, overlay the all-pairs distance reference line on
        the edge-length histogram.  Ignored when *plot* is ``False`` or
        positions are unavailable.
    bins : int
        Number of bins for the edge-length histogram.
    max_edge_sample : int or None
        Maximum edges used for the edge-length histogram; larger graphs
        are sampled.  ``None`` disables sampling.
    max_pair_sample : int
        Maximum all-pair distances sampled for the reference line.
    seed : int or None
        Random seed for reproducible sampling.

    Returns
    -------
    str
        Multi-line summary of basic graph statistics.
    """
    n = self._adjacency_matrix.shape[0]
    e = self._adjacency_matrix.nnz // 2
    density = e / (n * (n - 1) / 2) if n > 1 else 0
    gt = self.has_ground_truth
    rec = self.reconstructed_positions is not None
    bipartite = nx.is_bipartite(self.graph)

    if plot:
        from spatial_graph_algorithms.plot import (
            plot_degree_distribution,
            plot_edge_length_histogram,
        )
        plot_degree_distribution(self, figsize=figsize)
        if self.positions is not None:
            plot_edge_length_histogram(
                self,
                figsize=figsize,
                bins=bins,
                with_all_pairs=with_all_pairs,
                max_edge_sample=max_edge_sample,
                max_pair_sample=max_pair_sample,
                seed=seed,
            )

    return (
        f"SpatialGraph Report:\n"
        f"Nodes: {n}\n"
        f"Edges: {e}\n"
        f"Density: {density:.4f}\n"
        f"Bipartite: {bipartite}\n"
        f"Has Ground Truth: {gt}\n"
        f"Has Reconstructed Positions: {rec}"
    )

`copy()`

Return a deep copy of this SpatialGraph.

The copy is constructed with keep_lcc=False because LCC extraction has already been applied to the source object.

Returns:

Type	Description
`SpatialGraph`	Independent copy with all arrays and DataFrames deep-copied.

Source code in src/spatial_graph_algorithms/network.py

def copy(self) -> SpatialGraph:
    """Return a deep copy of this SpatialGraph.

    The copy is constructed with ``keep_lcc=False`` because LCC extraction
    has already been applied to the source object.

    Returns
    -------
    SpatialGraph
        Independent copy with all arrays and DataFrames deep-copied.
    """
    return SpatialGraph(
        adjacency_matrix=self._adjacency_matrix.copy(),
        positions=None if self.positions is None else self.positions.copy(),
        reconstructed_positions=(
            None
            if self.reconstructed_positions is None
            else self.reconstructed_positions.copy()
        ),
        node_metadata=(
            None if self.node_metadata is None else self.node_metadata.copy(deep=True)
        ),
        edge_metadata=(
            None if self.edge_metadata is None else self.edge_metadata.copy(deep=True)
        ),
        node_id_map=None if self.node_id_map is None else dict(self.node_id_map),
        source_indices=None if self.source_indices is None else self.source_indices.copy(),
        keep_lcc=False,
    )

`subgraph(node_indices)`

Return a new SpatialGraph restricted to node_indices.

All metadata fields are sliced consistently and edge indices are remapped to the new contiguous range 0..len(node_indices)-1. This covers both one-off node selection and repeated LCC re-extraction after pipeline steps that change graph topology.

Parameters:

Name	Type	Description	Default
`node_indices`	`array-like of int`	Indices of nodes to keep, in any order. Duplicates are silently deduplicated; order is preserved after deduplication.	required

Returns:

Type	Description
`SpatialGraph`	New instance with `keep_lcc=False`. `source_indices` is composed with the parent's `source_indices` so it always refers back to the original graph's node numbering.

Examples:

>>> from spatial_graph_algorithms.simulate import generate
>>> sg = generate(n=100, seed=0)
>>> sg_sub = sg.subgraph(range(20))
>>> sg_sub.n_nodes
20

Source code in src/spatial_graph_algorithms/network.py

def subgraph(self, node_indices: np.ndarray | list[int]) -> SpatialGraph:
    """Return a new SpatialGraph restricted to *node_indices*.

    All metadata fields are sliced consistently and edge indices are
    remapped to the new contiguous range ``0..len(node_indices)-1``.
    This covers both one-off node selection and repeated LCC re-extraction
    after pipeline steps that change graph topology.

    Parameters
    ----------
    node_indices : array-like of int
        Indices of nodes to keep, in any order.  Duplicates are silently
        deduplicated; order is preserved after deduplication.

    Returns
    -------
    SpatialGraph
        New instance with ``keep_lcc=False``.  ``source_indices`` is
        composed with the parent's ``source_indices`` so it always refers
        back to the original graph's node numbering.

    Examples
    --------
    >>> from spatial_graph_algorithms.simulate import generate
    >>> sg = generate(n=100, seed=0)
    >>> sg_sub = sg.subgraph(range(20))
    >>> sg_sub.n_nodes
    20
    """
    keep = np.asarray(list(dict.fromkeys(int(i) for i in node_indices)), dtype=np.intp)
    remap = {int(old): new for new, old in enumerate(keep)}

    new_adj = self._adjacency_matrix[keep][:, keep].tocsr()

    new_positions = None if self.positions is None else self.positions[keep]
    new_rec = (
        None
        if self.reconstructed_positions is None
        else self.reconstructed_positions[keep]
    )
    new_node_meta = (
        None
        if self.node_metadata is None
        else self.node_metadata.iloc[keep].reset_index(drop=True)
    )

    new_edge_meta = None
    if (
        self.edge_metadata is not None
        and isinstance(self.edge_metadata, pd.DataFrame)
        and {"source", "target"}.issubset(self.edge_metadata.columns)
    ):
        keep_set = set(keep.tolist())
        mask = (
            self.edge_metadata["source"].isin(keep_set)
            & self.edge_metadata["target"].isin(keep_set)
        )
        meta = self.edge_metadata.loc[mask].copy()
        meta["source"] = meta["source"].map(remap)
        meta["target"] = meta["target"].map(remap)
        new_edge_meta = meta.reset_index(drop=True)

    new_node_id_map = None
    if self.node_id_map is not None:
        inverse = {v: k for k, v in self.node_id_map.items()}
        new_node_id_map = {
            inverse[int(old)]: new for new, old in enumerate(keep)
        }

    base = self.source_indices if self.source_indices is not None else np.arange(
        self._adjacency_matrix.shape[0]
    )
    new_source_indices = base[keep]

    return SpatialGraph(
        adjacency_matrix=new_adj,
        positions=new_positions,
        reconstructed_positions=new_rec,
        node_metadata=new_node_meta,
        edge_metadata=new_edge_meta,
        node_id_map=new_node_id_map,
        source_indices=new_source_indices,
        keep_lcc=False,
    )

`lcc_subgraph()`

Return a new SpatialGraph containing only the largest connected component.

Unlike the keep_lcc=True constructor option, this method can be called at any point in the pipeline — after denoising, BFS sampling, or manual edge filtering — without discarding metadata.

Returns:

Type	Description
`SpatialGraph`	New instance restricted to the LCC, with all metadata fields filtered and remapped consistently.

Examples:

>>> from spatial_graph_algorithms.simulate import generate
>>> sg = generate(n=100, seed=0, keep_lcc=False)
>>> sg_lcc = sg.lcc_subgraph()
>>> sg_lcc.n_connected_components
1

Source code in src/spatial_graph_algorithms/network.py

def lcc_subgraph(self) -> SpatialGraph:
    """Return a new SpatialGraph containing only the largest connected component.

    Unlike the ``keep_lcc=True`` constructor option, this method can be
    called at any point in the pipeline — after denoising, BFS sampling,
    or manual edge filtering — without discarding metadata.

    Returns
    -------
    SpatialGraph
        New instance restricted to the LCC, with all metadata fields
        filtered and remapped consistently.

    Examples
    --------
    >>> from spatial_graph_algorithms.simulate import generate
    >>> sg = generate(n=100, seed=0, keep_lcc=False)
    >>> sg_lcc = sg.lcc_subgraph()
    >>> sg_lcc.n_connected_components
    1
    """
    n_components, labels = scipy.sparse.csgraph.connected_components(
        self._adjacency_matrix, directed=False
    )
    if n_components <= 1:
        return self.copy()
    lcc_label = int(np.argmax(np.bincount(labels)))
    keep_idx = np.where(labels == lcc_label)[0]
    return self.subgraph(keep_idx)

`replace(**changes)`

Return a new SpatialGraph with selected fields replaced.

Unlike :meth:copy, the adjacency matrix is shared by reference when not in changes, making pipeline steps that only attach positions or metadata essentially free in terms of adjacency allocation. The NetworkX cache is also shared when the adjacency matrix is unchanged.

Passing a adjacency_matrix that is a different object from the current one emits :class:AdjacencyReplaceWarning to surface the allocation cost on large graphs.

Parameters:

Name	Type	Description	Default
`**changes`	`Any`	Fields to override. Valid keys: `adjacency_matrix`, `positions`, `reconstructed_positions`, `node_metadata`, `edge_metadata`, `node_id_map`, `source_indices`.	`{}`

Returns:

Type	Description
`SpatialGraph`	New instance. Shares `_adjacency_matrix` and `_nx_cache` with the original when `adjacency_matrix` is not in changes.

Warns:

Type	Description
`AdjacencyReplaceWarning`	When adjacency_matrix is replaced with a different object.

Examples:

>>> sn_rec = sn.replace(reconstructed_positions=coords)
>>> sn_rec.adjacency_matrix is sn.adjacency_matrix
True

Source code in src/spatial_graph_algorithms/network.py

def replace(self, **changes: Any) -> SpatialGraph:
    """Return a new SpatialGraph with selected fields replaced.

    Unlike :meth:`copy`, the adjacency matrix is shared by reference when
    not in *changes*, making pipeline steps that only attach positions or
    metadata essentially free in terms of adjacency allocation.  The
    NetworkX cache is also shared when the adjacency matrix is unchanged.

    Passing a ``adjacency_matrix`` that is a different object from the
    current one emits :class:`AdjacencyReplaceWarning` to surface the
    allocation cost on large graphs.

    Parameters
    ----------
    **changes
        Fields to override.  Valid keys: ``adjacency_matrix``,
        ``positions``, ``reconstructed_positions``, ``node_metadata``,
        ``edge_metadata``, ``node_id_map``, ``source_indices``.

    Returns
    -------
    SpatialGraph
        New instance.  Shares ``_adjacency_matrix`` and ``_nx_cache``
        with the original when ``adjacency_matrix`` is not in *changes*.

    Warns
    -----
    AdjacencyReplaceWarning
        When *adjacency_matrix* is replaced with a different object.

    Examples
    --------
    >>> sn_rec = sn.replace(reconstructed_positions=coords)
    >>> sn_rec.adjacency_matrix is sn.adjacency_matrix
    True
    """
    new_adj = changes.pop("adjacency_matrix", _MISSING)

    if new_adj is not _MISSING and new_adj is not self._adjacency_matrix:
        warnings.warn(
            f"replace() is allocating a new adjacency_matrix copy "
            f"({self._adjacency_matrix.shape[0]} nodes). "
            "Reuse the existing matrix if the topology has not changed.",
            AdjacencyReplaceWarning,
            stacklevel=2,
        )
        return SpatialGraph(
            adjacency_matrix=new_adj,
            positions=changes.get("positions", self.positions),
            reconstructed_positions=changes.get(
                "reconstructed_positions", self.reconstructed_positions
            ),
            node_metadata=changes.get("node_metadata", self.node_metadata),
            edge_metadata=changes.get("edge_metadata", self.edge_metadata),
            node_id_map=changes.get("node_id_map", self.node_id_map),
            source_indices=changes.get("source_indices", self.source_indices),
            keep_lcc=False,
        )

    # Fast path: adjacency unchanged — share reference, bypass __post_init__
    inst = object.__new__(SpatialGraph)
    object.__setattr__(inst, "_adjacency_matrix", self._adjacency_matrix)
    inst._nx_cache = self._nx_cache
    inst._degree_cache = self._degree_cache
    inst._mean_shortest_path_cache = self._mean_shortest_path_cache
    inst._shortest_path_dist_cache = self._shortest_path_dist_cache
    inst._n_components_cache = self._n_components_cache
    inst._lcc_fraction_cache = self._lcc_fraction_cache
    inst.positions = changes.get("positions", self.positions)
    inst.reconstructed_positions = changes.get(
        "reconstructed_positions", self.reconstructed_positions
    )
    inst.node_metadata = changes.get("node_metadata", self.node_metadata)
    inst.edge_metadata = changes.get("edge_metadata", self.edge_metadata)
    inst.node_id_map = changes.get("node_id_map", self.node_id_map)
    inst.source_indices = changes.get("source_indices", self.source_indices)
    inst.keep_lcc = False
    return inst

`spatial_graph_algorithms.network.to_igraph(sn)`

Convert a SpatialGraph to an undirected igraph.Graph.

Parameters:

Name	Type	Description	Default
`sn`	`SpatialGraph`	Source graph. Only the adjacency matrix is used; metadata is not transferred.	required

Returns:

Type	Description
`Graph`	Undirected unweighted graph with `n` vertices and the same edge set as sn.adjacency_matrix.

Examples:

>>> from spatial_graph_algorithms.simulate import generate
>>> from spatial_graph_algorithms.network import to_igraph
>>> sn = generate(n=100, seed=0)
>>> g = to_igraph(sn)
>>> g.vcount()
100

Source code in src/spatial_graph_algorithms/network.py

def to_igraph(sn: SpatialGraph) -> igraph.Graph:
    """Convert a SpatialGraph to an undirected igraph.Graph.

    Parameters
    ----------
    sn : SpatialGraph
        Source graph.  Only the adjacency matrix is used; metadata is not
        transferred.

    Returns
    -------
    igraph.Graph
        Undirected unweighted graph with ``n`` vertices and the same edge set as
        *sn.adjacency_matrix*.

    Examples
    --------
    >>> from spatial_graph_algorithms.simulate import generate
    >>> from spatial_graph_algorithms.network import to_igraph
    >>> sn = generate(n=100, seed=0)
    >>> g = to_igraph(sn)
    >>> g.vcount()
    100
    """
    adj = sn._adjacency_matrix.tocoo()
    edges = [(int(r), int(c)) for r, c in zip(adj.row, adj.col) if r < c]
    return igraph.Graph(n=adj.shape[0], edges=edges)

SpatialGraph

Constructor patterns

Key attributes

API Reference

spatial_graph_algorithms.network.SpatialGraph dataclass

Attributes

adjacency_matrix property writable

graph property

n_nodes property

n_edges property

edge_density property

degree_sequence property

mean_degree property

degree_distribution property

mean_shortest_path property

shortest_path_distribution property

n_connected_components property

largest_component_fraction property

false_edge_fraction property

has_ground_truth property

Functions

from_edge_list(source, source_col='source', target_col='target', **kwargs) classmethod

from_positions(source, **kwargs) classmethod

to_edge_dataframe()

to_positions_dataframe()

graph_report(*, plot=True, figsize=(4.5, 3.0), with_all_pairs=False, bins=30, max_edge_sample=50000, max_pair_sample=200000, seed=None)

copy()

subgraph(node_indices)

lcc_subgraph()

replace(**changes)

spatial_graph_algorithms.network.to_igraph(sn)

`spatial_graph_algorithms.network.SpatialGraph` `dataclass`

`adjacency_matrix` `property` `writable`

`graph` `property`

`n_nodes` `property`

`n_edges` `property`

`edge_density` `property`

`degree_sequence` `property`

`mean_degree` `property`

`degree_distribution` `property`

`mean_shortest_path` `property`

`shortest_path_distribution` `property`

`n_connected_components` `property`

`largest_component_fraction` `property`

`false_edge_fraction` `property`

`has_ground_truth` `property`

`from_edge_list(source, source_col='source', target_col='target', **kwargs)` `classmethod`

`from_positions(source, **kwargs)` `classmethod`

`to_edge_dataframe()`

`to_positions_dataframe()`

`graph_report(*, plot=True, figsize=(4.5, 3.0), with_all_pairs=False, bins=30, max_edge_sample=50000, max_pair_sample=200000, seed=None)`

`copy()`

`subgraph(node_indices)`

`lcc_subgraph()`

`replace(**changes)`

`spatial_graph_algorithms.network.to_igraph(sn)`