Collaboration Guide

This guide defines how contributors should change spatial_graph_algorithms without breaking the package, slowing it down, or making the scientific behavior ambiguous. CONTRIBUTING.md explains setup and basic PR mechanics; this document defines the engineering contract for collaborative work.

Core Principles

Keep changes small enough to review. One PR should solve one user-visible problem.
Preserve the public API unless the PR explicitly declares a breaking change.
Make behavior measurable. New code needs validation, tests, and clear failure modes.
Prefer boring implementation over clever implementation unless performance requires otherwise.
Document assumptions when scientific or algorithmic choices affect results.
Do not add dependencies, global state, randomness, or expensive defaults casually.

Standard Workflow

Open or reference an issue before starting non-trivial work.
Create a branch from main using feat/<topic>, fix/<topic>, docs/<topic>, or refactor/<topic>.
Install locally with pip install -e ".[all]".
Make the smallest coherent change.
Add or update tests in tests/.
Run local validation before opening a PR.
Open a PR with a clear description, test evidence, and known limitations.

Minimum local checks before review:

ruff check .
pytest

Use targeted checks while developing:

pytest tests/test_simulate.py
pytest -k "reconstruct and mds"
pytest --tb=short -x

Function Design

Every new function should have one clear responsibility. If a function validates input, transforms data, computes an algorithm, and formats output, split it.

Public functions must:

Use explicit type hints for all parameters and return values.
Have a concise docstring using the project style: NumPy-style when parameters need explanation, otherwise a one-line summary is enough.
Avoid mutating inputs unless the function name or docstring makes mutation explicit.
Accept a seed or random_state parameter when randomness is involved, so results are reproducible.
Validate inputs at the boundary of the public API.
Return project-native objects where appropriate, especially SpatialGraph.
Avoid hidden I/O, network access, plotting side effects, and global configuration changes.
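A minimal sketch of the randomness and no-mutation rules above, assuming NumPy is available. `sample_node_subset` and its parameters are hypothetical and not part of the package API. `numpy.random.default_rng` accepts an integer seed, an existing `Generator`, or `None`, so a single `random_state` parameter can cover all three cases.

```python
from __future__ import annotations

import numpy as np


def sample_node_subset(
    node_ids: np.ndarray,
    *,
    fraction: float = 0.5,
    random_state: int | np.random.Generator | None = None,
) -> np.ndarray:
    """Hypothetical example: return a random subset of node ids without modifying the input."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError(f"fraction must be in [0, 1], got {fraction}")
    # default_rng accepts an int seed, an existing Generator (returned as-is), or None.
    rng = np.random.default_rng(random_state)
    n_keep = int(round(fraction * len(node_ids)))
    # choice(..., replace=False) returns a new array; node_ids is never shuffled in place.
    return rng.choice(node_ids, size=n_keep, replace=False)
```

Passing the same integer seed twice returns the same subset, and the caller's array keeps its original order.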

Prefer this shape:

def compute_score(
    graph: SpatialGraph,
    *,
    k: int = 10,
    normalize: bool = True,
) -> float:
    """Compute a bounded graph quality score."""
    if k <= 0:
        raise ValueError("k must be positive")
    ...

Avoid:

def do_stuff(data, options={}):
    ...

Reasons:

The type of data and the return type are unclear.
Mutable defaults can leak state.
The function name does not say what contract it provides.
Validation requirements are invisible.
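The mutable-default problem is easy to reproduce. In this sketch, the function names are illustrative only. The default list in `collect_bad` is created once, when the function is defined, so every call shares it. `collect_good` uses `None` as the sentinel and copies any list it is given.

```python
from __future__ import annotations


def collect_bad(item, seen=[]):
    # The default list is created once, at definition time, so every call
    # that omits `seen` appends to the same shared list.
    seen.append(item)
    return seen


def collect_good(item: str, seen: list[str] | None = None) -> list[str]:
    """Return a list ending with ``item``, using a fresh list on every call."""
    seen = [] if seen is None else list(seen)  # copy: never mutate the caller's list
    seen.append(item)
    return seen


print(collect_bad("a"), collect_bad("b"))    # prints ['a', 'b'] ['a', 'b']
print(collect_good("a"), collect_good("b"))  # prints ['a'] ['b']
```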

Public API Rules

The public API includes documented modules, exported names in __init__.py, and behavior shown in examples or docs. Changing any of these requires extra care.

For public API changes:

Add or update documentation in docs/.
Add a test that covers the documented behavior.
Keep backward compatibility when reasonable.
If compatibility is impossible, document the migration path in the PR.
Do not remove parameters without a deprecation period unless the project is still explicitly treating that API as experimental.
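One way to handle a renamed parameter during a deprecation period, sketched with the standard library `warnings` module. `build_knn_graph`, `k`, and `n_neighbors` are hypothetical names used only for illustration.

```python
from __future__ import annotations

import warnings


def build_knn_graph(
    n_nodes: int,
    *,
    n_neighbors: int = 10,
    k: int | None = None,
) -> int:
    """Hypothetical public function whose ``k`` parameter was renamed ``n_neighbors``."""
    if k is not None:
        # The old keyword still works during the deprecation period, but it warns.
        warnings.warn(
            "`k` is deprecated and will be removed in a future release; "
            "use `n_neighbors` instead.",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        n_neighbors = k
    if n_neighbors <= 0:
        raise ValueError(f"n_neighbors must be positive, got {n_neighbors}")
    # Stand-in for the real graph construction: cap neighbors at n_nodes - 1.
    return min(n_neighbors, n_nodes - 1)
```

`stacklevel=2` makes the warning point at the user's call site, which is where the fix has to happen. Tests can assert the warning with `pytest.warns(DeprecationWarning)`.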

Internal helpers should start with _ unless they are intended for users.

TODOs And Known Gaps

TODOs are acceptable only when they preserve actionable context. A TODO without an owner or decision point becomes technical debt that nobody can resolve.

Use this format:

# TODO(david, issue #123): Replace dense shortest-path fallback with sparse implementation
# before enabling graphs above 50k nodes.

Rules:

Include an owner or issue reference.
Explain the constraint, not just the desired change.
Do not use TODOs to excuse broken tests, unsafe defaults, or undocumented behavior.
Prefer creating an issue when the work is larger than a local follow-up.
Use FIXME only for known incorrect behavior that must not be forgotten.
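If the team wants to enforce this format mechanically, a small check like the hypothetical helper below could run in CI or pre-commit. It only verifies that each `TODO`/`FIXME` marker is followed by a non-empty parenthesized owner or issue reference and a colon. It cannot judge whether the explanation is useful.

```python
import re

# A comment marker such as "# TODO" or "# FIXME" (but not "TODOs" in prose).
MARKER = re.compile(r"#\s*(?:TODO|FIXME)\b")
# The traceable form: "# TODO(owner, issue #N):" with non-empty parentheses.
TRACEABLE = re.compile(r"#\s*(?:TODO|FIXME)\([^)\s][^)]*\):")


def untraceable_todos(source: str) -> list:
    """Return 1-based line numbers whose TODO/FIXME lacks an owner or issue reference."""
    bad = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if MARKER.search(line) and not TRACEABLE.search(line):
            bad.append(lineno)
    return bad
```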

Validation

Validation is part of the API contract. It should fail early with useful error messages.

Validate at public boundaries:

Adjacency matrix shape: adjacency matrices must be square.
Matching node counts across adjacency, positions, labels, and metadata.
Non-negative counts, dimensions, and neighborhood sizes.
Fraction parameters constrained to [0, 1].
Optional dependencies with clear install guidance.
Random seeds and reproducibility controls.
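The sketch below shows a boundary check that covers several of these items, assuming NumPy inputs. The function and parameter names (`validate_graph_inputs`, `edge_fraction`, and so on) are hypothetical. In the package, they would map to whatever the public function actually accepts. The sketch also applies the exception split described next: `TypeError` for the wrong kind of object, and `ValueError` for wrong values.

```python
from __future__ import annotations

from collections.abc import Sequence

import numpy as np


def validate_graph_inputs(
    adjacency: np.ndarray,
    positions: np.ndarray,
    *,
    labels: Sequence[object] | None = None,
    n_neighbors: int = 10,
    edge_fraction: float = 0.1,
) -> None:
    """Hypothetical boundary check: raise early with messages users can act on."""
    if not hasattr(adjacency, "shape") or not hasattr(positions, "shape"):
        raise TypeError("adjacency and positions must be array-like objects with a .shape")
    if len(adjacency.shape) != 2 or adjacency.shape[0] != adjacency.shape[1]:
        raise ValueError(f"adjacency must be a square 2D matrix, got shape {adjacency.shape}")
    n_nodes = adjacency.shape[0]
    if len(positions.shape) != 2 or positions.shape[0] != n_nodes:
        raise ValueError(
            f"positions must have shape ({n_nodes}, dim) to match adjacency, "
            f"got {positions.shape}"
        )
    if labels is not None and len(labels) != n_nodes:
        raise ValueError(f"expected {n_nodes} labels to match adjacency, got {len(labels)}")
    if n_neighbors <= 0:
        raise ValueError(f"n_neighbors must be positive, got {n_neighbors}")
    # The chained comparison also rejects NaN, since NaN compares False.
    if not 0.0 <= edge_fraction <= 1.0:
        raise ValueError(f"edge_fraction must be in [0, 1], got {edge_fraction}")
```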

Validation should raise specific exceptions:

ValueError for invalid values.
TypeError for wrong object types when type hints are insufficient.
ImportError with an install hint for missing optional dependencies.
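A sketch of the optional-dependency pattern, using only `importlib`. The helper name is hypothetical. The distribution name and extra in the install hint are assumptions; use the names actually declared in `pyproject.toml`.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency, or raise ImportError with an install hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:  # also catches ModuleNotFoundError
        raise ImportError(
            f"{module_name!r} is required for this feature. "
            f'Install it with: pip install "spatial_graph_algorithms[{extra}]"'
        ) from exc
```

A feature would call this at the start of the function that needs the extra, for example `require_optional("leidenalg", "leiden")` (extra name assumed), rather than at module import time. That way, the base package still imports when the extra is missing.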

Avoid silent coercion unless it is documented and tested. Scientific packages should not quietly change user data in surprising ways.

Testing Requirements

Every behavior change needs tests. The test should fail before the change and pass after it.

Use this test taxonomy:

Test type	Required when	Example
Unit test	Adding a helper, metric, validator, or branch	Validate bad `k` raises `ValueError`
Integration test	Connecting modules or changing pipeline behavior	`simulate -> reconstruct -> evaluate`
Regression test	Fixing a bug	Minimal reproducer from the issue
Parameterized test	Same behavior across modes/options	All graph construction modes
Optional dependency test	Feature depends on an extra	STRND, Leiden, UMAP-style integrations
Documentation/example test	Public example should keep working	Scripts under `examples/`

Test rules:

Keep tests deterministic. Set seeds for stochastic algorithms.
Assert meaningful behavior, not implementation details.
Test edge cases: empty, tiny, disconnected, dense, sparse, invalid, and high-dimensional inputs.
Use pytest.mark.parametrize for repeated behavior across algorithms or modes.
Do not skip tests unless the reason is unavoidable and visible.
New optional dependencies need tests that handle missing extras cleanly.
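A sketch of these rules in pytest. `perturb_positions` is a hypothetical stand-in for a stochastic function in the package. The tests demonstrate three things:

- seeded determinism
- parametrization with readable ids that include the empty and single-node edge cases
- an error-path test

```python
from __future__ import annotations

import random

import pytest


def perturb_positions(
    positions: list[tuple[float, float]],
    *,
    noise: float,
    random_state: int | None = None,
) -> list[tuple[float, float]]:
    """Hypothetical function under test: add bounded uniform noise to 2D positions."""
    if noise < 0:
        raise ValueError(f"noise must be non-negative, got {noise}")
    rng = random.Random(random_state)  # private generator, no global seeding
    return [
        (x + rng.uniform(-noise, noise), y + rng.uniform(-noise, noise))
        for x, y in positions
    ]


@pytest.mark.parametrize("n_nodes", [0, 1, 50], ids=["empty", "single", "small"])
def test_perturb_is_reproducible_and_bounded(n_nodes: int) -> None:
    positions = [(float(i), 0.0) for i in range(n_nodes)]
    first = perturb_positions(positions, noise=0.1, random_state=7)
    second = perturb_positions(positions, noise=0.1, random_state=7)
    assert first == second  # same seed, same output
    assert len(first) == n_nodes
    # Check a bounded property instead of hard-coding random values.
    assert all(
        abs(px - x) <= 0.1 + 1e-12 and abs(py - y) <= 0.1 + 1e-12
        for (px, py), (x, y) in zip(first, positions)
    )


def test_perturb_rejects_negative_noise() -> None:
    with pytest.raises(ValueError, match="non-negative"):
        perturb_positions([(0.0, 0.0)], noise=-1.0)
```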

For floating point results:

Use tolerances with pytest.approx, numpy.testing, or explicit bounds.
Prefer invariant checks when exact values are unstable.
Test monotonic or bounded properties where algorithms are approximate.
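Some outputs are only defined up to a rigid motion, as with MDS-style embeddings. For these, comparing raw coordinates makes tests brittle. This sketch, assuming NumPy, checks an invariant instead: pairwise distances are unchanged by rotation, reflection, and translation. The rotated copy is a stand-in for a real reconstruction.

```python
import numpy as np
from numpy.testing import assert_allclose


def pairwise_distances(points: np.ndarray) -> np.ndarray:
    """Dense Euclidean distance matrix; fine for the small inputs used in tests."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


rng = np.random.default_rng(0)
points = rng.normal(size=(20, 2))

# Stand-in for a reconstruction: a random orthogonal map (rotation or reflection) plus a shift.
q, _ = np.linalg.qr(rng.normal(size=(2, 2)))
embedding = points @ q + np.array([3.0, -1.0])

# Raw coordinates differ, so comparing them directly would be a brittle test...
assert not np.allclose(embedding, points)
# ...but pairwise distances are invariant to rotation, reflection, and translation.
assert_allclose(pairwise_distances(embedding), pairwise_distances(points), rtol=1e-10, atol=1e-12)
```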

Performance And Memory

Performance-sensitive changes require evidence. This package works with graph data, so accidental O(n^2) memory or time growth matters.

Add performance coverage when a change:

Introduces a new algorithm.
Changes graph construction, reconstruction, shortest paths, nearest-neighbor logic, or plotting.
Adds a loop over nodes, edges, pairs, or dimensions.
Converts sparse matrices to dense arrays.
Touches large metadata or dataframe operations.

Performance checks should measure the smallest meaningful operation:

def test_graph_summary_benchmark(benchmark, medium_spatial_graph):
    result = benchmark(graph_summary, medium_spatial_graph)
    assert result["n_nodes"] == medium_spatial_graph.n_nodes

Guidelines:

Benchmark algorithm calls, not expensive fixture setup.
Use stable input sizes and explicit parametrization IDs.
Keep benchmarks separate from normal correctness tests if they are slow.
Track wall time locally with pytest-benchmark, or in CI with performance tooling such as CodSpeed.
Treat benchmark tests as guardrails, not replacements for complexity analysis.

Memory-sensitive code should:

Preserve sparse representations when possible.
Avoid toarray() and dense pairwise matrices unless the input size is intentionally bounded.
Document expected memory complexity for algorithms that scale poorly.
Include a memory check or profiling note in the PR when memory behavior changes.

If a function cannot support large graphs, make the limit explicit in validation or documentation.
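One way to make a size limit explicit, sketched with NumPy. `MAX_DENSE_NODES` and `dense_distance_matrix` are hypothetical, and the default limit is an arbitrary placeholder. The error message estimates the output size and tells the user what to do instead. The output holds n^2 float64 values at 8 bytes each, so it needs about 20 GB at 50,000 nodes.

```python
from __future__ import annotations

import numpy as np

# Hypothetical default; choose a limit that matches the hardware users actually have.
MAX_DENSE_NODES = 20_000


def dense_distance_matrix(points: np.ndarray, *, max_nodes: int = MAX_DENSE_NODES) -> np.ndarray:
    """Return all pairwise Euclidean distances. Memory grows as O(n^2)."""
    n_nodes = points.shape[0]
    if n_nodes > max_nodes:
        # The (n, n) float64 output alone needs n^2 * 8 bytes.
        output_gb = n_nodes**2 * 8 / 1e9
        raise ValueError(
            f"a dense distance matrix for {n_nodes} nodes needs about {output_gb:.1f} GB; "
            f"use a sparse neighbor graph instead, or raise max_nodes above {max_nodes} explicitly"
        )
    # The broadcast difference array has shape (n, n, dim), so peak memory is
    # several times the size of the output.
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff**2).sum(axis=-1))
```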

Dependencies And Requirements

Runtime dependencies belong in [project.dependencies] in pyproject.toml. Optional feature dependencies belong in [project.optional-dependencies].

Rules:

Do not add a dependency for a small helper that can be implemented clearly in the package.
Discuss new runtime dependencies before adding them.
Prefer optional extras for algorithm-specific integrations.
Keep lower bounds realistic and justified by used APIs.
Avoid upper bounds unless there is a known incompatibility.
Update installation docs when extras change.
Add tests for missing optional dependencies and successful optional paths.

Development-only tools belong in the dev extra unless the project adopts dependency groups later.

Documentation Requirements

Docs must change with behavior. A PR is incomplete if the user-facing behavior changes but the docs still describe the old behavior.

Update docs when:

Adding public functions, parameters, modes, metrics, or extras.
Changing defaults.
Changing return values or metadata.
Adding known limitations or required assumptions.
Changing installation or optional dependency behavior.

Documentation should include:

Minimal runnable examples.
Parameter meaning when not obvious.
Return type and important metadata keys.
Failure modes and optional dependency hints.
Scientific assumptions where relevant.
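A sketch of a NumPy-style docstring that covers these points, using a hypothetical `edge_density` helper. The `Examples` section doubles as a minimal runnable example. It can be checked with `python -m doctest` or with pytest's `--doctest-modules` option. The "no self-loops" note is the kind of scientific assumption worth stating explicitly.

```python
def edge_density(n_nodes: int, n_edges: int) -> float:
    """Fraction of possible undirected edges that are present.

    Parameters
    ----------
    n_nodes : int
        Number of nodes in the graph. Must be at least 2.
    n_edges : int
        Number of undirected edges. Assumes a simple graph with no self-loops.

    Returns
    -------
    float
        Density in [0, 1].

    Raises
    ------
    ValueError
        If ``n_nodes < 2`` or ``n_edges`` is outside ``[0, n_nodes * (n_nodes - 1) / 2]``.

    Examples
    --------
    >>> edge_density(4, 3)
    0.5
    """
    if n_nodes < 2:
        raise ValueError(f"n_nodes must be at least 2, got {n_nodes}")
    max_edges = n_nodes * (n_nodes - 1) // 2
    if not 0 <= n_edges <= max_edges:
        raise ValueError(f"n_edges must be in [0, {max_edges}], got {n_edges}")
    return n_edges / max_edges
```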

Review Checklist

Authors should verify:

The PR solves one clear problem.
Public API changes are documented.
New functions have type hints and docstrings.
Inputs are validated at public boundaries.
Tests cover normal, edge, and error cases.
Random behavior is seeded or controlled.
Sparse data remains sparse unless dense conversion is justified.
Performance and memory risks are measured or explained.
New dependencies are justified and placed in the correct extra.
ruff check . and pytest pass locally.

Reviewers should check:

The test would fail without the implementation.
The implementation matches the documented contract.
Error messages help users fix invalid input.
The change does not hide expensive work behind convenient defaults.
The PR does not mix unrelated refactors with feature or bug work.
New TODOs are actionable and traceable.

PR Template

Use this structure in PR descriptions:

## What changed

## Why

## API impact

## Validation

## Tests run

## Performance/memory impact

## Follow-ups

If performance or memory is not relevant, say why.

Sources Used

This guide follows practices from:

PyPA's pyproject.toml specification for project metadata, dependencies, optional dependencies, and tool configuration: https://packaging.python.org/specifications/declaring-project-metadata/
PyPA's pyproject.toml guide for using [build-system], [project], and [tool]: https://packaging.python.org/en/latest/guides/writing-pyproject-toml/
pytest's good integration practices for editable installs, src/ layout, test discovery, and package-oriented testing: https://docs.pytest.org/en/stable/explanation/goodpractices.html
pytest's invocation guide for targeted test runs: https://docs.pytest.org/en/stable/how-to/usage.html
Scientific Python's development guide for style checks, pre-commit/static checks, and type-checking expectations: https://learn.scientific-python.org/development/guides/style/
Python's pull request lifecycle guidance for focused PRs, checks before review, and clear communication: https://devguide.python.org/getting-started/pull-request-lifecycle/
pytest-benchmark documentation for benchmark fixtures, pedantic mode, and benchmark comparison: https://pytest-benchmark.readthedocs.io/en/latest/usage.html
CodSpeed pytest documentation for CI-oriented Python performance and memory benchmarking: https://codspeed.io/docs/reference/pytest-codspeed