Collaboration Guide
This guide defines how contributors should change spatial_graph_algorithms without
breaking the package, slowing it down, or making the scientific behavior ambiguous.
CONTRIBUTING.md explains setup and basic PR mechanics; this document defines the
engineering contract for collaborative work.
Core Principles
- Keep changes small enough to review. One PR should solve one user-visible problem.
- Preserve the public API unless the PR explicitly declares a breaking change.
- Make behavior measurable. New code needs validation, tests, and clear failure modes.
- Prefer boring implementation over clever implementation unless performance requires otherwise.
- Document assumptions when scientific or algorithmic choices affect results.
- Do not add dependencies, global state, randomness, or expensive defaults casually.
Standard Workflow
- Open or reference an issue before starting non-trivial work.
- Create a branch from main using feat/<topic>, fix/<topic>, docs/<topic>, or refactor/<topic>.
- Install locally with pip install -e ".[all]".
- Make the smallest coherent change.
- Add or update tests in tests/.
- Run local validation before opening a PR.
- Open a PR with a clear description, test evidence, and known limitations.
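Minimum local checks before review: ruff check . and pytest must both pass.
Use targeted checks while developing, such as running a single test file or selecting tests by name with pytest -k, then run the full checks before opening the PR.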
Function Design
Every new function should have one clear responsibility. If a function validates input, transforms data, computes an algorithm, and formats output, split it.
Public functions must:
- Use explicit type hints for all parameters and return values.
- Have a concise docstring using the project style: NumPy-style when parameters need explanation, otherwise a one-line summary is enough.
- Avoid mutating inputs unless the function name or docstring makes mutation explicit.
- Accept deterministic seeds or random_state when randomness is involved.
- Validate inputs at the boundary of the public API.
- Return project-native objects where appropriate, especially SpatialGraph.
- Avoid hidden I/O, network access, plotting side effects, and global configuration changes.
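
As an illustration of the seed and no-mutation rules above, here is a minimal sketch that uses only the standard library. The function name sample_node_ids and its parameters are hypothetical and not part of the package.

```python
import random
from collections.abc import Sequence
from typing import Optional


def sample_node_ids(
    node_ids: Sequence[int],
    *,
    n_samples: int,
    seed: Optional[int] = None,
) -> list[int]:
    """Return a reproducible random subset of node IDs without mutating the input."""
    # Validate at the public boundary, before doing any work.
    if n_samples < 0 or n_samples > len(node_ids):
        raise ValueError("n_samples must be between 0 and len(node_ids)")
    # A local generator leaves the global random state untouched.
    rng = random.Random(seed)
    # Sampling from a copy guarantees the caller's sequence is never modified.
    return rng.sample(list(node_ids), n_samples)
```

Passing the same seed twice gives the same subset, which is what makes downstream tests deterministic.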
Prefer this shape:
def compute_score(
    graph: SpatialGraph,
    *,
    k: int = 10,
    normalize: bool = True,
) -> float:
    """Compute a bounded graph quality score."""
    if k <= 0:
        raise ValueError("k must be positive")
    ...
Avoid:
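
A hypothetical counter-example, written here for illustration only (the names process, data, and options are assumptions, not project code), that shows the problems listed under Reasons:

```python
def process(data, options={}):  # hypothetical: untyped, vague name, mutable default
    # No validation: any object is accepted and errors surface later, if at all.
    # The default dict is created once and shared, so state leaks between calls.
    options.setdefault("calls", 0)
    options["calls"] += 1
    return data, options["calls"]
```

Calling process(1) twice returns a call count of 1 and then 2, because the default dict persists between calls.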
Reasons:
- data and return type are unclear.
- Mutable defaults can leak state.
- The function name does not say what contract it provides.
- Validation requirements are invisible.
Public API Rules
The public API includes documented modules, exported names in __init__.py, and behavior shown
in examples or docs. Changing any of these requires extra care.
For public API changes:
- Add or update documentation in docs/.
- Add a test that covers the documented behavior.
- Keep backward compatibility when reasonable.
- If compatibility is impossible, document the migration path in the PR.
- Do not remove parameters without a deprecation period unless the project is still explicitly treating that API as experimental.
Internal helpers should start with _ unless they are intended for users.
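
One possible way to honor a deprecation period when renaming a parameter is sketched below. The function build_graph, the old name k, and the new name n_neighbors are hypothetical and chosen only for illustration.

```python
import warnings
from typing import Optional


def build_graph(
    points: list[tuple[float, float]],
    *,
    n_neighbors: int = 10,
    k: Optional[int] = None,  # hypothetical deprecated alias for n_neighbors
) -> dict[str, int]:
    """Sketch of keeping a renamed parameter working during a deprecation period."""
    if k is not None:
        # stacklevel=2 points the warning at the caller, not at this function.
        warnings.warn(
            "'k' is deprecated; use 'n_neighbors' instead",
            DeprecationWarning,
            stacklevel=2,
        )
        n_neighbors = k
    if n_neighbors <= 0:
        raise ValueError("n_neighbors must be positive")
    # Placeholder result; a real implementation would return a graph object.
    return {"n_points": len(points), "n_neighbors": n_neighbors}
```

Old call sites keep working and emit a warning, which gives users a documented migration path before the alias is removed.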
TODOs And Known Gaps
TODOs are acceptable only when they preserve actionable context. A TODO without an owner or decision point becomes technical debt that nobody can resolve.
Use this format:
# TODO(david, issue #123): Replace dense shortest-path fallback with sparse implementation
# before enabling graphs above 50k nodes.
Rules:
- Include an owner or issue reference.
- Explain the constraint, not just the desired change.
- Do not use TODOs to excuse broken tests, unsafe defaults, or undocumented behavior.
- Prefer creating an issue when the work is larger than a local follow-up.
- Use FIXME only for known incorrect behavior that must not be forgotten.
Validation
Validation is part of the API contract. It should fail early with useful error messages.
Validate at public boundaries:
- Graph shape and square adjacency matrices.
- Matching node counts across adjacency, positions, labels, and metadata.
- Non-negative counts, dimensions, and neighborhood sizes.
- Fraction parameters constrained to [0, 1].
- Optional dependencies with clear install guidance.
- Random seeds and reproducibility controls.
Validation should raise specific exceptions:
- ValueError for invalid values.
- TypeError for wrong object types when type hints are insufficient.
- ImportError with an install hint for missing optional dependencies.
Avoid silent coercion unless it is documented and tested. Scientific packages should not quietly change user data in surprising ways.
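
The boundary checks described in this section could look roughly like the sketch below. It uses plain nested sequences instead of the project's own types, and the name validate_graph_inputs and its parameters are assumptions made for the example.

```python
from collections.abc import Sequence


def validate_graph_inputs(
    adjacency: Sequence[Sequence[float]],
    positions: Sequence[Sequence[float]],
    *,
    fraction: float = 1.0,
) -> None:
    """Raise early if the inputs cannot describe one consistent graph."""
    # Strings are sequences too, so reject them explicitly with TypeError.
    if isinstance(adjacency, (str, bytes)):
        raise TypeError("adjacency must be a 2D sequence, not a string")
    n_nodes = len(adjacency)
    for i, row in enumerate(adjacency):
        if len(row) != n_nodes:
            raise ValueError(
                f"adjacency must be square: row {i} has {len(row)} entries, "
                f"expected {n_nodes}"
            )
    # Node counts must agree across every per-node input.
    if len(positions) != n_nodes:
        raise ValueError(
            f"positions has {len(positions)} rows but adjacency has {n_nodes} nodes"
        )
    if not 0.0 <= fraction <= 1.0:
        raise ValueError(f"fraction must be in [0, 1], got {fraction}")
```

Each message names the offending value and the expected one, so a user can fix the input without reading the source.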
Testing Requirements
Every behavior change needs tests. The test should fail before the change and pass after it.
Use this test taxonomy:
| Test type | Required when | Example |
|---|---|---|
| Unit test | Adding a helper, metric, validator, or branch | Validate bad k raises ValueError |
| Integration test | Connecting modules or changing pipeline behavior | simulate -> reconstruct -> evaluate |
| Regression test | Fixing a bug | Minimal reproducer from the issue |
| Parameterized test | Same behavior across modes/options | All graph construction modes |
| Optional dependency test | Feature depends on an extra | STRND, Leiden, UMAP-style integrations |
| Documentation/example test | Public example should keep working | Scripts under examples/ |
Test rules:
- Keep tests deterministic. Set seeds for stochastic algorithms.
- Assert meaningful behavior, not implementation details.
- Test edge cases: empty, tiny, disconnected, dense, sparse, invalid, and high-dimensional inputs.
- Use pytest.mark.parametrize for repeated behavior across algorithms or modes.
- Do not skip tests unless the reason is unavoidable and visible.
- New optional dependencies need tests that handle missing extras cleanly.
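
A small sketch of how the determinism, edge-case, and parametrization rules above can combine in one test. The helper shuffle_labels is hypothetical and stands in for any stochastic function in the package.

```python
import random

import pytest


def shuffle_labels(labels: list[str], *, seed: int) -> list[str]:
    """Hypothetical stochastic helper: return a shuffled copy of labels."""
    rng = random.Random(seed)
    shuffled = list(labels)
    rng.shuffle(shuffled)
    return shuffled


@pytest.mark.parametrize(
    "labels",
    [[], ["a"], ["a", "b", "c", "d"]],
    ids=["empty", "single", "several"],
)
def test_shuffle_is_deterministic_and_preserves_labels(labels: list[str]) -> None:
    first = shuffle_labels(labels, seed=42)
    second = shuffle_labels(labels, seed=42)
    # Same seed, same result, so the test cannot flake.
    assert first == second
    # Assert behavior, not implementation: the set of labels is unchanged.
    assert sorted(first) == sorted(labels)
```

The explicit ids keep failure reports readable when one edge case breaks.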
For floating point results:
- Use tolerances with pytest.approx, numpy.testing, or explicit bounds.
- Prefer invariant checks when exact values are unstable.
- Test monotonic or bounded properties where algorithms are approximate.
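
The floating point guidance above might translate into tests like the following sketch. The metric normalized_score is a hypothetical stand-in (a logistic squash), not a function from the package.

```python
import math

import pytest


def normalized_score(raw: float, scale: float) -> float:
    """Hypothetical metric: squash raw / scale into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-raw / scale))


def test_score_matches_reference_within_tolerance() -> None:
    # Compare against a known reference value with an explicit tolerance.
    assert normalized_score(0.0, 2.0) == pytest.approx(0.5, abs=1e-12)


def test_score_is_bounded_and_monotonic() -> None:
    scores = [normalized_score(x, 2.0) for x in (-3.0, -1.0, 0.0, 1.0, 3.0)]
    # Invariants stay valid even if the exact values shift slightly.
    assert all(0.0 < s < 1.0 for s in scores)
    assert scores == sorted(scores)
```

The second test checks properties rather than digits, so it survives changes in numerical libraries or platforms.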
Performance And Memory
Performance-sensitive changes require evidence. This package works with graph data, so accidental
O(n^2) memory or time growth matters.
Add performance coverage when a change:
- Introduces a new algorithm.
- Changes graph construction, reconstruction, shortest paths, nearest-neighbor logic, or plotting.
- Adds a loop over nodes, edges, pairs, or dimensions.
- Converts sparse matrices to dense arrays.
- Touches large metadata or dataframe operations.
Performance checks should measure the smallest meaningful operation:
def test_graph_summary_benchmark(benchmark, medium_spatial_graph):
    result = benchmark(graph_summary, medium_spatial_graph)
    assert result["n_nodes"] == medium_spatial_graph.n_nodes
Guidelines:
- Benchmark algorithm calls, not expensive fixture setup.
- Use stable input sizes and explicit parametrization IDs.
- Keep benchmarks separate from normal correctness tests if they are slow.
- Track wall time locally with pytest-benchmark or CI performance tooling such as CodSpeed.
- Treat benchmark tests as guardrails, not replacements for complexity analysis.
Memory-sensitive code should:
- Preserve sparse representations when possible.
- Avoid toarray() and dense pairwise matrices unless the input size is intentionally bounded.
- Document expected memory complexity for algorithms that scale poorly.
- Include a memory check or profiling note in the PR when memory behavior changes.
If a function cannot support large graphs, make the limit explicit in validation or documentation.
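
One way to make such a limit explicit in validation is sketched below. The constant MAX_DENSE_NODES, its value, and the function name are hypothetical and not project settings.

```python
MAX_DENSE_NODES = 5_000  # hypothetical limit, not a project constant


def check_dense_allowed(n_nodes: int, *, max_nodes: int = MAX_DENSE_NODES) -> int:
    """Return the number of dense matrix entries, or raise if the graph is too large."""
    if n_nodes < 0:
        raise ValueError("n_nodes must be non-negative")
    if n_nodes > max_nodes:
        # State the cost and the way out, so the failure is actionable.
        raise ValueError(
            f"dense pairwise computation needs O(n^2) memory; got n_nodes={n_nodes}, "
            f"limit is {max_nodes}. Use a sparse method or raise max_nodes explicitly."
        )
    return n_nodes * n_nodes
```

Callers who really need a larger dense computation can opt in through max_nodes, which keeps the expensive path out of the defaults.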
Dependencies And Requirements
Runtime dependencies belong in [project.dependencies] in pyproject.toml. Optional feature
dependencies belong in [project.optional-dependencies].
Rules:
- Do not add a dependency for a small helper that can be implemented clearly in the package.
- Discuss new runtime dependencies before adding them.
- Prefer optional extras for algorithm-specific integrations.
- Keep lower bounds realistic and justified by used APIs.
- Avoid upper bounds unless there is a known incompatibility.
- Update installation docs when extras change.
- Add tests for missing optional dependencies and successful optional paths.
Development-only tools belong in the dev extra unless the project adopts dependency groups later.
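
For optional extras, a guarded import with an install hint could look like this sketch. The helper name require_optional is hypothetical, and the pip target in the message assumes the distribution is published as spatial_graph_algorithms, which this guide does not confirm.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, *, extra: str) -> ModuleType:
    """Import an optional dependency or raise ImportError with an install hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Assumed distribution name; adjust to the real package name on the index.
        raise ImportError(
            f"'{module_name}' is required for this feature. "
            f"Install it with: pip install 'spatial_graph_algorithms[{extra}]'"
        ) from exc
```

Calling the helper inside the feature function, rather than at module import time, keeps the base install importable without any extras.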
Documentation Requirements
Docs must change with behavior. A PR is incomplete if the user-facing behavior changes but the docs still describe the old behavior.
Update docs when:
- Adding public functions, parameters, modes, metrics, or extras.
- Changing defaults.
- Changing return values or metadata.
- Adding known limitations or required assumptions.
- Changing installation or optional dependency behavior.
Documentation should include:
- Minimal runnable examples.
- Parameter meaning when not obvious.
- Return type and important metadata keys.
- Failure modes and optional dependency hints.
- Scientific assumptions where relevant.
Review Checklist
Authors should verify:
- The PR solves one clear problem.
- Public API changes are documented.
- New functions have type hints and docstrings.
- Inputs are validated at public boundaries.
- Tests cover normal, edge, and error cases.
- Random behavior is seeded or controlled.
- Sparse data remains sparse unless dense conversion is justified.
- Performance and memory risks are measured or explained.
- New dependencies are justified and placed in the correct extra.
- ruff check . and pytest pass locally.
Reviewers should check:
- The test would fail without the implementation.
- The implementation matches the documented contract.
- Error messages help users fix invalid input.
- The change does not hide expensive work behind convenient defaults.
- The PR does not mix unrelated refactors with feature or bug work.
- New TODOs are actionable and traceable.
PR Template
Use this structure in PR descriptions:
## What changed
## Why
## API impact
## Validation
## Tests run
## Performance/memory impact
## Follow-ups
If performance or memory is not relevant, say why.
Sources Used
This guide follows practices from:
- PyPA's pyproject.toml specification for project metadata, dependencies, optional dependencies, and tool configuration: https://packaging.python.org/specifications/declaring-project-metadata/
- PyPA's pyproject.toml guide for using [build-system], [project], and [tool]: https://packaging.python.org/en/latest/guides/writing-pyproject-toml/
- pytest's good integration practices for editable installs, src/ layout, test discovery, and package-oriented testing: https://docs.pytest.org/en/stable/explanation/goodpractices.html
- pytest's invocation guide for targeted test runs: https://docs.pytest.org/en/stable/how-to/usage.html
- Scientific Python's development guide for style checks, pre-commit/static checks, and type-checking expectations: https://learn.scientific-python.org/development/guides/style/
- Python's pull request lifecycle guidance for focused PRs, checks before review, and clear communication: https://devguide.python.org/getting-started/pull-request-lifecycle/
- pytest-benchmark documentation for benchmark fixtures, pedantic mode, and benchmark comparison: https://pytest-benchmark.readthedocs.io/en/latest/usage.html
- CodSpeed pytest documentation for CI-oriented Python performance and memory benchmarking: https://codspeed.io/docs/reference/pytest-codspeed