Rust Systems & Services
Covers modern application-layer Rust (edition 2024): CLIs, web services, libraries. Not no_std/embedded.
Tooling
| Tool | Purpose |
|---|---|
| cargo | Build, dependency management, script runner |
| clippy | Lint (cargo clippy --workspace --all-targets -- -D warnings) |
| rustfmt | Formatter (cargo fmt --all) |
| cargo-nextest | Test runner; typically faster than cargo test and runs each test in its own process |
| cargo-deny | License, advisory, and duplicate-dependency checks |
| cargo-machete | Find unused dependencies |
- Pin the toolchain with a rust-toolchain.toml per repo so every contributor and CI uses the same compiler.
- Use cargo update -p <crate> for single-package upgrades. A bare cargo update bumps every dependency in Cargo.lock to its latest semver-compatible version, so keep it out of PR diffs.
- Commit Cargo.lock for binaries and libraries alike (current Cargo guidance; reproducibility wins).
Workspaces
Multi-crate projects use a workspace with layered crates. Dependencies point inward only.
```text
Cargo.toml        # [workspace] members + [workspace.dependencies]
crates/
  protocol/       # Shared types, no deps on other workspace crates
  storage/        # Persistence, depends on protocol
  service/        # Business logic, depends on protocol + storage
  cli/            # Binary, depends on everything
```
- Centralize versions in [workspace.dependencies] and reference them as foo = { workspace = true } in member crates.
- Keep the leaf-most crate (protocol / types) free of dependencies on other workspace crates, so every other crate can depend on it without cycles.
- Feature flags belong on the crate that introduces the dependency. Don't re-export them through the workspace root.
- Library crates expose one stable facade: a thin lib.rs with a //! crate-level doc comment stating its purpose, followed by pub use re-exports of the public surface. Consumers learn one import path per concept, and the internal module layout can be reorganized without breaking callers.
- Feature gates must error, never silently degrade. If runtime config requests a capability the binary wasn't compiled with (e.g. device = "gpu" on a non-CUDA build), fail at startup with a clear error. A silent fallback behaves differently from what the operator configured, and often nobody notices.
- Centralize lints at the workspace root with [workspace.lints.*]. Every member crate inherits the same ruleset, so there is no drift between crates and no per-crate #![deny(...)] stacks. Example:
```toml
[workspace.lints.rust]
unsafe_code = "warn"
missing_docs = "warn"

[workspace.lints.clippy]
# priority = -1 applies the lint groups first, so the individual
# "allow" entries below (default priority 0) override them.
all = { level = "warn", priority = -1 }
pedantic = { level = "warn", priority = -1 }
nursery = { level = "warn", priority = -1 }
module_name_repetitions = "allow"
must_use_candidate = "allow"
```
Each member crate opts in with [lints] workspace = true in its own Cargo.toml. Changing a lint in one place updates every crate.
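The fail-fast feature-gate rule above can be sketched as follows. It assumes a hypothetical Cargo feature named gpu declared in the crate's [features] table and a hypothetical Device config enum. cfg!(feature = "gpu") is evaluated at compile time and is false unless the crate is built with that feature. As a result, a plain build rejects device = "gpu" at startup instead of quietly running on the CPU.

```rust
use std::fmt;

/// Hypothetical device setting parsed from runtime config.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Device {
    Cpu,
    Gpu,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ConfigError {
    /// Config asked for a capability this binary was not compiled with.
    FeatureNotCompiled { requested: &'static str, feature: &'static str },
    UnknownDevice(String),
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::FeatureNotCompiled { requested, feature } => write!(
                f,
                "device = \"{requested}\" requires a build with `--features {feature}`"
            ),
            Self::UnknownDevice(s) => write!(f, "unknown device {s:?}"),
        }
    }
}

impl std::error::Error for ConfigError {}

/// Resolve the configured device once at startup; never fall back silently.
pub fn resolve_device(requested: &str) -> Result<Device, ConfigError> {
    match requested {
        "cpu" => Ok(Device::Cpu),
        // Compile-time check: true only when the hypothetical `gpu` feature is enabled.
        "gpu" if cfg!(feature = "gpu") => Ok(Device::Gpu),
        // Requested but not compiled in: a hard error, not a CPU fallback.
        "gpu" => Err(ConfigError::FeatureNotCompiled { requested: "gpu", feature: "gpu" }),
        other => Err(ConfigError::UnknownDevice(other.to_owned())),
    }
}
```

Calling this from main and propagating the error with ? makes the process exit before it serves any traffic, which is the point of the rule.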
Build Profiles
When tuning Cargo build profiles (release LTO, release-dbg symbols, release-min for distributable binaries) or adding dev-machine speedups (mold linker, target-cpu=native, share-generics), load build-profiles.md.
Error Handling
Split by crate role:
- Libraries / lower crates: define typed errors with thiserror so consumers can pattern-match.
- Binaries / top-level crates: use anyhow::Result with .context("what was being attempted") for human-readable error chains.
- Never return Box<dyn Error> from library APIs, because it erases variant information.
- Use ? liberally. Don't call .unwrap() or .expect() outside tests and main, with one exception: .expect("...") is acceptable when the invariant is provably upheld and the message explains why.
- Convert at boundaries: use #[from] on thiserror variants for automatic conversion through ?, and .map_err(MyError::from) when the conversion must be explicit.
- Use bail!("...") / ensure!(cond, "...") in application code for early exits.
- Prefer Result<T, E> over panics for any recoverable error. Panics are for programmer bugs (broken invariants), not runtime failures.
- Add #[must_use] to fallible APIs that don't already return a must-use type. Result is itself #[must_use], so a bare validate(x); already warns, and a plain #[must_use] on a Result-returning function trips clippy::double_must_use. Annotate functions returning plain bool checks, or newtype-wrapped results that callers frequently ignore, ideally with a reason: #[must_use = "..."]. Note that let _ = validate(x); discards the value explicitly and silences the lint. clippy::let_underscore_must_use (restriction group) flags that pattern.
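The library-side error rules can be sketched with std only. In this sketch, the Display, source, and From impls are roughly what thiserror's #[error(...)] and #[from] attributes generate, written out by hand. PortError, parse_port, and is_privileged_port are hypothetical names for illustration.

```rust
use std::error::Error;
use std::fmt;
use std::num::ParseIntError;

/// Typed library error; callers can match on each variant.
#[derive(Debug)]
pub enum PortError {
    /// Not an integer at all; wraps the underlying cause.
    InvalidPort(ParseIntError),
    /// Parsed, but outside the valid port range.
    PortOutOfRange(u32),
}

impl fmt::Display for PortError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::InvalidPort(_) => write!(f, "port is not a valid integer"),
            Self::PortOutOfRange(p) => write!(f, "port {p} is outside 1..=65535"),
        }
    }
}

impl Error for PortError {
    // Expose the wrapped cause so error chains (e.g. anyhow's "Caused by:") can walk it.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            Self::InvalidPort(e) => Some(e),
            Self::PortOutOfRange(_) => None,
        }
    }
}

// Hand-written equivalent of `#[from]`: lets `?` turn a ParseIntError into a PortError.
impl From<ParseIntError> for PortError {
    fn from(e: ParseIntError) -> Self {
        Self::InvalidPort(e)
    }
}

/// Library API returning a typed error.
pub fn parse_port(raw: &str) -> Result<u16, PortError> {
    let n: u32 = raw.trim().parse()?; // ParseIntError -> PortError via From
    if n == 0 || n > u32::from(u16::MAX) {
        return Err(PortError::PortOutOfRange(n));
    }
    // Acceptable expect: the range check above makes this conversion infallible.
    Ok(u16::try_from(n).expect("n was checked to be within 1..=65535"))
}

/// bool is not #[must_use] on its own, so the function carries the annotation.
#[must_use = "ignoring this skips the privileged-port check"]
pub fn is_privileged_port(port: u16) -> bool {
    port < 1024
}
```

On the binary side, a call site would typically read parse_port(&raw).context("reading listen port")? with anyhow. This keeps PortError, and its ParseIntError source, in the printed chain.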
Ownership Discipline
- Take &str over &String and &[T] over &Vec<T> in function signatures. They accept more call sites for free.
- Return owned values (String, Vec<T>) from constructors and public APIs. Borrow in hot paths where lifetimes are obvious.
- Reach for Arc<T> only when sharing across threads. Single-threaded sharing uses Rc<T> or references.
- Use Cow<'_, str> when a function sometimes allocates and sometimes borrows (e.g. normalization).
- Lifetime elision handles most cases. If you're writing 'a in more than one signature, reconsider whether that type should own its data instead.
- Use bytes::Bytes for zero-copy slicing of shared immutable buffers: network parsers, frame decoders, protocol handlers. Build buffers with BytesMut, then split_to / split_off and freeze() to hand out Bytes without reallocation. Prefer Bytes over Arc<Vec<u8>> when slicing is the dominant access pattern.
- Reduce hot-path heap allocations with stack or inline collections when the typical size is small and known:
  - smallvec::SmallVec<[T; N]>: stores up to N items inline and spills to the heap beyond that. Good for "usually 1-8 items" cases like parsed tag lists, lookup keys, small event batches.
  - arrayvec::ArrayVec<T, CAP>: fixed capacity, never heap-allocates. try_push returns an error when full (push panics). Good for bounded message buffers or per-request scratch space.
- Intern strings that recur often (enum-like values parsed from config, tenant IDs, route keys): dashmap::DashMap<String, &'static str> with Box::leak on a miss gives &'static str comparisons with no per-call allocation after the first sighting. Leaked strings are never freed, so reserve this for bounded key sets.

These are optimizations, so profile first. A Vec or String on a cold path isn't the bottleneck.
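A minimal std-only sketch of the signature and Cow guidance follows. count_words, max_len, and normalize_key are hypothetical helpers. normalize_key assumes a made-up normalization rule (ASCII lowercase, spaces to '-') and borrows whenever the input already satisfies it.

```rust
use std::borrow::Cow;

/// Takes &str, so callers can pass &String, a literal, or a substring.
pub fn count_words(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Takes &[T], so a Vec, an array, or a sub-slice all work.
pub fn max_len(items: &[&str]) -> usize {
    items.iter().map(|s| s.len()).max().unwrap_or(0)
}

/// Hypothetical key normalization: ASCII lowercase, spaces become '-'.
/// Borrows when nothing needs to change; allocates only when it must.
pub fn normalize_key(input: &str) -> Cow<'_, str> {
    let needs_change = input.chars().any(|c| c.is_ascii_uppercase() || c == ' ');
    if needs_change {
        Cow::Owned(input.to_ascii_lowercase().replace(' ', "-"))
    } else {
        Cow::Borrowed(input)
    }
}
```

Callers that only read the result use it as a &str via deref. The allocation happens only for inputs that actually need rewriting.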
Async with Tokio
- Default runtime: in binaries, depend on tokio with features = ["full"] and use #[tokio::main]. Libraries that need to stay slim enable only what they use, e.g. features = ["rt", "macros", "sync"].
- Use tokio::spawn for independent tasks, and JoinSet for a dynamic group of tasks you'll await together. Dropping the JoinSet aborts any tasks still running.
- Use tokio::select! for racing futures (timeouts, cancellation, first-wins).
- Never block the runtime:
  - Use tokio::task::spawn_blocking for sync CPU work or blocking I/O libraries.
  - Use tokio::sync::Mutex only when the guard must be held across .await. Otherwise std::sync::Mutex is faster.
  - Use an RwLock when reads dominate writes (config snapshots, route tables, hot caches). Many readers proceed in parallel, whereas a Mutex serializes them. The same rule applies: tokio::sync::RwLock only if the guard spans an .await, std::sync::RwLock otherwise. For snapshot-swap semantics (rarely updated config), arc_swap::ArcSwap is faster still, with no lock on the read path.
- Cancellation: CancellationToken (from tokio-util) propagates shutdown. Long-running tasks must check it, e.g. in a select! arm on token.cancelled().
- Apply backpressure via bounded mpsc channels. Unbounded channels hide memory growth until OOM.
- Use a Semaphore for hard concurrency limits on spawn paths that don't fit a channel model (e.g. "at most 50 concurrent outbound HTTP calls"). Share it as an Arc<Semaphore> across spawners and acquire inside the task with let _permit = sem.acquire().await?;. Dropping the permit releases the slot. Bind it to a named variable, because let _ = ... drops it immediately.
- Don't mix async runtimes. Pick tokio and stick with it; async-std and smol don't interop cleanly.
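The backpressure contract of a bounded channel can be shown without a runtime. This sketch uses std::sync::mpsc::sync_channel as a stand-in. In async code, tokio::sync::mpsc::channel(n) plays the same role: send().await waits for capacity, and try_send fails with a Full error. The function names are hypothetical.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};
use std::thread;

/// Shows that a bounded channel refuses work once `bound` items are queued.
/// Returns how many of `attempts` sends were accepted before the queue reported Full.
pub fn accepted_before_full(bound: usize, attempts: u32) -> usize {
    // `_rx` keeps the receiver alive (so sends don't fail as Disconnected) but never reads.
    let (tx, _rx) = sync_channel::<u32>(bound);
    let mut accepted = 0;
    for i in 0..attempts {
        match tx.try_send(i) {
            Ok(()) => accepted += 1,
            // The backpressure signal: wait, shed the work, or report overload.
            Err(TrySendError::Full(_)) => break,
            Err(TrySendError::Disconnected(_)) => unreachable!("receiver is still alive"),
        }
    }
    accepted
}

/// Producer/consumer pipeline: `send` blocks whenever the consumer is `bound`
/// items behind, so the queue never holds more than `bound` values.
pub fn sum_through_pipeline(values: Vec<u64>, bound: usize) -> u64 {
    let (tx, rx) = sync_channel::<u64>(bound);
    let consumer = thread::spawn(move || rx.iter().sum::<u64>());
    for v in values {
        tx.send(v).expect("consumer is alive until the channel closes");
    }
    drop(tx); // closing the sender ends the consumer's iterator
    consumer.join().expect("consumer thread panicked")
}
```

Memory stays bounded by the channel capacity no matter how far the producer gets ahead. An unbounded channel would hide exactly that growth.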
CLI Tools (clap)
- Use the derive API: #[derive(Parser)] + #[derive(Subcommand)]. This means less boilerplate, and the types drive the help text.
- Define one enum Commands variant per subcommand, and flatten shared flags into a #[command(flatten)] struct CommonArgs.
- Add a --json flag on query commands for agent/pipe consumption. Emit via serde_json::to_string(&value)?.
- Exit codes: 0 for success, 1 for errors returned from main, 2 for usage/argument errors (clap exits with 2 on its own), and 3+ reserved for domain meanings documented in --help.
- Provide --version automatically via #[command(version)].
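The exit-code convention can be sketched with std::process::ExitCode. The Outcome enum and the domain code 3 for "not found" are hypothetical. Code 2 is left unused here because clap's parse() already exits with 2 on usage errors before command logic runs.

```rust
use std::process::ExitCode;

/// Hypothetical command outcomes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Outcome {
    Success,
    /// main hit an error.
    Failed,
    /// Hypothetical domain meaning, documented in --help.
    NotFound,
}

/// Numeric code per outcome; 2 stays reserved for clap's usage errors.
pub fn exit_code_for(outcome: Outcome) -> u8 {
    match outcome {
        Outcome::Success => 0,
        Outcome::Failed => 1,
        Outcome::NotFound => 3,
    }
}

/// Meant to be the return value of `fn main() -> ExitCode`.
pub fn to_exit_code(outcome: Outcome) -> ExitCode {
    ExitCode::from(exit_code_for(outcome))
}
```

Keeping the mapping in one function makes the --help documentation and the actual behavior easy to keep in sync.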
See cli-tools.md for config layering, logging setup, progress reporting, and shell completions.
HTTP Services (axum)
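- Framework default: axum (tokio-native, tower middleware, extractor-based handlers). Pick actix-web only if an existing codebase uses it.
- Handlers return Result<impl IntoResponse, AppError>. Implement IntoResponse for AppError to centralize the error → status mapping.
- Validate input at the boundary. Deserialize with axum::extract::Json<T> where T: Deserialize + Validate (validator crate), then call .validate() before using the payload, or wrap both steps in a custom extractor. Json<T> alone only checks that the body deserializes. Internal services then trust that input was validated.
- Share state via State<Arc<AppState>>, not globals or lazy_static.
- Middleware via tower::ServiceBuilder: tracing → timeout → CORS → auth → handler. Order matters, and ServiceBuilder makes the first .layer() the outermost. Keep CORS outside auth so preflight OPTIONS requests, which carry no credentials, get answered instead of rejected.
- Resilience layer stack (outbound HTTP clients and shared services): ServiceBuilder::new().layer(TimeoutLayer).layer(RateLimitLayer).layer(LoadShedLayer).layer(ConcurrencyLimitLayer).layer(RetryLayer).service(client). The layer constructors take arguments such as a Duration or a limit, omitted here. Name each layer explicitly:
  - LoadShedLayer rejects a request immediately when the service beneath it isn't ready.
  - ConcurrencyLimitLayer caps in-flight requests.
  - RateLimitLayer bounds request rate.
  - RetryLayer retries classified transient errors.

  LoadShedLayer must sit directly outside ConcurrencyLimitLayer. The limiter reports "not ready" at capacity, and the shedder turns that into a fast rejection, giving real backpressure instead of unbounded queueing.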
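The framework-independent core of a centralized AppError mapping can be sketched as follows. AppError's variants, the 422-for-validation choice, and the method names are hypothetical. In an axum service, impl IntoResponse for AppError would typically call status_and_message and build a (StatusCode, String) response from it. That axum glue is left out so the sketch needs only std.

```rust
/// Hypothetical application error for an HTTP service.
#[derive(Debug)]
pub enum AppError {
    NotFound(String),
    Validation(String),
    Unauthorized,
    /// Unexpected failure; the detail goes to the server log, never to clients.
    Internal(String),
}

impl AppError {
    /// The single place that maps errors to a status code and client-facing body.
    pub fn status_and_message(&self) -> (u16, String) {
        match self {
            Self::NotFound(what) => (404, format!("{what} not found")),
            // 400 vs 422 for validation is a per-API choice; 422 is assumed here.
            Self::Validation(msg) => (422, msg.clone()),
            Self::Unauthorized => (401, "unauthorized".to_owned()),
            // Hide internals from clients.
            Self::Internal(_) => (500, "internal server error".to_owned()),
        }
    }

    /// Detail that belongs in the server log (e.g. via tracing::error!), if any.
    pub fn log_detail(&self) -> Option<&str> {
        match self {
            Self::Internal(detail) => Some(detail.as_str()),
            _ => None,
        }
    }
}
```

Because handlers only ever return AppError, adding a variant forces a decision about its status code in one match, rather than in every handler.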
See axum-service.md for project layout, extractors, error types, graceful shutdown, and OpenAPI generation.
Concurrency
| Workload | Approach |
|---|---|
| Independent async I/O | tokio::spawn + JoinSet or futures::join! |
| Data-parallel CPU work | rayon with par_iter |
| Shared mutable state across threads | Arc<Mutex<T>> or Arc<RwLock<T>>, smallest scope possible |
| Single-producer pipelines | tokio::sync::mpsc (async) or std::sync::mpsc (sync) |
| Broadcast / fan-out | tokio::sync::broadcast |
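rayon and tokio coexist: use tokio::task::spawn_blocking to call a rayon pool from async code. Never call a block_on from inside a tokio task. Tokio's Runtime::block_on and Handle::block_on panic there, and futures::executor::block_on stalls the worker thread and can deadlock the runtime.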
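The "shared mutable state, smallest scope possible" row can be sketched with std threads. For pure data parallelism the table's rayon par_iter is the idiomatic choice. Plain std::thread is used here only so the sketch runs without dependencies, and parallel_word_count is a hypothetical name. Each thread does its expensive work outside the lock and holds the guard only for a short merge.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Counts words across chunks on separate threads, sharing one map.
pub fn parallel_word_count(chunks: Vec<String>) -> HashMap<String, usize> {
    let totals = Arc::new(Mutex::new(HashMap::new()));
    let handles: Vec<_> = chunks
        .into_iter()
        .map(|chunk| {
            let totals = Arc::clone(&totals);
            thread::spawn(move || {
                // Expensive part outside the lock: build a thread-local map.
                let mut local: HashMap<String, usize> = HashMap::new();
                for word in chunk.split_whitespace() {
                    *local.entry(word.to_owned()).or_default() += 1;
                }
                // Smallest possible critical section: only the merge.
                let mut guard = totals.lock().expect("mutex poisoned");
                for (word, n) in local {
                    *guard.entry(word).or_default() += n;
                }
            }) // guard and this thread's Arc clone drop when the closure ends
        })
        .collect();
    for h in handles {
        h.join().expect("worker panicked");
    }
    // Every clone is gone once the threads are joined, so the Arc can be unwrapped.
    Arc::try_unwrap(totals)
        .expect("no other references remain")
        .into_inner()
        .expect("mutex poisoned")
}
```

Holding the lock only for the merge keeps contention proportional to the number of distinct words per chunk, not to the total input size.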
Testing
- Use the built-in #[test]. Prefer cargo nextest run --workspace over cargo test: it runs each test in its own process, in parallel, for proper isolation.
- Unit tests live in #[cfg(test)] mod tests { ... } at the bottom of the file, which gives them access to private items.
- Integration tests go in the tests/ directory, one file per public surface area.
- Use #[tokio::test] for async tests. Add flavor = "multi_thread" when the code under test needs more than one worker thread. Examples: it calls tokio::task::block_in_place (which panics on the default current-thread runtime), or it expects spawned tasks to make progress while the test thread blocks.
- Use rstest for parametrized tests and fixtures, and proptest / quickcheck for property-based tests on pure logic.
- Use insta for snapshot testing CLI output, serialization, and large structs. Review diffs with cargo insta review.
- Use assert_cmd + predicates for CLI integration tests (invokes the binary, asserts on stdout/stderr/exit code).
- Assert on error variants with matches!: assert!(matches!(result.unwrap_err(), MyError::Validation(_))). This is cleaner than match arms when the test only cares whether the error is the right kind, and it doesn't need updating when unrelated variants are added.
- Coverage: cargo llvm-cov --workspace --html. Target 70%+ on application code, higher on library crates.
- Fuzzing for parsers: run cargo fuzz + libfuzzer-sys (cargo fuzz needs a nightly toolchain) on any code that parses untrusted input (file formats, protocols, query languages). A short scheduled fuzz run surfaces the panics and UB that unit tests miss.
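The unit-test layout and the matches! assertion style can be combined in a short sketch. parse_pair and ParseError are hypothetical. The tests check only the error variant, not the message text.

```rust
/// Hypothetical error type for a tiny `key=value` parser.
#[derive(Debug, PartialEq)]
pub enum ParseError {
    Empty,
    Validation(String),
}

/// Parses a hypothetical `key=value` pair, splitting at the first '='.
pub fn parse_pair(input: &str) -> Result<(&str, &str), ParseError> {
    if input.is_empty() {
        return Err(ParseError::Empty);
    }
    let (key, value) = input
        .split_once('=')
        .ok_or_else(|| ParseError::Validation(format!("missing '=' in {input:?}")))?;
    if key.is_empty() {
        return Err(ParseError::Validation("empty key".to_owned()));
    }
    Ok((key, value))
}

#[cfg(test)]
mod tests {
    use super::*; // same-file unit tests can also reach private items

    #[test]
    fn parses_valid_pair() {
        assert_eq!(parse_pair("a=1"), Ok(("a", "1")));
    }

    #[test]
    fn rejects_missing_separator() {
        // Only the variant matters here, not the exact message.
        assert!(matches!(parse_pair("abc").unwrap_err(), ParseError::Validation(_)));
    }

    #[test]
    fn rejects_empty_input() {
        assert!(matches!(parse_pair(""), Err(ParseError::Empty)));
    }
}
```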
For generic test discipline (anti-patterns, mock rules, rationalization resistance), see the ia-writing-tests skill.
Unsafe Discipline
- Default: no unsafe. If the unsafe_code lint flags it, don't #[allow] it; refactor.
- Every unsafe block gets a // SAFETY: comment above it explaining why each invariant holds. No comment = reviewer rejects.
- Keep unsafe blocks minimal: wrap them in a safe abstraction at the module boundary, and mark the module pub(crate).
- Use miri (cargo +nightly miri test) on any crate containing unsafe or raw pointer arithmetic. It catches UB that ordinary test runs pass over because the compiled code happens to behave.
- Prefer bytemuck, zerocopy, and bytes over hand-rolled transmutes for zero-copy patterns.
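Here is a sketch of the shape these rules ask for: one small unsafe block inside a safe function, with a // SAFETY: comment naming the invariant and where it is established. ascii_upper is hypothetical. The safe alternative, String::from_utf8(upper).ok(), would simply re-scan the bytes, so this only pays off on a measured hot path. clippy's undocumented_unsafe_blocks lint (restriction group) can enforce the comment mechanically.

```rust
/// Returns `input` with ASCII letters upper-cased, or None if any byte is non-ASCII.
/// Callers only see a safe API; the unsafe block never escapes this function.
pub fn ascii_upper(input: &[u8]) -> Option<String> {
    if !input.is_ascii() {
        return None;
    }
    let upper: Vec<u8> = input.iter().map(u8::to_ascii_uppercase).collect();
    // SAFETY: `input` is entirely ASCII (checked above), and to_ascii_uppercase
    // maps ASCII bytes to ASCII bytes, so `upper` is valid UTF-8.
    Some(unsafe { String::from_utf8_unchecked(upper) })
}
```

Running cargo +nightly miri test over tests for a function like this is cheap, and it checks the claim the SAFETY comment makes.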
Production Resilience
When productionizing a service (config validation, /health + /ready endpoints, graceful shutdown, retries/timeouts/jitter, connection pools, diagnostic secret redaction), load production-resilience.md.
Observability
For logging (tracing + tracing-subscriber with init recipe), #[instrument] spans, correlation IDs, metrics, and distributed tracing patterns, load observability.md. Never use println! or log:: for diagnostics in new code. Printing a CLI's actual output to stdout is fine.
CI
General CI design lives with the ia-infrastructure-engineer agent. For Rust-specific callouts (rustsec/audit-check, cargo-llvm-cov, Swatinem/rust-cache, taiki-e/install-action, matrix coverage guidance, doc-test step), load ci-pipeline.md.
Discipline
- Simplicity first — every change as simple as possible, impact minimal code.
- Only touch what's necessary — avoid unrelated changes in a PR.
- No #[allow(clippy::...)] as a shortcut; fix the underlying issue. Document exceptions with a rationale.
- Before adding a trait or generic, verify it's used in 3+ places. Otherwise a concrete type is clearer.
- Verify: see Verify section — pass all checks with zero warnings before declaring done.
Verify
- cargo fmt --all -- --check passes with zero diffs
- cargo clippy --workspace --all-targets --all-features -- -D warnings passes
- cargo nextest run --workspace (or cargo test --workspace) passes with zero failures. nextest does not run doctests, so pair it with cargo test --workspace --doc
- cargo deny check passes (licenses, advisories, duplicates) for any crate going to production
- No new unsafe without a // SAFETY: comment
References
- cli-tools.md — clap patterns, config layering, tracing setup, progress, shell completions
- axum-service.md — project layout, extractors, error types, graceful shutdown, testing
- build-profiles.md — release/release-dbg/release-min profiles, mold linker, dev compile speedups
- ci-pipeline.md — Rust-specific CI steps (cargo audit, llvm-cov, rust-cache, matrix strategy, doc tests)
- production-resilience.md — fail-fast config, health/ready endpoints, graceful shutdown, retries, timeouts, connection pools
- observability.md — tracing init recipe, span instrumentation, correlation IDs, metrics, distributed tracing