Overview
WhatsApp-Rust is a high-performance, async Rust library for the WhatsApp Web API. The project follows a modular, layered architecture that separates protocol concerns from runtime concerns, enabling platform-agnostic core logic with pluggable backends.
Workspace Structure
The project is organized as a Cargo workspace with multiple crates:
Three Main Crates
wacore - Platform-Agnostic Core
Location: `wacore/`
Purpose: Contains core logic for the WhatsApp binary protocol, cryptography primitives, IQ protocol types, runtime abstraction, and state management traits.
Key Features:
- Zero runtime dependencies — no Tokio, no async-std; only `futures`, `async-trait`, `async-lock`, and `async-channel`
- 32-bit target support — uses `portable-atomic` for 64-bit atomics with a software fallback on platforms without native `AtomicU64` (ARM32, MIPS, etc.)
- `Runtime` trait for pluggable async executors (Tokio, async-std, WASM, etc.)
- `Transport`, `TransportFactory`, and `HttpClient` traits for pluggable networking
- `Backend` trait for pluggable storage
- Cryptographic operations (Signal Protocol, Noise Protocol)
- Type-safe protocol node builders
waproto - Protocol Buffers
Location: `waproto/`
Purpose: Houses WhatsApp’s Protocol Buffers definitions compiled to Rust structs.
Build Process:
The generated code (`whatsapp.rs`) is checked into version control. The `prost-build` dependency is gated behind an optional `generate` feature, so `build.rs` is a no-op during normal builds. To regenerate after modifying the `.proto` file, build with the `generate` feature enabled.
Key types:
- `Message` - All message types
- `WebMessageInfo` - Message metadata
- `HistorySync` - Chat history
- `SyncActionValue` - App state mutations
whatsapp-rust - Main Client
Location: `src/`
Purpose: Integrates wacore with concrete implementations (Tokio runtime, SQLite storage, ureq HTTP, Tokio WebSocket), provides the high-level Bot builder and Client API.
Key Features:
- `TokioRuntime` — default `Runtime` implementation (gated on the `tokio-runtime` feature)
- Typestate `BotBuilder` — compile-time enforcement that all 4 required components are provided
- SQLite persistence (pluggable via the `Backend` trait)
- Event bus system
- Feature modules (groups, media, newsletters, communities, etc.)
Runtime abstraction
The library is fully runtime-agnostic. All async operations go through four pluggable trait abstractions defined in `wacore`:
| Concern | Trait | Default implementation | Crate |
|---|---|---|---|
| Async runtime | Runtime | TokioRuntime | whatsapp-rust (gated on tokio-runtime) |
| Network transport | TransportFactory + Transport | TokioWebSocketTransportFactory | whatsapp-rust-tokio-transport |
| HTTP client | HttpClient | UreqHttpClient | whatsapp-rust-ureq-http-client |
| Storage | Backend | SqliteStore | whatsapp-rust-sqlite-storage |
The `Runtime` trait requires four methods plus one optional method with a default implementation.
AbortHandle is #[must_use] — dropping the handle aborts the spawned task. Call .detach() on the handle for fire-and-forget tasks that should run to completion independently. See custom backends — AbortHandle for implementation details.
The yield_frequency() method controls how often the client cooperatively yields during tight async loops (such as processing incoming frames). It returns the number of items to process before yielding. The default value is 10. Single-threaded runtimes should return 1 to avoid starving the event loop, while multi-threaded runtimes can use higher values or rely on yield_now() returning None.
On WASM targets, Send bounds are automatically removed via #[cfg(target_arch = "wasm32")].
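For orientation, here is a hypothetical sketch of what such a runtime abstraction could look like. The method names, signatures, and the `now_ms` accessor are illustrative assumptions, not wacore's actual API; only the `yield_frequency()` default of 10 and the abort-on-drop `AbortHandle` semantics come from the description above.

```rust
use std::future::Future;
use std::pin::Pin;
use std::time::Duration;

// Boxed future alias; on wasm32 the real crate drops the Send bound.
type BoxFuture<T> = Pin<Box<dyn Future<Output = T> + Send + 'static>>;

/// Stand-in for an abort handle: in the real API, dropping the handle
/// aborts the task, and .detach() lets it run to completion.
#[must_use]
pub struct AbortHandle;

impl AbortHandle {
    pub fn detach(self) {
        // A real implementation would disarm an abort-on-drop guard.
        std::mem::forget(self);
    }
}

pub trait Runtime: Send + Sync {
    /// Spawn a task onto the executor.
    fn spawn(&self, fut: BoxFuture<()>) -> AbortHandle;
    /// Sleep for the given duration.
    fn sleep(&self, dur: Duration) -> BoxFuture<()>;
    /// Cooperative yield point; None means "no yield primitive".
    fn yield_now(&self) -> Option<BoxFuture<()>>;
    /// Clock access in milliseconds (an invented example method).
    fn now_ms(&self) -> u64;
    /// Items to process before yielding in tight loops (default 10;
    /// single-threaded runtimes should return 1).
    fn yield_frequency(&self) -> usize {
        10
    }
}
```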
The BotBuilder uses a typestate pattern with four type parameters <B, T, H, R> (Backend, Transport, HttpClient, Runtime). The build() method is only callable when all four are Provided, making missing-component errors compile-time instead of runtime.
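A simplified two-component sketch of this typestate pattern (the real `BotBuilder` tracks four components; all names below are illustrative):

```rust
struct Missing;
struct Provided<T>(T);

struct Builder<B, R> {
    backend: B,
    runtime: R,
}

impl Builder<Missing, Missing> {
    fn new() -> Self {
        Builder { backend: Missing, runtime: Missing }
    }
}

impl<R> Builder<Missing, R> {
    fn backend(self, b: &'static str) -> Builder<Provided<&'static str>, R> {
        Builder { backend: Provided(b), runtime: self.runtime }
    }
}

impl<B> Builder<B, Missing> {
    fn runtime(self, r: &'static str) -> Builder<B, Provided<&'static str>> {
        Builder { backend: self.backend, runtime: Provided(r) }
    }
}

// build() exists only when every slot is Provided, so a missing component
// is a compile-time type error rather than a runtime failure.
impl Builder<Provided<&'static str>, Provided<&'static str>> {
    fn build(self) -> String {
        format!("{} + {}", self.backend.0, self.runtime.0)
    }
}
```

Calling `Builder::new().build()` would not compile, because no `build()` method exists for the `Missing` states.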
See custom backends for implementing your own runtime, transport, HTTP client, or storage backend.
Key Components
Client
Location: `src/client.rs`
Purpose: Orchestrates connection lifecycle, event bus, and high-level operations. Uses async-lock (runtime-agnostic) for all internal synchronization instead of Tokio-specific primitives.
- Connection management
- Request/response routing
- Event dispatching
- Session management
PersistenceManager
Location: `src/store/persistence_manager.rs`
Purpose: Manages all state changes and persistence.
- Never modify `Device` state directly
- Use `DeviceCommand` + `process_command()`
- For read-only access, use `get_device_snapshot()`
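A minimal sketch of this command pattern. The `SetPushName` command and the `Device` field are invented for illustration; only the `DeviceCommand` / `process_command()` / `get_device_snapshot()` shape comes from the rules above.

```rust
#[derive(Clone, Default, Debug, PartialEq)]
struct Device {
    push_name: String, // invented field for illustration
}

enum DeviceCommand {
    SetPushName(String),
}

struct PersistenceManager {
    device: Device,
    dirty: bool,
}

impl PersistenceManager {
    /// All mutations go through commands; direct Device mutation is not exposed.
    fn process_command(&mut self, cmd: DeviceCommand) {
        match cmd {
            DeviceCommand::SetPushName(name) => self.device.push_name = name,
        }
        self.dirty = true; // flagged for the background saver
    }

    /// Read-only access: callers get a snapshot, never the live state.
    fn get_device_snapshot(&self) -> Device {
        self.device.clone()
    }
}
```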
Signal Protocol
Location: `wacore/libsignal/` & `src/store/signal*.rs`
Purpose: End-to-end encryption via Signal Protocol implementation.
Features:
- Double Ratchet algorithm
- Pre-key bundles
- Session management
- Sender keys for groups
Socket & Handshake
Location: `src/socket/`, `src/handshake.rs`
Purpose: WebSocket connection and Noise Protocol handshake.
Flow:
- WebSocket connection
- Noise handshake (XX pattern)
- Encrypted frame exchange
Module Interactions
Layer Responsibilities
wacore Layer (Platform-Agnostic)
- Protocol logic
- State traits
- Cryptographic helpers
- Data models
whatsapp-rust Layer (Runtime)
- Runtime orchestration
- Storage integration
- User-facing API
Protocol Entry Points
Incoming Messages
Flow: `src/message.rs` → Signal decryption → Event dispatch
Incoming stanzas are decoded as `Arc<OwnedNodeRef>` (zero-copy from the network buffer) and routed through per-chat message queues:
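A minimal model of the per-chat queue idea, assuming one lazily created bounded queue per chat; std's `sync_channel` stands in for the async bounded channel, and the real lanes also pair the queue with an enqueue lock:

```rust
use std::collections::HashMap;
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};

struct Lane {
    tx: SyncSender<Vec<u8>>,
}

#[derive(Default)]
struct ChatLanes {
    lanes: HashMap<String, Lane>,
}

impl ChatLanes {
    /// Enqueue a message for its chat; returns the receiver end only when
    /// a new lane was created (the caller would spawn a worker for it).
    fn enqueue(&mut self, chat_jid: &str, msg: Vec<u8>) -> Option<Receiver<Vec<u8>>> {
        let mut new_rx = None;
        let lane = self.lanes.entry(chat_jid.to_string()).or_insert_with(|| {
            // Bounded capacity (500 in the real client) gives backpressure.
            let (tx, rx) = sync_channel(500);
            new_rx = Some(rx);
            Lane { tx }
        });
        lane.tx.send(msg).expect("lane worker gone");
        new_rx
    }
}
```

Because each chat has exactly one queue consumed by one worker, messages within a chat are processed strictly in arrival order.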
Outgoing Messages
Flow: `src/send.rs` → Signal encryption → Socket send
Outgoing stanzas are built as owned `Node` values via `NodeBuilder`:
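As a rough illustration of the builder style, here is a self-contained toy (not the crate's actual `NodeBuilder` API):

```rust
#[derive(Debug, PartialEq)]
struct Node {
    tag: String,
    attrs: Vec<(String, String)>,
    children: Vec<Node>,
}

struct NodeBuilder(Node);

impl NodeBuilder {
    fn new(tag: &str) -> Self {
        NodeBuilder(Node { tag: tag.into(), attrs: Vec::new(), children: Vec::new() })
    }
    fn attr(mut self, key: &str, value: &str) -> Self {
        self.0.attrs.push((key.into(), value.into()));
        self
    }
    fn child(mut self, child: Node) -> Self {
        self.0.children.push(child);
        self
    }
    fn build(self) -> Node {
        self.0
    }
}
```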
Socket Communication
Flow: `src/socket/` → Noise framing → Transport
Connection Lifecycle
Auto-Reconnection
The client implements robust reconnection handling with stream error awareness. The `is_connected` field uses an `AtomicBool` to track whether the noise socket is established. This avoids a TOCTOU race that previously occurred when `try_lock()` on the noise socket mutex failed under contention, causing false-negative connection checks and silent ack drops.
Connection timeout: Both the transport connection and version fetch are wrapped in a 20-second timeout (TRANSPORT_CONNECT_TIMEOUT), matching WhatsApp Web’s MQTT and DGW defaults. This prevents dead networks from blocking on the OS TCP SYN timeout (~60-75s). Both operations run in parallel via tokio::join!.
Reconnection flow:
- Connection lost → `cleanup_connection_state()` (see disconnect cleanup)
- Check `enable_auto_reconnect` → exit if disabled (401, 409, 516 disable this)
- Check `expected_disconnect` → immediate reconnect if expected (e.g., 515)
- Calculate Fibonacci backoff delay (1s, 1s, 2s, 3s, 5s, 8s… max 900s with +/-10% jitter)
- Wait → attempt reconnection (with 20s connect timeout)
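The backoff schedule above can be sketched as follows (the jitter input is a stand-in for a real RNG; the 900s cap and 10% jitter mirror the text):

```rust
/// Fibonacci backoff: 1, 1, 2, 3, 5, 8, ... seconds, capped at 900.
fn fib_backoff_secs(attempt: u32) -> u64 {
    let (mut a, mut b) = (1u64, 1u64);
    for _ in 0..attempt {
        let next = (a + b).min(900);
        a = b;
        b = next;
    }
    a.min(900)
}

/// Apply +/-10% jitter; `jitter_unit` is a random value in [-1.0, 1.0].
fn with_jitter(base_secs: u64, jitter_unit: f64) -> f64 {
    base_secs as f64 * (1.0 + 0.1 * jitter_unit)
}
```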
Disconnect cleanup
When a connection is lost or `disconnect()` is called, `cleanup_connection_state()` resets all connection-scoped state to prevent stale data from leaking into the next connection. This cleanup runs exactly once — in the `run()` method after the message loop exits — rather than being duplicated inside the message loop on transport disconnect events:
| Resource | Action | Reason |
|---|---|---|
| Transport, events, noise socket | Set to None | Release connection resources |
| `is_connected` | Set to `false` (Release ordering) | After socket is `None` so no task sees "connected" with a cleared socket |
| `chat_lanes` | Invalidated | Drop per-chat queue senders so workers exit via channel close — prevents stale workers from the old connection surviving reconnects with outdated signal/crypto state |
| `pending_retries` | Cleared | Stale keys from detached scope guard cleanup would otherwise suppress the first retry after reconnect |
| `signal_cache` | Cleared | Prevents stale signal state from leaking across connections |
| `response_waiters` | Drained | Pending IQ waiters fail fast with `InternalChannelClosed` instead of hanging until the 75s timeout |
| Offline sync state | Reset | Counters, timing, and semaphore replaced with fresh single-permit instance |
| Dead-socket timestamps | Reset to 0 | Prevents stale values from triggering an immediate reconnect on the next connection |
| `app_state_key_requests`, `app_state_syncing` | Replaced with empty maps | Prevents unbounded growth across reconnections |
Chat lane invalidation is critical for correctness. Without it, stale message processing workers from the previous connection survive reconnects, holding outdated Signal session state that causes decryption failures on the new connection.
Stream error handling by code:
- 401 (unauthorized): Disables auto-reconnect, emits `LoggedOut`
- 409 (conflict): Disables auto-reconnect, emits `StreamReplaced`
- 429 (rate limited): Adds 5 extra Fibonacci steps to backoff, then reconnects
- 515 (expected): Immediate reconnect without backoff
- 516 (device removed): Disables auto-reconnect, emits `LoggedOut`
Message loop (read loop)
The `read_messages_loop` runs on the `run()` caller's task and uses `select_biased!` to multiplex shutdown signals with transport events. Frame decryption is sequential (noise counter ordering), but node processing uses a hybrid inline/concurrent strategy:
- Inline: `success`, `failure`, `stream:error` (connection state), `message` (arrival order for per-chat queues), `ib` (offline sync tracking)
- Spawned concurrently: all other stanzas (receipts, notifications, presence, etc.)
After each processed batch, the loop updates `last_data_received_ms` so the keepalive loop sees the batch completion time rather than the arrival time — preventing false-positive dead-socket triggers during large offline sync batches. The loop also cooperatively yields every `yield_frequency()` frames to avoid starving other tasks.
See WebSocket & Noise Protocol - Message loop for implementation details.
Keepalive loop
The keepalive loop runs as a separate spawned task, fully decoupled from the read loop. This ensures keepalive pings are never blocked by frame processing — even during large offline sync batches that take seconds to drain. The two loops communicate solely through atomic timestamps (last_data_received_ms, last_data_sent_ms).
- Sends ping every 15-30 seconds (randomized, matching WA Web's `15 * (1 + random())`)
- Skips ping if data was received within the minimum interval (connection proven alive)
- Sends ping before dead-socket check to prevent false-positive reconnects on idle-but-healthy connections
- Waits up to 20s for response
- Checks dead socket on every tick (not just after failures) — catches scenarios where pending IQs caused the ping to be skipped, or where the ping succeeded but the connection died immediately after
- Detects dead socket if no data received for 20s after a send, triggering immediate reconnection
- Fatal errors (`Socket`, `Disconnected`, `NotConnected`, `InternalChannelClosed`) cause the keepalive loop to exit immediately
- Error classification is exhaustive and compile-time enforced — adding a new error variant without handling it causes a build failure
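The ping interval and dead-socket rule can be modeled as a sketch under the timings stated above (function names are illustrative):

```rust
/// Randomized ping interval: 15 * (1 + random()) seconds, i.e. 15-30s.
/// `rand01` is a stand-in for a real RNG value in [0.0, 1.0).
fn keepalive_interval_secs(rand01: f64) -> f64 {
    15.0 * (1.0 + rand01)
}

/// Dead-socket check: we sent data after the last receive, and 20s have
/// passed since that send without anything coming back.
fn socket_is_dead(now_ms: u64, last_sent_ms: u64, last_recv_ms: u64) -> bool {
    last_sent_ms > last_recv_ms && now_ms.saturating_sub(last_sent_ms) >= 20_000
}
```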
Offline sync
When reconnecting, the client tracks offline message sync progress:
- Receive `<ib><offline_preview count="N"/>` → start tracking, reset counters
- Process messages with the `offline` attribute → increment counter
- Receive `<ib><offline/>` → sync complete
- Emit the `OfflineSyncCompleted` event
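The tracking steps can be modeled as follows (field and method names are illustrative):

```rust
/// Minimal model of offline sync progress tracking.
#[derive(Default)]
struct OfflineSync {
    active: bool,
    expected: u32,
    processed: u32,
}

impl OfflineSync {
    /// <ib><offline_preview count="N"/>: start tracking, reset counters.
    fn on_preview(&mut self, count: u32) {
        self.active = true;
        self.expected = count;
        self.processed = 0;
    }

    /// A message carrying the offline attribute was processed.
    /// Returns true when every expected item has arrived.
    fn on_offline_message(&mut self) -> bool {
        self.processed += 1;
        self.active && self.processed >= self.expected
    }

    /// <ib><offline/>: server says sync is complete.
    /// Returns true only on the first end marker.
    fn on_end_marker(&mut self) -> bool {
        let was_active = self.active;
        self.active = false;
        was_active
    }
}
```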
If the server sends an `offline_preview` but never sends the end marker (`<ib><offline/>`), the client applies a 60-second timeout (matching WhatsApp Web's `OFFLINE_STANZA_TIMEOUT_MS`). On timeout, the client logs a warning, marks sync as complete, and resumes normal operation so startup is not blocked indefinitely.
Concurrency gating: During offline sync, the client restricts message processing to a single concurrent task (1 semaphore permit) to preserve ordering. Once sync completes — either by the server end marker, all expected items arriving, or timeout — the semaphore is expanded to 64 permits, switching to parallel message processing.
Semaphore transition safety: When the semaphore is swapped from 1 to 64 permits, tasks that were already waiting on the old semaphore must not be silently dropped. The client uses a generation-checked re-acquire loop to handle this transition safely:
- Each semaphore swap increments an atomic `message_semaphore_generation` counter
- When a task acquires a permit, it checks whether the generation has changed since it started waiting
- If the generation changed (meaning the semaphore was swapped while the task was blocked), the task drops the stale permit and re-acquires from the new semaphore
- This loop continues until the task holds a permit from the current-generation semaphore
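A stripped-down model of the generation check; the real client re-acquires from an async semaphore, which is elided here to show only the generation logic:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

struct Permit {
    generation: u64,
}

struct MessageGate {
    generation: AtomicU64,
}

impl MessageGate {
    fn acquire(&self) -> Permit {
        loop {
            let gen_seen = self.generation.load(Ordering::Acquire);
            // ... here the real code awaits a permit from the
            //     generation-`gen_seen` semaphore ...
            let permit = Permit { generation: gen_seen };
            // Semaphore unchanged while we waited? Then the permit is valid.
            if self.generation.load(Ordering::Acquire) == permit.generation {
                return permit;
            }
            // Swapped mid-wait: drop the stale permit, retry on the new one.
            drop(permit);
        }
    }

    /// Swap the semaphore (e.g. 1 to 64 permits) and bump the generation.
    fn swap_semaphore(&self) {
        self.generation.fetch_add(1, Ordering::Release);
    }
}
```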
Without this safety mechanism, `pkmsg` messages (which carry Sender Key Distribution Messages for group chats) could be silently dropped during the offline-to-online transition. A dropped `pkmsg` would cause all subsequent `skmsg` messages from that sender to fail with `NoSenderKeyState`, since the SKDM they depended on was never processed.
State reset: On reconnect or cleanup, all offline sync state is reset (counters, timing, and the semaphore is replaced with a fresh single-permit instance) so stale state does not leak into the next connection attempt.
Deferred device sync
During offline sync, the client may receive group messages from devices not yet present in the local device registry (for example, a companion device that was paired while the client was offline). Rather than firing a network request for each unknown device individually, the client batches these into a `PendingDeviceSync` set.
Flow:
- During offline message processing, `is_from_known_device()` detects an unrecognized sender device
- The sender's user JID is added to `PendingDeviceSync` (deduplicated — each user is queued at most once)
- A retry receipt is sent so the sender will redeliver the message after the device list is updated
- When `<ib><offline/>` arrives (offline sync complete), the client waits 2 seconds (`OFFLINE_DEVICE_SYNC_DELAY`, matching WhatsApp Web)
- All batched user JIDs are flushed in a single bulk usync request via `flush_pending_device_sync()`
- If the flush fails, the JIDs are re-enqueued for the next attempt
PendingDeviceSync state is cleared on reconnect to prevent stale entries from leaking across connections.
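A sketch of the deduplicated batch with re-enqueue on failure (the struct shape and `flush` signature are assumptions; only the behavior mirrors the flow above):

```rust
use std::collections::BTreeSet;

/// Deduplicated batch of user JIDs awaiting a bulk usync request.
#[derive(Default)]
struct PendingDeviceSync {
    users: BTreeSet<String>,
}

impl PendingDeviceSync {
    /// Returns false if the user was already queued (each user at most once).
    fn enqueue(&mut self, user_jid: &str) -> bool {
        self.users.insert(user_jid.to_string())
    }

    /// Flush all queued users in one batch; re-enqueue them if it fails.
    fn flush(&mut self, send_bulk_usync: impl FnOnce(&[String]) -> Result<(), ()>) {
        let batch: Vec<String> = std::mem::take(&mut self.users).into_iter().collect();
        if batch.is_empty() {
            return;
        }
        if send_bulk_usync(&batch).is_err() {
            // Failed: put the JIDs back for the next attempt.
            self.users.extend(batch);
        }
    }
}
```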
Location: src/pending_device_sync.rs, src/handlers/ib.rs, src/usync.rs
See also: Unknown device detection for the detection mechanism during group message decryption.
History Sync Pipeline
History sync transfers chat history from the phone to the linked device. The pipeline is designed for minimal RAM usage through a multi-layered zero-copy strategy.
Processing flow
RAM optimization layers
- Heuristic pre-allocation with `compressed_size_hint` — the decompression buffer is pre-allocated using a 4x multiplier on the compressed blob's `file_length` (clamped to 256 bytes – 8 MiB). When the notification provides `file_length`, this avoids repeated `Vec` reallocation during decompression. The hint comes from the decrypted (but still compressed) blob size, which is a better estimate than the encrypted size that includes MAC/padding overhead
- Immediate drop of compressed data — after decompression, the compressed input is dropped so peak memory equals `max(compressed, decompressed)` rather than both combined
- Hand-rolled protobuf parser — instead of decoding the entire `HistorySync` message tree (which allocates every nested message), the core walks varint tags manually and only extracts field 2 (conversations) and field 7 (pushnames)
- `Bytes` zero-copy slicing — decompressed data is wrapped in a reference-counted `Bytes` buffer; each conversation is extracted as `buf.slice(pos..end)`, which is an Arc refcount increment with no per-conversation heap allocation
- Bounded channel streaming — an `async_channel::bounded::<Bytes>(4)` streams conversation bytes from the blocking parser thread to the async event dispatcher, providing backpressure with only ~4 conversations in-flight
- `LazyHistorySync` wrapper — the decompressed blob is wrapped in a `LazyHistorySync` with cheap metadata (sync type, chunk order, progress) available without decoding. Full protobuf decoding only happens if the event handler calls `.get()`. Each clone parses independently (plain `OnceLock`, not `Arc`-wrapped) since the common case is a single handler. Consumers can use `.raw_bytes()` to access the raw protobuf bytes for custom partial decoding
- Compile-time callback elimination — when no event handlers are registered, the callback is `None`, causing the parser to skip conversation extraction entirely at the protobuf level
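The size-hint computation can be sketched directly from the constants above (the function shape is illustrative):

```rust
/// Decompression buffer pre-allocation hint: 4x the compressed length,
/// clamped to [256 B, 8 MiB].
fn compressed_size_hint(file_length: Option<u64>) -> usize {
    const MIN_HINT: u64 = 256;
    const MAX_HINT: u64 = 8 * 1024 * 1024;
    match file_length {
        Some(len) => len.saturating_mul(4).clamp(MIN_HINT, MAX_HINT) as usize,
        // No hint from the notification: start small and let Vec grow.
        None => MIN_HINT as usize,
    }
}
```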
Skip mode
For bots that don’t need chat history, `skip_history_sync()` sends a receipt so the phone stops retrying uploads but downloads nothing. See Bot - History Sync.
Concurrency Patterns
Per-Chat Lanes
Prevents race conditions where a later message is processed before the PreKey message. Each chat gets a lane combining an enqueue lock and a bounded channel (capacity 500 messages) into a single cached entry, providing backpressure to prevent memory amplification when many chats are active simultaneously.
Per-Device Session Locks
Prevents concurrent Signal protocol operations on the same session. Each device JID gets its own lock, keyed by protocol address strings generated by `to_protocol_address_string()` (format: `user[:device]@server.0`).
Device resolution follows WhatsApp Web's `WAWebSendUserMsgJob` and `WAWebDBDeviceListFanout` behavior. The local registry is checked first; a network fetch is only triggered on a cache miss to avoid unnecessary LID-migration side effects. Session locks are acquired for all involved devices in sorted order to prevent deadlocks. The `build_session_lock_keys()` helper resolves encryption JIDs (normalizing the recipient to bare form via `to_non_ad()`), sorts by (server, user, device) using `cmp_for_lock_order()`, and deduplicates. The `session_mutexes_for()` helper then converts the sorted JIDs to session mutexes, reusing a single `String` buffer to avoid per-JID heap allocations.
The peer message path (single-device) acquires a single lock for the resolved encryption JID. Group messages do not hold client-level session locks — each participant device is encrypted separately inside `prepare_group_stanza`. Group stanza preparation uses `sort_dedup_by_user()` to deduplicate participants before device resolution, and `sort_dedup_by_device()` to deduplicate resolved device JIDs after LID conversion — both operate in-place on sorted `Vec<Jid>` without `HashSet` allocations.
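The sorted, deduplicated lock-ordering idea can be sketched as follows; JIDs are modeled as plain tuples and the key format only loosely follows the `user[:device]@server.0` description, so treat this as an illustration rather than the crate's helpers:

```rust
/// Deadlock-free lock ordering: sort JIDs by (server, user, device) and
/// deduplicate before acquiring session locks, so any two tasks acquire
/// overlapping locks in the same order. JIDs are modeled here as
/// (user, device, server) tuples.
fn build_session_lock_keys(mut jids: Vec<(String, u32, String)>) -> Vec<String> {
    jids.sort_by(|a, b| (&a.2, &a.0, a.1).cmp(&(&b.2, &b.0, b.1)));
    jids.dedup(); // duplicates are adjacent after sorting
    jids.into_iter()
        .map(|(user, device, server)| {
            if device == 0 {
                format!("{user}@{server}.0")
            } else {
                format!("{user}:{device}@{server}.0")
            }
        })
        .collect()
}
```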
Background Saver
Periodic persistence with a dirty-flag optimization.
Feature Organization
Location: `src/features/`
Media download and upload live in `src/download.rs` and `src/upload.rs` as separate top-level modules.
Pattern: Features are accessed through accessor methods on `Client`:
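A hypothetical sketch of the accessor pattern (module and method names are invented for illustration):

```rust
struct Client {
    id: u32, // invented field for illustration
}

/// A feature module that borrows the client rather than owning state.
struct Groups<'a> {
    client: &'a Client,
}

impl Client {
    /// Cheap accessor: constructs a borrowed feature handle on demand.
    fn groups(&self) -> Groups<'_> {
        Groups { client: self }
    }
}

impl Groups<'_> {
    fn describe(&self) -> String {
        format!("groups feature for client {}", self.client.id)
    }
}
```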
State Management Flow
Best Practices
State Management
Async Operations
Error Handling
Related Sections
- Authentication — learn about QR code and pair code flows
- Events — understand the event system and handlers
- Storage — explore storage backends and state management
- Getting Started — build your first WhatsApp bot