Overview
WhatsApp uses a custom binary protocol for all communication between clients and servers. This format is significantly more compact than JSON or XML and is optimized for mobile network conditions. The protocol encodes messages as nodes: hierarchical structures with tags, attributes, and content. All nodes are serialized to binary before encryption and transmission.
Architecture
The binary protocol implementation is in wacore/binary/, a platform-agnostic crate.
Node Structure
Node Definition
A node represents a protocol message or message component.
wacore/binary/src/node.rs:308-314
Attributes
Attributes are stored as key-value pairs with specialized value types.
JIDs such as 15551234567@s.whatsapp.net appear frequently in the protocol. Storing them as structured data avoids repeated parsing and formatting overhead.
wacore/binary/src/node.rs:10-112, wacore/binary/src/jid.rs
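To make the idea of structured attribute values concrete, here is a minimal sketch of how a node with a JID-aware attribute type might be modeled. All type and field names (`Jid`, `AttrValue`, `NodeContent`, `Node`) are hypothetical and chosen for illustration; they are not copied from node.rs or jid.rs.

```rust
use std::fmt;

/// Hypothetical structured JID of the form `user@server`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Jid {
    pub user: String,
    pub server: String,
}

impl Jid {
    /// Parse `user@server`; returns None when there is no '@'.
    pub fn parse(s: &str) -> Option<Jid> {
        let (user, server) = s.split_once('@')?;
        Some(Jid { user: user.to_string(), server: server.to_string() })
    }
}

impl fmt::Display for Jid {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}@{}", self.user, self.server)
    }
}

/// Hypothetical attribute value: plain text, or a JID parsed once up front.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum AttrValue {
    Text(String),
    Jid(Jid),
}

/// Hypothetical node content: raw bytes or nested child nodes.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum NodeContent {
    Bytes(Vec<u8>),
    Children(Vec<Node>),
}

/// Hypothetical node: a tag, key-value attributes, and optional content.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Node {
    pub tag: String,
    pub attrs: Vec<(String, AttrValue)>,
    pub content: Option<NodeContent>,
}
```

With this shape, a "to" attribute can hold `AttrValue::Jid(Jid::parse("15551234567@s.whatsapp.net").unwrap())`, so the user and server parts are available without re-parsing the string each time the attribute is read or encoded.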
Example Node
Token Dictionary
The protocol uses a token dictionary to compress common strings into single bytes.
Token Types
wacore/binary/src/token.rs
Dictionary Lookup
Common protocol strings are mapped to single-byte tokens:
- Protocol tags (“message”, “iq”, “presence”)
- Common attributes (“id”, “type”, “to”, “from”)
- Frequent values (“text”, “chat”, “available”)
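The lookup described above can be sketched as a table indexed by token byte. The table contents and the positions of the entries below are made up for this example, and index 0 is treated as reserved only within the sketch; the real dictionary is defined in token.rs.

```rust
/// Illustrative single-byte dictionary. Entries and positions are
/// assumptions for this sketch, not the real token values.
const SINGLE_BYTE_TOKENS: &[&str] =
    &["", "message", "iq", "presence", "id", "type", "to", "from"];

/// Return the single-byte token for `s`, if the string is in the dictionary.
pub fn token_for(s: &str) -> Option<u8> {
    SINGLE_BYTE_TOKENS
        .iter()
        .position(|t| !t.is_empty() && *t == s)
        .map(|i| i as u8)
}

/// Return the string for a single-byte token, if that index is assigned.
pub fn string_for(token: u8) -> Option<&'static str> {
    SINGLE_BYTE_TOKENS
        .get(token as usize)
        .copied()
        .filter(|t| !t.is_empty())
}
```

A linear scan keeps the sketch short; a production encoder would more likely use a precomputed map so that the string-to-token direction is a constant-time lookup.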
Multi-byte Tokens
Less common strings use two-byte tokens.
wacore/binary/src/token.rs:200-300
Encoding Process
Marshal Functions
wacore/binary/src/marshal.rs:31-76
Encoding Strategy
The encoder uses multiple strategies based on data characteristics.
wacore/binary/src/encoder.rs:227-237
Packed Encoding
Nibble Packing (Numeric Strings)
Strings containing only digits, dash, and dot are packed into 4 bits per character.
wacore/binary/src/encoder.rs:769-777
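As an illustration of nibble packing, the following sketch packs two characters into each byte. The specific 4-bit codes (digits as 0 to 9, '-' as 10, '.' as 11) and the 0xF padding nibble for odd-length input are assumptions made for the example, and the function names are hypothetical.

```rust
/// Map one character to its 4-bit code, or None if it cannot be packed.
/// The code assignments here are assumed for illustration.
fn nibble_code(c: u8) -> Option<u8> {
    match c {
        b'0'..=b'9' => Some(c - b'0'),
        b'-' => Some(10),
        b'.' => Some(11),
        _ => None,
    }
}

/// Pack `s` two characters per byte. An odd-length input gets a 0xF
/// padding nibble in the low half of the final byte.
pub fn pack_nibbles(s: &str) -> Option<Vec<u8>> {
    let bytes = s.as_bytes();
    let mut out = Vec::with_capacity((bytes.len() + 1) / 2);
    for pair in bytes.chunks(2) {
        let hi = nibble_code(pair[0])?;
        let lo = match pair.get(1) {
            Some(&c) => nibble_code(c)?,
            None => 0x0F,
        };
        out.push((hi << 4) | lo);
    }
    Some(out)
}
```

Under these assumptions, `pack_nibbles("123")` yields the two bytes `0x12 0x3F`, which is half the size of the three-byte ASCII form once the string grows beyond a few characters. A phone-number user part such as `15551234567` shrinks from 11 bytes to 6.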
Hex Packing
Uppercase hex strings (0-9, A-F) are packed into 4 bits per character.
wacore/binary/src/encoder.rs:780-787
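Hex packing follows the same two-characters-per-byte idea, but because all sixteen nibble values are valid hex digits, a padding nibble alone cannot signal an odd length. The sketch below therefore also returns a header byte whose top bit marks odd-length input and whose low seven bits hold the packed byte count. That header layout, the 0xF padding value, and the function names are assumptions for illustration, not a statement of what encoder.rs does.

```rust
/// Map one uppercase hex character to its 4-bit value.
fn hex_code(c: u8) -> Option<u8> {
    match c {
        b'0'..=b'9' => Some(c - b'0'),
        b'A'..=b'F' => Some(c - b'A' + 10),
        _ => None, // lowercase and non-hex characters are not packable
    }
}

/// Pack an uppercase hex string. Returns (header, packed bytes), where the
/// assumed header is `odd_flag << 7 | packed_len`.
pub fn pack_hex(s: &str) -> Option<(u8, Vec<u8>)> {
    let bytes = s.as_bytes();
    let packed_len = (bytes.len() + 1) / 2;
    if packed_len > 0x7F {
        return None; // does not fit in the 7-bit length of this sketch
    }
    let mut out = Vec::with_capacity(packed_len);
    for pair in bytes.chunks(2) {
        let hi = hex_code(pair[0])?;
        let lo = match pair.get(1) {
            Some(&c) => hex_code(c)?,
            None => 0x0F, // padding; the odd flag tells the decoder to skip it
        };
        out.push((hi << 4) | lo);
    }
    let odd_flag: u8 = if bytes.len() % 2 == 1 { 0x80 } else { 0 };
    Some((odd_flag | packed_len as u8, out))
}
```

Rejecting lowercase input matches the text's restriction to uppercase hex: a lowercase string would not round-trip, so it has to fall back to a raw encoding.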
SIMD Optimization
The encoder uses SIMD instructions for fast packing of long strings.
wacore/binary/src/encoder.rs:809-824
JID Encoding
JIDs have special compact encodings.
JID_PAIR (Standard JID)
wacore/binary/src/encoder.rs:706-715
AD_JID (Device-Specific JID)
wacore/binary/src/encoder.rs:699-705
List Encoding
Lists (including node structures) have length-prefixed encoding.
wacore/binary/src/encoder.rs:865-876
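A length-prefixed list header can be sketched as a tag byte that selects the width of the length field, followed by the length itself. The tag values, the big-endian byte order, and the function name below are assumptions for this example; check encoder.rs for the values actually used.

```rust
// Assumed tag values for illustration only.
const LIST_EMPTY: u8 = 0;
const LIST_8: u8 = 248;
const LIST_16: u8 = 249;

/// Write a list header for `len` items, choosing the smallest length field.
pub fn write_list_start(out: &mut Vec<u8>, len: usize) -> Result<(), String> {
    match len {
        0 => out.push(LIST_EMPTY),
        1..=255 => {
            out.push(LIST_8);
            out.push(len as u8);
        }
        256..=65535 => {
            out.push(LIST_16);
            // Big-endian length is an assumption of this sketch.
            out.extend_from_slice(&(len as u16).to_be_bytes());
        }
        _ => return Err(format!("list too long: {}", len)),
    }
    Ok(())
}
```

Most protocol nodes have only a handful of attributes, so the one-byte length form covers the common case and the header costs two bytes.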
Node Encoding Format
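A complete node is encoded as a list. Its length counts the tag, a key and a value for each attribute, and the content if present:
list_len = 1 (tag) + (num_attrs * 2) + (content ? 1 : 0)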
wacore/binary/src/encoder.rs:879-889
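The list_len formula translates directly into code. The function name here is hypothetical; the arithmetic is the formula given above.

```rust
/// Number of list items for a node: one for the tag, two per attribute
/// (key and value), and one more if the node has content.
pub fn node_list_len(num_attrs: usize, has_content: bool) -> usize {
    1 + num_attrs * 2 + usize::from(has_content)
}
```

For example, a node with two attributes and a body has 1 + 4 + 1 = 6 list items, while a bare tag with no attributes and no content has exactly 1.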
Decoding Process
Decoder Structure
wacore/binary/src/decoder.rs
Zero-Copy Decoding
The decoder uses NodeRef<'a> to avoid allocations.
wacore/binary/src/node.rs:316-321, 288-293
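The general zero-copy idea is that decoded values are slices borrowed from the input buffer, tied to it by a lifetime, rather than owned copies. The sketch below shows this for a toy wire format in which each string has a one-byte length prefix. That format, `AttrRef`, `read_str`, and `read_attr` are all hypothetical and are not the crate's `NodeRef<'a>` API.

```rust
/// Read one length-prefixed UTF-8 string starting at `*pos`, returning a
/// slice that borrows from `buf`. Advances `*pos` only on success.
pub fn read_str<'a>(buf: &'a [u8], pos: &mut usize) -> Option<&'a str> {
    let len = *buf.get(*pos)? as usize;
    let start = *pos + 1;
    let end = start.checked_add(len)?;
    let bytes = buf.get(start..end)?; // None if the buffer is truncated
    let s = std::str::from_utf8(bytes).ok()?;
    *pos = end;
    Some(s)
}

/// Hypothetical borrowed attribute: both fields point into the input buffer.
pub struct AttrRef<'a> {
    pub key: &'a str,
    pub value: &'a str,
}

/// Read a key followed by a value without allocating.
pub fn read_attr<'a>(buf: &'a [u8], pos: &mut usize) -> Option<AttrRef<'a>> {
    let key = read_str(buf, pos)?;
    let value = read_str(buf, pos)?;
    Some(AttrRef { key, value })
}
```

Because `AttrRef<'a>` cannot outlive the buffer it was decoded from, the compiler enforces that the raw frame stays alive for as long as any decoded view is in use. A caller that needs to keep data longer converts to owned strings explicitly.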
Unpacking
Unpacking reverses the packing process.
wacore/binary/src/decoder.rs:400-450
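A sketch of the reverse direction for nibble-packed strings follows. It assumes the same illustrative codes as a typical nibble scheme (0 to 9 for digits, 10 for '-', 11 for '.'), a 0xF padding nibble, and an `odd` flag supplied by the caller from the length header. The names are hypothetical.

```rust
/// Map a 4-bit code back to its character (assumed code assignments).
fn nibble_char(n: u8) -> Option<char> {
    match n {
        0..=9 => Some((b'0' + n) as char),
        10 => Some('-'),
        11 => Some('.'),
        _ => None,
    }
}

/// Unpack two characters per byte. When `odd` is true, the low nibble of
/// the final byte is padding and must be 0xF.
pub fn unpack_nibbles(packed: &[u8], odd: bool) -> Option<String> {
    let mut out = String::with_capacity(packed.len() * 2);
    for (i, &b) in packed.iter().enumerate() {
        out.push(nibble_char(b >> 4)?);
        let is_last = i + 1 == packed.len();
        if is_last && odd {
            if (b & 0x0F) != 0x0F {
                return None; // malformed padding
            }
        } else {
            out.push(nibble_char(b & 0x0F)?);
        }
    }
    Some(out)
}
```

Returning `None` on an unknown code or bad padding lets the decoder reject corrupt input instead of producing a silently wrong string.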
Performance Optimizations
Two-Pass Encoding
For large or variable-size payloads, exact size calculation prevents buffer growth.
wacore/binary/src/marshal.rs:67-76
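The two-pass approach can be sketched as a sizing pass followed by a writing pass into a buffer allocated once at the exact capacity. The toy format here (a 4-byte big-endian length before each payload) and the function names are assumptions for the example, not the format marshal.rs produces.

```rust
/// Pass 1: compute the exact encoded size without writing anything.
pub fn encoded_size(items: &[&[u8]]) -> usize {
    items.iter().map(|item| 4 + item.len()).sum()
}

/// Pass 2: allocate once, then write. The buffer never needs to grow.
pub fn encode_two_pass(items: &[&[u8]]) -> Vec<u8> {
    let size = encoded_size(items);
    let mut out = Vec::with_capacity(size);
    for item in items {
        // Toy format: 4-byte big-endian length, then the payload bytes.
        out.extend_from_slice(&(item.len() as u32).to_be_bytes());
        out.extend_from_slice(item);
    }
    debug_assert_eq!(out.len(), size, "sizing pass must match writing pass");
    out
}
```

The trade-off is that the structure is walked twice. That is usually cheaper than repeated reallocation and copying for large payloads, but for small nodes a fixed initial capacity may be just as good.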
String Hint Cache
Repeated strings (like JIDs) are analyzed once and cached.
wacore/binary/src/encoder.rs:240-282
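One way to picture a string hint cache is a map from a string to the encoding strategy chosen for it, so the character scan runs only the first time a string is seen. The `StringHint` variants, the `HintCache` type, and the `misses` counter below are hypothetical names for this sketch; the actual cache in encoder.rs may be organized quite differently.

```rust
use std::collections::HashMap;

/// Hypothetical encoding hint chosen by scanning a string's characters.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StringHint {
    Nibble,
    Hex,
    Raw,
}

/// Scan the string once to decide which encoding applies.
fn classify(s: &str) -> StringHint {
    let is_nibble = |b: u8| b.is_ascii_digit() || b == b'-' || b == b'.';
    let is_hex = |b: u8| b.is_ascii_digit() || (b'A'..=b'F').contains(&b);
    if !s.is_empty() && s.bytes().all(is_nibble) {
        StringHint::Nibble
    } else if !s.is_empty() && s.bytes().all(is_hex) {
        StringHint::Hex
    } else {
        StringHint::Raw
    }
}

/// Cache of previously computed hints; `misses` counts real scans.
#[derive(Default)]
pub struct HintCache {
    hints: HashMap<String, StringHint>,
    pub misses: usize,
}

impl HintCache {
    /// Return the cached hint, scanning the string only on first sight.
    pub fn hint(&mut self, s: &str) -> StringHint {
        if let Some(&h) = self.hints.get(s) {
            return h;
        }
        self.misses += 1;
        let h = classify(s);
        self.hints.insert(s.to_string(), h);
        h
    }
}
```

This pays off when the same JID user part or message ID appears in several attributes of one stanza. An unbounded map would need an eviction policy in a long-running process, which the sketch leaves out.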
Capacity Estimation
The auto-sizing strategy samples the node structure to estimate capacity.
wacore/binary/src/marshal.rs:167-200
Common Protocol Patterns
IQ (Info/Query) Stanzas
Messages
Receipts
Wire Format Examples
Simple Message
Message with Body
Debugging Tools
Inspecting Encoded Data
Use the evcxr REPL for interactive exploration.
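When inspecting encoded bytes interactively, a small hex-dump helper makes tag bytes and length prefixes easier to spot. This is a generic utility written for illustration, not a function provided by the crate.

```rust
/// Render bytes as space-separated lowercase hex pairs, e.g. "f8 03 00".
pub fn hex_dump(bytes: &[u8]) -> String {
    bytes
        .iter()
        .map(|b| format!("{:02x}", b))
        .collect::<Vec<_>>()
        .join(" ")
}
```

Pasting this function into a REPL session and calling it on the output of an encoder gives a compact view of the wire bytes that can be compared against the expected tags and lengths.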
Error Handling
wacore/binary/src/error.rs
Related Components
- Signal Protocol - How messages are encrypted before marshaling
- WebSocket Handling - How binary data is framed and transmitted
- State Management - Protocol state stored in Device
References
- Source: wacore/binary/src/
- Token dictionary: wacore/binary/src/token.rs
- Node builder: wacore/binary/src/builder.rs