Overview
WhatsApp uses a custom binary protocol for all communication between clients and servers. This format is significantly more compact than JSON or XML and is optimized for mobile network conditions.
The protocol encodes messages as nodes: hierarchical structures with tags, attributes, and content. All nodes are serialized to binary format before encryption and transmission.
Architecture
The binary protocol implementation is in wacore/binary/, a platform-agnostic crate:
wacore/binary/src/
├── marshal.rs # Serialization entry points
├── encoder.rs # Binary encoding logic
├── decoder.rs # Binary decoding logic
├── node.rs # Node data structures
├── token.rs # Token dictionary
├── jid.rs # JID (identifier) handling
└── builder.rs # Fluent API for node construction
Node Structure
Node Definition
A node represents a protocol message or message component:
use compact_str::CompactString;
use std::borrow::Cow;
pub struct Node {
pub tag: Cow<'static, str>, // e.g., "message", "receipt", "iq"
pub attrs: Attrs, // Key-value attributes
pub content: Option<NodeContent>, // Optional content
}
pub enum NodeContent {
Bytes(Vec<u8>), // Binary payload
String(CompactString), // Text payload
Nodes(Vec<Node>), // Child nodes
}
The tag field uses Cow<'static, str> so that known protocol tags (like "message", "iq", "receipt") are borrowed as zero-allocation static references from the token dictionary, while unknown tags fall back to an owned String.
Location: wacore/binary/src/node.rs:459
Attributes
Attributes are stored as key-value pairs with specialized value types:
use compact_str::CompactString;
pub enum NodeValue {
String(CompactString),
Jid(Jid), // Optimized for WhatsApp identifiers
}
pub struct Attrs(pub Vec<(Cow<'static, str>, NodeValue)>);
Like Node.tag, attribute keys use Cow<'static, str> so that common protocol attribute names (like "id", "type", "to", "from") reference static memory from the token dictionary rather than allocating on the heap.
NodeValue API
NodeValue provides exactly two methods for accessing the underlying value, regardless of variant:
use std::borrow::Cow;
// Get a string view of the value (works for both variants)
// - String variant: Cow::Borrowed(&str) — zero copy
// - Jid variant: Cow::Owned(formatted) — allocates only when needed
pub fn as_str(&self) -> Cow<'_, str>
// Convert to an owned Jid, parsing from string if necessary
// - Jid variant: clones the Jid directly
// - String variant: attempts to parse, returns None on failure
pub fn to_jid(&self) -> Option<Jid>
This simplified API means you never need to match on the variant directly — use as_str() when you need the value as text, and to_jid() when you need a structured JID:
// Reading an attribute value as a string
let msg_type = node.attrs.get("type")
.map(|v| v.as_str().into_owned());
// Reading an attribute value as a JID
let recipient = node.attrs.get("to")
.and_then(|v| v.to_jid());
NodeValue also implements PartialEq<str> for zero-allocation comparisons — the Jid variant compares byte-by-byte against the formatted string without allocating.
Location: wacore/binary/src/node.rs:39-58
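The borrow-or-allocate behaviour of as_str() and the clone-or-parse behaviour of to_jid() can be sketched with standard-library types only. This is a simplified illustration, not the crate's implementation: the Jid here has only user and server fields, both plain String values, and the parsing rule (split on the first '@') is an assumption made for the example.

```rust
use std::borrow::Cow;

// Simplified stand-in for the real Jid (assumption: only user and server).
#[derive(Debug, Clone, PartialEq)]
pub struct Jid {
    pub user: String,
    pub server: String,
}

// Simplified stand-in for NodeValue, using String instead of CompactString.
pub enum NodeValue {
    String(String),
    Jid(Jid),
}

impl NodeValue {
    /// Borrows for the String variant; formats (and allocates) for the Jid variant.
    pub fn as_str(&self) -> Cow<'_, str> {
        match self {
            NodeValue::String(s) => Cow::Borrowed(s.as_str()),
            NodeValue::Jid(j) => Cow::Owned(format!("{}@{}", j.user, j.server)),
        }
    }

    /// Clones for the Jid variant; parses "user@server" for the String variant.
    pub fn to_jid(&self) -> Option<Jid> {
        match self {
            NodeValue::Jid(j) => Some(j.clone()),
            NodeValue::String(s) => {
                let (user, server) = s.split_once('@')?;
                if server.is_empty() {
                    return None; // a JID needs a server part
                }
                Some(Jid {
                    user: user.to_string(),
                    server: server.to_string(),
                })
            }
        }
    }
}
```

Callers never match on the variant: a plain string such as "text" yields a borrowed Cow from as_str() and None from to_jid(), while a structured JID is only formatted when its text form is actually requested.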
Why Jid as a separate type?
JIDs (Jabber IDs) like 15551234567@s.whatsapp.net appear frequently in the protocol. Storing them as structured data avoids repeated parsing/formatting overhead:
use compact_str::CompactString;
pub struct Jid {
pub user: CompactString, // "15551234567"
pub server: Server, // Server::Pn (s.whatsapp.net)
pub agent: u8, // Agent byte (parsed from JID string)
pub device: u16, // Device ID (0 for primary)
pub integrator: u16, // Integrator ID (used with interop server)
}
The user field uses CompactString (re-exported from compact_str) instead of String. CompactString stores short strings inline (up to 24 bytes on 64-bit platforms) without heap allocation, which benefits typical phone numbers and user identifiers. The library re-exports it as wacore_binary::CompactString and whatsapp_rust::CompactString for convenience. CompactString implements From<&str>, From<String>, and Deref<Target = str>, so it works as a drop-in replacement in most contexts — but code that relied on Jid.user being a String (e.g., passing it to functions expecting &String or calling String-specific methods) may need updating.
Server enum
The server field is a Server enum (#[repr(u8)]) that maps to the wire protocol’s AD_JID domain type. This replaces the previous Cow<'static, str> string representation, eliminating all heap allocation for server identifiers and enabling match-based dispatch instead of string comparisons:
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq, PartialOrd, Ord, Hash)]
#[repr(u8)]
pub enum Server {
#[default]
Pn = 0, // s.whatsapp.net — Standard phone-number JIDs
Lid = 1, // lid — Linked Identity JIDs
Group = 2, // g.us — Group chat JIDs
Broadcast = 3, // broadcast — Broadcast lists and status
Newsletter = 4, // newsletter — Newsletter / channel JIDs
Hosted = 5, // hosted — Cloud API business devices (phone-based)
HostedLid = 6, // hosted.lid — Cloud API business devices (LID-based)
Messenger = 7, // msgr — Messenger interop JIDs
Interop = 8, // interop — Cross-platform interop JIDs
Bot = 9, // bot — Bot JIDs
Legacy = 10, // c.us — Legacy user server (pre-multidevice)
}
Server implements Display (returns the wire string like "s.whatsapp.net"), as_str() for zero-cost string access, TryFrom<&str> for parsing, Serialize/Deserialize (as the wire string), and PartialEq<str> / PartialEq<&str> for backward-compatible string comparisons:
let server = Server::Pn;
assert_eq!(server.as_str(), "s.whatsapp.net");
assert!(server == "s.whatsapp.net"); // PartialEq<str> for backward compat
let parsed = Server::try_from("g.us").unwrap();
assert_eq!(parsed, Server::Group);
If you previously compared jid.server to string constants like "s.whatsapp.net", the PartialEq<str> impl on Server preserves backward compatibility. However, matching on the enum variant is preferred for exhaustiveness checking and performance.
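To make the difference between string comparison and match-based dispatch concrete, here is a minimal sketch with a trimmed-down three-variant enum. The enum is a local stand-in, and the describe function is a hypothetical helper invented for this example; neither is the crate's own code.

```rust
use std::convert::TryFrom;

// Trimmed-down stand-in for the Server enum (only three variants).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum Server {
    Pn = 0,
    Lid = 1,
    Group = 2,
}

impl Server {
    /// Wire string for the variant; no allocation involved.
    pub const fn as_str(self) -> &'static str {
        match self {
            Server::Pn => "s.whatsapp.net",
            Server::Lid => "lid",
            Server::Group => "g.us",
        }
    }
}

impl TryFrom<&str> for Server {
    type Error = ();

    fn try_from(s: &str) -> Result<Self, Self::Error> {
        match s {
            "s.whatsapp.net" => Ok(Server::Pn),
            "lid" => Ok(Server::Lid),
            "g.us" => Ok(Server::Group),
            _ => Err(()),
        }
    }
}

// String comparison kept for older call sites.
impl PartialEq<&str> for Server {
    fn eq(&self, other: &&str) -> bool {
        self.as_str() == *other
    }
}

/// Hypothetical helper: the compiler rejects this match if a variant is missing.
pub fn describe(server: Server) -> &'static str {
    match server {
        Server::Pn => "phone-number user",
        Server::Lid => "linked-identity user",
        Server::Group => "group chat",
    }
}
```

If a new variant is added to the enum, describe stops compiling until it is handled, which is the exhaustiveness benefit that string comparisons cannot provide.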
String constants are still available for backward compatibility and use in non-JID contexts:
| Constant | Value | Server variant |
|---|---|---|
| DEFAULT_USER_SERVER | s.whatsapp.net | Server::Pn |
| HIDDEN_USER_SERVER | lid | Server::Lid |
| GROUP_SERVER | g.us | Server::Group |
| BROADCAST_SERVER | broadcast | Server::Broadcast |
| NEWSLETTER_SERVER | newsletter | Server::Newsletter |
| HOSTED_SERVER | hosted | Server::Hosted |
| HOSTED_LID_SERVER | hosted.lid | Server::HostedLid |
| MESSENGER_SERVER | msgr | Server::Messenger |
| INTEROP_SERVER | interop | Server::Interop |
| BOT_SERVER | bot | Server::Bot |
| LEGACY_USER_SERVER | c.us | Server::Legacy |
Typed constructors
Convenience constructors avoid specifying the server directly:
Jid::pn("15551234567") // @s.whatsapp.net, device 0
Jid::lid("ABC123") // @lid, device 0
Jid::group("12345678") // @g.us
Jid::newsletter("12345678") // @newsletter
Jid::pn_device("15551234567", 1) // @s.whatsapp.net, device 1
Jid::lid_device("ABC123", 2) // @lid, device 2
Jid::status_broadcast() // status@broadcast
Jid::new("user", Server::Pn) // arbitrary server variant
Borrowing types
For zero-allocation lookups and comparisons, the protocol also provides:
- JidRef<'a>: a borrowing version of Jid where user is NodeStr<'a> (borrowed or inline) and server is the Server enum (already Copy). Used for zero-copy decoded JIDs in NodeRef attributes.
- DeviceKey<'a>: a lightweight key containing (&'a str, &'a str, u16) for user/server/device, used for HashSet lookups without cloning.
Location: wacore/binary/src/node.rs:10-112, wacore/binary/src/jid.rs
JidExt trait
The JidExt trait provides type-checking methods on JIDs. It is implemented for Jid, JidRef, and other borrowing types so you can inspect a JID’s server type without string comparisons:
use wacore_binary::jid::{Jid, JidExt};
let jid = Jid::pn_device("15551234567", 1);
assert!(jid.is_ad()); // true: s.whatsapp.net is an AD server and device > 0
assert!(!jid.is_group()); // false: not a @g.us JID
| Method | Returns true when |
|---|---|
| is_ad() | Server is Pn, Lid, Hosted, or HostedLid and device > 0 |
| is_group() | Server is Group |
| is_broadcast_list() | Server is Broadcast and user is not "status" |
| is_status_broadcast() | User is "status" and server is Broadcast |
| is_newsletter() | Server is Newsletter |
| is_hosted() | Device is 99, or server is Hosted / HostedLid |
| is_bot() | Server is Bot, or phone number starts with known bot prefixes |
| is_interop() | Server is Interop and integrator > 0 |
| is_messenger() | Server is Messenger and device > 0 |
| is_empty() | User is empty |
| is_same_user_as(other) | Both JIDs share the same user |
The trait also exposes basic accessor methods (user() -> &str, server() -> Server, device() -> u16, integrator() -> u16) that work uniformly across owned and borrowed JID types. Additional helper methods is_pn() and is_lid() are available directly on Jid for the most common server checks.
Location: wacore/binary/src/jid.rs:304-363
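One way such a trait can work uniformly across owned and borrowed JIDs is to require only the accessors and derive the predicates from them as default methods. The sketch below assumes that design; the reduced Server enum, the three-field Jid and JidRef structs, and the choice of predicates are simplifications for illustration, not the crate's definitions.

```rust
// Reduced stand-in for the Server enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Server {
    Pn,
    Lid,
    Group,
    Broadcast,
}

pub trait JidExt {
    // Accessors every JID-like type must provide.
    fn user(&self) -> &str;
    fn server(&self) -> Server;
    fn device(&self) -> u16;

    // Predicates derived from the accessors, shared by all implementors.
    fn is_group(&self) -> bool {
        self.server() == Server::Group
    }

    fn is_ad(&self) -> bool {
        self.device() > 0 && matches!(self.server(), Server::Pn | Server::Lid)
    }

    fn is_status_broadcast(&self) -> bool {
        self.server() == Server::Broadcast && self.user() == "status"
    }
}

// Owned JID (simplified: String instead of CompactString).
pub struct Jid {
    pub user: String,
    pub server: Server,
    pub device: u16,
}

// Borrowed JID (simplified: &str instead of NodeStr).
pub struct JidRef<'a> {
    pub user: &'a str,
    pub server: Server,
    pub device: u16,
}

impl JidExt for Jid {
    fn user(&self) -> &str { &self.user }
    fn server(&self) -> Server { self.server }
    fn device(&self) -> u16 { self.device }
}

impl JidExt for JidRef<'_> {
    fn user(&self) -> &str { self.user }
    fn server(&self) -> Server { self.server }
    fn device(&self) -> u16 { self.device }
}
```

With this shape, generic code can take any `impl JidExt` and call is_group() or is_ad() without knowing whether the JID owns its user string.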
NodeBuilder API
The NodeBuilder provides a fluent chaining API for constructing nodes. All setter methods consume and return Self:
use wacore_binary::builder::NodeBuilder;
use wacore_binary::jid::Jid;
let message = NodeBuilder::new("message")
.attr("to", "15551234567@s.whatsapp.net")
.attr("type", "text")
.attr("id", "ABCD1234")
.children(vec![
NodeBuilder::new("body").string_content("Hello, world!").build(),
])
.build();
Available methods
| Method | Signature | Description |
|---|---|---|
| new | new(tag: &'static str) -> Self | Create a builder with a static tag (zero-alloc) |
| new_dynamic | new_dynamic(tag: String) -> Self | Create a builder with a dynamic tag |
| attr | attr(self, key: &'static str, value: impl Into<NodeValue>) -> Self | Add a string attribute (zero-alloc key) |
| jid_attr | jid_attr(self, key: &'static str, jid: Jid) -> Self | Add a JID attribute without stringifying |
| attrs | attrs(self, attrs: impl IntoIterator<Item = (&'static str, V)>) -> Self | Bulk-add attributes from an iterator |
| children | children(self, children: impl IntoIterator<Item = Node>) -> Self | Set child nodes as content |
| bytes | bytes(self, bytes: impl Into<Vec<u8>>) -> Self | Set raw bytes as content |
| string_content | string_content(self, s: impl Into<CompactString>) -> Self | Set string as content |
| apply_content | apply_content(self, content: Option<NodeContent>) -> Self | Set arbitrary content |
| build | build(self) -> Node | Consume the builder and produce a Node |
The new and attr methods accept &'static str for tags and keys, which creates Cow::Borrowed values on the owned Node with zero heap allocation. For rare cases where the tag is computed at runtime, use new_dynamic.
Location: wacore/binary/src/builder.rs
jid_attr vs attr
The jid_attr method stores JIDs as NodeValue::Jid(jid) directly in the attribute map, avoiding the allocation cost of jid.to_string(). Use jid_attr for JID-valued attributes like to, from, and participant on hot paths:
// Prefer jid_attr for JID attributes — avoids string allocation
let receipt = NodeBuilder::new("receipt")
.attr("id", &message_id)
.jid_attr("to", chat_jid.clone())
.jid_attr("participant", sender_jid.clone())
.build();
// Equivalent but allocates a string per JID
let receipt = NodeBuilder::new("receipt")
.attr("id", &message_id)
.attr("to", chat_jid.to_string())
.attr("participant", sender_jid.to_string())
.build();
Conditional chaining
Use let mut builder with reassignment for conditional attributes:
let mut builder = NodeBuilder::new("receipt")
.attr("id", &info.id)
.jid_attr("to", info.source.chat.clone());
if info.category == MessageCategory::Peer {
builder = builder.attr("type", "peer_msg");
}
if info.source.is_group {
builder = builder.jid_attr("participant", info.source.sender.clone());
}
let node = builder.build();
Token Dictionary
The protocol uses a token dictionary to compress common strings into single bytes.
Token Types
// Special marker bytes; single-byte dictionary tokens occupy 4-235
pub const LIST_EMPTY: u8 = 0;
pub const INTEROP_JID: u8 = 245; // Interop JID
pub const FB_JID: u8 = 246; // Facebook JID
pub const AD_JID: u8 = 247; // JID with device ID
pub const LIST_8: u8 = 248; // List with <256 items
pub const LIST_16: u8 = 249; // List with ≥256 items
pub const JID_PAIR: u8 = 250; // JID in user@server format
pub const HEX_8: u8 = 251; // Packed hex string
pub const BINARY_8: u8 = 252; // Binary data <256 bytes
pub const BINARY_20: u8 = 253; // Binary data <1 MiB (20-bit length)
pub const BINARY_32: u8 = 254; // Binary data ≥1 MiB (32-bit length)
pub const NIBBLE_8: u8 = 255; // Packed numeric string
Location: wacore/binary/src/token.rs
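The three BINARY_* markers select how many bytes the length prefix takes. A minimal sketch of that selection follows. The function name write_binary_header is hypothetical, and the exact byte layout (big-endian, with the 20-bit length stored in three bytes whose top nibble is zero) is an assumption based on the size thresholds in the comments above.

```rust
pub const BINARY_8: u8 = 252;
pub const BINARY_20: u8 = 253;
pub const BINARY_32: u8 = 254;

/// Writes the marker byte plus length prefix for a payload of `len` bytes.
/// Assumed layout: 1, 3, or 4 big-endian length bytes depending on the marker.
pub fn write_binary_header(out: &mut Vec<u8>, len: usize) {
    if len < 256 {
        out.push(BINARY_8);
        out.push(len as u8);
    } else if len < (1 << 20) {
        out.push(BINARY_20);
        out.push(((len >> 16) & 0x0F) as u8); // top 4 bits of the 20-bit length
        out.push(((len >> 8) & 0xFF) as u8);
        out.push((len & 0xFF) as u8);
    } else {
        out.push(BINARY_32);
        out.extend_from_slice(&(len as u32).to_be_bytes());
    }
}
```

A two-byte payload therefore costs two header bytes, while a 300-byte payload needs the four-byte BINARY_20 form.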
Unified token lookup
Both single-byte and double-byte tokens are stored in a single compile-time hashify PTHash map (using FNV-1a hashing), generated by a build script from tokens.json. A single call to index_of_token resolves any known protocol string:
use wacore_binary::token::{index_of_token, TokenKind};
index_of_token("message") => Some(TokenKind::Single(19))
index_of_token("iq") => Some(TokenKind::Single(18))
index_of_token("body") => Some(TokenKind::Single(7))
index_of_token("participant") => Some(TokenKind::Double(dict, idx))
index_of_token("unknown_string") => None
The TokenKind enum distinguishes single-byte from double-byte tokens:
pub enum TokenKind {
Single(u8),
Double(u8, u8), // (dictionary index, token index)
}
The dictionary includes:
- Protocol tags (“message”, “iq”, “presence”)
- Common attributes (“id”, “type”, “to”, “from”)
- Frequent values (“text”, “chat”, “available”)
Reverse lookups (index → string) use separate arrays:
get_single_token(19) => Some("message")
get_double_token(0, 42) => Some("participant")
Location: wacore/binary/src/token.rs
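The forward and reverse lookups can be illustrated with a tiny hand-written table. This sketch uses linear search over a handful of entries in place of the generated perfect-hash map, and the token indices are the illustrative values from the examples above, not a copy of tokens.json.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TokenKind {
    Single(u8),
    Double(u8, u8), // (dictionary index, token index)
}

// Illustrative entries only; the real tables are generated at build time.
const SINGLE: &[(u8, &str)] = &[(7, "body"), (18, "iq"), (19, "message")];
const DOUBLE: &[(u8, u8, &str)] = &[(0, 42, "participant")];

/// Forward lookup: string to token. Linear search stands in for the hash map.
pub fn index_of_token(s: &str) -> Option<TokenKind> {
    if let Some(&(idx, _)) = SINGLE.iter().find(|(_, t)| *t == s) {
        return Some(TokenKind::Single(idx));
    }
    DOUBLE
        .iter()
        .find(|(_, _, t)| *t == s)
        .map(|&(dict, idx, _)| TokenKind::Double(dict, idx))
}

/// Reverse lookup for single-byte tokens.
pub fn get_single_token(idx: u8) -> Option<&'static str> {
    SINGLE.iter().find(|(i, _)| *i == idx).map(|&(_, t)| t)
}

/// Reverse lookup for double-byte tokens.
pub fn get_double_token(dict: u8, idx: u8) -> Option<&'static str> {
    DOUBLE
        .iter()
        .find(|(d, i, _)| *d == dict && *i == idx)
        .map(|&(_, _, t)| t)
}
```

Because the reverse lookups return &'static str, a decoder can hand out token strings without allocating.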
Encoding Process
Marshal Functions
// Basic serialization
pub fn marshal(node: &Node) -> Result<Vec<u8>>
// Serialize to existing buffer (zero-copy for output)
pub fn marshal_to_vec(node: &Node, output: &mut Vec<u8>) -> Result<()>
// Two-pass encoding with exact size pre-calculation
pub fn marshal_exact(node: &Node) -> Result<Vec<u8>>
// Auto-sizing with heuristics
pub fn marshal_auto(node: &Node) -> Result<Vec<u8>>
Location: wacore/binary/src/marshal.rs:31-76
Encoding Strategy
The encoder uses multiple strategies based on data characteristics:
enum StringHint {
Empty, // "" → BINARY_8 + 0
SingleToken(u8), // "message" → 19
DoubleToken { dict: u8, token: u8 },
PackedNibble, // "123-456" → compressed
PackedHex, // "DEADBEEF" → compressed
Jid(ParsedJidMeta), // JID-specific encoding
RawBytes, // Fallback
}
Location: wacore/binary/src/encoder.rs:227-237
Packed Encoding
Nibble Packing (Numeric Strings)
Strings containing only digits, dash, and dot are packed into 4 bits per character:
// Input: "123-456.789" (11 characters)
// Encoding:
// '1' → 1, '2' → 2, '3' → 3, '-' → 10, '4' → 4, ..., '.' → 11, ...
// Packed: 0x12, 0x3A, 0x45, 0x6B, 0x78, 0x9F (odd length, so the final low nibble is padding, 15)
pub const PACKED_MAX: u8 = 127; // Max length for packed/token strings
fn pack_nibble(value: u8) -> u8 {
match value {
b'-' => 10,
b'.' => 11,
0 => 15, // Padding
c if c.is_ascii_digit() => c - b'0',
_ => panic!("Invalid nibble"),
}
}
Location: wacore/binary/src/encoder.rs:769-777
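Packing a whole string pairs up characters, puts the first of each pair in the high nibble, and pads an odd-length tail with 15. The sketch below shows only that byte-level packing; it returns None for invalid characters instead of panicking, and it leaves out any header the wire format places before the packed bytes. The function names are local to this example.

```rust
/// Maps one character to its 4-bit code, or None if it cannot be packed.
fn pack_nibble(c: u8) -> Option<u8> {
    match c {
        b'0'..=b'9' => Some(c - b'0'),
        b'-' => Some(10),
        b'.' => Some(11),
        _ => None,
    }
}

/// Packs a digits/dash/dot string two characters per byte.
pub fn pack_nibbles(s: &str) -> Option<Vec<u8>> {
    let bytes = s.as_bytes();
    let mut out = Vec::with_capacity((bytes.len() + 1) / 2);
    for pair in bytes.chunks(2) {
        let hi = pack_nibble(pair[0])?;
        // An odd-length input leaves the last low nibble as padding (15).
        let lo = match pair.get(1) {
            Some(&c) => pack_nibble(c)?,
            None => 15,
        };
        out.push((hi << 4) | lo);
    }
    Some(out)
}
```

For "123-456.789" this yields six bytes for eleven characters, roughly halving the size compared with raw ASCII.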
Hex Packing
Uppercase hex strings (0-9, A-F) are packed into 4 bits per character:
// Input: "DEADBEEF"
// Packed: 0xDE, 0xAD, 0xBE, 0xEF
fn pack_hex(value: u8) -> u8 {
match value {
c if c.is_ascii_digit() => c - b'0',
c if (b'A'..=b'F').contains(&c) => 10 + (c - b'A'),
0 => 15, // Padding
_ => panic!("Invalid hex"),
}
}
Location: wacore/binary/src/encoder.rs:780-787
SIMD Optimization
The encoder uses SIMD instructions for fast packing of long strings:
while input_bytes.len() >= 16 {
let input = u8x16::from_slice(chunk);
let indices = input.saturating_sub(nibble_base);
let nibbles = lookup.swizzle_dyn(indices);
let (evens, odds) = nibbles.deinterleave(
nibbles.rotate_elements_left::<1>()
);
let packed = (evens << Simd::splat(4)) | odds;
self.write_raw_bytes(&packed.to_array()[..8])?;
}
Location: wacore/binary/src/encoder.rs:809-824
JID Encoding
JIDs have special compact encodings:
JID_PAIR (Standard JID)
// Format: JID_PAIR + user + server
// Example: "15551234567@s.whatsapp.net"
self.write_u8(token::JID_PAIR)?;
if user.is_empty() {
self.write_u8(token::LIST_EMPTY)?;
} else {
self.write_string(user)?; // "15551234567"
}
self.write_string(server)?; // "s.whatsapp.net"
Location: wacore/binary/src/encoder.rs:706-715
AD_JID (Device-Specific JID)
// Format: AD_JID + domain_type + device + user
// Example: "15551234567:1@s.whatsapp.net" (device 1)
self.write_u8(token::AD_JID)?;
self.write_u8(server_to_domain_type(jid.server, jid.agent))?;
self.write_u8(device)?; // Device number
self.write_string(user)?; // User part only
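The domain_type byte is derived from the Server enum variant at encoding time, not from the agent field directly. For Server::Pn and Server::Lid the wire value equals the enum discriminant (0 and 1); the hosted variants map to 128 and 129, and every other server falls back to the agent byte: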
| domain_type | Server variant | Wire string | Description |
|---|---|---|---|
| 0 | Server::Pn | s.whatsapp.net | Standard phone-number JIDs |
| 1 | Server::Lid | lid | Linked Identity JIDs |
| 128 | Server::Hosted | hosted | Cloud API / Meta Business API (phone-based) |
| 129 | Server::HostedLid | hosted.lid | Cloud API / Meta Business API (LID-based) |
| fallback | (other) | varies | Uses the agent byte from the JID |
The domain_type must be derived from the JID’s server field via server_to_domain_type(), not from jid.agent. A previous bug wrote jid.agent (which is 0 for most JIDs) unconditionally, causing LID JIDs to be encoded with domain_type=0 instead of domain_type=1. As a result, the server rejected LID group messages with error 421 without any client-side failure.
Location: wacore/binary/src/encoder.rs:699-705, 362-369
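The mapping in the table can be written as a single match. This is a sketch that follows the table rather than the crate's source: the Server enum is reduced to five variants, and the body of server_to_domain_type is reconstructed from the documented values.

```rust
// Reduced stand-in for the Server enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Server {
    Pn,
    Lid,
    Hosted,
    HostedLid,
    Group,
}

/// Derives the AD_JID domain_type byte from the server, falling back to
/// the agent byte for servers without a dedicated value.
pub fn server_to_domain_type(server: Server, agent: u8) -> u8 {
    match server {
        Server::Pn => 0,
        Server::Lid => 1,
        Server::Hosted => 128,
        Server::HostedLid => 129,
        _ => agent,
    }
}
```

A LID JID whose agent field is 0 still encodes as domain_type 1 here, which is exactly the case the earlier bug got wrong.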
List Encoding
Lists (including node structures) have length-prefixed encoding:
fn write_list_start(&mut self, len: usize) -> Result<()> {
if len == 0 {
self.write_u8(token::LIST_EMPTY)?; // 0x00
} else if len < 256 {
self.write_u8(token::LIST_8)?; // 0xF8
self.write_u8(len as u8)?;
} else {
self.write_u8(token::LIST_16)?; // 0xF9
self.write_u16_be(len as u16)?;
}
Ok(())
}
Location: wacore/binary/src/encoder.rs:865-876
A complete node is encoded as:
LIST_START(list_len)
tag
attr_key_1
attr_value_1
attr_key_2
attr_value_2
...
[content] // If present
Where list_len = 1 (tag) + (num_attrs * 2) + (content ? 1 : 0)
pub fn write_node<N: EncodeNode>(&mut self, node: &N) -> Result<()> {
let content_len = if node.has_content() { 1 } else { 0 };
let list_len = 1 + (node.attrs_len() * 2) + content_len;
self.write_list_start(list_len)?;
self.write_string(node.tag())?;
node.encode_attrs(self)?;
node.encode_content(self)?;
Ok(())
}
Location: wacore/binary/src/encoder.rs:879-889
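The list_len formula and the list header can be checked in isolation. The sketch below writes into a plain Vec<u8> instead of the crate's encoder, and node_list_len is a hypothetical helper name. Unlike the excerpt above, it returns an error for lengths that do not fit in 16 bits; that is a choice made for this example.

```rust
pub const LIST_EMPTY: u8 = 0;
pub const LIST_8: u8 = 248;
pub const LIST_16: u8 = 249;

/// list_len = 1 (tag) + 2 per attribute (key + value) + 1 if content is present.
pub fn node_list_len(num_attrs: usize, has_content: bool) -> usize {
    1 + num_attrs * 2 + usize::from(has_content)
}

/// Writes the list header for `len` items.
pub fn write_list_start(out: &mut Vec<u8>, len: usize) -> Result<(), String> {
    match len {
        0 => out.push(LIST_EMPTY),
        1..=255 => {
            out.push(LIST_8);
            out.push(len as u8);
        }
        256..=65535 => {
            out.push(LIST_16);
            out.extend_from_slice(&(len as u16).to_be_bytes());
        }
        _ => return Err(format!("list too long: {}", len)),
    }
    Ok(())
}
```

A node with one attribute and no content has list_len 3, so its header is the two bytes F8 03.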
Decoding Process
Decoder Structure
pub struct Decoder<'a> {
data: &'a [u8],
offset: usize,
}
impl<'a> Decoder<'a> {
pub fn read_node_ref(&mut self) -> Result<NodeRef<'a>>
pub fn read_list_size(&mut self) -> Result<usize>
pub fn read_string(&mut self, len: usize) -> Result<NodeStr<'a>>
}
Location: wacore/binary/src/decoder.rs
Zero-copy decoding
The decoder uses NodeRef<'a> to avoid allocations. String and byte payloads borrow directly from the input buffer. Decoded strings use NodeStr<'a> — a borrowed-or-inline string type that stores short owned values (up to 24 bytes) inline via CompactString, avoiding heap allocation:
/// Borrowed-or-inline string for decoded nodes.
pub enum NodeStr<'a> {
Borrowed(&'a str), // Zero-copy: points into input buffer
Owned(CompactString), // Inline for short strings (≤24 bytes)
}
pub struct NodeRef<'a> {
pub tag: NodeStr<'a>, // Borrowed or inline
pub attrs: AttrsRef<'a>, // Vec<(NodeStr<'a>, ValueRef<'a>)>
pub content: Option<Box<NodeContentRef<'a>>>,
}
pub enum NodeContentRef<'a> {
Bytes(Cow<'a, [u8]>), // Zero-copy for byte content
String(NodeStr<'a>), // Borrowed or inline
Nodes(Box<NodeVec<'a>>), // Recursive borrowing
}
NodeStr implements Deref<Target = str>, AsRef<str>, PartialEq<str>, and PartialEq<&str>, so you can use it anywhere a &str is expected. It also provides to_compact_string() for efficient conversion to an owned CompactString.
NodeStr replaces the previous Cow<'a, str> used in NodeRef, AttrsRef, ValueRef, and NodeContentRef. The key difference is that the Owned variant uses CompactString (inline up to 24 bytes) instead of String (always heap-allocated), reducing allocation pressure for the many short protocol strings that can’t be statically interned.
Location: wacore/binary/src/node.rs:10-106, 465-469, 437-441
OwnedNodeRef (yoke zero-copy)
OwnedNodeRef is a self-referential type that owns the decompressed network buffer while the inner NodeRef borrows string and byte payloads directly from it. This avoids copying payloads out of the buffer during decoding — only container allocations (attribute Vec, child Vec) occur.
pub struct OwnedNodeRef {
inner: Yoke<NodeRef<'static>, Vec<u8>>,
}
impl OwnedNodeRef {
/// Decode a node from an owned buffer.
pub fn new(buffer: Vec<u8>) -> Result<Self>;
/// Access the borrowed node.
pub fn get(&self) -> &NodeRef<'_>;
/// Convert to an owned Node (allocates — use sparingly).
pub fn to_owned_node(&self) -> Node;
// Convenience accessors: tag(), attrs(), get_attr(),
// children(), get_optional_child(), content_bytes(), etc.
}
Received stanzas flow through the system as Arc<OwnedNodeRef>, giving handlers cheap shared access to the zero-copy decoded node. The to_owned_node() method is available as an escape hatch when you need a fully owned Node, but it allocates all strings and bytes — defeating the zero-copy benefit.
Location: wacore/binary/src/node.rs:594-693
Node vs NodeRef usage pattern
- Node (owned): used for building and sending outgoing stanzas. Constructed via NodeBuilder.
- NodeRef<'a> (borrowed): used for reading received stanzas. Borrows from the network buffer.
- OwnedNodeRef: wraps a NodeRef with its backing buffer via yoke, enabling safe zero-copy sharing across handler tasks as Arc<OwnedNodeRef>.
Zero-copy serialization
The entire NodeRef type family implements serde::Serialize (gated behind the serde feature), producing output identical to their owned counterparts. This means you can serialize a NodeRef, OwnedNodeRef, ValueRef, JidRef, or NodeContentRef directly — without converting to an owned Node first — avoiding all intermediate allocations.
use serde_json;
// Serialize an OwnedNodeRef directly — zero-copy from the network buffer
let owned_ref: OwnedNodeRef = OwnedNodeRef::new(buffer)?;
let json = serde_json::to_string(&owned_ref)?;
// Equivalent but allocates: convert to Node first
let json_owned = serde_json::to_string(&owned_ref.to_owned_node())?;
// Both produce identical JSON output
assert_eq!(json, json_owned);
The following types implement Serialize:
| Type | Serializes as | Notes |
|---|
NodeRef<'a> | Node struct | Fields: tag (as &str), attrs (newtype wrapper), content |
NodeStr<'a> | &str | Borrows the inner string directly |
ValueRef<'a> | NodeValue enum | Variant names match: String, Jid |
JidRef<'a> | Jid struct | Fields: user, server, agent, device, integrator |
NodeContentRef<'a> | NodeContent enum | Variant names match: Bytes, String, Nodes |
OwnedNodeRef | Node struct | Delegates to inner NodeRef::serialize |
The Serialize implementations use an AttrsRefWrapper to match the newtype-struct framing that serde’s derive produces for Attrs(Vec<...>). This ensures compatibility with binary formats like bincode and postcard, which distinguish between a bare sequence and a newtype struct wrapper.
This is useful for logging, debugging, protocol inspection, and forwarding stanzas to external systems without paying the cost of to_owned_node().
Location: wacore/binary/src/node.rs:67-71, 397-407, 473-488, 612-632, 875-880; wacore/binary/src/jid.rs:603-615
The borrowed counterpart to NodeValue is ValueRef<'a>, used in the decoder path and in NodeRef attributes:
pub enum ValueRef<'a> {
String(NodeStr<'a>),
Jid(JidRef<'a>),
}
ValueRef provides three methods: as_str() (returns Cow<'_, str> — zero-copy for String, allocates for Jid), as_jid() (returns Option<&JidRef>, only for Jid variant), and to_jid() (converts either variant to owned Jid, parsing from string if necessary).
Location: wacore/binary/src/node.rs:282-313
Attribute parsing
AttrParser and AttrParserRef provide structured attribute extraction from owned Node and borrowed NodeRef values respectively. They accumulate parse errors instead of panicking:
let mut parser = node.attr_parser();
let msg_id: String = parser.required_str("id");
let msg_type: Option<String> = parser.optional_str("type");
let recipient: Option<Jid> = parser.optional_jid("to");
parser.finish()?; // Returns accumulated errors if any
The optional_jid method handles both NodeValue variants:
- If the attribute is a NodeValue::Jid, it returns the JID directly via clone (zero parse cost).
- If the attribute is a NodeValue::String, it parses via Jid::from_str. Parse failures are captured in the error list and surfaced when you call finish(), rather than being silently discarded.
This ensures that malformed JID strings in protocol messages are reported as BinaryError::Jid (for AttrParser) or BinaryError::AttrParse (for AttrParserRef) instead of silently returning None.
Location: wacore/binary/src/attrs.rs
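The accumulate-then-report pattern can be sketched independently of the node types. The example below is an illustration under stated assumptions: attributes live in a HashMap<String, String>, errors are plain strings, and a hypothetical optional_u64 method stands in for optional_jid as the fallible conversion.

```rust
use std::collections::HashMap;

/// Simplified parser that records errors instead of panicking.
pub struct AttrParser<'a> {
    attrs: &'a HashMap<String, String>,
    errors: Vec<String>,
}

impl<'a> AttrParser<'a> {
    pub fn new(attrs: &'a HashMap<String, String>) -> Self {
        AttrParser { attrs, errors: Vec::new() }
    }

    /// Returns the value, or records an error and returns "" if it is missing.
    pub fn required_str(&mut self, key: &str) -> &'a str {
        let attrs: &'a HashMap<String, String> = self.attrs;
        match attrs.get(key) {
            Some(v) => v.as_str(),
            None => {
                self.errors.push(format!("missing required attribute '{}'", key));
                ""
            }
        }
    }

    /// Absent attribute: None, no error. Present but malformed: None plus an error.
    pub fn optional_u64(&mut self, key: &str) -> Option<u64> {
        let raw = self.attrs.get(key)?;
        match raw.parse::<u64>() {
            Ok(v) => Some(v),
            Err(e) => {
                self.errors.push(format!("attribute '{}': {}", key, e));
                None
            }
        }
    }

    /// Surfaces everything that went wrong during extraction.
    pub fn finish(self) -> Result<(), Vec<String>> {
        if self.errors.is_empty() {
            Ok(())
        } else {
            Err(self.errors)
        }
    }
}
```

The caller reads every attribute it needs first and checks finish() once, so a stanza with several problems reports all of them rather than only the first.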
Unpacking
Unpacking reverses the packing process:
fn unpack_nibble(packed: u8, position: u8) -> u8 {
let nibble = if position == 0 {
(packed >> 4) & 0x0F
} else {
packed & 0x0F
};
match nibble {
0..=9 => b'0' + nibble,
10 => b'-',
11 => b'.',
15 => 0, // Padding
_ => panic!("Invalid nibble"),
}
}
Location: wacore/binary/src/decoder.rs:400-450
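Applied to a whole buffer, unpacking reads the high nibble of each byte first and treats a trailing low nibble of 15 as padding. The following sketch makes two assumptions of its own: padding is accepted only in the final byte, and invalid nibbles produce None rather than a panic. It operates on the packed bytes alone, without any preceding header.

```rust
/// Maps a 4-bit code back to its character, or None for an invalid code.
fn unpack_nibble(nibble: u8) -> Option<char> {
    match nibble {
        0..=9 => Some((b'0' + nibble) as char),
        10 => Some('-'),
        11 => Some('.'),
        _ => None,
    }
}

/// Unpacks nibble-packed bytes into the original digits/dash/dot string.
pub fn unpack_nibbles(packed: &[u8]) -> Option<String> {
    let mut out = String::with_capacity(packed.len() * 2);
    for (i, &byte) in packed.iter().enumerate() {
        let hi = byte >> 4;
        let lo = byte & 0x0F;
        out.push(unpack_nibble(hi)?);
        // Padding (15) is only valid as the low nibble of the last byte.
        if lo == 15 && i == packed.len() - 1 {
            break;
        }
        out.push(unpack_nibble(lo)?);
    }
    Some(out)
}
```

Feeding it the six bytes 12 3A 45 6B 78 9F recovers the eleven-character string "123-456.789".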
Token interning with Cow
When converting decoded NodeRef values to owned Node values, the intern_cow function maps known protocol strings to their static references using the unified hashify lookup:
fn intern_cow(s: &str) -> Cow<'static, str> {
if let Some(kind) = token::index_of_token(s) {
let interned = match kind {
token::TokenKind::Single(idx) => token::get_single_token(idx),
token::TokenKind::Double(dict, idx) => token::get_double_token(dict, idx),
};
if let Some(token) = interned {
return Cow::Borrowed(token); // Zero-alloc: points to static str
}
}
Cow::Owned(s.to_string()) // Fallback: heap-allocate unknown strings
}
The unified token map is generated at compile time via hashify (PTHash with FNV-1a), so the single lookup is O(1). Since the vast majority of node tags and attribute keys are part of the WhatsApp token dictionary, this eliminates most heap allocations during protocol decoding.
This optimization applies to:
- Node.tag: protocol tags like "message", "iq", "receipt"
- Attrs keys: attribute names like "id", "type", "to", "from"
Jid.server is now a Server enum (a Copy type, #[repr(u8)]), so it requires no allocation at all — neither heap nor interning. This is an improvement over the previous Cow<'static, str> approach.
Location: wacore/binary/src/node.rs:108-123
Two-Pass Encoding
For large or variable-size payloads, exact size calculation prevents buffer growth:
pub fn marshal_exact(node: &Node) -> Result<Vec<u8>> {
// Pass 1: Calculate exact size
let plan = build_marshaled_node_plan(node);
// Pass 2: Encode directly into fixed-size buffer
let mut payload = vec![0; plan.size];
let mut encoder = Encoder::new_slice(&mut payload, Some(&plan.hints))?;
encoder.write_node(node)?;
Ok(payload)
}
Location: wacore/binary/src/marshal.rs:67-76
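The two-pass idea is independent of the node format: walk the structure once to compute the encoded size, allocate exactly that much, then walk it again to write. The sketch below demonstrates this on a toy Item tree. The Item type, the function names, and the fixed two-byte headers (which assume lengths below 256) are all assumptions made for the example.

```rust
/// Toy value tree standing in for Node (assumption for illustration).
pub enum Item {
    Bytes(Vec<u8>),
    List(Vec<Item>),
}

/// Pass 1: compute the exact encoded size without writing anything.
fn encoded_size(item: &Item) -> usize {
    match item {
        // marker byte + length byte + payload
        Item::Bytes(b) => 2 + b.len(),
        // marker byte + count byte + children
        Item::List(items) => 2 + items.iter().map(encoded_size).sum::<usize>(),
    }
}

/// Pass 2: write the encoding (lengths are assumed to fit in one byte).
fn encode_into(item: &Item, out: &mut Vec<u8>) {
    match item {
        Item::Bytes(b) => {
            out.push(252); // BINARY_8
            out.push(b.len() as u8);
            out.extend_from_slice(b);
        }
        Item::List(items) => {
            out.push(248); // LIST_8
            out.push(items.len() as u8);
            for child in items {
                encode_into(child, out);
            }
        }
    }
}

/// Allocates once with the exact size, so the buffer never has to grow.
pub fn marshal_exact_sketch(item: &Item) -> Vec<u8> {
    let size = encoded_size(item);
    let mut out = Vec::with_capacity(size);
    encode_into(item, &mut out);
    debug_assert_eq!(out.len(), size);
    out
}
```

The extra traversal costs some CPU time but replaces repeated reallocation and copying, which tends to pay off for large payloads.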
String Hint Cache
Repeated strings (like JIDs) are analyzed once and cached. Strings longer than PACKED_MAX (127 bytes) are immediately classified as RawBytes without running the full classification logic, since they can never be protocol tokens (max 48 bytes), packed nibble/hex, or JIDs:
pub struct StringHintCache {
hints: Vec<(StrKey, StringHint)>,
}
impl StringHintCache {
fn hint_or_insert(&mut self, s: &str) -> StringHint {
// Strings longer than PACKED_MAX can't be tokens,
// packed nibble/hex, or JIDs — skip classification entirely
if s.len() > token::PACKED_MAX as usize {
return StringHint::RawBytes;
}
if let Some(existing) = self.hints.iter().find(...) {
return existing;
}
let hint = classify_string_hint(s);
self.hints.push((key, hint));
hint
}
}
The same length check is applied in the uncached write path (write_string_uncached), where strings exceeding PACKED_MAX are emitted directly as raw bytes without classification. This avoids unnecessary work for long strings like message bodies, media URLs, and base64-encoded payloads.
Location: wacore/binary/src/encoder.rs:240-287
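A reduced version of the classification step shows how the early length check fits in. This sketch covers only four of the hint variants; token and JID detection are left out, and the ordering of the checks (nibble before hex) is an assumption. The classify function is a local stand-in for classify_string_hint.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StringHint {
    Empty,
    PackedNibble,
    PackedHex,
    RawBytes,
}

pub const PACKED_MAX: usize = 127;

/// Classifies a string for encoding (token and JID checks omitted).
pub fn classify(s: &str) -> StringHint {
    if s.is_empty() {
        return StringHint::Empty;
    }
    // Long strings can never be packed, so skip the per-byte scans.
    if s.len() > PACKED_MAX {
        return StringHint::RawBytes;
    }
    let bytes = s.as_bytes();
    if bytes.iter().all(|c| c.is_ascii_digit() || *c == b'-' || *c == b'.') {
        return StringHint::PackedNibble;
    }
    if bytes.iter().all(|c| c.is_ascii_digit() || (b'A'..=b'F').contains(c)) {
        return StringHint::PackedHex;
    }
    StringHint::RawBytes
}
```

Note that lowercase hex falls through to RawBytes, matching the rule that only uppercase hex strings are packed.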
Capacity estimation
The auto-sizing strategy samples the node structure to estimate capacity. The Cow<'static, str> tag on owned Node works transparently since Cow implements Deref<Target = str>:
fn estimate_capacity_node(node: &Node) -> usize {
let mut estimate = DEFAULT_MARSHAL_CAPACITY + 16;
estimate += node.tag.len(); // Works with both Cow::Borrowed and Cow::Owned
estimate += node.attrs.len() * AUTO_ATTR_ESTIMATE; // ~24 bytes/attr
if let Some(NodeContent::Nodes(children)) = &node.content {
estimate += children.len() * AUTO_CHILD_ESTIMATE; // ~96 bytes/child
// Sample first 32 children for better accuracy
for child in children.iter().take(AUTO_CHILD_SAMPLE_LIMIT) {
estimate += child.tag.len() + ...
}
}
estimate.clamp(DEFAULT_MARSHAL_CAPACITY, AUTO_MAX_HINT_CAPACITY)
}
Location: wacore/binary/src/marshal.rs:167-200
Common Protocol Patterns
IQ (Info/Query) Stanzas
// Request
NodeBuilder::new("iq")
.attr("id", "ABC123")
.attr("type", "get")
.attr("xmlns", "w:g2")
.attr("to", "@s.whatsapp.net")
.children(vec![
NodeBuilder::new("query").build(),
])
.build()
// Response
NodeBuilder::new("iq")
.attr("id", "ABC123")
.attr("type", "result")
.attr("from", "@s.whatsapp.net")
.children(vec![
NodeBuilder::new("group")
.attr("id", "123456@g.us")
.attr("subject", "My Group")
.build(),
])
.build()
Messages
NodeBuilder::new("message")
.attr("to", "15551234567@s.whatsapp.net")
.attr("type", "text")
.attr("id", message_id)
.children(vec![
NodeBuilder::new("enc")
.attr("v", "2")
.attr("type", "msg")
.bytes(encrypted_payload)
.build(),
])
.build()
Receipts
NodeBuilder::new("receipt")
.attr("to", "15551234567@s.whatsapp.net")
.attr("id", message_id)
.attr("type", "read")
.attr("t", timestamp)
.build()
Simple Message
Node: <message type="text"/>
Binary:
F8 03 LIST_8(3) [tag + 1 attr (key + value)]
13 Token("message")
16 Token("type")
07 Token("text")
Message with Body
Node: <message type="text"><body>Hi</body></message>
Binary:
F8 04 LIST_8(4) [tag + 1 attr (key + value) + content]
13 Token("message")
16 Token("type")
07 Token("text")
F8 02 LIST_8(2) [child: tag + content]
07 Token("body")
FC 02 BINARY_8(2)
48 69 "Hi"
Inspecting Encoded Data
Use the evcxr REPL for interactive exploration:
:dep wacore-binary = { path = "wacore/binary" }
:dep hex = "0.4"
use wacore_binary::marshal::{marshal, unmarshal_ref};
use wacore_binary::builder::NodeBuilder;
// Decode binary data
{
let data = hex::decode("f8034c1a07").unwrap();
let node = unmarshal_ref(&data).unwrap();
println!("Tag: {}", node.tag);
for (k, v) in node.attrs.iter() {
println!(" {}: {}", k, v);
}
}
// Encode and inspect
{
let node = NodeBuilder::new("message")
.attr("type", "text")
.build();
let bytes = marshal(&node).unwrap();
println!("Encoded: {:02x?}", bytes);
}
Error Handling
pub enum BinaryError {
UnexpectedEof,
InvalidToken(u8),
InvalidListSize,
AttrParse(String),
Jid(JidParseError), // JID parse failures from AttrParser
LeftoverData(usize),
Io(std::io::Error),
}
The Jid variant is emitted by AttrParser::optional_jid when a string attribute fails to parse as a JID. This ensures malformed JIDs in protocol messages are surfaced as typed errors rather than silently ignored.
Location: wacore/binary/src/error.rs
References
- Source: wacore/binary/src/
- Token dictionary: wacore/binary/src/token.rs
- Node builder: wacore/binary/src/builder.rs