
Overview

WhatsApp uses a custom binary protocol for all communication between clients and servers. This format is significantly more compact than JSON or XML and optimized for mobile network conditions. The protocol encodes messages as nodes: hierarchical structures with tags, attributes, and content. All nodes are serialized to binary format before encryption and transmission.

Architecture

The binary protocol implementation is in wacore/binary/, a platform-agnostic crate:
wacore/binary/src/
├── marshal.rs     # Serialization entry points
├── encoder.rs     # Binary encoding logic
├── decoder.rs     # Binary decoding logic
├── node.rs        # Node data structures
├── token.rs       # Token dictionary
├── jid.rs         # JID (identifier) handling
└── builder.rs     # Fluent API for node construction

Node Structure

Node Definition

A node represents a protocol message or message component:
use compact_str::CompactString;
use std::borrow::Cow;

pub struct Node {
    pub tag: Cow<'static, str>,        // e.g., "message", "receipt", "iq"
    pub attrs: Attrs,                  // Key-value attributes
    pub content: Option<NodeContent>,  // Optional content
}

pub enum NodeContent {
    Bytes(Vec<u8>),             // Binary payload
    String(CompactString),      // Text payload
    Nodes(Vec<Node>),           // Child nodes
}
The tag field uses Cow<'static, str> so that known protocol tags (like "message", "iq", "receipt") are borrowed as zero-allocation static references from the token dictionary, while unknown tags fall back to an owned String. Location: wacore/binary/src/node.rs:459

Attributes

Attributes are stored as key-value pairs with specialized value types:
use compact_str::CompactString;
use std::borrow::Cow;

pub enum NodeValue {
    String(CompactString),
    Jid(Jid),                // Optimized for WhatsApp identifiers
}

pub struct Attrs(pub Vec<(Cow<'static, str>, NodeValue)>);
Like Node.tag, attribute keys use Cow<'static, str> so that common protocol attribute names (like "id", "type", "to", "from") reference static memory from the token dictionary rather than allocating on the heap.

NodeValue API

NodeValue provides exactly two methods for accessing the underlying value, regardless of variant:
use std::borrow::Cow;

// Get a string view of the value (works for both variants)
// - String variant: Cow::Borrowed(&str) — zero copy
// - Jid variant: Cow::Owned(formatted) — allocates only when needed
pub fn as_str(&self) -> Cow<'_, str>

// Convert to an owned Jid, parsing from string if necessary
// - Jid variant: clones the Jid directly
// - String variant: attempts to parse, returns None on failure
pub fn to_jid(&self) -> Option<Jid>
This simplified API means you never need to match on the variant directly — use as_str() when you need the value as text, and to_jid() when you need a structured JID:
// Reading an attribute value as a string
let msg_type = node.attrs.get("type")
    .map(|v| v.as_str().into_owned());

// Reading an attribute value as a JID
let recipient = node.attrs.get("to")
    .and_then(|v| v.to_jid());
NodeValue also implements PartialEq<str> for zero-allocation comparisons: the Jid variant compares byte-by-byte against the formatted string without allocating. Location: wacore/binary/src/node.rs:39-58

Why Jid as a separate type? JIDs (Jabber IDs) like 15551234567@s.whatsapp.net appear frequently in the protocol. Storing them as structured data avoids repeated parsing/formatting overhead:
use compact_str::CompactString;

pub struct Jid {
    pub user: CompactString,           // "15551234567"
    pub server: Server,                // Server::Pn (s.whatsapp.net)
    pub agent: u8,                     // Agent byte (parsed from JID string)
    pub device: u16,                   // Device ID (0 for primary)
    pub integrator: u16,               // Integrator ID (used with interop server)
}
The user field uses CompactString (re-exported from compact_str) instead of String. CompactString stores short strings inline (up to 24 bytes on 64-bit platforms) without heap allocation, which benefits typical phone numbers and user identifiers. The library re-exports it as wacore_binary::CompactString and whatsapp_rust::CompactString for convenience. CompactString implements From<&str>, From<String>, and Deref<Target = str>, so it works as a drop-in replacement in most contexts — but code that relied on Jid.user being a String (e.g., passing it to functions expecting &String or calling String-specific methods) may need updating.
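As noted above, NodeValue's PartialEq<str> impl compares a Jid against its formatted form without allocating. The sketch below illustrates one way to do that, under stated assumptions: SimpleJid and SimpleValue are hypothetical stand-ins that use String instead of CompactString, a &'static str server instead of the Server enum, and omit the agent and integrator fields. The comparison walks the candidate string piece by piece (user, optional :device, @server) and renders the device number into a stack buffer, so only as_str() on the Jid variant allocates.

```rust
use std::borrow::Cow;

/// Hypothetical, simplified JID: no agent/integrator, &'static str server.
pub struct SimpleJid {
    pub user: String,
    pub server: &'static str,
    pub device: u16,
}

/// Hypothetical stand-in for NodeValue.
pub enum SimpleValue {
    Str(String),
    Jid(SimpleJid),
}

/// Writes the decimal digits of `d` into the end of `buf` and returns them.
fn device_digits(mut d: u16, buf: &mut [u8; 5]) -> &[u8] {
    let mut i = buf.len();
    loop {
        i -= 1;
        buf[i] = b'0' + (d % 10) as u8;
        d /= 10;
        if d == 0 {
            break;
        }
    }
    &buf[i..]
}

impl SimpleJid {
    /// Allocating formatter: "user@server" or "user:device@server".
    fn format(&self) -> String {
        if self.device > 0 {
            format!("{}:{}@{}", self.user, self.device, self.server)
        } else {
            format!("{}@{}", self.user, self.server)
        }
    }

    /// Compares against the formatted form piece by piece, without heap allocation.
    fn eq_formatted(&self, s: &str) -> bool {
        let Some(rest) = s.strip_prefix(self.user.as_str()) else {
            return false;
        };
        let rest = if self.device > 0 {
            let mut buf = [0u8; 5];
            let digits = device_digits(self.device, &mut buf);
            let Some(r) = rest.strip_prefix(':') else {
                return false;
            };
            if !r.as_bytes().starts_with(digits) {
                return false;
            }
            &r[digits.len()..]
        } else {
            rest
        };
        rest.strip_prefix('@') == Some(self.server)
    }
}

impl SimpleValue {
    /// Borrowed for the string variant; allocates only for the Jid variant.
    pub fn as_str(&self) -> Cow<'_, str> {
        match self {
            SimpleValue::Str(s) => Cow::Borrowed(s.as_str()),
            SimpleValue::Jid(j) => Cow::Owned(j.format()),
        }
    }
}

impl PartialEq<str> for SimpleValue {
    fn eq(&self, other: &str) -> bool {
        match self {
            SimpleValue::Str(s) => s.as_str() == other,
            SimpleValue::Jid(j) => j.eq_formatted(other),
        }
    }
}
```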

Server enum

The server field is a Server enum (#[repr(u8)]) that maps to the wire protocol’s AD_JID domain type. This replaces the previous Cow<'static, str> string representation, eliminating all heap allocation for server identifiers and enabling match-based dispatch instead of string comparisons:
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq, PartialOrd, Ord, Hash)]
#[repr(u8)]
pub enum Server {
    #[default]
    Pn = 0,          // s.whatsapp.net — Standard phone-number JIDs
    Lid = 1,         // lid — Linked Identity JIDs
    Group = 2,       // g.us — Group chat JIDs
    Broadcast = 3,   // broadcast — Broadcast lists and status
    Newsletter = 4,  // newsletter — Newsletter / channel JIDs
    Hosted = 5,      // hosted — Cloud API business devices (phone-based)
    HostedLid = 6,   // hosted.lid — Cloud API business devices (LID-based)
    Messenger = 7,   // msgr — Messenger interop JIDs
    Interop = 8,     // interop — Cross-platform interop JIDs
    Bot = 9,         // bot — Bot JIDs
    Legacy = 10,     // c.us — Legacy user server (pre-multidevice)
}
Server implements Display (returns the wire string like "s.whatsapp.net"), as_str() for zero-cost string access, TryFrom<&str> for parsing, Serialize/Deserialize (as the wire string), and PartialEq<str> / PartialEq<&str> for backward-compatible string comparisons:
let server = Server::Pn;
assert_eq!(server.as_str(), "s.whatsapp.net");
assert!(server == "s.whatsapp.net");  // PartialEq<str> for backward compat

let parsed = Server::try_from("g.us").unwrap();
assert_eq!(parsed, Server::Group);
If you previously compared jid.server to string constants like "s.whatsapp.net", the PartialEq<str> impl on Server preserves backward compatibility. However, match on the enum variant is preferred for exhaustiveness checking and performance.
String constants are still available for backward compatibility and use in non-JID contexts:
| Constant | Value | Server variant |
|---|---|---|
| DEFAULT_USER_SERVER | s.whatsapp.net | Server::Pn |
| HIDDEN_USER_SERVER | lid | Server::Lid |
| GROUP_SERVER | g.us | Server::Group |
| BROADCAST_SERVER | broadcast | Server::Broadcast |
| NEWSLETTER_SERVER | newsletter | Server::Newsletter |
| HOSTED_SERVER | hosted | Server::Hosted |
| HOSTED_LID_SERVER | hosted.lid | Server::HostedLid |
| MESSENGER_SERVER | msgr | Server::Messenger |
| INTEROP_SERVER | interop | Server::Interop |
| BOT_SERVER | bot | Server::Bot |
| LEGACY_USER_SERVER | c.us | Server::Legacy |
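The Server behaviour described above (as_str, Display, TryFrom<&str>, PartialEq<str>, and exhaustive matching) can be sketched with a three-variant subset. This is an illustration under assumptions, not the crate's code: only Pn, Lid, and Group are modelled, and the TryFrom error type is a plain String chosen for the sketch.

```rust
use std::fmt;

/// Hypothetical three-variant subset of Server.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum Server {
    Pn = 0,
    Lid = 1,
    Group = 2,
}

impl Server {
    /// Wire string for the variant; no allocation.
    pub const fn as_str(self) -> &'static str {
        match self {
            Server::Pn => "s.whatsapp.net",
            Server::Lid => "lid",
            Server::Group => "g.us",
        }
    }
}

impl TryFrom<&str> for Server {
    type Error = String;

    fn try_from(s: &str) -> Result<Self, Self::Error> {
        match s {
            "s.whatsapp.net" => Ok(Server::Pn),
            "lid" => Ok(Server::Lid),
            "g.us" => Ok(Server::Group),
            other => Err(format!("unknown server: {other}")),
        }
    }
}

impl fmt::Display for Server {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.as_str())
    }
}

/// String comparison kept for backward compatibility.
impl PartialEq<str> for Server {
    fn eq(&self, other: &str) -> bool {
        self.as_str() == other
    }
}

/// Exhaustive match: adding a variant forces this function to be revisited.
pub fn is_user_server(server: Server) -> bool {
    match server {
        Server::Pn | Server::Lid => true,
        Server::Group => false,
    }
}
```

The exhaustive match in is_user_server is the practical advantage over string comparison: a new variant becomes a compile error at every such site instead of a silent fall-through.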

Typed constructors

Convenience constructors avoid specifying the server directly:
Jid::pn("15551234567")            // @s.whatsapp.net, device 0
Jid::lid("ABC123")                // @lid, device 0
Jid::group("12345678")            // @g.us
Jid::newsletter("12345678")       // @newsletter
Jid::pn_device("15551234567", 1)  // @s.whatsapp.net, device 1
Jid::lid_device("ABC123", 2)      // @lid, device 2
Jid::status_broadcast()           // status@broadcast
Jid::new("user", Server::Pn)      // arbitrary server variant

Borrowing types

For zero-allocation lookups and comparisons, the protocol also provides:
  • JidRef<'a> — a borrowing version of Jid where user is NodeStr<'a> (borrowed or inline) and server is the Server enum (already Copy). Used for zero-copy decoded JIDs in NodeRef attributes
  • DeviceKey<'a> — a lightweight key containing (&'a str, &'a str, u16) for user/server/device, used for HashSet lookups without cloning
Location: wacore/binary/src/node.rs:10-112, wacore/binary/src/jid.rs

JidExt trait

The JidExt trait provides type-checking methods on JIDs. It is implemented for Jid, JidRef, and other borrowing types so you can inspect a JID’s server type without string comparisons:
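use wacore_binary::jid::{Jid, JidExt};

let jid = Jid::pn("15551234567");
assert!(!jid.is_ad());       // false: device 0 (primary) is not an AD JID
assert!(!jid.is_group());    // false: not a @g.us JID

let device = Jid::pn_device("15551234567", 1);
assert!(device.is_ad());     // true: s.whatsapp.net with device > 0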
| Method | Returns true when |
|---|---|
| is_ad() | Server is Pn, Lid, Hosted, or HostedLid and device > 0 |
| is_group() | Server is Group |
| is_broadcast_list() | Server is Broadcast and user is not "status" |
| is_status_broadcast() | User is "status" and server is Broadcast |
| is_newsletter() | Server is Newsletter |
| is_hosted() | Device is 99, or server is Hosted / HostedLid |
| is_bot() | Server is Bot, or phone number starts with known bot prefixes |
| is_interop() | Server is Interop and integrator > 0 |
| is_messenger() | Server is Messenger and device > 0 |
| is_empty() | User is empty |
| is_same_user_as(other) | Both JIDs share the same user |
The trait also exposes basic accessor methods (user() -> &str, server() -> Server, device() -> u16, integrator() -> u16) that work uniformly across owned and borrowed JID types. Additional helper methods is_pn() and is_lid() are available directly on Jid for the most common server checks. Location: wacore/binary/src/jid.rs:304-363
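A minimal sketch of how a trait like JidExt can share predicate logic between owned and borrowed JIDs: the predicates are default methods built on three accessors, so each JID type only has to implement the accessors. JidChecks, Jid, JidRef, and the reduced Server enum below are hypothetical simplifications (String or &str user, no integrator), and only a few rows of the table are implemented, following the rules as written there.

```rust
/// Hypothetical reduced Server enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Server {
    Pn,
    Lid,
    Group,
    Broadcast,
    Hosted,
    HostedLid,
}

/// Owned JID (simplified: String user, no integrator).
pub struct Jid {
    pub user: String,
    pub server: Server,
    pub device: u16,
}

/// Borrowed JID: same shape, but the user string is borrowed.
pub struct JidRef<'a> {
    pub user: &'a str,
    pub server: Server,
    pub device: u16,
}

/// Hypothetical trimmed-down JidExt: predicates are default methods over accessors.
pub trait JidChecks {
    fn user(&self) -> &str;
    fn server(&self) -> Server;
    fn device(&self) -> u16;

    fn is_ad(&self) -> bool {
        matches!(self.server(), Server::Pn | Server::Lid | Server::Hosted | Server::HostedLid)
            && self.device() > 0
    }
    fn is_group(&self) -> bool {
        self.server() == Server::Group
    }
    fn is_status_broadcast(&self) -> bool {
        self.server() == Server::Broadcast && self.user() == "status"
    }
    fn is_broadcast_list(&self) -> bool {
        self.server() == Server::Broadcast && self.user() != "status"
    }
    fn is_hosted(&self) -> bool {
        self.device() == 99 || matches!(self.server(), Server::Hosted | Server::HostedLid)
    }
}

impl JidChecks for Jid {
    fn user(&self) -> &str {
        &self.user
    }
    fn server(&self) -> Server {
        self.server
    }
    fn device(&self) -> u16 {
        self.device
    }
}

impl JidChecks for JidRef<'_> {
    fn user(&self) -> &str {
        self.user
    }
    fn server(&self) -> Server {
        self.server
    }
    fn device(&self) -> u16 {
        self.device
    }
}
```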

NodeBuilder API

The NodeBuilder provides a fluent chaining API for constructing nodes. All setter methods consume and return Self:
use wacore_binary::builder::NodeBuilder;
use wacore_binary::jid::Jid;

let message = NodeBuilder::new("message")
    .attr("to", "15551234567@s.whatsapp.net")
    .attr("type", "text")
    .attr("id", "ABCD1234")
    .children(vec![
        NodeBuilder::new("body").string_content("Hello, world!").build(),
    ])
    .build();

Available methods

| Method | Signature | Description |
|---|---|---|
| new | new(tag: &'static str) -> Self | Create a builder with a static tag (zero-alloc) |
| new_dynamic | new_dynamic(tag: String) -> Self | Create a builder with a dynamic tag |
| attr | attr(self, key: &'static str, value: impl Into<NodeValue>) -> Self | Add a string attribute (zero-alloc key) |
| jid_attr | jid_attr(self, key: &'static str, jid: Jid) -> Self | Add a JID attribute without stringifying |
| attrs | attrs(self, attrs: impl IntoIterator<Item = (&'static str, V)>) -> Self | Bulk-add attributes from an iterator |
| children | children(self, children: impl IntoIterator<Item = Node>) -> Self | Set child nodes as content |
| bytes | bytes(self, bytes: impl Into<Vec<u8>>) -> Self | Set raw bytes as content |
| string_content | string_content(self, s: impl Into<CompactString>) -> Self | Set string as content |
| apply_content | apply_content(self, content: Option<NodeContent>) -> Self | Set arbitrary content |
| build | build(self) -> Node | Consume the builder and produce a Node |
The new and attr methods accept &'static str for tags and keys, which creates Cow::Borrowed values on the owned Node with zero heap allocation. For rare cases where the tag is computed at runtime, use new_dynamic. Location: wacore/binary/src/builder.rs

jid_attr vs attr

The jid_attr method stores JIDs as NodeValue::Jid(jid) directly in the attribute map, avoiding the allocation cost of jid.to_string(). Use jid_attr for JID-valued attributes like to, from, and participant on hot paths:
// Prefer jid_attr for JID attributes — avoids string allocation
let receipt = NodeBuilder::new("receipt")
    .attr("id", &message_id)
    .jid_attr("to", chat_jid.clone())
    .jid_attr("participant", sender_jid.clone())
    .build();

// Equivalent but allocates a string per JID
let receipt = NodeBuilder::new("receipt")
    .attr("id", &message_id)
    .attr("to", chat_jid.to_string())
    .attr("participant", sender_jid.to_string())
    .build();

Conditional chaining

Use let mut builder with reassignment for conditional attributes:
let mut builder = NodeBuilder::new("receipt")
    .attr("id", &info.id)
    .jid_attr("to", info.source.chat.clone());

if info.category == MessageCategory::Peer {
    builder = builder.attr("type", "peer_msg");
}

if info.source.is_group {
    builder = builder.jid_attr("participant", info.source.sender.clone());
}

let node = builder.build();
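The consuming-builder pattern behind NodeBuilder can be sketched in a few lines. This Builder is a hypothetical, simplified version: attribute values are plain Strings rather than NodeValue, and there is no jid_attr. Because every setter takes self by value and returns it, the let mut builder reassignment shown above is the natural way to add optional attributes.

```rust
use std::borrow::Cow;

/// Hypothetical simplified content type.
#[derive(Debug, Clone, PartialEq)]
pub enum Content {
    Bytes(Vec<u8>),
    Text(String),
    Nodes(Vec<Node>),
}

#[derive(Debug, Clone, PartialEq)]
pub struct Node {
    pub tag: Cow<'static, str>,
    pub attrs: Vec<(Cow<'static, str>, String)>,
    pub content: Option<Content>,
}

/// Hypothetical consuming builder: every setter takes `self` by value and returns it.
pub struct Builder {
    node: Node,
}

impl Builder {
    pub fn new(tag: &'static str) -> Self {
        Builder { node: Node { tag: Cow::Borrowed(tag), attrs: Vec::new(), content: None } }
    }

    pub fn attr(mut self, key: &'static str, value: impl Into<String>) -> Self {
        self.node.attrs.push((Cow::Borrowed(key), value.into()));
        self
    }

    pub fn children(mut self, children: impl IntoIterator<Item = Node>) -> Self {
        self.node.content = Some(Content::Nodes(children.into_iter().collect()));
        self
    }

    pub fn bytes(mut self, bytes: impl Into<Vec<u8>>) -> Self {
        self.node.content = Some(Content::Bytes(bytes.into()));
        self
    }

    pub fn string_content(mut self, s: impl Into<String>) -> Self {
        self.node.content = Some(Content::Text(s.into()));
        self
    }

    pub fn build(self) -> Node {
        self.node
    }
}
```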

Token Dictionary

The protocol uses a token dictionary to compress common strings into single bytes.

Token Types

// Single-byte tokens (4-235)
pub const LIST_EMPTY: u8 = 0;
pub const INTEROP_JID: u8 = 245; // Interop JID
pub const FB_JID: u8 = 246;      // Facebook JID
pub const AD_JID: u8 = 247;      // JID with device ID
pub const LIST_8: u8 = 248;      // List with <256 items
pub const LIST_16: u8 = 249;     // List with ≥256 items
pub const JID_PAIR: u8 = 250;    // JID in user@server format
pub const HEX_8: u8 = 251;       // Packed hex string
pub const BINARY_8: u8 = 252;    // Binary data <256 bytes
pub const BINARY_20: u8 = 253;   // Binary data <1MB
pub const BINARY_32: u8 = 254;   // Binary data ≥1MB
pub const NIBBLE_8: u8 = 255;    // Packed numeric string
Location: wacore/binary/src/token.rs

Unified token lookup

Both single-byte and double-byte tokens are stored in a single compile-time hashify PTHash map (using FNV-1a hashing), generated by a build script from tokens.json. A single call to index_of_token resolves any known protocol string:
use wacore_binary::token::{index_of_token, TokenKind};

index_of_token("message");        // Some(TokenKind::Single(19))
index_of_token("iq");             // Some(TokenKind::Single(18))
index_of_token("body");           // Some(TokenKind::Single(7))
index_of_token("participant");    // Some(TokenKind::Double(dict, idx))
index_of_token("unknown_string"); // None
The TokenKind enum distinguishes single-byte from double-byte tokens:
pub enum TokenKind {
    Single(u8),
    Double(u8, u8),  // (dictionary index, token index)
}
The dictionary includes:
  • Protocol tags (“message”, “iq”, “presence”)
  • Common attributes (“id”, “type”, “to”, “from”)
  • Frequent values (“text”, “chat”, “available”)
Reverse lookups (index → string) use separate arrays:
get_single_token(19);    // Some("message")
get_double_token(0, 42); // Some("participant")
Location: wacore/binary/src/token.rs
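The forward and reverse lookups can be sketched with the standard library alone. The dictionary contents below are hypothetical and tiny, so the indices do not match the real tokens.json. A lazily built HashMap stands in for the compile-time hashify map: it gives the same O(1) average lookup, but pays its construction cost on first use rather than at build time.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TokenKind {
    Single(u8),
    Double(u8, u8), // (dictionary index, token index)
}

// Hypothetical, tiny dictionaries; the real ones are generated from tokens.json.
// In this sketch index 0 of the single-byte table is reserved (byte 0 is LIST_EMPTY).
const SINGLE: &[&str] = &["", "message", "iq", "body", "type"];
const DOUBLE: &[&[&str]] = &[&["participant", "notify"]];

fn table() -> &'static HashMap<&'static str, TokenKind> {
    static TABLE: OnceLock<HashMap<&'static str, TokenKind>> = OnceLock::new();
    TABLE.get_or_init(|| {
        let mut map = HashMap::new();
        for (i, tok) in SINGLE.iter().enumerate().skip(1) {
            map.insert(*tok, TokenKind::Single(i as u8));
        }
        for (dict, tokens) in DOUBLE.iter().enumerate() {
            for (i, tok) in tokens.iter().enumerate() {
                // Keep the one-byte form if a string appears in both tables.
                map.entry(*tok).or_insert(TokenKind::Double(dict as u8, i as u8));
            }
        }
        map
    })
}

/// String -> token: one hash lookup covers both tables.
pub fn index_of_token(s: &str) -> Option<TokenKind> {
    table().get(s).copied()
}

/// Token -> string for single-byte tokens (index 0 is reserved here).
pub fn get_single_token(idx: u8) -> Option<&'static str> {
    if idx == 0 {
        return None;
    }
    SINGLE.get(idx as usize).copied()
}

/// Token -> string for double-byte tokens.
pub fn get_double_token(dict: u8, idx: u8) -> Option<&'static str> {
    DOUBLE.get(dict as usize)?.get(idx as usize).copied()
}
```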

Encoding Process

Marshal Functions

// Basic serialization
pub fn marshal(node: &Node) -> Result<Vec<u8>>

// Serialize into a caller-provided buffer (reuses its existing allocation)
pub fn marshal_to_vec(node: &Node, output: &mut Vec<u8>) -> Result<()>

// Two-pass encoding with exact size pre-calculation
pub fn marshal_exact(node: &Node) -> Result<Vec<u8>>

// Auto-sizing with heuristics
pub fn marshal_auto(node: &Node) -> Result<Vec<u8>>
Location: wacore/binary/src/marshal.rs:31-76

Encoding Strategy

The encoder uses multiple strategies based on data characteristics:
enum StringHint {
    Empty,                          // "" → BINARY_8 + 0
    SingleToken(u8),                // "message" → 19
    DoubleToken { dict: u8, token: u8 },
    PackedNibble,                   // "123-456" → compressed
    PackedHex,                      // "DEADBEEF" → compressed
    Jid(ParsedJidMeta),             // JID-specific encoding
    RawBytes,                       // Fallback
}
Location: wacore/binary/src/encoder.rs:227-237

Packed Encoding

Nibble Packing (Numeric Strings)

Strings containing only digits, dash, and dot are packed into 4 bits per character:
// Input: "123-456.789" (11 characters, so the last byte is padded)
// Encoding:
// '1' → 1, '2' → 2, '3' → 3, '-' → 10, '4' → 4, ..., '.' → 11, padding → 15
// Packed: 0x12, 0x3A, 0x45, 0x6B, 0x78, 0x9F

pub const PACKED_MAX: u8 = 127;  // Max length for packed/token strings

fn pack_nibble(value: u8) -> u8 {
    match value {
        b'-' => 10,
        b'.' => 11,
        0 => 15,  // Padding
        c if c.is_ascii_digit() => c - b'0',
        _ => panic!("Invalid nibble"),
    }
}
Location: wacore/binary/src/encoder.rs:769-777

Hex Packing

Uppercase hex strings (0-9, A-F) are packed into 4 bits per character:
// Input: "DEADBEEF"
// Packed: 0xDE, 0xAD, 0xBE, 0xEF

fn pack_hex(value: u8) -> u8 {
    match value {
        c if c.is_ascii_digit() => c - b'0',
        c if (b'A'..=b'F').contains(&c) => 10 + (c - b'A'),
        0 => 15,  // Padding
        _ => panic!("Invalid hex"),
    }
}
Location: wacore/binary/src/encoder.rs:780-787
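Both packing schemes share the same structure: map each character to a 4-bit code, store two codes per byte with the first character in the high nibble, and pad an odd trailing position with 0xF. The sketch below assumes that layout from the pack_nibble and pack_hex functions and the packed example above. Unlike those functions, it returns None for invalid input instead of panicking, and it leaves out the NIBBLE_8/HEX_8 type byte and length header, whose exact layout is not shown in this section.

```rust
// Sketch: only the packed payload is produced; no NIBBLE_8/HEX_8 header byte.
const PACKED_MAX: usize = 127;
const PADDING: u8 = 0x0F;

/// 4-bit codes for the nibble alphabet: digits, '-' and '.'.
fn nibble_code(c: u8) -> Option<u8> {
    match c {
        b'0'..=b'9' => Some(c - b'0'),
        b'-' => Some(10),
        b'.' => Some(11),
        _ => None,
    }
}

/// 4-bit codes for uppercase hex digits.
fn hex_code(c: u8) -> Option<u8> {
    match c {
        b'0'..=b'9' => Some(c - b'0'),
        b'A'..=b'F' => Some(10 + (c - b'A')),
        _ => None,
    }
}

/// Packs two codes per byte, first character in the high nibble.
/// Returns None for empty, too-long, or out-of-alphabet input.
fn pack_with(s: &str, code: fn(u8) -> Option<u8>) -> Option<Vec<u8>> {
    if s.is_empty() || s.len() > PACKED_MAX {
        return None;
    }
    let mut out = Vec::with_capacity((s.len() + 1) / 2);
    for pair in s.as_bytes().chunks(2) {
        let high = code(pair[0])?;
        let low = match pair.get(1) {
            Some(&c) => code(c)?,
            None => PADDING,
        };
        out.push((high << 4) | low);
    }
    Some(out)
}

pub fn pack_nibbles(s: &str) -> Option<Vec<u8>> {
    pack_with(s, nibble_code)
}

pub fn pack_hex(s: &str) -> Option<Vec<u8>> {
    pack_with(s, hex_code)
}
```

Returning None lets a caller fall back to raw bytes, which is how a classification step like StringHint can try packed encodings first without risking a panic.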

SIMD Optimization

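The encoder uses SIMD instructions to pack long strings, turning each 16-byte chunk of input into 8 packed bytes:
while input_bytes.len() >= 16 {
    let (chunk, rest) = input_bytes.split_at(16);
    let input = u8x16::from_slice(chunk);
    // Map each ASCII character to its 4-bit code via a lookup table
    let indices = input.saturating_sub(nibble_base);
    let nibbles = lookup.swizzle_dyn(indices);

    // evens holds positions 0, 2, 4, ...; odds holds positions 1, 3, 5, ...
    let (evens, odds) = nibbles.deinterleave(
        nibbles.rotate_elements_left::<1>()
    );
    // The first 8 lanes each hold (high << 4) | low
    let packed = (evens << Simd::splat(4)) | odds;
    self.write_raw_bytes(&packed.to_array()[..8])?;
    input_bytes = rest;
}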
Location: wacore/binary/src/encoder.rs:809-824

JID Encoding

JIDs have special compact encodings:

JID_PAIR (Standard JID)

// Format: JID_PAIR + user + server
// Example: "15551234567@s.whatsapp.net"
self.write_u8(token::JID_PAIR)?;
if user.is_empty() {
    self.write_u8(token::LIST_EMPTY)?;
} else {
    self.write_string(user)?;  // "15551234567"
}
self.write_string(server)?;    // "s.whatsapp.net"
Location: wacore/binary/src/encoder.rs:706-715

AD_JID (Device-Specific JID)

// Format: AD_JID + domain_type + device + user
// Example: "15551234567:1@s.whatsapp.net" (device 1)
self.write_u8(token::AD_JID)?;
self.write_u8(server_to_domain_type(jid.server, jid.agent))?;
self.write_u8(device)?;            // Device number
self.write_string(user)?;          // User part only
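The domain_type byte is derived from the Server enum variant at encoding time, not from the agent field directly. The mapping does not follow the #[repr(u8)] discriminants for every variant: Pn and Lid happen to match their discriminants (0 and 1), Hosted and HostedLid map to 128 and 129 (their discriminants are 5 and 6), and all other servers fall back to the agent byte: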
| domain_type | Server variant | Wire string | Description |
|---|---|---|---|
| 0 | Server::Pn | s.whatsapp.net | Standard phone-number JIDs |
| 1 | Server::Lid | lid | Linked Identity JIDs |
| 128 | Server::Hosted | hosted | Cloud API / Meta Business API (phone-based) |
| 129 | Server::HostedLid | hosted.lid | Cloud API / Meta Business API (LID-based) |
| fallback | (other) | varies | Uses the agent byte from the JID |
The domain_type must be derived from the JID’s server field via server_to_domain_type(), not from jid.agent. A previous bug wrote jid.agent (which is 0 for most JIDs) unconditionally, causing LID JIDs to be encoded with domain_type=0 instead of domain_type=1. This made LID group messages silently rejected by the server with error 421.
Location: wacore/binary/src/encoder.rs:699-705, 362-369
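The mapping in the table above, written out as a match, shows why a plain `server as u8` cast would be wrong for hosted JIDs. This is a sketch: the reduced Server enum and the write_ad_jid_prefix helper are hypothetical, and the device is written as a single byte, as in the encoder excerpt above.

```rust
/// Hypothetical subset of Server; discriminants follow the enum shown earlier.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum Server {
    Pn = 0,
    Lid = 1,
    Group = 2,
    Hosted = 5,
    HostedLid = 6,
}

const AD_JID: u8 = 247;

/// domain_type per the table above; unlisted servers fall back to the agent byte.
pub fn server_to_domain_type(server: Server, agent: u8) -> u8 {
    match server {
        Server::Pn => 0,
        Server::Lid => 1,
        Server::Hosted => 128,
        Server::HostedLid => 129,
        _ => agent,
    }
}

/// Hypothetical helper: writes the fixed part of an AD_JID (marker, domain_type, device).
/// The user string would follow.
pub fn write_ad_jid_prefix(out: &mut Vec<u8>, server: Server, agent: u8, device: u8) {
    out.push(AD_JID);
    out.push(server_to_domain_type(server, agent));
    out.push(device);
}
```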

List Encoding

Lists (including node structures) have length-prefixed encoding:
fn write_list_start(&mut self, len: usize) -> Result<()> {
    if len == 0 {
        self.write_u8(token::LIST_EMPTY)?;  // 0x00
    } else if len < 256 {
        self.write_u8(token::LIST_8)?;      // 0xF8
        self.write_u8(len as u8)?;
    } else {
        self.write_u8(token::LIST_16)?;     // 0xF9
        self.write_u16_be(len as u16)?;
    }
    Ok(())
}
Location: wacore/binary/src/encoder.rs:865-876
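A standalone sketch of the same length-prefix scheme, with a matching reader. The excerpt above casts len to u16 in the LIST_16 branch; this version instead returns an error for lists longer than u16::MAX rather than truncating the length. Whether the real encoder guards against this elsewhere is not shown here, and ListTooLong and read_list_size are hypothetical names.

```rust
// Hypothetical standalone version of the list header encoding.
const LIST_EMPTY: u8 = 0;
const LIST_8: u8 = 248;
const LIST_16: u8 = 249;

#[derive(Debug, PartialEq)]
pub struct ListTooLong(pub usize);

/// Same thresholds as above, but rejects lengths that do not fit in a u16.
pub fn write_list_start(out: &mut Vec<u8>, len: usize) -> Result<(), ListTooLong> {
    match len {
        0 => out.push(LIST_EMPTY),
        1..=255 => out.extend_from_slice(&[LIST_8, len as u8]),
        256..=0xFFFF => {
            out.push(LIST_16);
            out.extend_from_slice(&(len as u16).to_be_bytes());
        }
        _ => return Err(ListTooLong(len)),
    }
    Ok(())
}

/// Inverse: returns (list size, header bytes consumed), or None if this is not a list header.
pub fn read_list_size(data: &[u8]) -> Option<(usize, usize)> {
    match *data.first()? {
        LIST_EMPTY => Some((0, 1)),
        LIST_8 => Some((usize::from(*data.get(1)?), 2)),
        LIST_16 => {
            let bytes = data.get(1..3)?;
            Some((usize::from(u16::from_be_bytes([bytes[0], bytes[1]])), 3))
        }
        _ => None,
    }
}
```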

Node Encoding Format

A complete node is encoded as:
LIST_START(list_len)
    tag
    attr_key_1
    attr_value_1
    attr_key_2
    attr_value_2
    ...
    [content]  // If present
Where list_len = 1 (tag) + (num_attrs * 2) + (content ? 1 : 0)
pub fn write_node<N: EncodeNode>(&mut self, node: &N) -> Result<()> {
    let content_len = if node.has_content() { 1 } else { 0 };
    let list_len = 1 + (node.attrs_len() * 2) + content_len;
    
    self.write_list_start(list_len)?;
    self.write_string(node.tag())?;
    node.encode_attrs(self)?;
    node.encode_content(self)?;
    Ok(())
}
Location: wacore/binary/src/encoder.rs:879-889
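To see how write_node and the list header interact, here is a self-contained sketch that encodes the "Message with Body" node from the Wire Format Examples section. Assumptions: the hypothetical dictionary contains only "message" (19) and "body" (7), values taken from the index_of_token examples, so "type" and "text" fall back to BINARY_8 here; a full encoder would also try other token, packed, and JID encodings first. In this sketch, child nodes are wrapped in their own list header, so the content slot in the parent's list holds exactly one element.

```rust
const LIST_EMPTY: u8 = 0;
const LIST_8: u8 = 248;
const BINARY_8: u8 = 252;

/// Hypothetical simplified node types.
pub enum Content {
    Text(String),
    Nodes(Vec<Node>),
}

pub struct Node {
    pub tag: &'static str,
    pub attrs: Vec<(&'static str, String)>,
    pub content: Option<Content>,
}

/// Hypothetical two-entry dictionary (values from the index_of_token examples).
fn token(s: &str) -> Option<u8> {
    match s {
        "message" => Some(19),
        "body" => Some(7),
        _ => None,
    }
}

fn write_string(out: &mut Vec<u8>, s: &str) {
    if let Some(t) = token(s) {
        out.push(t);
        return;
    }
    assert!(s.len() < 256, "sketch only handles BINARY_8 strings");
    out.push(BINARY_8);
    out.push(s.len() as u8);
    out.extend_from_slice(s.as_bytes());
}

fn write_list_start(out: &mut Vec<u8>, len: usize) {
    if len == 0 {
        out.push(LIST_EMPTY);
        return;
    }
    assert!(len < 256, "sketch only handles LIST_8");
    out.push(LIST_8);
    out.push(len as u8);
}

pub fn write_node(out: &mut Vec<u8>, node: &Node) {
    let content_len = usize::from(node.content.is_some());
    write_list_start(out, 1 + node.attrs.len() * 2 + content_len);
    write_string(out, node.tag);
    for (key, value) in &node.attrs {
        write_string(out, key);
        write_string(out, value);
    }
    match &node.content {
        Some(Content::Text(s)) => write_string(out, s),
        // Children form one list element: a list header, then each child node.
        Some(Content::Nodes(children)) => {
            write_list_start(out, children.len());
            for child in children {
                write_node(out, child);
            }
        }
        None => {}
    }
}
```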

Decoding Process

Decoder Structure

pub struct Decoder<'a> {
    data: &'a [u8],
    offset: usize,
}

impl<'a> Decoder<'a> {
    pub fn read_node_ref(&mut self) -> Result<NodeRef<'a>>
    pub fn read_list_size(&mut self) -> Result<usize>
    pub fn read_string(&mut self, len: usize) -> Result<NodeStr<'a>>
}
Location: wacore/binary/src/decoder.rs

Zero-copy decoding

The decoder uses NodeRef<'a> to avoid allocations. String and byte payloads borrow directly from the input buffer. Decoded strings use NodeStr<'a> — a borrowed-or-inline string type that stores short owned values (up to 24 bytes) inline via CompactString, avoiding heap allocation:
/// Borrowed-or-inline string for decoded nodes.
pub enum NodeStr<'a> {
    Borrowed(&'a str),        // Zero-copy: points into input buffer
    Owned(CompactString),     // Inline for short strings (≤24 bytes)
}

pub struct NodeRef<'a> {
    pub tag: NodeStr<'a>,          // Borrowed or inline
    pub attrs: AttrsRef<'a>,       // Vec<(NodeStr<'a>, ValueRef<'a>)>
    pub content: Option<Box<NodeContentRef<'a>>>,
}

pub enum NodeContentRef<'a> {
    Bytes(Cow<'a, [u8]>),    // Zero-copy for byte content
    String(NodeStr<'a>),     // Borrowed or inline
    Nodes(Box<NodeVec<'a>>), // Recursive borrowing
}
NodeStr implements Deref<Target = str>, AsRef<str>, PartialEq<str>, and PartialEq<&str>, so you can use it anywhere a &str is expected. It also provides to_compact_string() for efficient conversion to an owned CompactString.
NodeStr replaces the previous Cow<'a, str> used in NodeRef, AttrsRef, ValueRef, and NodeContentRef. The key difference is that the Owned variant uses CompactString (inline up to 24 bytes) instead of String (always heap-allocated), reducing allocation pressure for the many short protocol strings that can’t be statically interned.
Location: wacore/binary/src/node.rs:10-106, 465-469, 437-441
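A minimal sketch of the borrowed-or-owned string idea, using String in place of CompactString to stay dependency-free. Only the Deref and PartialEq<&str> impls from the list above are reproduced. decode_raw is a hypothetical helper: a raw UTF-8 payload can borrow straight from the input buffer, whereas a packed nibble or hex value has to be expanded into new bytes and therefore needs the Owned variant.

```rust
use std::ops::Deref;

/// Borrowed-or-owned string. The crate stores the owned case as a CompactString;
/// this sketch uses String to avoid a dependency.
#[derive(Debug)]
pub enum NodeStr<'a> {
    Borrowed(&'a str),
    Owned(String),
}

impl Deref for NodeStr<'_> {
    type Target = str;

    fn deref(&self) -> &str {
        match self {
            NodeStr::Borrowed(s) => *s,
            NodeStr::Owned(s) => s.as_str(),
        }
    }
}

impl PartialEq<&str> for NodeStr<'_> {
    fn eq(&self, other: &&str) -> bool {
        &**self == *other
    }
}

/// Hypothetical helper: a raw UTF-8 payload borrows straight from the input buffer.
pub fn decode_raw(buf: &[u8]) -> Option<NodeStr<'_>> {
    std::str::from_utf8(buf).ok().map(NodeStr::Borrowed)
}
```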

OwnedNodeRef (yoke zero-copy)

OwnedNodeRef is a self-referential type that owns the decompressed network buffer while the inner NodeRef borrows string and byte payloads directly from it. This avoids copying payloads out of the buffer during decoding — only container allocations (attribute Vec, child Vec) occur.
pub struct OwnedNodeRef {
    inner: Yoke<NodeRef<'static>, Vec<u8>>,
}

impl OwnedNodeRef {
    /// Decode a node from an owned buffer.
    pub fn new(buffer: Vec<u8>) -> Result<Self>;

    /// Access the borrowed node.
    pub fn get(&self) -> &NodeRef<'_>;

    /// Convert to an owned Node (allocates — use sparingly).
    pub fn to_owned_node(&self) -> Node;

    // Convenience accessors: tag(), attrs(), get_attr(),
    // children(), get_optional_child(), content_bytes(), etc.
}
Received stanzas flow through the system as Arc<OwnedNodeRef>, giving handlers cheap shared access to the zero-copy decoded node. The to_owned_node() method is available as an escape hatch when you need a fully owned Node, but it allocates all strings and bytes — defeating the zero-copy benefit. Location: wacore/binary/src/node.rs:594-693

Node vs NodeRef usage pattern

  • Node (owned) — used for building and sending outgoing stanzas. Constructed via NodeBuilder.
  • NodeRef<'a> (borrowed) — used for reading received stanzas. Borrows from the network buffer.
  • OwnedNodeRef — wraps a NodeRef with its backing buffer via yoke, enabling safe zero-copy sharing across handler tasks as Arc<OwnedNodeRef>.

Zero-copy serialization

Every type in the NodeRef family implements serde::Serialize (gated behind the serde feature) and produces the same output as its owned counterpart. You can therefore serialize a NodeRef, OwnedNodeRef, ValueRef, JidRef, or NodeContentRef directly, without converting to an owned Node first, and avoid all intermediate allocations.
// Serialize an OwnedNodeRef directly — zero-copy from the network buffer
let owned_ref: OwnedNodeRef = OwnedNodeRef::new(buffer)?;
let json = serde_json::to_string(&owned_ref)?;

// Equivalent but allocates: convert to Node first
let json_owned = serde_json::to_string(&owned_ref.to_owned_node())?;

// Both produce identical JSON output
assert_eq!(json, json_owned);
The following types implement Serialize:
| Type | Serializes as | Notes |
|---|---|---|
| NodeRef<'a> | Node struct | Fields: tag (as &str), attrs (newtype wrapper), content |
| NodeStr<'a> | &str | Borrows the inner string directly |
| ValueRef<'a> | NodeValue enum | Variant names match: String, Jid |
| JidRef<'a> | Jid struct | Fields: user, server, agent, device, integrator |
| NodeContentRef<'a> | NodeContent enum | Variant names match: Bytes, String, Nodes |
| OwnedNodeRef | Node struct | Delegates to inner NodeRef::serialize |
The Serialize implementations use an AttrsRefWrapper to match the newtype-struct framing that serde’s derive produces for Attrs(Vec<...>). This ensures compatibility with binary formats like bincode and postcard, which distinguish between a bare sequence and a newtype struct wrapper.
This is useful for logging, debugging, protocol inspection, and forwarding stanzas to external systems without paying the cost of to_owned_node(). Location: wacore/binary/src/node.rs:67-71, 397-407, 473-488, 612-632, 875-880; wacore/binary/src/jid.rs:603-615

The borrowed counterpart to NodeValue is ValueRef<'a>, used in the decoder path and in NodeRef attributes:
pub enum ValueRef<'a> {
    String(NodeStr<'a>),
    Jid(JidRef<'a>),
}
ValueRef provides three methods: as_str() (returns Cow<'_, str> — zero-copy for String, allocates for Jid), as_jid() (returns Option<&JidRef>, only for Jid variant), and to_jid() (converts either variant to owned Jid, parsing from string if necessary). Location: wacore/binary/src/node.rs:282-313

Attribute parsing

AttrParser and AttrParserRef provide structured attribute extraction from owned Node and borrowed NodeRef values respectively. They accumulate parse errors instead of panicking:
let mut parser = node.attr_parser();

let msg_id: String = parser.required_str("id");
let msg_type: Option<String> = parser.optional_str("type");
let recipient: Option<Jid> = parser.optional_jid("to");

parser.finish()?; // Returns accumulated errors if any
The optional_jid method handles both NodeValue variants:
  • If the attribute is a NodeValue::Jid, it returns the JID directly via clone (zero parse cost).
  • If the attribute is a NodeValue::String, it parses via Jid::from_str. Parse failures are captured in the error list and surfaced when you call finish(), rather than being silently discarded.
This ensures that malformed JID strings in protocol messages are reported as BinaryError::Jid (for AttrParser) or BinaryError::AttrParse (for AttrParserRef) instead of silently returning None. Location: wacore/binary/src/attrs.rs
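The accumulate-then-finish pattern can be sketched without the crate. AttrParser, AttrError, SimpleJid, and parse_jid below are hypothetical simplifications: attributes are a HashMap of strings, and a JID is anything of the form user@server with a non-empty server. The point is that a missing required attribute or an unparsable JID is recorded instead of being dropped, while an absent optional attribute is not an error.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum AttrError {
    Missing(&'static str),
    BadJid { key: &'static str, value: String },
}

#[derive(Debug, PartialEq)]
pub struct SimpleJid {
    pub user: String,
    pub server: String,
}

/// Hypothetical JID parser: accepts only "user@server" with a non-empty server.
fn parse_jid(s: &str) -> Option<SimpleJid> {
    let (user, server) = s.split_once('@')?;
    if server.is_empty() {
        return None;
    }
    Some(SimpleJid { user: user.to_string(), server: server.to_string() })
}

/// Hypothetical parser that collects errors instead of panicking or discarding them.
pub struct AttrParser<'a> {
    attrs: &'a HashMap<&'static str, String>,
    errors: Vec<AttrError>,
}

impl<'a> AttrParser<'a> {
    pub fn new(attrs: &'a HashMap<&'static str, String>) -> Self {
        AttrParser { attrs, errors: Vec::new() }
    }

    /// Missing keys are recorded; an empty string is returned as a placeholder.
    pub fn required_str(&mut self, key: &'static str) -> String {
        let attrs = self.attrs;
        match attrs.get(key) {
            Some(value) => value.clone(),
            None => {
                self.errors.push(AttrError::Missing(key));
                String::new()
            }
        }
    }

    /// Absent is fine (None); present but unparsable is recorded as an error.
    pub fn optional_jid(&mut self, key: &'static str) -> Option<SimpleJid> {
        let attrs = self.attrs;
        let raw = attrs.get(key)?;
        let parsed = parse_jid(raw);
        if parsed.is_none() {
            self.errors.push(AttrError::BadJid { key, value: raw.clone() });
        }
        parsed
    }

    pub fn finish(self) -> Result<(), Vec<AttrError>> {
        if self.errors.is_empty() {
            Ok(())
        } else {
            Err(self.errors)
        }
    }
}
```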

Unpacking

Reverse of the packing process:
fn unpack_nibble(packed: u8, position: u8) -> u8 {
    let nibble = if position == 0 {
        (packed >> 4) & 0x0F
    } else {
        packed & 0x0F
    };
    
    match nibble {
        0..=9 => b'0' + nibble,
        10 => b'-',
        11 => b'.',
        15 => 0,  // Padding
        _ => panic!("Invalid nibble"),
    }
}
Location: wacore/binary/src/decoder.rs:400-450
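A non-panicking sketch of the full unpacking loop, reversing the nibble example from the encoder section. It assumes the byte layout shown there (high nibble first, 0xF as padding) and returns an error for the unused codes 12 to 14 instead of panicking, which matters when the input comes from the network. InvalidNibble and unpack_nibbles are hypothetical names.

```rust
/// Hypothetical error for a 4-bit code outside the nibble alphabet.
#[derive(Debug, PartialEq)]
pub struct InvalidNibble(pub u8);

/// Maps a 4-bit code back to a character; padding yields None.
fn nibble_to_char(n: u8) -> Result<Option<char>, InvalidNibble> {
    match n {
        0..=9 => Ok(Some((b'0' + n) as char)),
        10 => Ok(Some('-')),
        11 => Ok(Some('.')),
        15 => Ok(None), // padding
        other => Err(InvalidNibble(other)),
    }
}

/// Expands packed bytes (high nibble first) into the original string.
pub fn unpack_nibbles(packed: &[u8]) -> Result<String, InvalidNibble> {
    let mut out = String::with_capacity(packed.len() * 2);
    for &byte in packed {
        for n in [byte >> 4, byte & 0x0F] {
            if let Some(c) = nibble_to_char(n)? {
                out.push(c);
            }
        }
    }
    Ok(out)
}
```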

Performance Optimizations

Token interning with Cow

When converting decoded NodeRef values to owned Node values, the intern_cow function maps known protocol strings to their static references using the unified hashify lookup:
fn intern_cow(s: &str) -> Cow<'static, str> {
    if let Some(kind) = token::index_of_token(s) {
        let interned = match kind {
            token::TokenKind::Single(idx) => token::get_single_token(idx),
            token::TokenKind::Double(dict, idx) => token::get_double_token(dict, idx),
        };
        if let Some(token) = interned {
            return Cow::Borrowed(token);    // Zero-alloc: points to static str
        }
    }
    Cow::Owned(s.to_string())               // Fallback: heap-allocate unknown strings
}
The unified token map is generated at compile time via hashify (PTHash with FNV-1a), so the single lookup is O(1). Since the vast majority of node tags and attribute keys are part of the WhatsApp token dictionary, this eliminates most heap allocations during protocol decoding. This optimization applies to:
  • Node.tag — protocol tags like "message", "iq", "receipt"
  • Attrs keys — attribute names like "id", "type", "to", "from"
Jid.server is now a Server enum (a Copy type, #[repr(u8)]), so it requires no allocation at all — neither heap nor interning. This is an improvement over the previous Cow<'static, str> approach.
Location: wacore/binary/src/node.rs:108-123
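The interning idea can be demonstrated with a hypothetical static token list standing in for the generated dictionary. The property that matters is that a Cow::Borrowed result points at static memory, so it stays valid after the decoded buffer it was read from has been dropped.

```rust
use std::borrow::Cow;

// Hypothetical stand-in for the generated token dictionary.
const TOKENS: &[&str] = &["message", "iq", "receipt", "id", "type", "to", "from"];

/// Known strings borrow static memory; anything else is copied to the heap.
pub fn intern_cow(s: &str) -> Cow<'static, str> {
    match TOKENS.iter().find(|t| **t == s) {
        Some(token) => Cow::Borrowed(*token),
        None => Cow::Owned(s.to_owned()),
    }
}
```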

Two-Pass Encoding

For large or variable-size payloads, exact size calculation prevents buffer growth:
pub fn marshal_exact(node: &Node) -> Result<Vec<u8>> {
    // Pass 1: Calculate exact size
    let plan = build_marshaled_node_plan(node);
    
    // Pass 2: Encode directly into fixed-size buffer
    let mut payload = vec![0; plan.size];
    let mut encoder = Encoder::new_slice(&mut payload, Some(&plan.hints))?;
    encoder.write_node(node)?;
    Ok(payload)
}
Location: wacore/binary/src/marshal.rs:67-76
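The two-pass idea in isolation: compute the exact encoded length first, allocate once, then write into the fixed-size buffer by offset. This sketch only encodes a flat list of strings as BINARY_8 (marker byte, length byte, data); encoded_size and marshal_strings_exact are hypothetical stand-ins for build_marshaled_node_plan and the slice-backed Encoder.

```rust
const BINARY_8: u8 = 252;

/// Pass 1 (hypothetical): exact encoded size, marker + length byte + payload per string.
fn encoded_size(strings: &[&str]) -> usize {
    strings.iter().map(|s| 2 + s.len()).sum()
}

/// Pass 2 (hypothetical): one allocation of exactly that size, then write by offset.
pub fn marshal_strings_exact(strings: &[&str]) -> Vec<u8> {
    let size = encoded_size(strings);
    let mut out = vec![0u8; size];
    let mut pos = 0;
    for s in strings {
        assert!(s.len() < 256, "sketch only handles BINARY_8 strings");
        out[pos] = BINARY_8;
        out[pos + 1] = s.len() as u8;
        out[pos + 2..pos + 2 + s.len()].copy_from_slice(s.as_bytes());
        pos += 2 + s.len();
    }
    assert_eq!(pos, size, "size pass and write pass disagree");
    out
}
```

The final assertion captures the invariant the real plan/encode split must also uphold: if the size pass and the write pass ever disagree, writing into a fixed-size slice would either leave trailing zeros or run out of space.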

String Hint Cache

Repeated strings (like JIDs) are analyzed once and cached. Strings longer than PACKED_MAX (127 bytes) are immediately classified as RawBytes without running the full classification logic, since they can never be protocol tokens (max 48 bytes), packed nibble/hex, or JIDs:
pub struct StringHintCache {
    hints: Vec<(StrKey, StringHint)>,
}

impl StringHintCache {
    fn hint_or_insert(&mut self, s: &str) -> StringHint {
        // Strings longer than PACKED_MAX can't be tokens,
        // packed nibble/hex, or JIDs — skip classification entirely
        if s.len() > token::PACKED_MAX as usize {
            return StringHint::RawBytes;
        }
        if let Some(existing) = self.hints.iter().find(...) {
            return existing;
        }
        let hint = classify_string_hint(s);
        self.hints.push((key, hint));
        hint
    }
}
The same length check is applied in the uncached write path (write_string_uncached), where strings exceeding PACKED_MAX are emitted directly as raw bytes without classification. This avoids unnecessary work for long strings like message bodies, media URLs, and base64-encoded payloads. Location: wacore/binary/src/encoder.rs:240-287

Capacity estimation

Auto-sizing strategy samples node structure to estimate capacity. The Cow<'static, str> tag on owned Node works transparently since Cow implements Deref<Target = str>:
fn estimate_capacity_node(node: &Node) -> usize {
    let mut estimate = DEFAULT_MARSHAL_CAPACITY + 16;
    estimate += node.tag.len();  // Works with both Cow::Borrowed and Cow::Owned
    estimate += node.attrs.len() * AUTO_ATTR_ESTIMATE;  // ~24 bytes/attr
    
    if let Some(NodeContent::Nodes(children)) = &node.content {
        estimate += children.len() * AUTO_CHILD_ESTIMATE;  // ~96 bytes/child
        
        // Sample first 32 children for better accuracy
        for child in children.iter().take(AUTO_CHILD_SAMPLE_LIMIT) {
            estimate += child.tag.len() + ...
        }
    }
    
    estimate.clamp(DEFAULT_MARSHAL_CAPACITY, AUTO_MAX_HINT_CAPACITY)
}
Location: wacore/binary/src/marshal.rs:167-200

Common Protocol Patterns

IQ (Info/Query) Stanzas

// Request
NodeBuilder::new("iq")
    .attr("id", "ABC123")
    .attr("type", "get")
    .attr("xmlns", "w:g2")
    .attr("to", "@s.whatsapp.net")
    .children(vec![
        NodeBuilder::new("query").build(),
    ])
    .build()

// Response
NodeBuilder::new("iq")
    .attr("id", "ABC123")
    .attr("type", "result")
    .attr("from", "@s.whatsapp.net")
    .children(vec![
        NodeBuilder::new("group")
            .attr("id", "123456@g.us")
            .attr("subject", "My Group")
            .build(),
    ])
    .build()

Messages

NodeBuilder::new("message")
    .attr("to", "15551234567@s.whatsapp.net")
    .attr("type", "text")
    .attr("id", message_id)
    .children(vec![
        NodeBuilder::new("enc")
            .attr("v", "2")
            .attr("type", "msg")
            .bytes(encrypted_payload)
            .build(),
    ])
    .build()

Receipts

NodeBuilder::new("receipt")
    .attr("to", "15551234567@s.whatsapp.net")
    .attr("id", message_id)
    .attr("type", "read")
    .attr("t", timestamp)
    .build()

Wire Format Examples

Simple Message

Node: <message type="text"/>

Binary:
  F8 03           LIST_8(3)  [tag + 2 attrs]
  13              Token("message")
  16              Token("type")
  07              Token("text")

Message with Body

Node: <message type="text"><body>Hi</body></message>

Binary:
  F8 04           LIST_8(4)  [tag + 2 attrs + content]
  13              Token("message")
  16              Token("type")
  07              Token("text")
  F8 01           LIST_8(1)  [children: 1 node]
  F8 02           LIST_8(2)  [child: tag + content]
  07              Token("body")
  FC 02           BINARY_8(2)
  48 69           "Hi"

Debugging Tools

Inspecting Encoded Data

Use the evcxr REPL for interactive exploration:
:dep wacore-binary = { path = "wacore/binary" }
:dep hex = "0.4"

use wacore_binary::marshal::{marshal, unmarshal_ref};
use wacore_binary::builder::NodeBuilder;

// Decode binary data
{
    let data = hex::decode("f8034c1a07").unwrap();
    let node = unmarshal_ref(&data).unwrap();
    println!("Tag: {}", node.tag);
    for (k, v) in node.attrs.iter() {
        println!("  {}: {}", k, v);
    }
}

// Encode and inspect
{
    let node = NodeBuilder::new("message")
        .attr("type", "text")
        .build();
    let bytes = marshal(&node).unwrap();
    println!("Encoded: {:02x?}", bytes);
}

Error Handling

pub enum BinaryError {
    UnexpectedEof,
    InvalidToken(u8),
    InvalidListSize,
    AttrParse(String),
    Jid(JidParseError),     // JID parse failures from AttrParser
    LeftoverData(usize),
    Io(std::io::Error),
}
The Jid variant is emitted by AttrParser::optional_jid when a string attribute fails to parse as a JID. This ensures malformed JIDs in protocol messages are surfaced as typed errors rather than silently ignored. Location: wacore/binary/src/error.rs

References

  • Source: wacore/binary/src/
  • Token dictionary: wacore/binary/src/token.rs
  • Node builder: wacore/binary/src/builder.rs