Parsing Email Headers from Scratch: What Every Received Line Actually Means
Email headers are a stack of metadata appended every time a message hops between servers. This is a developer-focused walkthrough of how to parse Received lines, what each token means, and how to reconstruct a message's full delivery path from raw text.
Sohail Hussain, 15 min read
A Received: header is the breadcrumb a mail server prepends to every message it accepts. Read top to bottom, a stack of Received: lines tells you the exact route a message took, in reverse chronological order: which server handed it off, which IP it came from, which TLS cipher was used, how long the handoff took, and what the receiving server decided to call it. If you can parse one, you can debug almost any deliverability mystery without guessing.
This post walks through Received: headers from the protocol up. It is written for developers who have to write a parser, a log enricher, an abuse detector, or just a script that figures out why one customer's mail is two hours late. By the end you will be able to read a raw mail log the way a sysadmin reads dmesg.
The format is governed by RFC 5321 section 4.4, which defines the SMTP trace fields, and RFC 5322 section 3.6.7, which defines how trace headers appear in the final message. Together they specify the grammar; every real-world Received: line you will ever see is a (usually loose) implementation of that grammar.
What a Received line is for
Every time an SMTP server accepts a message, it prepends a Received: header at the top of the existing headers. "Prepend" is the magic word: the newest hop is at the top, the original sender is at the bottom. This makes the header stack a literal call trace.
Why this matters in practice:
- Path reconstruction. If a message took 4 hours to arrive, the timestamps on each Received: line tell you exactly where the delay was.
- Anti-spoofing. A From: header is trivially forgeable, but a Received: line written by Gmail's MX naming the sending IP is not. Lines that sit above anything the spammer inserted were written by servers the spammer does not control, so they cannot be faked.
- Abuse reporting. When you file a phishing report, the first Received: line above the spam is the one that matters; it names the sending IP, which you can map to the network (ASN) to escalate to.
- Forensics. If a message claims to be from your bank but the bottom-most Received: line names a residential IP in another country, you have your answer.
Here is one real Received: line, dissected over the rest of this post:
Received: from mail.example.com (mail.example.com. [203.0.113.42])
by mx.google.com with ESMTPS id abc123-def456ghi.78.2026.05.14.09.27.04
for <alice@example.org>
(version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
Wed, 14 May 2026 09:27:05 -0700 (PDT)
That single line carries nine distinct facts. Let us go through them.
The grammar of a Received line
The formal ABNF in RFC 5321 looks intimidating, but the working grammar is short. A Received: header is a header name (Received:), followed by a from clause and a by clause, then optional via, with, id, and for clauses, with parenthesized comments allowed in between, ending in a semicolon and a timestamp. The RFC fixes that clause order; servers in the wild do not always honor it.
Practically, every Received: line follows this rough shape:
Received: from <SENDER_HOST> (<SENDER_HOSTNAME_FROM_PTR> [<SENDER_IP>])
by <RECEIVER_HOST>
with <PROTOCOL>
id <QUEUE_ID>
for <ENVELOPE_RECIPIENT>
(<COMMENT — usually TLS info>);
<DATE>
The RFC requires the from clause, the by clause, and the date, but in practice a parser should treat every token except the header name as optional. Servers in the wild routinely skip pieces, reorder them, or stuff extra info into comments. A parser must be tolerant. Let us go field by field.
The from clause: who said they were
from mail.example.com (mail.example.com. [203.0.113.42])
The unparenthesized hostname is what the sender claimed during the SMTP HELO or EHLO greeting. The parenthesized hostname is what the receiver got by doing a reverse DNS lookup (PTR) on the sender's IP. The square-bracketed value is the IP itself.
These three values can disagree. When they do, that disagreement is meaningful:
- HELO hostname matches PTR → totally normal.
- HELO hostname does not match PTR → the sender is either misconfigured or lying. Most spam filters score this negatively.
- No PTR (unknown or just the IP in the parens) → the sending IP has no reverse DNS at all. Google's email sender guidelines require valid forward and reverse DNS records for sending IPs.
The IP in brackets is the field your parser should trust above all the others. Hostnames are advisory; IPs are what the TCP stack actually saw.
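The three values can be pulled apart and compared with a few lines of code. The following is a minimal sketch, assuming the common `from <helo> (<ptr> [<ip>])` layout shown above; the function name `parse_from_clause` and the returned keys are illustrative choices, and MTAs that format the clause differently will simply produce an empty result.

```python
import ipaddress
import re

# Matches: from <helo> (<ptr> [<ip>])  -- the PTR name is optional.
FROM_RE = re.compile(
    r"from\s+(?P<helo>\S+)\s+\((?:(?P<ptr>[^\s\[\]]+)\s+)?\[(?P<ip>[^\]]+)\]\)",
    re.IGNORECASE,
)


def parse_from_clause(clause: str) -> dict:
    """Extract HELO name, PTR name, and IP from a from clause (hypothetical helper)."""
    m = FROM_RE.search(clause)
    if m is None:
        return {}
    helo = m.group("helo").rstrip(".").lower()
    ptr = m.group("ptr")
    ptr = ptr.rstrip(".").lower() if ptr else None
    ip_text = m.group("ip")
    # Some MTAs prefix IPv6 literals with "IPv6:"; drop that before validating.
    if ip_text.lower().startswith("ipv6:"):
        ip_text = ip_text[5:]
    try:
        ip = ipaddress.ip_address(ip_text)
    except ValueError:
        ip = None
    return {
        "helo": helo,
        "ptr": ptr,
        "ip": ip,
        "helo_matches_ptr": ptr is not None and helo == ptr,
    }
```

Trailing dots and case are normalized before comparing, since `mail.example.com.` and `mail.example.com` name the same host. A literal `unknown` in the PTR slot is returned as-is, so callers may want to treat it as "no PTR".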
The by clause: who received it
by mx.google.com
This is the receiving server. Combined with the timestamp, it gives you the unambiguous "what host received the handoff at this time." If you are debugging a delay, you reconstruct the trace by reading the by clauses bottom-to-top (oldest to newest) and looking at the gap between each timestamp.
Some implementations include the receiver's IP and software ID:
by mx.google.com with ESMTPS (gsmtp google.com)
The parenthesized comment after by is non-standard but useful; Google, Microsoft, and Yahoo all stuff identifying info there.
The with clause: which protocol
with ESMTPS
The with token names the SMTP variant the handoff used. The common values:
| Token | Meaning |
|---|---|
| SMTP | Plain old SMTP from RFC 821/5321. Rare today. |
| ESMTP | Extended SMTP, the modern baseline. Uses EHLO instead of HELO. |
| ESMTPS | ESMTP plus STARTTLS: the connection was encrypted. |
| ESMTPA | ESMTP plus SMTP AUTH: the sender authenticated to the server. |
| ESMTPSA | ESMTP plus STARTTLS plus SMTP AUTH. Common on submission ports (587). |
| LMTP | Local Mail Transfer Protocol, used by mailbox stores like Dovecot. |
| BSMTP | Batch SMTP, used by gateways that batch messages. |
ESMTPS and ESMTPSA are what you want to see on hops crossing the public internet. Plain ESMTP (no S) means the message rode an unencrypted link, which is increasingly rare and worth flagging.
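To flag unencrypted hops automatically, the table can be turned into a lookup. This is a small sketch covering only the SMTP-family tokens from the table; the names `WITH_TYPES` and `transport_flags` are made up for this example, and any other token (including LMTP, BSMTP, and vendor strings such as `Microsoft SMTP Server`) is reported as unknown instead of being guessed at.

```python
# TLS/auth flags per `with` token, taken from the table above.
WITH_TYPES = {
    "SMTP":    {"tls": False, "auth": False},
    "ESMTP":   {"tls": False, "auth": False},
    "ESMTPS":  {"tls": True,  "auth": False},
    "ESMTPA":  {"tls": False, "auth": True},
    "ESMTPSA": {"tls": True,  "auth": True},
}


def transport_flags(with_value: str) -> dict:
    """Classify the first word of a with clause (hypothetical helper)."""
    words = with_value.split()
    token = words[0].upper() if words else ""
    flags = WITH_TYPES.get(token)
    if flags is None:
        # Unrecognized token: report it without guessing about encryption.
        return {"token": token, "known": False, "tls": None, "auth": None}
    return {"token": token, "known": True, **flags}
```

A hop where `known` is true and `tls` is false is the case worth flagging; for unknown tokens, the TLS comment (if any) is the better source of evidence.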
The id clause: the receiver's queue ID
id abc123-def456ghi.78.2026.05.14.09.27.04
The id is whatever the receiving server wants to call this message internally. It is the unique key in their queue. If you ever have to ask Google or Microsoft to investigate a delivery on your behalf, the id from their Received: line is what their support team will ask for first.
Queue IDs are not standardized; every MTA picks its own format. Postfix by default uses a short hexadecimal string (or a longer alphanumeric one when long queue IDs are enabled). Exim uses three sets of letters/digits separated by hyphens (1abcDE-000ABC-1g). Sendmail uses a 14-character alphanumeric string. Exchange puts its server build number after id (15.20.4567.12 in the trace later in this post) instead of a per-message key. You will recognize MTAs by their queue ID format alone after a while.
The for clause: the envelope recipient
for <alice@example.org>
The for clause names the SMTP RCPT TO, which is the envelope recipient. This is often different from the To: header. Mailing lists are the classic example: a message addressed To: list@example.org is delivered to each subscriber, and each subscriber's copy will have a for <subscriber@theirhost.com> line — even though the visible To: says the list address.
This is the field that exposes BCC recipients in some configurations. If you BCC boss@yourcompany.com and the message goes through a server that includes for clauses, the boss's copy will carry for <boss@yourcompany.com>. The other recipients never see that copy, but the boss can tell from the trace that they were BCCed, because their address appears in the for clause and not in To: or Cc:. Modern public MTAs (Gmail, Outlook) strip the for clause when forwarding to protect BCCs; private MTAs often do not.
The parenthesized TLS comment
(version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256)
This comment is not standardized, but it is near-universal. The triple of version, cipher, and bits describes the TLS session used for this hop. If you are auditing for downgrade attacks or for hops that fell back to TLS 1.0, this comment is your evidence.
Other MTAs write the same information in a different shape. Postfix, for example, produces a more detailed variant:
(using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits)
key-exchange X25519 server-signature RSA-PSS (2048 bits))
Microsoft uses a different vocabulary again. Your parser should treat the parenthesized comment as free-form key-value text and not assume a fixed schema.
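One way to honor the "free-form key-value text" advice is to collect whatever `key=value` pairs happen to be present and nothing more. The sketch below assumes pairs are separated by whitespace or commas; `parse_tls_comment` is an illustrative name, not part of any library.

```python
import re

# key=value pairs; values stop at whitespace, commas, or parentheses.
PAIR_RE = re.compile(r"([A-Za-z][\w-]*)=([^\s,()]+)")


def parse_tls_comment(comment: str) -> dict:
    """Return whatever key=value pairs the comment contains (possibly none)."""
    return {key.lower(): value for key, value in PAIR_RE.findall(comment)}
```

For the `using TLSv1.3 with cipher ...` style, which has no `=` signs, this returns an empty dict, so a caller can keep the raw comment text alongside the parsed pairs and fall back to it.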
The timestamp
Wed, 14 May 2026 09:27:05 -0700 (PDT)
RFC 5322 section 3.3 defines this format precisely. The day-of-week is optional. The timezone offset is required. The parenthesized abbreviation (PDT here) is informational and frequently lies; trust the numeric offset.
For path reconstruction, parse timestamps to UTC. The gap between consecutive Received: timestamps is the time spent in transit between those two hops. A 30-minute gap on a hop that normally takes 200ms is your delay.
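The UTC conversion and gap calculation can be sketched with the standard library alone. The helper names `to_utc` and `hop_gaps` are invented for this example, and the input is assumed to be the date portions of the Received: lines in header order (newest first).

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def to_utc(date_text: str) -> datetime:
    """Parse an RFC 5322 date and normalize it to UTC."""
    dt = parsedate_to_datetime(date_text)
    if dt.tzinfo is None:
        # An offset of -0000 yields a naive datetime; treat it as UTC here.
        return dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


def hop_gaps(dates_newest_first: list) -> list:
    """Seconds between consecutive hops, in chronological order."""
    times = [to_utc(d) for d in reversed(dates_newest_first)]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]
```

A negative gap is possible and usually means clock skew between two hosts, so it is worth surfacing instead of clamping to zero.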
Reading the full stack
Here is a complete header trace for a message sent from Outlook to a Gmail user, with the path through a corporate forwarder:
Received: from mail.gmail.com (mail.gmail.com. [142.250.190.5])
by mx.recipient.com with ESMTPS id ee123.456
for <user@recipient.com>
(version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
Wed, 14 May 2026 09:27:30 -0700 (PDT)
Received: from NAM12-MW2-obe.outbound.protection.outlook.com
([52.100.158.42])
by mx.gmail.com with ESMTPS id m7-20020a05600c138900b003eaf12345
for <user@gmail.com>
(version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
Wed, 14 May 2026 09:27:04 -0700 (PDT)
Received: from MW4PR12MB5678.namprd12.prod.outlook.com (2603:10b6:303:178::6)
by MN2PR12MB1234.namprd12.prod.outlook.com (2603:10b6:208:18a::11)
with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384)
id 15.20.4567.12; Wed, 14 May 2026 16:27:01 +0000
Received: from MW4PR12MB5678.namprd12.prod.outlook.com ([fe80::a1b2:c3d4:5678])
by MW4PR12MB5678.namprd12.prod.outlook.com ([fe80::a1b2:c3d4:5678%9])
with Microsoft SMTP Server id 15.20.4567.12 via Mapi; Wed, 14 May 2026 16:27:00 +0000
Reading bottom to top in chronological order:
- 16:27:00 UTC: Sender's Outlook client deposits the message on their Exchange server MW4PR12MB5678 over MAPI (not SMTP, hence the via Mapi token).
- 16:27:01 UTC: Exchange relays internally to MN2PR12MB1234 over TLS 1.2.
- 16:27:04 UTC (= 09:27:04 PDT): Microsoft's outbound MX NAM12-MW2-obe.outbound.protection.outlook.com hands off to Gmail's MX mx.gmail.com over TLS 1.3.
- 16:27:30 UTC (= 09:27:30 PDT): Gmail forwards to the corporate forwarder mx.recipient.com. Note the for <user@recipient.com>: this is the corporate forwarder's view of where to deliver.
The whole trace took 30 seconds. The biggest gap (26 seconds) is between Gmail's MX accepting the message at 16:27:04 and mx.recipient.com accepting it at 16:27:30, which covers whatever spam scoring Gmail did at the gateway plus the forwarding step itself. That forwarding step is what kicks in when a Google Workspace domain forwards to an external mailbox; understanding that this is happening explains why the message landed at mx.recipient.com instead of straight at a Gmail mailbox.
Building a parser: a working approach
Here is a minimal, defensive parser pattern that handles real-world Received: lines. The approach: treat the line as a sequence of <token> <value> clauses, where <value> may extend until the next known token or the final ; separator.
import re
from email.utils import parsedate_to_datetime

KNOWN_TOKENS = {"from", "by", "via", "with", "id", "for"}


def parse_received(line: str) -> dict:
    """Split one unfolded Received: line into its clauses and timestamp."""
    # Strip the header name if present
    if line.lower().startswith("received:"):
        line = line[9:].strip()

    # Split the date off the end (everything after the last semicolon)
    if ";" in line:
        body, _, date_part = line.rpartition(";")
        try:
            timestamp = parsedate_to_datetime(date_part.strip())
        except (TypeError, ValueError):
            timestamp = None
    else:
        body, timestamp = line, None

    # Normalize whitespace; CFWS comments are kept inline
    body = re.sub(r"\s+", " ", body).strip()

    # Tokenize: walk left-to-right, collecting clauses
    result = {"timestamp": timestamp}
    current_token = None
    current_value = []
    for raw_word in body.split(" "):
        word = raw_word.lower().rstrip(":")
        if word in KNOWN_TOKENS:
            # A new clause starts; store the one collected so far
            if current_token:
                result[current_token] = " ".join(current_value).strip()
            current_token = word
            current_value = []
        else:
            current_value.append(raw_word)

    # Store the final clause
    if current_token:
        result[current_token] = " ".join(current_value).strip()
    return result
What this parser does not handle and a production version must:
- Folded headers. RFC 5322 allows a header to wrap across multiple lines with leading whitespace. Unfold (replace \r\n[ \t] with a single space) before parsing.
- Nested comments. RFC 5322 comments (in parentheses) can nest. A for clause may contain a comment. Walk parens with a depth counter, not a regex.
- Quoted strings. for "<weird recipient>" is legal. Treat double-quoted strings as a single token.
- IPv6 addresses in brackets. [2001:db8::1] looks like a normal bracketed IP but contains colons. Do not split on colons.
- Timestamp variants. Some old systems write timestamps without commas, with timezone names instead of offsets, or with two-digit years. Python's email.utils.parsedate_to_datetime handles most of these; for the survivors, fall back to dateutil.
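The first two items in that list can be sketched as a preprocessing pass that runs before tokenizing. The function names `unfold` and `split_comments` are illustrative; the comment walker uses a depth counter as suggested above and leaves double-quoted strings untouched, but it is a simplified reading of RFC 5322 CFWS, not a full implementation.

```python
import re


def unfold(raw: str) -> str:
    """Join continuation lines: a line break followed by spaces or tabs."""
    return re.sub(r"\r?\n[ \t]+", " ", raw)


def split_comments(text: str):
    """Return (text without comments, list of top-level comments)."""
    out, comments, buf = [], [], []
    depth = 0          # current parenthesis nesting level
    in_quotes = False  # inside a double-quoted string (outside comments)
    escaped = False    # previous character was a backslash
    for ch in text:
        target = buf if depth else out
        if escaped:
            target.append(ch)
            escaped = False
        elif ch == "\\":
            target.append(ch)
            escaped = True
        elif ch == '"' and depth == 0:
            in_quotes = not in_quotes
            out.append(ch)
        elif ch == "(" and not in_quotes:
            if depth:
                buf.append(ch)  # nested paren stays part of the comment
            depth += 1
        elif ch == ")" and not in_quotes and depth:
            depth -= 1
            if depth:
                buf.append(ch)
            else:
                comments.append("".join(buf))
                buf = []
                out.append(" ")
        else:
            target.append(ch)
    return re.sub(r"\s+", " ", "".join(out)).strip(), comments
```

Running the clause tokenizer on the comment-free text avoids treating a word such as `with` inside a TLS comment as the start of a new clause, while the returned comments stay available for separate inspection. Note that `unfold` collapses the whole run of leading whitespace to one space, which is slightly more aggressive than a strict unfold.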
Once you have a clean structured record per line, joining them in array order gives you the full trace. Reverse the array to read chronologically.
Authentication-Results: the receiver's verdict
The Received: stack tells you the path. The Authentication-Results: header tells you what the receiver decided about authentication. It is technically separate from the Received: stack but is added by the same hop, so you usually want to read them together.
A typical Authentication-Results: line:
Authentication-Results: mx.google.com;
dkim=pass header.i=@example.com header.s=selector1 header.b=Abc123de;
spf=pass (google.com: domain of bounces@example.com designates 198.51.100.1 as
permitted sender) smtp.mailfrom=bounces@example.com;
dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=example.com
The format is defined in RFC 8601. A parser should split on ; (after the first, which is the authserv-id), then for each method (dkim=, spf=, dmarc=, arc=, bimi=, etc.) read the result and the ptype.property=value pairs.
For DMARC, the practical question is always: did dmarc=pass, and does header.from match what your application thinks the sender was?
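The splitting strategy described above can be sketched as follows. This is a deliberately simplified reader, not a complete RFC 8601 parser: it assumes the header value has already been unfolded, that comments do not nest, and that no property value contains a semicolon or a space. The name `parse_auth_results` and the output layout are choices made for this example.

```python
import re


def parse_auth_results(value: str) -> dict:
    """Parse the value of an Authentication-Results: header (simplified)."""
    # Drop parenthesized comments first (non-nested only; an assumption).
    no_comments = re.sub(r"\([^()]*\)", " ", value)
    parts = [p.strip() for p in no_comments.split(";") if p.strip()]
    if not parts:
        return {"authserv_id": None, "results": []}
    parsed = {"authserv_id": parts[0].split()[0], "results": []}
    for part in parts[1:]:
        tokens = part.split()
        method, sep, verdict = tokens[0].partition("=")
        if not sep:
            continue  # not a method=result entry, e.g. a bare "none"
        # Remaining tokens are ptype.property=value pairs.
        props = dict(t.split("=", 1) for t in tokens[1:] if "=" in t)
        parsed["results"].append(
            {"method": method.lower(), "result": verdict.lower(), "properties": props}
        )
    return parsed
```

Results are kept as a list because a message can carry several entries for the same method, for instance one dkim= result per signature.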
Tools and shortcuts
If you do not want to write your own parser:
- Google's Messageheader tool takes a pasted raw header and produces a per-hop summary with timestamps and delays.
- Microsoft's Header Analyzer does the same with more detail on Microsoft-specific fields.
- Mailneo's email header analyzer parses the full stack including authentication results and explains each field in plain English.
- For programmatic use, Python's email module (specifically email.parser.BytesParser) handles RFC 5322 parsing and folded headers correctly. Node's mailparser package does the same.
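As a rough illustration of the library route, the sketch below uses `BytesParser` to pull every Received: value out of a raw message, top-most first. The wrapper name `received_lines` is invented for this example, and whitespace is collapsed afterwards so each value is a single line regardless of how the header was folded.

```python
from email import policy
from email.parser import BytesParser


def received_lines(raw_message: bytes) -> list:
    """Return Received: header values in header order (newest hop first)."""
    msg = BytesParser(policy=policy.default).parsebytes(raw_message, headersonly=True)
    # Collapse any leftover folding whitespace into single spaces.
    return [" ".join(str(value).split()) for value in msg.get_all("Received", [])]
```

Reversing the returned list gives the hops in chronological order.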
Common parsing pitfalls
A few things I have personally been burned by:
Reverse order matters. The first Received: line is the last hop. Off-by-one errors here will make you trust the wrong server.
The from hostname is sender-supplied. Anyone running nc smtp.example.com 25 and typing HELO whatever.com will have whatever.com show up in the from clause of the receiver's Received: line. Only the bracketed IP is trustworthy.
Comments can contain anything. Including parentheses, colons, and even the literal strings from or by. A regex-based parser will get this wrong on adversarial inputs.
Some MTAs do not add Received: at all. Internal handoffs between processes on the same machine sometimes skip the trace header. You may see a gap between the user's MUA and the first external MX with no internal hops named.
Time skew is real. A Received: timestamp is whatever the receiving host's clock said at the moment. If two consecutive hops disagree by 30 seconds, that is more likely clock drift than actual transit time.
Key takeaways
- A Received: header is prepended at every hop; read the stack top-to-bottom for newest-to-oldest, or bottom-to-top for chronological order.
- Trust the bracketed IP and the timestamp; treat the from hostname and the parenthesized comments as sender-supplied data that can be wrong or forged.
- Combine the Received: stack with the Authentication-Results: header to get both the path and the receiver's verdict on SPF, DKIM, and DMARC.
- For path reconstruction, parse timestamps to UTC and look at the gaps between consecutive hops; that is where delays live.
- A defensive parser must handle folded headers, nested comments, quoted strings, IPv6 bracketed addresses, and non-standard date formats. Off-the-shelf libraries (Python's email module, mailparser in Node) save weeks of edge-case work.
Frequently asked questions
Can a sender remove or modify Received lines?
A sender can write whatever they want into the Received: lines they prepend to a message they are forwarding, and can tamper with the lines already present below their own. What they cannot alter are the Received: lines added afterwards by downstream servers, which sit above the sender's lines in the stack. For inbound mail, the top-most Received: line, the one added by your own MX, is the one you can fully trust.
Why are there sometimes Received lines without for clauses?
The for clause exposes the envelope recipient, which leaks BCC. Public MTAs like Gmail and Outlook strip the for clause from messages they forward externally. Internal hops on a private mail system usually keep it because trace fidelity matters more than recipient privacy inside the perimeter.
What is the difference between Return-Path: and the from clause of a Received line?
Return-Path: is a single header added by the final delivery agent that contains the envelope sender (the MAIL FROM). The from clause of a Received: line is the server's record of who handed it the message at this hop. The two are related, since both are recorded during the SMTP session, but Return-Path: captures the envelope sender once for the whole message, while Received: from captures the connecting host (its HELO name and IP) at each hop.
My received line says with ESMTPS but version=TLS1_0. Is that a problem?
Yes. TLS 1.0 was formally deprecated by RFC 8996 in 2021, and most modern security policies do not accept it as adequate transport encryption. If a hop in your trace fell back to TLS 1.0 or below, audit the sending and receiving MTAs for outdated configuration. Modern MTAs should negotiate TLS 1.2 minimum, TLS 1.3 preferred.
How do I tell which hop introduced a delay?
Parse each Received: timestamp to UTC, then walk the stack in chronological order (bottom to top). The interval between two consecutive timestamps is the time the message spent in transit (plus any queueing) between those two hops. A "normal" inter-MTA hop on the modern internet takes well under a second; anything above a few seconds is a queue, a content scan, or a reputation-based deferral.