Parsing Email Headers from Scratch: What Every Received Line Actually Means

A Received: header is the breadcrumb a mail server prepends to every message it accepts. Read top to bottom, a stack of Received: lines tells you the route a message took, in reverse chronological order: which server handed it off, which IP it came from, which TLS cipher was used, when each handoff happened, and what the receiving server called the message internally. If you can parse one, you can debug most deliverability mysteries without guessing.

This post walks through Received: headers from the protocol up. It is written for developers who have to write a parser, a log enricher, an abuse detector, or just a script that figures out why one customer's mail is two hours late. By the end you will be able to read a raw mail log the way a sysadmin reads dmesg.

The format is governed by RFC 5321 section 4.4, which defines the SMTP trace fields, and RFC 5322 section 3.6.7, which defines how trace headers appear in the final message. Together they specify the grammar; every real-world Received: line you will ever see is a (usually loose) implementation of that grammar.

What a Received line is for

Every time an SMTP server accepts a message, it prepends a Received: header at the top of the existing headers. "Prepend" is the magic word: the newest hop is at the top, the original sender is at the bottom. This makes the header stack a literal call trace.

Why this matters in practice:

Path reconstruction. If a message took 4 hours to arrive, the timestamps on each Received: line tell you exactly where the delay was.
Anti-spoofing. A From: header is trivially forgeable, but a Received: line written by Gmail's MX naming the sending IP is not. A spammer can add fake Received: lines below the real ones, but cannot alter the lines that servers add after the message leaves the spammer's hands.
Abuse reporting. When you file a phishing report, the Received: line that matters is the one written by the first server you trust (usually your own MX): it names the IP that handed the message over, and looking up that IP gives you the network (ASN) to escalate to.
Forensics. If a message claims to be from your bank but the lowest Received: line you can trust names a residential IP in another country, you have your answer.

Here is one real Received: line, dissected over the rest of this post:

Received: from mail.example.com (mail.example.com. [203.0.113.42])
        by mx.google.com with ESMTPS id abc123-def456ghi.78.2026.05.14.09.27.04
        for <alice@example.org>
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Thu, 14 May 2026 09:27:05 -0700 (PDT)

That single line carries nine distinct facts. Let us go through them.

The grammar of a Received line

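The formal ABNF in RFC 5321 looks intimidating, but the working grammar is short. A Received: header is the header name (Received:), followed by a from clause and a by clause (both required by RFC 5321), then optional via, with, id, and for clauses in that order, with parenthesized comments allowed between them, and finally a semicolon and a timestamp. RFC 5322's generic trace-field grammar is far looser, accepting almost any run of words, addresses, and domains before the semicolon, and real servers lean on that looseness.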

Practically, every Received: line follows this rough shape:

Received: from <SENDER_HOST> (<SENDER_HOSTNAME_FROM_PTR> [<SENDER_IP>])
        by <RECEIVER_HOST>
        with <PROTOCOL>
        id <QUEUE_ID>
        for <ENVELOPE_RECIPIENT>
        (<COMMENT — usually TLS info>);
        <DATE>

RFC 5321 makes the from clause, the by clause, and the timestamp mandatory and everything else optional. In the wild, servers routinely skip pieces (even mandatory ones), reorder them, or stuff extra info into comments, so a parser must be tolerant. Let us go field by field.

The `from` clause: who said they were

from mail.example.com (mail.example.com. [203.0.113.42])

The unparenthesized hostname is what the sender claimed during the SMTP HELO or EHLO greeting. The parenthesized hostname is what the receiver got by doing a reverse DNS lookup (PTR) on the sender's IP. The square-bracketed value is the IP itself.

These three values can disagree. When they do, that disagreement is meaningful:

HELO hostname matches PTR → totally normal.
HELO hostname does not match PTR → the sender is either misconfigured or lying. Most spam filters score this negatively.
No PTR (unknown or just the IP in the parens) → the sending IP has no reverse DNS at all. Google's sender guidelines require valid forward and reverse DNS (PTR) records for all senders to Gmail, not only for bulk senders above 5,000 messages/day.

On a line written by a server you trust, the IP in brackets is the field your parser should trust above all the others. Hostnames are advisory; the IP is what that server's TCP stack actually saw.
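
A minimal sketch of splitting a from clause into its three values, assuming the HELO (PTR [IP]) shape shown above; other MTAs vary, and anything that does not match falls back to just the first word. The names FromClause, split_from_clause, and helo_matches_ptr are hypothetical, and the HELO/PTR comparison is a plain case-insensitive string match, not the DNS lookups a real spam filter performs.

```python
import ipaddress
import re
from typing import NamedTuple, Optional


class FromClause(NamedTuple):
    helo: Optional[str]  # name the client gave in HELO/EHLO (sender-supplied)
    ptr: Optional[str]   # name the receiver found by reverse DNS, if any
    ip: Optional[str]    # address the receiver's TCP stack actually saw


# Assumed shape: "HELO (PTR [IP])"; the PTR name may be missing
_FROM_RE = re.compile(
    r"^(?P<helo>\S+)?\s*\(\s*(?P<ptr>[^\s\[\]()]+)?\s*\[(?P<ip>[^\]]+)\]\s*\)"
)


def split_from_clause(value: str) -> FromClause:
    m = _FROM_RE.match(value.strip())
    if not m:
        # Unknown layout: keep only the first word as the claimed name
        words = value.split()
        return FromClause(words[0] if words else None, None, None)
    ip = m.group("ip")
    if ip.lower().startswith("ipv6:"):  # RFC 5321 IPv6 address-literal tag
        ip = ip[len("ipv6:"):]
    try:
        ip = str(ipaddress.ip_address(ip))
    except ValueError:
        ip = None  # not a literal address, so nothing to trust
    ptr = m.group("ptr")
    if ptr and ptr.lower() == "unknown":  # how some MTAs write "no PTR"
        ptr = None
    return FromClause(m.group("helo"), ptr.rstrip(".") if ptr else None, ip)


def helo_matches_ptr(fc: FromClause) -> Optional[bool]:
    """Case-insensitive name comparison; None when there is nothing to compare."""
    if fc.helo is None or fc.ptr is None:
        return None
    return fc.helo.rstrip(".").lower() == fc.ptr.lower()
```

A None from helo_matches_ptr is deliberately distinct from False: a missing PTR and a mismatched PTR are scored differently by most filters.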

The `by` clause: who received it

by mx.google.com

This is the receiving server. Combined with the timestamp, it gives you the unambiguous "what host received the handoff at this time." If you are debugging a delay, you reconstruct the trace by reading the by clauses bottom-to-top (oldest to newest) and looking at the gap between each timestamp.

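Some implementations add a comment naming the receiving software, or the receiver's own IP address (the Exchange lines in the full-stack example below show the latter):

by mx.google.com with ESMTPS (gsmtp google.com)

RFC 5321 allows comments in a Received: line, but their contents are not standardized. They are still useful: Google, Microsoft, and Yahoo all stuff identifying info there.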

The `with` clause: which protocol

with ESMTPS

The with token names the SMTP variant the handoff used. The common values:

Token	Meaning
`SMTP`	Plain old SMTP from RFC 821/5321. Rare today.
`ESMTP`	Extended SMTP, the modern baseline. Uses `EHLO` instead of `HELO`.
`ESMTPS`	ESMTP plus STARTTLS — the connection was encrypted.
`ESMTPA`	ESMTP plus SMTP AUTH — the sender authenticated to the server.
`ESMTPSA`	ESMTP plus STARTTLS plus SMTP AUTH. Common on submission ports (587).
`LMTP`	Local Mail Transfer Protocol, used by mailbox stores like Dovecot.
`BSMTP`	Batch SMTP, used by gateways that batch messages.

ESMTPS and ESMTPSA are what you want to see on hops crossing the public internet. Plain ESMTP or ESMTPA (no S) means the message rode an unencrypted link, which is increasingly rare and worth flagging.
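
As a rough sketch, the with token can be mapped to an encrypted or plaintext verdict. The function hop_encryption is hypothetical; the token sets follow the table above plus the LMTP variants registered alongside them in RFC 3848, and anything unrecognized is reported as unknown rather than guessed.

```python
from typing import Optional

# An S after ESMTP/LMTP means STARTTLS was used; an A means SMTP AUTH (RFC 3848)
ENCRYPTED = {"ESMTPS", "ESMTPSA", "LMTPS", "LMTPSA"}
PLAINTEXT = {"SMTP", "ESMTP", "ESMTPA", "LMTP", "LMTPA"}


def hop_encryption(with_value: Optional[str]) -> str:
    """Classify a with clause as 'encrypted', 'plaintext', or 'unknown'."""
    if not with_value or not with_value.split():
        return "unknown"
    token = with_value.split()[0].upper()  # ignore anything after the token
    if token in ENCRYPTED:
        return "encrypted"
    if token in PLAINTEXT:
        return "plaintext"
    # e.g. "Microsoft SMTP Server": fall back to the TLS comment instead
    return "unknown"
```

Hops inside one provider's network may legitimately carry plaintext tokens, so the check is most useful on hops that cross the public internet.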

The `id` clause: the receiver's queue ID

id abc123-def456ghi.78.2026.05.14.09.27.04

The id is whatever the receiving server wants to call this message internally, usually the unique key in its queue. If you ever have to ask a provider to investigate a delivery on your behalf, the id from its Received: line is usually the first thing its support team will ask for.

Queue IDs are not standardized; every MTA picks its own format. Postfix defaults to a short uppercase hexadecimal ID and switches to a longer mixed-case alphanumeric one when enable_long_queue_ids is on. Exim uses three groups of letters and digits separated by hyphens (1abcDE-000ABC-1g). Sendmail uses an alphanumeric ID, typically 14 characters, that encodes the enqueue time and the process ID. Microsoft Exchange is the odd one out: the id in its Received: lines (15.20.4567.12 in the full-stack example below) is the Exchange server build number, not a per-message queue ID. After a while you will recognize most MTAs by their ID format alone.

The `for` clause: the envelope recipient

for <alice@example.org>

The for clause names the SMTP RCPT TO, which is the envelope recipient. This is often different from the To: header. Mailing lists are the classic example: a message addressed To: list@example.org is delivered to each subscriber, and each subscriber's copy will have a for <subscriber@theirhost.com> line — even though the visible To: says the list address.

This is the field that can expose BCC recipients. Per-recipient copies are harmless: if you BCC boss@yourcompany.com, the boss's copy may carry for <boss@yourcompany.com>, which only tells the boss what they already know. The leak happens when a server handles several recipients in one transaction and writes a BCC address into the trace of a copy that the visible recipients also receive. That is why RFC 5321 allows at most one address in a for clause, and why many servers omit the clause or include it only when a transaction has a single recipient. Private MTAs often keep it.
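
To see the mailing-list effect from the recipient's side, a small sketch can compare the for address against the visible To: and Cc: headers. envelope_only is a hypothetical helper that assumes you already pulled the for value out of a Received: line; a True result only says the address was not visible, which a BCC, a list copy, and a forward all look like.

```python
from email import message_from_string
from email.message import Message
from email.utils import getaddresses, parseaddr


def envelope_only(msg: Message, envelope_rcpt: str) -> bool:
    """True when the envelope recipient is absent from To: and Cc:.

    A BCC, a mailing-list copy, and a forwarded copy all look like this
    from the recipient's side; the headers alone cannot tell them apart.
    """
    _, rcpt = parseaddr(envelope_rcpt)  # drops the <angle brackets>
    visible = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
    return rcpt.lower() not in {addr.lower() for _, addr in visible}


# Usage: a list post as one subscriber receives it
sample = message_from_string(
    "To: list@example.org\nCc: Bob <bob@example.org>\nSubject: hi\n\nbody\n"
)
print(envelope_only(sample, "<subscriber@theirhost.com>"))  # True
```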

The parenthesized TLS comment

(version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256)

The TLS comment is not standardized, but most MTAs that negotiate TLS write one. Its version, cipher, and bits values describe the TLS session used for this hop. If you are auditing for downgrade attacks or for hops that fell back to TLS 1.0, this comment is your evidence.

Postfix (with smtpd_tls_received_header = yes) records the same facts in a different, more detailed format:

(using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))

Microsoft uses yet another vocabulary. Treat the parenthesized comment as free-form text, pick out the key=value pairs you recognize, and do not assume a fixed schema.
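
For a downgrade audit you usually need only the TLS version, not the whole comment. A minimal sketch, assuming the two vocabularies shown above (version=TLS1_3 and using TLSv1.3); tls_version and weak_tls are hypothetical names, and comments in any other shape return None rather than a guess.

```python
import re
from typing import Optional, Tuple

# Matches Gmail/Microsoft "version=TLS1_3" and Postfix "using TLSv1.3"
_TLS_VERSION_RE = re.compile(r"(?:version=|using\s+)TLSv?(\d)[._](\d)", re.IGNORECASE)


def tls_version(comment: str) -> Optional[Tuple[int, int]]:
    """Return the negotiated TLS version as (major, minor), or None."""
    m = _TLS_VERSION_RE.search(comment)
    return (int(m.group(1)), int(m.group(2))) if m else None


def weak_tls(comment: str) -> bool:
    """True when the hop negotiated a version older than TLS 1.2."""
    version = tls_version(comment)
    return version is not None and version < (1, 2)
```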

The timestamp

Thu, 14 May 2026 09:27:05 -0700 (PDT)

RFC 5322 section 3.3 defines this format precisely. The day-of-week is optional. The timezone offset is required. The parenthesized abbreviation (PDT here) is informational and frequently lies; trust the numeric offset.

For path reconstruction, parse timestamps to UTC. The gap between consecutive Received: timestamps is the time spent in transit between those two hops. A 30-minute gap on a hop that normally takes 200ms is your delay.
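
A short sketch of that normalization with Python's email.utils.parsedate_to_datetime, which honors the numeric offset and ignores a trailing (PDT)-style comment. The to_utc helper is hypothetical, and the two sample dates are modeled on the examples in this post.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def to_utc(date_text: str) -> datetime:
    """Parse an RFC 5322 date-time and convert it to UTC.

    The numeric offset decides the result; a trailing "(PDT)" comment is ignored.
    """
    dt = parsedate_to_datetime(date_text)
    if dt.tzinfo is None:
        # "-0000" means UTC with the local zone unknown (RFC 5322 section 3.3)
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


gmail_mx = to_utc("Thu, 14 May 2026 09:27:04 -0700 (PDT)")
exchange = to_utc("Thu, 14 May 2026 16:27:01 +0000")
print(gmail_mx.isoformat())                   # 2026-05-14T16:27:04+00:00
print((gmail_mx - exchange).total_seconds())  # 3.0
```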

Reading the full stack

Here is a complete header trace for a message sent from Outlook to a Gmail user, with the path through a corporate forwarder:

Received: from mail.gmail.com (mail.gmail.com. [142.250.190.5])
        by mx.recipient.com with ESMTPS id ee123.456
        for <user@recipient.com>
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Thu, 14 May 2026 09:27:30 -0700 (PDT)
Received: from NAM12-MW2-obe.outbound.protection.outlook.com
        ([52.100.158.42])
        by mx.gmail.com with ESMTPS id m7-20020a05600c138900b003eaf12345
        for <user@gmail.com>
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Thu, 14 May 2026 09:27:04 -0700 (PDT)
Received: from MW4PR12MB5678.namprd12.prod.outlook.com (2603:10b6:303:178::6)
        by MN2PR12MB1234.namprd12.prod.outlook.com (2603:10b6:208:18a::11)
        with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384)
        id 15.20.4567.12; Thu, 14 May 2026 16:27:01 +0000
Received: from MW4PR12MB5678.namprd12.prod.outlook.com ([fe80::a1b2:c3d4:5678])
        by MW4PR12MB5678.namprd12.prod.outlook.com ([fe80::a1b2:c3d4:5678%9])
        with Microsoft SMTP Server id 15.20.4567.12 via Mapi; Thu, 14 May 2026 16:27:00 +0000

Reading bottom to top in chronological order:

16:27:00 UTC: the sender's Outlook client deposits the message on their Exchange server MW4PR12MB5678 over MAPI (not SMTP, hence the via Mapi token).
16:27:01 UTC: Exchange relays internally to MN2PR12MB1234 over TLS 1.2.
16:27:04 UTC (= 09:27:04 PDT): Microsoft's outbound relay NAM12-MW2-obe.outbound.protection.outlook.com hands off to Gmail's MX mx.gmail.com over TLS 1.3.
16:27:30 UTC (= 09:27:30 PDT): Gmail forwards the message to the corporate forwarder mx.recipient.com. Note the for <user@recipient.com>: that is the forwarding address, which Gmail used as the envelope recipient for this hop.

The whole trace took 30 seconds. The handoff from Microsoft to Gmail was quick: 3 seconds from Exchange's internal relay (16:27:01) to Gmail's MX accepting the message (16:27:04). The biggest gap, 26 seconds, sits between Gmail's MX accepting the message and mx.recipient.com receiving it, which covers whatever spam scoring and processing Gmail did before applying the forward. That final hop is what happens when a Gmail mailbox forwards to an external address; seeing it in the trace explains why the message landed at mx.recipient.com instead of staying in a Gmail mailbox.
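
The gap arithmetic above is easy to script. This sketch copies the date parts of the four Received: lines (the hop labels in the comments are mine) and computes the seconds between consecutive hops, oldest first; hop_gaps is a hypothetical helper, not a library function, and it assumes every date carries a numeric offset.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime

# Date parts of the four Received: lines above, top (newest) to bottom
STACK_DATES = [
    "Thu, 14 May 2026 09:27:30 -0700 (PDT)",  # by mx.recipient.com
    "Thu, 14 May 2026 09:27:04 -0700 (PDT)",  # by mx.gmail.com
    "Thu, 14 May 2026 16:27:01 +0000",        # by MN2PR12MB1234
    "Thu, 14 May 2026 16:27:00 +0000",        # by MW4PR12MB5678 (via MAPI)
]


def hop_gaps(dates_top_down: list) -> list:
    """Seconds between consecutive hops, oldest hop first."""
    times = [
        parsedate_to_datetime(d).astimezone(timezone.utc)
        for d in reversed(dates_top_down)  # reverse to chronological order
    ]
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]


print(hop_gaps(STACK_DATES))  # [1.0, 3.0, 26.0]
```

A negative value in the output would mean a later hop's clock is behind an earlier one's, which is a clock problem rather than a delivery problem.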

Building a parser: a working approach

Here is a minimal, defensive parser pattern that handles real-world Received: lines. The approach: treat the line as a sequence of <token> <value> clauses, where <value> may extend until the next known token or the final ; separator.

import re
from email.utils import parsedate_to_datetime

# Clause keywords RFC 5321 defines for a Received: trace line
KNOWN_TOKENS = {"from", "by", "via", "with", "id", "for"}


def parse_received(line: str) -> dict:
    """Split one Received: header into clauses plus a parsed timestamp.

    Deliberately naive: a keyword inside a comment (for example "(... by ...)")
    still starts a new clause. See the caveats below.
    """
    # Strip the header name if present
    if line.lower().startswith("received:"):
        line = line[len("received:"):].strip()

    # The date-time is everything after the last semicolon
    body, sep, date_part = line.rpartition(";")
    timestamp = None
    if sep:
        try:
            timestamp = parsedate_to_datetime(date_part.strip())
        except (TypeError, ValueError):  # older Pythons raise TypeError
            timestamp = None  # unparseable date; keep the clauses anyway
    else:
        body = line  # no semicolon: rpartition left everything in date_part

    # Collapse runs of whitespace (including folding); comments stay inline
    body = re.sub(r"\s+", " ", body).strip()

    # Walk the words left to right; a known keyword starts a new clause
    result = {"timestamp": timestamp}
    current_token, current_value = None, []
    for word in body.split(" "):
        keyword = word.lower()
        if keyword in KNOWN_TOKENS:
            if current_token:
                result[current_token] = " ".join(current_value)
            current_token, current_value = keyword, []
        else:
            current_value.append(word)
    if current_token:
        result[current_token] = " ".join(current_value)

    return result

What this parser does not handle and a production version must:

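Folded headers. RFC 5322 lets a header wrap across multiple lines, each continuation line starting with whitespace. The whitespace collapse above copes with a folded value only if you hand it the whole header; when reading raw messages, unfold first (remove each CRLF that is followed by a space or tab) so continuation lines stay attached to their header.
Nested comments. RFC 5322 comments (in parentheses) can nest, and they can contain words like from or by that this parser would mistake for clause keywords. Walk parens with a depth counter, not a regex.
Quoted strings. for <"weird recipient"@example.org> is legal, because the local part of an address may be a quoted string containing spaces. Treat double-quoted strings as a single token.
IPv6 addresses in brackets. [2001:db8::1] looks like a normal bracketed IP but contains colons, and RFC 5321 address literals add an IPv6: tag ([IPv6:2001:db8::1]). Do not split on colons.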
Timestamp variants. Some old systems write timestamps without commas, with timezone names instead of offsets, or with two-digit years. Python's email.utils.parsedate_to_datetime handles most of these; for the survivors, fall back to dateutil.
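
The depth-counter and quoted-string advice above can be sketched as a small tokenizer that keeps comments, quoted strings, address literals, and <paths> intact, so a by inside a comment can no longer start a clause. tokenize_received and group_clauses are hypothetical names; the sketch assumes the header is already unfolded and the date has already been split off at the last semicolon.

```python
from typing import Dict, List, Optional

CLAUSES = {"from", "by", "via", "with", "id", "for"}


def tokenize_received(body: str) -> List[str]:
    """Split on whitespace, but keep (nested comments), "quoted strings",
    [address literals], and <paths> as single tokens. Backslash escapes
    (RFC 5322 quoted-pair) are honored inside comments and quoted strings."""
    tokens: List[str] = []
    current: List[str] = []
    depth = 0          # parenthesis nesting level
    in_quote = False
    closer = ""        # "]" or ">" while inside a literal or a path
    escaped = False
    for ch in body:
        if escaped:                       # character after a backslash
            escaped = False
        elif ch == "\\" and (depth or in_quote):
            escaped = True
        elif in_quote:
            in_quote = ch != '"'
        elif depth:
            depth += {"(": 1, ")": -1}.get(ch, 0)
        elif closer:
            if ch == closer:
                closer = ""
        elif ch == '"':
            in_quote = True
        elif ch == "(":
            depth = 1
        elif ch in "[<":
            closer = "]" if ch == "[" else ">"
        elif ch.isspace():                # a separator outside any construct
            if current:
                tokens.append("".join(current))
                current = []
            continue
        current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


def group_clauses(tokens: List[str]) -> Dict[str, str]:
    """Attach each token to the most recent clause keyword. A comment is one
    token, so a "by" inside it cannot start a new clause."""
    clauses: Dict[str, List[str]] = {}
    key: Optional[str] = None
    for tok in tokens:
        if tok.lower() in CLAUSES:
            key = tok.lower()
            clauses.setdefault(key, [])
        elif key is not None:
            clauses[key].append(tok)
    return {k: " ".join(v) for k, v in clauses.items()}
```

Calling group_clauses(tokenize_received(body)) in place of the word loop in parse_received leaves the rest of that function unchanged.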

Once you have a clean structured record per line, joining them in array order gives you the full trace. Reverse the array to read chronologically.
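
For the reading step, Python's standard email package does the unfolding. A minimal sketch, assuming the raw message is available as bytes: get_all("Received") returns the values in file order (newest hop first), so the helper reverses them. received_chronological is a hypothetical name, and RAW holds two lines from the full-stack example with their TLS comments trimmed.

```python
from email import message_from_bytes
from email.policy import default

# Two lines from the full-stack example (TLS comments trimmed) plus a stub body
RAW = b"""Received: from mail.gmail.com (mail.gmail.com. [142.250.190.5])
        by mx.recipient.com with ESMTPS id ee123.456
        for <user@recipient.com>;
        Thu, 14 May 2026 09:27:30 -0700 (PDT)
Received: from NAM12-MW2-obe.outbound.protection.outlook.com
        ([52.100.158.42])
        by mx.gmail.com with ESMTPS id m7-20020a05600c138900b003eaf12345
        for <user@gmail.com>;
        Thu, 14 May 2026 09:27:04 -0700 (PDT)
Subject: example

body
"""


def received_chronological(raw: bytes) -> list:
    """Return each Received: value on one line, oldest hop first."""
    msg = message_from_bytes(raw, policy=default)
    values = msg.get_all("Received") or []  # file order: newest hop first
    # Collapse the indentation left behind by the continuation lines
    return [" ".join(str(v).split()) for v in reversed(values)]


for hop in received_chronological(RAW):
    print(hop)
```

With the default policy the folding line breaks are already removed, but the continuation indentation survives, hence the split and join.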

Authentication-Results: the receiver's verdict

The Received: stack tells you the path. The Authentication-Results: header tells you what the receiver decided about authentication. It is technically separate from the Received: stack but is added by the same hop, so you usually want to read them together.

A typical Authentication-Results: line:

Authentication-Results: mx.google.com;
       dkim=pass header.i=@example.com header.s=selector1 header.b=Abc123de;
       spf=pass (google.com: domain of bounces@example.com designates 198.51.100.1 as
         permitted sender) smtp.mailfrom=bounces@example.com;
       dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=example.com

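The format is defined in RFC 8601. The value is a semicolon-separated list: the first element is the authserv-id (the name of the server that ran the checks), and each remaining element is one method result (dkim=, spf=, dmarc=, arc=, bimi=, etc.) followed by its ptype.property=value pairs. Remove comments before splitting on ;, since a comment may itself contain a semicolon.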

For DMARC, the practical question is always: did dmarc=pass, and does header.from match what your application thinks the sender was?

Tools and shortcuts

If you do not want to write your own parser:

Google's Messageheader tool takes a pasted raw header and produces a per-hop summary with timestamps and delays.
Microsoft's Header Analyzer does the same with more detail on Microsoft-specific fields.
Mailneo's email header analyzer parses the full stack including authentication results and explains each field in plain English.
For programmatic use, Python's email module (specifically email.parser.BytesParser) handles RFC 5322 parsing and folded headers correctly. Node's mailparser package does the same.

Common parsing pitfalls

A few things I have personally been burned by:

Reverse order matters. The first Received: line is the last hop. Off-by-one errors here will make you trust the wrong server.

The from hostname is sender-supplied. Anyone running nc smtp.example.com 25 and typing HELO whatever.com will have whatever.com show up in the from clause of the receiver's Received: line. Only the bracketed IP is trustworthy, and only on lines written by servers you trust.
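
Putting that trust rule into code: walk the stack from the top, stay on lines whose by host is one of your own servers, and stop at the first from IP that is not one of your own relays. That IP is the one to report; everything below it is sender-controlled. This is a sketch under stated assumptions: each hop is a dict with hypothetical keys by and ip (the bracketed IP from that line's from clause), the sample stack is invented for illustration, and the trusted host and IP sets are yours to supply.

```python
from typing import Iterable, List, Optional

# Hypothetical parsed stack, newest hop first: "by" is the receiving host and
# "ip" is the bracketed address from that line's from clause.
HOPS = [
    {"by": "mx2.internal.example", "ip": "10.0.0.5"},   # your relay -> your relay
    {"by": "mx1.example.org", "ip": "198.51.100.7"},    # outside -> your MX
    {"by": "relay.forged.example", "ip": "192.0.2.1"},  # written by the sender
]


def perimeter_ip(hops: List[dict], trusted_hosts: Iterable[str],
                 trusted_ips: Iterable[str]) -> Optional[str]:
    """Return the IP that handed the message to your perimeter, or None."""
    hosts = {h.lower() for h in trusted_hosts}
    relays = set(trusted_ips)
    for hop in hops:  # newest first
        if (hop.get("by") or "").lower() not in hosts:
            return None  # ran out of lines written by your own servers
        ip = hop.get("ip")
        if ip not in relays:
            return ip  # first handoff from a machine you do not run
    return None


print(perimeter_ip(HOPS, {"mx1.example.org", "mx2.internal.example"}, {"10.0.0.5"}))  # 198.51.100.7
```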

Comments can contain anything. Including parentheses, colons, and even the literal strings from or by. A regex-based parser will get this wrong on adversarial inputs.

Some MTAs do not add Received: at all. Internal handoffs between processes on the same machine sometimes skip the trace header. You may see a gap between the user's MUA and the first external MX with no internal hops named.

Time skew is real. A Received: timestamp is whatever the receiving host's clock said at the moment. If a hop carries an earlier timestamp than the hop below it, at least one clock is wrong, and a gap of a few seconds between hosts run by different operators may be clock drift rather than transit time.

Key takeaways

A Received: header is prepended at every hop; read the stack top-to-bottom for newest-to-oldest, or bottom-to-top for chronological order.
On lines written by servers you trust, rely on the bracketed IP and the timestamp (allowing for clock skew); treat the from hostname as sender-supplied and the parenthesized comments as unstandardized free text that can be wrong.
Combine the Received: stack with the Authentication-Results: header to get both the path and the receiver's verdict on SPF, DKIM, and DMARC.
For path reconstruction, parse timestamps to UTC and look at the gaps between consecutive hops; that is where delays live.
A defensive parser must handle folded headers, nested comments, quoted strings, IPv6 bracketed addresses, and non-standard date formats. Off-the-shelf libraries (python email, mailparser in Node) save weeks of edge-case work.

Frequently asked questions

Can a sender remove or modify Received lines?

A sender controls everything in the message up to the moment it hands the message off: it can forge Received: lines, and a relay can strip or rewrite the lines added by earlier hops (those below its own line). What nobody can alter is a line added later by a downstream server (those above their own line in the stack). For inbound mail, the top-most Received: line, the one added by your own MX, is the one you can fully trust.

Why are there sometimes Received lines without `for` clauses?

The for clause exposes the envelope recipient, which can leak BCC addresses. RFC 5321 allows at most one address in the clause, and many servers omit it, or include it only when a transaction has a single recipient. Internal hops on a private mail system often keep it because trace fidelity matters more than recipient privacy inside the perimeter.

What is the difference between `Return-Path:` and the `from` clause of a Received line?

Return-Path: is a single header added by the final delivery agent that contains the envelope sender (the address given in the MAIL FROM command). The from clause of a Received: line records the host that connected at this hop: its HELO/EHLO name, its reverse-DNS name, and its IP. They answer different questions: Return-Path: names the address bounces go to and is one value for the whole message, while Received: from identifies a machine and changes at every hop.

My received line says `with ESMTPS` but `version=TLS1_0`. Is that a problem?

Yes. RFC 8996 formally deprecated TLS 1.0 and 1.1 in 2021, and most modern security policies treat a hop that negotiated them as inadequately protected. If a hop in your trace fell back to TLS 1.0 or below, audit the sending and receiving MTAs for outdated configuration. Modern MTAs should negotiate TLS 1.2 at minimum, with TLS 1.3 preferred.

How do I tell which hop introduced a delay?

Parse each Received: timestamp to UTC, then walk the stack in chronological order (bottom to top). The interval between two consecutive timestamps is the time the message spent in transit (plus any queueing) between those two hops. A "normal" inter-MTA hop on the modern internet takes well under a second; a gap of more than a few seconds usually points to a queue, a content scan, or a reputation-based deferral, although clock skew between hosts can also create or hide a gap.

Sohail Hussain

My received line says `with ESMTPS` but `version=TLS1_0`. Is that a problem?