php – Can't understand why Zend_Mail::addHeader() strips newlines

Posted by: admin July 12, 2020

Questions:

(Since this is my first SO question, let me just say I hope it’s not too Zend-specific. As far as I can tell this shouldn’t be a problem. Although I could have posted it in a Zend-specific forum, I feel like I’m at least as likely to get a good answer here, especially since the answer might involve MIME-related issues that transcend Zend Framework. I’m basically trying to understand whether the issue I’m facing should be considered a ZF bug, or if I’m misunderstanding something or misusing it.)

I’ve been using Zend_Mail to build up a MIME message that gets sent through SendGrid, an email distribution service. Their platform allows you to send emails through their SMTP server, but gives added features when you use a special header (X-SMTPAPI) whose value is a JSON-encoded string of proprietary parameters, which can get quite long.

Eventually, the header I was passing got too long (I think >1000 chars), and I got errors. I was confused because I knew that it was getting passed through PHP’s native wordwrap() function before I passed the value to Zend_Mail::addHeader(), so I thought line length should never be a problem.

It turns out that addHeader() strips newlines very deliberately, and with no particular explanation by way of comments.

// In Zend_Mail::addHeader()
$value = $this->_filterOther($value);


// In Zend_Mail::_filterOther()
$rule = array("\r" => '',
              "\n" => '',
              "\t" => '',
);
return strtr($data, $rule);
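To make the effect concrete, here is a minimal standalone sketch of what that rule does to a value that has already been through wordwrap(). The function filterOther() is my own copy of the rule quoted above, not the ZF method itself, and the JSON payload is a made-up stand-in for a long X-SMTPAPI value.

```php
<?php
// Standalone copy of the replacement rule quoted above (hypothetical helper,
// not the real Zend_Mail::_filterOther()).
function filterOther($data)
{
    $rule = array("\r" => '', "\n" => '', "\t" => '');
    return strtr($data, $rule);
}

// Made-up payload: a long JSON string with no whitespace in it.
$json     = json_encode(array('category' => str_repeat('x', 1500)));
$wrapped  = wordwrap($json, 72, "\n", true); // hard-wrap at 72 characters
$filtered = filterOther($wrapped);

var_dump(strpos($wrapped, "\n") !== false);  // does the wrapped value contain line breaks?
var_dump(strpos($filtered, "\n") !== false); // are any left after filtering?
var_dump(strlen($filtered));                 // length of the resulting single line
```

In this sketch the filtered string is identical to the unwrapped JSON, so the wrapping done beforehand has no effect on what ends up in the header.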

Ok, this seemed reasonable at first — maybe ZF wants full control of formatting and line-wrapping. The next method called in Zend_Mail::addHeader() is

$value = $this->_encodeHeader($value);

This method encodes the value (either quoted-printable or base64 as appropriate) and chunks it into lines of appropriate length, but only if it contains “non-printable characters”, as determined by Zend_Mime::isPrintable($value).

Looking into that method, newlines (\n) are indeed considered non-printable characters! So if only they hadn’t been stripped out of the string in the previous method call, the long header would get encoded as QP and chunked into 72-char lines, and everything would work fine. In fact, I did a test where I commented out the call to _filterOther(), and the long header gets encoded and goes through with no problem. But now I’ve just made a careless hack to ZF without really understanding the purpose behind the line I removed, so this can’t be a long-term solution.

My medium-term solution has been to extend Zend_Mail and create a new method, addHeaderForceEncode(), which will always encode the value of the header, and thus always chunk it into short lines. But I’m still not satisfied because I don’t understand why that _filterOther() call was necessary in the first place — maybe I shouldn’t be working around it at all.
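For illustration only, the "always encode, always chunk" idea can be sketched without Zend_Mail at all. The helper below is hypothetical (it is not the addHeaderForceEncode() method mentioned above and not ZF's _encodeHeader()); it uses base64 encoded-words as defined in RFC 2047 and assumes the value is ASCII-only, which holds for default json_encode() output. Whether a given receiver decodes encoded-words in a custom header is something to verify separately.

```php
<?php
// Hypothetical helper, not part of Zend_Mail: always emit the value as
// base64 encoded-words (RFC 2047), one encoded-word per folded line.
function encodeHeaderValueBase64($value, $charset = 'UTF-8', $bytesPerChunk = 36)
{
    $words = array();
    // str_split() cuts on byte boundaries, so this assumes an ASCII-only value.
    foreach (str_split($value, $bytesPerChunk) as $chunk) {
        $words[] = '=?' . $charset . '?B?' . base64_encode($chunk) . '?=';
    }
    // CRLF followed by one space folds the header between encoded-words.
    return implode("\r\n ", $words);
}

// Made-up payload standing in for a long X-SMTPAPI value.
$json   = json_encode(array('category' => str_repeat('x', 1500)));
$header = 'X-SMTPAPI: ' . encodeHeaderValueBase64($json);

// Longest physical line in the folded header.
$longest = max(array_map('strlen', explode("\r\n", $header)));
echo $longest, "\n";
```

The chunk size of 36 bytes is an arbitrary choice that keeps each encoded-word, plus the field name on the first line, comfortably short.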

Can anyone explain why this newline-stripping behaviour exists? It seems to lead inevitably to situations where a header gets too long if it doesn’t contain any “non-printable characters” other than newlines.

I’ve done a bunch of different searches on this subject and looked through some ZF bug reports, but haven’t seen anyone talking about this. Surprisingly it seems to be a really obscure issue. FYI I’m working with ZF 1.11.11.


Update: In case anyone wants to follow the ZF issue I opened about this, here it is: Zend_Mail::addHeader() UNfolds long headers, then throws exception

Answers:

You’re probably running into a few things. Per RFC 2821, text lines in SMTP can’t exceed 1000 characters:

text line

The maximum total length of a text line including the <CRLF> is
1000 characters (not counting the leading dot duplicated for
transparency). This number may be increased by the use of SMTP
Service Extensions.

A header value can’t contain bare newlines, so that’s probably why Zend is stripping them. The only line breaks allowed inside a header are folds: for long headers, it’s common to insert a line break (CRLF in SMTP) followed by whitespace such as a tab to “wrap” them.

Excerpt from RFC 822:

Each header field can be viewed as a single, logical line of
ASCII characters, comprising a field-name and a field-body.
For convenience, the field-body portion of this conceptual
entity can be split into a multiple-line representation; this
is called “folding”. The general rule is that wherever there
may be linear-white-space (NOT simply LWSP-chars), a CRLF
immediately followed by AT LEAST one LWSP-char may instead be
inserted.

I would say that the _encodeHeader() function should possibly look at line length, and if the header is longer than some magic value, do the “wrap and tab” to have it span multiple lines.
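A minimal sketch of that kind of folding, assuming the value contains spaces at points where a break is harmless. The function foldHeader(), the 76-character limit, and the address list are all made up for illustration; none of this is ZF code. Instead of inserting a tab, the sketch puts the CRLF in front of a space that is already there, so removing the CRLFs restores the original value exactly.

```php
<?php
// Hypothetical helper: fold a header at existing spaces so that no physical
// line exceeds $maxLen characters (a single token longer than that is left as is).
function foldHeader($name, $value, $maxLen = 76)
{
    $lines = array();
    $line  = $name . ':';
    foreach (explode(' ', $value) as $token) {
        // +1 accounts for the space that joins $line and $token.
        if (strlen($line) + 1 + strlen($token) > $maxLen) {
            $lines[] = $line;
            $line    = ' ' . $token; // continuation line starts with whitespace
        } else {
            $line .= ' ' . $token;
        }
    }
    $lines[] = $line;
    return implode("\r\n", $lines);
}

// Made-up value: JSON-like text with a space after every comma and colon.
$addresses = array();
for ($i = 1; $i <= 100; $i++) {
    $addresses[] = 'user' . $i . '@example.com';
}
$value  = '{"to": ["' . implode('", "', $addresses) . '"]}';
$folded = foldHeader('X-SMTPAPI', $value);

// Longest physical line, and the header with the folds removed again.
$longest  = max(array_map('strlen', explode("\r\n", $folded)));
$unfolded = str_replace("\r\n", '', $folded);
echo $longest, "\n";
```

This only helps when the value has whitespace to break on. A JSON string produced by json_encode() with default options has none between tokens, so it would need spaces added in safe places, or encoding, before it could be wrapped this way.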