
PHP – Gmail-like Email Content Separation

Posted by: admin July 12, 2020

Question:

I am building a ticket system, but I don’t want to include one of those messages:

************************* REPLY ABOVE THIS LINE ***********************

Gmail tends to do a pretty good job with their “quoted text”. Does anyone know of a premade script or method for doing this easily? I am trying to pipe users’ replies back into our system.

Thanks,
Kerry

Answer:

I think you need something like my full array diff function:

 /** 
        Full Array Diff implemented in pure php, written from scratch. 
        Copyright (C) 2011 Andres Morales <[email protected]> 

        This program is free software; you can redistribute it and/or 
        modify it under the terms of the GNU General Public License 
        as published by the Free Software Foundation; either version 2 
        of the License, or (at your option) any later version. 

        This program is distributed in the hope that it will be useful, 
        but WITHOUT ANY WARRANTY; without even the implied warranty of 
        MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the 
        GNU General Public License for more details. 

        You should have received a copy of the GNU General Public License 
        along with this program; if not, write to the Free Software 
        Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA. 

        http://www.gnu.org/licenses/gpl.html 

        About:
        I needed a function to compare an email and its response, but array_diff()
        does not cover my expectations. So I reimplemented a full array diff function.
        You can use it directly in your code and adapt it to your needs.

        Contact:
        [email protected] <Andres Morales> 
**/ 
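function farray_diff($array1, $array2){
        // Reindex so both arrays have contiguous numeric offsets starting at 0
        $array1 = array_values($array1);
        $array2 = array_values($array2);
        $out = array();
        $n1 = count($array1);
        $n2 = count($array2);

        $i = 0;
        $j = 0;

        // Walk both arrays until each one is exhausted
        while($i < $n1 || $j < $n2){
            if($i >= $n1){
                // Old array exhausted: whatever is left in the new array was added
                array_push($out, array('o' => '', 'n' => $array2[$j]));
            }
            elseif($j >= $n2){
                // New array exhausted: whatever is left in the old array was removed
                array_push($out, array('o' => $array1[$i], 'n' => ''));
            }
            elseif($array1[$i] === $array2[$j]){
                // Same item on both sides: emit it as a plain value
                array_push($out, $array1[$i]);
            }
            elseif(in_array($array1[$i], array_slice($array2, $j), true)){
                // The old item shows up later in the new array: everything before it is new
                for($k = $j; $k < $n2; $k++){
                    if($array1[$i] === $array2[$k]){
                        array_push($out, $array2[$k]);
                        $j = $k;
                        break;
                    }
                    array_push($out, array('o' => '', 'n' => $array2[$k]));
                }
            }
            elseif(in_array($array2[$j], array_slice($array1, $i), true)){
                // The new item shows up later in the old array: everything before it was removed
                for($k = $i; $k < $n1; $k++){
                    if($array2[$j] === $array1[$k]){
                        array_push($out, $array1[$k]);
                        $i = $k;
                        break;
                    }
                    array_push($out, array('o' => $array1[$k], 'n' => ''));
                }
            }
            else{
                // Neither item occurs later on the other side: treat it as a replacement
                array_push($out, array('o' => $array1[$i], 'n' => $array2[$j]));
            }
            $i++; $j++;
        }
        return $out;
    }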

So, you can simply use it as in the following example:

$str1 = "This is a simple text that can you reply, so can you do it?";
$str2 = "I response in your text: This is a simple text (no so simple) that can be replied, so can you do it? Yes, I can!";
// Printing the full array diff of single space exploded strings
print_r(farray_diff(explode(' ', $str1), explode(' ', $str2)));

Returns:

Array
(
    [0] => Array
        (
            [o] => 
            [n] => I
        )

    [1] => Array
        (
            [o] => 
            [n] => response
        )

    [2] => Array
        (
            [o] => 
            [n] => in
        )

    [3] => Array
        (
            [o] => 
            [n] => your
        )

    [4] => Array
        (
            [o] => 
            [n] => text:
        )

    [5] => This
    [6] => is
    [7] => a
    [8] => simple
    [9] => text
    [10] => Array
        (
            [o] => 
            [n] => (no
        )

    [11] => Array
        (
            [o] => 
            [n] => so
        )

    [12] => Array
        (
            [o] => 
            [n] => simple)
        )

    [13] => that
    [14] => can
    [15] => Array
        (
            [o] => 
            [n] => be
        )

    [16] => Array
        (
            [o] => 
            [n] => replied,
        )

    [17] => Array
        (
            [o] => 
            [n] => so
        )

    [18] => Array
        (
            [o] => 
            [n] => can
        )

    [19] => you
    [20] => Array
        (
            [o] => reply,
            [n] => 
        )

    [21] => Array
        (
            [o] => so
            [n] => 
        )

    [22] => Array
        (
            [o] => can
            [n] => 
        )

    [23] => Array
        (
            [o] => you
            [n] => 
        )

    [24] => do
    [25] => it?
    [26] => Array
        (
            [o] => 
            [n] => Yes,
        )

    [27] => Array
        (
            [o] => 
            [n] => I
        )

    [28] => Array
        (
            [o] => 
            [n] => can!
        )

)

The result is like a simple diff, but instead of “+” and “-” markers it uses the array keys “o” (for old) and “n” (for new), which makes it easier to parse afterwards. You can use the following function to format the result:

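function format_response($diff_arr){
    $new = false;
    $old_new = false; // state of the previous item; the span opened below is "old"
    echo '<span class="old">';
    foreach($diff_arr as $item)
    {
        $content = '';
        if (!is_array($item)){
            // Unchanged item: part of the quoted (old) text
            $new = false;
            $content = $item;
        }
        elseif (empty($item['o']) && !empty($item['n'])){
            // Added item: part of the reply (new) text
            $new = true;
            $content = $item['n'];
        }
        // Any other item (e.g. one that exists only in the old text) is skipped

        // Close the current span and open the other one when the state flips
        if($old_new != $new){
            if($new)
                echo '</span><span class="new">';
            else
                echo '</span><span class="old">';
        }

        // Escape the text so that mail content cannot inject markup
        echo htmlspecialchars((string) $content) . (!empty($content)?' ':'');

        $old_new = $new;
    }
    echo '</span>';
}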

So, instead of a plain “print_r”, you can format the array with:

format_response(farray_diff(explode(' ', $str1), explode(' ', $str2)));

And you get (for the example above) something like this:

<span class="old"></span><span class="new">I response in your text: </span><span class="old">This is a simple text </span><span class="new">(no so simple) </span><span class="old">that can </span><span class="new">be replied, so can </span><span class="old">you do it? </span><span class="new">Yes, I can! </span>

Obviously, to display the result correctly you first need to define the CSS classes “old” and “new” with some visible difference, e.g. a different foreground color:

<style>.old{color: #808080;}.new{color:#000000}</style>

for HTML emails, or you can modify the format_response function to handle plain-text emails.
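
For plain-text emails, one option is to skip the spans and keep only the new words. The following is a minimal sketch, assuming the diff array has the shape shown in the print_r output above. The name extract_new_text is hypothetical, and treating every item with a non-empty “n” key as reply text is an assumption of this sketch, not part of the original functions:

```php
<?php
// Hypothetical helper: joins only the "new" items of a farray_diff() result,
// i.e. the words that appear in the reply but not in the original message.
// Each uninterrupted run of new words becomes one line of the output.
function extract_new_text(array $diff_arr): string
{
    $parts = [];
    $current = [];
    foreach ($diff_arr as $item) {
        // Assumption: any item with a non-empty 'n' key counts as reply text.
        $isNew = is_array($item) && ($item['n'] ?? '') !== '';
        if ($isNew) {
            $current[] = $item['n'];
        } elseif ($current) {
            // An unchanged or removed item ends the current run of new words.
            $parts[] = implode(' ', $current);
            $current = [];
        }
    }
    if ($current) {
        $parts[] = implode(' ', $current);
    }
    return implode("\n", $parts);
}

// Hand-written sample in the same shape as the diff output.
$diff = [
    ['o' => '', 'n' => 'I'],
    ['o' => '', 'n' => 'agree.'],
    'This',
    'is',
    ['o' => '', 'n' => '(not)'],
    'simple',
    ['o' => 'old,', 'n' => ''],
    ['o' => '', 'n' => 'Yes!'],
];
echo extract_new_text($diff), "\n";
```

Because the diff works on single words, quoted text that the sender edited in place will show up as short fragments, so the result probably needs some cleanup before it is stored in a ticket.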

NOTE: As you can see, my functions are free software released under the GNU General Public License.

Hope it helps you.

Answer:

You could always use HTML emails and put some sort of separator in HTML comments:

<!-- **********SEPARATOR********** -->

and fall back to a simple

**********SEPARATOR**********

in case the user’s mail client doesn’t support HTML emails. You then simply look for the latter in the emails you’re parsing, and it should work fine in both cases (plain text and HTML).
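
The parsing side of this could be sketched as follows. It assumes that the separator string is exactly the one above and that everything before its first occurrence is the reply. The function name extract_reply_above_separator is hypothetical. Removing a leading “<!--” or quote markers such as “> ” directly before the separator is an extra assumption about how mail clients quote the original message:

```php
<?php
// Hypothetical helper: returns the part of an email body that precedes the separator.
function extract_reply_above_separator(
    string $body,
    string $separator = '**********SEPARATOR**********'
): string {
    $pos = strpos($body, $separator);
    if ($pos === false) {
        // No separator found: keep the whole body rather than guess.
        return trim($body);
    }
    $head = substr($body, 0, $pos);
    // Drop an HTML comment opener that sits directly before the separator.
    $head = preg_replace('/<!--\s*\z/', '', $head);
    // Drop quote markers ("> ") if they are the only content of the last line.
    $head = preg_replace('/^[> \t]+\z/m', '', $head);
    return trim($head);
}

$plain = "Thanks, that fixed it.\n\n> **********SEPARATOR**********\n> Original ticket text";
echo extract_reply_above_separator($plain), "\n"; // prints: Thanks, that fixed it.
```

If the separator is missing (for example because the user deleted the quoted text), the sketch returns the whole body, so a fallback such as heading detection may still be needed.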

Answer:

It seems like Gmail is doing some elaborate regex matching on the popular “quoted text” headings, e.g.

-----Original Message-----
From: …
Sent: …
To: …
Subject: …

OR

On <date>, John Smith <email> wrote:

OR

________________
From: …
Sent: …
To: …
Subject: …

And they don’t actually recognize all of them well.
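
A rough sketch of that kind of matching in PHP, covering only the three heading styles listed above. The patterns are guesses for illustration, not what Gmail actually uses, and the function name strip_quoted_text is hypothetical:

```php
<?php
// Hypothetical helper: cuts a plain-text email body at the first line that
// looks like one of the three quoted-text headings listed above.
function strip_quoted_text(string $body): string
{
    $patterns = [
        '/^-{2,}\s*Original Message\s*-{2,}\s*$/mi', // -----Original Message-----
        '/^On\s.+\swrote:\s*$/m',                    // On <date>, John Smith <email> wrote:
        '/^_{5,}\s+From:\s/m',                       // ________________ followed by From:
    ];

    // Keep the earliest match among all patterns.
    $cut = strlen($body);
    foreach ($patterns as $pattern) {
        if (preg_match($pattern, $body, $m, PREG_OFFSET_CAPTURE)) {
            $cut = min($cut, $m[0][1]);
        }
    }
    return rtrim(substr($body, 0, $cut));
}

// Made-up sample message.
$mail = "Sounds good, please close the ticket.\n\n"
      . "On Mon, Jul 13, 2020 at 9:00 AM, John Smith <john@example.com> wrote:\n"
      . "> Is the issue resolved?\n";
echo strip_quoted_text($mail), "\n"; // prints: Sounds good, please close the ticket.
```

This will miss headings in other languages and headings that a mail client wraps across two lines, and it can cut too early if the reply itself contains a line that matches one of the patterns.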