I have the following text (received in an email):

----boundary_3_f515675d-c033-4705-a01e-244d1d6c8368
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: quoted-printable

=0D=0ANew Lead from X Akows kl iut Sop=0D=0A=0D=0AName:=0D=0A Mic=
hael Knight=0D=0A =0D=0AEmail Address:=0D=0A <a href=3D"mailto:mi=
ck@emailaddress.co.uk">mick@emailaddress.co.uk</a>=0D=0A =0D=0ATelephon=
e:=0D=0A  00447783112974=0D=0A =0D=0AComments:=0D=0A Please send =
over more details =0D=0A=0D=0BBIOTS Reference:=0D=0A CV1614218=0D=0A=
=0D=0AYour Ref:=0D=0A 12194-109543=0D=0A=0D=0AView Property:=0D=0A=
 http://abropetisd.placudmnsdwlmn.com/CV1614218 =0D=0A=0D=0A =0D=0A=
 ----------------------------------------------------------------=
---------------=0D=0A=0D=0APlease note: You may not pass these de=
tails on to any 3rd parties.=0D=0AThis enquiry was sent to you by=
 X Akows kl iut Sop, txd UK?s #1 klsue fus kwhesena luhdsnry.  Vi=
sit www.placudmnsdwlmn.com for more information.=0D=0AQuestions? =
Email agents@placudmnsdwlmn.com=0D=0A
----boundary_3_f515675d-c033-4705-a01e-244d1d6c8368

I want to parse it in order to obtain certain information.

I need:

Name:
Email Address:
Telephone:
Comments:
Reference:
Your Ref:
View Property:

How can I extract this information using "bash"?

rubo77
Neil Reardon

2 Answers

Okay, I'll bite. The data is quoted-printable, and we want the plain text version. So let's use Perl, which already has code for this.

#!/usr/bin/perl

use strict;
use warnings;
use PerlIO::via::QuotedPrint;

# Open input file through quoted-printable filter
@ARGV or die "Usage: $0 FILE\n";
open(IN, '<:via(QuotedPrint)', $ARGV[0]) or die "Could not open $ARGV[0]: $!\n";

# needles to search in the haystack.
my @needles = ( 'Name',
                'Email Address',
                'Telephone',
                'Comments',
                'Reference',
                'Your Ref',
                'View Property' );

my $line;
my $key = "";

# handle the file linewise (one line at a time, without slurping it all).
while (defined($line = <IN>)) {

    # The data we want is always one line after the
    # key line, so:

    # If we remember a key
    if($key ne "") {
        # print key and line, reset key variable.
        print "$key =$line";
        $key = "";
    } else {
        # otherwise, see if we find a key in the current line.
        # If so, remember it so that the data in the next line
        # will be printed.
        my $n;
        foreach $n (@needles) {
            if(index($line, $n) != -1) {
                $key = $n;
                last;
            }
        }
    }
}

Put this in a file, say extract.pl, chmod +x it, and run ./extract.pl yourfile.
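
Since the question asks for bash, here is a rough pure-bash sketch of the same idea: decode the quoted-printable text, then print the line that follows each key. The function names decode_qp and extract_fields are made up for this illustration, and the decoder is deliberately minimal: it joins soft line breaks (a trailing "="), turns each "=XX" hex pair into the corresponding byte, and leaves any other "=" untouched. It has not been checked against every corner of the quoted-printable format.

```shell
#!/usr/bin/env bash
# Sketch only: decode_qp and extract_fields are hypothetical helper names.

# Decode quoted-printable text from stdin to stdout.
decode_qp() {
    local line joined="" part first
    local -a parts
    while IFS= read -r line || [[ -n $line ]]; do
        line=${line%$'\r'}                      # tolerate CRLF in the raw file
        if [[ $line == *= ]]; then              # soft line break: continue on next line
            joined+=${line%=}
            continue
        fi
        joined+=$line
        IFS='=' read -r -a parts <<< "$joined"  # split at every "="
        first=1
        for part in "${parts[@]}"; do
            if (( first )); then
                printf '%s' "$part"             # text before the first "="
                first=0
            elif [[ ${part:0:2} =~ ^[0-9A-Fa-f]{2}$ ]]; then
                printf '%b%s' "\\x${part:0:2}" "${part:2}"   # "=0D" becomes byte 0x0D
            else
                printf '=%s' "$part"            # not an escape: keep the "=" literally
            fi
        done
        printf '\n'
        joined=""
    done
    # Note: text after a soft break on the very last line is dropped in this sketch.
}

# Print "key = value" for every key whose value is on the following line.
extract_fields() {
    local key="" line k
    local keys=("Name" "Email Address" "Telephone" "Comments"
                "Reference" "Your Ref" "View Property")
    while IFS= read -r line; do
        line=${line%$'\r'}                      # decoded text uses CRLF line ends
        if [[ -n $key ]]; then
            line=${line#"${line%%[![:space:]]*}"}   # trim leading whitespace
            printf '%s = %s\n' "$key" "$line"
            key=""
            continue
        fi
        for k in "${keys[@]}"; do
            if [[ $line == *"$k"* ]]; then
                key=$k
                break
            fi
        done
    done
}

# Run on a file only when one is given, e.g.: ./extract.sh yourfile
if [[ $# -gt 0 ]]; then
    decode_qp < "$1" | extract_fields
fi
```

Like the Perl version, this relies on each value sitting on the line directly after its key. If the layout of the mail changes, the key matching would need to be adjusted.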

Wintermute
  • Nicely done. Looks like at most 1 key is expected per line, so perhaps put a `last;` after `$key = $p` to terminate the loop once a key is found. Finally, a quibble (and it really is just that): "pattern" suggests a regex (or globbing pattern), but you're searching for string _literals_ (which you've named "key" on finding a match). – mklement0 Mar 07 '15 at 14:00
  • The `last;` is a good point; I don't know why I left it out in the first place. As for the patterns, let's go with needles (as found in a haystack). – Wintermute Mar 07 '15 at 14:11
  • Needles to say (if you will), thanks for the update. – mklement0 Mar 07 '15 at 14:28
  • @NeilReardon: For the benefit of both answerers and future readers: If an answer _solved_ your problem, please _accept it_ by clicking the large check mark next to it; if you found it at least _helpful_, please _up-vote_ it by clicking the up-arrow icon. – mklement0 Mar 07 '15 at 14:28

Firstly, thank you all for your help.

I have found another way to do this and I would like to post it here.

sed -e 's/=C2=A0/ /g' abc.txt | perl -pe 'use MIME::QuotedPrint; $_=MIME::QuotedPrint::decode($_);' | grep "^Interested in:" | cut  -d' ' -f3-

sed -e 's/=C2=A0/ /g' abc.txt | perl -pe 'use MIME::QuotedPrint; $_=MIME::QuotedPrint::decode($_);' | grep "^Name:" | cut  -d' ' -f2-

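I am not sure why, but the raw text contained "=C2=A0", which is the quoted-printable form of the two UTF-8 bytes for a no-break space (U+00A0) and looks the same as " " on screen. So I just used "sed" to replace each one with a plain space.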
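
As a possible variation, the no-break spaces could also be replaced after decoding, and the grep/cut step wrapped in one small function so it is not repeated per field. The sketch below is bash only and assumes the text has already been decoded (for example by the perl one-liner above); the names normalize_spaces and field are invented for this example.

```shell
#!/usr/bin/env bash
# Sketch only: normalize_spaces and field are hypothetical helper names.

nbsp=$'\xc2\xa0'    # the two bytes that "=C2=A0" decodes to (UTF-8 for U+00A0)

# Replace every no-break space on stdin with an ordinary space.
normalize_spaces() {
    local line
    while IFS= read -r line || [[ -n $line ]]; do
        printf '%s\n' "${line//$nbsp/ }"
    done
}

# Print the text that follows a label at the start of a line,
# e.g. field "Name:" prints what comes after "Name:".
field() {
    local label=$1 line
    while IFS= read -r line || [[ -n $line ]]; do
        line=${line%$'\r'}                          # drop a trailing carriage return
        if [[ $line == "$label"* ]]; then
            line=${line#"$label"}                   # remove the label itself
            printf '%s\n' "${line#"${line%%[![:space:]]*}"}"   # trim leading whitespace
        fi
    done
}
```

With decoded text in a file, say decoded.txt (a placeholder name), this would be used as: normalize_spaces < decoded.txt | field "Name:"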

Best regards,

Neil.

Neil Reardon