
I have extracted data from one source to a .txt file. The source is some sort of address book, and I used a macro recorder for the extraction. Now I have several files, all formatted exactly like this (example with 4 contacts):

Abbrucharbeiten
ATR Armbruster
Werkstr. 28
78727 Oberndorf  
Tel. 0175 7441784
Fax 07423 6280
Abbrucharbeiten
Jensen & Sohn, Karl
Schallenberg 6A
25587 Münsterdorf
Tel. 04821 82538
Fax 04821 83381
Abbrucharbeiten
Kiwitt, R.
Auf der Heide 54
48282 Emsdetten
Tel. 02572 88559
Tel. 0172 7624359
Abbrucharbeiten, Sand und Kies, Transporte, Kiesgruben, Erdbau
Josef Grabmeier GmbH
Reitgesing 1
85560 Ebersberg
Tel. 08092 24701-0
Fax 08092 24701-24

The 1st row is always the field of business (Branche), the 2nd row is always the name of the company/firm, the 3rd row is always the street address, and the 4th row is always the zip code and place. The 5th row and the next couple of rows (sometimes two rows, sometimes more) are either Tel. or Fax.

I want to format it into something like an Excel sheet:

Branche:    Name:     Address:   Place:    contact1:   contact2:
1st row     2nd row   3rd row    4th row   5th row     6th row ...

The main problem is that I have over 500,000 contacts, and the number of trailing Tel./Fax fields is not always the same. I don't want to do this manually. Please help me.
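Here is a minimal Python sketch of how the layout described above maps onto named fields. It assumes a single record has already been cut out of the file as a list of lines. `parse_record` and the dictionary keys are hypothetical names, not part of any library. Python's starred assignment collects however many Tel./Fax lines remain, which is exactly the part whose length varies.

```python
def parse_record(lines):
    """Map one record (a list of lines) onto the fixed layout from the question.

    Assumes the first four lines are Branche, Name, Address and Place, and
    that every remaining line is a Tel. or Fax entry.
    """
    if len(lines) < 4:
        raise ValueError(f"record too short: {lines!r}")
    # The starred target soaks up a variable number of trailing lines.
    branche, name, address, place, *contacts = lines
    return {
        "Branche": branche,
        "Name": name,
        "Address": address,
        "Place": place,
        "Contacts": contacts,
    }


record = parse_record([
    "Abbrucharbeiten",
    "Kiwitt, R.",
    "Auf der Heide 54",
    "48282 Emsdetten",
    "Tel. 02572 88559",
    "Tel. 0172 7624359",
])
print(record["Name"], record["Contacts"])
```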

pnuts
dzordz
  • so do you want to write some code to do the extraction? – Nida Sahar May 29 '13 at 08:34
  • If it's possible to sort this out with some code, then yes. I have a little experience with Python and VBS, but not enough to solve this. – dzordz May 29 '13 at 08:36
    Yes, each `name` has a `Branche`, although the `Branche` isn't always the same. In the example above you can see that sometimes it's just one item and sometimes several: for example, sometimes it's only `'Abbrucharbeiten'` or `'Transporte'`, and sometimes it's `'Abbrucharbeiten, Transporte'`. – dzordz May 29 '13 at 09:17

1 Answer


It's neither Python nor Visual Basic, but it shouldn't be very difficult to translate into those languages. This is Perl.

perl -lne '
        ## Print header. Both the header and the data are separated with pipes.
        ## Contacts (contact1, contact2, etc.) are not included because at this
        ## moment I can not know how many there will be. It could be done, but the
        ## script would be far more complex.
        BEGIN {
                push @header, q|Branche:|, q|Name:|, q|Address:|, q|Place:|;
                printf qq|%s\n|, join q{|}, @header;
        }

        ## Save information for each contact: always its first six lines, and
        ## after that only lines beginning with "Tel." or "Fax". The whole
        ## alternation is anchored, so a Branche such as "Telekommunikation" or
        ## one merely containing "fax" is not glued onto the previous contact.
        if ( $line < 6 || m/\A(?:Tel\.?|Fax)\s/i ) {
                push @contact_info, $_;
                ++$line;

                ## Do not skip the printing of the last contact of each file.
                next unless eof;

                ## End of file: print and reset, so a following input file
                ## starts with an empty contact.
                printf qq|%s\n|, join q{|}, @contact_info;
                $line = 0;
                @contact_info = ();
                next;
        }

        ## Any other line starts a new contact: print the previous one,
        ## initialize the data structures and repeat the process for the next one.
        printf qq|%s\n|, join q{|}, @contact_info;

        $line = 0;
        @contact_info = ();

        push @contact_info, $_;
        ++$line;

' infile

It's a one-liner (I know it doesn't look like one, but you can remove the comments and newlines to get it), so run it directly from your shell. It yields:

Branche:|Name:|Address:|Place:
Abbrucharbeiten|ATR Armbruster|Werkstr. 28|78727 Oberndorf  |Tel. 0175 7441784|Fax 07423 6280
Abbrucharbeiten|Jensen & Sohn, Karl|Schallenberg 6A|25587 Münsterdorf|Tel. 04821 82538|Fax 04821 83381
Abbrucharbeiten|Kiwitt, R.|Auf der Heide 54|48282 Emsdetten|Tel. 02572 88559|Tel. 0172 7624359
Abbrucharbeiten, Sand und Kies, Transporte, Kiesgruben, Erdbau|Josef Grabmeier GmbH|Reitgesing 1|85560 Ebersberg|Tel. 08092 24701-0|Fax 08092 24701-24

Note that I didn't print the full header (the contact columns are missing) and that fields are separated with pipes. I don't think that is a problem when importing into Excel, whose text import lets you choose a custom delimiter such as the pipe.
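Since the asker mentioned Python, here is a rough Python sketch of the same grouping rule. It also writes the full header that the one-liner leaves out. This is a translation under assumptions, not the answer's own code. Like the Perl version, it assumes every record has at least six lines and that extra lines start with `Tel.` or `Fax`. Unlike the Perl version, it skips blank lines. It also assumes UTF-8 input, so adjust `encoding` for whatever the macro recorder produced. It keeps all records in memory so it can find the widest one before writing the header. `group_contacts`, `write_table` and `convert` are hypothetical names.

```python
import csv
import io
import re

FIXED_COLUMNS = ["Branche", "Name", "Address", "Place"]

# Extra lines start with "Tel." or "Fax" followed by whitespace. Anchoring the
# whole alternation keeps a Branche such as "Telekommunikation" from matching.
CONTACT_RE = re.compile(r"(?:Tel\.?|Fax)\s", re.IGNORECASE)


def group_contacts(lines, min_lines=6):
    """Yield one list of lines per contact, using the same rule as the Perl code.

    The first `min_lines` lines of a record are always kept; after that a line
    is kept only if it is a Tel./Fax line, otherwise it starts the next record.
    """
    current = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines, e.g. an empty last line
        if len(current) < min_lines or CONTACT_RE.match(line):
            current.append(line)
        else:
            yield current
            current = [line]
    if current:
        yield current  # do not drop the last contact at end of input


def write_table(records, out, delimiter="|"):
    """Write all records under a full header, padding shorter rows with blanks."""
    # The widest record decides how many contactN columns the header needs.
    max_contacts = max((len(r) - len(FIXED_COLUMNS) for r in records), default=0)
    header = FIXED_COLUMNS + [f"contact{i}" for i in range(1, max_contacts + 1)]
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow(header)
    for r in records:
        writer.writerow(r + [""] * (len(header) - len(r)))


def convert(in_path, out_path, encoding="utf-8"):
    """Read one raw export file and write a pipe-separated table."""
    with open(in_path, encoding=encoding) as infile:
        records = list(group_contacts(infile))  # all records kept in memory
    with open(out_path, "w", encoding=encoding, newline="") as outfile:
        write_table(records, outfile)


# Demo on two records from the question, written to an in-memory buffer.
demo_lines = [
    "Abbrucharbeiten\n", "Kiwitt, R.\n", "Auf der Heide 54\n",
    "48282 Emsdetten\n", "Tel. 02572 88559\n", "Tel. 0172 7624359\n",
    "Abbrucharbeiten, Sand und Kies, Transporte, Kiesgruben, Erdbau\n",
    "Josef Grabmeier GmbH\n", "Reitgesing 1\n", "85560 Ebersberg\n",
    "Tel. 08092 24701-0\n", "Fax 08092 24701-24\n",
]
buf = io.StringIO()
write_table(list(group_contacts(demo_lines)), buf)
print(buf.getvalue(), end="")
```

On the real files you would call something like `convert("infile", "outfile.txt")`. Keeping every record in memory is the simplest way to know the widest record before writing the header. If that becomes too heavy with half a million contacts, an alternative is to read the file twice: once to find the maximum and once to write.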

Birei
  • @pnuts: I don't know what you mean by a column for the fax. As I understand the question, some contacts may have no fax while others have several, and the same goes for telephones. Anyway, it's a starting point for solving his problem, and he will very probably need to customize it to suit his needs. – Birei May 29 '13 at 09:46
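If separate phone and fax columns are wanted after all, one way is to sort each contact line by its prefix. The following is a minimal Python sketch. `split_contacts` is a hypothetical helper that assumes the only prefixes are `Tel.` and `Fax`, as in the sample data. Any other line is returned separately so it can be checked by hand.

```python
def split_contacts(contact_lines):
    """Return (phones, faxes, unknown) for a record's Tel./Fax lines.

    The "Tel." / "Fax" prefix is dropped from recognized lines; lines with
    any other prefix are kept whole in `unknown` instead of being lost.
    """
    phones, faxes, unknown = [], [], []
    for line in contact_lines:
        kind, _, number = line.partition(" ")
        if kind == "Tel.":
            phones.append(number.strip())
        elif kind == "Fax":
            faxes.append(number.strip())
        else:
            unknown.append(line)
    return phones, faxes, unknown


phones, faxes, unknown = split_contacts(["Tel. 08092 24701-0", "Fax 08092 24701-24"])
print(phones, faxes, unknown)
```

Each record can then get as many phone columns and fax columns as the widest record needs, padded the same way as the combined contact columns.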