
I'm working on a university project that requires going through fairly large files (>50 MB), and I used flex to rewrite a file made of lines such as

1 1:41 3:54 7:40 13:8
2 4:7 7:8 23:85

into

1 1 3 7 13
2 4 7 23

(that is, turning each number_a:number_b pair into number_a, and writing the result to an output file when the program runs)

My question is this: since I'm writing the rest of my program in C++, was reaching for flex a good reflex, given that flex is supposed to be fast (the f in flex stands for fast)? Or is there a much simpler and still efficient way to do this in C++?

I'm fairly new to C++, so I have a lot of C habits and little knowledge of the tools available or of their performance.

Here is the flex code I wrote:

%option noyywrap
%{
#include <stdio.h>
unsigned int nb_doc = 1; /* counts the first document, which has no preceding newline */
unsigned int i;
%}
couple_entiers      [0-9]+:[0-9]+
retour_chariot      \n[0-9]+
autre               .
%%
{couple_entiers}      { /* print only the number before ':' and drop the rest */
                        i = 0;
                        while (yytext[i] != ':') {
                            putchar(yytext[i]);
                            i++;
                        }
                      }
{retour_chariot}      { nb_doc++; printf("%s", yytext); }
{autre}               { printf("%s", yytext); }

%%
int main(void) {
    yylex();
    printf("\n\n%u", nb_doc);
    return 0;
}
splash
m.raynal
    Flex is fast. But is it fast enough? There are scads of things you could do to try and speed up your program. Also, there are plenty of simple ways you could write the code in pure C++. But are you going to need to do more? If you write a simple "read a line, emit every other number" function in C++, it will be quick but will not 'understand' the lines, and so if you need to add more functionality you will have trouble. Whereas if you keep your flex code, you could easily do something with the second numbers in the pairs. Which is more likely? – aghast Feb 05 '17 at 17:12
    Actually, in this case I most likely won't need more functionality, but having the point raised makes me aware of it, which is nice. Since I only know the first part of the project so far, I'll lean toward keeping it in flex, just to keep the whole thing flexible. – m.raynal Feb 05 '17 at 17:15
    @AustinHastings the teacher has actually just added more things to do, such as creating random sets of 'parts' of my file, plus a few other features: your comment was golden, thanks again! – m.raynal Feb 18 '17 at 12:04

1 Answer


Consider replacing your custom code with a general solution:

system("sed -E 's/:[0-9]+//g'");

:)

aghast