I have written a program to parse a text file that contains a sample C program with if, else and while statements.

I have two ArrayLists, and my program parses the file line by line. I'm using Matcher and have specified the pattern strings in Pattern.compile(). I am trying to draw a control flow graph for a particular program; for now I'm just finding the nodes, and I will link them up later.

Here is my code:

//import static LineMatcher.ENCODING;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.LineNumberReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class CFG {

  public void findLines(String aFileName) {
    List<Integer> a = new ArrayList<Integer>();
    List<Integer> b = new ArrayList<Integer>();
    // int [] a = new int[10000];
    // int [] b = new int[10000];
    Pattern regexp = Pattern.compile("if|else|while");
    Matcher exp1 = regexp.matcher("if");
    Matcher exp2 = regexp.matcher("else");
    Matcher exp3 = regexp.matcher("while");

    Path path = Paths.get(aFileName);
    try (BufferedReader reader = Files.newBufferedReader(path, ENCODING);
        LineNumberReader lineReader = new LineNumberReader(reader);) {
      String line = null;
      while ((line = lineReader.readLine()) != null) {
        // exp1.reset(line); //reset the input
        int counter = 1;
        if (exp1.find()) {
          int l = lineReader.getLineNumber();

          b.add(l);
        }
        if (exp2.find()) {
          int l = lineReader.getLineNumber();

          b.add(l);
        }
        if (exp3.find()) {
          int l = lineReader.getLineNumber();

          b.add(l);
        } else {
          int l = lineReader.getLineNumber();
          a.add(l);
        }
      }
      // counter++;

      System.out.println(a);
      System.out.println(b);
    }

    catch (IOException ex) {
      ex.printStackTrace();
    }
  }

  final static Charset ENCODING = StandardCharsets.UTF_8;

  public static void main(String... arguments) {
    CFG lineMatcher = new CFG();
    lineMatcher.findLines("C:Desktop\\test.txt");
  }
}

What I'm trying to do here is this: if one of my strings is found on a line, add that line number to ArrayList b; otherwise, add it to ArrayList a. That way I know which lines have if, else and while statements.

I don't know whether my code is incorrect. The input file is as follows:

#include <stdio.h>

int main()
{
  int i=1, sum = 0;  
  if( i = 1)  {
    sum += i;
  }  else
    printf("sum = %d\n", sum);

  return 0;
}

The output of the program is:

run: 
[2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
[1, 1, 1]

PS: I'm an amateur, so this program could be logically incorrect.

Please let me know if any more information is needed.

EDIT:

Code that works fine when searching for just one string:

Pattern regexp = Pattern.compile("if");
Matcher matcher = regexp.matcher("if");

Path path = Paths.get(aFileName);
try (
  BufferedReader reader = Files.newBufferedReader(path, ENCODING);
  LineNumberReader lineReader = new LineNumberReader(reader);
) {
  String line = null;
  while ((line = lineReader.readLine()) != null) {
    matcher.reset(line); // reset the input

    if (matcher.find()) {
      int a = lineReader.getLineNumber();
      System.out.println(a);
    }
  }
}
catch (IOException ex) {
  ex.printStackTrace();
}

The snippet above works fine (it's just part of the code, not the entire program; the rest of the program is the same as above) and returns the line number where `if` is found. I used the same logic and added the else and while parts.
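The key difference between the two versions appears to be the `matcher.reset(line)` call: in the multi-keyword code in the question it is commented out (`// exp1.reset(line);`), so each `Matcher` keeps searching the literal string it was created with ("if", "else" or "while") instead of the current line. A minimal sketch of that behaviour (the class name `ResetDemo` is just for illustration):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ResetDemo {
    public static void main(String[] args) {
        Pattern regexp = Pattern.compile("if|else|while");

        // Bound to the literal "if", not to any line of the file.
        Matcher bound = regexp.matcher("if");
        System.out.println(bound.find()); // true: finds "if" inside the string "if"
        System.out.println(bound.find()); // false: nothing left to find in "if"

        // After reset(line), the same matcher searches the given line instead.
        bound.reset("    sum += i;");
        System.out.println(bound.find()); // false: no keyword on this line
        bound.reset("  if( i = 1)  {");
        System.out.println(bound.find()); // true: "if" found in the line
    }
}
```

This is why line 1 ends up in `b` three times: each matcher's first `find()` succeeds on its own literal string, every later call fails, and so every later line falls into the `else` branch and goes to `a`.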

Taryn
Polynomial Proton
  • You're using your `Regex` wrong. I understand you want to parse the content of a line, but in your code there is no connection between the line and the regex: you always test the same strings ("if", "else", "while"). You want to create a `Matcher` for each line, like `regexp.matcher(line)`. – lpiepiora May 02 '14 at 06:55
  • @lpiepiora is right, but you also need to change your regex. The simplest one that works is something like `.*(if|else|while).*`. But this will give you incorrect results if the line contains variables whose names contain these keywords (diff, ...). – TomasZ. May 02 '14 at 07:04
  • @TomasZ. Yes, I don't want to do that. That's why I used the expression `("if|else|while")`, so that it detects them only if there is no other string before them. Is that expression correct, or should I use the one you gave here? Thanks. – Polynomial Proton May 02 '14 at 18:21
  • @lpiepiora I'm sorry, I did not quite understand what you said. Shouldn't I be specifying the string I need to search for in `regexp.matcher(line)`? If I pass the line instead of the string, how would it know what string to find? Sorry if this is a stupid question. I have added a piece of code to the post above which works fine for me and returns the line number when I search for just one string. But when I try multiple strings, it doesn't work. – Polynomial Proton May 02 '14 at 18:23
  • You have a `Pattern`, which defines what you want to find. Imagine a simple regexp `Regex(TheUnknown)`, which would just match your username. `regexp.matcher(input)` applies that definition to the input, resulting in a `Matcher`. If you pass a different argument to `matcher()`, you'll get another `Matcher` instance valid for that other input. Given the regexp we've discussed before, if I call `regexp.matcher("TheUnknown")`, you'll get a `Matcher` instance which matches its input, but if I call `regexp.matcher("lpiepiora")`, it will not match. Construct a simpler example and try it. – lpiepiora May 02 '14 at 18:41
  • With the `find` method you can use your regex `if|else|while`. For the `matches` method you could use mine. But stick to the `find` method; it's clearer. Just keep in mind that for more complicated source code you will have to tune the regex: `String imgName = "x.gif";` will also be counted as an `if`. But for a start, your regex should be enough. – TomasZ. May 02 '14 at 19:31
  • Thanks for the input, @lpiepiora and TomasZ. I'll try to work on this further. I actually added a few lines of code to the above and I'm getting the output I need. I just added `exp1.reset(line); exp2.reset(line); exp3.reset(line);` and now it's actually detecting the line number and adding it to the list, but it adds the same line number 3 times, like `[5,5,5,9,9,9]`, so I'm checking on that. – Polynomial Proton May 02 '14 at 20:07
  • Awesome! It's working now. I'll edit the above code and add the working code. – Polynomial Proton May 02 '14 at 20:13
  • @TheUknown Cool, it's working for you. I wouldn't reset the `Matcher` though; I would get a new instance for each line. Just move this `Matcher exp1 = regexp.matcher("if|else|while");` into your `while` loop and change it to `Matcher exp1 = regexp.matcher(line);` – lpiepiora May 02 '14 at 20:26
  • Perfect, I did what you said and it works fine :) I believe this would be an optimized version of my code. I have one small question, if you could help me: how do I know which pattern was found on which line? It does give me line numbers 5 and 9, but is there a function which would tell me whether it found `if`, `else` or `while`, and on what line number? – Polynomial Proton May 02 '14 at 20:37
  • Never mind, it was a stupid question. We use `group()`. Thanks for all the help, @lpiepiora. – Polynomial Proton May 02 '14 at 20:41
  • This question appears to be off-topic because it is about a code review and should be on codereview.stackexchange.com – Tetsujin no Oni Jun 06 '14 at 18:10
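Putting the two suggestions from the comments together (a fresh `Matcher` per line, and `group()` to see which keyword matched) might look roughly like the sketch below. Assumptions: the class and method names are hypothetical, the text is read from a String through a `LineNumberReader` rather than from a file so the example is self-contained, and the `\b` word boundaries are an addition not used in the thread; they address the "x.gif" and "diff" concern raised above.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeywordLines {
    // The \b word boundaries are an addition to the thread's "if|else|while":
    // they stop "if" inside words such as "diff" or "x.gif" from matching.
    private static final Pattern KEYWORDS = Pattern.compile("\\b(if|else|while)\\b");

    /** Returns one "lineNumber: keyword" entry per keyword occurrence. */
    static List<String> findKeywords(String source) {
        List<String> found = new ArrayList<>();
        try (LineNumberReader reader = new LineNumberReader(new StringReader(source))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // A fresh Matcher bound to this line, instead of reset() on a shared one.
                Matcher m = KEYWORDS.matcher(line);
                while (m.find()) {
                    // group(1) is the keyword that actually matched on this line.
                    found.add(reader.getLineNumber() + ": " + m.group(1));
                }
            }
        } catch (IOException e) {
            // A StringReader should not fail, but readLine() declares IOException.
            throw new UncheckedIOException(e);
        }
        return found;
    }

    public static void main(String[] args) {
        String source = String.join("\n",
                "int main()",
                "{",
                "  char *img = \"x.gif\";",
                "  if (i == 1) {",
                "    sum += i;",
                "  } else",
                "    printf(\"diff\");",
                "  while (i < 3) i++;",
                "}");
        System.out.println(findKeywords(source)); // prints [4: if, 6: else, 8: while]
    }
}
```

Reading the real file should only require wrapping `Files.newBufferedReader(path, ENCODING)` in the `LineNumberReader`, as in the question. Word boundaries still do not help with keywords inside string literals or comments (`printf("if")` would still be counted); handling those would need an actual C lexer.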

3 Answers

Finally, I got this working (thanks for the amazing input). Below are the changes I made:

public void findLines(String aFileName) {

    List<Integer> a = new ArrayList<Integer>();
    List<Integer> b = new ArrayList<Integer>();

    Pattern regexp = Pattern.compile("(if|else|while).*");
    Matcher exp1 = regexp.matcher("if|else|while");
    Path path = Paths.get(aFileName);
    try (
        BufferedReader reader = Files.newBufferedReader(path, ENCODING);
        LineNumberReader lineReader = new LineNumberReader(reader);
    ) {
        String line = null;
        while ((line = lineReader.readLine()) != null) {
            exp1.reset(line); // point the matcher at the current line

            if (exp1.find()) {
                int l = lineReader.getLineNumber();
                b.add(l); // line contains if, else or while
            } else {
                int l = lineReader.getLineNumber();
                a.add(l); // any other line
            }
        }

        System.out.println(a);
        System.out.println(b);
    }
    catch (IOException ex) {
        ex.printStackTrace();
    }
}

The input file is the same, and the output is:

[1, 2, 3, 4, 6, 7, 8, 10, 11, 12, 13]
[5, 9]
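Because this code calls `find()`, the trailing `.*` in `(if|else|while).*` does not restrict anything: `find()` looks for the pattern anywhere in the line, while `matches()` has to cover the whole line. A small sketch comparing the two on one line from the input file (the class name is illustrative):

```java
import java.util.regex.Pattern;

class FindVsMatches {
    public static void main(String[] args) {
        String line = "  if( i = 1)  {"; // line from the sample input file
        Pattern answerPattern = Pattern.compile("(if|else|while).*");
        Pattern plain = Pattern.compile("if|else|while");
        Pattern whole = Pattern.compile(".*(if|else|while).*");

        // find() searches anywhere in the line, so the trailing .* changes nothing here.
        System.out.println(answerPattern.matcher(line).find());    // true
        System.out.println(plain.matcher(line).find());            // true

        // matches() must cover the entire line, so the leading spaces make this fail...
        System.out.println(answerPattern.matcher(line).matches()); // false
        // ...unless the pattern also allows text before the keyword.
        System.out.println(whole.matcher(line).matches());         // true

        // find() is not anchored to the start: "gif" still contains "if".
        System.out.println(plain.matcher("printf(\"x.gif\");").find()); // true
    }
}
```

So with `find()`, neither `if|else|while` nor `if.*` limits matches to the start of a statement; an explicit anchor such as `^\s*` would be needed for that, and even then a line like `} else` would be missed.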
Taryn
Polynomial Proton

It sounds like you are trying to recognize and work with the grammar of another language. I tried to do this some time ago, ended up scrapping my custom code and decided to use the ANTLR API instead. It really cut down the time it took to complete my project. I would recommend going that route if applicable.

Here is the ANTLR site: http://www.antlr.org/

Deoji

You are doing pattern matching when you try to match "if"; with `matches()`, that expects the whole line to be equal to "if". I think what you need is ".*if.*", which checks whether the line contains "if". Since that is the case, call the `.contains()` method on the line string for each statement you are looking for instead of using a regex. It is more efficient.

Deoji
  • There's an edit link for your first answer; try including this part in it and then delete this one :) – Frakcool May 19 '14 at 19:00
  • Thanks. I have already added the solution to this. It's in the question itself, at the end. Also, `.*if.*` will not work: what if there is a print statement containing `if`? That's why I used `if.*`, to see if the start of the statement is `if`. – Polynomial Proton May 19 '14 at 23:04
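The `String.contains()` suggestion can be sketched as follows (the class and method names are hypothetical). It avoids the regex engine, but it is a plain substring test, so it has the same false positives as an unanchored `find()`, for example the "gif" in a file name:

```java
class ContainsCheck {
    // Checked in this order; the first keyword found is reported.
    private static final String[] KEYWORDS = {"if", "else", "while"};

    /** Returns the first keyword the line contains as a substring, or null if none. */
    static String firstKeyword(String line) {
        for (String keyword : KEYWORDS) {
            if (line.contains(keyword)) {
                return keyword;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String[] lines = {
            "  if( i = 1)  {",
            "    sum += i;",
            "  }  else",
            "  char *img = \"x.gif\";" // false positive: "gif" contains "if"
        };
        for (String line : lines) {
            System.out.println(firstKeyword(line) + "  <-  " + line);
        }
    }
}
```

The last sample line is reported as an `if` line, which is exactly the kind of case the `\b`-style or parser-based approaches discussed in this thread try to avoid.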