Below is a snippet showing the structure of the text file:

Historical Sales for: 12th of October  2019, 11:37 am

PRODUCT NAME      QUANTITY
Coke B            5

Historical Sales for: 21st of October  2019, 8:15 pm

PRODUCT NAME      QUANTITY
Peanuts           2

I want to process only the column labels and row values, excluding the main heading of each section; in this case, the line "Historical Sales for: 12th of October 2019, 11:37 am".

This is the code I wrote to process the text using the regex (\\b)

        StringBuilder temporary = new StringBuilder();
   
        InputStream inputStream = new FileInputStream(new File(FILE_NAME));            
        BufferedReader readFile = new BufferedReader(new InputStreamReader(inputStream));
        
        String next; 
        
        while ((next = readFile.readLine()) != null) {
           temporary.append(next).append("\n");
        }   

        next = String.format("%13s", ""); // spacing for column headers          
        System.out.println(temporary.toString().replaceAll("(\\b)", next));
Marome
  • `\\b{3}` matches an empty string at each word boundary position three times. So, matching an empty string effectively. `\\b{3}` = `\\b` – Wiktor Stribiżew Jun 18 '20 at 19:51
  • You should use a pre-formatted text block (code block) for your sample text file, not a _**picture**_ of a text file. Is this one single text file that has multiple "Historical Sales for:" headings, or multiple files each with a heading? How consistent are the headings? Maybe you could match and discard the headings as you process the file(s) if the headings themselves fit a pattern. – Stephen P Jun 18 '20 at 19:58
  • `for (; ;)` is the same as `while ()`; `while` is a much more natural construct for this.... `while ((line = readFile.readLine()) != null) { if (isHeaderLine(line)) { continue; } temporaryData.append..... }` – Stephen P Jun 18 '20 at 20:02
  • Yes it would be nice to clarify or strictly define the file format. – bsaverino Jun 18 '20 at 20:41
  • The regex `(\\b{3})` matches an escape character, then the b character 3 times. That is what the engine sees. –  Jun 18 '20 at 21:05
  • It is simple to parse the text outlined in red in your image. Did you have any particular things matched? –  Jun 18 '20 at 21:14
  • @Stephen P it's a single text file containing the heading "Historical Sales for:" in multiple sections – Marome Jun 18 '20 at 22:48
  • @Edward It matched the entire text within the file, which is not what I want. My aim was to completely discard the headings **Historical Sales for:** and the corresponding date with time – Marome Jun 18 '20 at 22:53
  • This is the regex in your post, `(\\b{3})`; the regex engine will match this `\bbb` and only that! What does 'Historical sites' have to do with that regex? –  Jun 21 '20 at 18:16
  • @Edward, to add spaces in the "Historical sales for" using the format specifier `%13s`. I stand corrected, as @Wiktor Stribiżew clearly indicated the inefficiency of my regex with the statement `\\b{3} = \\b` – Marome Jun 21 '20 at 22:05
  • Lost me, I think you're not understanding. `\\b{3} = \\b` has no context. Again, if the regex engine sees `\\b{3}` it will match \ + `bbb`, whereas `\\b` will match \ + `b`. These are clearly not the same, i.e. `\\b{3}` != `\\b` –  Jun 22 '20 at 17:13
  • @Edward, I do understand the usage of quantifiers and the exact meaning of the `b{3}`. Thank you for the constructive criticism, I'll read further into regex – Marome Jun 22 '20 at 22:39
  • @Edward, just a note, the "\" won't be matched. Reason being that the **backslash** is an escape character in Java, so the regex `\b` is the equivalent of `\\b` in the Java language. Here's a reference, https://www.baeldung.com/java-regexp-escape-char, found under **Escaping Using Backslash** – Marome Jun 22 '20 at 23:04
  • The regex `\\b` matches a \ + b. Demo -> https://regex101.com/r/XIvBBb/1 . As well, there is no `\\b in the Java language` that I know of. –  Jun 25 '20 at 16:04
  • @Edward, thanks for the demo, It looks extremely helpful. As for the regex `\\b` interpretation, please read here https://stackoverflow.com/questions/8777982/issue-with-java-regex-b for more clarity on the usage of `\b` and `\\b` in the Java Language. – Marome Jun 25 '20 at 22:55
  • @Edward, here's a demo http://tpcg.io/BaJ2MYc4 I hope that clarifies everything for you – Marome Jun 26 '20 at 01:33
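
The escaping point debated in the comments above can be checked with a small sketch. The class name `WordBoundaryDemo` and the helper `mark` are hypothetical names introduced only for illustration. The sketch assumes the standard behaviour of `String.replaceAll`: the Java string literal `"\\b"` reaches the regex engine as `\b` (a zero-width word boundary), while `"\\\\b"` reaches it as `\\b` (a literal backslash followed by `b`).

```java
public class WordBoundaryDemo {
    // Replaces every match of the given regex with a visible marker.
    static String mark(String input, String regex) {
        return input.replaceAll(regex, "|");
    }

    public static void main(String[] args) {
        // "\\b" in Java source is the regex \b: a zero-width word boundary.
        System.out.println(mark("Coke B", "\\b"));    // |Coke| |B|

        // Per the first comment, quantifying the zero-width boundary
        // should not change which positions match.
        System.out.println(mark("Coke B", "\\b{3}"));

        // "\\\\b" in Java source is the regex \\b: a literal backslash, then 'b'.
        System.out.println(mark("a\\bc", "\\\\b"));   // a|c
    }
}
```

This is also why replacing `\\b` with 13 spaces pads every word in the file, headings included, rather than removing anything.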

1 Answer

If your intention is to print just the lines:

PRODUCT NAME      QUANTITY
Chips             2
Coke B            5

And similar ones. I suggest you use Java 8 streams and the regexes below to remove the unwanted lines:

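import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public static void main(String[] args) throws Exception {
    // Files.lines keeps the file open, so close the stream with try-with-resources
    try (Stream<String> lines = Files.lines(Paths.get("file.txt"))) {
        String collect = lines
                .filter(line -> !line.matches("^Historical Sales for.*$") && !line.matches("^\\s*$"))
                .collect(Collectors.joining("\n")); // join the kept lines with newlines
        System.out.println(collect);
    }
}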

This way you'll have:

PRODUCT NAME      QUANTITY
Chips             2
Coke B            5
PRODUCT NAME      QUANTITY
(...)

One advantage of using Streams is the .collect() method, which lets you gather the filtered lines directly into a List (for example with Collectors.toList()) instead of a single String.
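
As a rough sketch of collecting into a List: the class name `SalesLines` and the method `dataLines` are hypothetical, and the simple prefix check stands in for the regex used above. The path comes from the first command-line argument; without one, the sketch falls back to an inline sample so that it runs on its own.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SalesLines {
    // Keeps column labels and row values; drops headings and blank lines.
    static List<String> dataLines(Stream<String> lines) {
        return lines
                .filter(line -> !line.startsWith("Historical Sales for"))
                .filter(line -> !line.trim().isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            // Read the file named on the command line and close it afterwards.
            try (Stream<String> lines = Files.lines(Paths.get(args[0]))) {
                dataLines(lines).forEach(System.out::println);
            }
        } else {
            // Inline sample mirroring the file structure in the question.
            Stream<String> sample = Stream.of(
                    "Historical Sales for: 12th of October  2019, 11:37 am",
                    "",
                    "PRODUCT NAME      QUANTITY",
                    "Coke B            5");
            dataLines(sample).forEach(System.out::println);
        }
    }
}
```

Each element of the resulting List is one label or data row, which is easier to process further than one large String.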

If you want to keep your example, you could do:

StringBuilder temporaryData = new StringBuilder();

// try-with-resources closes the reader even if reading fails
try (BufferedReader readFile = new BufferedReader(new InputStreamReader(new FileInputStream("file.txt")))) {
    String next;
    while ((next = readFile.readLine()) != null) {
        temporaryData.append(next).append("\n");
    }
}

// (?m) makes ^ and $ match at every line, not only at the start and end of the whole text.
// Remove the headings before the padding step, which would otherwise alter them.
String stringWithoutHeaders = temporaryData.toString()
        .replaceAll("(?m)^Historical Sales for.*\\R?", "") // drop heading lines
        .replaceAll("(?m)^\\s*\\R", "");                    // drop blank lines

String padding = String.format("%13s", ""); // spacing for column headers
System.out.println(stringWithoutHeaders.replaceAll("(\\b{3})", padding));
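
The line-by-line filtering suggested in the comments avoids regex replacement on the whole text. A minimal sketch, assuming a heading is any line that starts with "Historical Sales for:"; `isHeaderLine` is the hypothetical helper named in that comment, and `HeaderSkipper` and `readWithoutHeaders` are illustrative names. An inline sample stands in for the file so the example runs on its own.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class HeaderSkipper {
    // Hypothetical helper; the prefix check is an assumption about the format.
    static boolean isHeaderLine(String line) {
        return line.startsWith("Historical Sales for:");
    }

    // Reads all lines, skipping headings and blank separator lines.
    static String readWithoutHeaders(BufferedReader reader) throws IOException {
        StringBuilder data = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            if (isHeaderLine(line) || line.trim().isEmpty()) {
                continue;
            }
            data.append(line).append("\n");
        }
        return data.toString();
    }

    public static void main(String[] args) throws IOException {
        String sample = "Historical Sales for: 12th of October  2019, 11:37 am\n\n"
                + "PRODUCT NAME      QUANTITY\n"
                + "Coke B            5\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(sample))) {
            System.out.print(readWithoutHeaders(reader));
        }
    }
}
```

To read the real file, wrap a `FileReader` in the `BufferedReader` instead of the `StringReader`.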
fjsv