I have to do some file processing in Java (retrieve a value from a file and return it). I can easily do this in Perl, and I have written a Perl script, but I am unable to run it from my Java file.

Can anybody please help me run it, or tell me whether I can do this in Java alone?

My Java code that calls the Perl script is:

while ((output = responseBuffer.readLine()) != null) {

    if(output.contains(hidden_tag)){
      String toRun = "perl PatternSearch.pl "+ output;
      Process p = Runtime.getRuntime().exec(toRun);
    }
}
sakura

1 Answer

Not sure what you mean here, but if the "value to retrieve" is extracted using regexes, I have just completed a project that lets you run regexes over large text files (up to approximately 2 GiB, due to the int limit of a CharSequence):

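import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
// LargeText and LargeTextFactory come from the large-text project mentioned above

private static final LargeTextFactory FACTORY
    = LargeTextFactory.defaultFactory();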

public List<String> getMatchesForPattern(final Path file, final String re)
    throws IOException
{
    final Pattern p = Pattern.compile(re);
    final List<String> ret = new ArrayList<>();

    try (
        final LargeText text = FACTORY.fromPath(file);
    ) {
        final Matcher m = p.matcher(text);
        while (m.find())
            ret.add(m.group());
        return ret;
    }
}

No need to use Perl anymore ;) Unless you rely on Perl-specific regex features, of course.
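For input that fits comfortably in memory, the same idea can be sketched with nothing but `java.util.regex`. The question never shows what a line containing `hidden_tag` looks like, so the line format below (an HTML hidden input with `type`, `name` and `value` in that order) is purely an assumption, as are the class name `HiddenFieldExtractor` and the method `extract`. Treat it as a rough illustration, not a drop-in replacement for the Perl script:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: the name and the expected line format are assumptions.
final class HiddenFieldExtractor {

    // Assumed line format: <input type="hidden" name="KEY" value="VALUE">
    // Attribute order is fixed here; real HTML is more varied than this.
    private static final Pattern HIDDEN_FIELD = Pattern.compile(
        "<input\\s+type=\"hidden\"\\s+name=\"([^\"]*)\"\\s+value=\"([^\"]*)\"");

    private HiddenFieldExtractor() {
    }

    // Collects name -> value for every hidden field found in the given lines,
    // keeping the order in which the fields were encountered.
    static Map<String, String> extract(final Iterable<String> lines) {
        final Map<String, String> fields = new LinkedHashMap<>();
        for (final String line : lines) {
            final Matcher m = HIDDEN_FIELD.matcher(line);
            while (m.find()) {
                fields.put(m.group(1), m.group(2));
            }
        }
        return fields;
    }
}
```

In the loop from the question, each `output` line that contains `hidden_tag` could be collected into a list and passed to `extract`, so no external process is started at all. Lines that do not match the assumed format are simply skipped.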


Final note: if your input is structured (HTML, for example), you are better off using a dedicated library; for HTML, that would be jsoup, for instance.

fge
  • Java has many regex deficiencies. –  Mar 31 '14 at 05:27
  • @sin cite one... They work pretty well as far as I'm concerned. Of course, they are not a builtin part of the language like perl – fge Mar 31 '14 at 05:29
  • I would set up [the perl script as a CGI](http://perldoc.perl.org/CGI.html) and access it using [Apache HTTPClient](http://hc.apache.org/) as a web service. If you would like further assistance, post your code and I'd be happy to expand this into an answer. – hd1 Mar 31 '14 at 05:40
  • For Java7: recursion, conditionals, \g named/numbered backrefs, branch reset, named uni char. Java 6 has a few more. –  Mar 31 '14 at 06:01
  • "Input is structured" - do you mean if the regular expression needs to be able to count or otherwise keep state? – Thorbjørn Ravn Andersen Mar 31 '14 at 06:10
  • Hi, thanks for the answers. I was able to solve my problem using java.util.regex.Pattern and Matcher. Below is the Java code, in case anybody else needs it :) String pattern = "^\t*($)"; String hiddenKey = null; String hiddenValue = null; Pattern r = Pattern.compile(pattern); Matcher m = r.matcher(output); if (m.find()) { hiddenKey = m.group(2); hiddenValue = m.group(4); hiddenHashMap.put(hiddenKey, hiddenValue); } – user3479834 Mar 31 '14 at 06:13
  • @sin if you need a whole grammar, in Java, you won't use regexes; you will use something like, say, parboiled, antlr or javacc. Or a dedicated library for the particular file format. – fge Mar 31 '14 at 06:15
  • @ThorbjørnRavnAndersen bad wording; I really mean that if a dedicated package exists for this or that format (HTML, XML, CSV, etc etc) you are better off using a dedicated library than regexes. I mean, even perl has `HTML::Parser` – fge Mar 31 '14 at 06:16
  • @fge - Don't know what you mean. What in my list made you think of a whole grammar? Recursion is a misnomer; it's just a function call. Those items mentioned are a very big deal. –  Mar 31 '14 at 06:48