
I have a question that made me think about how to improve the speed and memory usage of a system. I'll describe it with an example. I have a file that contains strings like this:

<e>Customer</e>
    <a1>Customer Id</a1>
    <a2>Customer Name</a2>
<e>Person</e>

It is similar to an XML file.

My current solution: when I read <e>Customer</e>, I scan forward to the nearest following tag and then take the substring from <e>Customer</e> up to that tag.

This makes the system do a lot of processing, and I only use regular expressions for it. I'm thinking of doing the same as a real compiler, which works in phases (lexical analysis, then parsing).

Any ideas?

Thanks in advance!

Trung Huynh
  • Why not define an actual XML document (you can define your own tags, so I think what you've shown may already be valid...), then just run it through one of the XML parsing libraries? – Clockwork-Muse Apr 05 '13 at 16:07
  • Thanks for the quick reply; it's just a challenge for me. I'm looking for an open-source XML parser whose source I can read, but all the open-source ones I've found use code from the JDK. – Trung Huynh Apr 05 '13 at 16:29
  • Although technically the JDK itself seems to be for Linux (deployment), a fair bit of the Java code itself should still be 'portable' - check out [OpenJDK](http://openjdk.java.net/) which is an open-source version of the JDK. – Clockwork-Muse Apr 05 '13 at 17:29

3 Answers


Regular expressions are not the right tool for parsing nested structures like this. Since your file looks a lot like XML, it may make sense to add what's missing to make it well-formed XML (e.g. a single root element enclosing the content, and optionally an XML declaration header) and feed the result to an XML parser.

XML parsers are optimized for processing large volumes of data quickly (especially the SAX kind). You should see a significant improvement in performance if you switch from running regular expressions over large volumes of text to parsing it as XML.
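A minimal sketch of the SAX approach, assuming the question's snippet is wrapped in a hypothetical <root> element so that it is well-formed, and that the goal is simply to collect each element name with its text. It uses only the JDK's built-in javax.xml.parsers API; the class name SaxSketch and the method collect are illustrative, not part of any library.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxSketch {

    // Streams through the document once and records "element=text" pairs.
    // SAX never builds the whole tree in memory, which is why it scales to large files.
    static List<String> collect(String xml) throws Exception {
        List<String> out = new ArrayList<>();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
            new DefaultHandler() {
                private final StringBuilder text = new StringBuilder();

                @Override
                public void startElement(String uri, String localName, String qName, Attributes atts) {
                    text.setLength(0); // start collecting text for this element
                }

                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length); // may be called several times per element
                }

                @Override
                public void endElement(String uri, String localName, String qName) {
                    String value = text.toString().trim();
                    if (!value.isEmpty()) {
                        out.add(qName + "=" + value);
                    }
                    text.setLength(0);
                }
            });
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical <root> wrapper added so the question's snippet is well-formed XML.
        String xml = "<root>"
            + "<e>Customer</e>"
            + "<a1>Customer Id</a1>"
            + "<a2>Customer Name</a2>"
            + "<e>Person</e>"
            + "</root>";
        collect(xml).forEach(System.out::println);
    }
}
```

Running main should print e=Customer, a1=Customer Id, a2=Customer Name and e=Person, one per line. A real handler would act on each element as it arrives instead of collecting strings.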

Sergey Kalinichenko

If you really don't want to use one of the free and reliable XML parsers, then a truly fast solution will almost certainly involve a state machine.

See the Stack Overflow question "How to create a simple state machine in Java" for a good start.
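To illustrate the state-machine idea, here is a minimal, hypothetical lexer sketch (the class TagLexer and its "START:/TEXT:/END:" token format are my own illustration, not taken from the linked question). It reads the question's format one character at a time and switches between two states, inside text and inside a tag, so each character is examined exactly once. It assumes no attributes, comments, CDATA, or entity escapes; a separate parser phase would then consume the token list.

```java
import java.util.ArrayList;
import java.util.List;

public class TagLexer {

    // The two states the scanner can be in while walking the input.
    enum State { TEXT, TAG }

    // Turns "<e>Customer</e>" into tokens "START:e", "TEXT:Customer", "END:e"
    // in a single left-to-right pass, without regular expressions or re-scanning.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder buf = new StringBuilder();
        State state = State.TEXT;

        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (state) {
                case TEXT:
                    if (c == '<') {
                        flushText(buf, tokens); // text before a tag is complete
                        state = State.TAG;
                    } else {
                        buf.append(c);
                    }
                    break;
                case TAG:
                    if (c == '>') {
                        String name = buf.toString();
                        if (name.startsWith("/")) {
                            tokens.add("END:" + name.substring(1));
                        } else {
                            tokens.add("START:" + name);
                        }
                        buf.setLength(0);
                        state = State.TEXT;
                    } else {
                        buf.append(c);
                    }
                    break;
            }
        }
        if (state == State.TAG) {
            throw new IllegalArgumentException("unterminated tag: <" + buf);
        }
        flushText(buf, tokens); // trailing text after the last tag
        return tokens;
    }

    // Emits a TEXT token for non-blank content and clears the buffer.
    private static void flushText(StringBuilder buf, List<String> tokens) {
        String text = buf.toString().trim();
        if (!text.isEmpty()) {
            tokens.add("TEXT:" + text);
        }
        buf.setLength(0);
    }

    public static void main(String[] args) {
        String input = "<e>Customer</e>\n"
            + "    <a1>Customer Id</a1>\n"
            + "    <a2>Customer Name</a2>\n"
            + "<e>Person</e>";
        for (String token : tokenize(input)) {
            System.out.println(token);
        }
    }
}
```

Because whitespace-only text between tags is dropped, the indentation in the question's file does not produce tokens.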

Please be sure to have a very good reason for taking this route.

OldCurmudgeon

Don't invest the time in writing your own XML lexer/parser (it's not worth it); use what is already out there.

For example, http://www.mkyong.com/tutorials/java-xml-tutorials/ is a good tutorial; just search Google.
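As an example of using what the JDK already ships, here is a short DOM sketch. The <root> wrapper, the class DomSketch and the method entityNames are illustrative assumptions. DOM loads the whole tree into memory, which is convenient for small files; for very large inputs a streaming (SAX) parser is the lighter option.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomSketch {

    // Parses the whole document into an in-memory tree and returns the text of every <e> element.
    static List<String> entityNames(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        NodeList nodes = doc.getElementsByTagName("e"); // returned in document order
        List<String> names = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            names.add(nodes.item(i).getTextContent().trim());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical <root> wrapper so the question's snippet is well-formed XML.
        String xml = "<root><e>Customer</e><a1>Customer Id</a1>"
            + "<a2>Customer Name</a2><e>Person</e></root>";
        System.out.println(entityNames(xml)); // [Customer, Person]
    }
}
```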

Quonux