
I have a file from which I want to parse specific values. How do I combine the three regular expressions below into one that returns a single group of entries for each test, whether or not it has measurements or errors, and that includes the measurements and errors when they are present? There can be any number of tests and any number of measurements in a test, but a test with an error contains only that single error and no other measurements. I have tried many different combinations without success. I suspect I need lookahead and alternation, but I haven't found the right combination. FYI, the regular expression is stored in a database and used by a C# application. Thanks in advance!

Input file:

<event>
<common>
    <event_start_time>2014-01-29T17:30:36</event_start_time>
    <operator>10586546</operator>
    <shift>A</shift>
    <program>PPM</program>
    <program_revision>eo01</program_revision>
</common>
<test_instance>
<teststart startid = "ABCDEF">
        <test>MB</test>
        <test_start_time>2014-01-29T17:30:39</test_start_time>
        <exe>HelloWorld</exe>
        <subtest>CheckVersion</subtest>
        <subtest_number>1</subtest_number>
    </teststart>
    <testend endid = "ABCDEF">
        <test_result>PASS</test_result>
        <test_duration duration_units="millisec">1000</test_duration>
    </testend>
    <teststart startid = "CDEFG">
        <test>MB</test>
        <test_start_time>2014-01-29T17:30:40</test_start_time>
        <exe>HelloWorld</exe>
        <subtest>Program1</subtest>
        <subtest_number>2</subtest_number>
    </teststart>
    <measurement measid = "CDEFG">
        <measurement_name>CycleCounter </measurement_name>
        <numeric_measurement> 1</numeric_measurement>
        <measurement_time>2014-01-29T17:30:50</measurement_time>
    </measurement>
    <measurement measid = "CDEFG">
        <measurement_name>Counter </measurement_name>
        <numeric_measurement> 1</numeric_measurement>
        <measurement_time>2014-01-29T17:30:50</measurement_time>
    </measurement>
    <testend endid = "CDEFG">
        <test_result>PASS</test_result>
        <test_duration duration_units="millisec">10000</test_duration>
    </testend>
    <teststart startid = "xYZABC">
        <test>MB</test>
        <test_start_time>2014-01-29T17:36:01</test_start_time>
        <exe>HelloWorld</exe>
        <subtest>Check2</subtest>
        <subtest_number>17</subtest_number>
    </teststart>
    <measurement measid = "xYZABC">
        <measurement_name>ERROR1</measurement_name>
        <error_code>31001717</error_code>
        <error_message>MB:FAILED_CHECK_TEST</error_message>
        <measurement_time>2014-01-29T17:36:50</measurement_time>
        <measurement_result>FAIL</measurement_result>
    </measurement>
    <testend endid = "xYZABC">
        <test_result>FAIL</test_result>
        <test_duration duration_units="millisec">49000</test_duration>
    </testend>
</test_instance>
<event_duration duration_units="sec">374</event_duration>
<event_result>FAIL</event_result>

To parse the test portion I use the following regular expression, which works:

\<teststart\sstartid\s=\s"
(?<tid>.*?)"\>
.*\n
.*\<test\>
(?<testid>.*?)\<
.*\n
.*\<test_start_time\>
(?<teststartdate>.*?)T
(?<teststarttime>.*?)\</.*\n
.*?
\<exe\>
(?<texe>.*?)\<.*\n
(.*?\n)*?
.*?\<testend.*?\n
.*?\<test_result\>
(?<result>.*?)\<.*\n
.*?duration_units="
(?<dunits>.{1}).*?
\>
(?<duration>.*?)\<

To parse the measurement data I use the following regular expression, which works:

.*?\<measurement\smeasid\s=\s"
(?<measid>.*?)"\>.*\r\n
(.*?\r\n)*?
.*?
\<measurement_name\>
(?<measurename>.*?)\<.*\r\n
.*?
\<numeric_measurement\>
(?<measurenum>[^/s].*?)\<.*\r\n
.*?
\<measurement_time\>
(?<measureDate>[^/s].*?)T
(?<measureTime>[^/s].*?)\<.*\r\n

To parse the error I use the following regular expression, which works:

.*?\<measurement\smeasid\s=\s"
(?<measid>.*?)"\>.*\r\n
.*?\<measurement_name\>
(?<measurename>.*?)\<.*\r\n
.*?\<error_code\>
(?<sterrcode>.*?)\<.*\r\n
.*?\<error_message\>
(?<sterrmsg>.*?)\<.*\r\n
.*?\<measurement_time\>
(?<measureDate>[^/s].*?)T
(?<measureTime>[^/s].*?)\<.*\r\n
.*?\<measurement_result\>
(?<measureResult>[^/s].*?)\<.*\r\n

DISCLAIMER: Yes, I know the input is XML, but I cannot change the application to deserialize it; the application uses regular expressions.

Unihedron

1 Answer


You can use named capturing groups inside zero-width assertions (lookaheads) and make each assertion optional. The pattern

(?=.*?(?<foo>a))?(?=.*?(?<bar>b))?

when applied to any of

ab
ba

will report

group "foo" = "a"
group "bar" = "b"

and when applied to

a

it will report

group "foo" = "a"
group "bar" = (does not exist)

and vice versa.
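
To make this concrete, here is a minimal C# sketch of the same idea. The first pattern is the toy example above. The second is only a hypothetical adaptation to a single <measurement> block from the question; the class name, the Capture helper, the trimmed sample input, and the tempered (?:(?!</measurement>).)*? guard are my own assumptions, not something taken from the asker's application.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OptionalLookaheadDemo
{
    // Toy pattern: both lookaheads are optional, so the match itself always
    // succeeds, and each group is set only when its lookahead finds its text.
    public static readonly Regex Toy =
        new Regex(@"(?=.*?(?<foo>a))?(?=.*?(?<bar>b))?");

    // Hypothetical adaptation to one <measurement> block. The tempered
    // (?:(?!</measurement>).)*? stops each lookahead at the end of the
    // current block, so values from a later block are never captured.
    public static readonly Regex Measurement = new Regex(
        @"<measurement\s+measid\s*=\s*""(?<measid>[^""]*)"">" +
        @"(?=(?:(?!</measurement>).)*?<numeric_measurement>\s*(?<measurenum>[^<]*)<)?" +
        @"(?=(?:(?!</measurement>).)*?<error_code>(?<sterrcode>[^<]*)<)?",
        RegexOptions.Singleline);

    // Returns the group's text, or a placeholder if the group did not participate.
    public static string Capture(Match m, string name)
    {
        Group g = m.Groups[name];
        return g.Success ? g.Value : "(does not exist)";
    }

    public static void Main()
    {
        foreach (string input in new[] { "ab", "ba", "a", "b" })
        {
            Match m = Toy.Match(input);
            Console.WriteLine("{0}: foo={1}, bar={2}",
                input, Capture(m, "foo"), Capture(m, "bar"));
        }

        // Trimmed-down sample: one numeric measurement, one error.
        string xml =
            "<measurement measid = \"CDEFG\">\n" +
            "    <numeric_measurement> 1</numeric_measurement>\n" +
            "</measurement>\n" +
            "<measurement measid = \"xYZABC\">\n" +
            "    <error_code>31001717</error_code>\n" +
            "</measurement>\n";

        foreach (Match m in Measurement.Matches(xml))
        {
            Console.WriteLine("{0}: measurenum={1}, sterrcode={2}",
                Capture(m, "measid"), Capture(m, "measurenum"), Capture(m, "sterrcode"));
        }
    }
}
```

In the calling code, check Group.Success rather than testing for an empty string, because a group that never participated and a group that captured nothing both have an empty Value. The same approach should extend to the remaining fields (measurement_name, error_message, and so on) by adding one optional lookahead per field, though I have not tried it against the full input file.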

Tomalak