I have to parse a String to create a PathSegmentCollection. The string consists of numbers separated by commas and/or any whitespace (newline, tab, etc.); the numbers may also be written in scientific notation.

This is an example: "9.63074,9.63074 -5.55708e-006 0 ,0 1477.78"

And the points are: P1(9.63074, 9.63074), P2(-0.00000555708, 0), P3(0, 1477.78)

To extract numbers I use a regular expression:

Dim RgxDouble As New Regex("[+-]?\b[0-9]+(\.[0-9]+)?(e[+-]?[0-9]+)?\b")
Dim Matches As MatchCollection = RgxDouble.Matches(.Value)
Dim PSegmentColl As New PathSegmentCollection
Dim PFigure As New PathFigure

With Matches

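  ' Coordinates come in x/y pairs, so an even, non-zero count is required.
  If .Count < 2 OrElse .Count Mod 2 <> 0 Then Exit Sub

  ' Parse explicitly with the invariant culture (Imports System.Globalization)
  ' so that "." is always read as the decimal separator.
  Dim Inv As CultureInfo = CultureInfo.InvariantCulture
  PFigure.StartPoint = New Point(Double.Parse(.Item(0).Value, Inv), Double.Parse(.Item(1).Value, Inv))

  For i As Integer = 2 To .Count - 1 Step 2
    Dim x As Double = Double.Parse(.Item(i).Value, Inv), y As Double = Double.Parse(.Item(i + 1).Value, Inv)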
    PSegmentColl.Add(New LineSegment With {.Point = New Point(x, y)})
  Next

End With

It works, but I have to parse about a hundred thousand (or more) strings, and this approach is too slow. I want a more efficient solution, keeping in mind that most of the time the numbers are not written in scientific notation. If you think it's a better way, I have no problem using an assembly written in C++/CLI that calls unmanaged C/C++ code, or C# unsafe code.

gliderkite
  • You should add tags with the languages you are interested in - I don't think many people track "string" as a topic, whereas many follow C#, VB or C++ (or any other language you think is relevant for your question). – assylias Apr 07 '12 at 10:02
  • Done (.NET and C++/CLI are relevant), thanks. – gliderkite Apr 07 '12 at 10:10
  • By the way, `-5.55708e-006` is not `-0,555708`, it is `-0,00000555708`. – Vlad Apr 07 '12 at 10:13

1 Answer

Why are you trying to parse the path markup syntax yourself? It's complicated, and it may change (or at least be extended) in the future. WPF can do this for you: http://msdn.microsoft.com/en-us/library/system.windows.media.geometry.parse.aspx, so it's better to let the framework do the work.


Edit:
If the parsing is your bottleneck, you can try parsing the string yourself. I would recommend trying the following and checking whether it's fast enough:

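// requires: using System; using System.Globalization;
// Create the separators only once. Tab and newlines are included because the
// input may use any whitespace; extend the list if other characters can occur.
char[] separators = new char[] { ' ', ',', '\t', '\r', '\n' };
// pattern is the input string holding the numbers
var parts = pattern.Split(separators, StringSplitOptions.RemoveEmptyEntries);
double firstInPair = 0.0;
for (int i = 0; i < parts.Length; i++)
{
    // The invariant culture makes "." the decimal separator on every machine;
    // NumberStyles.Float accepts a sign, a decimal point and an exponent.
    double number = double.Parse(parts[i], NumberStyles.Float, CultureInfo.InvariantCulture);
    if (i % 2 == 0)
    {
        firstInPair = number;
        continue;
    }
    double secondInPair = number;
    // do whatever you want with the pair (firstInPair, secondInPair) ...
}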
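
If `Split` itself turns out to be a noticeable cost, a single pass over the characters is one alternative to try. The following is only a sketch and not part of the original answer: `NumberPairParser` and `Parse` are hypothetical names, the tuple syntax assumes C# 7 or later, and it has not been benchmarked against the `Split` version. It avoids the intermediate array of parts but still allocates one substring per number, so measure before adopting it.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper for illustration; the names are assumptions.
public static class NumberPairParser
{
    // Returns the (x, y) pairs found in text. Numbers may be separated by
    // commas and/or any whitespace. Throws FormatException if a number is
    // malformed or if the count of numbers is odd.
    public static List<(double X, double Y)> Parse(string text)
    {
        var pairs = new List<(double X, double Y)>();
        bool haveFirst = false;
        double first = 0.0;
        int i = 0;

        while (i < text.Length)
        {
            // Skip any run of separators.
            while (i < text.Length && IsSeparator(text[i])) i++;
            if (i == text.Length) break;

            // Advance to the end of the current token.
            int start = i;
            while (i < text.Length && !IsSeparator(text[i])) i++;

            // The invariant culture always treats "." as the decimal separator;
            // NumberStyles.Float allows a sign, a decimal point and an exponent.
            double value = double.Parse(
                text.Substring(start, i - start),
                NumberStyles.Float,
                CultureInfo.InvariantCulture);

            if (haveFirst)
            {
                pairs.Add((first, value));
                haveFirst = false;
            }
            else
            {
                first = value;
                haveFirst = true;
            }
        }

        if (haveFirst)
            throw new FormatException("Odd number of coordinates.");

        return pairs;
    }

    private static bool IsSeparator(char c)
    {
        return c == ',' || char.IsWhiteSpace(c);
    }
}
```

Calling `NumberPairParser.Parse("9.63074,9.63074 -5.55708e-006 0 ,0 1477.78")` yields three pairs, which can then be turned into a start point and line segments.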
Vlad
  • Because it is not path markup syntax. It's only a sequence of numbers, so the Parse method cannot parse my string. But if I change the string perhaps it might work; I have to test it. – gliderkite Apr 07 '12 at 10:34
  • @gliderkite: well, isn't the string coming from some path description? If not, try to prepend the string with "L", so that it will be a valid path. – Vlad Apr 07 '12 at 10:43
  • I have to insert 'M' as first character, 'L' after the second number and 'z' as last character. Any idea? – gliderkite Apr 07 '12 at 10:47
  • I decided to use this solution for now. However, this: `MyPathGeometry.AddGeometry(Geometry.Parse(MyString))` is my new application bottleneck. I'll have to find an alternative sooner or later. – gliderkite Apr 07 '12 at 19:19
  • @gliderkite: you could just insert "M 0,0 L" at the beginning, so you wouldn't need to parse the string in order to find the _second_ number. Then you'll perhaps need to ignore the initial parsed segment. – Vlad Apr 11 '12 at 09:31
  • I tested it, but unfortunately creating a new Geometry from the points has the same performance as using the [Parse](http://msdn.microsoft.com/en-us/library/system.windows.media.geometry.parse(v=vs.90).aspx) method. I use [StreamGeometryContext](http://msdn.microsoft.com/en-us/library/ms635540(v=vs.90).aspx) to describe the geometry. – gliderkite Apr 14 '12 at 17:40
  • @gliderkite: asked question about your bottleneck [here](http://stackoverflow.com/questions/10161572/parse-without-string-split). – Vlad Apr 15 '12 at 11:30