
Let's say I have a class definition like this (fairly simple):

class Person
{
  public string Balance;
  public string Escrow;
  public string Acc;
  // .. and more 

}

And I need to parse this string into the class above:

BALANCE:      746.67     ESCROW PAYMENT:      271.22     LAST ACT:05/03/12
ACC: 10                   YTD DIVIDENDS:          .27   ENTRY DATE:12/20/10

The string comes in this weird format.

I am thinking of reading the string line by line and parsing each line's content, but I'd like to learn a better way if there is one. Two brains are better than one.

Tarik
  • You need to be able to distinguish data from the headers (property names). Is there any rule you could use to do this? Are the headers always preceded by at least two spaces? Or, can you assume that the header strings are always exactly the same? – phoog May 31 '12 at 19:07
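If the answer to phoog's question is yes, one option is to locate the headers first and treat everything between two headers as a value. Below is a minimal sketch. It assumes (the question does not confirm this) that every header is one or more upper-case words ending in `:` and sits either at the start of the line or after at least two spaces. `HeaderParser` and `ParseLine` are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HeaderParser
{
    // Assumed rule: a header is upper-case words separated by single spaces,
    // ending in ':', at the start of the line or after two or more spaces.
    private static readonly Regex HeaderRegex =
        new Regex(@"(?<=^|\s{2})[A-Z]+(?: [A-Z]+)*:");

    // Returns header -> value for one line. A value is the text between the
    // end of its header and the start of the next header, trimmed.
    public static Dictionary<string, string> ParseLine(string line)
    {
        var fields = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        MatchCollection headers = HeaderRegex.Matches(line);
        for (int i = 0; i < headers.Count; i++)
        {
            Match header = headers[i];
            int valueStart = header.Index + header.Length;
            int valueEnd = i + 1 < headers.Count ? headers[i + 1].Index : line.Length;
            string name = header.Value.TrimEnd(':');
            fields[name] = line.Substring(valueStart, valueEnd - valueStart).Trim();
        }
        return fields;
    }
}
```

A `Person` could then be filled with, for example, `Balance = fields["BALANCE"]` (where `fields` is the dictionary returned for the first line). Because the dictionary is keyed by header text, this also tolerates headers that move around within a line.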

2 Answers

If the string is always in that format, you should be able to split each line on the ':' and space characters (discarding the empty entries produced by runs of spaces) and index into the resulting arrays.

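using System;

public Person ParsePerson(string line1, string line2) 
{
  // Split on colons and spaces; RemoveEmptyEntries drops the blanks left by runs of spaces.
  char[] separators = new char[] { ':', ' ' };
  string[] fields1 = line1.Split(separators, StringSplitOptions.RemoveEmptyEntries);
  string[] fields2 = line2.Split(separators, StringSplitOptions.RemoveEmptyEntries);
  // fields1: BALANCE, 746.67, ESCROW, PAYMENT, 271.22, LAST, ACT, 05/03/12
  // fields2: ACC, 10, YTD, DIVIDENDS, .27, ENTRY, DATE, 12/20/10
  return new Person() {
    Balance = fields1[1],
    Escrow = fields1[4],
    Acc = fields2[1]
  };
}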
JaredPar
  • Might have better luck treating it as a fixed width file, rather than trying to find a good delimiter. It seems that the whitespace is there precisely to make it a fixed width file. – Servy May 31 '12 at 19:00
  • @Braveyard my bad, forgot about removing the whitespace – JaredPar May 31 '12 at 19:05
  • That looks good now :) but I don't think you need `.Trim()` anymore since spaces are already removed when you use `StringSplitOptions.RemoveEmptyEntries` – Tarik May 31 '12 at 19:08
  • @JaredPar: One more thing, the indexes are wrong, but your way led me to another solution, so I will mark your answer as the answer. – Tarik Jun 02 '12 at 00:33
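Servy's fixed-width suggestion could look roughly like the sketch below. It assumes every field always occupies the same columns on every line; `FixedWidth.ReadField` is a hypothetical helper, and the actual column offsets would have to be measured from the real report:

```csharp
using System;

public static class FixedWidth
{
    // Returns the trimmed text in columns [start, start + length) of a
    // fixed-width line. A line shorter than expected (for example because
    // trailing spaces were stripped) yields a shorter or empty field
    // instead of throwing.
    public static string ReadField(string line, int start, int length)
    {
        if (start >= line.Length)
            return string.Empty;
        int available = Math.Min(length, line.Length - start);
        return line.Substring(start, available).Trim();
    }
}
```

For example, `Balance = FixedWidth.ReadField(line1, 14, 11)`, where 14 and 11 are placeholder offsets, not values measured from the sample. Unlike splitting on delimiters, this keeps working when a value is blank, because the columns do not shift.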

You could use a regular expression to extract the value for each property from the source string like so:

using System.Text.RegularExpressions;
...
Regex balanceRegex = new Regex("(?<=BALANCE:\\s*)[^\\s]+");
string balance = balanceRegex.Match(source).Value;

This could be wrapped up in a function to search for any named property like this:

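private static string GetProperty(string source, string propertyName)
{
    // Escape the name so characters such as '.' or '(' are matched literally.
    string pattern = String.Format("(?<={0}:\\s*)[^\\s]+", Regex.Escape(propertyName));
    Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);
    // Match.Value is an empty string when the property is not found.
    return regex.Match(source).Value;
}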

Then you could populate a Person object like this:

Person person = new Person
{
    Balance = GetProperty(source, "Balance"),
    Escrow = GetProperty(source, "Escrow Payment"),
    Acc = GetProperty(source, "Acc")
};

You might need to tweak the regex if, for example, your property values contain whitespace, e.g. ACCOUNT NAME: MR SMITH, because [^\s]+ stops at the first space.
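One possible tweak, assuming values never contain two consecutive spaces and are separated from the next header by at least two spaces, is to capture a run of words joined by single spaces. `GetPropertyWithSpaces` is a hypothetical name, and this sketch uses a capture group instead of the lookbehind:

```csharp
using System.Text.RegularExpressions;

public static class MultiWordProperty
{
    // Like GetProperty, but the value may contain single spaces ("MR SMITH").
    // Assumption: a value never contains two consecutive spaces, so a run of
    // two or more spaces ends it.
    public static string GetPropertyWithSpaces(string source, string propertyName)
    {
        // \b stops "ACC" from matching inside a longer word such as "XACC".
        string pattern = @"\b" + Regex.Escape(propertyName) + @":\s*(\S+(?: \S+)*)";
        Match match = Regex.Match(source, pattern, RegexOptions.IgnoreCase);
        return match.Success ? match.Groups[1].Value : string.Empty;
    }
}
```

With this pattern, `MultiWordProperty.GetPropertyWithSpaces("ACCOUNT NAME: MR SMITH    BALANCE: 746.67", "Account Name")` returns `MR SMITH`, while single-word values such as BALANCE are still read as before.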

The regex approach is quite flexible, as it still works if the order of the properties or the amount of whitespace changes.

Tim S