
Does anyone know how to split this file with a regex?

1 TESTAAA      SERNUM    A DESCRIPTION
2 TESTBBB      ANOTHR    ANOTHER DESCRIPTION
3 TESTXXX      BLAHBL

The length (in characters) of each column:

{id} {firsttext} {serialhere} {description}
 4    22          6            30+

I'm planning to do it with a regex to store all my values in a string[] like this.

        using (StreamReader sr = new StreamReader("c:\\file.txt"))
        {
            string line = string.Empty;
            string[] source = null;
            while ((line = sr.ReadLine()) != null)
            {
                source = Regex.Split(line, @"(.{4})(.{22})(.{6})(.+)", RegexOptions.Singleline);
            }

        }

But I have two problems:

  1. The split creates six elements, with source[0] = "" and source[5] = "", when as you can see I have only four elements (columns) per line.
  2. On the 3rd line, whose 4th column is empty: if the line ends with blank spaces, an element is created for that column, but if there are no trailing spaces, the column is missed.
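Both problems follow from how `Regex.Split` works: it treats the pattern as a separator, returns the text before and after each match, and inserts every captured group in between; when the pattern does not match at all, it returns the whole line as a single element. A minimal C# sketch (top-level statements, C# 9+) with a shortened, hypothetical three-group pattern shows both effects:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical three-group pattern: 2 chars, 3 chars, then "one or more" chars,
// mirroring the shape of (.{4})(.{22})(.{6})(.+) from the question.
string[] SplitDemo(string line) => Regex.Split(line, @"(.{2})(.{3})(.+)");

// The pattern matches the whole line, so Split returns the empty text before
// the match, each captured group, and the empty text after it: five elements.
string[] full = SplitDemo("abcdefghij");
Console.WriteLine(full.Length);             // 5
Console.WriteLine(string.Join("|", full));  // |ab|cde|fghij|

// Too short for (.+) to match anything: no "separator" is found, so the
// whole line comes back as a single element.
string[] shortLine = SplitDemo("abcde");
Console.WriteLine(shortLine.Length);        // 1
```

With the question's four groups, the same rule gives six elements, and a line with no trailing spaces after the serial is returned unsplit.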

So what would be the best pattern or solution to split this with a regex? Any other solution would also be appreciated! I want to split by fixed width. Thanks.

Maximus Decimus
  • You don't want to split, but `.Match()` instead – zerkms Nov 04 '13 at 01:45
  • @DerekTomes thanks for your answer, but I had already searched Google before asking and didn't find a solution! Maybe someone can fix my pattern. – Maximus Decimus Nov 04 '13 at 02:00
  • Didn't you like the answers you got when you asked this question already? http://stackoverflow.com/questions/19649617/how-to-split-a-text-lines-by-fixed-width-c-sharp – Enigmativity Nov 04 '13 at 02:07
  • @Enigmativity thanks for your previous answer. But I really wanted to store each element into a string[] at the moment of reading a line from a StreamReader, and honestly I didn't understand your complex answer. – Maximus Decimus Nov 04 '13 at 02:47
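Following zerkms's suggestion to use `Match()` instead of `Split()`, here is a minimal sketch that keeps the question's fixed widths (4, 22, 6). It anchors the pattern and uses `.*` for the last column so a line with an empty description still matches. `ParseLine` is a hypothetical helper name, and trimming the padding is an assumption about what you want.

```csharp
using System;
using System.Text.RegularExpressions;

// Fixed-width pattern from the question, with two changes:
//  * anchored with ^ and $ so it must cover the whole line;
//  * the last group uses .* instead of .+ so an empty description still matches.
var columns = new Regex(@"^(.{4})(.{22})(.{6})(.*)$");

// Hypothetical helper: returns the four columns (trimmed),
// or null if the line is shorter than 4 + 22 + 6 = 32 characters.
string[]? ParseLine(string line)
{
    Match m = columns.Match(line);
    if (!m.Success)
        return null;

    // Group 0 is the whole match; groups 1..4 are the columns.
    var result = new string[4];
    for (int i = 0; i < result.Length; i++)
        result[i] = m.Groups[i + 1].Value.Trim();
    return result;
}

// Third sample line, padded to the column widths, with no trailing spaces.
string line3 = "3".PadRight(4) + "TESTXXX".PadRight(22) + "BLAHBL";
Console.WriteLine(string.Join("|", ParseLine(line3) ?? Array.Empty<string>())); // 3|TESTXXX|BLAHBL|
```

In the question's loop, `source = ParseLine(line);` would take the place of the `Regex.Split` call; lines shorter than 32 characters come back as `null`, so check for that before using the array.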

2 Answers


Using a regular expression seems like overkill, when you already know exactly where to get the data. Use the Substring method to get the parts of the string:

string[] source = new string[]{
  line.Substring(0, 4),
  line.Substring(4, 22),
  line.Substring(26, 6),
  line.Substring(32)
};

Edit:

To make it more configurable, you can use column widths from an array:

int[] cols = new int[] { 4, 22, 6 };

string[] source = new string[cols.Length + 1];
int ofs = 0;
for (int i = 0; i < cols.Length; i++) {
  source[i] = line.Substring(ofs, cols[i]);
  ofs += cols[i];
}
// the rest of the line goes into the last element
source[cols.Length] = line.Substring(ofs);
Guffa
  • It was an example. It's going to be a text file with different column sizes, and they can change over time. This solution is good but too hardcoded. – Maximus Decimus Nov 04 '13 at 01:58
  • @MaximusDecimus: I see. I added a more configurable solution above. – Guffa Nov 04 '13 at 02:04
  • OK, it works. With the cols array as it is, the 4th column comes back null, and if I add it like this, 4,22,6,30, it works, but I have a concern... if the last column doesn't match the length, it will crash. So I will need to be sure that the file matches the sizes! – Maximus Decimus Nov 04 '13 at 02:19
  • @MaximusDecimus: Did you forget the line after the loop? It takes the rest of the string and puts in the last item in the `source` array. – Guffa Nov 04 '13 at 02:22
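To address the concern about crashing on short lines, here is a minimal sketch of the same column-width loop with the offsets clamped to the line length, so missing columns come back as empty strings instead of throwing `ArgumentOutOfRangeException`. It also reads the file with a `StreamReader` and produces one `string[]` per line. `SplitFixedWidth` and `ReadFixedWidthFile` are hypothetical names, and trimming each field is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: splits a fixed-width line using the given column widths.
// Everything after the last width becomes one extra trailing column, as in the
// answer's loop. Offsets are clamped, so short lines yield "" instead of throwing.
string[] SplitFixedWidth(string line, int[] widths)
{
    var fields = new string[widths.Length + 1];
    int offset = 0;
    for (int i = 0; i < widths.Length; i++)
    {
        int start = Math.Min(offset, line.Length);
        int length = Math.Min(widths[i], line.Length - start);
        fields[i] = line.Substring(start, length).Trim();
        offset += widths[i];
    }
    // The rest of the line (possibly empty) is the last column.
    fields[widths.Length] = offset < line.Length ? line.Substring(offset).Trim() : "";
    return fields;
}

// Hypothetical helper: reads every line of a file and splits it as it is read.
List<string[]> ReadFixedWidthFile(string path, int[] widths)
{
    var rows = new List<string[]>();
    using (var reader = new StreamReader(path))
    {
        string? line;
        while ((line = reader.ReadLine()) != null)
            rows.Add(SplitFixedWidth(line, widths));
    }
    return rows;
}

Console.WriteLine(string.Join("|", SplitFixedWidth("3   TESTXXX", new[] { 4, 22, 6 }))); // 3|TESTXXX||
```

For the layout in the question, `ReadFixedWidthFile(@"c:\file.txt", new[] { 4, 22, 6 })` would return one four-element array per line; changing the column sizes only means changing the widths array.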

It's easier to just use the Substring method if you have fixed-length columns, e.g.:

string id = line.Substring(0, 4);
string firsttext = line.Substring(4, 22);
string serial = line.Substring(26, 6);
string description = line.Substring(32);

If you really want to use regular expressions, you can use the one below. Please note that it will only work if the data in the first 3 columns doesn't have spaces. Also, I assumed the first column is digits and the rest just alpha.

String input = "2 TESTBBB      ANOTHR    ANOTHER DESCRIPTION";
Match match = Regex.Match(input, @"^(\d*)\s*(\w*)\s*(\w*)\s*(.*)$");
if (match.Success)
{
    string id = match.Groups[1].Value;
    string firsttext = match.Groups[2].Value;
    string serial = match.Groups[3].Value;
    string description = match.Groups[4].Value;
}
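To illustrate the limitation mentioned above, here is a small sketch that runs the same pattern on one of the sample lines and on a hypothetical line whose firsttext contains a space. Every part of the pattern is optional, so it matches any single line; what changes is which text lands in which group. `Columns` is a hypothetical helper name.

```csharp
using System;
using System.Text.RegularExpressions;

// The whitespace-based pattern from the answer above.
const string Pattern = @"^(\d*)\s*(\w*)\s*(\w*)\s*(.*)$";

// Hypothetical helper: returns groups 1..4 as an array.
string[] Columns(string input)
{
    Match match = Regex.Match(input, Pattern);
    return new[] { match.Groups[1].Value, match.Groups[2].Value,
                   match.Groups[3].Value, match.Groups[4].Value };
}

// Sample line with no description: the last group is simply "".
Console.WriteLine(string.Join("|", Columns("3 TESTXXX      BLAHBL")));
// 3|TESTXXX|BLAHBL|

// A space inside firsttext shifts every later column to the right.
Console.WriteLine(string.Join("|", Columns("4 TEST AAA     SERNUM    A DESCRIPTION")));
// 4|TEST|AAA|SERNUM    A DESCRIPTION
```

This is why a pattern based on the known column widths, or plain Substring calls, is safer when a column can contain spaces.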
Szymon