I am trying to parse tabular data from a text file into a DataTable.

The text file contains:

  PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
  11 root        1 171   52     0K    12K RUN     23:46 80.42% idle
  12 root        1 -20 -139     0K    12K RUN AS    0:56  7.96% swi7:

The code I have is:

using System;
using System.Data;
using System.IO;
using System.Linq;

public class Program
{
    static void Main(string[] args)
    {
        var lines = File.ReadLines("bb.txt").ToArray();
        // The first line holds the column names.
        var headerLine = lines[0];
        var dt = new DataTable();
        var columnsArray = headerLine.Split(" ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
        var dataColumns = columnsArray.Select(item => new DataColumn { ColumnName = item });
        dt.Columns.AddRange(dataColumns.ToArray());
        // Every remaining line is split on spaces and added as one row.
        for (int i = 1; i < lines.Length; i++)
        {
            var rowLine = lines[i];
            var rowArray = rowLine.Split(" ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
            var x = dt.NewRow();
            x.ItemArray = rowArray;
            dt.Rows.Add(x);
        }
    }
}

I get the error "Input array is longer than the number of columns in this table" on the second row, at

x.ItemArray = rowArray;

Of course, this is because the second row has "RUN AS" as the value of the 8th column. That value contains a space, which is also the split character for the whole row, so the length of the array no longer matches the number of columns.

What is a possible solution for this kind of situation?

Raas Masood
  • It looks like your file should be tab delimited but the tabs were replaced by spaces? Since it's neither fixed length nor delimited by a single character you might have to consider using regular expressions to parse it. – juharr Jan 24 '16 at 03:48
  • Is there any example of parsing tabular data using regex? How do I fetch the text under a header, for example all the USERNAME values? – Raas Masood Jan 24 '16 at 04:43
  • Can you change the text file? For instance, if the columns may contain two words, you could change it beforehand to `"RUN AS"` instead of `RUN AS`; this way is a lot cleaner. Otherwise, you might need to check every time whether your array contains more elements than expected and try to collapse every extra element from the 8th column onwards into your 7th column. – Ian Jan 24 '16 at 15:17
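
As a rough illustration of the regular-expression idea raised in the comments above, here is a minimal sketch. The class name `TopLineParser` and the pattern are hypothetical, and the pattern makes assumptions that go beyond the two sample rows: that only STATE and COMMAND can contain spaces, that TIME always looks like `digits:digits`, and that WCPU always ends in `%`. If other columns can contain spaces, the pattern would need further anchors.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TopLineParser
{
    // Column names in header order; each one is also a named group below.
    private static readonly string[] Columns =
    {
        "PID", "USERNAME", "THR", "PRI", "NICE", "SIZE",
        "RES", "STATE", "TIME", "WCPU", "COMMAND"
    };

    // Assumption: the first seven columns never contain spaces, and the
    // TIME (digits:digits) and WCPU (number followed by %) shapes are
    // distinctive enough to mark where the STATE value ends.
    private static readonly Regex RowPattern = new Regex(
        @"^\s*(?<PID>\S+)\s+(?<USERNAME>\S+)\s+(?<THR>\S+)\s+(?<PRI>\S+)" +
        @"\s+(?<NICE>\S+)\s+(?<SIZE>\S+)\s+(?<RES>\S+)\s+(?<STATE>.+?)" +
        @"\s+(?<TIME>\d+:\d+)\s+(?<WCPU>[\d.]+%)\s+(?<COMMAND>.+?)\s*$");

    // Returns one value per column, or null when the line does not match.
    public static string[] ParseRow(string line)
    {
        Match match = RowPattern.Match(line);
        if (!match.Success)
        {
            return null;
        }
        return Array.ConvertAll(Columns, name => match.Groups[name].Value);
    }
}
```

Picking all the USERNAME values would then amount to calling `ParseRow` on each data line and reading index 1 of every non-null result.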

1 Answer

Assuming that "RUN AS" is the only string that causes this condition, you could just run `var sanitizedLine = rowLine.Replace("RUN AS", "RUNAS")` before your split and then restore the space afterwards. If this happens more often, however, you may need to check that the array generated by the split matches the length of the header and, when it does not, combine the offending elements into a new array of the correct length before attempting to add it.
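
The length check described above could be sketched roughly as follows. The helper `RowNormalizer.MergeExtraTokens` is a hypothetical name, and the sketch assumes that exactly one column (identified by a zero-based `flexibleIndex` that the caller supplies) is the one that may contain spaces. If several columns can contain spaces, counting tokens alone cannot tell which column the surplus belongs to.

```csharp
using System;
using System.Linq;

public static class RowNormalizer
{
    // Collapses surplus tokens into a single column so that the row has
    // exactly columnCount values. flexibleIndex is the zero-based index of
    // the one column assumed to contain spaces (an assumption of this sketch).
    public static string[] MergeExtraTokens(string[] tokens, int columnCount, int flexibleIndex)
    {
        int extra = tokens.Length - columnCount;
        if (extra == 0)
        {
            return tokens; // Row already matches the header.
        }
        if (extra < 0)
        {
            throw new ArgumentException("Row has fewer tokens than the header has columns.");
        }
        if (flexibleIndex < 0 || flexibleIndex >= columnCount)
        {
            throw new ArgumentOutOfRangeException(nameof(flexibleIndex));
        }

        // Rejoin the tokens that belong to the flexible column.
        string merged = string.Join(" ", tokens.Skip(flexibleIndex).Take(extra + 1));

        return tokens.Take(flexibleIndex)
                     .Concat(new[] { merged })
                     .Concat(tokens.Skip(flexibleIndex + extra + 1))
                     .ToArray();
    }
}
```

In the question's loop this might be used as `x.ItemArray = RowNormalizer.MergeExtraTokens(rowArray, dt.Columns.Count, 7);`, where 7 is the zero-based position of STATE in the sample header.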

Ideally, however, you would instead have whatever is generating your input file wrap strings in quotes to make your life easier.

Jonathon Chase
  • RUN AS is not the only string. It is just one instance; any column value could contain a space-separated string, so there is no certain way of knowing the offending columns. – Raas Masood Jan 24 '16 at 03:42
  • And this is Linux-generated output, so its format can't be changed either. – Raas Masood Jan 24 '16 at 03:42
  • Right, if you're using ps you should be able to add your own delimiter to the output. There's an example here: http://stackoverflow.com/questions/3114741/generating-a-csv-list-from-linux-ps – Jonathon Chase Jan 24 '16 at 03:50
  • Additionally, if you're only going to use this particular output, I believe that STATE is the only column likely to offend. If you end up with `rowArray.Length > columnsArray.Length`, you should be able to programmatically determine which array elements are 'extra' and build a new array, merging `rowArray[7]` through `rowArray[rowArray.Length - 3]`, with the start being inclusive and the end being exclusive. – Jonathon Chase Jan 24 '16 at 04:03
  • This is true if STATE is the only possible offender but it could be any column. These are outputs of JUNOS – Raas Masood Jan 24 '16 at 04:45
  • I see, in that case, JUNOS allows you to pipe output of show commands into `display xml` or `display json`. It looks like you're using the `show system processes extensive` command, so using `show system processes extensive | display json` should give you a much cleaner output that would be easy to deserialize with JSON.NET into a simple container class. – Jonathon Chase Jan 24 '16 at 04:53
  • Display json isn't working. Display xml works but it returns the entire output in one big output node – Raas Masood Jan 24 '16 at 05:01
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/101500/discussion-between-jonathon-chase-and-raas-masood). – Jonathon Chase Jan 24 '16 at 05:03