So the problem is: I have a file with a *.sld extension. The file contains about 94 columns and 24,500 rows of numbers and can be read as a normal text file. What is the best way to access these numbers from a program? For example, I want all the numbers from column 15 to be stored as double. What options do I have? I have tried a DataTable, but loading the whole file with File.ReadAllLines takes about 150 MB of RAM, and I have to consider that the program will use more than one file like this. A piece of the *.sld file looks like this:

0.000    96.47     2.51     1.43     2.56     2.47     5.83 -> more columns
1.030    96.47     2.52     1.39     3.14     2.43     5.60  |
2.044    96.47     2.43     1.63     2.96     2.34     5.86  \/
3.058    96.47     2.47     0.76     2.59     2.44     5.62  more rows
4.072    96.47     2.56     1.39     2.99     2.38     5.89

Except that there are more columns and rows, as mentioned before. My solution was something like this:

//Read all lines of the opened file into a string array
string[] lines = System.IO.File.ReadAllLines(@OFD.FileName,Encoding.Default);
//Collapse each run of whitespace into a single space (done in a loop over i; loop not shown)
string partialLine = Regex.Replace(lines[i], @"\s+", " ");
//Split the line into a string array and add it to the DataTable
string[] partialLineElement = partialLine.Split(new char[]{' '}, StringSplitOptions.RemoveEmptyEntries);
fileData.Rows.Add(partialLineElement);

But I have problems accessing a whole column of data, and it's a string array, not double numbers. I need to add one column of this file to ZedGraph as a double[]. I have also tried assigning this DataTable to a dataGridView:

dataGridView1.DataSource = fileData;
dataGridView1.Refresh();

But how do I access a column as double[]? Any suggestions?

DejmiJohn
  • How are you using this data? That will determine if you need to bring the entire file's data into memory or not. If you can get by only ever reading in a row at a time, for example, then you can dramatically decrease the memory footprint. – Servy Jun 18 '13 at 17:52
  • Are all of the columns double values, or do different columns have different types? – Servy Jun 18 '13 at 17:52
  • Here is a good article about using regex to "pick out" some values from log files: http://msdn.microsoft.com/en-us/library/ms972965.aspx However, it could be slow because of your file size. You should consider parsing the file (once a day?), saving the parse results somewhere, and binding your data to the parse results instead. – granadaCoder Jun 18 '13 at 18:11

2 Answers

But how do I access a column as double[]? Any suggestions?

You can use File.ReadLines, which doesn't load the whole file into memory.

The ReadLines and ReadAllLines methods differ as follows: When you use ReadLines, you can start enumerating the collection of strings before the whole collection is returned; when you use ReadAllLines, you must wait for the whole array of strings be returned before you can access the array. Therefore, when you are working with very large files, ReadLines can be more efficient.

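// Requires: using System; using System.Globalization; using System.IO; using System.Linq;
// Indexes are zero-based: p[4] is the fifth column, so use p[14] for column 15.
double[] col4 = File.ReadLines(filename)
                .Select(line => line.Split(new char[]{' '},StringSplitOptions.RemoveEmptyEntries))
                .Select(p => double.Parse(p[4],CultureInfo.InvariantCulture))
                .ToArray();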

To get all columns (the result is an array of rows, indexed as allCols[row][column]):

double[][] allCols = File.ReadLines(filename)
                    .Select(line => line.Split(new char[]{' '},StringSplitOptions.RemoveEmptyEntries))
                    .Select(p => p.Select(s => double.Parse(s, CultureInfo.InvariantCulture)).ToArray())
                    .ToArray();
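
If only a few columns are needed (for example, an x and a y series for a graph), the same streaming idea can collect them in a single pass without materializing every row. The following is only a sketch under assumptions: `SldColumns` and `ReadColumns` are hypothetical names, the column indexes are zero-based, and every non-blank line is assumed to contain at least as many fields as the largest requested index.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;

static class SldColumns
{
    // Hypothetical helper: reads the requested zero-based columns in one pass.
    // Returns a dictionary mapping each column index to its values as double[].
    public static Dictionary<int, double[]> ReadColumns(string path, params int[] columnIndexes)
    {
        // One growable buffer per requested column (duplicates are ignored).
        var buffers = columnIndexes.Distinct()
                                   .ToDictionary(i => i, i => new List<double>());

        // File.ReadLines streams the file line by line instead of loading it whole.
        foreach (string line in File.ReadLines(path))
        {
            // A null separator array makes Split treat any whitespace as a delimiter.
            string[] parts = line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
            if (parts.Length == 0) continue; // skip blank lines

            foreach (var kv in buffers)
                kv.Value.Add(double.Parse(parts[kv.Key], CultureInfo.InvariantCulture));
        }

        return buffers.ToDictionary(kv => kv.Key, kv => kv.Value.ToArray());
    }
}
```

For example, `var cols = SldColumns.ReadColumns(filename, 0, 14);` would give `cols[0]` (the first column) and `cols[14]` (column 15) as two `double[]` arrays of equal length.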
I4V

I have used StreamReader in the past to import around 30,000 lines from a sample file, parsed each line into 30 different cells, and used that to import into a database. The reading and parsing took a matter of seconds. You could give that a shot. Just make sure to use it inside a "using" statement.

As for parsing column 15, I can't think of a better way than to just write a function.
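
Such a function might look roughly like the sketch below. It is an illustration under assumptions, not a tested implementation for *.sld files: `SldReader` and `ReadColumn` are hypothetical names, the column index is zero-based (so column 15 is index 14), and the numbers are assumed to use a dot as the decimal separator, as in the sample in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

static class SldReader
{
    // Hypothetical helper: returns one column of a whitespace-separated text file.
    // columnIndex is zero-based, so column 15 corresponds to index 14.
    public static double[] ReadColumn(string path, int columnIndex)
    {
        var values = new List<double>();

        // The using statement disposes the reader (and closes the file) even on error.
        using (var reader = new StreamReader(path))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // A null separator array makes Split treat any whitespace as a delimiter.
                string[] parts = line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
                if (parts.Length <= columnIndex) continue; // skip blank or short lines

                values.Add(double.Parse(parts[columnIndex], CultureInfo.InvariantCulture));
            }
        }

        return values.ToArray();
    }
}
```

Only one line is held in memory at a time, plus the values of the requested column, so the footprint stays far below that of loading every line up front.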

khinkle