I'm having trouble parsing a CSV file in the following format with the FileHelpers library. The field delimiter appears to be a space, but the fields themselves are sometimes quoted with quotation marks and other times enclosed in square brackets. I'm trying to write a RecordClass capable of parsing this.

Here's a sample from the CSV:

xxx.xxx.xxx.xxx - - [14/Jun/2008:18:04:17 +0000] "GET http://www.some_url.com HTTP/1.1" 200 73662339 "-" "iTunes/7.6.2 (Macintosh; N; Intel)"

It's an extract from an HTTP log we receive from one of our bandwidth providers.

bkaid
Richard

3 Answers

While I thank Marc Gravell and Jon Skeet for their input, my question was how to parse a file containing lines in the format described using the FileHelpers library (although I worded it badly to begin with, describing it as 'CSV' when in fact it isn't).

I have now found a way to do just this. It's not the most elegant method, but it gets the job done. In an ideal world, I wouldn't be using FileHelpers in this particular implementation ;)

For those who are interested, the solution is to create a FileRecord class as follows:

using FileHelpers;

[DelimitedRecord(" ")]
public sealed class HTTPRecord
{
    public String IP;

    // Fields with prefix 'x' are useless to me... we omit those in processing later
    public String x1;

    // Swallows the second "-" up to the opening '['
    [FieldDelimiter("[")]
    public String x2;

    // Text between '[' and ']'
    [FieldDelimiter("]")]
    public String Timestamp;

    // Swallows the space before the opening quote of the request
    [FieldDelimiter("\"")]
    public String x3;

    public String Method;
    public String URL;

    // Protocol, read up to the closing quote of the request
    [FieldDelimiter("\"")]
    public String Type;

    [FieldIgnored()]
    public String x4;

    // Swallows the space after the closing quote
    [FieldDelimiter(" ")]
    public String x5;

    public int HTTPStatusCode;

    public long Bytes;

    [FieldQuoted()]
    public String Referer;

    [FieldQuoted()]
    public String UserAgent;
}
bkaid
Richard

The obvious statement is "then it isn't CSV"...

I'd be tempted to use a quick regex to munge the date into the same escaping as everything else... on a line-by-line basis, something like:

string t = Regex.Replace(s, @"\[([^\]]*)\]", @"""$1""");

Then you should be able to use a standard parser using space as a delimiter (respecting quotes).
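
As a rough illustration of that two-step idea, here is a minimal sketch using only the .NET base class library. The class name LogLineMunger and the helper SplitRespectingQuotes are hypothetical names made up for this example; the splitter is a simple stand-in for "a standard parser" and assumes there are no escaped quotes inside a quoted field.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class LogLineMunger
{
    // Rewrites [..] as "..", using the same Regex.Replace call as above.
    public static string BracketsToQuotes(string line)
    {
        return Regex.Replace(line, @"\[([^\]]*)\]", @"""$1""");
    }

    // Hypothetical helper: splits on spaces, keeping "..." together as one field.
    // Assumption: quoted fields never contain an escaped quote.
    public static List<string> SplitRespectingQuotes(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;
        bool hasField = false;

        foreach (char c in line)
        {
            if (c == '"')
            {
                inQuotes = !inQuotes;
                hasField = true; // so that "" still yields an empty field
            }
            else if (c == ' ' && !inQuotes)
            {
                // A space outside quotes ends the current field, if there is one.
                if (hasField)
                {
                    fields.Add(current.ToString());
                    current.Clear();
                    hasField = false;
                }
            }
            else
            {
                current.Append(c);
                hasField = true;
            }
        }

        if (hasField)
        {
            fields.Add(current.ToString());
        }
        return fields;
    }
}
```

Calling SplitRespectingQuotes(BracketsToQuotes(line)) on the sample line from the question should give nine fields, with the timestamp, the request, the referer and the user agent each arriving as a single field with the surrounding quotes removed. Runs of consecutive spaces are collapsed rather than producing empty fields.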

Marc Gravell

In what way is that CSV? It looks like it's just a particular log file format which should be fairly easily parsed, but not by a CSV parser. In particular, you may well find that a regex works perfectly well. (You'd need to check what would happen to quotes in the user agent etc.)
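
For what it's worth, a regex along those lines might look like the sketch below. The pattern is an assumption based only on the single sample line in the question, and the class name AccessLogParser and the group names are made up for illustration. The user agent group is greedy up to the last quote on the line, which is one way to tolerate stray quotes inside it; the byte count also accepts "-" on the assumption that some servers log that for empty responses.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AccessLogParser
{
    // Assumed layout: ip, two ignored fields, [timestamp], "method url protocol",
    // status, bytes, "referer", "user agent".
    private static readonly Regex LinePattern = new Regex(
        @"^(?<ip>\S+) \S+ \S+ \[(?<timestamp>[^\]]+)\] " +
        @"""(?<method>\S+) (?<url>\S+) (?<protocol>[^""]+)"" " +
        @"(?<status>\d{3}) (?<bytes>\d+|-) " +
        @"""(?<referer>[^""]*)"" ""(?<agent>.*)""$",
        RegexOptions.Compiled);

    // Returns the Match; callers should check Success before reading groups.
    public static Match Parse(string line)
    {
        return LinePattern.Match(line);
    }
}
```

A caller would check Parse(line).Success and then read values such as Groups["ip"].Value or Groups["status"].Value. Lines that do not fit the assumed layout simply fail to match, so they can be logged and skipped.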

Jon Skeet
  • My mistake, stuck in CSV mode today as that's what I've been dealing with all morning. FileHelpers says that it reads "data from fixed length or delimited records in files"; I presumed this is delimited (by spaces), but that it has different field quotes. I'll look into a regex, thanks. – Richard Jun 05 '09 at 10:43