0

I have a bunch of files in the format following like this:

20130201:14:58:47 I search: xx ('ID'= (xxxxxxxx) )
20130201:14:58:56 I request: search | For ID | Search
20130201:14:58:56 I search: xx ('ID'= (xxxxxxx) )

Is there something in C# like there is in python called a restkey? I want to grab the first three items (Date time, I (called an action), search/request) and insert each of those into their own column in an SQL table then in the 4th column put the rest of the line.

In python this was quite easy but I couldn't deal with all the hoops I had to jump through to get it to insert into my sql table. So I moved to C# where connection to SSMS is easier.

smartmeta
  • 1,149
  • 1
  • 17
  • 38
B-M
  • 1,231
  • 1
  • 19
  • 41
  • Just a note, not all the lines look like these three. There are different letters where the 'I' is and different action completes where the 'search' and 'request' are. – B-M Feb 15 '13 at 23:06
  • Is the text "I search: xx~~~ in every line? Or is it just for demonstration? – XtrmJosh Feb 15 '13 at 23:09

5 Answers5

3

Although String.Split() is probably the nice and simple way around, I prefer using Regex for this kind of parsing. In this case a pattern like this:

(?<DateTime>\d{8}\:\d{2}\:\d{2}\:\d{2})\s(?<Action>\w)\s(?<SearchOrRequest>search|request)\:\s(?<RestOfTheLine>.*)

Gives you everything you need, nicely grouped into "DateTime", "Action", "SearchOrRequest" and "RestOfLine" match groups.

var pattern = "(?<DateTime>\d{8}\:\d{2}\:\d{2}\:\d{2})\s(?<Action>\w)\s(?<SearchOrRequest>search|request)\:\s(?<RestOfTheLine>.*)";
var regex = new Regex(pattern);
var match = regex.Match(inputString);

var theDate = match.Groups["DateTime"].Value;
var theAction = match.Groups["Action"].Value;
var theChoice = match.Groups["SearchOrRequest"].Value;
var theRest = match.Groups["RestOfTheLine"].Value;
Daniel Brückner
  • 59,031
  • 16
  • 99
  • 143
Mathieu Guindon
  • 69,817
  • 8
  • 107
  • 235
  • He calls out in a comment that there are more values than just "search" or "request" for the third match, but assuming they're all a single word you can just change that pattern to \w+ and be done – Preston Guillot Feb 15 '13 at 23:50
  • @PrestonGuillot nothing forbids *building* the pattern string with a `StringBuilder` either, this answer is a fishing rod, not a fish. – Mathieu Guindon Feb 16 '13 at 00:08
  • I'm not sure what you're resonding to. Your regex, as is, matches "search|request" for the third group, which is fine as long as those are the only two values. All I'm calling out is that he later stated that those aren't necessarilly the only two values, but replacing that match with "\w+" should suffice. I deleted my String.Split answer because I liked this one more. – Preston Guillot Feb 16 '13 at 00:14
0

Using string.Split method

string myString = "20130201:14:58:47 I search: xx (\'ID\'= (xxxxxxxx) )"

string[] strarr = myString.split(' ');

string theLetterIVariableThing = strarr[1];

string iddate = strarr[0];

StringBuilder sb = new StringBuilder();

for (int i = 1; i < strarr.Length; i++)
{
    sb.Append(strarr[i]);
    sb.Append(" ");
}

string trailingText = sb.ToString();

string id = iddate.split(':')[0];

sb.Clear();

for (int i = 1; i < 4; i++)
{
    sb.Append(iddate.split(':'))[i];
}

string date = sb.ToString();

I think this will work, but it's probably the long way around.

XtrmJosh
  • 889
  • 2
  • 14
  • 33
0

You can do it using the .NET function String.Split().

Assuming your date string is of fixed-length, this should work:

        //string inputStr = "20130201:14:58:47 I search: xx ('ID'= (xxxxxxxx) )";
        //string inputStr = "20130201:14:58:56 I request: search | For ID | Search";
        string inputStr = "20130201:14:58:56 I search: xx ('ID'= (xxxxxxx) )";

        string dateStr = inputStr.Substring(0, 17);
        string[] splitStr = inputStr.Substring(18).Split(new char[] { ':' });

        string actionStr = splitStr[0].Substring(0, splitStr[0].IndexOf(' '));
        string userStr = splitStr[0].Substring(2);
        string restStr = splitStr[1].TrimStart();

        // print out what we parsed
        Console.WriteLine(inputStr);
        Console.WriteLine(dateStr);
        Console.WriteLine(actionStr);
        Console.WriteLine(userStr);
        Console.WriteLine(restStr);

Output:

20130201:14:58:56 I search: xx ('ID'= (xxxxxxx) )
20130201:14:58:56
I
search
xx ('ID'= (xxxxxxx) )
alldayremix
  • 727
  • 8
  • 20
0

I tried a slightly different approach. I created a console program that can convert these files into fully qualified csv files. Then you can import into sql using ssms very easily.

static void Main(string[] args)
    {
        if (args.Length == 2)
        {
            using (StreamWriter sw = new StreamWriter(args[1]))
            {
                using (StreamReader sr = new StreamReader(args[0]))
                {
                    String line;

                    while ((line = sr.ReadLine()) != null)
                    {
                        int index = 0;
                        int oldIndex = 0;
                        string dateTime = null;
                        string action = null;
                        string task = null;
                        string details = null;

                        index = line.IndexOf(' ', oldIndex);
                        dateTime = line.Substring(oldIndex, index - oldIndex);
                        oldIndex = index + 1;

                        index = line.IndexOf(' ', oldIndex);
                        action = line.Substring(oldIndex, index - oldIndex);
                        oldIndex = index + 1;

                        index = line.IndexOf(':', oldIndex);
                        task = line.Substring(oldIndex, index - oldIndex);
                        oldIndex = index + 1;

                        details = line.Substring(oldIndex + 1);

                        sw.WriteLine("\"{0}\",\"{1}\",\"{2}\",\"{3}\"", dateTime, action, task, details);
                    }
                }

            }
        }
        else
        {
            Console.WriteLine("Usage: program <input> <output>");
        }
    }
Osa E
  • 1,711
  • 1
  • 15
  • 26
0

This is a case where a regular expression is probably the right thing to use.

var testVectors = new[]
{
    "20130201:14:58:47 I search: xx ('ID'= (xxxxxxxx) )",
    "20130201:14:58:56 I request: search | For ID | Search",
    "20130201:14:58:56 I search: xx ('ID'= (xxxxxxx) )"
};

var expression = @"^(?<TimeStamp>[0-9]{8}(:[0-9]{2}){3}) (?<Action>[^ ]+) (?<Type>search|request): (?<Rest>.*)$";

var regex = new Regex(expression);

foreach (var testVector in testVectors)
{
    var match = regex.Match(testVector);

    Console.WriteLine(match.Groups["Timestamp"]);
    Console.WriteLine(match.Groups["Action"]);
    Console.WriteLine(match.Groups["Type"]);
    Console.WriteLine(match.Groups["Rest"]);
}

The expression used makes some assumptions - what you called action is a sequence of characters not containing any spaces and only search and request are valid values for what I call type. But it should be easy to adopt the expression if any of the assumptions does not hold.

Daniel Brückner
  • 59,031
  • 16
  • 99
  • 143