I'm currently parsing some HTTP request headers from a log file. I need to split them up and create a dictionary for easier lookups. The code I'm using is:

public static Dictionary<string, string> CreateLookupDictionary(string input)
    {
        Debug.WriteLine(input);
        return input.Split(new char[] { '\n' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(x => x.Split(new string[] {": "}, StringSplitOptions.None))
            .ToDictionary(x => x[0], x => x[1], StringComparer.InvariantCultureIgnoreCase);
    }

This is working for 99% of the headers, but then...

...
Keep-Alive: timeout=20
Expires: Sat, 04 Jun 2011 18:43:08 GMT
Cache-Control: max-age=31536000
Cache-Control: public
Accept-Ranges: bytes
...

By the time the second Cache-Control line is processed the key already exists, so ToDictionary throws an ArgumentException about a duplicate key.

Is there an elegant way to overwrite the value that's already there? I don't want to rewrite the LINQ unless I really have to.

Thanks

Kind Contributor
Tony
  • HTTP requires `\r\n` be used to separate headers but your code only splits on `\n`. This means you'll have trailing `\r` characters in your header values. – Dai Jun 28 '19 at 15:19
  • Normally you'd be 100% correct, but in this case the system that produced the logs has already mangled up the headers, so they just have a '\n' at the end. Thanks for the tip though! – Tony Jun 28 '19 at 15:25
  • Also, someone might want to add a class file like this [https://social.msdn.microsoft.com/Forums/en-US/d9e21ef5-fdb6-4af0-970d-c15a668638c2/how-to-parse-an-http-request-from-byte-or-string-?forum=ncl] then your answer would be a single-line without any LINQ. (Adding a code file might not be considered "elegant" - not already included in the BCL) – Kind Contributor Jun 28 '19 at 15:51

3 Answers

  • .ToDictionary requires each key to be unique, by design.
  • LINQ did not have a .DistinctBy( x => x.y ) method before .NET 6, but we can get similar behaviour with .GroupBy( x => x.y ).Select( grp => grp.Last() ). This keeps only the last item for each y value and discards all earlier items with the same y value.

So if you group by the HTTP header name first and then select the last item in each group then that will get you what you want:

// Using cached static fields to avoid unnecessary array allocation.
// Both "\r\n" and a bare "\n" are accepted, since the logs in the question only use "\n".
static readonly String[] _splitOnLines = new String[] { "\r\n", "\n" };
static readonly String[] _splitHeader  = new String[] { ": " };

public static Dictionary<String,String> CreateLookupDictionary(String input)
{
    Debug.WriteLine(input);
    return input
        .Split( _splitOnLines , StringSplitOptions.RemoveEmptyEntries )
        .Select( line => line.Split( _splitHeader, StringSplitOptions.None ) )
        .Where( arr => arr.Length == 2 ) // filter out invalid lines, if any
        .Select( arr => ( name: arr[0], value: arr[1] ) ) // using C# 7 named tuples for maintainability
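        .GroupBy( header => header.name, StringComparer.InvariantCultureIgnoreCase ) // same comparer as the dictionary, so names differing only by case land in one group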
        .Select( duplicateHeaderGroup => duplicateHeaderGroup.Last() )
        .ToDictionary( header => header.name, header => header.value, StringComparer.InvariantCultureIgnoreCase );
}

Alternatively, use a custom aggregation that assigns through the dictionary's indexer (the Item property setter), which overwrites an existing key instead of throwing. This approach may be faster than my previous example if duplicates are rare.

public static Dictionary<String,String> CreateLookupDictionary(String input)
{
    Debug.WriteLine(input);
    return input
        .Split( _splitOnLines , StringSplitOptions.RemoveEmptyEntries )
        .Select( line => line.Split( _splitHeader, StringSplitOptions.None ) )
        .Where( arr => arr.Length == 2 )
        .Select( arr => ( name: arr[0], value: arr[1] ) )
        .Aggregate(
            new Dictionary<String,String>( StringComparer.InvariantCultureIgnoreCase ),
            ( d, header ) =>
            {
                d[ header.name ] = header.value;
                return d;
            }
        );
}
Dai
  • Rather than aggregating like that, you could create a collection of `KeyValuePair<,>` and pass to the ctor. Also, use the overload of `String.Split()` that includes a count of results. – Jeff Mercado Jun 28 '19 at 15:37
  • Thanks for this. I went for your first solution as it also included the ValueTuples, which is something I've been meaning to look at for a while. – Tony Jun 28 '19 at 15:41
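
The `String.Split()` overload with a count, mentioned in the comments above, can be sketched roughly as follows. This is a minimal sketch and not part of the original answer: the names `HeaderParser` and `Parse` are hypothetical, and it assumes lines end in `\n` or `\r\n`. Splitting into at most two parts keeps a value that itself contains ": " intact, and the indexer assignment lets the last duplicate win. It does not cover the `KeyValuePair<,>` constructor idea from the same comment.

```csharp
using System;
using System.Collections.Generic;

public static class HeaderParser // hypothetical name, for illustration only
{
    private static readonly string[] LineSeparators = { "\r\n", "\n" };
    private static readonly string[] NameValueSeparator = { ": " };

    public static Dictionary<string, string> Parse(string input)
    {
        var headers = new Dictionary<string, string>(StringComparer.InvariantCultureIgnoreCase);

        foreach (string line in input.Split(LineSeparators, StringSplitOptions.RemoveEmptyEntries))
        {
            // Split into at most 2 parts so a value containing ": " is not cut short.
            string[] parts = line.Split(NameValueSeparator, 2, StringSplitOptions.None);
            if (parts.Length != 2)
            {
                continue; // skip lines that have no ": " separator
            }

            // The indexer overwrites an existing key instead of throwing.
            headers[parts[0]] = parts[1];
        }

        return headers;
    }
}
```

With the headers from the question, `HeaderParser.Parse(input)["Cache-Control"]` would hold the value of the last Cache-Control line.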

.Net Core 2.0 has a solution: see https://stackoverflow.com/a/54075677/887092

Use the Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpParser class. With this approach you will be able to handle many unforeseen edge cases.

Here's the pseudocode (untested) - and this will only work with .Net Core (apparently)

using Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http;
using System;
using System.Buffers;
using System.Collections.Generic;
using System.Text;

public class ExampleUsage 
{
    public static void Main(string[] args)
    {
        string requestString =
        @"POST /resource/?query_id=0 HTTP/1.1
        Host: example.com
        User-Agent: custom
        Accept: */*
        Connection: close
        Content-Length: 20
        Content-Type: application/json

        {""key1"":1, ""key2"":2}";

        // Parse is an instance method, so it needs a Parser instance.
        var headerResult = new Parser().Parse(requestString);
    }
}

public class Parser : IHttpHeadersHandler
{
    private Dictionary<string, string> result = null;

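    public Dictionary<string, string> Parse(string requestString)
    {
        // Case-insensitive keys, since HTTP header names are case-insensitive.
        result = new Dictionary<string, string>(StringComparer.InvariantCultureIgnoreCase);

        byte[] requestRaw = Encoding.UTF8.GetBytes(requestString);
        ReadOnlySequence<byte> buffer = new ReadOnlySequence<byte>(requestRaw);
        // The type argument is the handler type (this class). Still untested:
        // depending on the Kestrel version, the handler may also have to implement
        // IHttpRequestLineHandler for ParseRequestLine to accept it.
        HttpParser<Parser> parser = new HttpParser<Parser>();

        parser.ParseRequestLine(this, buffer, out var consumed, out var examined);
        buffer = buffer.Slice(consumed);

        parser.ParseHeaders(this, buffer, out consumed, out examined, out var b);
        buffer = buffer.Slice(consumed);

        return result;
    }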

    public void OnHeader(Span<byte> name, Span<byte> value)
    {
        // Indexer assignment overwrites a repeated header (e.g. Cache-Control) instead of throwing like Add.
        result[Encoding.UTF8.GetString(name)] = Encoding.UTF8.GetString(value);
    }
}
Kind Contributor
  • This approach requires taking a dependency on `Microsoft.AspNetCore.Server.Kestrel` which may be undesirable for the OP. – Dai Jun 28 '19 at 15:24
  • I've already looked at that, but this is written in .net, and I don't really want to mix .net and .core stuff. Plus I don't know how to do that, but thank you :) – Tony Jun 28 '19 at 15:25
  • Good point. But it's good to have this for others too. OP didn't specify either way. "Elegant" is up for interpretation. – Kind Contributor Jun 28 '19 at 15:25
  • @Tony Fair enough. In .Net Framework, you can typically include any .Net core library. From what I read, Microsoft is moving toward .Net Standard unification. But I believe the separate CLRs will live on for a long time (.Net Framework CLR and .Net Core CLR) – Kind Contributor Jun 28 '19 at 15:27
  • @Todd Given it's an "internal" type (even if it is `public`), its existence is not guaranteed in future versions of the package, and the authors are not obligated to support consumers. – Dai Jun 28 '19 at 15:30
  • agreed, but it's also open source. I'm not saying that I'm expecting this to be the accepted answer. It might become obsolete after the OP is updated – Kind Contributor Jun 28 '19 at 15:42
  • my answer doesn't use LINQ - that's a plus – Kind Contributor Jun 28 '19 at 15:44

Personally I would use .ToLookup instead, to preserve the multiple values for the same key. .ToLookup does not throw on duplicate keys; it groups the values for each key into an IEnumerable<V>, in this case IEnumerable<string>:

public static ILookup<string, string> CreateLookup(string input)
{
    Debug.WriteLine(input);
    return input.Split(new char[] { '\n' }, StringSplitOptions.RemoveEmptyEntries)
        .Select(x => x.Split(new string[] {": "}, StringSplitOptions.None))
        .ToLookup(x => x[0], x => x[1], StringComparer.InvariantCultureIgnoreCase);
}
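
Reading values back out of the lookup might look like the sketch below. It is only an illustration under stated assumptions: `LookupExample`, `BuildSample` and `GetCombined` are hypothetical names, the sample data is taken from the headers in the question, and joining the values with ", " is just one way to present them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LookupExample // hypothetical name, for illustration only
{
    // Builds a small lookup from the headers shown in the question.
    public static ILookup<string, string> BuildSample()
    {
        var pairs = new[]
        {
            new KeyValuePair<string, string>("Cache-Control", "max-age=31536000"),
            new KeyValuePair<string, string>("Cache-Control", "public"),
            new KeyValuePair<string, string>("Accept-Ranges", "bytes"),
        };
        return pairs.ToLookup(p => p.Key, p => p.Value, StringComparer.InvariantCultureIgnoreCase);
    }

    // Joins every value stored under a header name into one string.
    // Indexing a lookup with a missing key yields an empty sequence rather than
    // throwing, so an unknown name gives an empty string here.
    public static string GetCombined(ILookup<string, string> headers, string name)
    {
        return string.Join(", ", headers[name]);
    }
}
```

Unlike a dictionary, indexing the lookup with a name that was never seen does not throw, so callers can skip the usual ContainsKey check.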
Dan D