10

I would like to be able to parse date and/or date-time values from a csv file and get their DateTime format (or in Excel terms NumberFormat).

For example I would like to pass "2008-06-07 00:00:00.000" to a function and have it return something like "yyyy-MM-dd hh:mm:ss.000".

The catch is that there can be many possible date formats in the csv files and that they are not known ahead of time so I can't use DateTime.TryParseExact() since it requires you to know the DateTime format string ahead of time to test to see if it works with a specific value.

By knowing the date format, I could set it to be the custom number format in Excel and put in the value and it will appear exactly as it was in the text of the csv file while also having the ability to use it in Excel formulas.

Of course there could be could be ambiguity so ideally it would be nice to get a list possible date formats and then check against multiple dates to remove some of the options by looking at the intersections of the lists of possible date formats.

I have just discovered the NodaTime API and it looks like it may be able to do something like this, but I haven't seen example code that will achieve this as most of the questions regarding similar queries were asked for earlier versions where it didn't seem to be supported.

Any other methods of doing this would be greatly appreciated as well.

Edit

I think the only way to do this, as Jon Skeet said, is to have a list of the common patterns and test against them. I can get the set of all cultured date-time patterns by doing the following. However this list will be missing some custom patterns that may be common in use. For example (I don't know if this is commonly used or not) yyyy/dd/MM is not in the list.

    private static HashSet<string> _patterns; 
    public static HashSet<string> AllCulturedDateTimePatterns
    {
        get
        {
            if (_patterns != null)
                return _patterns;

            _patterns = new HashSet<string>();
            var cultures = CultureInfo.GetCultures(CultureTypes.AllCultures);
            foreach (var culture in cultures)
            {
                _patterns.UnionWith(culture.DateTimeFormat.GetAllDateTimePatterns());
            }
            return _patterns;
        }
    }
jjdem
  • 379
  • 5
  • 14
  • 1
    How would you parse 1/2/2015? MM/dd/yyyy or dd/MM/yyyy? – Eric J. Sep 04 '15 at 00:09
  • 1
    the function would ideally return both formats in a List. By testing multiple date values, you may be able to eliminate some possible options. for example if you test 1/2/2015, it returns a list of ["MM/dd/yyyy","dd/MM/yyyy"] if you test 6/13/2015, it returns a list of ["MM/dd/yyyy"]. By making an assumption that all dates in a single column of a csv use the same format, we take the intersection of those two lists and have a minimal set of possible date formats. In this case just "MM/dd/yyyy" – jjdem Sep 04 '15 at 00:21

2 Answers2

7

There's no such thing as "all possible date formats". A format of "'Year:' yyyy 'Month:' MM 'Day:' dd" would be valid, but highly unusual, for example. There's nothing that Noda Time supplies here that would be particularly helpful. I suspect you'll need to create your own list of all the date/time formats you've ever seen - then you can either create a pattern for each format within Noda Time, and try to parse it checking for success in the result, or use DateTime.TryParseExact in the BCL. For example:

BCL version

var allFormats = new List<string>
{
    "yyyy-MM-dd HH:mm:ss",
    "dd/MM/yyyy HH:mm:ss",
    "MM/dd/yyyy HH:mm:ss",
    // etc
};

DateTime ignored;
var matchingFormats = allFormats
    .Where(format => DateTime.TryParseExact(text, format, CultureInfo.InvariantCulture,
                                            DateTimeStyles.None, out ignored))
    .ToList();

Noda Time version

var allFormats = ...; // As before
var allPatterns = allFormats
    .Select(format => LocalDateTimePattern.CreateWithInvariantCulture(format))
    .ToList();

var matchingPatterns = allPatterns.Where(pattern => pattern.Parse(text).Success)
                                  .ToList();

Note that IIRC, we don't currently expose the underlying format string from a LocalDateTimePattern (or any other pattern). We could do so, but we don't at the moment...

Jon Skeet
  • 1,421,763
  • 867
  • 9,128
  • 9,194
  • 1
    thanks @Jon Skeet! I have updated the question with code to get a list of all the patterns for all the cultures, however it may be missing some commonly used patterns. Do you know of a website that has a list of the common date format patterns that may not be generated by my code? – jjdem Sep 04 '15 at 16:27
  • 1
    @jjdem: No, I'm afraid not - it will really be up to whatever's creating the CSV file. – Jon Skeet Sep 04 '15 at 16:35
  • @JonSkeet you are my hero. – ganjeii Mar 08 '22 at 21:05
1

You have to have standards on incoming data. Without this you will drive yourself crazy. Even if you clean it with a vba macro (which I recommend you do anyway), the data will never be correct if you blindly accept every format under the sun without acceptance rules.

One thing you can do (with your vba macro) is input the data source (which should have a known date format) and clean it according to that format. For example, site example.com gives you a .csv in date format MM/dd/yyyy, your macro should be smart enough to know that it's not in format dd/MM/yyyy.

If you are familiar to Excel vba, this should make sense to you. If you are not then welcome to programming in Excel vba!

Benj Sanders
  • 481
  • 5
  • 15
  • VBA would not be my first choice language for data cleansing logic. I would do that in C#, given that the rest of the processing is in C#, probably as a separate pre-import step. – Eric J. Sep 04 '15 at 00:34
  • 1
    FYI, I am doing a C# addin (VSTO), not VBA. The Excel part of this question is not really relevant anyways, I want to do this in C#. I only mentioned Excel because the Excel custom NumberFormat might be slightly different than the date-time formats in C#. – jjdem Sep 04 '15 at 01:16
  • You can do that, but it's way easier to debug in vba. You can actually post the data to SQL directly from vba (if that's what you're doing). You can even invoke vba macros from your C# VSTO code... – Benj Sanders Sep 04 '15 at 16:07