1

Presently my code is like this but I am wondering if there is a way that I could simplify it. What I am looking for is for one of the strings "JLPT N1","JLPT N2","JLPT N3","JLPT N4","JLPT N5" and then when I see one of these I will set the value of a phraseSources[seq].JishoJlpt.

Note that only one of the above can appear at once.

            if (nodes2[0].InnerText.Contains("JLPT N5"))
            {
                phraseSources[seq].JishoJlpt = "5";
            }
            if (nodes2[0].InnerText.Contains("JLPT N4"))
            {
                phraseSources[seq].JishoJlpt = "4";
            }
            if (nodes2[0].InnerText.Contains("JLPT N3"))
            {
                phraseSources[seq].JishoJlpt = "3";
            }
            if (nodes2[0].InnerText.Contains("JLPT N2"))
            {
                phraseSources[seq].JishoJlpt = "2";
            }
            if (nodes2[0].InnerText.Contains("JLPT N1"))
            {
                phraseSources[seq].JishoJlpt = "1";
            }

Would appreciate if anyone can suggest how I can simplify this.

Alan2
  • 23,493
  • 79
  • 256
  • 450

3 Answers3

6

You should look into Regular Expressions.

This method would grab the int from your text, which you could then use.

    using System.Linq;
    using System.Text.RegularExpressions;

    public int ContainedNum(string s)
    {
        var match = Regex
           .Match(s, (@"JLPT N(\d{1})"))
           .Groups
           .Cast<Group>()
           .Skip(1) // the first match is the entire string not the group
           .Single()
           .Value;
        var num = int.Parse(match);
        return num;
    }

This assumes that every argument always follows this pattern and doesn't go bigger than 9. If that isn't the case you would need to be more conservative using Single() and figure out what you want to return if that isn't case.

You could use it like this:

var text = nodes2[0].InnerText;
var num = ContainedNum(text);
phraseSources[seq].JishoJlpt = num.ToString();
Aage
  • 5,932
  • 2
  • 32
  • 57
  • Wouldn't `^JLPT N(\d{1:3})` work for 1-3 digit numbers, for example, with no other changes? – Kuba hasn't forgotten Monica Oct 13 '19 at 15:46
  • 1
    Since the OP uses `Contains` I don't think you want to use the `^` anchor. Also there doesn't seem to be much point in parsing the number to an `int` to just turn around and format it back to a `string`. – juharr Oct 13 '19 at 17:02
  • @juharr Good call. I removed the `^`. I realised the casting to an `int` is unnecessary, but when working with numbers I like to make that explicit in the `signature`. – Aage Oct 13 '19 at 19:02
1

I like the Try pattern for this kind of parsing:

private static bool TryParseJLPT(string text, out int value)
{
    var match = Regex.Match(text, @"JLPT N(?<Number>\d+)", RegexOptions.None);
    if (!match.Success) { value = default; return false; }
    return Int32.TryParse(match.Groups["Number"].Value, out value);
}

Usage example:

if (TryParseJLPT(nodes2[0].InnerText, out int value))
{
    phraseSources[seq].JishoJlpt = value;
}

Depending on the case it may be more appropriate to throw instead of returning a boolean value. If a parsing failure is an expected case that can be handled, then the Try pattern is OK. If a parsing failure is exceptional and you don't know how to continue, then throwing is better.

Theodor Zoulias
  • 34,835
  • 7
  • 69
  • 104
-1

why not using an extension ?

I've been using This extension for a while :

internal static class Extensions
{
    public static bool In(this string value, params string[] args)
    {
        return args.Contains(value);
    }

}

Applying the In extension on your code would be like this :

if (nodes2[0].InnerText.In("JLPT N1","JLPT N2","JLPT N3","JLPT N4","JLPT N5"))
{
    phraseSources[seq].JishoJlpt = nodes2[0].InnerText.Last(); // get the last character
}
iSR5
  • 3,274
  • 2
  • 14
  • 13
  • 1
    The OP's code is checking if any of those values are substrings of `InnerText`, so this will not work the same as their code. – juharr Oct 13 '19 at 17:05