
I'm working on a Twitter text C# library, and Twitter has added a double-word Unicode character test to its conformance tests.

https://github.com/twitter/twitter-text-conformance/blob/master/validate.yml

Here's an NUnit test method that runs against the file above.

    [Test]
    public void TestDoubleWordUnicodeYamlRetrieval()
    {
        // Build the path once so the existence check and the reader use the same file.
        var yamlPath = Path.Combine(conformanceDir, "validate.yml");
        Assert.IsTrue(File.Exists(yamlPath), "Yaml file " + yamlPath + " does not exist.");

        var yaml = new YamlStream();
        // Dispose the reader as soon as the document has been loaded.
        using (var stream = new StreamReader(yamlPath))
        {
            yaml.Load(stream);
        }

        var root = yaml.Documents[0].RootNode as YamlMappingNode;
        var testNode = new YamlScalarNode("tests");
        Assert.IsTrue(root.Children.ContainsKey(testNode), "Document is missing test node.");
        var tests = root.Children[testNode] as YamlMappingNode;
        Assert.IsNotNull(tests, "Test node is not YamlMappingNode");

        var typeNode = new YamlScalarNode("lengths");
        Assert.IsTrue(tests.Children.ContainsKey(typeNode), "Test type lengths not found in tests.");
        var typeTests = tests.Children[typeNode] as YamlSequenceNode;
        Assert.IsNotNull(typeTests, "lengths tests are not YamlSequenceNode");

        var count = 0;
        foreach (YamlMappingNode item in typeTests)
        {
            var text = ConvertNode<string>(item.Children.Single(x => x.Key.ToString() == "text").Value) as string;
            var description = ConvertNode<string>(item.Children.Single(x => x.Key.ToString() == "description").Value) as string;
            // Normalize throws ArgumentException if the parsed string contains invalid Unicode.
            Assert.DoesNotThrow(() => {text.Normalize(NormalizationForm.FormC);}, String.Format("Yaml couldn't parse a double word unicode string at test {0} - {1}.", count, description));
            count++;
        }
    }

This is the error produced: Vocus.TwitterText.Tests.ConformanceTest.TestDoubleWordUnicodeYamlRetrieval: Yaml couldn't parse a double word unicode string at test 5 - Count unicode chars outside the basic multilingual plane (double word). Unexpected exception: System.ArgumentException

Mark Evaul

1 Answer


I don't think it's the YAML parser. Try something like:

    using (var stream = new StreamReader(path, Encoding.UTF8))
    {
        var yaml = new YamlStream();
        yaml.Load(stream);
        // Do the rest of your code
    }

  • Sorry for the late reply, but this did not help. The line that causes the problem does not contain literal UTF-8 characters but escaped Unicode code points: text: "\U00010000\U0010ffff". When I use a StreamReader to output the file to a string, the characters are correct. When I use the YAML parser to retrieve the node, the output is \0. – Mark Evaul Feb 05 '15 at 14:38
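
One possible explanation for the \0 output, not confirmed against the parser's source, is that an 8-digit \U escape is being squeezed into a single 16-bit char instead of being expanded into a surrogate pair. The sketch below only illustrates that difference. EscapeSketch, DecodeCodePoint and TruncateToChar are hypothetical names made up for this example and are not part of any YAML library.

```csharp
using System;
using System.Globalization;

public static class EscapeSketch
{
    // Hypothetical helper: takes the hex digits of a "\UXXXXXXXX" escape
    // and returns the UTF-16 string for that code point.
    public static string DecodeCodePoint(string hexDigits)
    {
        int codePoint = int.Parse(hexDigits, NumberStyles.HexNumber, CultureInfo.InvariantCulture);
        // For code points above U+FFFF this returns a two-char surrogate pair.
        return char.ConvertFromUtf32(codePoint);
    }

    // Assumed failure mode: casting the code point to char keeps only the low 16 bits.
    public static char TruncateToChar(string hexDigits)
    {
        int codePoint = int.Parse(hexDigits, NumberStyles.HexNumber, CultureInfo.InvariantCulture);
        return unchecked((char)codePoint);
    }
}
```

With this sketch, DecodeCodePoint("00010000") gives the pair "\uD800\uDC00", while TruncateToChar("00010000") gives '\0', which matches what I see coming back from the node. If that is what is happening, the fix would belong in the parser's escape handling rather than in the StreamReader encoding.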