I'm working on a regular expression and I just can't figure out what the problem is. I've tried several online testers, such as http://derekslager.com/blog/posts/2007/09/a-better-dotnet-regular-expression-tester.ashx and http://gskinner.com/RegExr/, but when I put the tested regular expression into C# it is not processed correctly.

I'm working on a JSON string I can receive from JIRA. The heavily stripped down and beautified version of this JSON string is as follows:

{
    "fields": {
        "progress": {
            "progress": 0,
            "total": 0
        },
        "summary": "Webhook listener is working",
        "timetracking": {},
        "resolution": null,
        "resolutiondate": null,
        "timespent": null,
        "reporter": {
            "self": "http://removed.com/rest/api/2/user?username=removed",
            "name": "removed@nothere.com",
            "emailAddress": "removed@nothere.com",
            "avatarUrls": {
                "16x16": "http://www.gravatar.com/avatar/88994b13ab4916972ff1861f9cccd4ed?d=mm&s=16",
                "24x24": "http://www.gravatar.com/avatar/88994b13ab4916972ff1861f9cccd4ed?d=mm&s=24",
                "32x32": "http://www.gravatar.com/avatar/88994b13ab4916972ff1861f9cccd4ed?d=mm&s=32",
                "48x48": "http://www.gravatar.com/avatar/88994b13ab4916972ff1861f9cccd4ed?d=mm&s=48"
            },
            "displayName": "Wubinator]",
            "active": true
        },
        "updated": "2013-08-20T14:08:00.247+0200",
        "created": "2013-07-30T14:41:07.090+0200",
        "description": "Say what?",
        "customfield_10001": null,
        "duedate": null,
        "issuelinks": [],
        "customfield_10004": "73",
        "worklog": {
            "startAt": 0,
            "maxResults": 0,
            "total": 0,
            "worklogs": []
        },
        "project": {
            "self": "http://removed.com/rest/api/2/project/EP",
            "id": "10000",
            "key": "EP",
            "name": "EuroPort+ Suite",
            "avatarUrls": {
                "16x16": "http://removed.com/secure/projectavatar?size=xsmall&pid=10000&avatarId=10208",
                "24x24": "http://removed.com/secure/projectavatar?size=small&pid=10000&avatarId=10208",
                "32x32": "http://removed.com/secure/projectavatar?size=medium&pid=10000&avatarId=10208",
                "48x48": "http://removed.com/secure/projectavatar?pid=10000&avatarId=10208"
            }
        },
        "customfield_10700": null,
        "timeestimate": null,
        "lastViewed": null,
        "timeoriginalestimate": null,
        "customfield_10802": null
    }
}

I need to convert this JSON to XML. Of course, this is not directly possible because of the "16x16", "24x24", "32x32" and "48x48" keys inside the JSON, which would be transformed into <16x16 />, <24x24 />, <32x32 /> and <48x48 /> tags. Those are invalid, because an XML element name cannot start with a digit.

The receiver of the XML doesn't even need those avatar URLs, so I was thinking about stripping out the entire "avatarUrls": { ..... }, bit before handing the JSON over to JSON.NET for conversion.

I was thinking about doing this using a regular expression. After some testing on the mentioned websites I came to the following regular expression:

("avatarUrls)(.*?)("displayName")

The Regex.Replace method should remove everything it matches except the third group (a.k.a. "displayName").

The website http://derekslager.com/blog/posts/2007/09/a-better-dotnet-regular-expression-tester.ashx shows me the correct groups and matches, and says that the expression to use inside C# is:

@"(""avatarUrls)(.*?)(""displayName"")"

So inside C# I wrote the following:

string expression = @"(""avatarUrls)(.*?)(""displayName"")";
string result = Regex.Replace(json, expression, "$3");

return result;

When I look at the result after Regex.Replace, nothing has been replaced. Does anyone see what I did wrong here?

Uwe Keim
Wubinator
  • Have you considered xml escaping the invalid tag names and doing a straight conversion to xml? http://weblogs.sqlteam.com/mladenp/archive/2008/10/21/Different-ways-how-to-escape-an-XML-string-in-C.aspx – Gusdor Oct 17 '13 at 13:21
  • Ditch this approach and just snag you some [json.net](http://james.newtonking.com/json). Has a json => xml conversion there ready for you to go. –  Oct 17 '13 at 13:26
  • I would debug into it by running a Regex.Match, and checking the groups and captures. Make sure that the 3rd group really contains the display name, etc, then go from there. – Baldrick Oct 17 '13 at 13:32
  • @Will I am using http://james.newtonking.com/json for the conversion; it's json.net that is throwing the error: http://puu.sh/4SfNF/d149a02884.png – Wubinator Oct 17 '13 at 13:38
  • @Wubinator: Ah, that's too bad. –  Oct 17 '13 at 13:52

3 Answers

1

I wouldn't use regular expressions to remove these nodes. Instead, I'd use Json.NET to remove the nodes you don't want.

I refer to the quote:

Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems.

Using the answer found here, you could write:

var jsonObject = (JObject)JsonConvert.DeserializeObject(yourJsonString);
removeFields(jsonObject.Root, new[]{"avatarUrls"});

(Note that I was not sure if you wanted to delete both "avatarUrls" nodes.)
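The removeFields helper comes from the linked answer and is not reproduced here. As a rough sketch of the same idea, and not that helper itself, the properties could be removed with Json.NET's LINQ to JSON API. The class and method names below (JsonFieldRemover, RemoveProperties) are made up for illustration:

```csharp
using System.Linq;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

// Hypothetical helper, not the removeFields method from the linked answer.
public static class JsonFieldRemover
{
    // Removes every property called propertyName, at any depth, and
    // returns the remaining JSON as a compact string.
    public static string RemoveProperties(string json, string propertyName)
    {
        JObject root = JObject.Parse(json);

        // Collect the matches first: removing tokens while still
        // enumerating Descendants() would change the tree mid-iteration.
        var matches = root.Descendants()
                          .OfType<JProperty>()
                          .Where(p => p.Name == propertyName)
                          .ToList();

        foreach (JProperty property in matches)
        {
            property.Remove();
        }

        return root.ToString(Formatting.None);
    }
}
```

Calling JsonFieldRemover.RemoveProperties(yourJsonString, "avatarUrls") would drop both the reporter's and the project's avatar blocks, after which the result should be safe to pass on to the XML conversion.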

Ben Smith
0

There's an overload of Regex.Replace that takes RegexOptions, which you may need to look into. For example, for . to match every character (instead of every character except \n), you'd need to specify RegexOptions.Singleline. Also, it looks like you're trying to replace every match of @"(""avatarUrls)(.*?)(""displayName"")" with $3. Is that intended? You might be better off doing something like this:

// pattern and options are assumed to be defined by the caller, e.g. the
// expression from the question and RegexOptions.Singleline.
var match = Regex.Match(json, pattern, options);
while (match.Success) {
    // Do stuff with match.Groups[1] (Groups is indexed with [], not ())
    match = match.NextMatch();
}

However... I'm not really sure that's going to replace it in the source string.
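To make the effect of RegexOptions.Singleline concrete, here is a minimal sketch that reuses the pattern from the question. The wrapper class AvatarStripper is hypothetical, and it assumes the JSON is pretty-printed, so that line breaks sit between "avatarUrls" and "displayName":

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the expression from the question.
public static class AvatarStripper
{
    private const string Pattern = @"(""avatarUrls)(.*?)(""displayName"")";

    // Replaces each match with its third group, i.e. keeps "displayName"
    // and drops everything from "avatarUrls up to that point.
    public static string Strip(string json, RegexOptions options)
    {
        return Regex.Replace(json, Pattern, "$3", options);
    }
}
```

With RegexOptions.None the dot stops at the first \n, so on multi-line input the pattern never reaches "displayName" and the string comes back unchanged. With RegexOptions.Singleline the dot also matches \n and the block is removed. Note that this only strips an "avatarUrls" block that is followed somewhere by "displayName"; in the sample JSON the one under "project" would be left alone.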

Jeff B
0

The problem is something completely different:

Inside the following string:

{"16x16":"http://www.gravatar.com/avatar/88994b13ab4916972ff1861f9cccd4ed?d=mm&s=16, "32.32"

There is an '&', the magic symbol that indicates the start of the next parameter. Therefore the JSON is not read completely and cannot be converted properly. It also explains why the regular expression I used replaced nothing: "displayName" is not inside the string, so nothing matches.
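The answer does not say how the JSON reached the code. Assuming it arrived as a URL-encoded form or query parameter, the truncation and one possible fix on the sending side can be sketched as follows. The parameter name payload and the class PayloadDemo are invented for the example, and the parser is deliberately naive:

```csharp
using System;

// Hypothetical demo; "payload" is an assumed parameter name.
public static class PayloadDemo
{
    // Mimics a naive form/query parser: takes everything up to the first
    // '&' as the first parameter and returns its decoded value.
    public static string FirstParameterValue(string body)
    {
        string first = body.Split('&')[0];
        int equalsIndex = first.IndexOf('=');
        return Uri.UnescapeDataString(first.Substring(equalsIndex + 1));
    }

    // Percent-encodes the JSON so '&' becomes %26 and no longer acts
    // as a parameter separator.
    public static string Encode(string json)
    {
        return "payload=" + Uri.EscapeDataString(json);
    }
}
```

Sending "payload=" + json unescaped cuts the value off at the first '&' inside an avatar URL, so "displayName" never arrives. Sending PayloadDemo.Encode(json) instead lets the receiver recover the full string.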

Wubinator