0

I want to select a specific range of elements within my XML files.

Here is an example of the XML:

<urlset>
    <url>
        <loc>e1</loc>
        <priority>1</priority>
    </url>
    <url>
        <loc>e2</loc>
        <priority>2</priority>
    </url>
    <url>
        <loc>e3</loc>
        <priority>1</priority>
    </url>
    <url>
        <loc>e4</loc>
        <priority>3</priority>
    </url>
    <url>
        <loc>e4</loc>
        <priority>1</priority>
    </url>
    <url>
        <loc>e5</loc>
        <priority>2</priority>
    </url>
</urlset>

How can I get the values from e2 to e4?

Kirill Polishchuk
Chelsea_cole

4 Answers

2
// urlset is the root XElement, e.g. XDocument.Parse(xml).Root
var result = urlset.Elements("url")
    .Where(url =>
        url.Element("loc").Value.CompareTo("e2") >= 0 &&
        url.Element("loc").Value.CompareTo("e4") <= 0)
    .Select(url => url.Element("loc").Value);

This uses ordinary string comparison, the same ordering as alphabetical sorting. It does not guard against a url element that has no loc child: in that case a NullReferenceException is thrown.
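
A minimal sketch of a variant that tolerates url elements without a loc child. It relies on the explicit string conversion of XElement, which returns null for a missing element instead of throwing, and on string.CompareOrdinal so the comparison does not depend on the current culture. The class and method names (LocRange, Between) are illustrative and not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class LocRange
{
    // Returns the loc values v with low <= v <= high under ordinal string comparison.
    // url elements without a loc child are skipped instead of throwing.
    public static List<string> Between(XElement urlset, string low, string high)
    {
        return urlset.Elements("url")
            .Select(url => (string)url.Element("loc"))   // null when loc is missing
            .Where(loc => loc != null
                       && string.CompareOrdinal(loc, low) >= 0
                       && string.CompareOrdinal(loc, high) <= 0)
            .ToList();
    }

    public static void Main()
    {
        var urlset = XElement.Parse(
            "<urlset>" +
            "<url><loc>e1</loc></url>" +
            "<url><priority>2</priority></url>" +   // no loc child: skipped
            "<url><loc>e2</loc></url>" +
            "<url><loc>e3</loc></url>" +
            "<url><loc>e4</loc></url>" +
            "<url><loc>e5</loc></url>" +
            "</urlset>");

        Console.WriteLine(string.Join(", ", Between(urlset, "e2", "e4")));
    }
}
```

This is still a string comparison, so a value such as "e30" would fall between "e2" and "e4".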

Konrad Morawski
  • I should have used `CompareTo` (>= does not work for strings) - I updated this. As for your comment: I just tested it and it works as expected, whether the xml elements are sorted or not. The example given above is inclusive of "e4", if that's not intended just use `("e4") < 0` instead of `<=`. It's admittedly less clean than `SkipWhile` / `TakeWhile`, though. – Konrad Morawski Aug 15 '11 at 14:37
  • Chop off `Select` if you want to retrieve entire XML elements rather than just the `loc` values by themselves. – Konrad Morawski Aug 15 '11 at 14:39
  • Change `e3` to `e10`. It won't select `e10`. As I understand, OP needs elements between `e2` and `e4`. – Kirill Polishchuk Aug 15 '11 at 14:44
  • Do we want it to select `e10`? `e10` is not between `e2` and `e4`. A problem would occur if you had `e30` (treated as string, it does fall between `e2` and `e4`). But this problem occurs whether you use `Where` or `SkipWhile` / `TakeWhile`, no difference here. A small method converting `"e30"` to `(int)30`, `"e4"` to `(int)4` etc. would be needed. – Konrad Morawski Aug 15 '11 at 14:49
  • Maybe I misinterpret OP question. As I understand, for example, in sequence `1, 2, 10, 3, 4, 5` OP needs elements between `2` and `4`. In this sample: `2, 10, 3`. – Kirill Polishchuk Aug 15 '11 at 14:52
  • Ah OK I see your point now. I understood it differently. He should probably just clarify this. – Konrad Morawski Aug 15 '11 at 14:53
2
var doc = XDocument.Parse(xml);

var result = doc.Element("urlset").Elements("url")
    .SkipWhile(x => x.Element("loc").Value != "e2")
    .TakeWhile(x => x.Element("loc").Value != "e4");
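
As written, TakeWhile stops before the first url whose loc is "e4", so that element is not part of the result. A sketch of an inclusive variant follows, assuming the range is defined by document order and should end at the first "e4". The helper name InclusiveRange.FromTo is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class InclusiveRange
{
    // Yields elements in document order, starting at the first url whose loc
    // equals `first` and ending with (and including) the next url whose loc
    // equals `last`. Yields nothing if `first` never occurs.
    public static IEnumerable<XElement> FromTo(
        IEnumerable<XElement> urls, string first, string last)
    {
        foreach (var url in urls.SkipWhile(u => (string)u.Element("loc") != first))
        {
            yield return url;
            if ((string)url.Element("loc") == last)
                yield break;   // stop after emitting the closing element
        }
    }

    public static void Main()
    {
        var doc = XDocument.Parse(
            "<urlset>" +
            "<url><loc>e1</loc></url><url><loc>e2</loc></url>" +
            "<url><loc>e3</loc></url><url><loc>e4</loc></url>" +
            "<url><loc>e4</loc></url><url><loc>e5</loc></url>" +
            "</urlset>");

        var locs = FromTo(doc.Root.Elements("url"), "e2", "e4")
            .Select(u => u.Element("loc").Value);

        Console.WriteLine(string.Join(", ", locs));
    }
}
```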
Kirill Polishchuk
  • Wouldn't it just return all the `e2`s and nothing else? I did not downvote by the way, I just undid my upvote (which was for the use of `SkipWhile` and `TakeWhile` - I didn't know these). The OP wants a range of elements from e2 to e4 (presumably including e3, for example) – Konrad Morawski Aug 15 '11 at 14:18
  • @Morawski, It will return `e2`, `e3` in provided sample, in other words from `e2` to `e4` – Kirill Polishchuk Aug 15 '11 at 14:21
  • I'd consider using OrderBy, same for Morawski's answer. Although the elements in the sample are ordered, that may not always be the case (the OP should clarify this). Edit: TakeWhile won't take the last element (e4), just checked. Also, you should use Select to get the value of interest instead of the XElement. – Marcin Deptuła Aug 15 '11 at 14:52
1

An alternative way:

// doc is an XDocument, e.g. XDocument.Parse(xml)
var urls = from url in doc.Descendants("urlset").Elements("url")
    let number = Int32.Parse(url.Element("loc").Value.Replace("e", ""))
    where number >= 2 && number <= 4
    select url;

A safer option, which does not throw if loc is not of the form "e" followed by an integer (courtesy of Marc Gravell), would be:

int? TryParse(string s)
{
    int i;
    return int.TryParse(s, out i) ? (int?)i : (int?)null;
}

var urls = from url in doc.Descendants("urlset").Elements("url")
    let number = TryParse(url.Element("loc").Value.Replace("e", ""))
    where number >= 2 && number <= 4   // comparisons with null are false
    select url;
Community
Paolo Falabella
1

You can use this XPath:

//url[loc = 'e2' or 
    (preceding-sibling::url/loc = 'e2' and following-sibling::url/loc = 'e4')
]

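For the sample above, it selects the url elements whose loc is e2, e3, and the first e4. Note that an e4 element matches only when another e4 follows it, which happens to be the case in the sample.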
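
A sketch of how this XPath could be run from C#, assuming LINQ to XML and the XPathSelectElements extension method from System.Xml.XPath. The class and method names (XPathRange, SelectLocs) are illustrative only.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class XPathRange
{
    // The expression from the answer, joined into a single string.
    public const string Query =
        "//url[loc = 'e2' or " +
        "(preceding-sibling::url/loc = 'e2' and following-sibling::url/loc = 'e4')]";

    // Parses the XML and returns the loc values of the url elements matched by Query.
    public static string[] SelectLocs(string xml)
    {
        var doc = XDocument.Parse(xml);
        return doc.XPathSelectElements(Query)
                  .Select(url => (string)url.Element("loc"))
                  .ToArray();
    }

    public static void Main()
    {
        const string xml =
            "<urlset>" +
            "<url><loc>e1</loc></url><url><loc>e2</loc></url>" +
            "<url><loc>e3</loc></url><url><loc>e4</loc></url>" +
            "<url><loc>e4</loc></url><url><loc>e5</loc></url>" +
            "</urlset>";

        Console.WriteLine(string.Join(", ", SelectLocs(xml)));
    }
}
```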

Kirill Polishchuk