I need to understand how to apply the XPath functions substring-before() or substring-after() to multiple nodes.
The code immediately below returns the city I want for each office, but also additional unwanted details.
require(XML)
require(httr)
doc <- htmlTreeParse("http://www.cpmy.com/contact.asp", useInternalNodes = TRUE)
> (string <- xpathSApply(doc, "//div[@id = 'leftcol']//p", xmlValue, trim = TRUE))
[1] "Philadelphia Office1880 JFK Boulevard10th FloorPhiladelphia, PA 19103Tel: 215-587-1600Fax: 215-587-1699Map and Directions"
[2] "Westmont Office216 Haddon AvenueSentry Office Plaza, Suite 703Westmont, NJ 08108Tel: 856-946-0400Fax: 856-946-0399Map and Directions"
[3] "Boston Office50 Congress StreetSuite 430Boston, MA 02109Tel: 617-854-8315Fax: 617-854-8311Map and Directions"
[4] "New York Office5 Penn Plaza23rd FloorNew York, NY 10001Tel: 646-378-2192Fax: 646-378-2001Map and Directions"
I then wrapped the path in substring-before(). That returns the first element, correctly shortened, but drops the remaining three:
> (string <- xpathSApply(doc, "substring-before(//div[@id = 'leftcol']//p, 'Office')", xmlValue, trim = TRUE))
[1] "Philadelphia "
How should I revise my XPath expression so that it returns all four elements, each shortened to the text before "Office"?
Thank you.
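For context on why only one value comes back: in XPath 1.0, string functions such as substring-before() convert a node-set argument to the string value of its first node, so the other three paragraphs are never considered. One possible workaround, sketched below, is to select all the nodes with XPath and do the shortening in R instead. The inline HTML is a made-up stand-in for the real page (an assumption, so the example runs without network access), and the regular expression is just one way to cut the text at "Office".

```r
library(XML)

# Hypothetical stand-in for the contact page, reduced to two offices
html <- '<html><body><div id="leftcol">
  <p>Philadelphia Office<br/>1880 JFK Boulevard</p>
  <p>New York Office<br/>5 Penn Plaza</p>
</div></body></html>'

doc <- htmlParse(html, asText = TRUE)

# Step 1: let XPath select every <p> node and return its text
values <- xpathSApply(doc, "//div[@id = 'leftcol']//p", xmlValue, trim = TRUE)

# Step 2: shorten each string in R, dropping the first "Office"
# (plus any whitespace before it) and everything after it
cities <- sub("\\s*Office.*$", "", values)

print(cities)
```

Because sub() is vectorised, every element of the character vector is shortened in one call, which is the per-node behaviour the single XPath expression cannot provide.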