I get a matrix from the following program, and I am trying to extract a specific URL from it:

# Load the web page and parse it
library(XML)
community_url <- "http://www.irgrid.ac.cn/community-list"
com_source <- readLines(community_url, encoding = "UTF-8")
com_parsed <- htmlTreeParse(com_source, encoding = "UTF-8", useInternalNodes = TRUE)
hrefs <- xpathSApply(com_parsed, '//li[@class="communityLink"]//@href', function(x) unname(x))
irname <- xpathSApply(com_parsed, '//li[@class="communityLink"]//a', xmlValue)
cl_table <- cbind(hrefs, irname)
colnames(cl_table) <- c("hrefs", "h1")

I tried `url <- cl_table[cl_table$h1=="文献情报中心", ]` and `which(cl_table$h1=="文献情报中心")`, but both raise an error:

$ operator is invalid for atomic vectors

Even if that worked, I still could not get the URL of "文献情报中心" (/handle/1471x/23660). How should I do this? By the way, is there a method that works for both the matrix and data.frame classes? If we save this table and read it back, the class will be data.frame by default. Thanks a lot.

agenis
赵鸿丰
  • Well, don't use the dollar operator for matrices. – agenis May 30 '16 at 08:36
  • I tried using brackets instead of $, like this: `> url <- cl_table(url,)[cl_table(,h1)=="文献情报中心"] Error: could not find function "cl_table" > url <- cl_table[url,][cl_table[,h1]=="文献情报中心"] Error in cl_table[url, ] : subscript out of bounds` – 赵鸿丰 May 31 '16 at 07:33
  • you mean like this? `cl_table[, 'h1']` – agenis May 31 '16 at 08:14
  • It still doesn't work, even after changing it to `> url <- cl_table('url',)[cl_table(, 'h1')=="文献情报中心"] Error: could not find function "cl_table"` – 赵鸿丰 Jun 01 '16 at 02:59

1 Answer

It seems you are having trouble selecting a column of a matrix by its name. You might also find a helpful answer here.

The dollar sign works for data.frames but not for matrices, and your data is a matrix because `cbind()` on two vectors returns one. In your case I would do:

url <- cl_table[cl_table[,'h1']=="文献情报中心", 'hrefs']
print(url)
#### hrefs 
#### "/handle/1471x/23660" 

Then, if you want to use this piece of the URL, for instance to open the page (make sure the `url` result contains only one element, of course):

# url already starts with "/", so the base URL has no trailing slash
browseURL(paste0("http://www.irgrid.ac.cn", url[1]))
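
As for a method that works for both classes: bracket indexing by column name behaves the same way on a matrix and on a data.frame, so the same lookup can serve both. Here is a minimal sketch, assuming the column names `hrefs` and `h1` from the question. The helper name `get_href` and the second table row are made up for illustration, and the table is a small stand-in so the example does not depend on the website.

```r
# Toy stand-in for the scraped table.
# The first row comes from the question; the second row is a made-up placeholder.
cl_table <- cbind(
  hrefs = c("/handle/1471x/23660", "/handle/example/0"),
  h1    = c("文献情报中心", "Example community")
)

# Hypothetical helper: tbl[, "h1"] and tbl[rows, "hrefs"] are valid
# for both matrices and data.frames, unlike tbl$h1.
get_href <- function(tbl, name) {
  rows <- tbl[, "h1"] == name
  unname(tbl[rows, "hrefs"])  # drop the "hrefs" name a one-row matrix result carries
}

# Same call on the matrix ...
get_href(cl_table, "文献情报中心")

# ... and on a data.frame built from it (as you would get after reading a saved file).
cl_df <- as.data.frame(cl_table, stringsAsFactors = FALSE)
get_href(cl_df, "文献情报中心")
# both calls return "/handle/1471x/23660"
```

If several rows match, the helper returns all of their `hrefs` values, so it is still worth checking the length of the result before passing it to `browseURL()`.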
agenis