In a web crawler/scraper setting, I'd like to dynamically extend my base Reference Class URL so that I can write specific methods for the respective hosts/domains. Just to be clear, by "dynamically" I mean something like "automatically generate class definitions as new domains are encountered" (e.g. a class URL_something.com which would inherit from class URL).

This works a treat; the only problem is that my class WebPage expects the value of field url to be of class URL. It will accept objects of class URL_something.com, as this inherits from class URL, but it then actually turns the object into an instance of class URL. So I lose the information that it's actually of class URL_something.com.

Do you have any idea of how I can prevent losing that crucial information?

Code Example

setRefClass(Class="URL", fields=list(x="character"))
setRefClass(Class="WebPage", fields=list(url="URL"))

obj <- new("WebPage", url=new("URL", x="http://www.something.com/home/index.html"))
obj$url

# Method would recognize that there is no class 'URL_something.com' 
# yet and thus create it:
setRefClass(Class="URL_something.com", contains="URL")

# Another method would take care of mapping field values to 
# an instance of the new class:
url.obj <- new("URL_something.com", x="http://www.something.com/home/index.html")
inherits(url.obj, "URL")
# [1] TRUE

obj$url <- url.obj
class(obj$url)
# [1] "URL"
# So I lose the information that it was actually of class "URL_something.com"
Jason Plank
Rappster
  • This works as expected in R version 2.14.0 alpha (2011-10-16 r57263), so maybe update your R? Also `a` is not a field of `WebPage` and the recommended way to construct reference objects is `getRefClass("WebPage")$new(<...>)` or to use the return value of `setRefClass`. – Martin Morgan Oct 19 '11 at 00:45
  • @MartinMorgan Do you want to provide that as an Answer so we can close out this Q? – Gavin Simpson Oct 19 '11 at 08:15
  • @MartinMorgan: thanks for the advice, I'll have a look at R 2.14.0 then. Also corrected the class def, sorry for the mistake. Do you know the implications of using `new(...)` as opposed to `getRefClass("WebPage")$new(<...>)`? – Rappster Oct 19 '11 at 10:01

1 Answer

Picking up on what Martin said (see comments above): R 2.14.0 fixes what I described above.
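
For anyone who wants to check this on their own installation, here is a minimal sketch that follows Martin's suggestion of using the generator objects returned by `setRefClass`. The helper `urlClassFor` is a hypothetical name made up for this illustration (it is not part of the methods package), and the sketch assumes R 2.15.0 or later because it uses `paste0`.

```r
library(methods)

# Generator objects returned by setRefClass()
URL     <- setRefClass("URL", fields = list(x = "character"))
WebPage <- setRefClass("WebPage", fields = list(url = "URL"))

# Hypothetical helper (name is an assumption): return the generator for the
# host-specific subclass of URL, defining the class the first time it is needed.
urlClassFor <- function(host) {
  cls <- paste0("URL_", host)
  if (isClass(cls)) {
    getRefClass(cls)                   # class already defined, reuse it
  } else {
    setRefClass(cls, contains = "URL") # first encounter, define it
  }
}

gen  <- urlClassFor("something.com")
page <- WebPage$new(url = gen$new(x = "http://www.something.com/home/index.html"))

# On a fixed R version the field should keep the subclass:
is(page$url, "URL_something.com")
is(page$url, "URL")
```

If the first `is()` call returns `FALSE`, the installation most likely still shows the behaviour described in the question.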

Rappster