I am working with a data set that looks like this:

  ClusterID      URL               Text_Body
   0            www.text.com       texttexttexttexttext.....
   1            www.text1.com      texttexttexttexttext.....
   2            www.text2.com      texttexttexttexttext.....
   3            www.text3.com      texttexttexttexttext.....
   4            www.text4.com      texttexttexttexttext.....
   5            www.text5.com      texttexttexttexttext.....
   6            www.text6.com      texttexttexttexttext.....
   7            www.text7.com      texttexttexttexttext.....
   8            www.text8.com      texttexttexttexttext.....

Let's call this data set "onlinearticles". ClusterID is the cluster that an article belongs to, URL is the distinct URL for each article, and Text_Body is the actual article text. I need to build an additional column that assigns a value of 1 to any row with ClusterID 0, 4, 6, or 7, and a value of 0 to rows with any other ClusterID. I need this column in order to fit a regression tree. How can I go about building it?

Vindication09
  • `ifelse(onlinearticles$ClusterID %in% c(0, 4, 6, 7), 1, 0)` – bouncyball May 26 '17 at 14:09
  • or `as.integer(onlinearticles$ClusterID %in% c(0, 4, 6, 7))` – Sotos May 26 '17 at 14:15
  • If I wrote it like this: `onlinearticles2 <- ifelse(onlinearticles$ClusterID %in% c(0, 4, 6, 7), 1, 0)`, would this be okay, in the sense that I can now refer to onlinearticles2? – Vindication09 May 26 '17 at 14:15
  • @Vindication09 almost! You just need to assign the result to a variable within `onlinearticles`. Something like: `onlinearticles$Cluster_Dummy <- ifelse(onlinearticles$ClusterID %in% c(0, 4, 6, 7), 1, 0)` – bouncyball May 26 '17 at 14:20
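
Putting the comments above together, a minimal sketch might look like the following. The toy data frame only mimics the shape of the table in the question, `target_clusters` is a hypothetical helper name, and `Cluster_Dummy` is the column name suggested in the last comment.

```r
# Toy stand-in for onlinearticles, shaped like the table in the question
# (the Text_Body values are placeholders, not real article text)
onlinearticles <- data.frame(
  ClusterID = 0:8,
  URL = c("www.text.com", paste0("www.text", 1:8, ".com")),
  Text_Body = rep("texttexttexttexttext", 9),
  stringsAsFactors = FALSE
)

# Clusters that should get a 1 (hypothetical helper variable)
target_clusters <- c(0, 4, 6, 7)

# %in% gives TRUE/FALSE for each row; as.integer() turns that into 1/0.
# Assigning to onlinearticles$Cluster_Dummy stores it as a new column.
onlinearticles$Cluster_Dummy <- as.integer(onlinearticles$ClusterID %in% target_clusters)

# Cross-tabulate to check which clusters were flagged
table(onlinearticles$ClusterID, onlinearticles$Cluster_Dummy)
```

Storing the result as a column, rather than as a separate vector such as `onlinearticles2`, keeps the flag aligned with its rows if the data frame is later subset or reordered. If the tree is meant to classify rather than regress, wrapping the column in `factor()` may be the better choice.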

0 Answers