My goal is to develop a network visualization in R, starting from a "classic" data frame. I thus have to create two files: nodes and links. My current data set looks like this:

Driver   Insurance_taker  Counterparty1  Counterparty2  Counterparty3  Counterparty4  Counterparty5
Allan    Steven           NA             Patrick        Olivier        Jean           William
Ana      Anastasia        Max            Pierre         Jack           Sam            NA

Sample data (Please note that there are multiple NAs in the data):

library(data.table)

mydata <- data.table(Driver          = c("Allan", "Ana"),
                     Insurance_taker = c("Steven", "Anastasia"),
                     Counterparty1   = c(NA, "Max"),
                     Counterparty2   = c("Patrick", "Pierre"),
                     Counterparty3   = c("Olivier", "Jack"),
                     Counterparty4   = c("Jean", "Sam"),
                     Counterparty5   = c("William", NA))

My goal is to have one file called "nodes.csv" like:

Names      Type
Allan      Driver
Ana        Driver
Steven     Insurance_taker
Anastasia  Insurance_taker
...        ...

I have managed to get this file, but I also want to create another file (called "links" let's say) that would look like this:

From        To         Weight       Type
Patrick     Allan      30           witness1_driver
Allan       Steven     20           car_driver
....        ...        ...          ....

The weights will be determined by the relationship type (e.g. witness1_driver => weight = 30). Any help would be much appreciated.

Thanks a lot!! :)

  • It's unclear what you are asking. In terms of the `nodes.csv`, where is the person's **Type** coming from, or are you trying to derive it? And what do the NAs mean? Should they be kept in the network or discarded? – emilliman5 Feb 19 '18 at 18:48
  • The person's Type is coming from the column name of mydata. I managed to get this file. My problem is really to get the "links.csv". Thanks a lot @emilliman5 – AllanLC Feb 19 '18 at 18:52

1 Answer

I think you just need to melt mydata. Here is one option:

# measure.vars lists the columns to stack; data.table's melt has no `varying` argument
melt(mydata, id.vars = "Driver", measure.vars = names(mydata)[-1])

#     Driver        variable     value
#  1:  Allan Insurance_taker    Steven
#  2:    Ana Insurance_taker Anastasia
#  3:  Allan   Counterparty1        NA
#  4:    Ana   Counterparty1       Max
#  5:  Allan   Counterparty2   Patrick
#  6:    Ana   Counterparty2    Pierre
#  7:  Allan   Counterparty3   Olivier
#  8:    Ana   Counterparty3      Jack
#  9:  Allan   Counterparty4      Jean
# 10:    Ana   Counterparty4       Sam
# 11:  Allan   Counterparty5   William
# 12:    Ana   Counterparty5        NA
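
From the melted table, a links table could be built by renaming the columns and joining a lookup table of weights. This is only a sketch: the `weights` table and its values are hypothetical placeholders (the question gives 30 and 20 only as examples), the data is a shortened version of `mydata`, and treating the driver as the "From" side of every link is an assumption.

```r
library(data.table)

# Shortened version of the sample data from the question
mydata <- data.table(Driver          = c("Allan", "Ana"),
                     Insurance_taker = c("Steven", "Anastasia"),
                     Counterparty1   = c(NA, "Max"),
                     Counterparty2   = c("Patrick", "Pierre"))

# Long format: one row per (driver, relationship type, person)
links <- melt(mydata, id.vars = "Driver",
              variable.name = "Type", value.name = "To",
              variable.factor = FALSE)
setnames(links, "Driver", "From")

# Hypothetical weight per relationship type (placeholder values)
weights <- data.table(Type   = c("Insurance_taker", "Counterparty1", "Counterparty2"),
                      Weight = c(20, 30, 30))

# Attach the weight to each link by relationship type
links <- merge(links, weights, by = "Type", all.x = TRUE, sort = FALSE)
setcolorder(links, c("From", "To", "Weight", "Type"))

# fwrite(links, "links.csv")  # uncomment to write the file
```

Keeping the weights in a separate table means a weight can be changed in one place without touching the reshaping step.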
  • Yes it is! Thanks a lot. However, I get the warning "attributes are not identical across measure variables; they will be dropped". Not sure what it means – AllanLC Feb 19 '18 at 19:16
  • I suspect you have some data in a column that is not identical across groups and so is getting dropped. If you can post `dput(head(mydata))` we can figure out what is going on. – emilliman5 Feb 19 '18 at 19:29
  • I'm sorry for the late answer. It appears that melt is not even working, since it's giving me NA in the "value" column – AllanLC Feb 20 '18 at 08:42
  • That's because you have NAs in the original dataframe. I left them because you did not specify what to do with them – emilliman5 Feb 20 '18 at 11:26
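
If the missing counterparties should simply be discarded (an assumption, since the question does not say), `melt` can drop them through its `na.rm` argument. A minimal sketch on a shortened version of the sample data:

```r
library(data.table)

# Shortened version of the sample data, with one missing counterparty
mydata <- data.table(Driver        = c("Allan", "Ana"),
                     Counterparty1 = c(NA, "Max"),
                     Counterparty2 = c("Patrick", "Pierre"))

# na.rm = TRUE removes the rows whose melted value is NA
long <- melt(mydata, id.vars = "Driver", na.rm = TRUE)
```

Here the Allan / Counterparty1 row is removed and the three remaining rows all have a non-missing `value`.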