
I am looking to add a 'description' variable to the vertices data frame that describes the cluster in which each node is found. My network represents family relationships, so a cluster could be a family of two adults and two children, a single parent with three children, a couple, etc.

My data looks like

Vertices data frame 

 ID      Date.Of.B    Nationality    
 X1      02/05/1995   Ugandan 
 X2      10/10/2010   Ugandan 
 X3      15/12/1975   Irish 
 :           :          : 

Edgelist

ID1    ID2    

X1     X2 
X1     X3  
X2     X3 
X3     X1  
:      :

I plan to create factor levels to describe the clusters, e.g.

 2 adults            = 2A
 2 adults 2 children = 2A2C
 5 adults 0 children = 5A

After creating the graph using graph_from_data_frame(), I can extract the components using components(). The result's $membership element gives each vertex a cluster membership number, with the IDs stored as the names of that vector. I can also apply a label to each vertex to mark it as an adult or a child.
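For illustration, here is a minimal sketch of that step using only the three sample rows shown above. The `Cluster` and `Status` column names, the fixed reference date, and the 18-year cutoff are assumptions made for the example, not part of my real data.

```r
library(igraph)

# Sample data taken from the tables above (only the rows shown)
vertices <- data.frame(
  ID = c("X1", "X2", "X3"),
  Date.Of.B = c("02/05/1995", "10/10/2010", "15/12/1975"),
  Nationality = c("Ugandan", "Ugandan", "Irish"),
  stringsAsFactors = FALSE
)
edgelist <- data.frame(
  ID1 = c("X1", "X1", "X2", "X3"),
  ID2 = c("X2", "X3", "X3", "X1")
)

g <- graph_from_data_frame(edgelist, directed = FALSE, vertices = vertices)
comp <- components(g)

# membership is a named vector: names are the IDs, values the cluster number
vertices$Cluster <- as.integer(comp$membership[vertices$ID])

# Assumption: "adult" means 18 or older on a fixed reference date
ref_date <- as.Date("2020-01-01")
dob <- as.Date(vertices$Date.Of.B, format = "%d/%m/%Y")
age_years <- as.numeric(ref_date - dob) / 365.25
vertices$Status <- ifelse(age_years >= 18, "A", "C")
```

With these rows, all three IDs land in the same cluster, which is what the 2A1C label in the desired output below refers to.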

Basically I am looking to add another variable which classes each ID given the cluster it is in:

New vertices data frame

 ID      Date.Of.B    Nationality   Class
 X1      02/05/1995   Ugandan       2A1C
 X2      10/10/2010   Ugandan       2A1C
 X3      15/12/1975   Irish         2A1C
 :           :          :            :

I think I will have to use some sort of loop to go through each cluster and apply a level to each vertex based on components$membership.

This is one option I thought of and am currently working on.

Please let me know if you have any other ideas or better ways to do it.

Thanks

williamg15

1 Answer


Maybe this helps:

library(igraph)
library(dplyr)
library(tidyr)

Generate example data:

set.seed(1)
vertices <- data.frame(ID = 1:20,
                       date = as.character(rnorm(20, -5000, 3000) + Sys.Date()),
                       Nationality = letters[1:20])
edgelist <- data.frame(from = sample(1:20, 15, replace = TRUE),
                       to = sample(1:20, 15, replace = TRUE))
g <- graph_from_data_frame(edgelist,
                           directed = FALSE,
                           vertices = vertices)
cp <- components(g)

Save component-membership as new vertex attribute:

V(g)$components <- membership(cp)

Extract vertices plus additional attributes:

df <- igraph::as_data_frame(g, what = "vertices")

Work with the data frame: first generate a new coding variable based on age in days ("A" for adult, "C" for child), then count the occurrences of each code within each component and paste the count and code together into a new variable.

 df <- df %>%
        # 6570 days is roughly 18 years: older vertices are adults ("A"), the rest children ("C")
        mutate(coding = ifelse(as.numeric(Sys.Date() - as.Date(date)) > 6570, "A", "C")) %>%
        group_by(components, coding) %>%
        mutate(n = n()) %>%
        ungroup() %>%
        mutate(new = paste(n, coding, sep = "")) %>%
        select(-coding, -n)

Then keep one row per distinct label and component, and nest the labels by component into a new data frame.

 df2 <- df %>%
        select(new, components) %>%
        distinct() %>%
        nest(data = new)

After that you can join the two data frames on components and use sapply to collapse each nested list of labels into a single string. The resulting column (here called data) is your new class variable and the final result.

 df3 <- left_join(df, df2, by = "components") %>%
        select(-new)
 df3$data <- sapply(df3$data, function(x) paste(unname(unlist(x)), collapse = ""))
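If the order of the parts matters (always adults first, as in 2A1C), a more compact variant might build the label directly per component. This is only a sketch under assumptions: the toy data frame is made up, the `coding` column is assumed to be kept rather than dropped, and `class_label` is a hypothetical helper name.

```r
library(dplyr)

# Hypothetical input: one row per vertex with its component number and
# an adult/child code, as produced by the steps above
toy <- data.frame(
  ID = 1:6,
  components = c(1, 1, 1, 2, 2, 3),
  coding = c("A", "C", "A", "A", "A", "C")
)

# Hypothetical helper: build a label such as "2A1C" for one component,
# listing adults first and omitting zero counts
class_label <- function(coding) {
  n_adult <- sum(coding == "A")
  n_child <- sum(coding == "C")
  paste0(
    if (n_adult > 0) paste0(n_adult, "A") else "",
    if (n_child > 0) paste0(n_child, "C") else ""
  )
}

# One label per component, recycled to every vertex in that component
toy <- toy %>%
  group_by(components) %>%
  mutate(Class = class_label(coding)) %>%
  ungroup()
```

This skips the nest and join steps, at the cost of hard-coding the two categories "A" and "C".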
Ben Nutzer