Hello, I hope all is well. I have edited my previous question and hope it is clearer now.

I created an igraph object and would like to run the same analysis several times, extracting some information in each iteration.

I can't share the whole data, so I am sharing just a small subset. df_edge is as follows:

library(dplyr)
job_1 <-c(1,2,6,6,5,6,7,8,6,8,8,6,6,8)
job_2 <- c(2,4,5,8,3,1,4,6,1,7,3,2,4,5)
weight <- c(1,1,1,2,1,1,2,1,1,1,2,1,1,1)

df_edge <- tibble(job_1,job_2,weight)
df_edge %>% glimpse()

Rows: 14
Columns: 3
$ job_1  <dbl> 1, 2, 6, 6, 5, 6, 7, 8, 6, 8, 8, 6, 6, 8
$ job_2  <dbl> 2, 4, 5, 8, 3, 1, 4, 6, 1, 7, 3, 2, 4, 5
$ weight <dbl> 1, 1, 1, 2, 1, 1, 2, 1, 1, 1, 2, 1, 1, 1

df_node is as follows:

job_id <- c(1,2,3,4,5,6,7,8)
job_type <- c(1,2,0,0,3,1,1,1)

df_node <- tibble(job_id,job_type)
df_node %>% glimpse()

Rows: 8
Columns: 2
$ job_id   <dbl> 1, 2, 3, 4, 5, 6, 7, 8
$ job_type <dbl> 1, 2, 0, 0, 3, 1, 1, 1

Creating the igraph object:

library(igraph)
library(tidygraph)

tp_network_subset <- graph_from_data_frame(df_edge, vertices = df_node, directed = FALSE)

Summary of the job_type column in df_node:

    df_node %>%
     count(job_type)
   
A tibble: 4 x 2
  job_type     n
     <dbl> <int>
1        0     2
2        1     4
3        2     1
4        3     1

What I am doing manually is the following:

### finding a job_id that belongs to job_type==1 category

    df_node %>% filter(job_type==1) %>%
    select(job_id) 

 A tibble: 4 x 1
  job_id
   <dbl>
1      1
2      6
3      7
4      8
# for instance, I picked one of them and it is job_id = 6
### using the job_id to create a subgraph by selecting order 1 neighbors of this job_id (6)

node_test <- make_ego_graph(tp_network_subset,order = 1 ,nodes="6")

### creating a data frame of this subgraph that contains no isolated nodes

df_test <- as_tbl_graph(node_test[[1]]) %>% 
    activate(nodes) %>%
    filter(!node_is_isolated()) %>% 
    as_tibble()
df_test %>% glimpse()
Rows: 6
Columns: 2
$ name     <chr> "1", "2", "4", "5", "6", "8"
$ job_type <dbl> 1, 2, 0, 3, 1, 1

## subgraph size is 6 which will be an outcome of interest
### if the graph is zero length , I should stop here and pick another job_id that belongs to job_type==1 category

In this example, the graph is not zero length, so I proceed to the next step.

 ### calculating the measure of interest with respect to the job_type==1 category
 
   df_test %>% 
    summarise(job_rate= (nrow(df_test %>% filter(job_type==1)))/(nrow(df_test %>% 
    filter(job_type %in% c(1,2,3)))))
# 0.6
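
The same ratio can also be written with plain vector sums, which avoids nesting `filter()` and `nrow()` inside `summarise()`. This is only a sketch: `df_test` is rebuilt by hand from the `glimpse()` output above, so it runs without the graph.

```r
library(dplyr)

# Node table of the ego subgraph around job_id 6, copied from the output above
df_test <- tibble(
  name     = c("1", "2", "4", "5", "6", "8"),
  job_type = c(1, 2, 0, 3, 1, 1)
)

# Share of job_type == 1 among the nodes whose job_type is 1, 2 or 3
job_rate <- with(df_test, sum(job_type == 1) / sum(job_type %in% 1:3))
job_rate
```

Here three of the five nodes with job_type 1, 2 or 3 have job_type 1, which matches the 0.6 above.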

If job_rate > 0.5, I want to keep the job_rate and the rows (corresponding nodes) of the job_type==0 category of the subgraph. In this instance, job_rate was 0.6, so I am keeping the following:

 df_final <- as_tbl_graph(node_test[[1]]) %>% 
        activate(nodes) %>%
        filter(!node_is_isolated()) %>% 
        as_tibble() %>% filter(job_type==0)

# A tibble: 1 x 2
   name  job_type
    <chr>    <dbl>
1    4            0

But I also need to attach the corresponding job_rate and some other related columns to these rows. So, my desired output would be:

  name  job_type subgraph_origin_id job_rate subgraph_size no_(job_type==0)_in_subgrapgh no_(job_type==1)_in_subgrapgh no_(job_type==2)_in_subgrapgh no_(job_type==3)_in_subgrapgh
  <chr>    <dbl>
1 4            0                  6      0.6             6

So, I need to repeat this process and create subgraphs for all job_type==1 nodes. If a subgraph is not zero length and its job_rate > 0.5, I want to extract all the corresponding nodes in that subgraph along with the job_rate and the other columns shown in the desired output above.

Alex
  • @Phil, thanks for the edit. Would you have any idea to help out? much appreciated! – Alex May 08 '21 at 05:44
  • If you can create a minimal reproducible example (example data), I can try to help out. – Brigadeiro May 10 '21 at 04:20
  • @Brigadeiro, Thanks for the feedback. I just provided some data and hope it helps. Much appreciated! – Alex May 10 '21 at 16:13
  • please (1) load the required packages to run your code and (2) state clearly the problem you are trying to solve before stating what you have done to manually solve it. – Brigadeiro May 10 '21 at 18:50

1 Answer

Does this work for you?

# Group the node table by job_type; each group is processed in turn
dflst <- split(df_node, df_node$job_type)
tpe <- as.numeric(names(dflst))
out <- tibble()
for (i in seq_along(dflst)) {
  df <- dflst[[i]]
  node_test_lst <- make_ego_graph(tp_network_subset, order = 1, nodes = df$job_id)
  origin_id <- df$job_id
  jtpe <- tpe[i]
  for (j in seq_along(node_test_lst)) {
    node_test <- node_test_lst[[j]]
    df_test <- as_tbl_graph(node_test) %>%
      activate(nodes) %>%
      filter(!node_is_isolated()) %>%
      as_tibble()
    if (nrow(df_test %>% filter(job_type == 0)) > 0 && any(df_test$job_type %in% 1:3)) {
      job_rate <- with(df_test, sum(job_type == jtpe) / sum(job_type %in% 1:3))
      if (job_rate > 0.5) {
        df_final <- df_test %>%
          filter(job_type == 0) %>%
          mutate(
            subgraph_origin_id = origin_id[j],
            job_rate = job_rate,
            subgraph_size = nrow(df_test)
          ) %>%
          cbind(
            setNames(
              as.list(table(factor(df_test$job_type, levels = 0:3))),
              sprintf("no_(job_type==%s)_in_subgrapgh", 0:3)
            )
          )
        out <- out %>% rbind(df_final)
      }
    }
  }
}

which gives

> out
  name job_type subgraph_origin_id job_rate subgraph_size
1    4        0                  6     0.60             6
2    4        0                  7     1.00             3
3    3        0                  8     0.75             5
  no_(job_type==0)_in_subgrapgh no_(job_type==1)_in_subgrapgh
1                             1                             3
2                             1                             2
3                             1                             3
  no_(job_type==2)_in_subgrapgh no_(job_type==3)_in_subgrapgh
1                             1                             1
2                             0                             0
3                             0                             1
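
One caveat, offered as a hedged note rather than a diagnosis of any particular data set: igraph reads a numeric `nodes` argument as vertex positions, while character values are matched against vertex names. In the sample data the ids 1 to 8 coincide with the positions, so `nodes = df$job_id` happens to select the intended vertices; with arbitrary ids, converting to character is safer. The toy ids below (101, 205, 330) are made up for illustration.

```r
library(igraph)

# Hypothetical toy network: a path 101 - 205 - 330 with arbitrary ids
edges <- data.frame(from = c(101, 205), to = c(205, 330))
nodes <- data.frame(id = c(330, 101, 205), job_type = c(0, 1, 1))
g <- graph_from_data_frame(edges, vertices = nodes, directed = FALSE)

# Character input is matched against vertex names
ego_by_name <- make_ego_graph(g, order = 1, nodes = as.character(101))[[1]]
V(ego_by_name)$name

# Numeric input is read as a position: 1 is the first vertex, named "330"
ego_by_index <- make_ego_graph(g, order = 1, nodes = 1)[[1]]
V(ego_by_index)$name
```

In the loop above, this would mean passing `nodes = as.character(df$job_id)` to `make_ego_graph()`.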
ThomasIsCoding
  • @ThomaslsCoding, Thanks for providing your answer. so I'm getting `Error in list2DF(as.list(table(factor(df_test$job_type, levels = 0:3)))) : could not find function "list2DF" `. I already installed `library(base)` and `library(base)` but still getting the same error. – Alex May 10 '21 at 22:28
  • @Alex I think you can run the code even removing `list2DF`, see my update – ThomasIsCoding May 10 '21 at 22:31
  • @ThomaslsCoding, Thanks. it generates this now `Error in setNames(as.list(table(factor(df_test$job_type, levels = 0:3)))) : argument "nm" is missing, with no default` – Alex May 10 '21 at 22:42
  • @Alex Sorry, my bad. Now I have fixed it. Please retry. – ThomasIsCoding May 10 '21 at 22:47
  • @ThomaslsCoding, sorry to bother again, but it says `Error in rbind(deparse.level, ...) : numbers of columns of arguments do not match`. Do you have any clues? – Alex May 10 '21 at 22:51
  • @Alex Did you try my code on the data in your post or your real data? I didn't see any errors with your post data. – ThomasIsCoding May 10 '21 at 22:58
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/232191/discussion-between-alex-and-thomasiscoding). – Alex May 10 '21 at 23:00
  • @ThomaslsCoding, I am getting this `Error in if (nrow(df_test %>% filter(inspection_target == 0)) > 0 & hit_rate > : missing value where TRUE/FALSE needed` I guess I know why I am getting this. because the `sum(job_type %in% 1:3)` in the data may generate `zero` and creates `NA` or `NaN `values in `job_rate`. Hence, the condition `job_rate > 0.5` is producing this error. Could you please advice? – Alex May 11 '21 at 03:44
  • @Alex If that case happened and you don't want to keep it, then you can try my update. – ThomasIsCoding May 11 '21 at 06:55
  • @ThomaslsCoding, thanks! It works but I noticed I am getting wrong values for job_rate. Some of them are greater than 1 which doesn't make sense like `$ job_rate 2.000000, 2.000000, 3.714286`. The code works well on the sample data though. the only difference is that in the real data I have different `job_ids` and they are all `random numbers`. I don't know if it can make a difference. – Alex May 11 '21 at 16:21
  • @ThomaslsCoding, sorry to bother again. I see in the real data `job_rate` is calculated based on `job_rate= (nrow(df_test %>% filter(job_type==0)))/(nrow(df_test %>% filter(job_type %in% c(1,2,3))))` which makes the `job_rate` output wrong and in some cases greater than one. Would you have any ideas what may cause this issue? Thanks! – Alex May 11 '21 at 18:13
  • @Alex No worries. I think it depends on how you define your `job_rate`. From your code, it seems to be the ratio of `job_type = 0` to `job_type = 1,2 or 3`. In this case, if many rows have the value `0` but only a few have `1,2 or 3`, it will definitely give a rate larger than `1`. I don't think it is a coding issue. It is determined by how you define `job_rate`. – ThomasIsCoding May 11 '21 at 19:27
  • @ThomaslsCoding, thanks! I think I didn't explain it well. The `job_rate` column must turn out between `0` and `1` because its denominator is always greater than the numerator and it works in the sample data. In the real data, it seems it follows a different formula which I shared earlier. Would you be available for a quick chat in the room please? – Alex May 11 '21 at 19:45
  • @ThomaslsCoding, I am trying to add degree to the nodes of each subgraph. and report it in the end. So I added `V(node_test_lst)$degree<- degree(node_test_lst,mode = c("All"))` right after `node_test_lst <- make_ego_graph(tp_network_subset, order = 1, nodes = df$job_id)` . However, I'm getting this `Error in degree(node_test_lst, mode = c("All")) : Not a graph object`. It works outside the loop but not when I put it along with the code. Could you please advice? – Alex May 12 '21 at 06:12
  • @Alex You should use `node_test_lst <- lapply(node_test_lst, function(g) set_vertex_attr(g, name = "degree", value = degree(g, mode = "all")))` since `node_test_lst` is a list of graphs. – ThomasIsCoding May 12 '21 at 06:27
  • @ThomaslsCoding, Just wonder if we can have a quick chat? thanks! – Alex May 12 '21 at 06:58
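
The `lapply()` suggestion in the comments can be tried on a small stand-alone graph. This is a minimal sketch with a made-up star graph, not the network from the question: because `make_ego_graph()` returns a list of graphs, the degree is computed and stored one subgraph at a time.

```r
library(igraph)

# Hypothetical example graph: a star with vertex "1" at the centre
g <- make_star(5, mode = "undirected")
V(g)$name <- as.character(1:5)

# One ego graph per origin vertex; the result is a list, not a graph
ego_list <- make_ego_graph(g, order = 1, nodes = c("1", "2"))

# Attach each vertex's degree within its own subgraph
ego_list <- lapply(ego_list, function(sub) {
  set_vertex_attr(sub, name = "degree", value = degree(sub, mode = "all"))
})

# Degrees inside the ego graph of vertex "2" (vertices "1" and "2")
V(ego_list[[2]])$degree
```

Calling `degree()` directly on `ego_list` fails for the reason given in the error message: the list itself is not a graph object.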