I have a very large distance matrix (3678 x 3678) currently stored as a data frame. The columns are named "1", "2", "3" and so on, and the rows likewise. I need to find the values that are less than 26 and different from 0, and collect them in a second data frame with two columns: the first with the index and the second with the value. For example:

            value
318-516   22.70601
... 

where 318 is the row index and 516 is the column index.

Dubukay
Nancy
  • Hi Nancy, it sounds like you're asking about a [distance matrix](https://en.wikipedia.org/wiki/Distance_matrix) so I edited your question to provide some clarity. Please let me know if this was your intent and the question still makes sense as you intended it! – Dubukay Jan 26 '21 at 21:40
  • @Dubukay yes thanks! – Nancy Jan 26 '21 at 21:52

1 Answer

OK, I've tried to recreate your situation (note: if you can, it's always helpful to include a few lines of your data using `dput()`).
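As a minimal sketch of what that could look like (the small matrix `m` is made up for illustration and merely stands in for the real data):

```r
# Hypothetical 3 x 3 distance matrix standing in for the real data
m <- matrix(c(0,    22.7, 31.2,
              22.7, 0,    18.4,
              31.2, 18.4, 0), ncol = 3)

# dput() prints R code that recreates the object, ready to paste into a question
dput(m[1:2, 1:2])
```

Pasting the printed `structure(...)` call into a question lets others rebuild exactly the same object.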

You should be able to use `filter()` and a few simple tidyverse commands (if you don't know how they work, run them step by step, each time selecting the code up to, but not including, the next %>% to check what it does):

library(tidyverse)
library(tidylog) # gives you additional output on what each command does
# Creating some data that looks similar
data <- matrix(rnorm(25, mean = 26), ncol = 5)
colnames(data) <- as.character(1:5) # name the columns "1" to "5", as in the question
data <- as_tibble(data)

data %>% 
  mutate(row = row_number()) %>% 
  pivot_longer(-row, names_to = "column",values_to = "values", names_prefix = "V") %>% 
  # depending on what your column names look like, you might need a separate() step first
  filter(values > 0 & values < 26) %>% 
  
  # if you want you can create an index column as well
  mutate(index = paste0(row,"-",column)) %>% 
  
  # then you can get rid of the row and column
  select(-row,-column) %>% 
  # move index to the front
  relocate(index)
Moritz Schwarz
  • this works, but my col names are V1, V2, V3 and so on...is it possible to remove the V and to have only numbers? – Nancy Jan 26 '21 at 21:59
  • Yes, you simply need to add `names_prefix = "V"` to the `pivot_longer()` command. Put together: `pivot_longer(-row, names_to = "column",values_to = "values", names_prefix = "V") %>% ` I'll edit this in the answer as well - if this resolved it, would you mind accepting the answer by chance? :) – Moritz Schwarz Jan 26 '21 at 22:06
  • hello, and what if my data are encoded as a large matrix? – Nancy Jan 27 '21 at 15:13
  • I'm not sure I fully understand what you mean with the encoding as a large matrix - but you could try converting the matrix to a `data.frame` or `tibble` using `as.data.frame()` or `as_tibble()`. – Moritz Schwarz Jan 27 '21 at 16:14
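
For the matrix case raised in the last comments, an alternative to converting first is to work on the matrix directly in base R. This is only a sketch, not part of the answer above: the small matrix `d` is made up to stand in for the real 3678 x 3678 one, and `hits` and `result` are illustrative names.

```r
# Hypothetical 5 x 5 symmetric distance matrix standing in for the real one
set.seed(1)
pts <- matrix(rnorm(10, sd = 20), ncol = 2)
d <- as.matrix(dist(pts))

# Row/column positions of all cells that are non-zero and below the threshold
hits <- which(d != 0 & d < 26, arr.ind = TRUE)

# Build the two-column result: "row-column" index plus the matching value
result <- data.frame(
  index = paste(hits[, "row"], hits[, "col"], sep = "-"),
  value = d[hits]
)
result
```

Because a distance matrix is symmetric, every pair shows up twice here (for example as 2-4 and 4-2). If each pair should appear only once, adding `& upper.tri(d)` to the condition inside `which()` is one way to keep just the upper triangle.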