
I need to stack the two Group/Population column pairs into a single Nation/Population table. Please help, and thanks in advance. The solution should work automatically for about 10,000 rows (20,000 nations in total, 10,000 in each group column).

Original table

|Group1|Group2|Population1|Pop2|
|--|--|--|--|
|"USA"|  "ENG"|  1000|   502|
|"RUS"|  "FRC"|  1200|   200|
|"UEC"|  "IND"|  120|   100|
| "EST"|  "CAU"|  100|   500|
|"VNE"|  "CAM"|  200|   800|
*with about 10,000 more rows like these*

Expected table

|Nation|Population|
|--|--|
|"USA"|1000|
|"RUS"|1200|
|"UEC"|120|
|"EST"|100|
|"VNE"|200|
|"ENG"|502|
|"FRC"|200|
|"IND"|100|
|"CAU"|500|
|"CAM"|800|

*and so on for the remaining rows, stacked the same way*


Will N

3 Answers


Update: use `rbind` from base R (assuming your data frame is named `dat`; otherwise replace it in the following syntax).

#enter column names of final data in a vector
Names <- c("Country", "Population")

#for final output
rbind(setNames(dat[,c(1,3)], Names), setNames(dat[,c(2,4)], Names))

Demonstration on sample data:


dat <- data.frame(G1 = c("USA", "UK"), G2 = c("FRA", "CAN"), Pop1 = c(123, 234), Pop2 = c(345, 432))
dat
   G1  G2 Pop1 Pop2
1 USA FRA  123  345
2  UK CAN  234  432

> rbind(setNames(dat[,c(1,3)], Names), setNames(dat[,c(2,4)], Names))
  Country Population
1     USA        123
2      UK        234
3     FRA        345
4     CAN        432
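
Applied to the column names from the question, the same idea looks like this. A minimal sketch, assuming the full table has already been loaded as a data frame called `dat` with columns `Group1`, `Group2`, `Population1` and `Pop2`:

#column names of the final data
Names <- c("Nation", "Population")

#stack the (Group1, Population1) pairs on top of the (Group2, Pop2) pairs
rbind(setNames(dat[, c("Group1", "Population1")], Names),
      setNames(dat[, c("Group2", "Pop2")], Names))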

AnilGoyal

Using tidyr:

library(tidyr)

dat <- data.frame(G1 = c("USA", "UK"), G2 = c("FRA", "CAN"), Pop1 = c(123, 234), Pop2 = c(345, 432))

pivot_longer(dat, everything(), names_to = c(".value", NA), names_pattern = "(.*)(1|2)$")

# A tibble: 4 x 2
  G       Pop
  <chr> <dbl>
1 USA     123
2 FRA     345
3 UK      234
4 CAN     432

EDIT: Running it with a larger data frame:

dat_big <- uncount(dat, weights = 100000) # Now with 200,000 rows

bench::mark(time = {
  pivot_longer(dat_big, everything(), names_to = c(".value", NA), names_pattern = "(.*)(1|2)$")
})

expression     min  median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time   
  <bch:expr> <bch:t> <bch:t>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm>
1 time        37.5ms  37.5ms      26.6    39.7MB     320.     1    12     37.5ms 
Phil
  • Is there any way to do it faster? The actual table is 10k rows. – Will N Feb 19 '21 at 06:42
  • See my edit: with 200,000 rows it takes 37.5 milliseconds. – Phil Feb 19 '21 at 06:50
  • I mean this code: `dat <- data.frame(G1 = c("USA", "UK"), G2 = c("FRA", "CAN"), Pop1 = c(123, 234), Pop2 = c(345, 432))`. In the real data set there are many nations other than USA and UK in column 1, and other than FRC and CAN in column 2. If we create `dat` like that, we have to list G1 = c("USA", ..., 10000th nation). Is there another way that does not require creating `dat` by hand? – Will N Feb 19 '21 at 06:56
  • The `dat_big` object I created here has 200,000 rows, or 20 times your actual data. What you are telling me doesn't seem to counter what I have provided. – Phil Feb 19 '21 at 07:00
  • I just edited the original data, please have a look. Is there any other method that skips the step of creating `dat`? – Will N Feb 19 '21 at 07:15
  • I'm honestly not understanding what the issue is. Running the `pivot_longer()` line on a data frame with 200,000 rows takes 37.5 milliseconds. I'm just not sure what you are looking for at this point. – Phil Feb 19 '21 at 07:20
  • Are you talking about the data in itself? Are you asking if there's another way to bring it in to R than typing it out in `data.frame()`? – Phil Feb 19 '21 at 07:23
  • Yes, is there another way to bring in all the values of the original data's columns rather than MANUALLY typing each value into four separate vectors (G1, G2, Pop1, Pop2)? – Will N Feb 19 '21 at 07:35
  • Ok - this is asking a completely different question than what you're asking here. That depends on how and where you have the data. If it's in a csv file on your computer, use `readr::read_csv()`, if it's in a SPSS file, use `haven::read_sav()`, if it's in a Excel file, use `readxl::read_excel()`. If it's online, there are tools from the `rvest` package that allow you to scrape information online. – Phil Feb 19 '21 at 07:36
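For reference, a minimal sketch combining the two steps discussed in these comments: reading the full table from a file instead of typing it into `data.frame()`, then reshaping it with `pivot_longer()`. The file name `nations.csv` is hypothetical, and the pattern assumes the paired columns share a prefix ending in 1/2, as in the sample data above (G1/G2, Pop1/Pop2).

library(readr)
library(tidyr)

#read the full 10k-row table from disk instead of typing it out
dat <- read_csv("nations.csv")   #hypothetical file name

#reshape exactly as in the answer above
pivot_longer(dat, everything(), names_to = c(".value", NA), names_pattern = "(.*)(1|2)$")

This avoids listing every nation by hand; only the reshaping call is needed once the data is read in.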
Another option, combining base R's `c()` with dplyr's `bind_cols()` and `rename()`:

library(dplyr)

#combine the two nation columns and the two population columns into single vectors
x <- c(df2$Group1, df2$Group2)
y <- c(df2$Population1, df2$Pop2)

#bind them back together as columns (auto-named ...1 and ...2) and rename them
z <- bind_cols(x, y)

z %>% as.data.frame() %>% rename(nation = ...1, Pop = ...2)
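A minimal demonstration, assuming `df2` holds the same sample data used in the earlier answers but under the question's column names:

df2 <- data.frame(Group1 = c("USA", "UK"), Group2 = c("FRA", "CAN"),
                  Population1 = c(123, 234), Pop2 = c(345, 432))

#running the lines above on this df2 should produce something like:
#  nation Pop
#1    USA 123
#2     UK 234
#3    FRA 345
#4    CAN 432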
Will N
  • The community encourages adding explanations alongside code, rather than purely code-based answers (see [here](https://meta.stackoverflow.com/questions/300837/what-comment-should-i-add-to-code-only-answers)). – costaparas Feb 20 '21 at 00:11