
I need to stack the two Group/Population column pairs into a single Nation/Population table. Please help, and thanks in advance. The solution should work automatically for about 10,000 rows (20,000 nations in total, 10,000 in each group column).

Original table

|Group1|Group2|Population1|Pop2|
|--|--|--|--|
|"USA"|  "ENG"|  1000|   502|
|"RUS"|  "FRC"|  1200|   200|
|"UEC"|  "IND"|  120|   100|
| "EST"|  "CAU"|  100|   500|
|"VNE"|  "CAM"|  200|   800|
*with about 10,000 more rows like these*

Expected table

|Nation|Population|
|--|--|
|"USA"|1000|
|"RUS"|1200|
|"UEC"|120|
|"EST"|100|
|"VNE"|200|
|"ENG"|502|
|"FRC"|200|
|"IND"|100|
|"CAU"|500|
|"CAM"|800|

*and so on for the remaining rows, stacked the same way*


Will N

3 Answers


Update: use `rbind` from base R (assuming your data frame is named `dat`; otherwise replace it in the following syntax).

#enter column names of final data in a vector
Names <- c("Country", "Population")

#for final output
rbind(setNames(dat[,c(1,3)], Names), setNames(dat[,c(2,4)], Names))

Demonstration on sample data:


dat <- data.frame(G1 = c("USA", "UK"), G2 = c("FRA", "CAN"), Pop1 = c(123, 234), Pop2 = c(345, 432))
dat
   G1  G2 Pop1 Pop2
1 USA FRA  123  345
2  UK CAN  234  432

> rbind(setNames(dat[,c(1,3)], Names), setNames(dat[,c(2,4)], Names))
  Country Population
1     USA        123
2      UK        234
3     FRA        345
4     CAN        432
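
Applied to the column names from the question, the same idea looks like this. A minimal sketch, assuming the full table has already been loaded as a data frame called `dat` with columns `Group1`, `Group2`, `Population1` and `Pop2`:

#column names of the final data
Names <- c("Nation", "Population")

#stack the (Group1, Population1) pairs on top of the (Group2, Pop2) pairs
rbind(setNames(dat[, c("Group1", "Population1")], Names),
      setNames(dat[, c("Group2", "Pop2")], Names))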

AnilGoyal

Using tidyr:

library(tidyr)

dat <- data.frame(G1 = c("USA", "UK"), G2 = c("FRA", "CAN"), Pop1 = c(123, 234), Pop2 = c(345, 432))

pivot_longer(dat, everything(), names_to = c(".value", NA), names_pattern = "(.*)(1|2)$")

# A tibble: 4 x 2
  G       Pop
  <chr> <dbl>
1 USA     123
2 FRA     345
3 UK      234
4 CAN     432

EDIT: Running it with a larger data frame:

dat_big <- uncount(dat, weights = 100000) # Now with 200,000 rows

bench::mark(time = {
  pivot_longer(dat_big, everything(), names_to = c(".value", NA), names_pattern = "(.*)(1|2)$")
})

expression     min  median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time   
  <bch:expr> <bch:t> <bch:t>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm>
1 time        37.5ms  37.5ms      26.6    39.7MB     320.     1    12     37.5ms 
Phil
  • Is there any way to do it faster? The actual table is 10k rows. – Will N Feb 19 '21 at 06:42
  • See my edit: with 200,000 rows it takes 37.5 milliseconds. – Phil Feb 19 '21 at 06:50
  • I mean this code: `dat <- data.frame(G1 = c("USA", "UK"), G2 = c("FRA", "CAN"), Pop1 = c(123, 234), Pop2 = c(345, 432))`. In the real data set there are many nations other than USA and UK in column 1, and other than FRC and CAN in column 2. If we create `dat` like that, we have to list G1 = c("USA", ..., 10000th nation). Is there another way that does not require creating `dat` by hand? – Will N Feb 19 '21 at 06:56
  • The `dat_big` object I created here has 200,000 rows, or 20 times your actual data. What you are telling me doesn't seem to counter what I have provided. – Phil Feb 19 '21 at 07:00
  • I just edited the original data, please have a look. Is there any other method that skips the step of creating `dat`? – Will N Feb 19 '21 at 07:15
  • I'm honestly not understanding what the issue is. Running the `pivot_longer()` line on a data frame with 200,000 rows takes 37.5 milliseconds. I'm just not sure what you are looking for at this point. – Phil Feb 19 '21 at 07:20
  • Are you talking about the data in itself? Are you asking if there's another way to bring it in to R than typing it out in `data.frame()`? – Phil Feb 19 '21 at 07:23
  • Yes, is there another way to bring in all the values of the original data's columns rather than MANUALLY typing each value into four separate vectors (G1, G2, Pop1, Pop2)? – Will N Feb 19 '21 at 07:35
  • Ok - this is asking a completely different question than what you're asking here. That depends on how and where you have the data. If it's in a csv file on your computer, use `readr::read_csv()`, if it's in a SPSS file, use `haven::read_sav()`, if it's in a Excel file, use `readxl::read_excel()`. If it's online, there are tools from the `rvest` package that allow you to scrape information online. – Phil Feb 19 '21 at 07:36
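For reference, a minimal sketch combining the two steps discussed in these comments: reading the full table from a file instead of typing it into `data.frame()`, then reshaping it with `pivot_longer()`. The file name `nations.csv` is hypothetical, and the pattern assumes the paired columns share a prefix ending in 1/2, as in the sample data above (G1/G2, Pop1/Pop2).

library(readr)
library(tidyr)

#read the full 10k-row table from disk instead of typing it out
dat <- read_csv("nations.csv")   #hypothetical file name

#reshape exactly as in the answer above
pivot_longer(dat, everything(), names_to = c(".value", NA), names_pattern = "(.*)(1|2)$")

This avoids listing every nation by hand; only the reshaping call is needed once the data is read in.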
Another option, combining base R's `c()` with dplyr's `bind_cols()` and `rename()`:

library(dplyr)

#combine the two nation columns and the two population columns into single vectors
x <- c(df2$Group1, df2$Group2)
y <- c(df2$Population1, df2$Pop2)

#bind them back together as columns (auto-named ...1 and ...2) and rename them
z <- bind_cols(x, y)

z %>% as.data.frame() %>% rename(nation = ...1, Pop = ...2)
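A minimal demonstration, assuming `df2` holds the same sample data used in the earlier answers but under the question's column names:

df2 <- data.frame(Group1 = c("USA", "UK"), Group2 = c("FRA", "CAN"),
                  Population1 = c(123, 234), Pop2 = c(345, 432))

#running the lines above on this df2 should produce something like:
#  nation Pop
#1    USA 123
#2     UK 234
#3    FRA 345
#4    CAN 432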
Will N
  • The community encourages adding explanations alongside code, rather than purely code-based answers (see [here](https://meta.stackoverflow.com/questions/300837/what-comment-should-i-add-to-code-only-answers)). – costaparas Feb 20 '21 at 00:11