0

Apologies that there is a wealth of information on this site about melting and reshaping data, however, I cannot find the answer to my question on any of the pages I've visited. I have a data set which looks something like:

A Year | A Mean Temp | A Max Temp | A Min Temp | B Year | B Mean Temp | B Max Temp | B Min Temp |

and I want to end up with

Year | A Mean Temp | A Max Temp | A Min Temp |B Mean Temp | B Max Temp | B Min Temp

and fill columns which don't have data for that specific year with 'NA'.

The desired output would be something like:

[Table][1]

I believe the answer lies somewhere in something like:

library(dplyr)
library(tidyr)
library(stringr)
Data %>%
  pivot_longer(cols = contains("Year"), names_to = c("Country", ".value"), 
               names_sep="_", values_drop_na = TRUE) %>%
  rename_with(~ str_c('Country_', .), Rating:Year)```

But as of yet no luck.

Any help would be appreciated.

Thank you

Data


structure(list(Antarctica.Year.CE = 167:172, Antarctica.Temp..C. = c(0.33, 
0.31, 0.18, 0.08, -0.01, -0.11), Antarctica.Min..C. = c(-1.24, 
-1.26, -1.39, -1.48, -1.57, -1.67), Antarctica.Max..C. = c(1.89, 
1.87, 1.74, 1.64, 1.55, 1.45), Arctic.Year.CE = 1:6, Arctic.Temp..C. = c(-1.15, 
-0.96, -0.32, 0.1, -0.18, -0.61), Arctic.Min..C. = c(-1.92, -1.76, 
-1.38, -0.74, -1.08, -1.17), Arctic.Max..C. = c(-0.31, -0.11, 
0.48, 0.83, 0.73, 0.16), Asia.Year.CE = 800:805, Asia.Temp..C. = c(-0.31, 
-0.14, -0.36, -0.67, -0.78, -0.26), Asia.Min..C. = c(-1.4, -1.23, 
-1.45, -1.76, -1.87, -1.35), Asia.Max..C. = c(0.79, 0.96, 0.74, 
0.43, 0.31, 0.83), Australasia.Year.CE = 1001:1006, Australasia.Temp..C. = c(-0.24, 
-0.38, -0.29, -0.33, -0.34, -0.11), Australasia.Min..C. = c(-0.62, 
-0.79, -0.71, -0.73, -0.73, -0.56), Australasia.Max..C. = c(0.15, 
0.03, 0.13, 0.07, 0.05, 0.34), Europe.Year.CE = 1:6, Europe.Temp..C. = c(0.09, 
-0.26, -0.24, 0.22, 0.32, 0.67), Europe.Min..C. = c(-0.69, -1.14, 
-1.18, -0.66, -0.48, -0.11), Europe.Max..C. = c(0.88, 0.56, 0.61, 
1.07, 1.14, 1.5), North.America...Pollen.Year.CE = c(480L, 510L, 
540L, 570L, 600L, 630L), North.America...Pollen.Temp..C. = c(-0.25, 
-0.29, -0.33, -0.34, -0.34, -0.34), North.America...Pollen.Min..C. = c(-0.74, 
-0.7, -0.66, -0.65, -0.64, -0.64), North.America...Pollen.Max..C. = c(0.24, 
0.11, 0, -0.04, -0.04, -0.04), North.America...Trees.Year.CE = c(1204L, 
1214L, 1224L, 1234L, 1244L, 1254L), North.America...Trees.Temp..C. = c(-0.22, 
-0.45, -0.38, -0.87, -0.81, -0.06), North.America...Trees.Min..C. = c(-0.53, 
-0.72, -0.67, -1.12, -1.09, -0.35), North.America...Trees.Max..C. = c(0.04, 
-0.2, -0.11, -0.57, -0.52, 0.18), South.America.Year.CE = 857:862, 
    South.America.Temp..C. = c(-0.3, -0.21, -0.07, -0.38, -0.41, 
    -0.19), South.America.Min..C. = c(-1.12, -1, -0.88, -1.19, 
    -1.22, -0.98), South.America.Max..C. = c(0.53, 0.58, 0.74, 
    0.43, 0.39, 0.61)), row.names = c(NA, 6L), class = "data.frame") ```



  


  [1]: https://i.stack.imgur.com/0sV7a.png
  • Would you please show your desired output clearly? There are various ways to gather this data set. I would also like to know what you would like to do with min max values in column name. Would you like to capture them or just the country name and value. – Anoushiravan R Oct 14 '22 at 10:13
  • 1
    Hi Anoushiravan, thanks for your comment. I tried to add the table however it is quite long and complex so I have instead attached a screenshot. Hopefully you can see this. I would also like to keep the min max values too yes. The goal will be to then pivot this data set to have a facetted graph showing the temperature over time at each site +- the uncertainity from the min/max values – Sophie Williams Oct 14 '22 at 10:35
  • Oh I see, year values differ based on location. Let me see what I can do. – Anoushiravan R Oct 14 '22 at 10:54
  • Hi Anoushiravan, I've tried something like that before and tried that code too but it is still producing replicate years i.e. a year for each continent, I need to just have one year per row and then fill the rest with NAs. – Sophie Williams Oct 14 '22 at 11:03
  • I have an idea. It seems that all year values are like 6 consecutive ones, so we can refactor them to `1:6` values. – Anoushiravan R Oct 14 '22 at 11:11
  • Hi Anoushirvavan, this is just a subset of data as the dataframe is thousands of rows long so unfortunately that wouldn't work – Sophie Williams Oct 14 '22 at 11:15
  • by 1 to 6 I was referring to the range of year values in each continent. If you think that is not the case in your original data set then I cannot help you. Sorry :) – Anoushiravan R Oct 14 '22 at 11:17
  • No problem, thanks for attempting to answer – Sophie Williams Oct 14 '22 at 11:19

1 Answers1

0

For something as small as this, I'd often just go with a more manual approach.

Given your df above, I specify the lists of countries in the columns and then grepl() on the df columns to select those columns. Then, we rename the columns, return the new dataframe. We can then apply the function to the list of countries and then rbind with do.call.

country_list = c('Antarctica', 'Arctic', 'Asia', 'Australasia', 'Europe', 'North.America...Pollen', 'North.America...Trees', 'South.America')

get_cols = function(country) {
  df_new = df[,grepl(country, colnames(df))]
  df_new$Country = rep(country, nrow(df_new))
  colnames(df_new) = c('Year', 'Temp', 'Min_Temp', 'Max_Temp', 'Country')
  
  return(df_new)
}

df_final = do.call(rbind, lapply(country_list, get_cols))

Hope that returns what you're looking for?

bk1n
  • 16
  • 2