I have two dataframes I need to merge together. The first one is called "rain_year" and the second is called "refugee." rain_year (df1) has a column called "sum_precip". Refugee (df2) has a column called "refugees".
Both dataframes have columns "Year" and "origin" in common and this is what I need to match on. df1 is more than twice the size of df2.
Here is an example of df1:
Year origin sum_precip
1951 Afghanistan 315
1952 Afghanistan 300
1953 Afghanistan 280
and so on and so forth for all countries going back to 1951.
Here is an example of df2:
Year origin refugees
1951 Afghanistan 500
1952 Afghanistan 400
1953 Afghanistan 350
and so on for all countries going back to 1951.
The two dataframes differ in size because not all countries in df2 produced refugees in all years from 1951-present. As such, I expect there to be a considerable amount of 'NA' when merging either df1 to df2 or vice versa.
When I run this code:
refugee_rain <- join(refugee, rain_year)
it creates a new dataframe with a vector of "NA" in sum_precip. In other words, the correct column is added, but all the values are missing.
When I run this code:
refugee_rain <- join(refugee, rain_year)
the same thing happens.
When I reverse the order of the dataframes, the column of "refugees" does the same thing. That is, there is a column labeled "refugees," but it is a vector of "NA" instead of numerical values.
I have searched high and low for a solution for hours. I have tried several variations, including "all=T". This does not work. Please help.
This is the closest I can find that someone else has a similar issue. However, the suggested solution did not work for me.
At the suggestion of others, I have run the dput operation which returned the following:
> dput(head(refugee, 5))
structure(list(Year = 1979:1983, origin = c("Afghanistan", "Afghanistan",
"Afghanistan", "Afghanistan", "Afghanistan"), ISO = c("", "",
"", "", ""), asylum = c("-", "-", "-", "-", "-"), asylum_ISO = c("-",
"-", "-", "-", "-"), refugees = c(2116011L, 1952928L, 1847304L,
1749628L, 1717966L), asylum_seekers = c(0L, 0L, 0L, 0L, 0L),
IDP = c(0L, 0L, 0L, 0L, 0L), Ven_displaced = c(NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_)), class = c("data.table",
"data.frame"), row.names = c(NA, -5L), .internal.selfref = <pointer: 0x000001ff790e1ef0>)
> dput(head(rain_year, 5))
structure(list(Year = c(1951L, 1951L, 1951L, 1951L, 1951L), origin = c(" Afghanistan",
" Albania", " Algeria", " Andorra", " Angola"), sum_precip = c(314.98117,
1077.5157, 99.31164, 1012.9107, 1181.94411)), row.names = c(NA,
-5L), groups = structure(list(Year = 1951:2016, .rows = structure(list(
1:195, 196:390, 391:585, 586:780, 781:975, 976:1170, 1171:1365,
1366:1560, 1561:1755, 1756:1950, 1951:2145, 2146:2340, 2341:2535,
2536:2730, 2731:2925, 2926:3120, 3121:3315, 3316:3510, 3511:3705,
3706:3900, 3901:4095, 4096:4290, 4291:4485, 4486:4680, 4681:4875,
4876:5070, 5071:5265, 5266:5460, 5461:5655, 5656:5850, 5851:6045,
6046:6240, 6241:6435, 6436:6630, 6631:6825, 6826:7020, 7021:7215,
7216:7410, 7411:7605, 7606:7800, 7801:7995, 7996:8190, 8191:8385,
8386:8580, 8581:8775, 8776:8970, 8971:9165, 9166:9360, 9361:9555,
9556:9750, 9751:9945, 9946:10140, 10141:10335, 10336:10530,
10531:10725, 10726:10920, 10921:11115, 11116:11310, 11311:11505,
11506:11700, 11701:11895, 11896:12090, 12091:12285, 12286:12480,
12481:12675, 12676:12870), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, -66L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("data.table",
"data.frame"), .internal.selfref = <pointer: 0x000001ff790e1ef0>)
I am not sure what I am looking at and apologize if that is not the correct format to post this.