I have a dataframe (df1) that includes abundances of different species in each sample:
> SampleID Sp1 Sp2 Sp3 Sp4 ... Spn
> asb-001 3 0 0 23 9
> asb-002 4 15 10 56 98
> asb-003 8 45 8 453 0
> asb-004 0 5 0 3 6
> asb-005 120 56 0 0 0
...
Each column represents a different species.
I have another dataframe (df2)
Sp Fam
Sp1 Fam1
Sp2 Fam2
Sp3 Fam1
Sp4 Fam3
Sp5 Fam2
Sp6 Fam1
...
There are fewer Family names than there are Species names.
I would like to add columns together depending on the species name to get a total for the family (e.g. Sp1 + Sp3 + Sp6 = Fam1). I don't want to keep the original column with the species name.
If all goes according to plan, my new dataframe (df3) will look something like this:
> SampleID Fam1 Fam2 Fam3
> asb-001 12 0 9
> asb-002 14 18 112
> asb-003 28 58 18
> asb-004 10 12 10
> asb-005 142 65 0
...
I could manually go through and add individual species together, but this seems tedious for a large dataset, and likely to produce errors. I have the feeling that I should melt df1, but I'm not confident in the details. Any advice would be appreciated!