I have a lot of old untidy data in excel files (50 sheets with 400-500 rows each). A part of my data looks like this:
Elements= c("Project name ONE","John","Smith","Sara","Project name TWO","stardust","soil","sunflower","juice","doe","tobacco", "Project name THREE","phi","rho","omega")
Units= c("NA", "3", "5", "6", "NA", "21", "19", "31", "24", "1", "5", "NA", "21", "21", "22")
df= data.frame(Elements, Units)
In my large data set each project has a very different number of rows.
I want to create new column "Group" where each line for every project is defined. For the above sample the results would be like this
Group =c(1,1,1,1,2,2,2,2,2,2,2,3,3,3,3)
df =c(Elements, Units, Group)
But I also would like to sum values from the column "Units" for all values below each "empty" cell into a new column "Sum".
Sum= c("14", "NA", "NA", "NA", "101", "NA", "NA", "NA", "NA", "NA", "NA", "9", "NA", "NA", "NA")
My final product would then look like this:
df =c(Elements, Units, Group, Sum)