I'm new to R and have tried a lot of things to solve this problem. If anyone could help me I'd be very grateful! This is my problem:
I have to work with time series of products that are broken down by year, type (import into or export from the country) and the size of the products in kilograms, something like this:
dat <- data.frame(
  NAME  = c("P1", "P1", "P2", "P2", "P1", "P2", "P1", "P1"),
  YEAR  = c(1991, 1991, 1991, 1991, 1992, 1992, 1993, 1993),
  TYPE  = c("IMPORT", "EXPORT", "IMPORT", "EXPORT", "IMPORT", "EXPORT", "IMPORT", "EXPORT"),
  VALUE = c(300, 200, 170, 150, 150, 120, 90, 100)
)
dat
# NAME YEAR TYPE VALUE
#1 P1 1991 IMPORT 300
#2 P1 1991 EXPORT 200
#3 P2 1991 IMPORT 170
#4 P2 1991 EXPORT 150
#5 P1 1992 IMPORT 150
#6 P2 1992 EXPORT 120
#7 P1 1993 IMPORT 90
#8 P1 1993 EXPORT 100
So what I have to do is get the difference between imports and exports (imports minus exports) for every product and year in the data, counting a missing import or export as 0. It should look like this:
solution<-data.frame(NAME=c("P1","P2","P1","P2","P1"),Year=c(1991,1991,1992,1992,1993),VALUE=c(100,20,150,-120,-10))
solution
# NAME Year VALUE
#1 P1 1991 100
#2 P2 1991 20
#3 P1 1992 150
#4 P2 1992 -120
#5 P1 1993 -10
I used aggregate to solve it, but when I do, the code drops products P1 and P2 in 1992, because there are no exports for P1 and no imports for P2 in that year. Does anyone know how to solve it?
This is part of my code:
agg<-sort(data, f= ~ year + name)
agg<-aggregate(size~year + name, data=data, FUN=diff)
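
One possible approach, sketched against the example `dat` above rather than the real data: `diff()` on a single value returns a zero-length vector, which is probably why groups with only an import or only an export go missing. A way around this is to give exports a negative sign and sum within each product and year, so a missing import or export simply contributes nothing. The helper column `SIGNED` and the result name `res` are made up for this illustration, and the sketch assumes `TYPE` only ever contains "IMPORT" or "EXPORT".

```r
dat <- data.frame(
  NAME  = c("P1", "P1", "P2", "P2", "P1", "P2", "P1", "P1"),
  YEAR  = c(1991, 1991, 1991, 1991, 1992, 1992, 1993, 1993),
  TYPE  = c("IMPORT", "EXPORT", "IMPORT", "EXPORT", "IMPORT", "EXPORT", "IMPORT", "EXPORT"),
  VALUE = c(300, 200, 170, 150, 150, 120, 90, 100)
)

# Hypothetical helper column: imports count as positive, exports as negative
dat$SIGNED <- ifelse(dat$TYPE == "IMPORT", dat$VALUE, -dat$VALUE)

# Summing the signed values per product and year gives imports minus exports;
# a group with only one TYPE keeps its single signed value instead of vanishing
res <- aggregate(SIGNED ~ NAME + YEAR, data = dat, FUN = sum)

# Rename the helper column back to VALUE to mirror the desired layout
names(res)[names(res) == "SIGNED"] <- "VALUE"
res
```

Combinations with no rows at all (P2 in 1993 here) do not appear in `res`, which agrees with the desired `solution` table. With the real data the column names would need to be swapped for the actual ones (`year`, `name`, `size` in the snippet above).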