
I began with this innocuous dataframe:

Date          Company     Jobs   
1/1/2012      Company 1    12 
1/1/2012      Company 2    84
1/1/2012      Company 3    239
1/1/2012      Company 4    22

I am dreaming, begging, and fantasizing about this dataframe looking like this:

Date          Company 1   Company 2 Company 3 Company 4
1/1/2012         12          84       239        22
1/2/2012                
1/3/2012                     <other numbers here> 
1/4/2012      

Looking around and thinking about which tools to use, I figured I'd use the reshape2 package.
I started with myDF <- melt(myDF) to melt my dataframe. The plan was then to use dcast to reshape it into a wide dataframe.

So here's my melted dataframe:

Date          Company     variable   value
1/1/2012      Company 1    Jobs       12 
1/1/2012      Company 2    Jobs       84
1/1/2012      Company 3    Jobs       239
1/1/2012      Company 4    Jobs       22

I tried dcast(myDF, Date ~ Company + value)
and got this:

Date          Company 1   Company 2 Company 3 Company 4
1/1/2012         NA          NA       NA        NA
1/2/2012                
1/3/2012                     <NAs here> 
1/4/2012      

Can someone please help me out and tell me why such a nefarious thing is occurring?

A5C1D2H2I1M1N2O1R2T1
tumultous_rooster
  • Does each company have at most one entry per date? I am assuming not, because otherwise you wouldn't get the `Aggregation function missing: defaulting to length` warning. – Beasterfield Dec 03 '13 at 08:55
  • I'm confused about that warning...I double-checked and I do have at most one value per company per date. It's true that some values are missing, but I don't think that is the major issue. – tumultous_rooster Dec 03 '13 at 09:17
  • I think that should be `dcast(myDF, Date ~ Company + variable)` – Jan van der Laan Dec 03 '13 at 11:13
  • Maybe `dcast(unique(myDF), Date ~ Company, value.var = "value")`? A reproducible example would be helpful. It doesn't have to be your real data. Just a small dataset that demonstrates the same problem. – A5C1D2H2I1M1N2O1R2T1 Dec 03 '13 at 13:38
  • @AnandaMahto, this gives me a dataframe with the correct dimensions but every entry is 0! – tumultous_rooster Dec 03 '13 at 21:39
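
For reference, here is a minimal sketch of the melt-then-cast round trip discussed in the comments above. The data frame is a hypothetical reconstruction of the sample in the question (the real data was never posted), and the names `melted` and `wide` are made up for illustration. It assumes reshape2 is installed and that each Date/Company pair occurs only once.

```r
library(reshape2)

# Hypothetical reconstruction of the sample data from the question
myDF <- data.frame(
  Date = rep("1/1/2012", 4),
  Company = paste("Company", 1:4),
  Jobs = c(12, 84, 239, 22),
  stringsAsFactors = FALSE
)

# Melt with explicit id columns so that Jobs is the only measured variable;
# the result has the columns Date, Company, variable and value
melted <- melt(myDF, id.vars = c("Date", "Company"))

# Cast back to wide: one row per Date, one column per Company.
# The melted value column goes in value.var, not in the formula.
wide <- dcast(melted, Date ~ Company, value.var = "value")
print(wide)
```

Note that `value` appears only in `value.var` here. Putting it on the right-hand side of the formula, as in `Date ~ Company + value`, asks for one column per Company/value combination instead.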

1 Answer


You can pass your original data frame straight to dcast(), because your data are already in long format. With no value.var given, dcast() falls back to the Jobs column for the values.

dcast(df,Date~Company)
      Date Company_1 Company_2 Company_3 Company_4
1 1/1/2012        12        84       239        22

You can also state explicitly that the Jobs column should supply the values.

dcast(df,Date~Company,value.var="Jobs")
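
A self-contained version of the call above might look like the following sketch. The data frame is a hypothetical reconstruction of the question's sample (two dates are used so the wide shape is visible, and the second date's numbers are invented), and `wide` is an illustrative name. It assumes reshape2 is installed.

```r
library(reshape2)

# Hypothetical data: one row per Date/Company pair, already in long format.
# The 1/2/2012 values are made up for illustration.
df <- data.frame(
  Date = rep(c("1/1/2012", "1/2/2012"), each = 4),
  Company = rep(paste("Company", 1:4), times = 2),
  Jobs = c(12, 84, 239, 22, 15, 80, 240, 25),
  stringsAsFactors = FALSE
)

# Dates become rows, companies become columns, Jobs fills the cells
wide <- dcast(df, Date ~ Company, value.var = "Jobs")
print(wide)
```

Naming `value.var` explicitly avoids relying on dcast() guessing which column holds the values.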
Didzis Elferts
  • Thanks for your help. I tried both versions and got the same thing as the final attempt I described above; the only difference is that the NAs were replaced with 0s and 1s. Hmm... – tumultous_rooster Dec 03 '13 at 08:10
  • Then please add the output of dput() for your original data frame to your question, because the data you supplied works as expected. – Didzis Elferts Dec 03 '13 at 08:11
  • I don't mean to sound paranoid, but the real data is from a work project, and I don't want to put the information out in public. I will have to rework it. In the meantime, I am wondering if the warning I'm getting is giving me any useful information, "Aggregation function missing: defaulting to length" – tumultous_rooster Dec 03 '13 at 08:37
  • It sounds like you have more than one observation per Date per Company, so you have to decide what to do with the repeated observations (mean, maximum). – Didzis Elferts Dec 03 '13 at 09:02
  • Well, after examining the original dataset, I found that is not the case...however I did notice someone has the same problem here: http://grokbase.com/t/r/r-help/1287e9kena/r-reshape2s-dcast-adds-nas-to-data-frame – tumultous_rooster Dec 03 '13 at 09:06
  • Even if you cannot post the data here, some additional information would be helpful, for example `lapply( myDF, class )` or `any( duplicated( myDF[ , c("Company", "Date") ] ))`. – Beasterfield Dec 03 '13 at 09:37
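
The duplicate check suggested in the comments can be tried on a small made-up dataset. The sketch below uses hypothetical data containing a deliberately repeated Date/Company pair; it is not the asker's data, and it only illustrates one way the "defaulting to length" message can arise. The names `has_dups`, `counts` and `sums` are illustrative. It assumes reshape2 is installed.

```r
library(reshape2)

# Hypothetical data with "Company 1" listed twice for the same date
df <- data.frame(
  Date = c("1/1/2012", "1/1/2012", "1/1/2012"),
  Company = c("Company 1", "Company 1", "Company 2"),
  Jobs = c(12, 5, 84),
  stringsAsFactors = FALSE
)

# TRUE when any Date/Company pair occurs more than once
has_dups <- any(duplicated(df[, c("Company", "Date")]))

# With duplicates and no fun.aggregate, dcast() falls back to length,
# so each cell holds a count of rows rather than the Jobs values
counts <- dcast(df, Date ~ Company, value.var = "Jobs")

# Supplying an aggregation function states how to combine the repeats
sums <- dcast(df, Date ~ Company, value.var = "Jobs", fun.aggregate = sum)

print(has_dups)
print(counts)
print(sums)
```

Whether `sum`, `mean`, `max` or something else is appropriate depends on what the repeated rows mean in the real data.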