1

First time posting, mainly because I got tired of banging my head against the wall.

Thanks in advance for looking at this.

I have a data frame that looks like this:

 state city x y z
1 OR Portland 8 10 1
2 OR Portland 8 10 4
3 OR Portland 8 10 10
4 NY New York 29 15 10
5 NY New York 29 15 18
6 NJ Trenton 8 10 50
7 NJ Trenton 8 10 60
8 NJ Trenton 8 10 70
9 WA Seattle 1 70 6
10 WA Seattle 1 70 7
11 WA Seattle 1 70 8
12 WA Seattle 1 70 9
13 WA Seattle 1 70 10
14 WA Seattle 1 70 11

I have been trying to reshape it to look like this:

state city  x  y z.1 z.2 z.3 z.4 z.5 z.6
OR Portland 8  10 1   4   10
NY New York 29 15 10  18
NJ Trenton  8  10 50  60  70
WA Seattle  1  70 6   7   8   9   10  11

I have been using the package reshape2 and the code looks like this:

df <- melt(data,id.vars = c("state","city","x","y"),measure.vars = "z")
wide <- dcast(df, state + city + x + y ~ variable)

Which returns a count of variable z for each set of id.vars.

I also tried this:

wide <- dcast(df, state + city + x + y ~ value)

Which looks like this:

state city  x  y  1 4 6 7 etc...
OR Portland 8  10 1 1 0 0
NY New York 29 15 0 0 0 0
NJ Trenton  8  10 0 0 0 0
WA Seattle  1  70 0 0 1 1

This is closer to what I'm looking for but would be very difficult to use for looking up information.

Tell me if I'm wrong, but it looks like I need an id variable for each duplicate value of state, city, x, y.

I haven't been able to think up or find anything that will allow me to create column that will number duplicate values like below.

 state city   x  y  z  num
1 OR Portland 8  10 1  1 
2 OR Portland 8  10 4  2
3 OR Portland 8  10 10 3
4 NY New York 29 15 10 1
5 NY New York 29 15 18 2
6 NJ Trenton  8  10 50 1
7 NJ Trenton  8  10 60 2
8 NJ Trenton  8  10 70 3
9 WA Seattle  1  70 6  1
10 WA Seattle 1  70 7  2
11 WA Seattle 1  70 8  3
12 WA Seattle 1  70 9  4
13 WA Seattle 1  70 10 5
14 WA Seattle 1  70 11 6

I would appreciate any help or an idea of where to keep looking for solutions.

Best,

-n

nhol
  • 11
  • 2
  • Hmm am I missing something - I dont think that is a duplicate really. The linked question looks for creating sequential i.ds, whereas this one asks for a list of existing id's by state and city. Using `library(data.table)` , we can use `dt[, .(z=list(z)), by= .(state, city, x, y)]` here – dww Mar 02 '17 at 00:12
  • @dww - it's essentially the same issue - `dat$time <- ave(rownames(dat), dat[c("state","city","x","y")], FUN=seq_along); reshape(dat, idvar=c("state","city","x","y"), direction="wide")` for example just needs a sequential id to be created. – thelatemail Mar 02 '17 at 00:14
  • Thanks for the help! I should be able to piece it together. – nhol Mar 02 '17 at 01:27

1 Answers1

1

If using dplyr is an option you can use:

library(dplyr)
df %>% 
    group_by(state,city, x, y) %>% 
    mutate(n = row_number()) %>% 
    spread(n, z, sep = '')

Note that the ordering is lost tho

GGamba
  • 13,140
  • 3
  • 38
  • 47