1

I have a large excel spreadsheet with 8 columns. Simplified it looks like this:

Identity column1  column2  column3  column4  column5  column6  column7
     1     tk=tr   cont=fb  sourc=a  camp=b    med=c    nm=he   ad=fst
     2     camp=a  cont=fb  sourc=a   med=b     NA        NA     NA
     3     vid=tr  sourc=b  camp=b    med=c   cont=fb     NA     NA

I need to organize it so that all of the cells containing cont are in the same column, all of the cells containing camp are in the same column, and so on. They are only partial matches though, as you can see some say camp=b, and some say camp=a, for example. I need each column to only have one type of string in it, one for cont, one for sourc, one for camp, one for med.

The "identity" column is formatted correctly, and the trk and vid values are unnecessary, but they only appear in some of the rows.

I looked into conditional formatting, advanced filters and sorts, ordering by custom lists, and vba all for excel. I'm happy to convert to an R file and work there.

I would like the output to look like this:

Identity  column1  column2  column3  column4  column5  column6  Column7
   1       cont=fb  sourc=a  camp=b    med=c    NA        NA       NA
   2       cont=fb  sourc=a  camp=a    med=b    NA        NA       NA
   3       cont=fb  sourc=b  camp=b    med=c    NA        NA       NA

The cont=, sourc=, camp=, and med= values are the only ones I need in the final product, so it would be great if I could ignore/delete the other ones in producing this, however not entirely necessary.

Thank you so much for any help you could offer!

jawk23
  • 13
  • 4
  • I see no cells with a content of "campaign=a" or "campaign=b" in your sample data. I do see cells with "camp=a" and "camp=b", which I imagine is what you mean, but it would be better if you don't assume we know what this is all about and remain consistent in how you label things, it can avoid misunderstandings. Your output doesn't seem to match your input. What happened to `med=b` on the second row? Where did `tk=tr` and `vid=tr` disapear to? [this](https://stackoverflow.com/help/how-to-ask) may be useful – cybernetic.nomad Jul 03 '19 at 15:01
  • @cybernetic.nomad thank you for pointing these out. So sorry for the sloppy post. I've edited the question. The `tk=tr` and `vid=tr` are not needed in my final product, but I thought it best to include them to show that the data does not consist of the same four types of values in every row. Some rows have 5 or 6 values and some only have 4 (the 4 that I _am_ interested in), leaving NA's in the other columns – jawk23 Jul 03 '19 at 15:29

1 Answers1

1

What you want can be done with INDEX/MATCH

Use the following formula in each of the four columns:

=INDEX(B2:H2,MATCH("cont=*",B2:H2,))

=INDEX(B2:H2,MATCH("sourc=*",B2:H2,))

=INDEX(B2:H2,MATCH("camp=*",B2:H2,))

=INDEX(B2:H2,MATCH("med=*",B2:H2,))

And populate down

enter image description here

cybernetic.nomad
  • 6,100
  • 3
  • 18
  • 31