
I have a very large (949,000 observations) stacked (long-form) data set. I would like to select cases based on the presence of data in one variable, but keep all the other stacked rows for that unique ID. Does that make sense?

Do you have any thoughts?

Jake
  • Can you please describe what you mean by "stacked data set"? Or even better, give an example of your data structure and the expected result? – mirirai Sep 18 '14 at 07:21
  • Sure, another term for the data structure is long form. For each person there are multiple entries (rows). Let's say that at time 1, person A took a certain measure (Y) but never did again, even though there are another twelve observations. I want to select only individuals who took measure Y, but I need to keep each such person's other observations even when Y is missing in those rows. – Jake Sep 18 '14 at 20:14
  • My formatting didn't hold up, disregard this specific message – Jake Sep 18 '14 at 20:23

2 Answers


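You can certainly do this in Statistics. Use AGGREGATE with the person ID as the break variable and, say, the mean of the variable as the summary statistic. Choose to have the aggregated statistic added to the cases in the active dataset. The aggregated mean is missing only for people who have no valid value in any row, so you can then just select those cases where the aggregate is not missing.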
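
In syntax, the steps above might look roughly like the following. This is only a sketch: the variable names `id` (person identifier), `y` (the measure of interest), and `y_mean` (the new aggregated variable) are assumptions, so substitute your own.

```spss
* Assumed names are id for the person identifier and y for the measure.
* Add each person's mean of y to every one of that person's rows.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=id
  /y_mean=MEAN(y).

* y_mean is system-missing only when a person has no valid y in any row.
* Keep every row for people with at least one valid y.
SELECT IF NOT(MISSING(y_mean)).
EXECUTE.
```

Note that SELECT IF permanently drops the unselected cases from the active dataset, so it is safer to run this on a copy of the data.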

JKP

SPSS is an application package for statistics and is not a programming language. I would ask on a different site in the Stack Exchange family.

Try Cross Validated (https://stats.stackexchange.com/)

Community
Kyle_at_NU
  • 1) This is not an answer, so better to leave it as a comment. 2) This is off-topic at CrossValidated because it is purely a programming question, so it is on topic here. The OP [cross-posted](http://stats.stackexchange.com/q/115833/1036) and it was closed! – Andy W Sep 18 '14 at 12:31