Reconstruct a database

Question

How do I get the lag/lead function to work on the first case number per patient in my database?

I have a database of 1k+ variables and 800k rows. Each row is an intervention on a segment of a vessel, one patient may have several rows = many interventions on different vesselsegment but all within the same hospitalization. There are 4 vessels and I when I restructure the database I get 4 cases for each row=intervention. I have tried to post the data below:

data list list /id_nr (f6) segment_id_nr (f6) date (date9) C_RCA (f6) C_LM (f6) C_LAD (f6) C_LCx (f6) VESSEL(a3)  max_stenos (f6) Culprit_PCI (f6) Procedure_type (f6).
BEGIN DATA
1, 5, 12-Jun-06, 1.00, .00, .00, .00, RCA, 3.00, 1.00, 2.00
1, 5, 12-Jun-06, 1.00, .00, .00, .00, LM, 1.00, , 
1, 5, 12-Jun-06, 1.00, .00, .00, .00, LAD, 4.00, , 
1, 5, 12-Jun-06, 1.00, .00, .00, .00, LCX, 1.00, , 
1, 5, 12-Jun-06, 1.00, .00, .00, .00, RCA, 3.00, 1.00, 2.00
1, 5, 12-Jun-06, 1.00, .00, .00, .00, LM, 1.00, , 
1, 5, 12-Jun-06, 1.00, .00, .00, .00, LAD, 4.00, , 
1, 5, 12-Jun-06, 1.00, .00, .00, .00, LCX, 1.00, , 
1, 5, 12-Jun-06, .00, .00, 1.00, .00, RCA, 3.00, , 
1, 5, 12-Jun-06, .00, .00, 1.00, .00, LM, 1.00, , 
1, 5, 12-Jun-06, .00, .00, 1.00, .00, LAD, 4.00, 1.00, 2.00
1, 5, 12-Jun-06, .00, .00, 1.00, .00, LCX, 1.00, , 
END DATA.
dataset name OrigData.

It should look something like this: enter image description here

What I want is to combine all data from the last two variables to be contained within the first 4 rows and later remove remaining rows so that I have 4 rows per hospitalization, each row corresponding to intervention in that row see data below that i have copy pasted in excel:

data list list /id_nr (f6) segment_id_nr (f6) date (date9) C_RCA (f6) C_LM (f6) C_LAD (f6) C_LCx (f6) VESSEL(a3)  max_stenos (f6) Culprit_PCI (f6)         Procedure_type (f6) Culprit_PCI2 (f6)  Procedure_type2 (f6).
BEGIN DATA
1, 5, 12-Jun-06, 1.00, .00, .00, .00, RCA, 3.00, 1.00, 2.00,1.00, 2.00
1, 5, 12-Jun-06, 1.00, .00, .00, .00, LM, 1.00, , , ,
1, 5, 12-Jun-06, 1.00, .00, .00, .00, LAD, 4.00, , ,1.00, 2.00
1, 5, 12-Jun-06, 1.00, .00, .00, .00, LCX, 1.00, , , ,
END DATA. 
dataset name OrigData.

I have tried with the lag/lead function but i cant get it to work

I have the following code:

compute seq = $casenum.
execute.
SORT CASES BY seq.
CREATE PCI_other_segmentvessel = LAG(Culprit_PCI,4).
CREATE proceduret_type2 = LEAD(procedure_type).
Execute.

Is this possible to do? Is it perhaps a lag

IF sid= lag(sid) and Vessel=lag(Vessel) PCI2 = lag(Culprit_PCI,4).
Execute.

None of these works properly. Could you point me to the right direction? I dont know if data will work but if copy pasted into excel it works.

Kind regards

I find it very difficult to understand the structure of your existing data and the structure you're trying to get to. Please find a way to post your data samples more clearly - (e.g. look up DATA LIST - BEGIN DATA - END DATA) and post a full command here. — eli-k, Jun 14 '16 at 06:52
I am new to this and trying to do my best. I couldnt really get the data list to work but managed to add an image of how the dataset looks like on excel, I hope it may help. — Karatekid, Jun 15 '16 at 15:14
I edited your post with a full data list command for the sample of the original data. Please use this example to create a target data and clarify what you need exactly. — eli-k, Jun 16 '16 at 07:39
Thank you Eli. I got the data list to work on target data. As you can see, I want the results from Culprit_PCI and Procedure_type to be created either as new variables or in the same variables but "lifted" up to the first four rows/cases as each one of them corresponds to a specific coronary artery. The best alternative is to have the results lifted up to the same variables but thats an easy fix later on. — Karatekid, Jun 16 '16 at 14:01

score 0 · Answer 1 · answered Jun 14 '16 at 13:05

0

You can test for a change in the ID in your calculation, but it is easier to use the SHIFT VALUES command along with SPLIT FILES. SHIFT VALUES will not cross split boundaries.

answered Jun 14 '16 at 13:05

JKP

5,419
13
5

Thanks for reply:) Im not sure i understand what you mean with test for a change in the ID. How do I do that? Is there a syntax code for it? Should I split files by idnr or the segment id? How do i make sure the results from the lag are pasted on the first 4 cases of each individual? Sorry for the many questions:) – Karatekid Jun 15 '16 at 15:23
The shift command with split files worked very fine, however I have the issue with having the value in the 4 uppermost cases. split file by sid. SHIFT VALUES VARIABLE = Culprit_PCI RESULT = CPCI2 SHIFT = 4. Is there a way to change the 4? – Karatekid Jun 15 '16 at 16:14

score 0 · Answer 2 · answered Jun 17 '16 at 10:18

You say you've reached this situation after some restructuring - I strongly suggest you start a new question with the original structure and the final structure you are trying to reach, there's a good chance we can find a much simpler shortcut that doesn't go through this point.
Assuming you DO have to go through this point, the following syntax creates something very similar to your target structure, including removing extra lines. I hope this helps tho I'm not absolutely sure I completely understand what you need:

*creating sample data (a bit different from your original sample, to help the demonstration work).
data list list /id_nr (f6) segment_id_nr (f6) date (date9) C_RCA (f6) C_LM (f6) C_LAD (f6) C_LCx (f6) VESSEL(a3)  max_stenos (f6) Culprit_PCI (f6) Procedure_type (f6).
BEGIN DATA
1, 5, 12-Jun-06, 1.00, .00, .00, .00, RCA, 3.00, 1.00, 2.00
1, 5, 12-Jun-06, 1.00, .00, .00, .00, LM, 1.00, , 
1, 5, 12-Jun-06, 1.00, .00, .00, .00, LAD, 4.00, , 
1, 5, 12-Jun-06, 1.00, .00, .00, .00, LCX, 1.00, , 
1, 5, 12-Jun-06, 1.00, .00, .00, .00, RCA, 3.00, 2.00, 1.00
1, 5, 12-Jun-06, 1.00, .00, .00, .00, LM, 1.00, , 
1, 5, 12-Jun-06, 1.00, .00, .00, .00, LAD, 4.00, , 
1, 5, 12-Jun-06, 1.00, .00, .00, .00, LCX, 1.00, , 
1, 5, 12-Jun-06, .00, .00, 1.00, .00, RCA, 3.00, , 
1, 5, 12-Jun-06, .00, .00, 1.00, .00, LM, 1.00, , 
1, 5, 12-Jun-06, .00, .00, 1.00, .00, LAD, 4.00, 1.00, 2.00
1, 5, 12-Jun-06, .00, .00, 1.00, .00, LCX, 1.00, , 
END DATA.
dataset name OrigData.

* removing empty lines .
select if not missing(Culprit_PCI).

* counting occurrences of each vessel to use as index.
sort cases by id_nr segment_id_nr date VESSEL.
compute tmp=1.
split file by id_nr segment_id_nr date VESSEL.
CREATE ProcNum=CSUM(tmp).
split file off.

* restructuring.
format ProcNum(f6).
sort cases by  ‎‪id_nr‬‎ ‎‪segment_id_nr‬‎ ‎‪date‬‎ ‎‪C_RCA‬‎ ‎‪C_LM‬‎ ‎‪C_LAD‬‎ ‎‪C_LCx‬‎ ‎‪VESSEL  ‎‪max_stenos ProcNum.
casestovars /id= ‎‪id_nr‬‎ ‎‪segment_id_nr‬‎ ‎‪date‬‎ ‎‪C_RCA‬‎ ‎‪C_LM‬‎ ‎‪C_LAD‬‎ ‎‪C_LCx‬‎ ‎‪VESSEL  ‎‪max_stenos/index ProcNum /drop=tmp/separator="_"/groupby=index.

Reconstruct a database

2 Answers2