0

I have a table that looks like below

Loc ID filter P1
A ABC1 GHY 55.6
A DFT1 FGH 67.8
B HJH5 GHY 67
C HKL BHY 78
B GTY FGH 60

I want the output as below. Basically, I want the records with the same Filter to be one row

Filter ID Loc P1 m_ID m_Loc m_p1 total
GHY ABC1 A 55.6 HJH5 B 67 122.6
FGH DFT1 A 67.8 GTY B 60 127.8

Is this achievable using itertools i python. If yes can someone please suggest how can we do it?

Sai
  • 47
  • 1
  • 2
  • 7

2 Answers2

0

Here's a solution using lead and row_number that I think is a little nicer.

select filter
      ,id
      ,loc 
      ,p1
      ,m_id
      ,m_loc
      ,m_p1

from 
      (with t2 as 
      (select row_number () over( partition by filter order by filter desc) as rn
      ,*
       from t)
              select rn,filter, id, loc, p1
             ,lead(id) over( partition by filter order by filter)  as m_id
             ,lead(loc) over( partition by filter order by filter) as m_loc
             ,lead(p1) over( partition by filter order by filter)  as m_p1
              from t2) t
where rn=1  
filter id loc p1 m_id m_loc m_p1
BHY HKL C 78 null null null
FGH DFT1 A 67.8 GTY B 60
GHY ABC1 A 55.6 HJH5 B 67

Fiddle

There should be a better solution to this question, but here is a solution that's based on what you did. I used left join to not lose filters that only appear once and then I used group by to consolidate the results.

select t1.filter
      ,max(t1.id)  as id
      ,max(t1.loc) as loc
      ,max(t1.p1)  as p1
      ,min(t2.id)  as m_id
      ,min(t2.loc) as m_loc
      ,min(t2.p1)  as m_p1
      
from t as t1 left join t as t2 on t2.filter = t1.filter and t2.id <> (t1.id)
group by t1.filter
filter id loc p1 m_id m_loc m_p1
BHY HKL C 78 null null null
FGH GTY B 67.8 DFT1 A 60
GHY HJH5 B 67 ABC1 A 55.6

Fiddle

DannySlor
  • 4,574
  • 1
  • 3
  • 11
  • Thanks, but will the solution work if there are more than 2 matches – Sai Aug 17 '22 at 04:37
  • You have to know in advance how many columns you have in SQL so the join solution won't work, but you can figure out how to do it using the first solution I gave. You will probably have to have a lot of nulls though. – DannySlor Aug 17 '22 at 14:23
0

If the usage of pandas is possible, you can achive a flexible solutiion with the following:

Definition of the data:

df=pd.DataFrame({'Loc': {0: 'A', 1: 'A', 2: 'B ', 3: 'C', 4: 'B'},
 'ID': {0: 'ABC1', 1: 'DFT1', 2: 'HJH5', 3: 'HKL', 4: 'GTY'},
 'filter': {0: 'GHY', 1: 'FGH', 2: 'GHY', 3: 'BHY', 4: 'FGH'},
 'P1': {0: 55.6, 1: 67.8, 2: 67.0, 3: 78.0, 4: 60.0}}) 

Creation of the repetive columns:

cols=["{}_{}".format(N, c) for N in range(0,df.groupby('filter').count()['ID'].max()) for c in df.columns]

Here, I first find the maximum required repitions by looking for the max occurences of each filter df.groupby('filter').count()['ID'].max(). The remaining code is just formating by adding a leading number.

Creation of new dataframe with filter as index and the generated columns cols as columns

df_new=pd.DataFrame(index=set(df['filter']), columns=cols)

Now we have to fill in the data:

for fil in df_new.index:
    values=[val for row in df[df['filter']==fil].values for val in row]
    df_new.loc[fil,df_new.columns[:len(values)]]=values

Here two things are done: First, the selected values based on the filter name fil are flattend by [val for row in df[df['filter']==fil].values for val in row]. Then, these values are filled into the dataframe starting at the left.

The result is as expected:

    0_Loc   0_ID    0_filter    0_P1    1_Loc   1_ID    1_filter    1_P1
GHY     A   ABC1    GHY     55.6    B   HJH5    GHY     67.0
BHY     C   HKL     BHY     78.0    NaN     NaN     NaN     NaN
FGH     A   DFT1    FGH     67.8    B   GTY     FGH     60.0
Jacob
  • 304
  • 1
  • 6