
My input Spark DataFrame is:

Year  Month        Client 
2018  1            1        
2018  2            1         
2018  3            1         
2018  4            1         
2018  5            1         
2018  6            1        
2018  7            1        
2018  8            1        
2018  9            1         
2018  10           1          
2018  11           1        
2018  12           1    
2019  1            1        
2019  2            1         
2019  3            1         
2019  4            1         
2019  5            1         
2019  6            1        
2019  7            1        
2019  8            1        
2019  9            1         
2019  10           1          
2019  11           1        
2019  12           1  
2018  1            2        
2018  2            2         
2018  3            2         
2018  4            2         
2018  5            2         
2018  6            2        
2018  7            2        
2018  8            2        
2018  9            2         
2018  10           2        
2018  11           2        
2018  12           2        
2019  1            2        
2019  2            2         
2019  3            2         
2019  4            2         
2019  5            2         
2019  6            2        
2019  7            2        
2019  8            2        
2019  9            2         
2019  10           2        
2019  11           2        
2019  12           2      

The DataFrame is ordered by client, year, and month. I want to extract the data after 2019-06 for each client.

The desired output for the data above is:

Year  Month        Client 
2018  1            1        
2018  2            1         
2018  3            1         
2018  4            1         
2018  5            1         
2018  6            1        
2018  7            1        
2018  8            1        
2018  9            1         
2018  10           1          
2018  11           1        
2018  12           1    
2019  1            1        
2019  2            1         
2019  3            1         
2019  4            1         
2019  5            1         
2019  6            1        
2018  1            2        
2018  2            2         
2018  3            2         
2018  4            2         
2018  5            2         
2018  6            2        
2018  7            2        
2018  8            2        
2018  9            2         
2018  10           2        
2018  11           2        
2018  12           2        
2019  1            2        
2019  2            2         
2019  3            2         
2019  4            2         
2019  5            2         
2019  6            2        

Could you please help me with this?


mck
Salih

1 Answer


Did you mean before 2019-06? (you wrote after 2019-06)

If so, you can do a filter:

df2 = df.filter('Year < 2019 or (Year = 2019 and Month <= 6)')
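If the cutoff needs to change, the same filter string can be built from parameters. This is only a sketch: the helper names `cutoff_condition` and `keep_until` are made up for illustration, and it assumes the columns are integers named Year and Month, as in the question.

```python
def cutoff_condition(year: int, month: int) -> str:
    """Build a Spark SQL filter string that keeps rows dated on or before year-month."""
    # Earlier years pass entirely; the cutoff year passes only up to the cutoff month.
    return f"Year < {year} or (Year = {year} and Month <= {month})"


def keep_until(df, year: int, month: int):
    """Return the rows of a Spark DataFrame dated on or before the cutoff.

    Assumes df has integer columns named Year and Month.
    """
    # DataFrame.filter accepts a SQL expression string, as in the one-liner above.
    return df.filter(cutoff_condition(year, month))
```

With the question's data, `df2 = keep_until(df, 2019, 6)` should apply the same condition as the one-line filter. The condition does not mention Client, so the cutoff applies to every client's rows equally.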
mck