I Want To Filter Data For Excel Files Using Pandas

Question

I am trying to filter Data, for Excel Files in Pandas. Based on the Column Value i.e. String Value.

I Have tried the following to achieve what I want :-

Latest Code shown Below as of 07/08/2019

In [13]: 

import pandas as pd

xls = pd.ExcelFile(r'C:\Users\Edward\Desktop\BBMF Schedules And Master Forum Thread Texts\BBMF Display Schedule 2009.xls')

data = pd.read_excel(xls, sheet_name="Sheet1")

pd.options.display.max_rows = 1000

df = pd.DataFrame(data, columns= ['Venue','A/C','DISPLAY/','Date','BID'])

df[(df['Venue'].str.contains('[a-zA-Z]') &  (df['DISPLAY/'].str.contains('DISPLAY') &  df['A/C'].str.contains("DHS|DAK|HS|SPIT")) & (df['A/C'] != 'LHS') & (df['A/C']  != 'LANC'))] 

Out[13]: 

         Venue      A/C     DISPLAY/                   Date     BID                    

475     SHAWBURY    DAK     DISPLAY     2008-07-24 00:00:00     188
476     SHAWBURY   SPIT     DISPLAY     2008-07-24 00:00:00     188
477     COTTESMORE SPIT     DISPLAY                     NaN     757                
478     COTTESMORE  DAK     DISPLAY                     NaN     757               
484     SUNDERLAND SPIT     DISPLAY                     NaN     333               
487   EAST FORTUNE SPIT     DISPLAY                     NaN     406               
489     WINDERMERE   HS     DISPLAY     2008-07-25 00:00:00     138
490     WINDERMERE  DAK     DISPLAY     2008-07-25 00:00:00     138
504      WIGTON    DHS      DISPLAY     2008-07-26 00:00:00     144
506     WINDERMERE   HS     DISPLAY     2008-07-26 00:00:00     138
507     WINDERMERE  DAK     DISPLAY     2008-07-26 00:00:00     138
508     SUNDERLAND   HS     DISPLAY                     NaN     333                
509     SUNDERLAND  DAK     DISPLAY                     NaN     333

I want for example the SUNDERLAND Lines, to be changed to one line i.e. :-

SUNDERLAND DHS DISPLAY NaN 00:00:00

And for WINDERMERE two Lines i.e. :-

WINDERMERE DHS DISPLAY 2008-07-25 00:00:00
WINDERMERE DHS DISPLAY 2008-07-26 00:00:00

So what I am basically saying, from the Excel Table Data, displayed in Jupyter Notebook, how do I change all off :-

VENUE DAK DISPLAY 2008-09-10 00:00:00 
VENUE HS DISPLAY 2008-09-10 00:00:00

(The Dates in the rows above and below, will be whatever is shown in that row, not the specific date given above and the ones below, except the WINDERMERE ones, shown above which is correct)

Or

VENUE HS DISPLAY 2008-09-06 00:00:00 
VENUE DAK DISPLAY 2008-09-06 00:00:00

Or

VENUE DAK DISPLAY 2008-05-25 00:00:00 
VENUE SPIT DISPLAY 2008-05-25 00:00:00

And the other way around, like with DAK and HS

VENUE same both lines i.e. SUNDERLAND i.e. :-

SUNDERLAND 
SUNDERLAND

To :-

VENUE DHS DISPLAY 2008-09-10 00:00:00

like it says for the VENUE WIGTON

And the DAK and SPIT Ones to :-

VENUE DS DISPLAY 2008-09-10 00:00:00

i.e. SHAWBURY

Also if possible, to change the Year to 2009, in the date because 2008 is wrong.

And get rid of the other Data.

I have noticed that in the BID Column, which I now include. In the Data I wan't to keep, the Bid Numbers are the same in both rows, i.e. the same VENUE . So what should I type, to remove rows, where there is only 1 of a Bid Number i.e. the VENUE EAST FORTUNE.

But keep the Rows, where the Bid Numbers are the same, i.e. both lines directly underneath each other as shown in the picture ? Except where in the A/C Column the values are DHS or DSS ?

Hi Benoit, here is the output I get, which is different to yours for some reason.

[[475, 'SHAWBURY', 'DAK', 'DISPLAY', '2008-07-24 00:00:00', 188],
[476, 'SHAWBURY', 'SPIT', 'DISPLAY', '2008-07-24 00:00:00', 188],
[477, 'COTTESMORE', 'SPIT', 'DISPLAY', None, 757],
[478, 'COTTESMORE', 'DAK', 'DISPLAY', None, 757],
[484, 'SUNDERLAND', 'SPIT', 'DISPLAY', None, 333],
[487, 'EAST FORTUNE', 'SPIT', 'DISPLAY', None, 406],
[489, 'WINDERMERE', 'HS', 'DISPLAY', '2008-07-25 00:00:00', 138],
[490, 'WINDERMERE', 'DAK', 'DISPLAY', '2008-07-25 00:00:00', 138],
[504, 'WIGTON', 'DHS', 'DISPLAY', '2008-07-26 00:00:00', 144],
[506, 'WINDERMERE', 'HS', 'DISPLAY', '2008-07-26 00:00:00', 138],
[507, 'WINDERMERE', 'DAK', 'DISPLAY', '2008-07-26 00:00:00', 138],
[508, 'SUNDERLAND', 'HS', 'DISPLAY', None, 333],
[509, 'SUNDERLAND', 'DAK', 'DISPLAY', None, 333]]

Latest Data :-

Venue   A/C     DISPLAY/    Date    BID
25  SHUTTLEWORTH    DAK     DISPLAY     NaN     529
55  KEMBLE  DAK     DISPLAY     NaN     461
69  NORTHWICH   SPIT    DISPLAY     2008-05-10 00:00:00     514
72  POCKLINGTON     SPIT    DISPLAY     2009-05-10 00:00:00     821
75  BERLIN  DAK     DISPLAY     2008-05-12 00:00:00     587
78  MILDENHALL  SPIT    DISPLAY     2009-05-15 00:00:00     920
93  DUXFORD     HS  DISPLAY     NaN     611
103     CRANWELL    HS  DISPLAY     2008-05-20 00:00:00     44
145     SCARBOROUGH     DAK     DISPLAY     2008-05-25 00:00:00     610
150     SCARBOROUGH     SPIT    DISPLAY     2008-05-25 00:00:00     610
151     CORBRIDGE   SPIT    DISPLAY     NaN     353
167     BRIDGEND-CNX    SPIT    DISPLAY     2008-05-31 00:00:00     527
173     TARRANT RUSHDEN     HS  DISPLAY     NaN     132
174     TARRANT RUSHDEN     DAK     DISPLAY     NaN     132
179     NORTHOLT    SPIT    DISPLAY     2009-06-05 00:00:00     870
214     BRIZE NORTON    HS  DISPLAY     NaN     939
218     ROPLEY  HS  DISPLAY     2008-06-13 00:00:00     355
223     THWAITES    HS  DISPLAY     NaN     364
231     ROPLEY  HS  DISPLAY     NaN     355
240     COSFORD     HS  DISPLAY     2008-06-14 00:00:00     667
241     QUORN   HS  DISPLAY     NaN     314
244     COSFORD     DAK     DISPLAY     2008-06-14 00:00:00     NaN
260     REDHILL     SPIT    DISPLAY     NaN     686
269     KEMBLE  DAK     DISPLAY     NaN     316
270     KEMBLE  HS  DISPLAY     NaN     316
280     KEMBLE  SPIT    DISPLAY     2008-06-21 00:00:00     316
285     KEMBLE  DAK     DISPLAY     2008-06-21 00:00:00     316
317     BRUNTINGTHORPE  SPIT    DISPLAY     NaN     106
329     ELDWICK     SPIT    DISPLAY     NaN     430
333     PLYMOUTH HOE    DAK     DISPLAY     2008-06-28 00:00:00     528
340     OLD BUCKENHAM   HS  DISPLAY     NaN     424
344     COSFORD     SPIT    DISPLAY     2008-06-28 00:00:00     68
350     TAIN    SPIT    DISPLAY     NaN     433
355     WITTERING   SPIT    DISPLAY     2008-07-03 00:00:00     376
362     WADDINGTON  HS  DISPLAY     2008-07-04 00:00:00     666
372     ANNAN   DAK     DISPLAY     NaN     606
373     ANNAN   SPIT    DISPLAY     NaN     606
375     WADDINGTON  HS  DISPLAY     2008-07-05 00:00:00     666
389     SHAWBURY    SPIT    DISPLAY     2008-07-05 00:00:00     183
393     CRANWELL    SPIT    DISPLAY     2008-07-07 00:00:00     823
399     SWANSEA     HS  DISPLAY     NaN     335
403     DUXFORD LEGENDS     HS  DISPLAY     2008-07-11 00:00:00     37
405     YEOVILTON   HS  DISPLAY     NaN     549
407     WOODHOUSE   HS  DISPLAY     2008-07-11 00:00:00     545
429     CAPEL-LE-FERNE  HS  DISPLAY     2008-07-12 00:00:00     298
430     ARDINGLY(SUSSEX)    HS  DISPLAY     NaN     189
439     CAERNARFON  SPIT    DISPLAY     2008-07-18 00:00:00     481
445     PENARTH     DAK     DISPLAY     NaN     303
448     FAIRFORD(RIAT)  SPIT    DISPLAY     2008-07-19 00:00:00     625
455     CHOLMONDELEY CASTLE     SPIT    DISPLAY     NaN     494
459     HAMPSTHWAITE    DAK     DISPLAY     NaN     828
465     ODIHAM(CAS COMM)    SPIT    DISPLAY     2009-07-21 00:00:00     NaN
469     ODIHAM FAMS DAY     SPIT    DISPLAY     2008-07-22 00:00:00     6
470     HENLOW FAM DAY  SPIT    DISPLAY     NaN     146
475     SHAWBURY    DAK     DISPLAY     2008-07-24 00:00:00     188
476     SHAWBURY    SPIT    DISPLAY     2008-07-24 00:00:00     188
477     COTTESMORE  SPIT    DISPLAY     NaN     757
478     COTTESMORE  DAK     DISPLAY     NaN     757
484     SUNDERLAND  SPIT    DISPLAY     NaN     333
487     EAST FORTUNE    SPIT    DISPLAY     NaN     406
489     WINDERMERE  HS  DISPLAY     2008-07-25 00:00:00     138
490     WINDERMERE  DAK     DISPLAY     2008-07-25 00:00:00     138
504     WIGTON  DHS     DISPLAY     2008-07-26 00:00:00     144
506     WINDERMERE  HS  DISPLAY     2008-07-26 00:00:00     138
507     WINDERMERE  DAK     DISPLAY     2008-07-26 00:00:00     138
508     SUNDERLAND  HS  DISPLAY     NaN     333
509     SUNDERLAND  DAK     DISPLAY     NaN     333
511     AUDLEM  SPIT    DISPLAY     2008-07-26 00:00:00     706
524     LYNEHAM FAM DAY     SPIT    DISPLAY     NaN     662
525     MALVERN     SPIT    DISPLAY     NaN     26
527     DAMYNS HALL     SPIT    DISPLAY     2008-08-08 00:00:00     766
529     DUXFORD     SPIT    DISPLAY     2008-08-09 00:00:00     612
530     DAMYNS HALL     SPIT    DISPLAY     NaN     766
534     BLACKPOOL   HS  DISPLAY     NaN     698
540     EASTBOURNE  HS  DISPLAY     2008-08-13 00:00:00     407
543     EASTBOURNE  HS  DISPLAY     2008-08-14 00:00:00     407
546     EASTBOURNE  HS  DISPLAY     2008-08-15 00:00:00     407
548     ROUGHAM     SPIT    DISPLAY     2009-08-15 00:00:00     551
551     DUXFORD     DAK     DISPLAY     2008-08-15 00:00:00     613
552     ROUGHAM     DAK     DISPLAY     NaN     551
556     EASTBOURNE  HS  DISPLAY     2008-08-16 00:00:00     407
564     CROMER  SPIT    DISPLAY     2008-08-19 00:00:00     139
569     CRANWELL    SPIT    DISPLAY     2009-08-20 00:00:00     913
578     SHOREHAM    SPIT    DISPLAY     NaN     366
581     RAMSEY  SPIT    DISPLAY     2009-08-22 00:00:00     896
583     WHITBY  DAK     DISPLAY     2008-08-22 00:00:00     125
584     WHITBY  HS  DISPLAY     2008-08-22 00:00:00     125
586     WHITBY  HS  DISPLAY     2008-08-23 00:00:00     125
587     WHITBY  DAK     DISPLAY     2008-08-23 00:00:00     125
591     SHOREHAM    HS  DISPLAY     2008-08-23 00:00:00     366
605     TWINWOOD AIRFIELD   DAK     DISPLAY     2008-08-30 00:00:00     934
608     ROMSEY  DAK     DISPLAY     NaN     175
613     STONELEIGH PARK     HS  DISPLAY     NaN     NaN
614     DUNSFOLD    HS  DISPLAY     NaN     583
622     ROMSEY  DAK     DISPLAY     2008-08-31 00:00:00     175
623     ROMSEY  HS  DISPLAY     NaN     175
625     STONELEIGH PARK     HS  DISPLAY     NaN     563
629     TWINWOOD AIRFIELD   DAK     DISPLAY     NaN     934
633     PLYMOUTH HOE    SPIT    DISPLAY     2008-09-03 00:00:00     295
646     PORTRUSH    DAK     DISPLAY     2008-09-05 00:00:00     76
647     PORTRUSH    HS  DISPLAY     2008-09-05 00:00:00     76
652     DUXFORD     HS  DISPLAY     NaN     615
659     PORTRUSH    HS  DISPLAY     2008-09-06 00:00:00     76
660     PORTRUSH    DAK     DISPLAY     2008-09-06 00:00:00     76
667     GREAT YARMOUTH  HS  DISPLAY     NaN     655
678     ODIHAM  SPIT    DISPLAY     2008-09-09 00:00:00     137
686     GUERNSEY    DAK     DISPLAY     2008-09-10 00:00:00     582
687     GUERNSEY    HS  DISPLAY     2008-09-10 00:00:00     582
688     JERSEY  DAK     DISPLAY     2008-09-10 00:00:00     581
689     JERSEY  HS  DISPLAY     2008-09-10 00:00:00     581
711     WOODSTOCK(BLENHEIM)     HS  DISPLAY     2008-09-12 00:00:00     362
717     MORECAMBE   SPIT    DISPLAY     NaN     368
720     WOODSTOCK   DAK     DISPLAY     2008-09-13 00:00:00     362
721     WOODSTOCK   HS  DISPLAY     NaN     362
744     NORTHOLT    HS  DISPLAY     2008-09-17 00:00:00     532
746     GOODWOOD    HS  DISPLAY     2008-09-18 00:00:00     256
755     GOODWOOD    HS  DISPLAY     2008-09-19 00:00:00     256
756     NORTHOLT    HS  DISPLAY     2009-09-19 00:00:00     863
763     KEMBLE  HS  DISPLAY     NaN     330
766     KEMBLE  HS  DISPLAY     2008-09-20 00:00:00     330
774     SANICOLE    DAK     DISPLAY     NaN     69
776     GOODWOOD    HS  DISPLAY     2008-09-20 00:00:00     256
777     SANICOLE    HS  DISPLAY     NaN     69
790     SOUTHPORT   HS  DISPLAY     NaN     584
791     SOUTHPORT   DAK     DISPLAY     NaN     584
803     DUXFORD     SPIT    DISPLAY     2008-10-11 00:00:00     616

Any help would be appreciated

Regards

Eddie Winch

It's hard to follow your explanation, what's the logic behind going from `DAK` & `HS` to `DHS`? — Erfan, Aug 04 '19 at 14:20
It's easier if you provide an example dataset of 5-10 rows which represents your data and based on your example dataset what your expecetd output is. — Erfan, Aug 04 '19 at 14:21
DHS means Dakota Hurricane and Spitfire but the Excel Data displays DAK with HS underneath, for most of the DHS appearances, at the Venue Displays. — Edward Winch, Aug 04 '19 at 14:23
The Way i have posted the Venue Data, is how the data displays, when I run my Code — Edward Winch, Aug 04 '19 at 14:27
Take a look at [this](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) — Erfan, Aug 04 '19 at 14:29
Thanks for that Erfan, I am confused, but will try my best to edit my post accordingly, if someone could run my Code in Jupyter Notebook, that would help. — Edward Winch, Aug 04 '19 at 14:47
You have to put some effort into understanding how to ask a good question. This way you will get quality answers. [Here's](https://stackoverflow.com/questions/56839795/enumerate-columns-with-same-prefix) a question of mine. The content is not important, have a look how I build my question so people can copy paste, test themself en visually see what I want — Erfan, Aug 04 '19 at 19:00
OK Erfan, thanks for your example Question, I included a photoshot, of what I got when I run my Code as part of my question. What I am aiming to achieve, isn't that complex really, I wouldn't say, I nevertheless appreciate your advice. — Edward Winch, Aug 04 '19 at 19:26

score 1 · Answer 1 · edited Jun 20 '20 at 09:12

1

[Updated] - This is kin of weird but it respects the rules you want to apply

(which are a little weird as well, so it makes sense)

1. Create the Dataframe

In [1]:
import pandas as pd
 
data = [
        [475, 'SHAWBURY', 'DAK', 'DISPLAY', '2008-07-24 00:00:00', 188],
        [476, 'SHAWBURY', 'SPIT', 'DISPLAY', '2008-07-24 00:00:00', 188],
        [477, 'COTTESMORE', 'SPIT', 'DISPLAY', None, 757],                
        [478, 'COTTESMORE', 'DAK', 'DISPLAY', None, 757],               
        [484, 'SUNDERLAND', 'SPIT', 'DISPLAY', None, 333],           
        [487, 'EAST FORTUNE', 'SPIT', 'DISPLAY', None, 406],             
        [489, 'WINDERMERE', 'HS', 'DISPLAY', '2008-07-25 00:00:00', 138],
        [490, 'WINDERMERE', 'DAK', 'DISPLAY', '2008-07-25 00:00:00', 138],
        [504, 'WIGTON', 'DHS', 'DISPLAY', '2008-07-26 00:00:00', 144],
        [506, 'WINDERMERE', 'HS', 'DISPLAY', '2008-07-26 00:00:00', 138],
        [507, 'WINDERMERE', 'DAK', 'DISPLAY', '2008-07-26 00:00:00', 138],
        [508, 'SUNDERLAND', 'HS', 'DISPLAY', None, 333],                
        [509, 'SUNDERLAND', 'DAK', 'DISPLAY', None, 333]
       ]
df = pd.DataFrame(data, columns=['Index', 'Venue', 'A/C', 'DISPLAY', 'Date', 'BID']).set_index('Index')
df

Out [1]:

       Venue        A/C     DISPLAY     Date                    BID
Index                   
475    SHAWBURY     DAK     DISPLAY     2008-07-24 00:00:00     188
476    SHAWBURY     SPIT    DISPLAY     2008-07-24 00:00:00     188
477    COTTESMORE   SPIT    DISPLAY     None                    757
478    COTTESMORE   DAK     DISPLAY     None                    757
484    SUNDERLAND   SPIT    DISPLAY     None                    333
487    EAST FORTUNE SPIT    DISPLAY     None                    406
489    WINDERMERE   HS      DISPLAY     2008-07-25 00:00:00     138
490    WINDERMERE   DAK     DISPLAY     2008-07-25 00:00:00     138
504    WIGTON       DHS     DISPLAY     2008-07-26 00:00:00     144
506    WINDERMERE   HS      DISPLAY     2008-07-26 00:00:00     138
507    WINDERMERE   DAK     DISPLAY     2008-07-26 00:00:00     138
508    SUNDERLAND   HS      DISPLAY     None                    333
509    SUNDERLAND   DAK     DISPLAY     None                    333

2. Manipulate your dataframe

In [2] :
## Keep BID where we have at least 2 rows
test = df.groupby(by=['BID', 'Venue', 'DISPLAY']).count()
test = test[test['A/C']>1]
bids = test.reset_index().BID.tolist()

# Here if there is already `DHS` and `DS` in the column `A/C`, I want to keep them
df.loc[df['A/C']=='DHS', 'Aircraft'] = 'DHS'
df.loc[df['A/C']=='DS', 'Aircraft'] = 'DS'

# I keep 1 row for each bid that has at least 2 rows, and their Aircraft's value are updated
for bid in bids:
    df.loc[(df['BID']==bid) & (df['A/C']=='DAK'), 'Aircraft']= 'DHS' 
    df.loc[(df['BID']==bid) & (df['A/C']=='SPIT'), 'Aircraft'] = 'DS' 
    

df = df[df['Aircraft'].notnull()].drop(columns=['A/C'], axis=1)
data

Out [2]:

        Venue       DISPLAY     Date                BID     Aircraft
Index                   
475     SHAWBURY    DISPLAY     2008-07-24 00:00:00 188     DHS
476     SHAWBURY    DISPLAY     2008-07-24 00:00:00 188     DS
477     COTTESMORE  DISPLAY     None                757     DS
478     COTTESMORE  DISPLAY     None                757     DHS
484     SUNDERLAND  DISPLAY     None                333     DS
490     WINDERMERE  DISPLAY     2008-07-25 00:00:00 138     DHS
504     WIGTON      DISPLAY     2008-07-26 00:00:00 144     DHS
507     WINDERMERE  DISPLAY     2008-07-26 00:00:00 138     DHS
509     SUNDERLAND  DISPLAY     None                333     DHS

edited Jun 20 '20 at 09:12

Community

1
1

answered Aug 07 '19 at 03:59

Benoit Drogou

969
1
5
15

Hi Benoit, Many thanks for your help, what would be a better shorter way, of typing my Code, how do i encoreperate a groupby in my Code ? I used ;- df.groupby('BID').filter(lambda x : len(x)>1) in my code before and those rows were excluded when I ran my Code. But the other data I want excluding from the A/C column i.e. LANC and LHS, were now displaying. – Edward Winch Aug 07 '19 at 16:00
Can you print the head of your dataframe and the code you run ? It would help me a lot to understand what you are doing – Benoit Drogou Aug 07 '19 at 16:14
It would be great to edit your question with: 1- Copy and past your `dataframe.head()` instead of a picture. 2- Write down clearly what you what to do 3- (Optional) Talk about what you have tried and what is not working properly – Benoit Drogou Aug 07 '19 at 16:19
Hi Benoit, I have edited my Thread, could you help me out again ? – Edward Winch Aug 07 '19 at 18:58
I am working on it, but it is really hard to me to understand what you want to do with this DHS/DSS/DAK/SPIT/HS. It seems like you want to have a different rule for each one of them.. – Benoit Drogou Aug 07 '19 at 19:47
Can you explain properly all the logic behind the `A/C` column ? What should stay the same, what should be "merged" with an other row ? – Benoit Drogou Aug 07 '19 at 20:10
The A/C Column is the Aircraft Allocation Column for a VENUE, there are two rows of the Venue SUNDERLAND for example, one below the other, in one of the rows A/C Column it says HS, and in the other one DAK. I want those two rows, to be changed to one row saying :- SUNDERLAND DHS DISPLAY NaN 333. Basically how the Venue WIGTON Row looks like. – Edward Winch Aug 07 '19 at 21:02
For the Venue SHAWBURY, the same criteria applies, as for SUNDERLAND, except one row says DAK in the A/C Column and the other SPIT, so I want those two rows to be changed, into one saying :- SHAWBURY DS DISPLAY 2009-07-24 00:00:00 188 – Edward Winch Aug 07 '19 at 21:05
The year 2008 needs changing to 2009. And the same applies for the other, same occurences when I do a filtering of the Excel File Data, they will have the same BID Number, I wan't other data not to be displayed, i.e. thats why I typed :- df[(df ['Venue'].str.contains('[a-zA-Z]') & (df['DISPLAY/'].str.contains('DISPLAY') & df['A/C'].str.contains("DHS|DAK|HS|SPIT")) & (df['A/C'] != 'LHS') & (df['A/C'] != 'LANC'))] Could you come up with a better way of writing the Code ? i.e. an alternative to writing several codes containing str.contains ? – Edward Winch Aug 07 '19 at 21:06
With the Venue WINDERMERE, there are Four Lines i.e. WINDERMERE HS DISPLAY 2008-07-25 00:00:00 138 WINDERMERE DAK DISPLAY 2008-07-25 00:00:00 138 WINDERMERE HS DISPLAY 2008-07-26 00:00:00 138 WINDERMERE DAK DISPLAY 2008-07-26 00:00:00 138 I want those changing to two lines i.e. WINDERMERE DHS DISPLAY 2008-07-25 00:00:00 138 + WINDERMERE DHS DISPLAY 2008-07-26 00:00:00 138 – Edward Winch Aug 07 '19 at 21:21
Here is the link to the Excel File, if you need it Benoit :- http://web.archive.org/web/20090804234934/http://www.raf.mod.uk/bbmf/rafcms/mediafiles/F0ED6EA8_1143_EC82_2E4534A1036AA506.xls – Edward Winch Aug 07 '19 at 21:51
Thankyou so much, for taking the time to help me Benoit, it's very much appreciated. Just wondering, why for the Venues COTTESMORE and SHAWBURY do they have two rows, both should have just the row where A/C is DS ? i.e. the Hurricane didn't display at those two venues, i.e. shouldn't be DHS. The other data is correct though )) – Edward Winch Aug 07 '19 at 22:33
There is a slight error in my code, I will update it on a laptop in hours. In the loop, I wrote `df.loc[(df['BID']==bid) & (df['A/C']=='DAK'), 'Aircraft']= 'DHS'`. Replace `DAK` by `HS ` and you are good – Benoit Drogou Aug 07 '19 at 22:42
Hi Benoit, only if you have time, how could you search and sort through the Excel File, I have in question ? to get the same results, rather than using, the method of pasting all the data to start with ? Because the data I posted, was a small part, of the data that displayed, when I ran my Original Code. – Edward Winch Aug 09 '19 at 15:09
Well, what is the logic with `LANC`, `SS`, `HURRI`, `LSSD`, `PARA`, `LHS` that are in your file but not in your exemple ?? Do you want to "merge" them weirdly like the other one ? How ? – Benoit Drogou Aug 12 '19 at 15:28
Yes, merge them like you did, but having the data displayed, from the Excel File, when the code is run. I would want to exclude the LANC and LHS, and LSSD Data, basically everything with LANC in, like before. – Edward Winch Aug 12 '19 at 18:30
So at the end in the 'Aircraft' column you only have `DHS` and `HS` values ? – Benoit Drogou Aug 12 '19 at 19:51
Yes DHS, DSS or DS I don't think there are any DH occurances, but maybe wrong. – Edward Winch Aug 12 '19 at 20:59
And what creates `DSS` ? – Benoit Drogou Aug 12 '19 at 21:01
df['A/C'].str.contains("DHS|DSS|DAK|HS|SPIT")) – Edward Winch Aug 12 '19 at 21:02
Sorry for the miss-understanding. Is DSS concatenation of 2 things like DHS and DS? If so, a concatenation of what ? – Benoit Drogou Aug 12 '19 at 21:05
(df['A/C'] != 'LHS') & (df['A/C'] != 'LANC'))] – Edward Winch Aug 12 '19 at 21:29
sorry but I do not think it answers the last question – Benoit Drogou Aug 12 '19 at 21:30
Yes DAK and SS but also can be DSS allready like with WIGTON DHS – Edward Winch Aug 12 '19 at 21:37
Wait, I thought `DAK` becomes `DHS`. And now you say it can become `DSS` as well ? – Benoit Drogou Aug 12 '19 at 21:44
DAK and HS becomes DHS, I also noticed, in another of my Excel Files, that there are DAK and SS appearances at a Venue, so that becomes DSS – Edward Winch Aug 12 '19 at 22:17
I am going to bed now, but will speak to you again tomorrow. Many thanks for contacting me again Benoit. Eddie – Edward Winch Aug 12 '19 at 22:21
Hi Benoit, did you get my message, about DSS ? – Edward Winch Aug 13 '19 at 13:40
I did, but I have many things to do and I think you have all the clues here to solve your problem. What I am thinking of is only add some line for DSS – Benoit Drogou Aug 13 '19 at 13:45
OK Benoit I understand. – Edward Winch Aug 13 '19 at 13:54
Edward, if you still have questions about that, please make sure you understand the code that Benoit kindly provided. If you do, and there is still something you need help with, please let us know what exactly it is you are struggling with, or what is not clear in Benoits code. Else, please mark his answer as accepted, see https://stackoverflow.com/help/someone-answers. – user7217806 Sep 01 '19 at 11:14
Hi user7217806, How in the Create The Dataframe, part of Benoit's Code, How did he get all that Data typed like it is, after the data = [ part. I am assuming he didn't type all that data out ? manually as it is shown with brackets etc ? And also how when I run my Original Code in Pandas, do I copy the output Data i.e. after out [1]: for example, to post on here if need be ? – Edward Winch Sep 01 '19 at 13:18
Also is there a way I can filter the data, the way Benoit shows, by modifying my Original Python Code ? So that the same output, Benoit gets to, is displayed in the Excel Document output, rather than separately ? What would I need to type ? – Edward Winch Sep 01 '19 at 13:33
You should be able to replace the `data` variable of the answer with your first two lines after the import (so the xls = pd.ExcelFile ..., data = pd.read_excel....) to use your own data. – user7217806 Sep 04 '19 at 06:12
Thanks user7217806, for your help. Benoit, I get the following output, when I run your Code step by step, in Jupyter Notebook. :- – Edward Winch Sep 09 '19 at 15:04
See after the line, in my edited Code, which says :- Hi Benoit, here is the output I get, which is different to yours for some reason. Any ideas why I don't get the output you show ? Eddie – Edward Winch Sep 09 '19 at 15:10
Hi Benoit, Also how I change all the latest data, to how you have the data presented, in your Create The Dataframe Section. After the data = i.e. how you get all the brackets, in the correct position and commas, or did you do that manually ? See my Edited Post for the latest Data. – Edward Winch Sep 09 '19 at 15:20
Can you indent A block of Text, by 4 spaces, without having to manually indent every line in that block ? It takes ages to get the text, as yours is shown. As you can see from the Latest data, for me to get the text as yours done manually, will take ages. Is there are quicker automatic method to do this Benoit ? I simply copied the text from Jupyter Notebook, by scrolling over all the text, then pasting, then doing the indentation manually. – Edward Winch Sep 09 '19 at 15:34
Can anyone please help me ? – Edward Winch Oct 08 '19 at 10:19

I Want To Filter Data For Excel Files Using Pandas

1 Answers1

1. Create the Dataframe

2. Manipulate your dataframe

Linked