0

I'm trying to add new columns into a new excel with addition to existing column using pandas chained assignment. But this is so slow and memory consumption to high and showing below warning

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stab
le/user_guide/indexing.html#returning-a-view-versus-a-copy
  self.data_frame.loc[self.data_frame.index[self.row_index], config.col4] = self.col4
extract_comparsion.py:71: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

For better explanation

        src_file = source.xlsv
        data = pd.read_excel(src_file, header=1)
        df = pd.DataFrame(data, columns=['col1', 'col2'])
        result = df.loc[(df['col1'] != 'con') & (df['col2'] != 'con2')]
        self.data_frame = result.head(10)

        for index, row in self.data_frame.iterrows():
            self.row_index = index


        self.data_frame.loc[self.data_frame.index[self.row_index], config.status_label] = response
        self.data_frame.loc[self.data_frame.index[self.row_index], config.col4] = self.col4
        self.data_frame.loc[self.data_frame.index[self.row_index], config.col5l] = self.col5
        self.data_frame.loc[self.data_frame.index[
                                self.row_index], config.ipact_col6 = self.col6
       

        writer = pd.ExcelWriter(config.output_file, engine='xlsxwriter')
        print(writer)

        self.data_frame.to_excel(writer, sheet_name='Sheet1', index=False)
        writer.save()

           
        

May I know is there any other alternative for this implementation. If so please share your methods Thanks in advance

#UPDATE Suppose this source excel data

COL 1   COL2    COL3
AAA BBB CCC
DDD EEE FFF
GGG HHH III
JJJ KKK LLL
MMM NNN OOO

Based on the sql query I will get some result for col AAA is 1 and Similarly for DDD will get 2. So I need to form a final excel like below

COL 1   COL2    COL3 COL4
AAA BBB CCC 1
DDD EEE FFF 2
GGG HHH III 3
JJJ KKK LLL 4
MMM NNN OOO 5

The thing I need to pass the value of COL1 & COL2 to get query result. that is why am using this loop. Hope you got clear idea

Aaditya R Krishnan
  • 495
  • 1
  • 10
  • 31
  • Iterrows will be slow. For loops within pandas will be slow compared to column oriented calculations which are vectorised. My suggestion will be to rewrite your code by dropping those iterrows sections - if you share some data, someone can help out on that – sammywemmy Aug 13 '20 at 20:51
  • @sammywemmy I have modified the question with dummy. Sorry I couldn't share real data due to other issue. Hope now you can help – Aaditya R Krishnan Aug 13 '20 at 21:37
  • 1
    I'm like 98% sure this: `self.data_frame.loc[self.data_frame.index[self.row_index], config.status_label]` could be: `self.data_frame.loc[self.row_index, config.status_label]` – Paul H Aug 13 '20 at 21:39
  • @PaulH observation is spot on! – sammywemmy Aug 13 '20 at 21:40
  • Even after changing like that it still showing same warning – Aaditya R Krishnan Aug 13 '20 at 21:46
  • @sammywemmy Even after this am still getting this warning. May I know how can I resolve the warning and improve the execution speed – Aaditya R Krishnan Aug 13 '20 at 21:50
  • get rid of attribute access ``self.col4`` and use ``self.loc[:, "col4"]``. For speed, iterrows wont help. Since you are pulling data from sql, maybe write the code in sql? that much i can offer, with regards to the data provided – sammywemmy Aug 13 '20 at 21:52

0 Answers0