Skip some columns between two columns when appending dataframe to existing empty dataframe

Question

Currently I'm extracting data from PDFs and putting it in a CSV file. I'll explain how this works.

First I create an empty dataframe:

import pandas

ndataFrame = pandas.DataFrame()

Then I read the data. For simplicity, assume the data is the same for each PDF:

data = {'shoe': ['a', 'b'], 'fury': ['c','d','e','f'], 'chaos': ['g','h']}
dataFrame = pandas.DataFrame({k:pandas.Series(v) for k, v in data.items()})

Then I append this data to the empty dataframe:

ndataFrame = ndataFrame.append(dataFrame)

This is the output:

  shoe fury chaos
0    a    c     g
1    b    d     h
2  NaN    e   NaN
3  NaN    f   NaN
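
A note for readers on recent pandas: DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so the append step above raises AttributeError there. A minimal sketch of the same accumulation with pandas.concat, reusing the names from the question (the frames list is an assumption standing in for "one frame per parsed PDF"):

```python
import pandas

data = {'shoe': ['a', 'b'], 'fury': ['c', 'd', 'e', 'f'], 'chaos': ['g', 'h']}
# Wrapping each list in a Series pads the shorter columns with NaN.
dataFrame = pandas.DataFrame({k: pandas.Series(v) for k, v in data.items()})

# Collect one frame per PDF, then concatenate once at the end
# instead of appending to an empty DataFrame inside the loop.
frames = [dataFrame]  # hypothetical: the real script would add one frame per PDF
ndataFrame = pandas.concat(frames)
```

With a single frame this should give the same table as shown above; with several frames, passing ignore_index=True to pandas.concat renumbers the rows.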

However, now comes the issue. I need some columns (let's say 4) to be empty between the columns fury and chaos. This is my desired output:

  shoe fury                        chaos
0    a    c                         g
1    b    d                         h
2  NaN    e                         NaN
3  NaN    f                         NaN

I tried some stuff with reindexing but I couldn't figure it out. Any help is welcome.

By the way, my desired output might be confusing. To be clear, I need some columns to be completely empty between fury and chaos (this is because some other data goes in there manually).

Thanks for reading

`this is because some other data goes in there manually`. Is this data input via Pandas or externally in CSV / Excel? — jpp, Oct 04 '18 at 21:58
Completely externally. This manual data goes in by hand after the script is done. — teller.py3, Oct 04 '18 at 21:58

user3483203 · Accepted Answer · 2018-10-04T22:04:24.830


This answer assumes you have no way to change the way the data is being read in upstream. As always, it is better to handle these types of formatting changes at the source. If that is not possible, here is a way to do it after parsing.

You can use reindex here, using numpy.insert to add your four columns:

import numpy as np

# Insert four new labels at position 2, i.e. between 'fury' and 'chaos'
dataFrame.reindex(columns=np.insert(dataFrame.columns, 2, [1, 2, 3, 4]))

  shoe fury   1   2   3   4 chaos
0    a    c NaN NaN NaN NaN     g
1    b    d NaN NaN NaN NaN     h
2  NaN    e NaN NaN NaN NaN   NaN
3  NaN    f NaN NaN NaN NaN   NaN
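
The hard-coded position 2 only works while fury stays the second column. As a sketch of the same reindex idea that looks the position up by name, the helper below wraps the call; insert_empty_columns and its parameters are hypothetical names, not part of pandas or NumPy:

```python
import numpy as np
import pandas as pd


def insert_empty_columns(df, after, labels):
    """Return a copy of df with NaN-filled columns `labels` right after column `after`."""
    # get_loc returns an integer position when the column labels are unique.
    position = df.columns.get_loc(after) + 1
    # Work on an object array so string and integer labels can be mixed.
    new_columns = np.insert(df.columns.to_numpy(dtype=object), position, labels)
    # Labels missing from df become all-NaN columns after reindexing.
    return df.reindex(columns=new_columns)


data = {'shoe': ['a', 'b'], 'fury': ['c', 'd', 'e', 'f'], 'chaos': ['g', 'h']}
dataFrame = pd.DataFrame({k: pd.Series(v) for k, v in data.items()})

spaced = insert_empty_columns(dataFrame, 'fury', [1, 2, 3, 4])
```

reindex returns a new frame and leaves dataFrame untouched, so the result has to be assigned to a name before it is written out.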

edited Oct 04 '18 at 22:04

answered Oct 04 '18 at 21:55



This is a pretty clever approach! Instead of [1,2,3,4] I used 4 * [''] and it worked perfectly – teller.py3 Oct 04 '18 at 22:14
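
A sketch of the blank-header variant from the comment above, carried through to the CSV step mentioned in the question. It assumes the file is written with to_csv and without the index; here the CSV is returned as a string rather than written to a path:

```python
import numpy as np
import pandas as pd

data = {'shoe': ['a', 'b'], 'fury': ['c', 'd', 'e', 'f'], 'chaos': ['g', 'h']}
dataFrame = pd.DataFrame({k: pd.Series(v) for k, v in data.items()})

# Four empty-string labels give four columns with blank headers.
blank_labels = 4 * ['']
spaced = dataFrame.reindex(columns=np.insert(dataFrame.columns, 2, blank_labels))

# With no path argument, to_csv returns the CSV text; NaN is written as an empty field.
csv_text = spaced.to_csv(index=False)
```

The four columns now share the same label, so selecting them by name in pandas is ambiguous. That should be harmless if the frame goes straight to CSV and the gaps are filled in by hand afterwards.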
