Currently I'm extracting data from PDFs and putting it in a CSV file. I'll explain how this works.

First I create an empty dataframe:

import pandas

ndataFrame = pandas.DataFrame()

Then I read the data. For simplicity, assume the data is the same for each PDF:

data = {'shoe': ['a', 'b'], 'fury': ['c','d','e','f'], 'chaos': ['g','h']}
dataFrame = pandas.DataFrame({k:pandas.Series(v) for k, v in data.items()})

Then I append this data to the empty dataframe:

# DataFrame.append was removed in pandas 2.0; pandas.concat does the same job
ndataFrame = pandas.concat([ndataFrame, dataFrame])

This is the output:

  shoe fury chaos
0    a    c     g
1    b    d     h
2  NaN    e   NaN
3  NaN    f   NaN

However, now comes the issue. I need some columns (let's say 4) to be empty between the columns fury and chaos. This is my desired output:

  shoe fury                        chaos
0    a    c                         g
1    b    d                         h
2  NaN    e                         NaN
3  NaN    f                         NaN

I tried some stuff with reindexing but I couldn't figure it out. Any help is welcome.

By the way, my desired output might be confusing. To be clear, I need some columns to be completely empty between fury and chaos (this is because some other data goes in there manually).

Thanks for reading

teller.py3
  • `this is because some other data goes in there manually`. Is this data input via Pandas or externally in CSV / Excel? – jpp Oct 04 '18 at 21:58
  • Completely externally. This manual data goes in by hand after the script is done. – teller.py3 Oct 04 '18 at 21:58

1 Answer

This answer assumes you have no way to change the way the data is being read in upstream. As always, it is better to handle these types of formatting changes at the source. If that is not possible, here is a way to do it after parsing.


You can use reindex here, using numpy.insert to add your four columns:

import numpy as np

dataFrame.reindex(columns=np.insert(dataFrame.columns, 2, [1,2,3,4]))

  shoe fury   1   2   3   4 chaos
0    a    c NaN NaN NaN NaN     g
1    b    d NaN NaN NaN NaN     h
2  NaN    e NaN NaN NaN NaN   NaN
3  NaN    f NaN NaN NaN NaN   NaN
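
Since the extra columns are filled in by hand after the CSV is written, it may be preferable for their headers to be blank as well. The following is a minimal sketch of one way to do that, not the only one: it builds the column order with plain list slicing instead of numpy.insert, and the `blank_1` to `blank_4` placeholder names are hypothetical (any labels not already among the columns would work). It assumes the frame is written out with `to_csv`.

```python
import pandas as pd

data = {'shoe': ['a', 'b'], 'fury': ['c', 'd', 'e', 'f'], 'chaos': ['g', 'h']}
df = pd.DataFrame({k: pd.Series(v) for k, v in data.items()})

# Hypothetical placeholder labels for the four empty columns.
blanks = [f'blank_{i}' for i in range(1, 5)]

# Splice the placeholders in right after 'fury'.
cols = list(df.columns)
pos = cols.index('fury') + 1
padded = df.reindex(columns=cols[:pos] + blanks + cols[pos:])

# Write the placeholders with empty header cells; NaN is written as an
# empty field by default, so the four columns come out completely blank.
header = ['' if c in blanks else c for c in padded.columns]
csv_text = padded.to_csv(index=False, header=header)
print(csv_text)
```

Looking up the position with `cols.index('fury')` rather than hard-coding 2 keeps the sketch working if the upstream column order changes. Pass a file path as the first argument to `to_csv` to write to disk instead of getting a string back.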
user3483203