0

I have this data set in an Excel file. I want to keep the data which have only length 6 and delete rest and export it in the split of single values stored in a separate column.

Please tell me if we have any function to split the numeric values in the file to read it and split

jonrsharpe
  • 115,751
  • 26
  • 228
  • 437
  • what did you get when you read in the file? – sammywemmy Jan 25 '20 at 10:26
  • It’s big data like 9.8 million error related to memory so before reading to dataframe I want to grab only values which have 6 digits in the row and drop other –  Jan 25 '20 at 12:27
  • IIUC, your data is in an excel file, so maybe openpyxl can help. You iterate through the rows and keep only those with length 6. hopefully this post here can guide you : https://stackoverflow.com/questions/35823835/reading-excel-file-is-magnitudes-slower-using-openpyxl-compared-to-xlrd – sammywemmy Jan 26 '20 at 07:32

1 Answers1

2

From your shared data it seems it has spaces between numbers so they will already be in str

you can try below code:

your df looks like this:

    a
0   11
1   2
2   3 2 4 
3   5
4   1
5   6
6   1 1
7   6
8   6 7 7 7 6 6 8 8 8
9   6 8 7 9 5 2 1 44 6 55
10  6 8 7 9 5 2 1 44 6 55 4 4 4 4

filter rows with len equal to 6

df=df[df['a'].str.len()==6]

then split them using split() method like this

df['a'].str.split(" ", expand = True)

output:

    0   1   2   3
2   3   2   4   

EDIT:

for having trouble with memory while reading a large file you can refer to this SO post OR read the file in chunks and append/save the output in new file

reader = pd.read_csv(filePath,chunksize=1000000,low_memory=False,header=0)
M_S_N
  • 2,764
  • 1
  • 17
  • 38
  • chunksize keyword of read_excel is not implemented what is the issue now> –  Jan 27 '20 at 05:09
  • refer to this SO post https://stackoverflow.com/questions/45345506/chunksize-keyword-of-read-excel-is-not-implemented – M_S_N Jan 28 '20 at 07:06