1

I am currently working with RapidMiner and am trying to copy my RapidMiner results, which are in xlsx files, to txt files in order to do some further processing with Python. I have plain text in column A (A1-A1500) and the corresponding filename in column C (C1-C1500). My question: is there a way (I am thinking of the xlrd module) to read the content of every cell in column A and write it to a newly created txt file whose name is given in the corresponding cell of column C?

As I have never worked with the xlrd module before, I am a bit lost at the moment.

Florian Schramm
  • 333
  • 3
  • 15

3 Answers

4

I can recommend openpyxl for any task involving .xlsx handling.

For your requirements:

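from openpyxl import load_workbook
import os

p = 'path/to/the/folder/with/your/.xlsx'
# Collect every .xlsx file in the folder
files = [name for name in os.listdir(p) if name.endswith('.xlsx')]

for f in files:
    wb = load_workbook(os.path.join(p, f))
    ws = wb['name_of_sheet']
    for row in ws.rows:
        # Column A (index 0) holds the text, column C (index 2) the file name
        if row[0].value is None or row[2].value is None:
            continue  # skip rows without text or file name
        with open(str(row[2].value) + '.txt', 'w') as outfile:
            outfile.write(str(row[0].value))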
corinna
  • 629
  • 7
  • 18
  • Thanks for your help. This works perfectly! Perhaps one more thing: assuming I have a directory containing multiple Excel files, all with the same structure, is there a way (using os.chdir and a for loop) to iterate over all Excel files and redirect my output to a specific directory that will contain all my txt files in the end? – Florian Schramm Sep 16 '16 at 08:46
  • Thanks for editing your post, the iteration now works perfectly. I only added os.chdir(r"F:\Results") to get my output into the same directory as my Excel files. – Florian Schramm Sep 16 '16 at 09:22
0

Good day! I'm not sure I understand your question correctly, but have you tried combining the Read Excel operator with the Loop Examples operator? Your loop subprocess could then use the Write CSV operator or something similar.

Guy Davis
  • 128
  • 5
  • Good idea. I hadn't thought of doing this task in RM. But is it possible to give the output files the same names as the input files? – Florian Schramm Sep 16 '16 at 07:15
0

Thanks to @corinna the final code is:

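from openpyxl import load_workbook
import os

p = r'F:\Results'
# All Excel files in the results folder
files = [name for name in os.listdir(p) if name.endswith('.xlsx')]
os.chdir(p)  # write the .txt files next to the Excel files
for f in files:
    workbook = load_workbook(os.path.join(p, f))
    sheet = workbook['Normal']
    for row in sheet.rows:
        # Column A (index 0) holds the text, column C (index 2) the file name
        if row[0].value is None or row[2].value is None:
            continue  # skip rows without text or file name
        with open(str(row[2].value) + '.txt', 'w') as outfile:
            outfile.write(str(row[0].value))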
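
If the txt files should land in a separate folder rather than the current working directory, one possible variant avoids os.chdir and passes the output directory explicitly. This is only a sketch under a few assumptions: the names write_rows, export_workbooks, in_dir and out_dir are made up for illustration, and iter_rows(values_only=True) needs a reasonably recent openpyxl.

```python
from pathlib import Path


def write_rows(rows, out_dir):
    """Write column A of each row to <out_dir>/<column C>.txt.

    `rows` is any iterable of value tuples (A, B, C, ...).
    Returns the list of paths that were written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # create the target folder if needed
    written = []
    for row in rows:
        # Skip rows that are too short or lack the text or the file name
        if len(row) < 3 or row[0] is None or row[2] is None:
            continue
        target = out_dir / f"{row[2]}.txt"
        target.write_text(str(row[0]), encoding="utf-8")
        written.append(target)
    return written


def export_workbooks(in_dir, out_dir, sheet_name):
    """Run write_rows over every .xlsx file found in in_dir."""
    from openpyxl import load_workbook  # imported here so write_rows works without openpyxl

    for xlsx in sorted(Path(in_dir).glob("*.xlsx")):
        wb = load_workbook(str(xlsx), read_only=True)
        try:
            # values_only=True yields plain cell values instead of cell objects
            write_rows(wb[sheet_name].iter_rows(values_only=True), out_dir)
        finally:
            wb.close()
```

With the setup above, a call could look like export_workbooks(r"F:\Results", r"F:\Results\txt", "Normal"), where the txt subfolder is a hypothetical choice. Note that rows sharing the same name in column C overwrite each other, just as in the original loop.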
Florian Schramm
  • 333
  • 3
  • 15