
I need to extract the XML files contained in a tar.gz archive that I download over FTP.

This is what I tried:

import os
from ftplib import FTP

def writeline(data):
    filedata.write(data)
    filedata.write(os.linesep)

ftp = FTP('ftp.my.domain.com')
ftp.login(user="username",passwd="password")
ftp.cwd('inner_folder')
filedata = open('mytargz.tar.gz', 'w')
ftp.retrlines('RETR %s' % ftp.nlst()[0], writeline)

I used ftp.nlst()[0] because there is a list of tar.gz files on my FTP server. The data that my writeline callback receives looks like a string of weird symbols, and then filedata.write(data) throws an error: {UnicodeEncodeError}'charmap' codec can't encode character '\x8b' in position 1: character maps to <undefined>. I could really use some help here.

Hadas
  • What you want is [tarfile](https://docs.python.org/3/library/tarfile.html). Notice that you can pass it a file object, so read your FTP data into a BytesIO: `io.BytesIO(my_bytes)` – Nullman May 12 '19 at 10:46
  • Yes, I assumed it's wrong, but the `data` I'm receiving is a string of strange symbols, so I'm getting the error `a bytes-like object is required, not 'str'`. It seems that something is wrong there, before I even unzip my file. – Hadas May 12 '19 at 10:47
  • Nothing is wrong with the "string". You are receiving bytes, not text, so you can't print it as if it were text. – Nullman May 12 '19 at 10:48
  • `io.BytesIO(data)` is throwing the error `a bytes-like object is required, not 'str'` – Hadas May 12 '19 at 10:51
  • I see the issue: you use `retrlines` instead of `retrbinary`. Also, when you open your file, open it with `'wb'` (write binary) and not `'w'`. – Nullman May 12 '19 at 10:59
  • I did use `retrlines`, and I tried changing the `'w'` to `'wb'`. Still the same error. – Hadas May 12 '19 at 11:02
  • Use `retrbinary` – Nullman May 12 '19 at 11:03
  • `io.BytesIO(data)` is reading my bytes to the file? – Hadas May 12 '19 at 11:12

1 Answer


I don't have an FTP server to test this with, but this should work:

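from ftplib import FTP

def writeline(data):
    # data is a chunk of raw bytes; write it unchanged (no line separator)
    filedata.write(data)

ftp = FTP('ftp.my.domain.com')
ftp.login(user="username", passwd="password")
ftp.cwd('inner_folder')
# 'wb': the archive is binary data, so open the local file in binary mode.
# The with block closes (and flushes) the file once the download is done.
with open('mytargz.tar.gz', 'wb') as filedata:
    ftp.retrbinary('RETR %s' % ftp.nlst()[0], writeline)
ftp.quit()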



Note that we open the file in binary write mode ('wb'), that we ask the FTP server for binary data rather than text (retrbinary instead of retrlines), and that our callback function only writes the data, without adding separators.
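If the goal is only to see which XML files the archive contains, the in-memory route suggested in the comments (reading the download into `io.BytesIO` and handing it to `tarfile`) can be sketched as below. This is a minimal sketch, not tested against a real server; `download_to_bytes` and `list_xml_members` are hypothetical helper names, and it assumes the archive is small enough to hold in memory.

```python
import io
import tarfile


def download_to_bytes(ftp, remote_name):
    """Fetch one remote file in binary mode and return its contents as bytes."""
    buffer = io.BytesIO()
    # retrbinary calls buffer.write once per received chunk of bytes.
    ftp.retrbinary("RETR " + remote_name, buffer.write)
    return buffer.getvalue()


def list_xml_members(archive_bytes):
    """Return the names of the .xml files inside a tar.gz given as bytes."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as archive:
        return [
            member.name
            for member in archive.getmembers()
            if member.isfile() and member.name.lower().endswith(".xml")
        ]
```

With a logged-in `ftp` object as in the code above, usage would look like `list_xml_members(download_to_bytes(ftp, ftp.nlst()[0]))`.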

Nullman
  • After doing all this, I added `tarfile.open('my_tar_gz.tar.gz', "r:gz").extractall("xmls")` in the `writeline` method to extract my files from the `my_tar_gz.tar.gz` file. I saw that the file has data (a few XML files). Do you know what could cause the `tarfile.open` command to throw `ReadError: empty file`? – Hadas May 13 '19 at 07:01
  • How do you know the file has data? Did you open it with an external program? – Nullman May 13 '19 at 07:28
  • If the error is coming from tarfile.open(), then the only thing I can think of is that you are accidentally opening the wrong file. To the best of my knowledge, that error will not pop up on a valid, non-empty file. – Nullman May 13 '19 at 07:38
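One other possible cause of `ReadError: empty file`, assuming `tarfile.open` really was called from inside the `writeline` callback: at that moment the local file is still open for writing and the first chunk may still sit in Python's write buffer, so the file on disk can be empty when `tarfile` reads it. A minimal sketch that extracts only after the download has finished and the file has been closed follows; `extract_xml_files` and `download_and_extract` are hypothetical helper names, and this has not been run against a real FTP server.

```python
import tarfile


def extract_xml_files(archive_path, dest_dir):
    """Extract only the .xml members of a finished tar.gz; return their names."""
    with tarfile.open(archive_path, "r:gz") as archive:
        xml_members = [
            member
            for member in archive.getmembers()
            if member.isfile() and member.name.lower().endswith(".xml")
        ]
        archive.extractall(dest_dir, members=xml_members)
    return [member.name for member in xml_members]


def download_and_extract(ftp, remote_name, local_path, dest_dir):
    """Download the archive completely, then extract its XML files."""
    # Leaving the with block closes and flushes the file before it is read back.
    with open(local_path, "wb") as filedata:
        ftp.retrbinary("RETR " + remote_name, filedata.write)
    return extract_xml_files(local_path, dest_dir)
```

The extraction step is kept out of the callback on purpose: the callback runs once per received chunk, while the archive is only complete after `retrbinary` returns.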