I am writing a Python script (for testing purposes) that downloads an XML file from a directory, converts it to JSON, converts it back to XML, and uploads it to a different directory, repeating as long as there is an XML file left in the source directory.

def start():
    now = str(datetime.now().strftime("%d%m%Y%H%M%S"))
    try:
        pysftp.Connection(HOST, username=AGENT, password=PW, private_key=".ppk", cnopts=CNOPTS) 
    except:
        print('Connection error')
        return
    xml_data = []
    new_xml = ''
    with pysftp.Connection(HOST, username=AGENT, password=PW, private_key=".ppk", cnopts=CNOPTS) as sftp:
        for filename in sftp.listdir(SOURCE_FOLDER):
            if fnmatch.fnmatch(filename, WILDCARD) and 'cancel' not in filename:
                doc_type = return_doc_type(filename)
                sftp.cwd(SOURCE_FOLDER)
                file_object = io.BytesIO()
                sftp.getfo(filename, file_object)
                xml_file = file_object.getvalue()
                new_xml = xmltodict.parse(xml_file) 
                if new_xml == '':
                    return 
                xml_data.append(new_xml)
                json_data = json.dumps(xml_data)
                new_xml_file = '<?xml version="1.0" encoding="utf-8" standalone="yes"?>' + dict2xml(json.loads(json_data))
                new_xml_file = indent(new_xml_file, indentation = '    ',newline = '\r\n')
                with pysftp.Connection(HOST, username=AGENT, password=PW, private_key=".ppk", cnopts=CNOPTS) as sftp2:
                    with sftp2.cd(DEST_FOLDER):  
                        with sftp2.open(f'test-{AGENT}-{doc_type}-{now}.xml', mode='w+', bufsize=32768) as f:
                            f.write(new_xml_file) 
                            print('xml file deployed on server: ', now, '\n')    
            file_count = len(sftp.listdir(SOURCE_FOLDER))
            if file_count > 3:   
                start()             
            else:
                print('no new files')
                return

SOURCE_FOLDER is a path like 'somefolder/out/'.

I have tested it with one file and it works, but when I try to make it recursive, I get this error after the 2nd iteration:

Exception in thread django-main-thread:
Traceback (most recent call last):
  File ".../app/views.py", line 232, in start
    file_count = len(sftp.listdir(SOURCE_FOLDER))
  File ".../lib/python3.7/site-packages/pysftp/__init__.py", line 592, in listdir
    return sorted(self._sftp.listdir(remotepath))
  File ".../lib/python3.7/site-packages/paramiko/sftp_client.py", line 218, in listdir
    return [f.filename for f in self.listdir_attr(path)]
  File ".../lib/python3.7/site-packages/paramiko/sftp_client.py", line 239, in listdir_attr
    t, msg = self._request(CMD_OPENDIR, path)
  File ".../lib/python3.7/site-packages/paramiko/sftp_client.py", line 813, in _request
    return self._read_response(num)
  File ".../lib/python3.7/site-packages/paramiko/sftp_client.py", line 865, in _read_response
    self._convert_status(msg)
  File ".../lib/python3.7/site-packages/paramiko/sftp_client.py", line 894, in _convert_status
    raise IOError(errno.ENOENT, text)
FileNotFoundError: [Errno 2] No such file

The original file is in the source directory, so I don't know what "No such file" is referring to.

Thank you for any suggestions.

Martin Prikryl
Pypax

1 Answer

Your SOURCE_FOLDER is a relative path, somefolder/out/. Say you start in /home/path. When you cwd to somefolder/out/, you end up in /home/path/somefolder/out/. If you then list somefolder/out/, you are actually referring to /home/path/somefolder/out/somefolder/out/, which is most probably not what you want.

That also means that even your for filename loop cannot work: because you cwd to somefolder/out/ in every iteration, the second iteration must fail, as it tries to cwd to /home/path/somefolder/out/somefolder/out/.

You had better use absolute paths to avoid this mess.
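
To make the path arithmetic concrete, here is a minimal sketch that mimics how a relative path gets resolved against the current remote directory. It does not talk to any server; the resolve helper and the /home/path starting directory are assumptions made for illustration only.

```python
import posixpath


def resolve(cwd, path):
    """Resolve `path` against `cwd` the way a POSIX-style server would.

    Hypothetical helper for illustration; it is not part of pysftp.
    """
    return posixpath.normpath(posixpath.join(cwd, path))


SOURCE_FOLDER = "somefolder/out/"  # relative, as in the question
home = "/home/path"                # assumed starting directory

# Directory you end up in after the first sftp.cwd(SOURCE_FOLDER).
cwd = resolve(home, SOURCE_FOLDER)

# What a later listdir(SOURCE_FOLDER) or cwd(SOURCE_FOLDER) refers to.
nested = resolve(cwd, SOURCE_FOLDER)

# An absolute path resolves to the same place regardless of cwd.
absolute = resolve(cwd, "/home/path/somefolder/out/")

print(cwd)       # /home/path/somefolder/out
print(nested)    # /home/path/somefolder/out/somefolder/out
print(absolute)  # /home/path/somefolder/out
```

The nested result is the directory that does not exist, which is consistent with the "No such file" error in the traceback.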


Your code has other issues. For example:

  • I do not understand why you open a second connection to the same host for the upload. You can use the connection you already have.

  • xmltodict.parse can take a file-like object. There's no need to waste memory by copying the BytesIO contents out with getvalue(). And actually you do not even need to spend the memory on the BytesIO; you can use:

    with sftp.open(filename, bufsize=32768) as f:
        new_xml = xmltodict.parse(f)
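
Putting these points together, the loop could be shaped roughly like the sketch below: one connection, no cwd calls, and full paths built with posixpath.join. This is only a sketch under assumptions: copy_converted and its convert parameter are hypothetical names, the output keeps the input file name for simplicity (the question builds a new name), and sftp is expected to be any object offering pysftp-style listdir(path) and open(path, mode) methods.

```python
import fnmatch
import posixpath


def copy_converted(sftp, source_dir, dest_dir, pattern, convert):
    """Read each matching file in source_dir, convert it, write it to dest_dir.

    `sftp` is any object with pysftp-style `listdir(path)` and
    `open(path, mode)` methods. `convert` is a hypothetical callable that
    takes a readable file object and returns the text to upload.
    Pass absolute directories so no path depends on the current directory.
    """
    written = []
    for name in sftp.listdir(source_dir):
        if not fnmatch.fnmatch(name, pattern):
            continue
        # Read straight from the remote file; no intermediate BytesIO.
        with sftp.open(posixpath.join(source_dir, name), "r") as src:
            output = convert(src)
        # Reuse the same connection for the upload.
        target = posixpath.join(dest_dir, name)
        with sftp.open(target, "w") as dst:
            dst.write(output)
        written.append(target)
    return written
```

With pysftp, sftp would be the already open Connection, and convert could wrap the xmltodict.parse and dict2xml calls from the question. Because the function never changes directory, calling it repeatedly resolves the same paths every time.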
    
Martin Prikryl