
I am using the Snowflake PUT command from Python to move files from my local system to a Snowflake stage.

I have 400 files (40 MB each), so I am using a wildcard command like -> put file:///Path/file_name*

It is working and loading all the files, but it takes around 30 minutes.

I want to see the progress so that I can be sure it's advancing. Is there a way to print a log line after each file is loaded (file 1 is moved to staging, file 2 is moved to staging, etc.)?

  • No way that I know of. SnowSQL has a little progress bar but it might go off of data as opposed to certain files loaded. – Suzy Lockwood Jun 23 '20 at 16:27

1 Answer


Is there a way to print logs after each file is loaded?

While statement execution is non-interactive when invoked from a library, the Snowflake Python connector does support logging its execution work.

Here's a shortened snippet that incorporates the connector's documented logging example:

# Assumes a 'con' object pre-exists and is connected to Snowflake already
import logging

for logger_name in ['snowflake.connector', 'botocore', 'boto3']:
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)
    ch = logging.StreamHandler()
    ch.setLevel(logging.DEBUG)
    ch.setFormatter(logging.Formatter('%(asctime)s - %(funcName)s() - %(message)s'))
    logger.addHandler(ch)

con.cursor().execute("put file:///Path/file_name* @stage_name")

# Optional, custom app log:
# logging.info("put command completed execution, exiting")

con.close()

Watching the output (sent to stderr) while this program runs will yield the following (filtered to just the upload messages):

~> python3 your_logging_script.py 2>&1 | grep -F "upload_one_file()"
[…]
2020-06-24 04:57:06,495 - upload_one_file() - done: status=ResultStatus.UPLOADED, file=/Path/file_name1, (…)
2020-06-24 04:57:07,312 - upload_one_file() - done: status=ResultStatus.UPLOADED, file=/Path/file_name2, (…)
2020-06-24 04:57:09,121 - upload_one_file() - done: status=ResultStatus.UPLOADED, file=/Path/file_name3, (…)
[…]

You can also configure the Python logger to write to a file and tail that file, instead of relying on stderr (from logging.StreamHandler), which was used above for simplicity.
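A minimal sketch of the file-based variant, assuming a writable log path named snowflake_put.log (the filename here is just an illustration):

```python
import logging

# Assumption: 'snowflake_put.log' is a writable path in the working directory.
fh = logging.FileHandler('snowflake_put.log')
fh.setLevel(logging.DEBUG)
fh.setFormatter(logging.Formatter('%(asctime)s - %(funcName)s() - %(message)s'))

# Attach the single file handler to each library logger of interest
for logger_name in ['snowflake.connector', 'botocore', 'boto3']:
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)
    logger.addHandler(fh)
```

You can then follow progress with something like `tail -f snowflake_put.log` in another terminal while the PUT runs.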

If you need to restrict the logging to only specific messages, the Python logging module supports attaching your own filters, which decide on each record emitted. The following filter keeps only the upload_one_file() function's messages (use the record's message text to filter on log content instead of the function name used in the example below):

class UploadFilter(logging.Filter):
    def filter(self, record):
        # Only tests one condition, but you could chain conditions here
        return "upload_one_file" in record.funcName.lower()

for logger_name in ['snowflake.connector', 'botocore', 'boto3']:
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)

    ch = logging.StreamHandler()
    ch.setLevel(logging.DEBUG)
    ch.setFormatter(logging.Formatter('%(asctime)s - %(funcName)s() - %(message)s'))

    ch.addFilter(UploadFilter())
    # ch.addFilter(AnyOtherFilterClass())

    logger.addHandler(ch)

Note: if you change handlers (e.g. from a stream to a file), make sure you add the filter to the new handler as well, and the handler to the logger. The Python logging tutorial explains this mechanism in more detail.
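Putting the two pieces together, here is a sketch of a file handler with the filter attached (the log path is again hypothetical; the tuple of function names can be extended if you want to match more than one function):

```python
import logging

class UploadFilter(logging.Filter):
    def filter(self, record):
        # Accept records from any of the listed function names;
        # extend the tuple to match additional functions.
        wanted = ('upload_one_file',)
        return any(name in record.funcName.lower() for name in wanted)

# Assumption: 'snowflake_filtered.log' is a hypothetical, writable path.
fh = logging.FileHandler('snowflake_filtered.log')
fh.setLevel(logging.DEBUG)
fh.setFormatter(logging.Formatter('%(asctime)s - %(funcName)s() - %(message)s'))
fh.addFilter(UploadFilter())  # the filter must go on the new handler

for logger_name in ['snowflake.connector', 'botocore', 'boto3']:
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)
    logger.addHandler(fh)
```

With this in place, only records emitted from functions whose names match the filter reach the log file; everything else is dropped at the handler.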

  • I tried using the above logger setting. Its giving me lot of logs, I wrote it in a file. Is there a way to filter this inside the Python code to write only specific logs? In this case I want to write logs which start with upload and ignore rest. 2020-06-25 12:41:42,678 - execute() - executing SQL/command 2020-06-25 12:41:42,678 - execute() - query: [put file://C:\Users\path\file.csv @%table] 2020-06-25 12:41:45,858 - upload_one_file() - compressing file – PythonDeveloper Jun 25 '20 at 18:17
  • Thanks Kirby. I tried using above code to filter message only by this function but still it gives me all the logs , for some reason filter is not working. – PythonDeveloper Jun 26 '20 at 02:44
  • this is perfect. One last question, how can I add more then one function in this filter? – PythonDeveloper Jun 26 '20 at 13:29
  • One more question Actually, I tried writing these logs to File but then it wrote all the logs to file. Filter did not work. Also, Is it possible to add my own comments in log file? I want to use logs generated by logging only for particular functions and for rest I want to write my own comments in logs. For example -> after a particular line I can say print('This command is starting now') is it possible? – PythonDeveloper Jul 01 '20 at 01:56