Can you please help by providing a Python script to capture the count of records in a file that is on GCS? I'm trying to connect from a Linux server to a GCS bucket and capture the record count and size of the file.

Prashanth
  • What's a record for you (a line?), and what have you tried so far? – guillaume blaquiere Mar 13 '22 at 14:34
  • A record means a line, so I need the number of lines in the file. I haven't made any progress on the script yet; I'm going through GCP and Python commands to connect to a GCS bucket. If you have any thoughts, please share. Thanks for your response. – Prashanth Mar 13 '22 at 17:13
  • This is the one I'm looking into to get some ideas for implementing this requirement: https://github.com/GoogleCloudPlatform/appengine-gcs-client/blob/HEAD/python/demo/main.py – Prashanth Mar 13 '22 at 17:22
  • Yes, read the file and count the lines. – guillaume blaquiere Mar 13 '22 at 17:50
  • Hi, have you tried guillaume's suggestion? Also, what is the type of the file whose records you want to count? – Zeenath S N Mar 14 '22 at 10:22
  • Hello @ZeenathSN - that only reads the file or counts the number of files; we need the record count of a file, stored in a variable. It is a mainframe COBOL file. – Prashanth Mar 14 '22 at 13:15
  • Please post a minimal reproducible example; see [How do I ask a good question?](https://stackoverflow.com/help/how-to-ask) – Gourav B Mar 15 '22 at 07:38

1 Answer

I am using the following script and it’s working for me. I hope it gives you an idea of how to do it.

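import os
from flask import Flask
from google.cloud import storage

app = Flask(__name__)

# Placeholders: replace with your object name, bucket name and a local path.
storage_client = storage.Client()
file_data = 'file_name'
bucket_name = 'bucket_name'
temp_file_name = 'file_name'
bucket = storage_client.get_bucket(bucket_name)
blob = bucket.get_blob(file_data)
# Download the object to a local file before reading it.
blob.download_to_filename(temp_file_name)

# Count the lines; count stays 0 if the file is empty.
count = 0
with open(temp_file_name, "r") as myfile:
    for count, line in enumerate(myfile, start=1):
        pass
print('Total Lines', count)

if __name__ == "__main__":
    # Only needed when the script runs as a web service; the count above does not depend on it.
    app.run(debug=True, host='0.0.0.0', port=int(os.environ.get('PORT', 8080)))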

I first download the file to my environment using download_to_filename() and then read it using open(). Inside the for loop I use enumerate(), which attaches a counter to each line as it is read; you can read more about enumerate() in the Python documentation.
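
The enumerate() counting described above can also be wrapped in a small function, which makes it easy to check on its own and also covers the file size asked about in the question. This is only a sketch under some assumptions: count_lines and count_lines_in_blob are hypothetical helper names, a record is taken to be one newline-terminated line, and blob.open() requires a reasonably recent version of google-cloud-storage. It streams the object rather than saving a local copy first, which is a variation on the script above and not the same approach.

```python
import io


def count_lines(stream):
    """Count lines in a file-like object; returns 0 for an empty file."""
    count = 0
    # enumerate(..., start=1) makes the counter equal the number of lines read.
    for count, _line in enumerate(stream, start=1):
        pass
    return count


def count_lines_in_blob(bucket_name, blob_name):
    """Hypothetical helper: return (line_count, size_in_bytes) for a GCS object."""
    # Imported here so count_lines() works without the GCS library installed.
    from google.cloud import storage

    client = storage.Client()
    blob = client.bucket(bucket_name).get_blob(blob_name)
    if blob is None:
        raise FileNotFoundError(f"gs://{bucket_name}/{blob_name} not found")
    # blob.open("rb") streams the object instead of downloading it to disk.
    with blob.open("rb") as stream:
        return count_lines(stream), blob.size


if __name__ == "__main__":
    # Local check with in-memory data; no GCS access is needed for this part.
    sample = io.BytesIO(b"first\nsecond\nthird")
    print(count_lines(sample))  # prints 3
```

With credentials configured, count_lines_in_blob('bucket_name', 'file_name') would return both values in one call. If the mainframe file uses fixed-length records without newline delimiters, counting lines would not give the record count; dividing the size by the record length would be the analogous calculation.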

Zeenath S N