I am using Google Colab to do the following tasks, but my scripts don't work. They worked well when I tested them on small folders containing fewer than 10 files; however, they didn't work on larger folders containing thousands of files. On a side note, I can't tell the size of my folders because Google Drive doesn't show it.
I'd like to understand why this happens and how to fix it. Thank you so much!
Task #1: Moving all json files from one folder to another folder on Google Drive. When I tested on smaller folders, all files were moved as expected. However, when I ran it on the "real" folders, which are much larger, it looked as if it worked: there was no timeout and no error. But when I checked the folders on Google Drive, the files were still there. Nothing had changed.
source = glob.glob('/path_to_source_folder/*.json')
destination = '/path_to_destination_folder/'
for json_file in source:
    id = os.path.basename(json_file)
    file = '/path_to_destination_folder/{}'.format(id)
    if os.path.exists(file):
        print('The file {} already exists'.format(id))
        os.remove(json_file)
    else:
        shutil.move(json_file, destination)
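To make the symptom easier to check, here is a minimal sketch of the same move step wrapped in a function that reports how many files `glob` actually matched. The function name `move_json_files` and its return tuple are hypothetical, added only for illustration. If it reports zero matches on a large folder, the loop body never ran at all.

```python
import glob
import os
import shutil


def move_json_files(source_dir, destination_dir):
    """Move *.json files from source_dir to destination_dir.

    Returns (matched, moved, removed_duplicates) so that an empty
    glob result is visible instead of passing silently.
    """
    matched = glob.glob(os.path.join(source_dir, '*.json'))
    moved = 0
    removed = 0
    for json_file in matched:
        name = os.path.basename(json_file)
        target = os.path.join(destination_dir, name)
        if os.path.exists(target):
            # Same behaviour as the loop above: drop the source copy.
            os.remove(json_file)
            removed += 1
        else:
            shutil.move(json_file, target)
            moved += 1
    return len(matched), moved, removed
```

It would be called with the two folder paths, for example `print(move_json_files('/path_to_source_folder', '/path_to_destination_folder'))`.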
Task #2: Obtaining statistics for folders and json files. I tested on smaller folders and it worked well. Side note: the json files in the smaller folders have the same structure as the json files in the larger folders. On the larger folders it didn't time out, but every result was "0", like "0 users", "0 posts", etc. These are definitely wrong.
files = glob.glob('/path_to_reference_folder/*.json')
total_users = 0
not_empty_users = 0
total_posts_by_users = []
for file in files:
    total_users += 1
    with open(file, 'r') as f:
        tmp = f.readlines()
        if len(tmp) > 0:
            not_empty_users += 1
        total_posts_by_users.append(len(tmp))
print("total {} users".format(total_users))
print("total {} posts by users".format(np.sum(total_posts_by_users)))
print("total {} users not empty".format(not_empty_users))
print("total {} average posts per users".format(np.mean(total_posts_by_users)))
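Since `total_users` is incremented once per matched file, "0 users" means `glob` returned an empty list. Below is a rough sketch of the same statistics that makes that case explicit. The name `folder_stats` and the returned dictionary keys are hypothetical, it assumes one post per line as in the code above, and it uses plain Python instead of NumPy only to keep the example self-contained.

```python
import glob
import os


def folder_stats(folder):
    """Count users (files), non-empty users, and posts (lines) in *.json files.

    Returns None when no file matched, so an empty listing is not
    mistaken for a folder that really has zero users.
    """
    files = glob.glob(os.path.join(folder, '*.json'))
    if not files:
        return None
    posts_per_user = []
    for path in files:
        with open(path, 'r') as f:
            # One line is treated as one post (assumption from the original code).
            posts_per_user.append(sum(1 for _ in f))
    total_posts = sum(posts_per_user)
    return {
        'total_users': len(files),
        'not_empty_users': sum(1 for n in posts_per_user if n > 0),
        'total_posts': total_posts,
        'average_posts_per_user': total_posts / len(files),
    }
```

Calling `folder_stats('/path_to_reference_folder')` and getting `None` back would confirm that the folder listing came back empty rather than that the counting logic is wrong.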
Notes: Early steps - Mounting Drive and importing libraries
# Mounting Drive
from google.colab import drive
drive_mounting = drive.mount('/content/drive')
# Importing libraries
import numpy as np
import os
import glob
import json
import shutil