I have trained and saved a Doc2Vec model (gensim) in Colab as follows:
model = gensim.models.Doc2Vec(vector_size=size_of_vector, window=10, min_count=5, workers=16,alpha=0.025, min_alpha=0.025, epochs=40)
model.build_vocab(allXs)
model.train(allXs, epochs=model.epochs, total_examples=model.corpus_count)
The model is saved in a folder that is not accessible from my Drive, but I can list its contents with:
from os import listdir
from os.path import isfile, getsize
from operator import itemgetter
files = [(f, getsize(f)) for f in listdir('.') if isfile(f)]
files.sort(key=itemgetter(1), reverse=True)
for f, size in files:
    print('{} {}'.format(size, f))
print('({} files {} total size)'.format(len(files), sum(f[1] for f in files)))
The output is:
79434928 Model_after_train.docvecs.vectors_docs.npy
9155086 Model_after_train
1024 .rnd
(3 files 88591038 total size)
To move the two files into the same shared directory as the notebook, I use:
folder_id = FolderID
for f, size in files:
    if 'our_first_lda' in f:
        file = drive.CreateFile({'parents': [{u'id': folder_id}]})
        file.SetContentFile(f)
        file.Upload()
I am now facing two problems:
1) gensim creates two files when saving the model. Which one should I load?
2) When I try to load either file with:
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# 1. Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
from googleapiclient.discovery import build
drive_service = build('drive', 'v3')
file_id = FileID
import io
from googleapiclient.http import MediaIoBaseDownload
request = drive_service.files().get_media(fileId=file_id)
downloaded = io.BytesIO()
downloader = MediaIoBaseDownload(downloaded, request)
done = False
while done is False:
    _, done = downloader.next_chunk()
model = doc2vec.Doc2Vec.load(downloaded.read())
I am not able to load the model and get the following error:
TypeError: file() argument 1 must be encoded string without null bytes, not str
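One possible reading of the error is that `Doc2Vec.load` expects a file name rather than the file's raw bytes, so the downloaded buffer would first have to be written to local disk. The snippet below is only a minimal sketch of that idea: the helper `save_buffer_to_path` and the placeholder bytes are hypothetical and stand in for the real `downloaded` buffer from the code above. It also assumes that the `.npy` file would need to be downloaded the same way and placed next to the main file under its original name, since gensim stores large arrays in such companion files.

```python
import io
import os
import tempfile


def save_buffer_to_path(buffer, path):
    """Write the full contents of an in-memory buffer to a file on disk.

    Hypothetical helper; getvalue() returns all bytes regardless of the
    buffer's current read/write position.
    """
    with open(path, "wb") as fh:
        fh.write(buffer.getvalue())
    return path


# Placeholder for the `downloaded` buffer filled by MediaIoBaseDownload.
downloaded = io.BytesIO()
downloaded.write(b"placeholder bytes \x00 standing in for the model file")

# Keep the original file name so a companion .npy file can sit beside it.
local_dir = tempfile.mkdtemp()
local_path = save_buffer_to_path(
    downloaded, os.path.join(local_dir, "Model_after_train")
)

# Assumption: with both files in local_dir, loading by path would look like
# model = gensim.models.Doc2Vec.load(local_path)
```

Loading by path is left commented out because it depends on gensim and on the real model files being present.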
Any suggestions?