
I'm having trouble building a dataset for Named Entity Recognition with the Google NLP API using input_helper_v2.py, a script provided by Google.

The problem is in the function _DownloadGcsFile, which throws this error:

gsutil_cp_cmd = ' '.join(['gsutil', 'cp', gcs_file, local_filename])
TypeError: sequence item 2: expected str instance, bytes found

I tried b' '.join(['gsutil', 'cp', gcs_file, local_filename]) instead, but it fails with a similar error.
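To illustrate why neither separator works, here is a minimal sketch that reproduces both failures. The values assigned to `gcs_file` and `local_filename` are hypothetical stand-ins, and `try_join` is a helper made up for this example; the only thing taken from the traceback is that `gcs_file` is bytes while the other items are str.

```python
# Hypothetical stand-ins for the script's variables (assumed values).
gcs_file = b"gs://example-bucket/data.txt"  # bytes, as the traceback implies
local_filename = "/tmp/data.txt"            # str


def try_join(separator, parts):
    """Join parts with separator; return the error text if joining fails."""
    try:
        return separator.join(parts)
    except TypeError as exc:
        return "TypeError: {}".format(exc)


parts = ["gsutil", "cp", gcs_file, local_filename]

# A str separator rejects the bytes item, a bytes separator rejects the str items.
str_result = try_join(" ", parts)
bytes_result = try_join(b" ", parts)

print(str_result)
print(bytes_result)
```

In Python 3, `join` requires every item to have the same type as the separator, so changing the separator only moves the error to a different item. The list has to be made uniform first.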

While searching for information, I noticed that the cause might be that the script was written for Python 2.7.

I'd appreciate any help, as I'm a complete beginner. Thank you so much.

Daniel Mejia
  • How do you call the provided script? It looks like the variable `gcs_file` is bytes in some cases, so I'm curious how you invoke it. – ujlbu4 Sep 16 '20 at 23:11
  • Thanks for replying. I'm running this script in a virtual machine from Google App Engine. Maybe I'm mistaken, but is that what you asked? If not, I'm happy to clarify. – Daniel Mejia Sep 16 '20 at 23:20
  • When you run this script in Google App Engine, what command do you use, for example `python input_helper_v2.py ....`? Could you check whether python2 is available on your virtual machine (e.g. `python2 --version`)? – ujlbu4 Sep 16 '20 at 23:31
  • I'm using `python3 input_helper_v2.py gs:// -t gs:///output`, and no, the VM has Python 3 preinstalled. I've tried to install Python 2.x without success. – Daniel Mejia Sep 16 '20 at 23:47
  • Yep, the problem is most likely that you are running it with python3 instead of python2. Take a look at the `Prerequisites` at the top of the [script file](https://cloud.google.com/natural-language/automl/docs/scripts/input_helper_v2.py?hl=de); it requires `python2`. Is there another virtual machine available with 2.x preinstalled? – ujlbu4 Sep 16 '20 at 23:52

1 Answer


The error means that gcs_file is of type bytes, so you need to convert it to a string (str). For example:

gsutil_cp_cmd = ' '.join(['gsutil', 'cp', gcs_file.decode('utf-8'), local_filename])
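If it is unclear whether a value arrives as bytes or str, a small helper can normalize it before joining. This is only a sketch: `to_text` is a hypothetical helper that is not part of input_helper_v2.py, the bucket and file names are made up, and UTF-8 is assumed as the encoding.

```python
def to_text(value, encoding="utf-8"):
    """Return value as str, decoding it first if it is bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Hypothetical values; in the script they come from the GCS listing.
gcs_file = b"gs://example-bucket/train.txt"
local_filename = "/tmp/train.txt"

# Every item is now str, so a str separator can join them.
gsutil_cp_cmd = " ".join(
    ["gsutil", "cp", to_text(gcs_file), to_text(local_filename)]
)
print(gsutil_cp_cmd)
```

Unlike calling `.decode()` directly, the helper also accepts values that are already str, which would otherwise raise an AttributeError.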
pael
  • That solved the issue, but now it displays this error: File "/usr/lib/python3.7/genericpath.py", line 50, in getsize return os.stat(filename).st_size FileNotFoundError: [Errno 2] No such file or directory: "/tmp/tmp7anr1toh/1_b'wikiner_ancora_conll.train.txt'" Do you have any idea what is causing this? Thank you in advance. – Daniel Mejia Sep 16 '20 at 23:28
  • That simply says it can't find the given file. I assume you run "gsutil_cp_cmd"? Print the command to see what it looks like, and check that the path you pass is correct. It might have something to do with quotation marks. Also, since it looks in a tmp directory, the file you generated (?) might already have been deleted. You need to check whether the file exists while the script is running. Post your code if that doesn't help. – pael Sep 16 '20 at 23:45
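The `b'...'` inside the missing path suggests a second place where a bytes value is turned into text without being decoded. The sketch below shows how that pattern can arise in Python 3. It is an assumption about the cause, not a reading of the script itself: `tmp_dir` is a made-up directory and the `"1_%s"` format is only a guess modeled on the path in the error message.

```python
import os

# The file name from the error message, as bytes (assumed to be what the script holds).
name = b"wikiner_ancora_conll.train.txt"
tmp_dir = "/tmp/example"  # hypothetical temporary directory

# Formatting bytes with %s uses its repr, so the b'...' wrapper ends up in the path.
broken = os.path.join(tmp_dir, "1_%s" % name)

# Decoding first produces the intended file name.
fixed = os.path.join(tmp_dir, "1_%s" % name.decode("utf-8"))

print(broken)
print(fixed)
```

If this is what happens, the download writes to one path and the size check looks at another, which would explain the FileNotFoundError. Decoding the name once, where it is first read, avoids patching each use separately.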