3

I have a project that uses Google Dataflow. I have been successfully using the following command (and commands like it) for months to deploy templates.

.venv/bin/python -m dataflow.registry_files.delimited_file --runner=DataflowRunner --region=us-central1 --project=myproject --staging_location=gs://mybucket-staging/staging/gr265 --template_location=gs://mybucket-code/templates/gr265 --temp_location=gs://mybucket-staging/temp/gr265 --no_use_public_ips --save_main_session --setup_file=dataflow/setup.py --projectId=myproject --datasetId=padl_staging --tableId=gr265 --configFile=gs://mybucket-code/registry/registry_files.yaml --fileType=gr265

This command continues to work on the Windows 10 and Debian machines in my team.

Since I upgraded to macOS Catalina (10.15.1, with Python 3.7.5 and apache-beam==2.16.0) I get the following error:

[libprotobuf ERROR google/protobuf/descriptor_database.cc:58] File already exists in database: 
[libprotobuf FATAL google/protobuf/descriptor.cc:1370] CHECK failed: GeneratedDatabase()->Add(encoded_file_descriptor, size): 
libc++abi.dylib: terminating with uncaught exception of type google::protobuf::FatalException: CHECK failed: GeneratedDatabase()->Add(encoded_file_descriptor, size): 
Abort trap: 6

I have done all of the following, with many reboots:

  1. Run xcode-select --install

  2. Run brew update-reset, brew update, brew upgrade, and brew reinstall python all to no effect (except after brew update-reset, brew doctor works again)

  3. Run brew uninstall protobuf and brew install protobuf

  4. Run pip3 uninstall protobuf outside of the virtual environments

  5. Deleted and re-created my virtual environments from their requirements files.

  6. Tried several bits of voodoo involving /usr/local/include that I found elsewhere on Stack Overflow; none of them helped.

I wondered if this was just my machine, and unfortunately was able to reproduce it on the other macOS Catalina laptop in my team, but not the laptop still running macOS Mojave.

Steven Ensslen
  • Just to clarify, do you get this only when trying to deploy templates, or do you also get it when trying to run Dataflow pipelines directly? – chamikara Dec 03 '19 at 21:36
  • @chamikara That's a really good question. I had not tried. However, the error is the same when I try to just run a dataflow using the `DirectRunner` or the `DataflowRunner` – Steven Ensslen Dec 03 '19 at 23:46
  • Unfortunately I don't have a macOS Catalina machine to check this. But could this be some kind of linking issue in your local setup? For example, https://github.com/tensorflow/tensorflow/issues/32015 – chamikara Dec 04 '19 at 18:48
  • I had almost the same issue. In my case, an intermediate library was linked to protobuf via `find_package`, while my own library used `add_subdirectory`, which triggered a rebuild of the protobuf library. You must ensure that everything links against the same library version. Additionally, you should link directly to the CMake target `protobuf::libprotobuf` instead of a path to the library file. That adds all the required dependent paths and definitions, such as `PROTOBUF_USE_DLLS` (related to another issue, `google::protobuf::internal::ArenaStringPtr::GetNoArena already defined`). – Andry May 13 '20 at 15:08

2 Answers

4

According to Apache Beam issue 8368, this problem is related to the pyarrow version. Try Beam with pyarrow 0.15.1, since that is the version that works on macOS 10.15, as mentioned in this link.
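
Before and after changing the pin, it may help to confirm which versions the virtual environment actually resolves. The following is only a minimal sketch: the helper name `installed_versions` is hypothetical, and the package list is an assumption based on the packages named in this thread.

```python
# Sketch: report installed versions of the packages discussed in this thread.
# The helper name and the package list are illustrative assumptions.
try:
    # Available in the standard library from Python 3.8.
    from importlib.metadata import PackageNotFoundError, version as _version

    def _lookup(name):
        try:
            return _version(name)
        except PackageNotFoundError:
            return None
except ImportError:
    # Fallback for Python 3.7, which needs setuptools' pkg_resources.
    import pkg_resources

    def _lookup(name):
        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            return None


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    return {name: _lookup(name) for name in names}


if __name__ == "__main__":
    for name, ver in installed_versions(["apache-beam", "pyarrow", "protobuf"]).items():
        print(f"{name}: {ver or 'not installed'}")
```

Run it with the interpreter from the virtual environment (for example `.venv/bin/python`) so that it reports what the pipeline itself would import.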

Please let us know how it works.

Enrique Zetina
0

I am leaving this here because it is a common problem that the maintainers have not resolved, and I could not find a convenient solution at the time.

https://github.com/protocolbuffers/protobuf/issues/1941

https://bugs.debian.org/cgi-bin/bugreport.cgi?att=1;bug=721791;filename=protobuf-2.4.1-3.1.debdiff;msg=5

The patch does not actually help: at the next step, protobuf hangs in the Run function and the call never returns.

The closest thing to a solution for me was to avoid any double linkage with protobuf at all costs.
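
In a Python setup like the one in the question, one way to narrow down which modules bring in the conflicting copies is to import candidates in a fresh interpreter and look at the exit code. This is only a sketch: the helper name `probe_imports` is hypothetical, and the module combinations are assumptions drawn from the packages mentioned in this thread. On POSIX systems a process killed by SIGABRT (the "Abort trap: 6" above) shows up as return code -6.

```python
# Sketch: import modules in a separate interpreter so a hard abort in native
# code does not take down the calling process. Names here are illustrative.
import subprocess
import sys


def probe_imports(modules, python=sys.executable):
    """Import `modules` in order in a fresh interpreter and return its exit code.

    0 means all imports succeeded, 1 usually means a Python exception such as
    ImportError, and a negative value on POSIX means the child was killed by
    that signal number (for example -6 for SIGABRT).
    """
    code = "; ".join(f"import {name}" for name in modules)
    result = subprocess.run(
        [python, "-c", code],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return result.returncode


if __name__ == "__main__":
    # Assumed candidates; adjust to whatever the project actually imports.
    for combo in (["google.protobuf"], ["pyarrow"], ["pyarrow", "apache_beam"]):
        print(combo, "->", probe_imports(combo))
```

If a single module imports cleanly but a combination aborts, that pair is a reasonable place to start looking for two protobuf copies being loaded into the same process.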

Andry