I am trying to use Apache Beam to read a CSV file and write the data to Postgres.
The pipeline fails with a conversion mismatch. Note that the pipeline works when both columns are of type int and fails when one of the columns contains strings.
Here is one of the things I tried:
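import typing
import apache_beam as beam
from apache_beam.dataframe import convert
from apache_beam.io.jdbc import WriteToJdbc
from past.builtins import unicode

# pipeline, table_name, jdbc_url, username and password are defined elsewhere.
ExampleRow = typing.NamedTuple('ExampleRow', [('id', int), ('name', unicode)])
beam_df = (pipeline | 'Read CSV' >> beam.dataframe.io.read_csv('path.csv').with_output_types(ExampleRow))
beam_df2 = (
    convert.to_pcollection(beam_df)
    | beam.Map(print)  # debugging step; its output is shown further down
    | WriteToJdbc(
        table_name=table_name,
        jdbc_url=jdbc_url,
        driver_class_name='org.postgresql.Driver',
        statement="insert into tablr values(?,?);",
        username=username,
        password=password,
    )
)
result = pipeline.run()
result.wait_until_finish()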
I also tried registering a logical type with a URN to convert the Python str type to varchar or unicode, but this does not seem to work either:
from apache_beam.typehints.schemas import LogicalType

@LogicalType.register_logical_type
class db_str(LogicalType):
    @classmethod
    def urn(cls):
        return "beam:logical_type:javasdk:v1"

    @classmethod
    def language_type(cls):
        return unicode

    def to_language_type(self, value):
        return unicode(value)

    def to_representation_type(self, value):
        return unicode(value)
Edit: this is the output of the print step:
BeamSchema_f0d95d64_95c7_43ba_8a04_ac6a0b7352d9(id=21, nom='nom21')
BeamSchema_f0d95d64_95c7_43ba_8a04_ac6a0b7352d9(id=22, nom='nom22')
BeamSchema_f0d95d64_95c7_43ba_8a04_ac6a0b7352d9(id=21, nom='nom21')
BeamSchema_f0d95d64_95c7_43ba_8a04_ac6a0b7352d9(id=22, nom='nom22')
The problem comes from the WriteToJdbc step and the 'nom' column.
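To make the mismatch concrete, here is a minimal sketch in plain Python (no Beam involved) of the conversion I would expect to need between the printed rows and `ExampleRow`. Note that the printed rows carry a field called `nom`, while `ExampleRow` declares `name`. `GeneratedRow` is a hypothetical stand-in for the generated `BeamSchema_...` type, and `to_example_row` is a name made up for illustration only.

```python
import typing
from collections import namedtuple


class ExampleRow(typing.NamedTuple):
    """Target row type: an int id and a text name (plain str on Python 3)."""
    id: int
    name: str


# Hypothetical stand-in for the generated BeamSchema_... row type seen in the
# print output; only its field names (id, nom) are taken from that output.
GeneratedRow = namedtuple("GeneratedRow", ["id", "nom"])


def to_example_row(row) -> ExampleRow:
    """Copy a generated row into ExampleRow, mapping `nom` to `name`."""
    return ExampleRow(id=int(row.id), name=str(row.nom))


rows = [GeneratedRow(21, "nom21"), GeneratedRow(22, "nom22")]
converted = [to_example_row(r) for r in rows]
print(converted)
```

In the pipeline, I assume something like `beam.Map(to_example_row).with_output_types(ExampleRow)` could be placed before `WriteToJdbc`, but I have not verified that this resolves the error.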
Any idea how to make this work?