1

I'm trying to insert rows into a table after changing its schema in Cassandra with the CQLEngine python library. Before the change, the model looked like:

class MetricsByDevice(Model):
    device = columns.Text(primary_key=True, partition_key=True)
    datetime = columns.DateTime(primary_key=True, clustering_order="DESC")

    load_power = columns.Double()
    inverter_power = columns.Double()

I've changed the schema to this, adding four columns (DSO, node, park and commercializer):

class MetricsByDevice(Model):
    device = columns.Text(primary_key=True, partition_key=True)
    datetime = columns.DateTime(primary_key=True, clustering_order="DESC")

    DSO = columns.Text(index=True, default='DSO_1'),
    node = columns.Text(index=True, default='Node_1'),
    park = columns.Integer(index=True, default=6),
    commercializer = columns.Text(index=True, default='Commercializer_1'),

    load_power = columns.Double()
    inverter_power = columns.Double()

Then, I've synced the table with a script containing the line

sync_table(MetricsByDate)

I've checked the database and the four columns have been created. The existing rows has these fields with value NULL (as expected).

Then I've modified the script in charge of inserting in batch rows including the values corresponding to the new fields. It looks like:

        batch = BatchQuery()
        for idx, message in enumerate(consumer):

            data = message.value
            ts_to_insert = dateutil.parser.parse(data['timestamp'])

            filters = get_filters(message.partition_key)

            MetricsByDate.batch(batch).create(
                device=device,
                date=str(ts_to_insert.date()),
                time=str(ts_to_insert.time()),
                created_at=now,
                DSO=str(filters['DSO']),
                node=str(filters['node']),
                park=int(filters['park']),
                commercializer=str(filters['commercializer']),
                load_power=data['loadPower'],
                inverter_power=data['inverterPower'],
            )

            if idx % 100 == 0: # Insert every 100 messages

                batch.execute()

                # Reset batch
                batch = BatchQuery()

I've already checked that the values corresponding to the new fields aren't None and have the correct type. Nevertheless, it's inserting all the row correctly but the values in the new fields, that are NULL in Cassandra.

The batch insertion does not return any errors. I don't know if I'm missing something, or if I need to do an extra step to update the schema. I've been looking in the docs, but I can't find anything that helps.

Is there anything I'm doing wrong?

EDIT

After Alex Ott suggestion, I've inserted the lines one by one. Changing the code to:

for idx, message in enumerate(consumer):

            data = message.value
            ts_to_insert = dateutil.parser.parse(data['timestamp'])

            filters = get_filters(message.partition_key)

            metrics_by_date = MetricsByDate(
                device=device,
                date=str(ts_to_insert.date()),
                time=str(ts_to_insert.time()),
                created_at=now,
                DSO=str(filters['DSO']),
                node=str(filters['node']),
                park=int(filters['park']),
                commercializer=str(filters['commercializer']),
                load_power=data['loadPower'],
                inverter_power=data['inverterPower'],
            )

            metrics_by_date.save()

If before executing the line metrics_by_date.save() I add these print statements:

print(metrics_by_date.DSO)
print(metrics_by_date.park)
print(metrics_by_date.load_power)
print(metrics_by_date.device)
print(metrics_by_date.date)

The output is:

(<cassandra.cqlengine.columns.Text object at 0x7ff0b492a670>,)
(<cassandra.cqlengine.columns.Integer object at 0x7ff0b492d190>,)
256.99
SQ3-3.2.3.1-70-17444
2020-04-22

In the fields that are new I'm getting a cassandra object, but in the others I get their values. It maybe is a clue, because it continues to insert NULL in the new column.

ainsausti
  • 705
  • 1
  • 7
  • 25
  • 2
    One comment - don't use batches if you're inserting data for multiple devices - your execution will be much slower & put more load onto the nodes - batches are different in Cassandra than in RDBMS – Alex Ott Apr 22 '20 at 11:33
  • Thank you for the suggestion @AlexOtt, I'll try to change to insert the rows without batches. However, I think the null values will still be a problem. – ainsausti Apr 22 '20 at 12:59

1 Answers1

2

Finally I got It.

It was something stupid, in the model definition, for not knwon reasons, I've added commas to separate fields instead of linebreaks... So correcting the model definition to:

class MetricsByDevice(Model):
    device = columns.Text(primary_key=True, partition_key=True)
    datetime = columns.DateTime(primary_key=True, clustering_order="DESC")

    DSO = columns.Text(index=True, default='DSO_1')
    node = columns.Text(index=True, default='Node_1')
    park = columns.Integer(index=True, default=6)
    commercializer = columns.Text(index=True, default='Commercializer_1')

    load_power = columns.Double()
    inverter_power = columns.Double()

It works!!

ainsausti
  • 705
  • 1
  • 7
  • 25