
We have a scenario where we send telemetry from thousands of machines to Azure IoT Hub over MQTT. Cosmos DB would be an ideal place to store the data, since the machines send their messages in JSON format. Each message contains a lot of numeric data. The problem is that the keys in the JSON message take up a lot of storage, since they are repeated in every message.

In our JSON message, the values take 150 bytes, while the envelope and keys take 450 bytes.

If we have 1000 machines x 5 hours/day x 21 days/month x 60 min x 60 s x 600 B (one message per second), that comes to 226,800,000,000 B, roughly 227 GB/month.

Is there anything that can be done to compress repeating data, other than abbreviating our key names?

Mathias Rönnlund
  • Are you concerned about indexing space? If so, you can exclude specific properties from indexing. Or is it something else? – David Makogon Jan 27 '20 at 18:11
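
To illustrate the indexing suggestion from the comment above: excluding the telemetry values from indexing trims index storage and write RU cost, though it does not shrink the stored documents themselves. A minimal sketch with the Python azure-cosmos SDK, assuming a /deviceId partition key and that only deviceId and timestamp are ever queried (the database, container, and property names are placeholders):

```python
# Minimal sketch (azure-cosmos Python SDK, v4): create a container whose indexing
# policy only indexes the fields that are queried and excludes everything else,
# including the numeric telemetry payload. All names/paths here are illustrative.
from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient(url="https://<account>.documents.azure.com:443/", credential="<key>")
database = client.get_database_client("telemetry-db")

indexing_policy = {
    "indexingMode": "consistent",
    "includedPaths": [
        {"path": "/deviceId/?"},
        {"path": "/timestamp/?"},
    ],
    "excludedPaths": [
        {"path": "/*"},  # everything else, including the numeric values, is not indexed
    ],
}

container = database.create_container_if_not_exists(
    id="machine-telemetry",
    partition_key=PartitionKey(path="/deviceId"),
    indexing_policy=indexing_policy,
)
```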

1 Answer


It looks like compression is a feature request, but it is not in the product yet: https://feedback.azure.com/forums/263030-azure-cosmos-db/suggestions/19164487-compress-stored-data

How do you query this data, and how often? It would be far more cost effective to offload some or all of it to Azure Data Lake Storage. You could keep the "hot" data in Cosmos DB and export it to ADLS as the data ages.
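
For the aging/offload idea, here is a rough sketch that pulls cold items out of Cosmos DB and writes them to ADLS Gen2 as newline-delimited JSON. It uses the Python azure-cosmos and azure-storage-file-datalake packages; the account names, container/filesystem names, 30-day cutoff, and /deviceId partition key are assumptions for illustration:

```python
# Rough sketch: move items older than a cutoff from Cosmos DB to ADLS Gen2.
# All connection strings, names, and the 30-day cutoff are illustrative placeholders.
import json
import time

from azure.cosmos import CosmosClient
from azure.storage.filedatalake import DataLakeServiceClient

COSMOS_URL = "https://<account>.documents.azure.com:443/"
COSMOS_KEY = "<cosmos-key>"
ADLS_CONN_STR = "<adls-connection-string>"
CUTOFF = int(time.time()) - 30 * 24 * 3600  # _ts is epoch seconds; keep ~30 days "hot"

cosmos = CosmosClient(url=COSMOS_URL, credential=COSMOS_KEY)
container = cosmos.get_database_client("telemetry-db").get_container_client("machine-telemetry")

# _ts is the server-side last-modified timestamp Cosmos DB adds to every item.
old_items = list(container.query_items(
    query="SELECT * FROM c WHERE c._ts < @cutoff",
    parameters=[{"name": "@cutoff", "value": CUTOFF}],
    enable_cross_partition_query=True,
))

# Write the cold items to ADLS Gen2 as one newline-delimited JSON file.
adls = DataLakeServiceClient.from_connection_string(ADLS_CONN_STR)
fs = adls.get_file_system_client("telemetry-archive")
file_client = fs.get_file_client(f"cold/{CUTOFF}.jsonl")
file_client.upload_data("\n".join(json.dumps(i) for i in old_items), overwrite=True)

# Once safely archived, remove them from Cosmos DB so they stop consuming storage there.
for item in old_items:
    container.delete_item(item, partition_key=item["deviceId"])
```

In practice, a change-feed-based pipeline or Azure Data Factory would handle this more robustly than a one-off query like the above.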

Nate