
When I try to insert data from Cosmos DB into Azure DWH, the load works fine for most of the databases, but for some of them I get strange issues.

We later found out that this is due to the size of the Cosmos DB documents.

For example, one of our Cosmos DBs is 75 GB in size.

If we try to insert all the data in the initial load, we get a Null Pointer error. But if we limit the rows, say to the first 3000, and then increment the record count by 3000 on each iteration, the insert succeeds, although it takes a significant amount of time.
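For illustration only, here is a minimal sketch of the kind of batched read we are approximating in the Synapse pipeline, written against the Cosmos Python SDK with OFFSET/LIMIT paging. The endpoint, key, database, container names, and batch size below are placeholders, not our actual setup.

```python
# Minimal sketch of the batched read pattern described above (placeholder names/values).
from azure.cosmos import CosmosClient

ENDPOINT = "https://<account>.documents.azure.com:443/"  # placeholder
KEY = "<account-key>"                                    # placeholder
BATCH_SIZE = 3000                                        # per-DB value we currently tune by hand

client = CosmosClient(ENDPOINT, credential=KEY)
container = client.get_database_client("<db>").get_container_client("<container>")

offset = 0
while True:
    # Read one batch; OFFSET/LIMIT paging mirrors the "increment by 3000" approach.
    batch = list(container.query_items(
        query="SELECT * FROM c OFFSET @offset LIMIT @limit",
        parameters=[
            {"name": "@offset", "value": offset},
            {"name": "@limit", "value": BATCH_SIZE},
        ],
        enable_cross_partition_query=True,
    ))
    if not batch:
        break
    # ...hand this batch to the copy/insert step into the DWH here...
    offset += BATCH_SIZE
```

Note that large OFFSET values get progressively more expensive in Cosmos DB (continuation-token paging is usually cheaper); the sketch just mirrors the incrementing approach we are using today.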

Also, this is our ACC (acceptance) data; we are not sure how our PRD (production) data will behave. For some of the DBs we need to set the batch size to 50,000 rows per load, and for others we have set it to 3000 (as in the example above).

So is loading the data iteratively the only solution, or is there another way?

Also, how can we determine the increment (batch size) to load in each iteration for new DBs that will be added?

P.S. I also tried increasing the DWUs and IR cores to the maximum, but no luck.

  • When you say Azure DWH are you referring to Azure Synapse? If so, did you know you can use Synapse Link and have the data loaded into Synapse automatically? https://learn.microsoft.com/azure/cosmos-db/synapse-link – Mark Brown Nov 28 '22 at 18:41
  • Curious why you tagged [tag:azure-data-factory]. Also, it's unclear what your specific issue is. Are you being throttled (`429` errors)? Also, Cosmos DB doesn't have DWUs. Just RU/second. – David Makogon Nov 28 '22 at 21:31
  • Hi Mark, I know; we first used Synapse Link, but it has limitations and does not cater to our needs. We also used the Cosmos change feed, but that did not work either, due to some issues we have already discussed with the Microsoft product team. This issue is something stranger still. If possible, could we have a short Teams call so that I can explain in detail rather than typing? – Shailendra Kad Nov 29 '22 at 21:09
  • Hi @DavidMakogon, I tagged [tag:azure-data-factory] because I am using Synapse pipelines and dataflows, which are essentially the same as Data Factory pipelines. So if someone from that community has already faced this issue, it could help me. I know DWUs are for the DWH and not for Cosmos; I set the DWUs on the Synapse SQL pool. – Shailendra Kad Dec 08 '22 at 08:30
  • Can you be more specific on the limitations you experienced and why it does not meet your needs? – Mark Brown Dec 08 '22 at 23:57
  • @MarkBrown, first of all, there are limitations on the number of columns when Synapse Link is enabled, and our Cosmos DB has more than 1020 columns. These columns are also nested, e.g. an array inside a complex data type, or a complex data type inside another complex data type, more than 6 levels deep, so some data could not be replicated. Another issue is that querying a Synapse Link-enabled DB was very slow for me. What we need to achieve is a proper DWH built from our operational DB, which is Cosmos. So even if I enable Synapse Link, would I still need to run pipelines against the OLAP model? – Shailendra Kad Dec 10 '22 at 05:45

0 Answers