
I am trying to use PolyBase in Azure SQL Data Warehouse (SQLDW) to ingest data stored on Azure Data Lake Store (ADLS) Gen1. The data is in Parquet format and was written by a Hadoop cluster in a VNet. The process works, but the throughput is poor: approximately 10 MBps. My assumption is that the traffic is going over the Internet rather than the Azure backbone network. To address this, I enabled VNet service endpoints as follows:
  • VNet to ADLS (as per this link)
  • VNet to Azure SQL Data Warehouse (as per this link)

However, this produced no performance gain. My understanding is that with service endpoints enabled, the traffic should go through the Azure backbone network, yet the throughput is unchanged. Am I missing anything in this workflow?

rh979
  • Just to add to that: I've tried this with both low (500) and high (2000) DWU settings, and the throughput is still below expectations. – rh979 Nov 22 '18 at 06:27
  • Irfan, is this for the Australian customer with whom I think you've been engaged for the last year? I've asked the customer's MS team to engage, as customer-specific network issues may be at play. – Ron Dunn Nov 22 '18 at 22:50
  • The issue got sorted out. It turned out that the Parquet data was compressed with Snappy, although Impala showed it as uncompressed. After factoring that in, we got around 130+ MBps, which is noticeably better. – rh979 Dec 17 '18 at 00:56
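
One reading of the last comment is that the measured rate depends on which byte count goes into the calculation: the compressed on-disk size of the Parquet files, or the uncompressed size of the data they hold. Below is a minimal sketch of that arithmetic. The function name and all figures are hypothetical and chosen only for illustration; none of them are measurements from this thread.

```python
def throughput_mb_per_s(size_bytes: int, elapsed_seconds: float) -> float:
    """Return throughput in MB/s (1 MB = 10**6 bytes) for a timed load."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return size_bytes / 1_000_000 / elapsed_seconds


# Hypothetical figures for illustration only; not measurements from this thread.
compressed_bytes = 10 * 10**9      # on-disk size of Snappy-compressed Parquet
uncompressed_bytes = 40 * 10**9    # size of the same data once decompressed
elapsed = 400.0                    # seconds taken by the load

# The same load yields very different rates depending on the size used.
on_disk_rate = throughput_mb_per_s(compressed_bytes, elapsed)
logical_rate = throughput_mb_per_s(uncompressed_bytes, elapsed)
print(f"on-disk: {on_disk_rate:.1f} MB/s, logical: {logical_rate:.1f} MB/s")
```

When comparing load rates, it helps to state which of the two sizes was used, since a tool may report a file as uncompressed when it is not.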

0 Answers