Questions tagged [azure-data-lake]

Azure Data Lake is a suite of three big data services in Microsoft Azure: HDInsight, Data Lake Store, and Data Lake Analytics. These fully managed services make it easy to get started and easy to scale big data jobs written in U-SQL, Apache Hive, Pig, Spark, and Storm.

  • HDInsight is a fully managed, monitored and supported Apache Hadoop service, bringing the power of Hadoop clusters to you with a few clicks.
  • Data Lake Store is a cloud-scale service designed to store all data for analytics. The Data Lake Store allows for petabyte-sized files and unlimited account sizes, surfaced through an HDFS API that enables any Hadoop component to access the data. Additionally, data in Data Lake Store is protected via ACLs that can be tied to an OAuth2-based identity, including identities from your on-premises Active Directory.
  • Data Lake Analytics is a distributed service built on Apache YARN that dynamically scales on demand while you only pay for the job that is running. Data Lake Analytics also includes U-SQL, a language designed for big data, keeping the familiar declarative syntax of SQL, easily extended with user code authored in C#.
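As an illustrative sketch of that mix of declarative SQL syntax and C# expressions in U-SQL (the paths and schema below are hypothetical):

```sql
// Extract a CSV from the Data Lake Store, filter with C# expressions,
// and write the result back out.
@searchlog =
    EXTRACT UserId int,
            Query  string
    FROM "/input/searchlog.csv"
    USING Extractors.Csv();

@filtered =
    SELECT UserId,
           Query.ToUpper() AS Query   // C# string method inside a SQL-like query
    FROM @searchlog
    WHERE Query.Contains("azure");

OUTPUT @filtered
TO "/output/filtered.csv"
USING Outputters.Csv();
```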

To learn more, check out: https://azure.microsoft.com/en-us/solutions/data-lake/

1870 questions
5
votes
5 answers

Read from ADLS gen 2 with SSIS

Does anyone know which connection and Data Flow Component to use for ADLS (Azure Data Lake Store) Gen2? I've managed to use the blob connector in the connection manager and successfully connect to ADLS Gen2, but when I try to use the blob source…
JanKo
  • 77
  • 1
  • 5
5
votes
0 answers

Azure DataLake with DVC

We are thinking of using DVC to version input data for a data science project. My data resides in Azure Data Lake Gen1. How do I configure DVC to push data to Azure Data Lake using a Service Principal? I want DVC to store cache and data into Azure…
Radhi
  • 6,289
  • 15
  • 47
  • 68
5
votes
2 answers

How to run PowerShell from Azure Data Factory

I have a PowerShell script that splits a complex CSV file into a smaller CSV file for every 1000 records. Here is the code: $i=0;Get-Content C:\Users\dell\Desktop\Powershell\Input\bigsizeFile.csv -ReadCount 1000 | %{$i++; $_ | Out-File…
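A local Python sketch of the same split-every-1000-records logic (the file names are hypothetical; this is not the asker's script):

```python
import csv
import os

def split_csv(src_path, out_dir, chunk_size=1000):
    """Split a CSV into files of at most `chunk_size` data rows,
    repeating the header row at the top of every chunk."""
    os.makedirs(out_dir, exist_ok=True)
    out_paths = []
    with open(src_path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader)
        chunk, index = [], 0
        for row in reader:
            chunk.append(row)
            if len(chunk) == chunk_size:
                index += 1
                out_paths.append(_write_chunk(out_dir, index, header, chunk))
                chunk = []
        if chunk:  # final partial chunk
            index += 1
            out_paths.append(_write_chunk(out_dir, index, header, chunk))
    return out_paths

def _write_chunk(out_dir, index, header, rows):
    path = os.path.join(out_dir, f"chunk_{index}.csv")
    with open(path, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(header)
        writer.writerows(rows)
    return path
```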
5
votes
1 answer

Convert CSV file into XML on Azure Data Lake Store using Azure PowerShell runbook

I want to convert a CSV file into XML on Azure Data Lake Store using Azure PowerShell. I was using this code in a runbook in Azure Automation; it ran without errors, but no XML is being generated. $cred = Get-AutomationPSCredential…
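Setting the runbook question aside, the conversion step itself can be sketched with Python's standard library (the element names here are hypothetical):

```python
import csv
import io
import xml.etree.ElementTree as ET

def csv_to_xml(csv_text, root_tag="rows", row_tag="row"):
    """Convert CSV text (first line = header) into an XML string:
    one child element per row, one sub-element per column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    root = ET.Element(root_tag)
    for record in reader:
        row = ET.SubElement(root, row_tag)
        for column, value in record.items():
            ET.SubElement(row, column).text = value
    return ET.tostring(root, encoding="unicode")
```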
5
votes
0 answers

Error: Could not find ADLA Account in any resource group - DataLakeStoreGen1

I am trying to check whether the Data Lake Analytics account state is active or not with the PowerShell script below, using Service Principal authentication. The application has been given access to the Data Lake Analytics account, and it is present in one of…
ravi kiran
  • 371
  • 1
  • 5
  • 17
5
votes
3 answers

Folder Statistics in Azure Data Lake

I'm trying to summarize how much data has been written to a folder in my Data Lake. What is the best way to do this? Should I use a U-SQL job? HDInsight?
BadRaabutation
  • 113
  • 1
  • 3
  • 10
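The recursive-summary logic behind the question above can be sketched locally in Python; against ADLS itself you would enumerate files via the Data Lake SDK or portal tooling rather than `os.walk` (this is a local analogue, not an ADLS call):

```python
import os

def folder_stats(path):
    """Return total bytes and file count under `path`, recursively."""
    total_bytes = 0
    file_count = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            total_bytes += os.path.getsize(os.path.join(dirpath, name))
            file_count += 1
    return {"files": file_count, "bytes": total_bytes}
```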
5
votes
3 answers

What is the point of a table in a data lake?

I thought the whole point of using a Data Lake versus a Data Warehouse was to invert the ETL (Extract, Transform, Load) process to LET (Load, Extract, Transform). Doesn't extracting this data, transforming and loading it into a table get us right…
Chris B. Behrens
  • 6,255
  • 8
  • 45
  • 71
5
votes
1 answer

Export Azure application Insight log files to Azure Data Lake storage

I am a beginner with the Azure portal. I configured Azure Application Insights on the front-end side (Angular 2) and the back-end side (ASP.NET Core). I can track my application's log through Application Insights, and export the xls sheet also…
5
votes
3 answers

How to connect Azure Data lake storage to Azure ML?

Hi, I have started learning Azure Data Lake and Azure Machine Learning, and I need to use Azure Data Lake storage as input data for Azure Machine Learning Studio. Are there any options for this? I have gone through the Azure Data Lake and Machine…
5
votes
1 answer

Most efficient way to access binary files on ADLS from worker node in PySpark?

I have deployed an Azure HDInsight cluster with rwx permissions for all directories on the Azure Data Lake Store that also serves as its storage account. On the head node, I can load e.g. image data from the ADLS with a command like: my_rdd =…
mewahl
  • 795
  • 6
  • 20
5
votes
1 answer

U-SQL - Extract data from json-array

Already tried the suggested JSONPath option, but it seems the JSONExtractor only recognizes the root level. In my case I have to deal with a nested JSON structure, with an array as well (see example below). Any options for extracting this without…
Sander
  • 51
  • 1
  • 2
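As a language-neutral illustration of the flattening the asker wants from the JSONExtractor, here is the same extraction sketched in Python (the document shape is a hypothetical stand-in for the question's example):

```python
import json

def extract_array(document, array_key):
    """Yield one flat record per element of a nested array,
    combining root-level scalar fields with each array element."""
    root = {k: v for k, v in document.items()
            if not isinstance(v, (list, dict))}
    for element in document.get(array_key, []):
        record = dict(root)
        record.update(element)
        yield record

doc = json.loads("""
{
  "id": 1,
  "name": "order-42",
  "items": [
    {"sku": "A", "qty": 2},
    {"sku": "B", "qty": 5}
  ]
}
""")
rows = list(extract_array(doc, "items"))
```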
5
votes
1 answer

What is the maximum allowed size for String in U-SQL?

While processing a CSV file, I am getting an error about maximum string size: "String size exceeds the maximum allowed size".
Ahsan Abbas
  • 155
  • 7
5
votes
2 answers

Azure Data Lake Store Benchmarks

To developers: I am benchmarking Azure Data Lake, and I am seeing about ~7.5 MB/s for a read from an ADL Store and a write to a VHD, all in the same region. This is the case for both PowerShell and C#, with the code taken from the following…
4
votes
1 answer

Azure Data Lake Gen2 - How do I move files from folder to another folder using C#

I have provisioned Data Lake Gen2, and in C# I am looking for a way to move a file from one folder to another. With blob storage it's simple, but with Data Lake I am confused about which SDK to use and how it can be done in…
Raju Rh
  • 127
  • 1
  • 9
4
votes
1 answer

Use Azure Python Function and Managed Identity to Download from Storage Account

I've created an Azure Function called "transformerfunction", written in Python, which should upload data to and download data from an Azure Data Lake / Storage account. I've also turned on system-assigned managed identity and gave the function the role permissions…