Questions tagged [azure-data-lake]

Azure Data Lake is a suite of three big data services in Microsoft Azure: HDInsight, Data Lake Store, and Data Lake Analytics. These fully managed services make it easy to get started and easy to scale big data jobs written in U-SQL, Apache Hive, Pig, Spark, and Storm.

  • HDInsight is a fully managed, monitored and supported Apache Hadoop service, bringing the power of Hadoop clusters to you with a few clicks.
  • Data Lake Store is a cloud-scale service designed to store all data for analytics. The Data Lake Store allows for petabyte-sized files and unlimited account sizes, surfaced through an HDFS API enabling any Hadoop component to access data. Additionally, data in Data Lake Store is protected via ACLs that can be tied to an OAuth2-based identity, including identities from your on-premises Active Directory.
  • Data Lake Analytics is a distributed service built on Apache YARN that dynamically scales on demand while you pay only for the job that is running. Data Lake Analytics also includes U-SQL, a language designed for big data that keeps the familiar declarative syntax of SQL and is easily extended with user code authored in C# (see the sketch below).
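
To illustrate that last point, here is a minimal U-SQL sketch; the file paths and schema are hypothetical, not taken from any particular sample.

    // Declarative SQL shape with inline C# expressions.
    @searchlog =
        EXTRACT UserId int,
                Query  string
        FROM "/input/searchlog.tsv"
        USING Extractors.Tsv();

    @cleaned =
        SELECT UserId,
               Query.ToUpperInvariant() AS NormalizedQuery  // C# method call
        FROM @searchlog
        WHERE !string.IsNullOrEmpty(Query);                 // C# predicate

    OUTPUT @cleaned
        TO "/output/cleaned.tsv"
        USING Outputters.Tsv();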

To learn more, check out: https://azure.microsoft.com/en-us/solutions/data-lake/

1870 questions
3 votes, 2 answers

Error writing a file to Azure Data Lake from an Azure function using MSI

I am trying to create an Azure function that writes to Azure Data Lake Store. I am using Managed Service Identity to, well, manage the authentication stuff. I have enabled MSI on the Function app. I have also enabled the Function app to access the…
MV23 • 285 • 5 • 17
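
For context, the usual MSI-to-ADLS pattern looks roughly like the C# sketch below, assuming the Microsoft.Azure.Services.AppAuthentication and Microsoft.Azure.DataLake.Store packages; the account name and path are placeholders, not the asker's setup.

    // Sketch: acquire a token through the Function app's Managed Service
    // Identity, then write a small file to ADLS Gen1. No secrets in code.
    using System.Text;
    using System.Threading.Tasks;
    using Microsoft.Azure.DataLake.Store;
    using Microsoft.Azure.Services.AppAuthentication;
    using Microsoft.Rest;

    public static class AdlsMsiWriter
    {
        public static async Task WriteAsync(string path, string content)
        {
            var tokenProvider = new AzureServiceTokenProvider();
            string token = await tokenProvider.GetAccessTokenAsync("https://datalake.azure.net/");

            // "myaccount" is a placeholder ADLS Gen1 account.
            AdlsClient client = AdlsClient.CreateClient(
                "myaccount.azuredatalakestore.net",
                new TokenCredentials(token));

            using (var stream = client.CreateFile(path, IfExists.Overwrite))
            {
                byte[] bytes = Encoding.UTF8.GetBytes(content);
                stream.Write(bytes, 0, bytes.Length);
            }
        }
    }
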
3 votes, 3 answers

Loading millions of small files from Azure Data Lake Store to Databricks

I've got a partitioned folder structure in Azure Data Lake Store containing roughly 6 million JSON files (from a couple of KB to 2 MB in size). I'm trying to extract some fields from these files using Python code in Databricks. Currently I'm trying the…
Simon Zeinstra • 795 • 8 • 19
3 votes, 1 answer

Column name also appears as a row when querying the external table for a specific column

I have a file in Azure Data Lake Store. I am using PolyBase to move data from the Data Lake Store to the data warehouse. I followed all the steps mentioned here. Let's say I have created an external table External_Emp which has 3 columns: ID,…
3 votes, 1 answer

Access Azure Data Lake Analytics Tables from SQL Server Polybase

I need to export a multi-terabyte dataset processed via Azure Data Lake Analytics (ADLA) onto a SQL Server database. Based on my research so far, I know that I can write ADLA output to a Data Lake Store or WASB using built-in…
3 votes, 2 answers

Is there any need for a Data Warehouse when using Azure Data Lake?

I am exploring Azure Data Lake and I am new to this field. I have explored many things and read many articles. Basically, I have to develop a Power BI dashboard from data from different sources. In a classic SQL Server stack I can write an ETL (Extract,…
Waqas Idrees • 1,443 • 2 • 17 • 36
3 votes, 3 answers

Azure Data Lake Loop

Do Azure Data Lake Analytics and U-SQL support While or For loops to create multiple outputs? I want to output to multiple files from one U-SQL execution. This is what I want: Foreach @day in @days @dataToSave = SELECT day AS…
Jorge Ribeiro • 1,128 • 7 • 17
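
For context: U-SQL is declarative and has no While/For construct. A common workaround is one filtered OUTPUT per target file, generating the script with an external program when the day list is dynamic. A minimal sketch with hypothetical paths and dates:

    // No loops in U-SQL, so emit one OUTPUT statement per day instead.
    @data =
        EXTRACT day string, value string
        FROM "/input/data.csv"
        USING Extractors.Csv();

    @day1 = SELECT * FROM @data WHERE day == "2018-01-01";
    OUTPUT @day1 TO "/output/2018-01-01.csv" USING Outputters.Csv();

    @day2 = SELECT * FROM @data WHERE day == "2018-01-02";
    OUTPUT @day2 TO "/output/2018-01-02.csv" USING Outputters.Csv();
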
3 votes, 2 answers

U-SQL: filtering out empty/null strings (Microsoft Academic Graph)

I am new to U-SQL and Azure Data Lake Analytics. I want to do what I think is a very simple operation but ran into trouble. Basically, I want to create a query which ignores empty strings. Using it in SELECT works, but not in the WHERE statement. Below…
user1043144 • 2,680 • 5 • 29 • 45
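
For context, the WHERE clause in U-SQL takes C# boolean expressions, so the usual fix is a C# predicate as sketched below (the MAG-style columns and path are hypothetical):

    // Filter out null or empty strings with a C# predicate in WHERE.
    @papers =
        EXTRACT Title string, Year int
        FROM "/mag/Papers.txt"
        USING Extractors.Tsv();

    @filtered =
        SELECT Title, Year
        FROM @papers
        WHERE !string.IsNullOrEmpty(Title);

    OUTPUT @filtered TO "/output/papers.tsv" USING Outputters.Tsv();
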
3 votes, 1 answer

Azure Data Factory Pipeline + ML

I am trying to build a pipeline in Azure Data Factory V1 which will run an Azure ML Batch Execution on a file. I implemented it using blob storage as input and output and it worked. However, I am now trying to change the input and output to a folder in…
Ziad Halabi • 964 • 11 • 31
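
For context, an ADF V1 dataset that points an activity at a Data Lake Store folder has roughly this shape; every name and path below is a placeholder, not the asker's configuration:

    {
      "name": "AdlsOutputDataset",
      "properties": {
        "type": "AzureDataLakeStore",
        "linkedServiceName": "AzureDataLakeStoreLinkedService",
        "typeProperties": {
          "folderPath": "ml/output",
          "format": { "type": "TextFormat" }
        },
        "availability": { "frequency": "Day", "interval": 1 }
      }
    }
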
3 votes, 2 answers

Azure Data Lake - HDInsight vs Data Warehouse

I'm in a position where we're reading from our Azure Data Lake using external tables in Azure Data Warehouse. This enables us to read from the data lake using well-known SQL. However, another option is using Data Lake Analytics, or some variation…
MMartin • 172 • 2 • 10
3 votes, 4 answers

Azure Data Lake Store - AccessControlException as Owner

Under the Access blade in the portal it shows that I am the Owner, but it also says, under "Your Permissions", that me@domain.onmicrosoft.com's effective permissions on this folder are: None. In AAD I can see that me@domain.onmicrosoft.com is associated with…
AMZ • 540 • 4 • 15
3 votes, 2 answers

How can I log something in a U-SQL UDO?

I have a custom extractor, and I'm trying to log some messages from it. I've tried obvious things like Console.WriteLine, but cannot find where the output is. However, I found some system logs in…
arghtype • 4,376 • 11 • 45 • 60
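
One workaround often suggested for this situation is to surface diagnostics as ordinary row data instead of console output; a hedged C# sketch, where the "data" and "log" output columns are hypothetical:

    // A custom extractor that emits its own diagnostics into a "log" column.
    using System.Collections.Generic;
    using System.IO;
    using Microsoft.Analytics.Interfaces;

    public class LoggingExtractor : IExtractor
    {
        public override IEnumerable<IRow> Extract(IUnstructuredReader input, IUpdatableRow output)
        {
            int lineNo = 0;
            using (var reader = new StreamReader(input.BaseStream))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    lineNo++;
                    output.Set<string>("data", line);
                    output.Set<string>("log", $"line {lineNo}: {line.Length} chars");
                    yield return output.AsReadOnly();
                }
            }
        }
    }
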
3 votes, 1 answer

Upload to ADLS from file stream

I am making a custom activity in ADF, which involves reading multiple files from Azure Storage Blobs, doing some work on them, and then finally writing a resulting file to the Azure Data Lake Store. The last step is where I'm stuck, because as far as I…
Anders • 894 • 2 • 10 • 25
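
For context, with the Microsoft.Azure.DataLake.Store SDK, CreateFile hands back a writable stream, so a source stream can be copied straight in. A hedged sketch; creating the AdlsClient is assumed to happen elsewhere (e.g. as in the MSI sketch further up):

    using System.IO;
    using Microsoft.Azure.DataLake.Store;

    public static class AdlsUpload
    {
        // Copy any readable Stream into a new ADLS file, overwriting if present.
        public static void Upload(AdlsClient client, Stream source, string adlsPath)
        {
            using (Stream target = client.CreateFile(adlsPath, IfExists.Overwrite))
            {
                source.CopyTo(target);
            }
        }
    }
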
3 votes, 3 answers

U-SQL parallel reading from SQL table

I have a scenario in which I am ingesting data from an MS SQL DB into Azure Data Lake using U-SQL. My table is quite big, with over 16 million records (soon it will be many more). I just do a SELECT a, b, c FROM dbo.myTable; I realized, however,…
candidson • 516 • 3 • 18
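
For context, U-SQL reads external SQL sources through a registered data source, and one commonly suggested way to speed this up is to issue several range-restricted remote queries and union them. A hedged sketch; the data source name, columns, and split key are hypothetical, and the actual degree of parallelism depends on the job:

    // Assumes a data source registered earlier with CREATE DATA SOURCE.
    @part1 =
        SELECT a, b, c
        FROM EXTERNAL MySqlDbSource
        EXECUTE @"SELECT a, b, c FROM dbo.myTable WHERE id < 8000000";

    @part2 =
        SELECT a, b, c
        FROM EXTERNAL MySqlDbSource
        EXECUTE @"SELECT a, b, c FROM dbo.myTable WHERE id >= 8000000";

    @all = SELECT * FROM @part1 UNION ALL SELECT * FROM @part2;

    OUTPUT @all TO "/output/myTable.csv" USING Outputters.Csv();
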
3 votes, 1 answer

Value too long failure when attempting to convert column data

Scenario: I have a source file that contains blocks of JSON on each new line. I then have a simple U-SQL extract as follows, where [RawString] represents each new line in the file and [FileName] is defined as a variable from the @SourceFile…
Paul Andrew • 3,233 • 2 • 17 • 37
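
For context, this error is usually U-SQL's per-value string size limit (documented at 128 KB). One commonly suggested workaround is to extract oversized lines as byte[], which allows larger values, and decode only what is needed in C#. A hedged sketch with placeholder paths; if the built-in extractor will not emit byte[] for your data, a small custom extractor is needed:

    // Read each line as raw bytes rather than a (size-limited) string.
    @raw =
        EXTRACT RawBytes byte[]
        FROM "/input/source.json"
        USING Extractors.Tsv();   // any delimiter absent from the data works

    // Work on the bytes in C#; here, just measure them.
    @sizes =
        SELECT RawBytes.Length AS ByteLength
        FROM @raw;

    OUTPUT @sizes TO "/output/sizes.csv" USING Outputters.Csv();
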
3 votes, 1 answer

Cannot access Azure Key Vault from Azure Data Lake Analytics

I have a U-SQL script with a custom extractor, which accesses Azure Key Vault to get some credentials. I followed this tutorial, and I have equivalent code to get a token from AD and then call the provided URI for the actual credentials: public static async…
arghtype • 4,376 • 11 • 45 • 60
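
For context, the tutorial-style pattern the excerpt refers to looks roughly like the ADAL sketch below; the tenant, client, and secret URI are placeholders. Worth noting: U-SQL user code runs on vertices that generally cannot make outbound network calls, which is a frequent reason such code fails inside ADLA even when it works locally.

    // Client-credential token for Key Vault, then a plain REST GET.
    using System.Net.Http;
    using System.Net.Http.Headers;
    using System.Threading.Tasks;
    using Microsoft.IdentityModel.Clients.ActiveDirectory;

    public static class KeyVaultFetch
    {
        public static async Task<string> GetSecretAsync()
        {
            var authContext = new AuthenticationContext("https://login.microsoftonline.com/<tenant-id>");
            var credential = new ClientCredential("<client-id>", "<client-secret>");
            AuthenticationResult result =
                await authContext.AcquireTokenAsync("https://vault.azure.net", credential);

            using (var http = new HttpClient())
            {
                http.DefaultRequestHeaders.Authorization =
                    new AuthenticationHeaderValue("Bearer", result.AccessToken);
                return await http.GetStringAsync(
                    "https://myvault.vault.azure.net/secrets/mysecret?api-version=2016-10-01");
            }
        }
    }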