Questions tagged [azure-data-lake]

Azure Data Lake is a suite of three big data services in Microsoft Azure: HDInsight, Data Lake Store, and Data Lake Analytics. These fully managed services make it easy to get started with and scale big data jobs written in U-SQL, Apache Hive, Pig, Spark, and Storm.

  • HDInsight is a fully managed, monitored and supported Apache Hadoop service, bringing the power of Hadoop clusters to you with a few clicks.
  • Data Lake Store is a cloud-scale service designed to store all data for analytics. Data Lake Store allows petabyte-sized files and unlimited account sizes, surfaced through an HDFS API that enables any Hadoop component to access the data. Additionally, data in Data Lake Store is protected via ACLs that can be tied to an OAuth2-based identity, including identities from your on-premises Active Directory.
  • Data Lake Analytics is a distributed service built on Apache YARN that dynamically scales on demand while you only pay for the job that is running. Data Lake Analytics also includes U-SQL, a language designed for big data, keeping the familiar declarative syntax of SQL, easily extended with user code authored in C#.

To learn more, check out: https://azure.microsoft.com/en-us/solutions/data-lake/

1870 questions
4
votes
1 answer

Unable to install the party package (R) in ADLA

I am trying to install the 'party' package in ADLA. We have…
Arpit Sisodia
  • 570
  • 5
  • 18
4
votes
3 answers

Is it possible to mount Azure Data Lake Store or Azure Blob Storage as a drive on a Windows or Linux VM?

My task is to migrate our data store which is currently located on a network drive to Azure Data Lake Store or Blob Storage, as well as to migrate the ingestion and postprocessing software. If I can mount Azure Data Lake Store or Blob Storage as a…
4
votes
3 answers

Processing Event Hub Capture AVRO files with Azure Data Lake Analytics

I'm attempting to extract data from AVRO files produced by Event Hub Capture. In most cases this works flawlessly. But certain files are causing me problems. When I run the following U-SQL job, I get the error: USE DATABASE Metrics; USE SCHEMA…
Marc Jellinek
  • 538
  • 5
  • 19
4
votes
2 answers

USQL Nesting TVFs and Queries is giving horrendous results

I 'think' that this problem is related to the query optimization that Azure Data Lake Analytics does; but let's see... I have 2 separate queries (TVFs) doing aggregations, and then a final query to join the 2 together for final results. So…
SimonB
  • 962
  • 1
  • 14
  • 36
4
votes
1 answer

USQL Job failing due to exceeding the path length limit

I am running my jobs locally using the Local SDK. However, I get the following error message: Error : 'System.IO.PathTooLongException: The specified path, file name, or both are too long. The fully qualified file name must be less than 260…
Moiz Sajid
  • 644
  • 1
  • 10
  • 20
4
votes
3 answers

CSV to AVRO conversion in Azure

I am trying to convert CSV files stored in Azure Data Lake Store into Avro files with a schema I have created. Is there any example source code that does the same thing?
emkay
  • 187
  • 12
4
votes
0 answers

What is wrong when azure.datalake.store commands give a LISTERROR response?

Hi, I am trying to access my Data Lake Store from a Python program running locally on my desktop. I get this strange error while executing the line adl.ls('/'). The output is: Traceback (most recent call last): File "C:\Users\StefanFrost\adltest.py",…
Stefan Frost
  • 51
  • 2
  • 6
4
votes
2 answers

Azure Data Lake Store and Azure SQL with WebJob/Azure Function

I need to upload Web API response files into Azure Data Lake and then load those files into Azure SQL tables. Both processes must be scheduled to run on an hourly basis. Should I use Azure WebJobs or Azure Functions?
AnshuK
  • 55
  • 1
  • 2
4
votes
2 answers

Are there any books about Azure Data Lake internals?

I don't want to use ADL and ADLA as a black box. I need to understand how the gears turn under the hood in order to use them efficiently. Where can I find information that describes the internals: how a U-SQL query is processed, how parallelism is…
churupaha
  • 325
  • 2
  • 10
4
votes
1 answer

Is there any way to minimize U-SQL preparation time?

The preparation time on my U-SQL job is approximately 30 seconds. Is it possible to lower that at all? My code is as follows: USE DATABASE x; USE SCHEMA y; @results = SELECT RowKey FROM y.tableName WHERE…
Justin Borromeo
  • 1,201
  • 3
  • 13
  • 26
4
votes
2 answers

How to use subquery in USQL?

I am getting a compilation error while using the following query in U-SQL: @CourseDataExcludingUpdatedCourse = SELECT * FROM @CourseData AS cd WHERE cd.CourseID NOT IN (SELECT CourseID FROM @UpdatedCourseData); It is not allowing me to use NOT IN…
Jai
  • 416
  • 6
  • 20
4
votes
2 answers

U-SQL build error, equijoin have different types

I'm trying to create a USQL job and defined my columns from the CSVs they will be retrieved from, however I'm always having issues on the JOIN portion, because the columns I am matching are of a different type. This is weird because I have defined…
AnimaSola
  • 7,146
  • 14
  • 43
  • 62
4
votes
2 answers

Authorizing Data Lake linked services through Visual Studio Data Factory Project

I have an Azure Data Factory Visual Studio Project in which I am using Azure Data Lake linked services. When I create them, I have to authorize them initially. But the given authorization expires after a certain time period, which is in days. I…
Tayyab Anwar
  • 319
  • 1
  • 10
4
votes
3 answers

Unit testing for U-SQL applier and scripts

I have a custom U-SQL applier which extends the IApplier class. [SqlUserDefinedApplier] public class CsvApplier : IApplier { public CsvApplier() { //totalcount = count; } public override IEnumerable Apply(IRow input,…
4
votes
1 answer

Azure Data Lake Store concurrency

I've been toying with Azure Data Lake Store, and in the documentation Microsoft claims that the system is optimized for low-latency small writes to files. To test this, I tried to perform a large number of writes from parallel tasks to a single file,…
evilpilaf
  • 1,991
  • 2
  • 21
  • 38