Questions tagged [azure-data-lake]

Azure Data Lake is a suite of three big data services in Microsoft Azure: HDInsight, Data Lake Store, and Data Lake Analytics. These fully managed services make it easy to get started with, and easy to scale, big data jobs written in U-SQL, Apache Hive, Pig, Spark, and Storm.

  • HDInsight is a fully managed, monitored, and supported Apache Hadoop service, bringing the power of Hadoop clusters to you with a few clicks.
  • Data Lake Store is a cloud-scale service designed to store all data for analytics. The Data Lake Store allows for petabyte-sized files and unlimited account sizes, surfaced through an HDFS API that enables any Hadoop component to access data. Additionally, data in Data Lake Store is protected via ACLs that can be tied to an OAuth2-based identity, including identities from your on-premises Active Directory.
  • Data Lake Analytics is a distributed service built on Apache YARN that dynamically scales on demand, and you pay only for the job that is running. Data Lake Analytics also includes U-SQL, a language designed for big data that keeps the familiar declarative syntax of SQL and is easily extended with user code authored in C#.

To learn more, check out: https://azure.microsoft.com/en-us/solutions/data-lake/
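
The SQL-plus-C# model that U-SQL offers can be sketched in a few lines (a minimal illustration with hypothetical paths and column names):

```usql
// Read a tab-separated file from the Data Lake Store (hypothetical path and schema)
@searchlog =
    EXTRACT UserId int,
            Query  string
    FROM "/input/searchlog.tsv"
    USING Extractors.Tsv();

// Declarative SELECT extended with an inline C# expression
@res =
    SELECT UserId,
           Query.ToUpperInvariant() AS NormalizedQuery
    FROM @searchlog;

// Write the result back as CSV
OUTPUT @res
TO "/output/result.csv"
USING Outputters.Csv();
```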

1870 questions
3
votes
1 answer

Can't we create a managed table with a foreign key in Azure Data Lake?

I was looking into the documentation while creating tables, but so far the examples don't mention how to add a foreign key when creating a table. I also checked the documentation for the ALTER statement, but the same goes for that too. CREATE TABLE Brand ( …
Harsimranjeet Singh
  • 514
  • 2
  • 6
  • 19
3
votes
0 answers

Error while getting the OAuth token from AAD for AppPrincipalId

I am creating an HDInsight cluster backed by Data Lake Store, using a service principal, via template deployment (shell script). When I run the deployment script, I get the following error after the initial creation of the Spark HDInsight cluster. The error is: At…
sathya
  • 1,982
  • 1
  • 20
  • 37
3
votes
1 answer

U-SQL job performance

Could you help me with job performance? I ran it with 10 AUs. During the first part of the run almost all of them are used, but for the second half of the execution time it uses only 1 AU. In the plan I see one super vertex that consists of only one…
churupaha
  • 325
  • 2
  • 10
3
votes
2 answers

U-SQL Split a CSV file to multiple files based on Distinct values in file

I have data in Azure Data Lake Store, and I am processing it with an Azure Data Lake Analytics job written in U-SQL. I have several CSV files which contain spatial data, similar to this: File_20170301.csv longtitude| lattitude | date …
FeodorG
  • 178
  • 2
  • 10
3
votes
2 answers

Metadata management for (Azure) data-lake

To my understanding, a data lake is used to store everything from raw data in its original format to processed data. However, I have not been able to understand the concept of metadata management in the (Azure) data lake. What are…
3
votes
1 answer

U-SQL: direct output to SQL DB

Is there a way to output U-SQL results directly to a SQL DB such as Azure SQL DB? Couldn't find much about that. Thanks!
candidson
  • 516
  • 3
  • 18
3
votes
1 answer

Is it possible to delete a completed job from Azure Data Lake Analytics?

I have a lot of completed jobs piling up, so I would like to clean them up. The answer to Should we delete DataLake Analytic Job after completion? seems to indicate that it's possible to delete jobs, but I am unable to figure out how to do this. I…
3
votes
2 answers

Does the ROWCOUNT hint work for EXTRACT in U-SQL?

I want to allocate more vertices to the extraction job. I tried using the ROWCOUNT hint, but it doesn't seem to work: no matter what value I use for ROWCOUNT, U-SQL always allocates the same number of vertices. EXTRACT xxxx FROM @"Path" USING new…
lidong
  • 556
  • 1
  • 4
  • 20
3
votes
2 answers

Can we use SnappyData to update a record in Azure Data Lake, or is Azure Data Lake append-only?

I am currently working on Azure Data Lake with SnappyData integration. My question about SnappyData is: can we update data from SnappyData in Azure Data Lake storage, or can we only append to Azure Data Lake storage? I searched in…
3
votes
1 answer

U-SQL job is very slow when I add a .NET call

The code runs very fast over 2000 small files (~10–50 KB): about 1 minute with parallelism = 5. @arenaData = EXTRACT col1, col2, col3 FROM @in USING Extractors.Tsv(quoting : true, skipFirstNRows : 1, nullEscape : "\\N",…
churupaha
  • 325
  • 2
  • 10
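
For readers unfamiliar with the pattern this question describes: a .NET call in U-SQL is typically a static C# method in a code-behind file, invoked once per row. A hedged sketch with assumed names, not the asker's code:

```csharp
// Script.usql.cs (code-behind) – hypothetical helper class
namespace Demo
{
    public static class Cleaner
    {
        // Invoked once per row from the U-SQL SELECT
        public static string Normalize(string s)
        {
            return string.IsNullOrEmpty(s) ? s : s.Trim().ToUpperInvariant();
        }
    }
}
```

The script would then call it as `SELECT Demo.Cleaner.Normalize(col1) AS col1Clean FROM @arenaData;`. Row-at-a-time invocation of managed code is a common reason such jobs run slower than pure U-SQL expressions.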
3
votes
1 answer

What do priority and parallelism value mean in Azure Data Lakes (Hadoop)?

In other words, what do a parallelism value of 5 and a priority value of 1000 mean?
Justin Borromeo
  • 1,201
  • 3
  • 13
  • 26
3
votes
1 answer

Consistency of Azure Data Lake Store

What are the consistency guarantees of Azure Data Lake Store? Has anyone found technical documentation describing them? I am particularly interested in whether directory moves are atomic, whether directory listings are consistent, and whether files…
3
votes
1 answer

Is it possible to use U-SQL managed tables as output datasets in Azure Data Factory?

I have a small ADF pipeline that copies a series of files from an Azure Storage Account to an Azure Data Lake account. As a final activity in the pipeline I want to run a U-SQL script that uses the copied files as inputs and outputs the result to a…
soderstromOlov
  • 384
  • 1
  • 5
  • 11
3
votes
2 answers

Config file for input and output folder location

I have multiple U-SQL scripts, and I am using a filename variable at the top of each script. Is there any way to define the input and output folders in a config file and read them as a variable or constant, to use them with EXTRACT and…
Ajay
  • 783
  • 3
  • 16
  • 37
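
One common answer to this kind of question is to centralize paths in DECLARE statements; marking them EXTERNAL lets the submitter (for example, Azure Data Factory or a PowerShell script) override them at submission time. A sketch with hypothetical folder names:

```usql
// Hypothetical folder layout; EXTERNAL values can be overridden by the caller
DECLARE EXTERNAL @inputFolder  string = "/data/in/";
DECLARE EXTERNAL @outputFolder string = "/data/out/";

DECLARE @in  string = @inputFolder  + "sales.csv";
DECLARE @out string = @outputFolder + "result.csv";

@rows =
    EXTRACT id int, amount decimal
    FROM @in
    USING Extractors.Csv();

OUTPUT @rows
TO @out
USING Outputters.Csv();
```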
3
votes
1 answer

How do I partition a large file into files/directories using only U-SQL and certain fields in the file?

I have an extremely large CSV, where each row contains customer and store IDs, along with transaction information. The current test file is around 40 GB (about 2 days' worth), so partitioning is an absolute must for any reasonable return time on…
Travis Manning
  • 320
  • 1
  • 12
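
A frequently suggested direction for workloads like this one (a hedged sketch with a hypothetical schema, not the asker's solution) is to load the file once into a U-SQL managed table distributed on the key you filter by, so subsequent jobs touch only the relevant buckets:

```usql
// Hypothetical schema; the real columns would come from the 40 GB CSV
CREATE TABLE IF NOT EXISTS dbo.Transactions
(
    CustomerId int,
    StoreId    int,
    Amount     decimal,
    INDEX cIdx CLUSTERED (CustomerId, StoreId)
    DISTRIBUTED BY HASH (CustomerId)
);

// Populate once from an extracted rowset (assumed to exist as @input)
INSERT INTO dbo.Transactions
SELECT CustomerId, StoreId, Amount
FROM @input;
```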