Questions tagged [azure-data-lake]

Azure Data Lake is a suite of three big data services in Microsoft Azure: HDInsight, Data Lake Store, and Data Lake Analytics. These fully managed services make it easy to get started with, and easy to scale, big data jobs written in U-SQL, Apache Hive, Pig, Spark, and Storm.

  • HDInsight is a fully managed, monitored, and supported Apache Hadoop service that brings the power of Hadoop clusters to you with a few clicks.
  • Data Lake Store is a cloud-scale service designed to store all data for analytics. Data Lake Store allows petabyte-sized files and unlimited account sizes, surfaced through an HDFS API so that any Hadoop component can access the data. Additionally, data in Data Lake Store is protected via ACLs that can be tied to an OAuth2-based identity, including identities from your on-premises Active Directory.
  • Data Lake Analytics is a distributed service built on Apache YARN that dynamically scales on demand, while you pay only for the job that is running. Data Lake Analytics also includes U-SQL, a language designed for big data that keeps the familiar declarative syntax of SQL and is easily extended with user code authored in C#.

To learn more, check out: https://azure.microsoft.com/en-us/solutions/data-lake/

1870 questions
3
votes
1 answer

What is clientId, authTokenEndpoint, clientKey for accessing Azure Data Lake?

I am writing a test application to read a file from Azure Data Lake. I have created the account and the resource, and uploaded the file. I am trying to create a client using the following code (as described in the documentation…
erol yeniaras
  • 3,701
  • 2
  • 22
  • 40
3
votes
2 answers

Python PermissionError accessing Azure Datalake folder

I'm trying to upload files from a shared folder to an Azure Data Lake Gen1 folder. For now, I am just testing the connection and listing folders under the root directory: adlCreds = lib.auth(tenant_id = tenant_id, client_secret = client_secret,…
3
votes
1 answer

How can I efficiently prevent duplicated rows in my facts table?

I have built a Data Factory pipeline that ETLs the data from a Data Lake into a data warehouse. I chose SCD type 1 for my dimensions. My pipeline contains the following activities: [Stored Procedure] Clear staging tables; [Stored Procedure] Get…
Kzryzstof
  • 7,688
  • 10
  • 61
  • 108
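The usual way to make a fact-table load idempotent is to upsert on a business key rather than blindly inserting, so a re-run of the pipeline overwrites rather than duplicates. A minimal sketch of that pattern with sqlite3; the table, column names, and sample keys are all invented for illustration, not taken from the question:

```python
import sqlite3

# In-memory database standing in for the warehouse; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE fact_sales (
        business_key TEXT PRIMARY KEY,  -- natural key from the source system
        amount       REAL
    )
    """
)

def load_batch(rows):
    # Upsert on the business key: re-running the same batch overwrites
    # the existing rows instead of inserting duplicates.
    conn.executemany(
        "INSERT OR REPLACE INTO fact_sales (business_key, amount) VALUES (?, ?)",
        rows,
    )

batch = [("2019-01-01|store1", 100.0), ("2019-01-01|store2", 250.0)]
load_batch(batch)
load_batch(batch)  # a pipeline re-run is now idempotent

print(conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0])  # → 2
```

The same idea carries over to a warehouse stored procedure as a `MERGE` (or delete-then-insert) keyed on the business key.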
3
votes
1 answer

Azure Data Lake Child Folder permissions using PowerShell

I have Azure Data Lake Gen1 and I am using PowerShell scripts to grant access permissions on the folders, and the script works fine. With a change in the requirements, I have a few child folders created dynamically under the root folder in the…
Satya Azure
  • 459
  • 7
  • 22
3
votes
2 answers

Can I change the datatype of the Spark dataframe columns that are being loaded to SQL Server as a table?

I am trying to read a Parquet file from Azure Data Lake using the following PySpark code. df = sqlContext.read.format("parquet").option("header", "true").option("inferSchema", "true").load("adl://xyz/abc.parquet") df =…
sri sivani charan
  • 399
  • 1
  • 6
  • 21
3
votes
2 answers

How do I rename the file that was saved on a Data Lake in Azure?

I tried to merge two files in a Data Lake using Scala in Databricks and saved it back to the Data Lake using the following code: val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema",…
sri sivani charan
  • 399
  • 1
  • 6
  • 21
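Spark does not write a single file: it writes a directory of `part-*` files plus marker files like `_SUCCESS`, so "renaming the output" really means locating the one part file (after `coalesce(1)`) and moving it. On a Data Lake this is typically done via the Hadoop FileSystem API or `dbutils.fs.mv`; here is a local sketch of the same pattern using only the standard library, with all paths and file names simulated:

```python
import glob
import os
import shutil
import tempfile

# Simulate the directory Spark produces for df.coalesce(1).write.csv("out.csv"):
out_dir = tempfile.mkdtemp()
spark_dir = os.path.join(out_dir, "out.csv")
os.makedirs(spark_dir)
with open(os.path.join(spark_dir, "part-00000-abc123.csv"), "w") as f:
    f.write("id,name\n1,a\n")
open(os.path.join(spark_dir, "_SUCCESS"), "w").close()

def collapse_to_single_file(spark_output_dir, target_path):
    # Find the single part file (assumes coalesce(1) produced exactly one),
    # move it to the friendly name, then drop the directory of marker files.
    [part] = glob.glob(os.path.join(spark_output_dir, "part-*"))
    shutil.move(part, target_path)
    shutil.rmtree(spark_output_dir)

target = os.path.join(out_dir, "merged.csv")
collapse_to_single_file(spark_dir, target)
print(os.path.exists(target))  # → True
```

On an actual lake the `glob`/`move` calls would be replaced by the equivalent `FileSystem.globStatus`/`FileSystem.rename` operations, but the flow is the same.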
3
votes
1 answer

Azure Data Factory throws 'Length Required' error on copy from SQL to ADLS

I am trying to copy data from an on-premises SQL Server to Azure Data Lake Storage (ADLS) via Azure Data Factory (ADF). Everything seems to work, except when I run (debug or trigger) the pipeline, I get the error: { "errorCode": "2200", …
3
votes
1 answer

Azure ADLS Gen2 not available

I am trying to create a Storage v2 account with the Data Lake Gen2 preview, but it is disabled in the Azure wizard. As far as I have read, it should be available for this setup?
John
  • 79
  • 8
3
votes
2 answers

Execute R inside U-SQL

I'm trying to use U-SQL and R to forecast, so I need to pass a list of values from U-SQL to R and return the forecast from R to U-SQL. All the examples I found use a reducer, so they will process 1 row…
Jorge Ribeiro
  • 1,128
  • 7
  • 17
3
votes
1 answer

How to trigger a pipeline in Azure Data Factory v2 or an Azure Databricks Notebook by a new file in Azure Data Lake Store Gen1

I am using an Azure Data Lake Store Gen1 for storing JSON files. Based on these files, I have notebooks in Azure Databricks for processing them. Now I want to trigger such an Azure Databricks notebook when a new file is created in Azure Data Lake…
STORM
  • 4,005
  • 11
  • 49
  • 98
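For ADLS Gen1 there is no built-in storage event trigger, so the common workarounds are an Azure Data Factory schedule/event trigger or a small polling job that remembers which files it has already seen and fires on new arrivals. A minimal polling sketch in plain Python; the watched directory and file names here are local stand-ins for the lake path, not real Azure calls:

```python
import os
import tempfile

def poll_new_files(directory, seen):
    """Return files that appeared since the last poll and mark them as seen."""
    current = set(os.listdir(directory))
    new = sorted(current - seen)
    seen.update(new)
    return new

watch_dir = tempfile.mkdtemp()  # stand-in for the Data Lake folder
seen = set()

print(poll_new_files(watch_dir, seen))  # → [] (nothing there yet)
open(os.path.join(watch_dir, "events-001.json"), "w").close()
print(poll_new_files(watch_dir, seen))  # → ['events-001.json']
print(poll_new_files(watch_dir, seen))  # → [] (already seen)
```

In a real setup the new-file list would be fed to the Databricks Jobs API (or an ADF pipeline run) to kick off the notebook; ADLS Gen2 later added native Blob-event triggers that make the polling unnecessary.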
3
votes
4 answers

U-SQL referenced assembly built with .NET 4.5.1, whereas documented 4.5 is needed?

It's documented that U-SQL uses .NET 4.5 (https://learn.microsoft.com/en-us/azure/data-lake-analytics/data-lake-analytics-u-sql-programmability-guide#use-assembly-versioning), so we ensure that our own custom assemblies are built for that runtime. When…
Alex KeySmith
  • 16,657
  • 11
  • 74
  • 152
3
votes
1 answer

Using U-SQL to eliminate duplicate and null values in one specific column while keeping a 2nd column properly aligned

I am trying to use U-SQL to remove duplicate, null, '', and NaN cells in a specific column called "Function" of a CSV file. I also want to keep the Product column correctly aligned with the Function column after the blank rows are removed. So I would…
Royale_w_cheese
  • 297
  • 2
  • 9
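In U-SQL this kind of cleanup is usually a `WHERE` filter on the blank values followed by `DISTINCT` over the pair of columns, which keeps the two columns aligned because whole rows are kept or dropped together. The same row-wise logic can be sketched in Python; the column names follow the question, but the sample data is invented:

```python
import math

# Invented sample rows: (Function, Product) pairs as they might come from the CSV.
rows = [
    ("Analyze", "WidgetA"),
    ("Analyze", "WidgetA"),    # duplicate pair
    (None, "WidgetB"),         # null Function
    ("", "WidgetC"),           # empty Function
    (float("nan"), "WidgetD"), # NaN Function
    ("Report", "WidgetE"),
]

def is_blank(value):
    # Treat None, empty string, and NaN as blanks to be removed.
    if value is None or value == "":
        return True
    return isinstance(value, float) and math.isnan(value)

def clean(pairs):
    # Drop rows with a blank Function, then de-duplicate whole
    # (Function, Product) pairs so Product stays aligned with its Function.
    seen, out = set(), []
    for func, prod in pairs:
        if is_blank(func) or (func, prod) in seen:
            continue
        seen.add((func, prod))
        out.append((func, prod))
    return out

print(clean(rows))  # → [('Analyze', 'WidgetA'), ('Report', 'WidgetE')]
```

Filtering and de-duplicating whole rows, rather than each column independently, is what preserves the alignment the question asks about.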
3
votes
2 answers

Spark on HDInsight - No FileSystem for scheme: adl

I am writing an application that processes files from ADLS. When attempting to read the files from the cluster by running the code within spark-shell, it has no problem accessing the files. However, when I attempt to sbt run the project on the…
Leyth G
  • 1,103
  • 2
  • 15
  • 38
3
votes
1 answer

Deep link to text file in Azure Data Lake Store

I am trying to quickly access text files via URL. The Azure portal (http://portal.azure.com) can, at best, link to the explore view of a specific folder, but I have not found any way to deep link to a specific file. I also tried Azure Storage…
aaronsteers
  • 2,277
  • 2
  • 21
  • 38
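ADLS Gen1 exposes a WebHDFS-compatible REST endpoint, so one way to get a direct link to a file is to build the `op=OPEN` URL for it (the request itself still needs an OAuth2 Bearer token, so this is not an anonymously shareable link). A small sketch; the account name and path are placeholders:

```python
from urllib.parse import quote

def adls_gen1_open_url(account, path):
    """Build a WebHDFS OPEN URL for a file in an ADLS Gen1 account.

    Only constructs the link; the GET request must still carry an
    OAuth2 Bearer token. 'account' and 'path' are placeholders.
    """
    return (
        f"https://{account}.azuredatalakestore.net"
        f"/webhdfs/v1/{quote(path.lstrip('/'))}?op=OPEN"
    )

print(adls_gen1_open_url("mylake", "/logs/2019/01/app.txt"))
# → https://mylake.azuredatalakestore.net/webhdfs/v1/logs/2019/01/app.txt?op=OPEN
```

`quote` keeps the `/` separators but percent-encodes spaces and other unsafe characters, so paths with spaces still produce a valid URL.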
3
votes
1 answer

Data Lake Analytics U-SQL EXTRACT speed (Local vs Azure)

I've been looking into using the Azure Data Lake Analytics functionality to try and manipulate some Gzip'd XML data I have stored within Azure's Blob Storage, but I'm running into an interesting issue. Essentially, when using U-SQL locally to process 500…