Questions tagged [azure-data-lake]

Azure Data Lake is a suite of three big data services in Microsoft Azure: HDInsight, Data Lake Store, and Data Lake Analytics. These fully managed services make it easy to get started with, and to scale, big data jobs written in U-SQL, Apache Hive, Pig, Spark, and Storm.

  • HDInsight is a fully managed, monitored and supported Apache Hadoop service, bringing the power of Hadoop clusters to you with a few clicks.
  • Data Lake Store is a cloud-scale service designed to store all data for analytics. Data Lake Store allows for petabyte-sized files and unlimited account sizes, surfaced through an HDFS API so that any Hadoop component can access the data. Additionally, data in Data Lake Store is protected via ACLs that can be tied to an OAuth2-based identity, including identities from your on-premises Active Directory.
  • Data Lake Analytics is a distributed analytics service built on Apache YARN that dynamically scales on demand, while you pay only for the jobs that run. Data Lake Analytics also includes U-SQL, a language designed for big data that keeps the familiar declarative syntax of SQL and is easily extended with user code authored in C#.
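To illustrate the SQL-plus-C# flavor of U-SQL described above, here is a minimal sketch of a script; the file paths and schema are hypothetical:

```usql
// Read a hypothetical TSV from the Data Lake Store, filter it using a
// C# expression, and write the result back out.
@searchlog =
    EXTRACT UserId int,
            Region string,
            Query string
    FROM "/input/SearchLog.tsv"
    USING Extractors.Tsv();

@filtered =
    SELECT UserId,
           Query.ToUpperInvariant() AS Query   // C# method call inside the query
    FROM @searchlog
    WHERE Region == "en-gb";

OUTPUT @filtered
    TO "/output/Filtered.tsv"
    USING Outputters.Tsv();
```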

To learn more, check out: https://azure.microsoft.com/en-us/solutions/data-lake/

1870 questions
3
votes
1 answer

Slow Azure Data Factory Pipeline

I am using Azure Data Factory V2 to transfer some CSV files from Azure Data Lake to Azure Synapse. I have a loop to find all files in a specific folder on my Data Lake. After that I have a Data Flow to transfer data from staging to the main table. In my for-each…
3
votes
0 answers

Automating On-Premises Tabular Model refresh with an Azure Data Lake Gen1 Connection

We are using Azure Data Lake Gen1 as the source for our Tabular model. We have deployed this model on an on-premises server. Every time we process the model we have to manually refresh the credentials, and we want to automate that. I have already tried…
chetan S
  • 31
  • 1
3
votes
1 answer

Azure ADLS Gen2 - API Error 400 - DatalakeStorageException: The request URI is invalid

I'm using the Azure SDK (Java) to create directories, upload files, and move files in ADLS Gen2. My input is very simple; it looks like: path: /path/to/fileOrFolder. But I get the following error: com.azure.storage.file.datalake.models.DatalakeStorageException:…
tdebroc
  • 1,436
  • 13
  • 28
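One common cause of a 400 "request URI is invalid" with the Gen2 SDKs is handing the file-system client a path with a leading slash or doubled slashes. That is only a guess at this asker's situation, but a small hypothetical sanitizing helper illustrates the idea:

```python
def normalize_adls_path(path: str) -> str:
    """Strip any leading slash and collapse duplicate slashes so the path
    is relative to the file system, as the Gen2 clients expect."""
    parts = [p for p in path.split("/") if p]  # drop empty segments
    return "/".join(parts)

print(normalize_adls_path("/path/to/fileOrFolder"))  # path/to/fileOrFolder
```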
3
votes
1 answer

Azure Data Factory copy from folder onwards

I am trying to create a copy activity between two Azure Data Lake Gen1 accounts. I don't need to copy all the folders from the source Data Lake; for example, if I have the following directory…
Mikel Laburu
  • 157
  • 1
  • 12
3
votes
1 answer

What is the correct way to check whether a folder exists on an ADLS Gen2 account?

I am working in a Scala and Spark environment where I want to read a parquet file. Before reading it, I want to check whether the file exists. I am writing the following code in a Jupyter notebook, but it does not work - meaning it does not show any frame…
user10360768
  • 225
  • 3
  • 14
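In a Databricks notebook the usual trick is to attempt a listing and treat the failure as "does not exist". A sketch of that pattern; the `list_fn` indirection exists only so the logic can be demonstrated with a stub, and in a real notebook you would pass `dbutils.fs.ls`:

```python
def path_exists(list_fn, path: str) -> bool:
    """Return True if listing the path succeeds, False if it raises.
    In Databricks, call as: path_exists(dbutils.fs.ls, "abfss://...")."""
    try:
        list_fn(path)
        return True
    except Exception:
        return False

# Stub listing function that only knows one path, for demonstration.
def fake_ls(path):
    if path == "/data/in":
        return ["part-0000.parquet"]
    raise FileNotFoundError(path)

print(path_exists(fake_ls, "/data/in"))   # True
print(path_exists(fake_ls, "/data/out"))  # False
```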
3
votes
0 answers

Kusto (KQL) Join on Multiple columns

I'm producing two pivoted data sets: Data set 1: let T1 = data | where col1 == "blah" | evaluate pivot(col2, count(col2), col3, col4); Data set 2: let T2 = data | where col1 == "blahblah" | evaluate pivot(col2, count(col2), col3, col4); Both of…
Louis
  • 71
  • 1
  • 1
  • 5
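For reference, Kusto's `join` operator accepts multiple equality keys, either by bare column name (when both sides share the name) or with `$left`/`$right` qualifiers. A hedged sketch against the pivoted sets above, assuming `col3` and `col4` are the intended keys:

```kusto
T1
| join kind=inner T2 on col3, col4
// or, with explicit qualifiers when the names differ:
// | join kind=inner T2 on $left.col3 == $right.col3, $left.col4 == $right.col4
```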
3
votes
1 answer

Does Azure Data Lake Gen2 provide WebHDFS REST APIs?

If not, is it possible to use the WebHDFS API from HDInsight to connect to ADLS Gen2?
3
votes
1 answer

Error connecting to Data Lake (ADLS Gen2) store from Databricks

I am trying to connect to Data Lake Gen2 storage from Databricks Python, but unfortunately I am running into an error. Code: dbutils.fs.ls("abfss://@.dfs.core.windows.net/") Error message: Configuration property…
Idleguys
  • 325
  • 1
  • 7
  • 18
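A "Configuration property …" error like the one above usually means the session has no credentials configured for that storage account. A hedged sketch of building the `abfss://` URL and the shape of the OAuth Spark config keys (the account name, container, and provider values here are placeholders, not the asker's):

```python
def abfss_url(container: str, account: str, path: str = "") -> str:
    """Build the abfss:// URL the ADLS Gen2 (ABFS) connector expects."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"

# Placeholder account name -- substitute your own values.
account = "mystorageaccount"
configs = {
    f"fs.azure.account.auth.type.{account}.dfs.core.windows.net": "OAuth",
    f"fs.azure.account.oauth.provider.type.{account}.dfs.core.windows.net":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    # plus client id / secret / token endpoint keys for the service principal
}

print(abfss_url("mycontainer", account, "/folder/file.csv"))
# abfss://mycontainer@mystorageaccount.dfs.core.windows.net/folder/file.csv
```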
3
votes
1 answer

Spark Predicate Push Down, Filtering and Partition Pruning for Azure Data Lake

I have been reading about Spark predicate pushdown and partition pruning to understand the amount of data read. I have the following doubts related to this. Suppose I have a dataset with columns (Year: Int, SchoolName: String, StudentId: Int,…
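Conceptually, partition pruning works because a partitioned write lays data out as `Year=.../SchoolName=.../` directories, and an equality filter on a partition column lets Spark skip whole directories without reading them. A toy sketch of that directory-selection step (not Spark itself, just the idea, with made-up directory names):

```python
def prune_partitions(all_dirs, **filters):
    """Keep only the partition directories whose key=value path segments
    match the given equality filters -- the essence of partition pruning."""
    def matches(d):
        kv = dict(seg.split("=") for seg in d.strip("/").split("/"))
        return all(kv.get(k) == v for k, v in filters.items())
    return [d for d in all_dirs if matches(d)]

dirs = ["Year=2019/SchoolName=ABC/", "Year=2019/SchoolName=XYZ/", "Year=2020/SchoolName=ABC/"]
print(prune_partitions(dirs, Year="2019"))  # only the two 2019 directories survive
```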
3
votes
1 answer

Azure ADLS Gen2: listing folders gives an authentication error using the REST API and a service principal

I have a storage account in Azure with ADLS Gen2 (hierarchical namespace enabled). I have created an app and a service principal. I have also created a container in the storage account and a folder inside that container. Using ACLs I have given execute permission to…
Nipun
  • 4,119
  • 5
  • 47
  • 83
3
votes
1 answer

How to merge two CSV files in Azure Data Factory

I want to update the target CSV file (located in Azure Data Lake Store) with delta records updated every day (the delta file sits in blob storage). If an existing record was updated, I want to update it in the target file, and if a delta record is a new one, then…
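Outside of Data Factory, the upsert this question describes (update rows whose key matches, append rows that are new) reduces to a keyed merge. A minimal pure-Python sketch, assuming the first column of each row is the key:

```python
def upsert(target_rows, delta_rows, key=0):
    """Merge delta into target: rows sharing a key are replaced,
    rows with new keys are appended."""
    merged = {row[key]: row for row in target_rows}
    for row in delta_rows:
        merged[row[key]] = row  # update-or-insert
    return list(merged.values())

target = [("1", "old"), ("2", "keep")]
delta = [("1", "new"), ("3", "added")]
print(upsert(target, delta))
# [('1', 'new'), ('2', 'keep'), ('3', 'added')]
```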
3
votes
1 answer

Why can't Databricks Python read from my Azure Datalake Storage Gen1?

I am trying to read a file mydir/mycsv.csv from Azure Data Lake Storage Gen1 from a Databricks notebook, using the syntax (inspired by the documentation) configs = {"dfs.adls.oauth2.access.token.provider.type": "ClientCredential", …
Davide Fiocco
  • 5,350
  • 5
  • 35
  • 72
3
votes
0 answers

How to save my pandas DataFrame to an Azure Data Lake Gen2 account in "XLSX" Excel format?

Currently I am importing data from Azure Data Lake Gen2 using pandas in Azure Databricks, which is working fine. But after I am done with data processing, I want to export the pandas data frame to the Azure Data Lake Gen2 account, which is still working…
user2066958
  • 57
  • 1
  • 1
  • 11
3
votes
2 answers

How to create a file or upload a file to Azure Data Lake Storage Gen2

I have created an Azure Data Lake Storage Gen2 account through the Azure portal. How can I create a file in that account through C# code? I googled a lot but didn't find any samples. Update 1: In order to call the REST API, I tried to generate a token using…
S.Chandra Sekhar
  • 453
  • 3
  • 11
  • 22
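For orientation, the ADLS Gen2 DFS REST flow is: PUT with `?resource=file` to create the file, PATCH with `?action=append&position=N` to write data, then PATCH with `?action=flush` at the final position to commit. A hedged Python sketch that only builds the request URLs (the account and filesystem names are placeholders; the C# flow is analogous):

```python
BASE = "https://{account}.dfs.core.windows.net/{filesystem}/{path}"

def request_url(account, filesystem, path, **params):
    """Compose a Gen2 DFS REST URL with query parameters."""
    url = BASE.format(account=account, filesystem=filesystem, path=path)
    if params:
        url += "?" + "&".join(f"{k}={v}" for k, v in params.items())
    return url

# Placeholder names, shown for the create and append steps.
create = request_url("myacct", "myfs", "dir/file.txt", resource="file")
append = request_url("myacct", "myfs", "dir/file.txt", action="append", position=0)
print(create)  # https://myacct.dfs.core.windows.net/myfs/dir/file.txt?resource=file
```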
3
votes
1 answer

Azure Data Lake error when inserting a file: The access control list value is invalid

We use Azure Data Lake (Gen2) to store files. For authorization we use a bearer token. Most calls succeed, but some calls fail with this error: Response code 400, "The access control list value is invalid". Some other calls fail with this…
Max
  • 2,529
  • 1
  • 18
  • 29
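A 400 of this kind usually means the ACL string sent with the request is malformed. Gen2 ACL entries follow a POSIX-style `[default:]type:qualifier:permissions` shape, comma-separated, e.g. `user::rwx,group::r-x,other::---`. A small hedged validator sketch of that shape (not the service's exact validation rules):

```python
import re

# [default:] then user|group|other|mask, a qualifier (possibly empty),
# and exactly three rwx/- permission characters.
ACL_ENTRY = re.compile(r"^(default:)?(user|group|other|mask):[^:]*:[rwx-]{3}$")

def is_valid_acl(acl: str) -> bool:
    """Check every comma-separated entry against the expected shape."""
    return all(ACL_ENTRY.match(e) for e in acl.split(","))

print(is_valid_acl("user::rwx,group::r-x,other::---"))  # True
print(is_valid_acl("user:rwx"))  # False (missing qualifier segment)
```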