Questions tagged [azure-data-lake]

Azure Data Lake is a suite of three big data services in Microsoft Azure: HDInsight, Data Lake Store, and Data Lake Analytics. These fully managed services make it easy to get started with, and to scale, big data jobs written in U-SQL, Apache Hive, Pig, Spark, and Storm.

  • HDInsight is a fully managed, monitored, and supported Apache Hadoop service, bringing the power of Hadoop clusters to you with a few clicks.
  • Data Lake Store is a cloud-scale service designed to store all data for analytics. The Data Lake Store allows for petabyte-sized files and unlimited account sizes, surfaced through an HDFS API that enables any Hadoop component to access the data. Additionally, data in Data Lake Store is protected via ACLs that can be tied to an OAuth2-based identity, including identities from your on-premises Active Directory.
  • Data Lake Analytics is a distributed service built on Apache YARN that dynamically scales on demand while you pay only for the jobs that run. Data Lake Analytics also includes U-SQL, a language designed for big data that keeps the familiar declarative syntax of SQL and is easily extended with user code authored in C#.

To learn more, check out: https://azure.microsoft.com/en-us/solutions/data-lake/

1870 questions
9
votes
3 answers

Azure solutions for analysing XML data

We are looking at developing a BI solution in Azure to analyse client airline search requests to our system. The requests are stored as XML, and around 50 million are generated each day. What Azure solutions would you recommend to load these…
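Whatever the orchestration choice, a common first step is bulk-loading the XML into Spark. A minimal PySpark sketch, assuming the files land in Data Lake storage and the spark-xml library is attached to the cluster; the mount path, the <SearchRequest> row tag, and the column names are hypothetical placeholders:

```python
# Minimal PySpark sketch: bulk-load XML request files from a data lake
# folder into a DataFrame for analysis. Assumes the spark-xml library
# (com.databricks.spark.xml) is installed on the cluster; the mount
# path and the <SearchRequest> row tag are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

requests = (
    spark.read.format("com.databricks.spark.xml")
    .option("rowTag", "SearchRequest")   # element that delimits one record
    .load("/mnt/datalake/search-requests/*/*.xml")
)

# Example aggregation: request counts per route (column names assumed).
requests.groupBy("origin", "destination").count().show()
```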
9
votes
4 answers

How to query Azure Data Lake?

Coming from the database world, when we have something related to data we use a UI tool to query it, be it big or small. Is there anything like SSMS, SQL Workbench (for big data on Redshift), or Athena (for querying big data on S3) for Azure Data Lake? I see Data…
Kannaiyan
  • 12,554
  • 3
  • 44
  • 83
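Beyond GUI tools such as Azure Storage Explorer, the store can also be browsed and read programmatically. A minimal sketch, assuming Data Lake Storage Gen2 and the azure-storage-file-datalake and azure-identity packages; the account, container, and file paths are placeholders:

```python
# Minimal sketch: browse and read Data Lake Storage Gen2 from Python
# instead of a GUI tool. Assumes azure-storage-file-datalake and
# azure-identity; account, container, and paths are placeholders.
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://<myaccount>.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)
fs = service.get_file_system_client("mycontainer")

# List everything under a folder, much like browsing a database catalog.
for path in fs.get_paths(path="raw/sales"):
    print(path.name, path.is_directory)

# Download one file's contents for ad-hoc inspection.
data = fs.get_file_client("raw/sales/2021.csv").download_file().readall()
print(data[:200])
```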
8
votes
5 answers

ARM Template - auto-approval of managed private endpoints

I am developing an ARM template for Azure Data Factory with managed private endpoints to SQL Server and Azure Data Lake. However, when the ARM template completes execution, the managed private endpoints are in a "Pending" state. How can I provision the…
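Because ARM deploys the endpoint but cannot approve it on the target's behalf, one workaround is a post-deployment step that approves the pending connection on the target resource. A hedged sketch against the Azure management REST API; the resource ID and api-version are assumptions for a SQL Server target and will differ for other resource types:

```python
# Hedged sketch: approve a pending private endpoint connection on the
# target resource (here a SQL server) after the ARM deployment, via the
# Azure management REST API. The resource ID and api-version are
# assumptions; adjust both for your target resource type.
import requests
from azure.identity import DefaultAzureCredential

token = DefaultAzureCredential().get_token(
    "https://management.azure.com/.default"
).token

# Full ARM ID of the pending connection on the *target* resource.
conn_id = (
    "/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Sql"
    "/servers/<server>/privateEndpointConnections/<connection-name>"
)

resp = requests.put(
    f"https://management.azure.com{conn_id}",
    params={"api-version": "2021-11-01"},
    headers={"Authorization": f"Bearer {token}"},
    json={"properties": {"privateLinkServiceConnectionState": {
        "status": "Approved",
        "description": "Approved by deployment pipeline",
    }}},
)
resp.raise_for_status()
```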
8
votes
2 answers

Databricks Prints Only Around 280 Lines of Data

I'm running some large jobs in Databricks, which for now include inventorying the data lake. I'm trying to print all blob names within a prefix (sub-folder). There are a lot of files in these sub-folders, and I'm getting about 280 rows of file…
ASH
  • 20,759
  • 19
  • 87
  • 200
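If the truncation comes from the notebook's stdout display limit rather than from the listing itself, one workaround is to collect the listing into a DataFrame or a file instead of calling print(). A sketch under that assumption; it runs in Databricks, where dbutils and spark are predefined, and the mount paths are placeholders:

```python
# Sketch of a workaround, assuming the cutoff is the notebook's output
# limit: collect the full listing into a DataFrame and persist it, so
# nothing is truncated. Runs in Databricks (dbutils/spark predefined);
# the mount paths are placeholders.
files = dbutils.fs.ls("/mnt/datalake/raw/")

# Build a DataFrame of (path, size) so the whole listing is queryable.
df = spark.createDataFrame(
    [(f.path, f.size) for f in files], ["path", "size"]
)
df.write.mode("overwrite").csv("/mnt/datalake/inventory/listing")
print(df.count())  # total file count, without printing every row
```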
8
votes
2 answers

Can a system-assigned managed service identity be added to an AAD group?

I have an Azure Data Factory V2 service running with an MSI identity. This service needs to access a Data Lake Gen 1 with thousands of folders and millions of files. For efficiency, we have a group assigned to the root of the data lake which has RX…
MarkD
  • 1,511
  • 18
  • 32
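A system-assigned managed identity is backed by a service principal, which can in principle be added to a group like any other directory object. A hedged sketch via Microsoft Graph, assuming the caller holds permission to modify group membership; the GUIDs are placeholders:

```python
# Hedged sketch: add a managed identity's service principal to an AAD
# group via Microsoft Graph. Assumes the caller has Graph permission to
# modify group membership; both GUIDs are placeholders.
import requests
from azure.identity import DefaultAzureCredential

token = DefaultAzureCredential().get_token(
    "https://graph.microsoft.com/.default"
).token

group_id = "<aad-group-object-id>"
msi_object_id = "<managed-identity-object-id>"  # the MSI's service principal

resp = requests.post(
    f"https://graph.microsoft.com/v1.0/groups/{group_id}/members/$ref",
    headers={"Authorization": f"Bearer {token}"},
    json={"@odata.id": f"https://graph.microsoft.com/v1.0/directoryObjects/{msi_object_id}"},
)
resp.raise_for_status()
```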
8
votes
3 answers

Is There a Local Emulator for the Azure Data Lake Store?

When developing for Azure storage accounts, I can run the Microsoft Storage Emulator to locally keep Blobs, Queues, and Tables without having to connect to Azure online. Is there something equivalent for the Azure Data Lake Store? It would be nice…
HaveSpacesuit
  • 3,572
  • 6
  • 40
  • 59
8
votes
2 answers

Write a Python DataFrame to a CSV file directly in Azure Data Lake

I have imported an Excel file into a pandas DataFrame and have completed the data exploration and cleaning process. I now want to write the cleaned DataFrame to a CSV file back in Azure Data Lake, without saving it as a local file first. I am using…
Juanita Smith
  • 169
  • 3
  • 5
  • 9
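One way to skip the local file, assuming Data Lake Storage Gen2 and the azure-storage-file-datalake package: serialize the DataFrame in memory and upload the bytes directly. The account, container, and path names are placeholders:

```python
# Minimal sketch, assuming Data Lake Storage Gen2: serialize the
# DataFrame in memory and upload the bytes directly, with no local file.
# Package: azure-storage-file-datalake; names are placeholders.
import pandas as pd
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})  # stands in for the cleaned frame

service = DataLakeServiceClient(
    account_url="https://<myaccount>.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)
file_client = (
    service.get_file_system_client("mycontainer")
    .get_file_client("clean/output.csv")
)

# to_csv() with no path returns the CSV as a string; upload it as bytes.
file_client.upload_data(df.to_csv(index=False).encode(), overwrite=True)
```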
7
votes
1 answer

Read data from Azure Blob Storage in an Azure Function in Python

How do I read in data from my Azure Storage account when I launch my Function app? I need to read the saved weights for my machine learning model at runtime. I want to read the model directly from the storage account because the model is…
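A minimal sketch, assuming the azure-storage-blob package and the Functions app's AzureWebJobsStorage connection string; the container and blob names are placeholders:

```python
# Hedged sketch: load model weights from Blob Storage when the Function
# app starts. Assumes azure-storage-blob and a connection string in the
# AzureWebJobsStorage app setting; container/blob names are placeholders.
import os
from azure.storage.blob import BlobServiceClient

def load_model_bytes() -> bytes:
    service = BlobServiceClient.from_connection_string(
        os.environ["AzureWebJobsStorage"]
    )
    blob = service.get_blob_client(container="models", blob="weights.bin")
    return blob.download_blob().readall()

# Load once at import time so warm invocations reuse the cached weights.
MODEL_BYTES = load_model_bytes()
```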
7
votes
3 answers

How to get the last modification time of each file present in Azure Data Lake Storage using Python in a Databricks workspace?

I am trying to get the last modification time of each file present in Azure Data Lake. files = dbutils.fs.ls('/mnt/blob') for fi in files: print(fi) Output: FileInfo(path='dbfs:/mnt/blob/rule_sheet_recon.xlsx', name='rule_sheet_recon.xlsx',…
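dbutils.fs.ls does not expose a timestamp on older Databricks runtimes (newer runtimes reportedly add a modificationTime field to FileInfo). An alternative sketch using the storage SDK, assuming ADLS Gen2; the account and container names are placeholders, and 'blob' stands in for the folder behind the /mnt/blob mount:

```python
# Sketch of an alternative, assuming ADLS Gen2: the storage SDK returns
# a last_modified timestamp per path, which dbutils.fs.ls on older
# Databricks runtimes does not. Account/container names are placeholders.
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://<myaccount>.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)
fs = service.get_file_system_client("mycontainer")

for p in fs.get_paths(path="blob"):    # folder behind the /mnt/blob mount
    if not p.is_directory:
        print(p.name, p.last_modified) # datetime of last modification
```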
7
votes
3 answers

How to loop through Azure Data Lake Store files in Azure Databricks

I am currently listing files in Azure Data Lake Store Gen1 successfully with the following command: dbutils.fs.ls('mnt/dbfolder1/projects/clients') The structure of this folder is - client_comp_automotive_1.json [File] -…
STORM
  • 4,005
  • 11
  • 49
  • 98
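dbutils.fs.ls lists only a single level, so sub-folders need a recursive walk. A minimal sketch, runnable in Databricks where dbutils is predefined; the path is the one from the question:

```python
# Minimal recursive sketch over dbutils.fs.ls, which lists only one
# level: walk sub-folders depth-first and yield every file path. Runs
# in Databricks, where dbutils is predefined.
def walk(path):
    for entry in dbutils.fs.ls(path):
        if entry.isDir():
            yield from walk(entry.path)
        else:
            yield entry.path

for file_path in walk("mnt/dbfolder1/projects/clients"):
    print(file_path)
```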
7
votes
0 answers

Checkpoint function error in R: arguments imply differing number of rows: 1, 38, 37

I need to create libraries for HighDimout and its dependencies. This will be used to keep my code on ADLA (Azure Data Lake Analytics). I am using the checkpoint function from the checkpoint package created by Microsoft. mm <-…
Arpit Sisodia
  • 570
  • 5
  • 18
7
votes
3 answers

Azure Data Lake Analytics vs Azure SQL Data Warehouse

I am using ADF to connect to sources and get data into Azure Data Lake Store. After getting data into Data Lake Store, I want to do some transformation and aggregation, and use that data in SSRS reports and also for creating cubes. Can anyone suggest…
Naga
  • 71
  • 1
  • 5
7
votes
2 answers

Parse json file in U-SQL

I'm trying to parse the JSON file below using U-SQL but keep getting an error. Json…
Saz
  • 75
  • 1
  • 6
7
votes
3 answers

Reasons to use Azure Data Lake Analytics vs Traditional ETL approach

I'm considering using Data Lake technologies, which I have been studying for the last few weeks, compared with the traditional ETL SSIS scenarios I have been working with for so many years. I think of Data Lake as something very linked to big…
Carlos Moreno
  • 141
  • 2
  • 6
7
votes
3 answers

U-SQL Output in Azure Data Lake

Would it be possible to automatically split a table into several files based on column values if I don't know how many different key values the table contains? Is it possible to put the key value into the filename?
peterko
  • 503
  • 1
  • 6
  • 18
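For comparison only (PySpark, not U-SQL): Spark's partitionBy writes one folder per distinct key value without knowing the key set in advance, which is the dynamic behavior the question asks about. The paths and the key column name are placeholders; spark is predefined in Databricks:

```python
# Comparison sketch in PySpark (not U-SQL): write one output folder per
# key value without enumerating the keys up front. Paths and the "key"
# column name are placeholders; spark is predefined in Databricks.
df = spark.read.csv("/mnt/datalake/input.csv", header=True)

# Produces .../out/key=<value>/part-*.csv, one folder per distinct key.
df.write.partitionBy("key").mode("overwrite").csv("/mnt/datalake/out")
```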