Questions tagged [azure-data-lake]

Azure Data Lake is a suite of three big data services in Microsoft Azure: HDInsight, Data Lake Store, and Data Lake Analytics. These fully managed services make it easy to get started and easy to scale big data jobs written in U-SQL, Apache Hive, Pig, Spark, and Storm.

  • HDInsight is a fully managed, monitored and supported Apache Hadoop service, bringing the power of Hadoop clusters to you with a few clicks.
  • Data Lake Store is a cloud-scale service designed to store all data for analytics. The Data Lake Store allows for petabyte-sized files and unlimited account sizes, surfaced through an HDFS API, enabling any Hadoop component to access the data. Additionally, data in Data Lake Store is protected via ACLs that can be tied to an OAuth2-based identity, including identities from your on-premises Active Directory.
  • Data Lake Analytics is a distributed service built on Apache YARN that dynamically scales on demand, and you pay only for the job that is running. Data Lake Analytics also includes U-SQL, a language designed for big data that keeps the familiar declarative syntax of SQL and is easily extended with user code authored in C#.
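
As a small illustration of that last point, a U-SQL script mixes SQL-style declarative statements with inline C# expressions. The file paths, column names, and schema below are hypothetical — a minimal sketch rather than a runnable sample for any particular account:

```sql
// Extract a tab-separated log file into a rowset (hypothetical path and schema).
@searchlog =
    EXTRACT UserId int,
            Query  string
    FROM "/input/searchlog.tsv"
    USING Extractors.Tsv();

// Transform with an inline C# expression (ToLower is an ordinary .NET string method).
@cleaned =
    SELECT UserId,
           Query.ToLower() AS QueryLower
    FROM @searchlog;

// Write the result back to the store as TSV.
OUTPUT @cleaned
TO "/output/cleaned.tsv"
USING Outputters.Tsv();
```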

To learn more, check out: https://azure.microsoft.com/en-us/solutions/data-lake/

1870 questions
7
votes
3 answers

Moving - not copying - data in Azure Data Factory

I'd like to set up an Azure Data Factory pipeline which performs a move (i.e. copy, verify, delete) operation rather than just a copy operation between Blob Storage and a Data Lake Store. I cannot seem to find any detail on how to do this.
Sam
  • 71
  • 1
  • 1
  • 2
7
votes
1 answer

How to avoid "Data Lake" menu item appearing in menu bar of Visual Studio 2015?

When I start Visual Studio Professional 2015 Update 2 and open my VB.NET project, the Data Lake item is not present in the main menu bar. But it appears later on, during development work. So far I have not noticed which particular action makes it appear.…
miroxlav
  • 11,796
  • 5
  • 58
  • 99
6
votes
2 answers

How to do undo in Azure Data Factory

I am new to Azure Data Factory. While developing a pipeline I could not find an undo operation in Azure Data Factory. Ctrl+Z did not work. What is the keyboard shortcut for undo?
Nidi
  • 61
  • 1
  • 3
6
votes
4 answers

Transfer the output of 'Set Variable' activity into a json file [Azure Data Factory]

In Data Factory, can the output from the 'Set Variable' activity be logged to a JSON file?
OreoFanatics
  • 818
  • 4
  • 15
  • 32
6
votes
3 answers

Intermittent HTTP error when loading files from ADLS Gen2 in Azure Databricks

I am getting an intermittent HTTP error when I try to load the contents of files in Azure Databricks from ADLS Gen2. The storage account has been mounted using a service principal associated with Databricks and has been given Storage Blob Data…
Amit Sukralia
  • 950
  • 1
  • 5
  • 13
6
votes
3 answers

Make sure the ACL and firewall rule is correctly configured in the Azure Data Lake Store account

I'm copying CSV files from Azure Blob to Azure Data Lake with Azure Data Factory, using the Copy Data tool. I'm following this link: https://learn.microsoft.com/en-us/azure/data-factory/quickstart-create-data-factory-copy-data-tool From the Copy Data tool my…
AskMe
  • 2,495
  • 8
  • 49
  • 102
6
votes
2 answers

Configure standalone spark for azure storage access

I need to be able to run Spark on my local machine to access Azure wasb and adl URLs, but I can't get it to work. I have a stripped-down example here: maven pom.xml (Brand-new pom, only the dependencies have been…
absmiths
  • 1,144
  • 1
  • 12
  • 21
6
votes
2 answers

30Mb limit uploading to Azure DataLake using DataLakeStoreFileSystemManagementClient

I am receiving an error when using _adlsFileSystemClient.FileSystem.Create(_adlsAccountName, destFilePath, stream, overwrite) to upload files to a Data Lake. The error comes up with files over 30 MB. It works fine with smaller files. The error…
Tom Armstrong
  • 75
  • 1
  • 7
6
votes
2 answers

How to schedule a U-SQL Query in Azure Data Lake?

I want to execute a query in Azure Data Lake daily. Can we schedule a U-SQL query in Azure Data Lake?
Jai
  • 416
  • 6
  • 20
6
votes
2 answers

U-SQL: Unable to extract data from JSON file

I was trying to extract data from a JSON file using U-SQL. Either the query runs successfully without producing any output data, or it results in a "vertex failed fast" error. The JSON file looks like: { "results": [ { "name": "Sales/Account", …
Sarath Rachuri
  • 2,086
  • 2
  • 18
  • 18
5
votes
1 answer

Azure Data Lake storage Gen2 permissions

I am currently building a data lake (Gen2) in Azure. I use Terraform to provision all the resources. However, I ran into some permission inconsistencies. According to the documentation, one can set permissions for the data lake with RBAC and…
Cloudkollektiv
  • 11,852
  • 3
  • 44
  • 71
5
votes
2 answers

Connecting C# Application to Azure Databricks

I am currently working on a project where we have data stored on an Azure Data Lake. The Data Lake is hooked up to Azure Databricks. The requirement asks that Azure Databricks be connected to a C# application to be able to run queries and get the…
Ryan Falzon
  • 329
  • 4
  • 15
5
votes
2 answers

Transfer from ADLS2 to Compute Target very slow Azure Machine Learning

During a training script executed on a compute target, we're trying to download a registered Dataset from an ADLS2 Datastore. The problem is that it takes hours to download ~1.5 GB (split into ~8500 files) to the compute target with the following…
5
votes
1 answer

How to run python egg (present in azure databricks) from Azure data factory?

So I created a small PySpark application and converted it to an egg, then uploaded it to dbfs:/FileStore/jar/xyz.egg. In ADF I used the Jar activity, but I am confused about what to provide in the Main Class Name textbox. My PyCharm application has three files, two of…
Bilal Shafqat
  • 689
  • 2
  • 14
  • 26
5
votes
4 answers

Sending an event on creating a new file in azure data lake gen 1

I want to send an event or a notification to an external NiFi flow once a new file has been added to Azure Data Lake Gen 1. Has anyone worked on this, or have any information about this use case?