Questions tagged [azure-data-lake]

Azure Data Lake is a suite of three big data services in Microsoft Azure: HDInsight, Data Lake Store, and Data Lake Analytics. These fully managed services make it easy to get started with, and easy to scale, big data jobs written in Hive, Pig, Spark, Storm, and U-SQL.

  • HDInsight is a fully managed, monitored, and supported Apache Hadoop service, bringing the power of Hadoop clusters to you with a few clicks.
  • Data Lake Store is a cloud-scale service designed to store all data for analytics. Data Lake Store allows for petabyte-sized files and unlimited account sizes, surfaced through an HDFS API that lets any Hadoop component access the data. Additionally, data in Data Lake Store is protected via ACLs that can be tied to an OAuth2-based identity, including identities from your on-premises Active Directory.
  • Data Lake Analytics is a distributed service built on Apache YARN that dynamically scales on demand, while you pay only for the jobs that are running. Data Lake Analytics also includes U-SQL, a language designed for big data that keeps the familiar declarative syntax of SQL and is easily extended with user code authored in C#.

To learn more, check out: https://azure.microsoft.com/en-us/solutions/data-lake/

1870 questions
0
votes
1 answer

Custom U-SQL extractor - How to process a JSON object larger than 4 MB

We use a custom U-SQL extractor to flatten a JSON structure. The sample code below works fine if each line (JSON object) is smaller than 4 MB. If a line is larger than 4 MB, we get the error "A record in the input file is longer than 4194304 bytes."…
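
One documented escape hatch is to mark the extractor with AtomicFileProcessing = true, which hands the extractor the whole file as a single stream instead of pre-split 4 MB records. Below is a minimal C# sketch, assuming line-delimited JSON and Newtonsoft.Json registered as an assembly; LargeJsonExtractor and the name/value schema are illustrative, not the asker's code.

    using Microsoft.Analytics.Interfaces;
    using Newtonsoft.Json.Linq;
    using System.Collections.Generic;
    using System.IO;
    using System.Text;

    // AtomicFileProcessing = true disables the 4 MB record splitter;
    // the extractor sees the raw input stream and splits it itself.
    [SqlUserDefinedExtractor(AtomicFileProcessing = true)]
    public class LargeJsonExtractor : IExtractor
    {
        public override IEnumerable<IRow> Extract(IUnstructuredReader input, IUpdatableRow output)
        {
            using (var reader = new StreamReader(input.BaseStream, Encoding.UTF8))
            {
                string line;
                while ((line = reader.ReadLine()) != null)   // a line may exceed 4 MB here
                {
                    var obj = JObject.Parse(line);
                    // Flatten: one output row per property, so no single row
                    // or string column approaches the size limits.
                    foreach (var prop in obj.Properties())
                    {
                        output.Set<string>("name", prop.Name);
                        output.Set<string>("value", prop.Value.ToString());
                        yield return output.AsReadOnly();
                    }
                }
            }
        }
    }

In the script it would be used as @data = EXTRACT name string, value string FROM "/input/big.json" USING new LargeJsonExtractor();. The trade-off is that an atomic extractor gives up per-extent parallelism, so the file is processed by a single vertex.
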
0
votes
1 answer

Data Lake file to blob: poor performance

I'm using azcopy to upload local files to blob storage, with the command: azcopy copy "localpath" "destinationpath(with SAS)" --include="*.csv" --recursive=true. I also tried azcopy sync "localpath" "destinationpath(with SAS)"…
mrdeadsven
  • 744
  • 1
  • 9
  • 22
0
votes
2 answers

Saving a dataframe as a CSV file (processed in Databricks) and uploading it to Azure Data Lake blob storage

I had a CSV file stored in Azure Data Lake Storage, which I imported into Databricks by mounting the Data Lake account in my Databricks cluster. After preprocessing, I wanted to store the CSV back in the same Data Lake Gen2 (blob storage) account. Any…
inr
  • 1
  • 1
0
votes
1 answer

403 error when trying to access a file system in Azure Data Lake Storage Gen2 via REST API

I am trying to access a file system in Azure Data Lake Storage Gen2 via the REST API using Java. This is how I am building my request: public static void main(String[] args) throws Exception { String urlString = "https://" + account +…
bora
  • 39
  • 1
  • 5
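
For comparison, here is a minimal C# version of the same call (the asker's code is Java). The account, file system, and token source are placeholders; with ADLS Gen2, a 403 usually means the AAD principal behind the token lacks an RBAC data role (e.g. Storage Blob Data Reader) or POSIX ACLs on the path, rather than a malformed request.

    using System;
    using System.Net.Http;
    using System.Net.Http.Headers;
    using System.Threading.Tasks;

    class ListFileSystem
    {
        static async Task Main()
        {
            string account = "myaccount";        // placeholder account name
            string fileSystem = "myfilesystem";  // placeholder file system
            // Bearer token issued for the https://storage.azure.com/ resource.
            string token = Environment.GetEnvironmentVariable("ADLS_TOKEN");

            using (var http = new HttpClient())
            {
                http.DefaultRequestHeaders.Authorization =
                    new AuthenticationHeaderValue("Bearer", token);
                http.DefaultRequestHeaders.Add("x-ms-version", "2018-11-09");

                // List paths in the file system via the dfs endpoint.
                string url = $"https://{account}.dfs.core.windows.net/{fileSystem}"
                           + "?resource=filesystem&recursive=false";
                HttpResponseMessage response = await http.GetAsync(url);
                Console.WriteLine((int)response.StatusCode);
                Console.WriteLine(await response.Content.ReadAsStringAsync());
            }
        }
    }
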
0
votes
1 answer

Data Factory loses permissions when copying from Data Lake (Gen1) to blob storage

Data Factory gives me this error when attempting to copy from Data Lake Gen1 to blob storage: "message": "Failure happened on 'Sink' side.…
0
votes
1 answer

Azure Data Lake Gen 2 - How to opt in to "Multi-protocol access on Azure Data Lake Storage"

I'm trying to use the .NET blob API to connect to an Azure Data Lake Gen2 account. I have added Microsoft.Azure.Storage.Blob 11.0.1 to my project (.NET Core 2.2). When I try to list the blob containers in the storage account, I get the following error: …
Connell.O'Donnell
  • 3,603
  • 11
  • 27
  • 61
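
At the time of this question, the blob endpoint rejected hierarchical-namespace (Gen2) accounts unless the account was enrolled in the multi-protocol access preview, so the listing call itself is usually not the problem. A minimal sketch of that call with Microsoft.Azure.Storage.Blob 11.x (the connection string source is a placeholder):

    using System;
    using System.Threading.Tasks;
    using Microsoft.Azure.Storage;
    using Microsoft.Azure.Storage.Blob;

    class ListContainers
    {
        static async Task Main()
        {
            var account = CloudStorageAccount.Parse(
                Environment.GetEnvironmentVariable("STORAGE_CONNECTION_STRING"));
            CloudBlobClient client = account.CreateCloudBlobClient();

            BlobContinuationToken token = null;
            do
            {
                // Containers map to ADLS Gen2 file systems once
                // multi-protocol access is enabled for the account.
                ContainerResultSegment segment =
                    await client.ListContainersSegmentedAsync(token);
                foreach (CloudBlobContainer container in segment.Results)
                    Console.WriteLine(container.Name);
                token = segment.ContinuationToken;
            } while (token != null);
        }
    }

Once multi-protocol access is available in the account's region, the same code lists Gen2 file systems as containers.
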
0
votes
1 answer

U-SQL: how do I join without getting a Cartesian product

I have a large file with rows for each day per ID. There can be more than one record per ID per day, but only the newest value is valid. DailyValues: ID int, date datetime, version datetime, value1 float, value2 float, value3 float, value4 float, I…
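
A hedged U-SQL sketch of the usual fix: reduce DailyValues to the newest version per (ID, date) with ROW_NUMBER() before joining, so the join sees at most one row per key. @DailyValues and @Other stand in for the question's rowsets, and otherColumn is a placeholder.

    @ranked =
        SELECT ID, date, version, value1, value2, value3, value4,
               ROW_NUMBER() OVER (PARTITION BY ID, date ORDER BY version DESC) AS rn
        FROM @DailyValues;

    // Keep only the newest record per ID per day.
    @latest =
        SELECT ID, date, value1, value2, value3, value4
        FROM @ranked
        WHERE rn == 1;

    // One row per (ID, date) on the left side: the join can no longer
    // multiply duplicate versions into a Cartesian-style blow-up.
    @joined =
        SELECT l.ID, l.date, l.value1, o.otherColumn
        FROM @latest AS l
             INNER JOIN @Other AS o
             ON l.ID == o.ID AND l.date == o.date;
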
0
votes
1 answer

Bulk upload to Azure Data Lake Gen 2 with REST APIs

In another related question I had asked how to upload files from on-premises to Microsoft Azure Data Lake Gen2, to which an answer was provided via REST APIs. For the sake of completeness, the proposed code can be found below. Since for large…
AlexGuevara
  • 932
  • 11
  • 28
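
Since the proposed code is elided above, here is a hedged C# sketch of the chunked pattern the Gen2 REST API supports for large files: create the file once, append ranges at increasing offsets, then flush the final length. The account, file system, paths, and token handling are placeholders.

    using System;
    using System.IO;
    using System.Net.Http;
    using System.Net.Http.Headers;
    using System.Threading.Tasks;

    class ChunkedUpload
    {
        const int ChunkSize = 4 * 1024 * 1024;   // 4 MB per append

        static async Task Main()
        {
            string token = Environment.GetEnvironmentVariable("ADLS_TOKEN");
            string baseUrl = "https://myaccount.dfs.core.windows.net/myfs/folder/big.bin";

            using (var http = new HttpClient())
            {
                http.DefaultRequestHeaders.Authorization =
                    new AuthenticationHeaderValue("Bearer", token);
                http.DefaultRequestHeaders.Add("x-ms-version", "2018-11-09");

                // 1. Create (or overwrite) the empty file.
                (await http.PutAsync($"{baseUrl}?resource=file",
                    new ByteArrayContent(Array.Empty<byte>()))).EnsureSuccessStatusCode();

                // 2. Append chunks at increasing positions.
                long position = 0;
                using (var file = File.OpenRead("big.bin"))
                {
                    var buffer = new byte[ChunkSize];
                    int read;
                    while ((read = await file.ReadAsync(buffer, 0, buffer.Length)) > 0)
                    {
                        var append = new HttpRequestMessage(new HttpMethod("PATCH"),
                            $"{baseUrl}?action=append&position={position}")
                        { Content = new ByteArrayContent(buffer, 0, read) };
                        (await http.SendAsync(append)).EnsureSuccessStatusCode();
                        position += read;
                    }
                }

                // 3. Flush to commit all appended data at the final length.
                var flush = new HttpRequestMessage(new HttpMethod("PATCH"),
                    $"{baseUrl}?action=flush&position={position}");
                (await http.SendAsync(flush)).EnsureSuccessStatusCode();
            }
        }
    }
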
0
votes
1 answer

How to orchestrate data lake activities?

How do we orchestrate the execution of stored procedures in the data lake? Example: 1. execute sproc dbo.abc 2. execute sproc dbo.xyz 3. execute sproc dbo.aaa. The question could be more specifically restated: what integrations does Azure provide in…
Alex Gordon
  • 57,446
  • 287
  • 670
  • 1,062
0
votes
1 answer

Azure Functions integrations available with Azure Data Lake

I have some U-SQL scripts that will generate files: @output= EXTRACT.. OUTPUT @output TO "/myFirstFunction_{myId}.txt" USING Outputters.Tsv(); For every file generated this way (/myFirstFunction_{myId}.txt) I would like to trigger an Azure…
Alex Gordon
  • 57,446
  • 287
  • 670
  • 1,062
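
Assuming the U-SQL output lands in an account that publishes Event Grid BlobCreated events (Blob storage, or ADLS Gen2 with the multi-protocol preview; Gen1 does not emit Event Grid events), a C# sketch of a function wired to that event follows. It needs the Microsoft.Azure.WebJobs.Extensions.EventGrid package, and the filename filter is purely illustrative.

    using Microsoft.Azure.EventGrid.Models;
    using Microsoft.Azure.WebJobs;
    using Microsoft.Azure.WebJobs.Extensions.EventGrid;
    using Microsoft.Extensions.Logging;

    public static class OnUSqlOutputCreated
    {
        [FunctionName("OnUSqlOutputCreated")]
        public static void Run([EventGridTrigger] EventGridEvent gridEvent, ILogger log)
        {
            // The event subject ends with the blob path,
            // e.g. .../myFirstFunction_42.txt
            if (gridEvent.Subject != null && gridEvent.Subject.Contains("/myFirstFunction_"))
            {
                log.LogInformation($"U-SQL output arrived: {gridEvent.Subject}");
                // ...kick off the per-file processing here...
            }
        }
    }
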
0
votes
1 answer

Intermittent errors using the C# Azure Data Lake Gen1 client: "The underlying connection was closed"

I am logging some data to a Gen1 Azure Data Lake Store, using the Microsoft.Azure.DataLake.Store driver. I am authenticating and creating a client like so: var adlCreds = await ApplicationTokenProvider.LoginSilentAsync(tenant, clientId, secret); var…
QTom
  • 1,441
  • 1
  • 13
  • 29
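
Two common mitigations for this transient error are to create the AdlsClient once per process instead of per call, and to retry the write on failure. A hedged sketch follows; the tenant/client IDs, account name, and the choice of ConcurrentAppend are placeholders, not the asker's exact setup.

    using System;
    using System.Text;
    using System.Threading.Tasks;
    using Microsoft.Azure.DataLake.Store;
    using Microsoft.Rest.Azure.Authentication;

    public static class AdlsLogger
    {
        // Single shared client for the process lifetime.
        private static readonly Lazy<AdlsClient> Client = new Lazy<AdlsClient>(() =>
        {
            var creds = ApplicationTokenProvider.LoginSilentAsync(
                "tenant-id", "client-id", "client-secret").GetAwaiter().GetResult();
            return AdlsClient.CreateClient("myaccount.azuredatalakestore.net", creds);
        });

        public static void AppendLine(string path, string line, int maxAttempts = 3)
        {
            for (int attempt = 1; ; attempt++)
            {
                try
                {
                    byte[] bytes = Encoding.UTF8.GetBytes(line + "\n");
                    Client.Value.ConcurrentAppend(path, true, bytes, 0, bytes.Length);
                    return;
                }
                catch (AdlsException) when (attempt < maxAttempts)
                {
                    // Transient failure: back off exponentially, then retry.
                    Task.Delay(TimeSpan.FromSeconds(Math.Pow(2, attempt))).Wait();
                }
            }
        }
    }
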
0
votes
1 answer

Azure Data Lake port connection

We are working on a POC to load data from various data sources into Azure Data Lake Gen2 using Azure Data Factory. Is there any default port that we can use in ADF to connect to ADL? Please let me know.
0
votes
1 answer

Uploading data (CSV file) using Azure Functions (Node.js) to Azure Data Lake Gen2

I am currently trying to send a CSV file to Azure Data Lake Gen2 using an Azure Function with Node.js, but I am unable to do so. Any suggestions would be really helpful. Thanks. I have tried to use the credentials of the blob storage present…
0
votes
1 answer

How do we insert data into a table?

I'm attempting to insert data into a table: @one_files = EXTRACT //all columns FROM "/1_Main{suffixOne}.csv" USING Extractors.Text(delimiter : '|'); CREATE TABLE A1_Main (//all cols); INSERT INTO A1_Main SELECT * FROM @one_files; Within the…
Alex Gordon
  • 57,446
  • 287
  • 670
  • 1,062
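
The usual trap here is that a managed U-SQL table must declare a clustered index and a distribution scheme, or the CREATE TABLE/INSERT fails. A hedged U-SQL sketch, with placeholder columns standing in for "all cols" and the fileset pattern omitted for brevity:

    @one_files =
        EXTRACT Id int,
                Name string            // placeholders for "all columns"
        FROM "/1_Main.csv"
        USING Extractors.Text(delimiter : '|');

    // A managed table needs a clustered index plus a distribution scheme.
    CREATE TABLE IF NOT EXISTS A1_Main
    (
        Id int,
        Name string,
        INDEX idx_main CLUSTERED (Id ASC)
            DISTRIBUTED BY HASH (Id)
    );

    INSERT INTO A1_Main
    SELECT * FROM @one_files;
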
0
votes
1 answer

When I try to write to a file in Azure Data Lake I get "method not supported"

I'm trying to write a file to the file system in Azure, but I get an error I can't figure out. The code works in my environment but not on Azure. I tried my code in different environments; they all work except for Azure. Below is my code: public…
Pavel
  • 153
  • 1
  • 4
  • 14
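
One hedged guess at the cause, assuming the Microsoft.Azure.DataLake.Store client: ADLS Gen1 streams are forward-only, so Stream members such as Seek, Position, or Length throw NotSupportedException ("method not supported"). Writing strictly sequentially avoids it; AdlsWriter below is illustrative.

    using System.IO;
    using System.Text;
    using Microsoft.Azure.DataLake.Store;

    public static class AdlsWriter
    {
        // Write the whole payload sequentially; no Seek/Position/Length
        // calls that the ADLS output stream would reject.
        public static void WriteAllText(AdlsClient client, string path, string content)
        {
            using (Stream stream = client.CreateFile(path, IfExists.Overwrite))
            using (var writer = new StreamWriter(stream, Encoding.UTF8))
            {
                writer.Write(content);
            }
        }
    }
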