Questions tagged [azure-data-lake]

Azure Data Lake is a suite of three big data services in Microsoft Azure: HDInsight, Data Lake Store, and Data Lake Analytics. These fully managed services make it easy to get started and easy to scale big data jobs written in U-SQL, Apache Hive, Pig, Spark, and Storm.

  • HDInsight is a fully managed, monitored and supported Apache Hadoop service, bringing the power of Hadoop clusters to you with a few clicks.
  • Data Lake Store is a cloud-scale service designed to store all data for analytics. Data Lake Store allows for petabyte-sized files and unlimited account sizes, surfaced through an HDFS-compatible API that enables any Hadoop component to access data. Additionally, data in Data Lake Store is protected via ACLs that can be tied to an OAuth2-based identity, including identities from your on-premises Active Directory.
  • Data Lake Analytics is a distributed service built on Apache YARN that dynamically scales on demand while you only pay for the job that is running. Data Lake Analytics also includes U-SQL, a language designed for big data, keeping the familiar declarative syntax of SQL, easily extended with user code authored in C#.

To learn more, check out: https://azure.microsoft.com/en-us/solutions/data-lake/

1870 questions
4 votes, 1 answer

ADF triggered ADL jobs failing with syntax error

I am trying to run a job that runs successfully from within Visual Studio. I'd like to run this in my ADF pipeline, but the job fails with a syntax error. ERRORID: E_CSC_USER_SYNTAXERROR SEVERITY: Error COMPONENT: CSC SOURCE: USER MESSAGE: …
chi
4 votes, 5 answers

Connect Azure Event Hubs with Data Lake Store

What is the best way to send data from Event Hubs to Data Lake Store?
irriss
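The managed route here is Event Hubs Capture, which can write batches straight into the lake with no code. For custom logic, a minimal consumer sketch; it assumes the azure-eventhub and azure-storage-file-datalake packages and uses placeholder connection strings and names (the Gen2 SDK is an assumption; Data Lake Store Gen1 would use the azure-datalake-store package instead):

```python
# Sketch: forward Event Hubs messages into the lake.
# Connection strings, filesystem, and hub names below are placeholders.
from azure.eventhub import EventHubConsumerClient
from azure.storage.filedatalake import DataLakeServiceClient

lake = DataLakeServiceClient.from_connection_string("<storage-connection-string>")
fs = lake.get_file_system_client("landing")  # hypothetical filesystem name

def on_event(partition_context, event):
    # One file per event; batch in production to avoid tiny files.
    path = f"events/{partition_context.partition_id}/{event.sequence_number}.json"
    fs.get_file_client(path).upload_data(event.body_as_str(), overwrite=True)
    partition_context.update_checkpoint(event)

consumer = EventHubConsumerClient.from_connection_string(
    "<eventhub-connection-string>", consumer_group="$Default", eventhub_name="telemetry"
)
with consumer:
    consumer.receive(on_event=on_event, starting_position="-1")  # from the beginning
```

In practice you would buffer events into larger files; one blob per event recreates the small-files problem raised elsewhere on this tag.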
3 votes, 0 answers

What causes the same data to have different sizes in two different SQL server databases?

I have a table with 339 million rows and twenty-one columns, of which seventeen are varchar(100), two are integer, one is float, and one is datetime. It is in an Azure SQL Server database. The table has no indices and only the primary…
3 votes, 1 answer

Extracting data lake data

I have a data lake path as follows: SYSTEM/Data/Year/Month/Date/Hist/hist.parquet. hist.parquet is present in every Year/Month/Date/Hist folder structure. I want to append all the parquet files for all years into a single parquet file in PySpark in a different…
Scope
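A minimal PySpark sketch for this one, assuming the folder layout in the question and a placeholder output path; the wildcards stand in for the Year/Month/Date levels:

```python
# Sketch: merge every Hist parquet under SYSTEM/Data into one output file.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("merge-hist").getOrCreate()

# Wildcards cover every Year/Month/Date partition.
df = spark.read.parquet("SYSTEM/Data/*/*/*/Hist/hist.parquet")

# coalesce(1) yields a single file: fine for modest volumes,
# a bottleneck for large ones.
df.coalesce(1).write.mode("overwrite").parquet("SYSTEM/Merged/hist_all")
```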
3 votes, 1 answer

Endpoint doesn't support BlobStorageEvents or softdelete exception

While trying to do a Data Preview or debug a pipeline, I am getting the below error stating "endpoint doesn't support BlobStorageEvents or soft delete". I do not want to disable soft delete.
3 votes, 0 answers

How to read delta table inside Azure Functions using python

I'm currently working on Azure Functions, where I need to read a delta table from ADLS Gen2 directly. Is there any way to do this, for example with the Azure SDKs or other alternatives?
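One way that avoids Spark entirely is the deltalake (delta-rs) package, which pip-installs into a Function app. A sketch, with placeholder account, path, and credentials:

```python
# Sketch: read a Delta table from ADLS Gen2 without a Spark cluster.
from deltalake import DeltaTable

dt = DeltaTable(
    "abfss://container@account.dfs.core.windows.net/tables/my_table",  # placeholder path
    storage_options={
        "account_name": "account",       # hypothetical
        "account_key": "<storage-key>",  # or a SAS / service principal
    },
)
df = dt.to_pandas()  # pandas DataFrame, usable inside the Function body
```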
3 votes, 1 answer

Delta lake and ADLS Gen2 transactions

We are running a Delta lake on ADLS Gen2 with plenty of tables and Spark jobs. The Spark jobs run in Databricks, and we mounted the ADLS containers into DBFS (abfss://delta@.dfs.core.windows.net/silver). There's one…
3 votes, 1 answer

Apache Spark/Azure Data Lake Storage - Process the file exactly once, tag the file as processed

I have an Azure Data Lake Storage container which acts as a landing area for JSON files to process by Apache Spark. There are tens of thousands of small (up to a few MB) files there. The Spark code reads these files on a regular basis and performs…
BuahahaXD
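One common pattern is to let Spark Structured Streaming do the bookkeeping: its checkpoint records which files have already been seen, so each file is processed once without explicit tagging. A sketch, with placeholder paths and schema:

```python
# Sketch: the checkpoint tracks processed files, giving read-once semantics.
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StringType

spark = SparkSession.builder.appName("landing-reader").getOrCreate()

# Streaming file sources require an explicit schema; fields are placeholders.
schema = StructType().add("id", StringType()).add("payload", StringType())

stream = (spark.readStream
          .schema(schema)
          .json("abfss://landing@account.dfs.core.windows.net/incoming/"))

(stream.writeStream
 .format("parquet")
 .option("checkpointLocation",
         "abfss://landing@account.dfs.core.windows.net/_checkpoints/incoming")
 .start("abfss://landing@account.dfs.core.windows.net/processed/"))
```

The alternative mentioned in the title, tagging files, is usually implemented by moving each file to a processed/ prefix after its batch commits.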
3 votes, 1 answer

How to add date to filename during copy activity in Azure Data Factory?

I am pulling a folder from an SFTP server in Azure Data Factory. This folder will always have the same name, so I specified it explicitly in my copy activity, but I am trying to figure out how to add the date that it is being copied over to the current…
user14791234
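The usual approach is a dynamic-content expression on the sink dataset's file (or folder) name. A sketch, where the prefix is a placeholder; concat, formatDateTime, and utcnow are standard ADF expression functions:

```
@concat('myfolder_', formatDateTime(utcnow(), 'yyyyMMdd'), '.csv')
```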
3 votes, 1 answer

Azure Data Factory - extracting information from Data Lake Gen 2 JSON files

I have an ADF pipeline loading raw log data as JSON files into a Data Lake Gen 2 container. We now want to extract information from those JSON files, and I am trying to find the best way to get information from said files. I found that Azure Data…
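ADF's Mapping Data Flows can flatten the JSON inside the pipeline itself; where Spark is available, a short PySpark sketch with placeholder paths and field names shows the same extraction:

```python
# Sketch: read raw JSON logs and project the fields of interest.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("log-extract").getOrCreate()

logs = spark.read.json("abfss://raw@account.dfs.core.windows.net/logs/*.json")
extracted = logs.select(
    col("timestamp"),                                # hypothetical fields
    col("properties.userId").alias("user_id"),
)
extracted.write.mode("overwrite").parquet(
    "abfss://curated@account.dfs.core.windows.net/logs/")
```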
3 votes, 1 answer

Export pandas data frame to Azure Data Lake Storage as a CSV file?

This may be an uncommon question as I believe it has never been asked before, but is it possible to export a Pandas data frame straight to an Azure Data Lake Storage as a CSV file? To add some context, I have a pandas dataframe which gets exported…
jcoke
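It is possible: with the adlfs package installed, pandas can write to an abfs:// URL directly. A sketch with placeholder account and credentials:

```python
# Sketch: pandas -> ADLS Gen2 CSV via the adlfs fsspec backend.
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df.to_csv(
    "abfs://container/exports/data.csv",  # placeholder container and path
    storage_options={"account_name": "account", "account_key": "<storage-key>"},
    index=False,
)
```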
3 votes, 3 answers

How to move files from one folder to another on databricks

I am trying to move a file from one folder to another folder using a Databricks Python notebook. My source is Azure Data Lake Gen1. Suppose my file is present at adl://testdatalakegen12021.azuredatalakestore.net/source/test.csv and I am trying to…
amikm
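On Databricks, dbutils.fs.mv handles this for ADLS Gen1 paths. A sketch using the source path from the question; the destination folder is hypothetical:

```python
# Sketch: mv copies the file to the new path and removes the original.
dbutils.fs.mv(
    "adl://testdatalakegen12021.azuredatalakestore.net/source/test.csv",
    "adl://testdatalakegen12021.azuredatalakestore.net/destination/test.csv",
)
```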
3 votes, 3 answers

writing appending text file from databricks to azure adls gen1

I want to write a kind of log file back to Azure ADLS Gen1. I can write (not append) using dbutils.fs.put(filename, "random text"), but I can't append it using with open("/dbfs/mnt/filename.txt", "a"): f.write("random text"); it gives me error 1 with …
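The append fails because the /dbfs local-file API does not support appends on ADLS Gen1 mounts. A common workaround, sketched with the path from the question, is to read the existing contents and rewrite the whole file:

```python
# Workaround sketch: read-concatenate-rewrite instead of appending.
log_path = "/mnt/filename.txt"

try:
    existing = dbutils.fs.head(log_path, 1024 * 1024)  # up to 1 MB of old log
except Exception:
    existing = ""  # first write: the file does not exist yet

dbutils.fs.put(log_path, existing + "random text\n", True)  # True = overwrite
```

This is only workable while the log stays small; for anything larger, write one file per run instead.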
3 votes, 1 answer

How do I build a Docker image representing Azure's Data Lake (gen 2)?

I'm using the following Docker image for an MS SQL Server ... version: "3.2" services: sql-server-db: image: mcr.microsoft.com/mssql/server:latest ports: - 1433:1433 env_file: ./tests/.my_test_env How do I construct a Docker…
Dave
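There is no official ADLS Gen2 container image; the nearest stand-in is the Azurite emulator, which emulates the Blob API but not the hierarchical-namespace dfs endpoint, so Gen2-specific behavior cannot be fully reproduced. A sketch of the extra compose service, alongside the SQL Server one from the question:

```yaml
# Sketch: add the Azurite storage emulator to the existing compose file.
services:
  azurite:
    image: mcr.microsoft.com/azure-storage/azurite:latest
    ports:
      - "10000:10000"  # Blob service endpoint
```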
3 votes, 2 answers

Connect to Azure Data Lake Storage Gen 2 with SAS token Power BI

I'm trying to connect to an ADLS Gen 2 container with Power BI, but I've only found the option to connect with the key1/2 from the container (Active Directory is not an option in this case). However, I don't want to use those keys since they are…
Rodrigo A
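One workaround, given the connector's limited auth options, is to fetch the file over HTTPS with the SAS token appended, via Power Query's Web.Contents. A sketch with a placeholder URL and SAS string:

```
let
    // Hypothetical file URL with the SAS query string appended
    SasUrl = "https://account.dfs.core.windows.net/container/data.csv?sv=...",
    Source = Csv.Document(Web.Contents(SasUrl))
in
    Source
```

This reads a single file rather than browsing the container, so it suits fixed, known paths.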