9

we are looking at developing a BI solution in Azure to analyse client airline search requests to our system. The requests are stored as xmls and there are around 50 million generated each day. What azure solutions would you recommend to load these to the cloud, analyse this data...trends across time (so keeping historical data), real time system performance/errors..the data is included in the xmls?

I assume we would also need something like express route to transfer the data to the cloud

Alberto Morillo
  • 13,893
  • 2
  • 24
  • 30
DC07
  • 293
  • 5
  • 18
  • How frequently do these reports need to be ran? What would be the maximum amount of time from when the xmls are generated to when they need to be processed? – Preston Martin Feb 01 '18 at 15:57
  • for the current day xmls, real time would be an hour latency and this would generate some kind of alert/kpi data based on requests of that day. where as the historical data would be available via SSAS for example where the user has complete freedom to query what they want when they want – DC07 Feb 01 '18 at 16:27

3 Answers3

7

You can use Azure Blob Storage to upload that data to Azure. If that XML data is complex to analyze it you may have to transform it to a much simpler format like CSV format.

You can use Azure Data Factory to transform XML files to CSV files, then you can load all generated CSV files into corresponding tables in Azure SQL Data Warehouse via Polybase and a developed Stored Procedure activity on Azure Data Factory.

If you want to have a folder on BLOB storage for XML files not processed, another folder for XML files processed and another folder for the CSV files created and you want Azure Data Factory to move files from one folder to another during all the process, then you need to use HDInsight and create a map reduce activity for that purpose.

PolyBase allows to use T-SQL statements to access data stored in Hadoop or Azure Blob Storage and query it in an adhoc fashion. It also can let you query semi-structured data and join the results with relational data sets stored in SQL Data Warehouse. You should be able to create external tables pointing to data in Blob Storage using PolyBase. Then using a simple stored procedure you can select all the data in the external table and load them into the corresponding tables in SQL Data Warehouse.

Once all the transformed data is loaded into the corresponding tables, you can write stored procedures to generate the data for aggregation tables which you can then use to feed PowerBI dashboards.

Hope this helps.

Alberto Morillo
  • 13,893
  • 2
  • 24
  • 30
  • 7
    Hi @Alberto, You mentioned that "You can use Azure Data Factory to transform XML files to CSV files". How to do that? – Fang Liu Oct 11 '18 at 09:48
  • Yes, how can ADF transform XML? – SteveC Jun 25 '19 at 04:41
  • 1
    hey, Alberto, you are quite often online, care to come back to this answer and revise it so people know what you meant\how to achieve what you said? – 4c74356b41 Jul 01 '19 at 15:12
  • whats the usage of polybase vs regular u-sql? – Alex Gordon Sep 08 '19 at 17:46
  • XML is finally planned to come to ADF. the status was changed today https://feedback.azure.com/forums/270578-data-factory/suggestions/17508058-xml-file-type-in-copy-activity-along-with-xml-sc – DC07 Jan 09 '20 at 10:52
6

Official documentation for Azure Data Lake's U-SQL say:

Parse unstructured data such as Web pages and emails, or semi-unstructured data such as XML/JSON.

Then you can combine that with Azure Stream Analytics to create alerts or connect to other Azure services like EventHub, etc..

Expensive, but quite reliable solution would be incorporate also Logic App with some connector to create workflow-like experience.

krs
  • 543
  • 2
  • 17
2

At the moment (2020-06), XML is still not supported by Azure Synapse (Azure DW).

A feasible approach is using Azure Functions to either parse the XML, or translate it to JSON (and then use SQL json_query/value). The second approach does mean losing Xquery functionality.

Classic Azure SQL Server (or it's hyperscale version) does support regular XML queries.

Karel
  • 21
  • 4