2

I have a table with an XML column. The XML structure of each entry is completely flat, without even the parent tag - this is an example of one entry:

<tag1>1.22</tag1>
<tag3>5</tag3>
<tag12>-1.22</tag>

So far, I've been able to do things like this:

SELECT CAST(xml_column AS NVARCHAR(MAX)) as XML_text

And parse the XML on my end. Or I apparently I can write xpath queries to select tags into columns, which is what I want, except there is like 1000 possible tag names, and I don't want to write them all out (and possibly miss one).

In short, how do I go from this:

| ID | XML type column                |
| 1  | <tag1>1</tag1><tag2>2</tag2>   |
| 2  | <tag2>8</tag2><tag34>1</tag34> |

To this:

| ID | tag1 | tag2 | tag34 |
| 1  | 1    | 2    | NULL  |
| 2  | NULL | 8    | 1     |

for any tags I could find in my dataset, without knowing them in advance? I would settle for this as well:

| ID | tag   | value | 
| 1  | tag1  | 1     |
| 1  | tag2  | 2     |
| 2  | tag2  | 8     |
| 2  | tag34 | 1     |
Liudvikas Akelis
  • 1,164
  • 8
  • 15
  • I would also go for a long representation, like this: | ID | tag | value | | 1 | tag1| 1 | | 1 | tag2 | 2 | |2 | tag2 | 8 | |2 | tag34| 1 | – Liudvikas Akelis May 12 '20 at 15:42

2 Answers2

6

The following will transform your flat XML without a root node into a classical EAV list:

DECLARE @tbl TABLE(ID INT IDENTITY, YourXml XML);
INSERT INTO @tbl VALUES
 ('<tag1>1</tag1><tag2>2</tag2>')
,('<tag3>3</tag3><tag4>4</tag4><tag5>5</tag5>');

--The query

SELECT t.ID
      ,AllNodes.value('local-name(.)','nvarchar(max)') AS TagName
      ,AllNodes.value('text()[1]','nvarchar(max)') AS TagValue
FROM @tbl t
CROSS APPLY t.YourXml.nodes('/*') A(AllNodes);

The XQuery function local-name() allows to query for meta data, the text() node represents the element's content.

Shnugo
  • 66,100
  • 9
  • 53
  • 114
1

It is relatively simple to do in MS SQL Server by using XML data type methods and XQuery. SQL Server can handle XML fragments without a root element without any problem.

What you are trying to achieve is called XML shredding. Though you do need to know (1) XML element names or (2) their sequential position.

Check it out method #1.

SQL

-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, xmldata XML);
INSERT INTO @tbl (xmldata) VALUES
(N'<tag1>1</tag1><tag2>2</tag2>')
,(N'<tag2>8</tag2><tag34>1</tag34>');
-- DDL and sample data population, end

-- Shred XML and convert it into a rectangular format
SELECT ID
    , col.value('(tag1/text())[1]','VARCHAR(10)') AS tag1 
    , col.value('(tag2/text())[1]','VARCHAR(10)') AS tag2
    , col.value('(tag3/text())[1]','VARCHAR(10)') AS tag3
    , col.value('(tag34/text())[1]','VARCHAR(10)') AS tag34
FROM @tbl AS tbl
    CROSS APPLY tbl.xmldata.nodes('.') AS tab(col);

Output

+----+------+------+------+-------+
| ID | tag1 | tag2 | tag3 | tag34 |
+----+------+------+------+-------+
|  1 | 1    |    2 | NULL | NULL  |
|  2 | NULL |    8 | NULL | 1     |
+----+------+------+------+-------+
Yitzhak Khabinsky
  • 18,471
  • 2
  • 15
  • 21
  • Your answer, much like MSSQL itself, is functioning and correct, but is there a more scalable way? In my case, I don't even have all the tag names in advance, and even when I do, its > 1000 tags. I am aware of generating and evaluating SQL strings, but this would seem more of a hack than a solution, especially manipulating strings in SQL. (A solution for long-form table would suffice as well) – Liudvikas Akelis May 12 '20 at 17:30
  • Could you please provide more details on how you are planning to use "...> 1000 tags..."? – Yitzhak Khabinsky May 12 '20 at 18:56
  • The rows are forms, and the tags are fields. There are >30 different types of form, with their fields sometimes overlapping, but no single type of form has all the fields, so just to get all the valid field names, I have to manually parse an unspecified number of records in some manual way, if I am to use Xquery. Of course, there should be specs for this, but I am very far downstream from the data source and would hate to miss entries in case they update the spec. – Liudvikas Akelis May 12 '20 at 19:21
  • What about @Shnugo idea to retrieve the data as a key/value pairs? – Yitzhak Khabinsky May 12 '20 at 20:26