
Pretty straightforward. I have an array-like column encoded as a string (varchar) and want to cast it to array (so I can then explode it and manipulate the elements in "long" format).

The two most natural approaches don't seem to work:

-- just returns a length-1 array with a single string element '[1, 2, 3]'
select array('[1, 2, 3]')

-- errors: DataType array is not supported.
select cast('[1, 2, 3]' as array)

The ugly/inelegant/circuitous way to get what I want is:

select explode(split(replace(replace('[1, 2, 3]', '['), ']'), ', '))
-- '1'
-- '2'
-- '3'

(regexp_replace could subsume the two replace calls, but regexes with square brackets are always a pain; ltrim and rtrim, or trim(BOTH '[]' ...), could also be used)

Is there any more concise way to go about this? I'm on Spark 2.3.1.
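For reference, here is what the replace/split pipeline above is doing, sketched in plain Python (outside Spark). The `json.loads` line is an alternative parse of the same string; whether that helps depends on whether a UDF is acceptable in your setup, which the question doesn't say:

```python
import json

s = '[1, 2, 3]'

# Emulate the two replace() calls plus split(): strip brackets, split on ', '
parts = s.strip('[]').split(', ')
print(parts)  # ['1', '2', '3']

# Alternatively, the string is valid JSON, so it can be parsed directly
print(json.loads(s))  # [1, 2, 3]
```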

MichaelChirico
  • I don't think there's a more concise way- use `regexp_replace()` as in [this answer](https://stackoverflow.com/a/52995057/5858851). In your case you can use: `explode(split(regexp_replace("a", r"(^\[)|(]$)", ""), ", ").cast("array"))` – pault Feb 25 '19 at 14:51
  • @pault thanks... I promise I scoured the `[apache-spark-sql]` tag for about 20 minutes and couldn't find that... I added `cast` there for good measure. Anyway it seems I'm out of luck :\ – MichaelChirico Feb 25 '19 at 15:11

1 Answer


I am assuming here that the elements are single digits, but you get the idea:

>>> s = '[1,2,3]'
>>> [c for c in s if c.isdigit()]
['1', '2', '3']

>>> [int(c) for c in s if c.isdigit()]
[1, 2, 3]
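Note that filtering character by character breaks as soon as an element has more than one digit (`'[10, 2]'` would yield `1`, `0`, `2`). A sketch that handles that case, using the standard-library `ast.literal_eval` to parse the list literal safely:

```python
import ast

s = '[10, 2, 3]'
# literal_eval parses the string as a Python list literal,
# preserving multi-digit numbers
values = ast.literal_eval(s)
print(values)  # [10, 2, 3]
```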
Bala