Python - Making words from characters separated by space

Question

I have a JSON file whose description I converted to a string in order to remove HTML tags, but my function prints a list of single unicode characters, as shown below:

[u'', u'', u'', u'c', u'i', u's', u' ', u'b', u'y', u' ', u'd', u'e', u'l', u'o', u'i', u't', u't', u'e', u'']

I want to extract the words from the above output, i.e. cis by deloitte. How can I do this? The code I have tried is shown below:

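import re

def cleaning_data(input_json_data):
    jd = input_json_data['description']
    # lowercase each element of jd
    jd = [x.lower() for x in jd]
    # turn the list into a string so that re.sub can be applied
    jd = str(jd)
    # remove anything that looks like an HTML tag
    jd = re.sub('<[^>]*>', '', jd)
    print(jd)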

Why are you converting the `jd` list into a string with `jd = str(jd)`? — PM 2Ring, Jan 22 '17 at 12:45
The re module works only on a buffer or string, so I had to convert it into a string. Please let me know if there is any other way as well. — Rishabh Rusia, Jan 22 '17 at 13:54
Is `input_json_data['description']` a string, or is it a list of strings? If it's a single string you should've converted it to lowercase with `jd = input_json_data['description'].lower()`. But you can join a list of strings into a string with `''.join(jd)`, as shown in the answer below and in [the linked question](http://stackoverflow.com/questions/12453580/concatenate-item-in-list-to-strings). — PM 2Ring, Jan 22 '17 at 16:12
@PM 2Ring input_json_data is a JSON file from which I am taking the data under the description key. The type of `input_json_data['description']` is unicode, which is why I converted it to a string. If there is a way to convert JSON data into a DataFrame, do let me know; it would be helpful for my task. — Rishabh Rusia, Jan 22 '17 at 17:13
The re module works perfectly fine with a unicode object (aka unicode string). There's no need to convert it to `str`. — lenz, Jan 23 '17 at 14:33
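
To make the comments above concrete: iterating over a string yields its individual characters, so the list comprehension in the question produces a list of one-character strings, and `str()` then turns that list into its printed form before the tags are stripped. A minimal sketch of the alternative, assuming `input_json_data['description']` is a single text string (as the asker states) and using a made-up sample value, would lowercase the string directly and pass it to `re.sub` without any conversion:

```python
import re

def cleaning_data(input_json_data):
    # Assumption: 'description' holds one text string, not a list of strings.
    jd = input_json_data['description'].lower()
    # re.sub accepts unicode text directly; remove anything that looks like a tag.
    return re.sub(r'<[^>]*>', '', jd)

# Hypothetical sample input, not taken from the asker's file.
sample = {'description': u'<p><b>CIS by Deloitte</b></p>'}
print(cleaning_data(sample))
```

Returning the cleaned string instead of printing it inside the function also makes it easier to reuse the result later.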

score 1 · Accepted Answer · answered Jan 22 '17 at 12:28

If it's a list, simply join its elements on an empty string.

a = [u'', u'', u'', u'c', u'i', u's', u' ', u'b', u'y', u' ', u'd', u'e', u'l', u'o', u'i', u't', u't', u'e', u'']
print(''.join(a))

If it's not a list but a string holding the printed form of the list, you can parse it first with literal_eval, like so:

from ast import literal_eval

a = """[u'', u'', u'', u'c', u'i', u's', u' ', u'b', u'y', u' ', u'd', u'e', u'l', u'o', u'i', u't', u't', u'e', u'']"""
a = literal_eval(a)
print(''.join(a))

Output:

cis by deloitte
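
If the individual words are needed rather than one joined string, a small extension of the same idea (not part of the original answer) is to split the joined text on whitespace. The variable names here are only illustrative:

```python
chars = [u'', u'', u'', u'c', u'i', u's', u' ', u'b', u'y', u' ',
         u'd', u'e', u'l', u'o', u'i', u't', u't', u'e', u'']

# Join the characters back into one string; empty strings contribute nothing.
text = ''.join(chars)
# Split on whitespace to get one list entry per word.
words = text.split()

print(words)
```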

Thanks!! This was helpful, @MYGz. — Rishabh Rusia, Jan 22 '17 at 17:15
