I have a text that contains words and numbers. I'll give a representative example of the text:
string = "This is a 1example of the text. But, it only is 2.5 percent of all data"
I'd like to convert it to something like:
"This is a 1 example of the text But it only is 2.5 percent of all data"
So I want to remove punctuation (. or , or any other character in string.punctuation) and also put a space between digits and words when they are concatenated, while keeping floats like 2.5 in my example intact.
I used the following code:
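import re
import string

item = "This is a 1example of the text. But, it only is 2.5 percent of all data"
# Insert a space before every capital letter, then collapse repeated whitespace
item = ' '.join(re.sub(r"([A-Z])", r" \1", item).split())
# This is a start, but not there yet!
#item = ' '.join([x.strip(string.punctuation) for x in item.split() if x not in string.digits])
# Split around runs of digits (keeping them) and rejoin with spaces
item = ' '.join(re.split(r'(\d+)', item))
print(item)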
The result is:
>> "This is a 1 example of the text. But, it only is 2 . 5 percent of all data"
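A minimal check of why 2.5 gets torn apart: in the re.split call above, \d+ matches only runs of digits, so the "." between 2 and 5 ends up as a separate piece and the join puts spaces around it. The second pattern below is a hypothetical alternative, not part of the original code: it makes a fractional part optional so the float is captured as one piece.

```python
import re

text = "is 2.5 percent"

# \d+ matches runs of digits only, so the '.' in 2.5 becomes its own piece
digits_only = re.split(r"(\d+)", text)
print(digits_only)  # ['is ', '2', '.', '5', ' percent']

# Hypothetical alternative: an optional fractional part keeps the float whole
float_aware = re.split(r"(\d+(?:\.\d+)?)", text)
print(float_aware)  # ['is ', '2.5', ' percent']
```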
I'm almost there, but I can't figure out that last piece.
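One possible approach to the requirements stated earlier, sketched under my own assumptions rather than taken from the question: a "." counts as a decimal point only when it sits between two digits, "words" means runs of letters, and the code targets Python 3. The names clean_text, PUNCT, STRAY_PUNCT and DIGIT_LETTER_BOUNDARY are hypothetical. It removes stray punctuation first, then inserts a space at every digit/letter boundary, then collapses whitespace.

```python
import re
import string

# Any single character from string.punctuation, escaped for use inside [...]
PUNCT = "[" + re.escape(string.punctuation) + "]"

# A punctuation character, unless it is a '.' sitting between two digits (as in 2.5)
STRAY_PUNCT = re.compile(r"(?!(?<=\d)\.(?=\d))" + PUNCT)

# Zero-width position between a digit and a letter, in either order (1example, abc2)
# [^\W\d_] means "a word character that is not a digit or underscore", i.e. a letter
DIGIT_LETTER_BOUNDARY = re.compile(r"(?<=\d)(?=[^\W\d_])|(?<=[^\W\d_])(?=\d)")


def clean_text(text):
    """Remove punctuation (keeping decimal points), split digit/letter runs, normalise spaces."""
    text = STRAY_PUNCT.sub("", text)
    text = DIGIT_LETTER_BOUNDARY.sub(" ", text)
    return " ".join(text.split())


if __name__ == "__main__":
    item = "This is a 1example of the text. But, it only is 2.5 percent of all data"
    print(clean_text(item))
```

Because punctuation is deleted rather than replaced by a space, something like "don't" becomes "dont"; substituting " " in STRAY_PUNCT.sub would split it into two words instead, depending on what the data needs.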