1

I have 3 variables that they are text. It can be a sentence, not just a word.

Example:

Variable1: Hey, how are you?. Variable1: I am fine. Variable1: You are nice

I am running this:

LOWER(CONCAT(COALESCE(title," "), COALESCE(hotel_positive," "), COALESCE(hotel_negative,""))) 

Output:


Hey, how are you?.I am fine.You are nice

This is not right as if I have count the words (final goal here), "you?.I" or "fine.you" will come as 1 word.

Any idea on how to improve this?

leftjoin
  • 36,950
  • 8
  • 57
  • 116
Robbie
  • 121
  • 11
  • I answered your question about words count here: https://stackoverflow.com/a/59856125/2700344. Punctuation characters like ? or . should not be considered as word characters And how it hapened that LOWER returns Camel-case in your example?? – leftjoin Jan 22 '20 at 10:12
  • Thanks @leftjoin. It is a bit different as steps are not the same. – Robbie Jan 22 '20 at 11:23
  • Thanks @leftjoin. It is a bit different as steps are not the same. In this case, I need first to to count all the concatenated words by row (Example: "Hey, how are you?.I am fine.You are nice" output is 10 . Due to "you?.I" or "fine.you" scenarios result will be 8. The other case you answered it is to count the frequency of words in the whole dataset, but by row. So, for this post, I need to be able to concatenate avoiding that the end of a sentence and the beginning of the next one comes joined. "you?.I" should be "you?. I" Hope it is clear and thanks for the help! – Robbie Jan 22 '20 at 11:32
  • It is not clear. Do you need to count words in sentences? Please provide the desired result set. Also your output is not what actually should be returned by LOWER function – leftjoin Jan 22 '20 at 11:34
  • For this specific post i just need, that having 'Hey, how are you?' Variable1, 'I am fine.' Variable2, 'You are nice' Variable3, that the output is 10, as I have 10 words putting this 3 variables per row together . I have added Lower function to lowercase all words, nothing with the counting goal itself. – Robbie Jan 22 '20 at 11:38
  • And how it happens that lower function returns camel-case as you stated in your question? – leftjoin Jan 22 '20 at 11:42
  • 1
    "fine.you" will come as 1 word. - this is wrong statement. It will come as two words – leftjoin Jan 22 '20 at 11:43
  • Thanks for the help mate. I see you answer below about counting. Back to the original question related to concatenate 3 text variables using:
     
    
     LOWER(CONCAT(COALESCE(title," "), COALESCE(hotel_positive," "), COALESCE(hotel_negative,"")))  
    
    
    How can I get the following output so "you?.I" or "fine.you" comes with a space after the dot... "Hey, how are you?.I am fine.You are nice"
    – Robbie Jan 22 '20 at 11:52
  • 1
    Well, I already answered it. Look it is in my answer: `concat_ws(' ', Variable1,Variable2,Variable3)` - this inserts spaces between 3 variables – leftjoin Jan 22 '20 at 12:14
  • This works well, thanks!! – Robbie Jan 22 '20 at 14:01
  • Hey @leftjoin, does that solution also add a white space at the end of the last variable? Example "Variable1 Variable2 Variable3 ". After concatenating, I would like to remove white spaces at the end of the output. – Robbie Jan 22 '20 at 16:08
  • No it does not add – leftjoin Jan 22 '20 at 18:43
  • And of course add coalesce or NVL. like in your question. because if variables are nullable, the whole expression will be null also – leftjoin Jan 22 '20 at 18:46
  • Thanks for all the help mate! Do you mean that when "CONCAT(COALESCE(title," "), COALESCE(positive," "), COALESCE(negative,""))", if title is NULL , the output of the whole expression will be NULL? – Robbie Jan 23 '20 at 08:37
  • No. I mean It will be fine with your COALESCE(title," ")... and NULL without coalesce – leftjoin Jan 23 '20 at 10:59

1 Answers1

0
with your_data as(
select 'Hey, how are you?' Variable1, 'I am fine.' Variable2, 'You are nice' Variable3
)

select w.word, count(*) cnt
from
(
select sentences(lower(concat_ws(' ', Variable1,Variable2,Variable3))) sentences from your_data
)d lateral view explode(sentences) s as sentence
   lateral view explode(s.sentence) w as word
group by w.word;

Result:

word    cnt
am      1
are     2
fine    1
hey     1
how     1
i       1
nice    1
you     2

See also this answer: https://stackoverflow.com/a/59856125/2700344

Total count is 10 indeed:

with your_data as(
select 'Hey, how are you?' Variable1, 'I am fine.' Variable2, 'You are nice' Variable3
)

select  count(*) cnt
from
(
select sentences(lower(concat_ws(' ', Variable1,Variable2,Variable3))) sentences from your_data
)d lateral view explode(sentences) s as sentence
   lateral view explode(s.sentence) w as word
;

Result:

10
leftjoin
  • 36,950
  • 8
  • 57
  • 116
  • For this specific post i just need, that having 'Hey, how are you?' Variable1, 'I am fine.' Variable2, 'You are nice' Variable3, that the output is 10, as I have 10 words putting this 3 variables per row together – Robbie Jan 22 '20 at 11:33
  • @Robbie Added total word count – leftjoin Jan 22 '20 at 11:40