Before posting, I tried the Hive sentences function and did some searching, but couldn't get a clear understanding. My question is: on what delimiter does the Hive sentences function break each sentence? The Hive manual says "appropriate boundary"; what does that mean? Below is an example of my attempts. I tried adding a period (.) and an exclamation mark (!) at different points in the sentence, and I'm getting different outputs. Can someone explain this?
With a period (.):
select sentences('Tokenizes a string of natural language text into words and sentences. where each sentence is broken at the appropriate sentence boundary and returned as an array of words.') from dummytable
Output: 1 array
[["Tokenizes","a","string","of","natural","language","text","into","words","and","sentences","where","each","sentence","is","broken","at","the","appropriate","sentence","boundary","and","returned","as","an","array","of","words"]]
With an exclamation mark (!):
select sentences('Tokenizes a string of natural language text into words and sentences! where each sentence is broken at the appropriate sentence boundary and returned as an array of words.') from dummytable
Output: 2 arrays
[["Tokenizes","a","string","of","natural","language","text","into","words","and","sentences"],["where","each","sentence","is","broken","at","the","appropriate","sentence","boundary","and","returned","as","an","array","of","words"]]
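One more variation that might help narrow this down is the same text as the period example, but with the word after the period capitalized. This is only a guess on my part that capitalization plays a role in what counts as an "appropriate boundary"; I have not confirmed it against the Hive source, and dummytable is just the placeholder table used above.

```sql
-- Same input as the period example, except "Where" is capitalized after the period.
-- Hypothesis only: a period followed by an uppercase letter may be treated as a sentence boundary.
select sentences('Tokenizes a string of natural language text into words and sentences. Where each sentence is broken at the appropriate sentence boundary and returned as an array of words.') from dummytable
```

If this returns 2 arrays, it would suggest that a period is treated as a boundary only when the next word starts with an uppercase letter, whereas '!' ends the sentence regardless of what follows.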