
I have a string that contains the abbreviations "e.g." and "Mrs.". I need to split it into sentences, using the period as the delimiter. However, when I split on the period character, "e" and "g" end up in separate array elements instead of the array holding just two sentences.

using System;

string wholeSentence = @"Each paragraph may have a number of sentences, depending on the topic. I can now write topics on sports e.g. basketball, football, baseball and submit it to Mrs. Smith.";
string[] collection = wholeSentence.Split('.');

foreach (string sentence in collection)
{
    Console.WriteLine(sentence);
}

Console.ReadLine();

Output

Each paragraph may have a number of sentences, depending on the topic
 I can now write topics on sports e
g
 basketball, football, baseball and submit it to Mrs
 Smith

How can I correct this?

Joseph

1 Answer

Since you only need to handle "e.g." and "Mrs.", you can temporarily replace them with placeholders that contain no period, for example "e*g*" and "Mrs*". After splitting, check whether each sentence contains one of the placeholders and, if it does, replace the asterisks with periods to restore the original terms. Something like this:

using System;

string wholeSentence = @"Each paragraph may have a number of sentences, depending on the topic. I can now write topics on sports e.g. basketball, football, baseball and submit it to Mrs. Smith.";

// Hide the periods inside the two known abbreviations so that Split('.') ignores them.
string[] collection = wholeSentence.Replace("e.g.", "e*g*").Replace("Mrs.", "Mrs*").Split('.');

foreach (string sentence in collection)
{
    // Restore the periods in any sentence that contains a placeholder.
    if (sentence.Contains("e*g*") || sentence.Contains("Mrs*"))
    {
        Console.WriteLine(sentence.Replace("*", "."));
        continue;
    }
    Console.WriteLine(sentence);
}
Jamshaid K.
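The placeholder idea in the answer can be generalized so that the abbreviations come from a list instead of two hard-coded Replace calls. This is a minimal sketch, assuming C# 9 or later (top-level statements); the `abbreviations` array and the `SplitSentences` helper are hypothetical names, and the list still has to be maintained by hand. It also swaps the `*` placeholder for the control character U+0001, on the assumption that this character never occurs in the input, so that a literal asterisk in the text is not turned into a period.

```csharp
using System;
using System.Collections.Generic;

string wholeSentence = @"Each paragraph may have a number of sentences, depending on the topic. I can now write topics on sports e.g. basketball, football, baseball and submit it to Mrs. Smith.";

// Hypothetical list: every abbreviation that should NOT end a sentence must appear here.
string[] abbreviations = { "e.g.", "Mrs.", "Mr.", "Dr." };

// Hides the periods inside known abbreviations, splits on '.', then restores them.
List<string> SplitSentences(string text, string[] knownAbbreviations)
{
    // Assumption: U+0001 never appears in the input, so it can stand in for '.'.
    const char placeholder = '\u0001';

    foreach (string abbreviation in knownAbbreviations)
    {
        text = text.Replace(abbreviation, abbreviation.Replace('.', placeholder));
    }

    var sentences = new List<string>();
    foreach (string piece in text.Split('.'))
    {
        // Put the hidden periods back and drop the leading space left by the split.
        string sentence = piece.Replace(placeholder, '.').Trim();
        if (sentence.Length > 0)
        {
            sentences.Add(sentence);
        }
    }
    return sentences;
}

foreach (string sentence in SplitSentences(wholeSentence, abbreviations))
{
    Console.WriteLine(sentence);
}
```

As with `Split('.')` in the question, the final period of each sentence is consumed by the split, and empty pieces (such as the one after the last period) are skipped.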
  • You have hard-coded the replacements for "e.g." and "Mrs." in your solution. How can it be made dynamic for other prefixes like "Mr." and for natural language like "AM" or "P.M."? – Joseph Nov 03 '20 at 14:00
  • The thing is, it can't. It was written for the two words that appear in this sentence. How else would the code detect such words? A sentence can contain many abbreviations, and no one can list them all. – Jamshaid K. Nov 03 '20 at 14:25
  • And you cannot call AM and PM natural language: they are abbreviations of `ante merīdiem` and `post meridiem` respectively, and there are thousands, if not millions, of abbreviations in the English language. – Jamshaid K. Nov 03 '20 at 14:28
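A different technique from the answer's placeholder swap is to split with a regular expression that treats a period as a sentence end only when it is followed by whitespace or the end of the text and is not immediately preceded by a listed abbreviation. This relies on .NET's support for variable-length negative lookbehind. It is a sketch assuming C# 9 or later; `abbreviations` and `SplitSentences` are hypothetical names. As Jamshaid K. points out, it still depends on a hand-written list, and a sentence that genuinely ends with a listed abbreviation (for example "...at 5 p.m.") will not be split there.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

string wholeSentence = @"Each paragraph may have a number of sentences, depending on the topic. I can now write topics on sports e.g. basketball, football, baseball and submit it to Mrs. Smith.";

// Hypothetical list, written WITHOUT the final period; matching is case-insensitive,
// so "a.m" also covers "A.M.".
string[] abbreviations = { "e.g", "Mrs", "Mr", "Dr", "a.m", "p.m" };

List<string> SplitSentences(string text, IEnumerable<string> knownAbbreviations)
{
    // Builds a pattern of the form (?<!\b(?:e\.g|Mrs|...))\.(?=\s|$):
    // a period followed by whitespace or end of text, not preceded by a listed abbreviation.
    // Periods inside "e.g." are skipped because they are followed by a letter, not whitespace.
    string alternatives = string.Join("|", knownAbbreviations.Select(a => Regex.Escape(a)));
    string pattern = @"(?<!\b(?:" + alternatives + @"))\.(?=\s|$)";

    return Regex.Split(text, pattern, RegexOptions.IgnoreCase)
                .Select(s => s.Trim())       // remove the space that followed each period
                .Where(s => s.Length > 0)    // drop the empty piece after the last period
                .ToList();
}

foreach (string sentence in SplitSentences(wholeSentence, abbreviations))
{
    Console.WriteLine(sentence);
}
```

The pattern contains only non-capturing groups and lookarounds, so `Regex.Split` does not insert any matched text into the result.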