23

I need to take a string and capitalize the words in it. Certain words ("in", "at", etc.) are not capitalized, and are changed to lower case if encountered. The first word should always be capitalized. Last names like "McFly" are out of scope for now, so the same rule applies to them: only the first letter is capitalized.

For example: "of mice and men By CNN" should be changed to "Of Mice and Men by CNN". (Therefore ToTitleCase won't work here.)

What would be the best way to do that?

I thought of splitting the string by spaces, going over each word, changing it if necessary, and concatenating it to the previous word, and so on. That seems pretty naive, and I was wondering if there's a better way to do it. I am using .NET 3.5.

Peter Mortensen
Nir
  • How should the program understand that CNN should remain all uppercase? – Øyvind Bråthen Nov 30 '10 at 15:48
  • 4
    @Chris: I think he means ToTitleCase...? See here http://msdn.microsoft.com/en-us/library/system.globalization.textinfo.totitlecase.aspx and 'remarks' for why it isn't applicable. – David Hedlund Nov 30 '10 at 15:49
  • @Øyvind: The program could potentially respect existing uppercase letters, and only ever change casing to uppercase, not lowercase. The only place in the example where that wouldn't work is where "By" is turned into "by". When "by" is encountered in all lowercase, I think it should be matched against a dictionary of words to leave as lowercase, but when it's cased "By", I really think it should be left that way, because we couldn't distinguish it from the potential city (or whatever) called "By". – David Hedlund Nov 30 '10 at 15:51
  • Looks like ToTitleCase could do it with some minor tweaks, assuming you just want English and no other cultures. – Chris S Nov 30 '10 at 15:54
  • 2
    possible duplicate of [How to Capitalize names](http://stackoverflow.com/questions/5057793/how-to-capitalize-names) – Druid Jun 09 '15 at 14:25

10 Answers

23

Use

Thread.CurrentThread.CurrentCulture.TextInfo.ToTitleCase("of mice and men By CNN");

to convert the string to title case; you can then loop through the excluded keywords, as you mentioned, and change them back to lower case.

Liam
dhinesh
  • 1
    This is close, but if you look at the OP's example, the "And" should not be capitalized. This could be the first step in the algorithm, though: do the ToTitleCase conversion, then look for special-case words based on a dictionary and revert them back to lower case? – Paul Sasik Nov 30 '10 at 15:55
19

Depending on how often you plan on doing the capitalization I'd go with the naive approach. You could possibly do it with a regular expression, but the fact that you don't want certain words capitalized makes that a little trickier.

You can do it with two passes using regular expressions:

var result = Regex.Replace("of mice and men isn't By CNN", @"\b(\w)", m => m.Value.ToUpper());
result = Regex.Replace(result, @"(\s(of|in|by|and)|\'[st])\b", m => m.Value.ToLower(), RegexOptions.IgnoreCase);

This outputs Of Mice and Men Isn't by CNN.

The first expression uppercases the first character after every word boundary. The second one lowercases any listed word that follows white space, as well as an 's or 't that the first pass uppercased after an apostrophe.

The downside of this approach is that you're using regexes (now you have two problems) and you'll need to keep the list of excluded words up to date. My regex-fu isn't good enough to do it in one expression, but it might be possible.
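
For what it's worth, the two passes can probably be folded into one by letting the match evaluator decide what to do with each word. The following is only a sketch under a few assumptions: the class name `SinglePassTitleCase` and its exclusion list are made up for illustration, apostrophes are treated as part of a word so that "isn't" is a single match, and the input is assumed to have no leading white space (otherwise the first word would not be recognized as first).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SinglePassTitleCase
{
    // Hypothetical exclusion list; lookups ignore case.
    private static readonly HashSet<string> SmallWords =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "of", "in", "by", "and" };

    public static string Convert(string input)
    {
        // A "word" here is a run of word characters and apostrophes.
        return Regex.Replace(input, @"[\w']+", m =>
        {
            string word = m.Value;

            // Excluded words are lowercased unless they start the string.
            if (m.Index > 0 && SmallWords.Contains(word))
                return word.ToLowerInvariant();

            // Everything else gets an uppercase first character; the rest is left alone.
            return char.ToUpperInvariant(word[0]) + word.Substring(1);
        });
    }
}
```

Calling `SinglePassTitleCase.Convert("of mice and men isn't By CNN")` should give the same result as the two-pass version. The exclusion list still has to be maintained by hand.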

Peter Mortensen
jonnii
  • 1
    This has a small bug. For a string like "Tom's thing", the result is "Tom'S Thing". – TMC Jun 21 '12 at 05:49
8

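From an answer to another question, How to Capitalize names:

// Requires: using System; using System.Globalization; using System.Threading;
string title = "of mice and men By CNN"; // sample input taken from the question

CultureInfo cultureInfo = Thread.CurrentThread.CurrentCulture;
TextInfo textInfo = cultureInfo.TextInfo;

Console.WriteLine(textInfo.ToTitleCase(title)); // all-uppercase words such as "CNN" are left untouched
Console.WriteLine(textInfo.ToLower(title));
Console.WriteLine(textInfo.ToUpper(title));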
Peter Mortensen
Vasyl Senko
4

Use ToTitleCase() first, then keep a list of the applicable words and replace each occurrence with its all-lower-case version (provided that list is small).

The list of applicable words could be kept in a dictionary and looped through pretty efficiently, replacing with the .ToLower() equivalent.
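
A minimal sketch of that idea follows. It makes a few assumptions that are not in the answer: the class name `TitleCaseWithExceptions` and the word list are hypothetical, the invariant culture is used so the result does not depend on the current thread, and words are compared one by one after splitting on spaces rather than through string.Replace, so that "in" inside "Into" is not touched.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class TitleCaseWithExceptions
{
    // Hypothetical list of words to keep in lower case.
    private static readonly string[] SmallWords = { "of", "in", "at", "by", "and" };

    public static string Convert(string input)
    {
        // Step 1: let the framework capitalize every word.
        TextInfo textInfo = CultureInfo.InvariantCulture.TextInfo;
        string[] words = textInfo.ToTitleCase(input).Split(' ');

        // Step 2: revert the excluded words, skipping the first word.
        for (int i = 1; i < words.Length; i++)
        {
            if (SmallWords.Contains(words[i], StringComparer.OrdinalIgnoreCase))
                words[i] = words[i].ToLowerInvariant();
        }

        return string.Join(" ", words);
    }
}
```

Note that ToTitleCase lowercases the tail of mixed-case words, so "McFly" would come out as "Mcfly", which matches the rule stated in the question.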

Peter Mortensen
funkymushroom
  • 1
    I know the answer is old, but ToTitleCase() doesn't exist anymore in .NET Windows Store. You will need to rely on ToUpper() only and possibly some Regex for future applications. – Cœur Jun 21 '13 at 13:24
  • Thank you for the update, much appreciated. – funkymushroom Oct 04 '21 at 19:01
2

A slight improvement on jonnii's answer:

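var result = Regex.Replace(s.Trim(), @"\b(\w)", m => m.Value.ToUpper());
// The lookahead (?=\s) leaves the trailing space unconsumed, so two excluded words in a row ("and in") both match.
result = Regex.Replace(result, @"\s(of|in|by|and)(?=\s)", m => m.Value.ToLower(), RegexOptions.IgnoreCase);
// Undo the capitalization of a possessive 's.
result = result.Replace("'S", "'s");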
Peter Mortensen
ilan
2

Try something like this:

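// Requires: using System; using System.Linq;
public static string TitleCase(string input, params string[] dontCapitalize)
{
    var split = input.Split(' ');
    for (int i = 0; i < split.Length; i++)
    {
        // The first word is always capitalized; excluded words are matched
        // case-insensitively and lowercased, so "By" becomes "by".
        bool excluded = i > 0 && dontCapitalize.Contains(split[i], StringComparer.OrdinalIgnoreCase);
        split[i] = excluded ? split[i].ToLower() : CapitalizeWord(split[i]);
    }
    return string.Join(" ", split);
}

public static string CapitalizeWord(string word)
{
    // Guard against empty entries produced by consecutive spaces.
    if (string.IsNullOrEmpty(word)) return word;
    return char.ToUpper(word[0]) + word.Substring(1);
}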

You can then later update the CapitalizeWord method if you need to handle complex surnames. Add those methods to a class and use it like this:

SomeClass.TitleCase("a test is a sentence", "is", "a"); // returns "A Test is a Sentence"
Răzvan Flavius Panda
m0sa
1

The simplest solution (for English sentences) would be to:

  • Split the sentence on space characters: sentence.Split(' ')
  • Loop through each item
  • Capitalize the first letter of each item: char.ToUpper(item[0])
  • Join the items back into a string, separated by spaces
  • Repeat this process with "." and "," using that new string.
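
The steps above can be sketched roughly as follows. The names `SimpleCapitalizer` and `CapitalizeAfter` are invented for this illustration, and the sketch capitalizes every word, so the excluded words from the question would still need separate handling.

```csharp
using System;

public static class SimpleCapitalizer
{
    // Uppercases the first character of every piece between separators.
    private static string CapitalizeAfter(string text, char separator)
    {
        string[] parts = text.Split(separator);
        for (int i = 0; i < parts.Length; i++)
        {
            // Empty pieces appear when separators are adjacent; skip them.
            if (parts[i].Length > 0)
                parts[i] = char.ToUpper(parts[i][0]) + parts[i].Substring(1);
        }
        return string.Join(separator.ToString(), parts);
    }

    public static string Capitalize(string sentence)
    {
        // Same process three times: spaces first, then "." and ",".
        string result = CapitalizeAfter(sentence, ' ');
        result = CapitalizeAfter(result, '.');
        return CapitalizeAfter(result, ',');
    }
}
```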
Chris S
1

You can keep the words you would like to ignore in a Dictionary and split the sentence into words (.Split(' ')). For each word, check whether it exists in the dictionary: if it does not, capitalize its first character and add the word to a string buffer; if it does, simply add the word to the string buffer as it is.
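
Here is one way this could look, offered as a sketch rather than a finished implementation. The class name `BufferedTitleCase` is hypothetical, the ignored words are assumed to be stored in lower case, and two details go beyond the description above to match the question: the first word is always capitalized, and ignored words are lowercased rather than copied unchanged.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class BufferedTitleCase
{
    // ignoredWords is assumed to contain lower-case entries only.
    public static string Convert(string input, ICollection<string> ignoredWords)
    {
        var buffer = new StringBuilder(input.Length);
        string[] words = input.Split(' ');

        for (int i = 0; i < words.Length; i++)
        {
            string word = words[i];
            if (i > 0) buffer.Append(' ');      // restore the separator
            if (word.Length == 0) continue;     // consecutive spaces

            if (i > 0 && ignoredWords.Contains(word.ToLowerInvariant()))
                buffer.Append(word.ToLowerInvariant());
            else
                buffer.Append(char.ToUpperInvariant(word[0])).Append(word.Substring(1));
        }

        return buffer.ToString();
    }
}
```

A HashSet<string> is a natural fit for the collection, for example `BufferedTitleCase.Convert("of mice and men By CNN", new HashSet<string> { "of", "and", "by" })`.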

npinti
1

A non-clever approach that handles the simple case:

var s = "of mice and men By CNN";
var sa = s.Split(' ');
for (var i = 0; i < sa.Length; i++)
    if (sa[i].Length > 0) // skip empty entries caused by consecutive spaces
        sa[i] = sa[i].Substring(0, 1).ToUpper() + sa[i].Substring(1);
var sout = string.Join(" ", sa);
Console.WriteLine(sout); // Of Mice And Men By CNN
D'Arcy Rittich
0

You should create your own function like you're describing.

slim