Converting string to proper title case

Question

I have this exercise:

Write a Title class which is initialized with a string.

It has one method -- fix -- which should return a title-cased version of the string:

Title.new("a title of a book").fix = A Title of a Book
You'll need to use conditional logic - if and else statements - to make this work.
Make sure you read the test specification carefully so you understand the conditional logic to be implemented.

Some methods you'll want to use:

String#downcase String#capitalize Array#include?

Also, here is the Rspec, I should have included that:

describe "Title" do
  describe "fix" do
    it "capitalizes the first letter of each word" do
      expect( Title.new("the great gatsby").fix ).to eq("The Great Gatsby")
    end
    it "works for words with mixed cases" do
      expect( Title.new("liTTle reD Riding hOOD").fix ).to eq("Little Red Riding Hood")
    end
    it "downcases articles" do
      expect( Title.new("The lord of the rings").fix ).to eq("The Lord of the Rings")
      expect( Title.new("The sword And The stone").fix ).to eq("The Sword and the Stone")
      expect( Title.new("the portrait of a lady").fix ).to eq("The Portrait of a Lady")
    end
    it "works for strings with all uppercase characters" do
      expect( Title.new("THE SWORD AND THE STONE").fix ).to eq("The Sword and the Stone")
    end
  end
end

Thank you @simone, I incorporated your suggestions:

class Title
  attr_accessor :string

  def initialize(string)
    @string = string
  end

  IGNORE = %w(the of a and)

  def fix
    s = string.split(' ')
    s.map do |word|
      words = word.downcase
      if IGNORE.include?(word)
        words
      else
        words.capitalize
      end
    end
    s.join(' ')
  end
end

Although I'm still running into errors when running the code:

expected: "The Great Gatsby"
 got: "the great gatsby"

(compared using ==)

exercise_spec.rb:6:in `block (3 levels) in <top (required)>'

From my beginner's perspective, I cannot see what I'm doing wrong?

Final edit: I just wanted to say thanks for all the effort everyone put into assisting me earlier. Here is the final working code I was able to produce:

class Title
  attr_accessor :string

  def initialize(string)
    @string = string
  end

  def fix
    word_list = %w{a of and the}

    a = string.downcase.split(' ')
    b = []

    a.each_with_index do |word, index|
      if index == 0 || !word_list.include?(word)
        b << word.capitalize
      else
        b << word
      end
    end
    b.join(' ')
  end
end

I do not like this "invent circle once again" tasks. This is why we have http://apidock.com/rails/String/titleize — Dawid Gosławski, Feb 02 '15 at 19:42
Are you sure `include?` is what you want? For example, should `Theodore` be upper or lowercase? Right now you are saying lowercase. — ptd, Feb 02 '15 at 19:43
don't replace the question with your answer, it renders the question useless — vol7ron, Feb 02 '15 at 22:42
See the highest-voted answer on this article: http://stackoverflow.com/questions/15078964/ruby-titleize-how-do-i-ignore-smaller-words-like-and-the-or-etc — Eddie Prislac, Feb 02 '15 at 22:46
@Matt is a self-admitted Ruby newbie. Recommending that he use `titleize` here reminds me of the argument that children don't need to understand arithmetic because they have calculators for that. — Cary Swoveland, Feb 03 '15 at 01:39
Title case is not computable, all you can achieve is an approximation. I'm just mentioning this because you asked about _proper_ title case. Here's an example: _Born on the 4th of July_, but _Keep On Rockin'_ — Ubik, Feb 03 '15 at 22:04

score 2 · Answer 1 · answered Feb 02 '15 at 19:56

Here's a possible solution.

class Title
  attr_accessor :string

  IGNORES = %w( the of a and )

  def initialize(string)
    @string = string
  end

  def fix
    tokens = string.split(' ')
    tokens.map do |token|
      token = token.downcase

      if IGNORES.include?(token)
        token
      else
        token.capitalize
      end
    end.join(" ")
  end

end

Title.new("a title of a book").fix

Your starting point was good. Here are a few improvements:

The comparison is always done in lower case. This simplifies the if-condition.
The ignored words are kept in an array. This also simplifies the if-condition, because you don't need a separate branch for each ignored string (there could be hundreds).
I use map to transform the tokens. Using blocks with enumerables to loop over items is a common Ruby pattern.
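
One detail about the last point is easy to trip over: map returns a new array and leaves the receiver untouched, so its result has to be chained, assigned or returned. A minimal sketch, using a made-up `words` array that is not part of the code above:

```ruby
words = %w[the great gatsby]

# map builds and returns a NEW array; `words` itself is not changed.
capitalized = words.map { |word| word.capitalize }

puts words.join(' ')        # the original array is untouched
puts capitalized.join(' ')  # the mapped copy carries the changes

# map! (with a bang) is the in-place variant, if mutation is really wanted.
words.map! { |word| word.capitalize }
puts words.join(' ')
```

Calling map on one line and then joining the original array on the next discards the mapped result, which would explain getting an unchanged string back.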

Simone Carletti

This solution fails to always capitalize the first word of the title. – Ajedi32 Feb 02 '15 at 20:18
Further to @Ajedi32's comment, I suggest `first_word, remaining_words = string.split` and deal with `first_word` separately. You could also use an index for the word's offset, but I think what I suggested is simpler and clearer. – Cary Swoveland Feb 02 '15 at 20:41
Thanks for the note @Ajedi32. If that's the case, it's an easy fix, as suggested by Cary Swoveland. I'll leave to the user the exercise to fix it. – Simone Carletti Feb 02 '15 at 21:04
I forgot the splat: `*remaining_words`. – Cary Swoveland Feb 02 '15 at 23:29

vol7ron · Answer 2 · 2015-02-03T21:00:43.463

def fix
   string.downcase.split(/(\s)/).map.with_index{ |x,i| 
     ( i==0 || x.match(/^(?:a|is|of|the|and)$/).nil? ) ? x.capitalize : x 
   }.join
end

Meets all conditions:

a, is, of, the, and all lowercase
capitalizes all other words
all first words are capitalized

Explanation

string.downcase calls one operation to make the string you're working with all lower case
.split(/(\s)/) takes the lower case string and splits it on white-space (space, tab, newline, etc) into an array, making each word an element of the array; surrounding the \s (the delimiter) in the parentheses also retains it in the array that's returned, so we don't lose that white-space character when rejoining
.map.with_index{ |x,i| iterates over that returned array, where x is the value and i is the index number; each iteration returns an element of a new array; when the loop is complete you will have a new array
( i==0 || x.match(/^(?:a|is|of|the|and)$/).nil? ) if it's the first element in the array (index of 0), or the word does not match a, is, of, the, or and -- that is, the match is nil -- then x.capitalize (capitalize the word); otherwise (it did match one of the ignored words) just return the word/value, x, unchanged
.join take our new array and combine all the words into one string again
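
The steps above can be traced on a small input. This is only a sketch: the sample title and the names `tokens`, `small_words` and `fixed` are made up for illustration.

```ruby
title = "the lord\tof the rings"

# Splitting on a captured delimiter keeps each whitespace character
# as its own element, so words and separators alternate.
tokens = title.downcase.split(/(\s)/)
p tokens

small_words = /^(?:a|is|of|the|and)$/

fixed = tokens.map.with_index do |token, i|
  # Capitalize the first token and every token that is NOT a small word.
  # Whitespace tokens never match, and capitalizing them changes nothing.
  (i == 0 || token.match(small_words).nil?) ? token.capitalize : token
end.join

p fixed
```

Because the separators travel through the array untouched, a tab in the input is still a tab in the output.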

Additional

Ordinarily, what is inside parentheses in a regex is considered a capture group, meaning that if the pattern inside is matched, a special variable will retain the value after the regex operations have finished. In some cases, such as the \s, we wanted to capture that value because we reuse it; in other cases, like our ignore words, we need to match but do not need to capture. To avoid capturing a match you can place ?: at the beginning of the group to tell the regex engine not to retain the value. There are many benefits of this that fall outside the scope of this answer.
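
A short sketch of the difference, using made-up sample strings and variable names:

```ruby
# Capturing group: the matched text is kept and is available as capture 1.
capturing = "the ring".match(/^(a|of|the)\s/)
p capturing[1]        # the captured word
p capturing.captures  # all captures, as an array

# Non-capturing group: same match, but nothing is retained.
non_capturing = "the ring".match(/^(?:a|of|the)\s/)
p non_capturing[0]        # the whole match is still available
p non_capturing.captures  # empty, because no group captured anything

# The same distinction decides whether split keeps the delimiter.
kept    = "a b".split(/(\s)/)    # delimiter kept as an element
dropped = "a b".split(/(?:\s)/)  # delimiter dropped
p kept, dropped
```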

@ptd hope you also downvoted the other answer as that also "fails to answer the question" ;) Also the question has been changed -- there is no question. I suggest not downvoting, as this is a legitimate answer with the appearance of it being incorrect. — vol7ron, Feb 02 '15 at 22:47
Considering that the OP says, "From my beginner's perspective...", don't you think you need to provide some explanation with your answer? — Cary Swoveland, Feb 03 '15 at 00:14
In title case, verbs (including is, was, be) are always capitalized. — Ubik, Feb 03 '15 at 22:07
@Ubik that may be, I didn't look up the rules, and he didn't give the spec. The answer isn't in stone though, he can make changes as he sees fit, but I'll update when I can, or you could as well ;) — vol7ron, Feb 03 '15 at 22:10

Cary Swoveland · Answer 3 · 2015-02-03T00:11:27.770

There are two ways you can approach this problem:

break the string into words, possibly modify each word and join the words back together; or
use a regular expression.

I will say something about the latter, but I believe your exercise concerns the former--which is the approach you've taken--so I will concentrate on that.

Split string into words

You use String#split(' ') to split the string into words:

str = "a title of a\t   book"
a = str.split(' ')
  #=> ["a", "title", "of", "a", "book"]

That's fine, even when there's extra whitespace, but one normally writes that:

str.split
  #=> ["a", "title", "of", "a", "book"]

For this string, both ways give the same result as

str.split(/\s+/)
  #=> ["a", "title", "of", "a", "book"]
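
The three forms only diverge when the string starts with whitespace. A small sketch, using a made-up `padded` string:

```ruby
padded = "  a title of a\t   book"

plain  = padded.split         # leading whitespace is ignored
spaced = padded.split(' ')    # behaves like the plain form
regex  = padded.split(/\s+/)  # the regex form yields a leading empty string

p plain
p spaced
p regex
```

That leading empty string is one reason to prefer the plain `split` here.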

Notice that I've used the variable a to signify that an array is returned. Some may feel that is not sufficiently descriptive, but I believe it's better than s, which is a little confusing. :-)

Create enumerators

Next you send the method Enumerable#each_with_index to create an enumerator:

enum0 = a.each_with_index
  # => #<Enumerator: ["a", "title", "of", "a", "book"]:each_with_index>

To see the contents of the enumerator, convert enum0 to an array:

enum0.to_a
  #=> [["a", 0], ["title", 1], ["of", 2], ["a", 3], ["book", 4]]

You've used each_with_index because the first word--the one with index 0-- is to be treated differently than the others. That's fine.

So far, so good, but at this point you need to use Enumerable#map to convert each element of enum0 to an appropriate value. For example, the first value, ["a", 0] is to be converted to "A", the next is to be converted to "Title" and the third to "of".

Therefore, you need to send the method Enumerable#map to enum0:

enum1 = enum0.map
  #=> #<Enumerator: #<Enumerator: ["a", "title", "of", "a",
  #     "book"]:each_with_index>:map>
enum1.to_a
  #=> [["a", 0], ["title", 1], ["of", 2], ["a", 3], ["book", 4]]

As you see, this creates a new enumerator, which you could think of as a "compound" enumerator.

The elements of enum1 will be passed into the block by Enumerator#each.

Invoke the enumerator and join

You want to capitalize the first word and every other word that is not an article. We therefore must define some articles:

articles = %w{a of it} # and more
  #=> ["a", "of", "it"]

b = enum1.each do |w,i|
  case i
  when 0 then w.capitalize
  else articles.include?(w) ? w.downcase : w.capitalize
  end
end
  #=> ["A", "Title", "of", "a", "Book"]

and lastly we join the array with one space between each word:

b.join(' ')
  #=> "A Title of a Book"

Review details of calculation

Let's go back to the calculation of b. The first element of enum1 is passed into the block and assigned to the block variables:

w, i = ["a", 0] #=> ["a", 0] 
w               #=> "a" 
i               #=> 0

so we execute:

case 0
when 0 then "a".capitalize
else articles.include?("a") ? "a".downcase : "a".capitalize
end

which returns "a".capitalize => "A". Similarly, when the next element of enum1 is passed to the block:

w, i = ["title", 1] #=> ["title", 1] 
w               #=> "title" 
i               #=> 1 

case 1
when 0 then "title".capitalize
else articles.include?("title") ? "title".downcase : "title".capitalize
end

which returns "Title" since articles.include?("title") => false. Next:

w, i = ["of", 2] #=> ["of", 2] 
w               #=> "of" 
i               #=> 2 

case 2
when 0 then "of".capitalize
else articles.include?("of") ? "of".downcase : "of".capitalize
end

which returns "of" since articles.include?("of") => true.

Chaining operations

Putting this together, we have:

str.split.each_with_index.map do |w,i|
  case i
  when 0 then w.capitalize
  else articles.include?(w) ? w.downcase : w.capitalize
  end
end
  #=> ["A", "Title", "of", "a", "Book"]
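
Wrapped in the class the exercise asks for, the chain might look like the sketch below. The `SMALL_WORDS` constant and the up-front `downcase` are assumptions added here, not part of the code above. The `downcase` is there because input such as "THE SWORD AND THE STONE" would otherwise never match a lowercase word list.

```ruby
class Title
  # Assumed list; the spec in the question only exercises these four words.
  SMALL_WORDS = %w[a and of the].freeze

  def initialize(string)
    @string = string
  end

  def fix
    # Downcase first so that "THE" and "And" are recognised as small words.
    words = @string.downcase.split
    words.each_with_index.map do |word, i|
      # The first word is always capitalized, small words elsewhere are not.
      (i.zero? || !SMALL_WORDS.include?(word)) ? word.capitalize : word
    end.join(' ')
  end
end

puts Title.new("THE SWORD AND THE STONE").fix
puts Title.new("the portrait of a lady").fix
```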

Alternative calculation

Another way to do this, without using each_with_index, is like this:

first_word, *remaining_words = str.split
first_word
  #=> "a" 
remaining_words
  #=> ["title", "of", "a", "book"] 

"#{first_word.capitalize} #{ remaining_words.map { |w|
  articles.include?(w) ? w.downcase : w.capitalize }.join(' ') }"
   #=> "A Title of a Book"

Using a regular expression

str = "a title of a book"

str.gsub(/(^\w+)|(\w+)/) do
  $1 ? $1.capitalize :
    articles.include?($2) ? $2 : $2.capitalize
end
  #=> "A Title of a Book"

The regular expression "captures" [(...)] a word at the beginning of the string [(^\w+)] or [|] a word that is not necessarily at the beginning of string [(\w+)]. The contents of the two capture groups are assigned to the global variables $1 and $2, respectively.

Therefore, stepping through the words of the string, the first word, "a", is captured by capture group #1, so (\w+) is not evaluated. Each subsequent word is not captured by capture group #1 (so $1 => nil), but is captured by capture group #2. Hence, if $1 is not nil, we capitalize the (first) word (of the sentence); else we capitalize $2 if the word is not an article and leave it unchanged if it is an article.

You are beyond helpful. This is such a thorough explanation for a beginner like me! — Matt White, Feb 03 '15 at 00:34

score 0 · Answer 4 · answered May 28 '15 at 03:11

Here is another possible solution to the problem

class Title
  attr_accessor :str

  def initialize(str)
    @str = str
  end

  def fix
    s = str.downcase.split(" ") # downcase the whole string, then split it into an array of words
    words_cap = []
    ignore = %w( of a and the ) # words that should stay lowercase
    s.each do |item|
      if ignore.include?(item) # ignored words are kept as they are
        words_cap << item
      else
        words_cap << item.capitalize
      end
    end
    sentence = words_cap.join(" ") # join the array of words back into one string
    # Capitalize the first letter of the sentence, in case the first word was in the ignore list.
    sentence.slice(0, 1).capitalize + sentence.slice(1..-1)
  end
end
