How to access TextBox object text in Word document using Ruby WIN32OLE

Question

I just put together a small script for a team of users that collects all PDF and DOC* files in a directory and parses them for hyperlinks. The PDF section works as intended; however, the Word doc I was given for design (plain text) differs from the actual Word documents they are using (the text is in a TextBox element).

I noticed that when I tried to gather sentences/words from these new files, all I received was the text for the background image of the file (normally a special character).

I have browsed through the API and tried quite a few methods listed in ole_methods, but have not yet found a way to access the TextBox to pull the required text out of it.

I know that I can convert the Word files to PDF and shortcut it that way (tested and proven), but that entails quite a bit of file management that I'd like to avoid in favor of the simpler solution: accessing the text directly.

You can replicate the element in a document using the Draw Text Box function (Word 2007+).

Does anyone know how to access this element, or better yet find ALL text in the document regardless of what element it is located in?

require 'win32ole'
word = WIN32OLE.new('Word.Application')
doc = word.Documents.Open(file)
doc.Sentences.each { |x| puts x.text }

Adam

score 3 · Accepted Answer · answered Dec 12 '10 at 09:45

Assuming that something equivalent to doc.Sentences.each { |x| puts x.text } but for textboxes will suffice, then this should work for you:

doc.Shapes.each do |x|
  puts x.TextFrame.TextRange.text
end

It looks quite a bit messier than how you went through the sentences, but x.TextFrame.TextRange.text will return the actual text contained in each text box.
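If the document also contains shapes that are not text boxes (pictures, for example), the loop above may fail on them. Here is a minimal sketch of a more defensive version. The helper name text_box_texts is hypothetical, and the rescue rests on the assumption that such shapes raise an error when their text frame is accessed; I have not checked this against every shape type.

```ruby
# Hypothetical helper: collect the text of every shape in a Word document.
# `doc` is expected to be the object returned by word.Documents.Open(file).
def text_box_texts(doc)
  texts = []
  doc.Shapes.each do |shape|
    begin
      text = shape.TextFrame.TextRange.Text
    rescue StandardError
      # Assumption: shapes without a usable text frame raise here; skip them.
      next
    end
    # Drop the trailing paragraph mark and surrounding whitespace.
    text = text.to_s.strip
    texts << text unless text.empty?
  end
  texts
end
```

With a document opened as in the question, text_box_texts(doc).each { |t| puts t } should print one entry per non-empty text box, and the same hyperlink parsing used for the plain-text sentences can then be run over each entry.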

Paul Hoffer

That worked, thank you! doc.Shapes.each { |s| puts s.TextFrame.TextRange.text } – adam reed Dec 12 '10 at 18:54
