
I'm using MATLAB to programmatically create a Microsoft Word document on Windows. In general this solution works fine, but it is having trouble with non-ASCII text. For example, take this code:

wordApplication = actxserver('Word.Application');
wordApplication.Visible = 1;
wordApplication.Documents.Add;
selection = wordApplication.Selection;
umbrella = char(9730);
disp(umbrella)
selection.TypeText(umbrella)

The Command Window displays the umbrella character correctly, but the character in the Word document is the "question mark in a box" missing character symbol. I can cut-and-paste the character from the Command Window into Word, so that character is indeed available in that font.

The TypeText method must be assuming ASCII. There are resources on how to set Unicode flags for similar operations from other languages, but I don't know how to translate them into the syntax I have available in MATLAB.

Clarification: My use case is sending an unknown Unicode string (char array), not just a single character. It would be ideal to be able to send it all at once. Here is better sample code:

% Define a string to send with a non-ASCII character.
umbrella = char(9730);
toSend = ['Have you seen my ' umbrella '?'];
disp(toSend)

% Open a new Word document.
wordApplication = actxserver('Word.Application');
wordApplication.Visible = 1;
wordApplication.Documents.Add;

% Send the text.
selection = wordApplication.Selection;
selection.TypeText(toSend)

I was hoping I could simply set the encoding of the document itself, but this doesn't seem to help:

wordApplication = actxserver('Word.Application');
wordApplication.Visible = 1;
wordApplication.Documents.Add;
disp(wordApplication.ActiveDocument.TextEncoding)
wordApplication.ActiveDocument.TextEncoding = 65001;
disp(wordApplication.ActiveDocument.TextEncoding)
selection = wordApplication.Selection;
toSend = sprintf('Have you seen my \23002?');
selection.TypeText(toSend)
Matthew Simoneau

2 Answers


Method 1. Valid for a single character (original question)

Taken from here:

umbrella = 9730; %// Unicode number of the desired character
selection.InsertSymbol(umbrella, '', true); %// true means use Unicode

The second argument specifies the font (for example 'Arial'); an empty string '' apparently means the current font. The third argument, true, tells Word to treat the number as a Unicode code point.

Method 2. Valid for a single character (original question)

A less direct way, taken from here:

umbrella = 9730; %// Unicode number of the desired character
selection.TypeText(dec2hex(umbrella));
selection.ToggleCharacterCode;
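
Method 2 works because ToggleCharacterCode is the programmatic form of Word's Alt+X shortcut, which converts the hexadecimal digits just before the cursor into the matching character. As a rough illustration of the string that dec2hex produces for the umbrella, here is a small Python sketch (Python is the language of the COM test in the other answer; the helper name `alt_x_code` is made up for this example):

```python
def alt_x_code(code_point: int) -> str:
    """Return the uppercase hex digits of a code point, like MATLAB's dec2hex."""
    return format(code_point, "X")

print(alt_x_code(9730))  # 2602
```

Because the toggle reads the text before the cursor, hex-looking characters that happen to precede the typed digits may be picked up as well, so this method is probably safest at the start of a line or after a space.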

Method 3. Valid for a string (edited question)

You can send a whole string at once if you don't mind using the clipboard:

umbrella = char(9730);
toSend = ['Have you seen my ' umbrella '?'];
clipboard('copy', toSend); %// copy the Unicode string contained in variable `toSend`
selection.Paste %// paste it onto the Word document
Luis Mendo
  • Upvoted! Thanks. The only downside with these is that I need to special-case the Unicode characters from the normal ASCII text. I'd like to send a string (MATLAB char array), which may or may not contain non-ASCII, in a reliable way. My sample code didn't make it clear this was my main use case, not just a single character. – Matthew Simoneau May 08 '15 at 22:06
  • I see. My method would require a loop to take one character at a time. There must be a way to get a Unicode string sent directly – Luis Mendo May 08 '15 at 22:08
  • @MatthewSimoneau I found another way that works for full strings, if you don't mind using the clipboard. See method 3 – Luis Mendo May 11 '15 at 22:17
  • Clever! I'd rather not clobber the clipboard content, though. Maybe I can save the content and restore it. – Matthew Simoneau May 13 '15 at 18:37
  • Just a thought regarding "Method 3": it is quite simple to store the current clipboard contents in a temporary variable (assuming it's a string; via `str = clipboard('paste')`) and put the original string back into the clipboard after you're done pasting whatever you wanted into Word. No idea how to do this with generic non-string clipboard contents though. – Dev-iL May 14 '15 at 11:28
  • @Dev-iL Good idea. I also thought about that, but as you say it only works for strings – Luis Mendo May 14 '15 at 16:37
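
The loop mentioned in the comments above does not have to visit every character. One possible arrangement, sketched here in Python rather than MATLAB, is to split the string into runs of ASCII and non-ASCII characters so that only the non-ASCII parts need special handling. The helper name `split_runs` is hypothetical and not part of any Word or MATLAB API:

```python
from itertools import groupby

def split_runs(text):
    """Split text into (is_ascii, substring) runs, preserving order."""
    return [(is_ascii, "".join(chars))
            for is_ascii, chars in groupby(text, key=lambda ch: ord(ch) < 128)]

# The umbrella (U+2602) ends up in its own non-ASCII run.
for is_ascii, chunk in split_runs("Have you seen my \u2602?"):
    print(is_ascii, repr(chunk))
```

In the MATLAB setting, each ASCII run could presumably go through TypeText as before, and each character of a non-ASCII run through InsertSymbol as in Method 1.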

I tried this as well, and got the same issue you reported (I tested with MATLAB R2015a and Office 2013)...

I think something in the COM layer between MATLAB and Word is messing up the text encoding.

To confirm this is indeed a bug in MATLAB, I tried the same in Python, and it worked fine:

#!/usr/bin/env python

import os
import win32com.client

word = win32com.client.Dispatch("Word.Application")
word.Visible = True

doc = word.Documents.Add()

# Python 3 syntax; under Python 2 use unichr(9730) and u"..." literals instead
text = "Have you seen my " + chr(9730) + "?"
word.Selection.TypeText(text)

fname = os.path.join(os.getcwd(), "out.docx")
doc.SaveAs2(fname)
doc.Close()

word.Quit()

I came up with two workarounds for MATLAB:

Method 1 (preferred):

The idea is to create a .NET assembly that uses Office Interop. It would receive any Unicode string and write it to some specified Word document. This assembly can then be loaded in MATLAB and used as a wrapper against MS Office.

Example in C#:

MSWord.cs

using System;
using Microsoft.Office.Interop.Word;

namespace MyOfficeInterop
{
    public class MSWord
    {
        // this is very basic, but you can expose anything you want!
        public void AppendTextToDocument(string filename, string str)
        {
            Application app = null;
            Document doc = null;
            try
            {
                app = new Application();
                doc = app.Documents.Open(filename);

                app.Selection.TypeText(str);
                app.Selection.TypeParagraph();

                doc.Save();
            }
            finally
            {
                // guard against failures that happen before these are assigned
                if (doc != null) doc.Close();
                if (app != null) app.Quit();
            }
        }
    }
}

We compile it first:

csc.exe /nologo /target:library /out:MyOfficeInterop.dll /reference:"C:\Program Files (x86)\Microsoft Visual Studio 12.0\Visual Studio Tools for Office\PIA\Office15\Microsoft.Office.Interop.Word.dll" MSWord.cs

Then we test it from MATLAB:

%// load assembly
NET.addAssembly('C:\path\to\MyOfficeInterop.dll')

%// the document file must already exist, so create an empty one here
fname = fullfile(pwd,'test.docx');
fclose(fopen(fname,'w'));

%// some text
str = ['Have you seen my ' char(9730) '?'];

%// add text to Word document
word = MyOfficeInterop.MSWord();
word.AppendTextToDocument(fname, str);

Method 2:

This is more of a hack! We simply write the text in MATLAB directly to a text file (encoded correctly). Then we use the COM/ActiveX interface to open it in MS Word and re-save it as a proper .docx Word document.

Example:

%// params
fnameTXT = fullfile(pwd,'test.txt');
fnameDOCX = fullfile(pwd,'test.docx');
str = ['Have you seen my ' char(9730) '?'];

%// create UTF-8 encoded text file
bytes = unicode2native(str, 'UTF-8');
fid = fopen(fnameTXT, 'wb');
fwrite(fid, bytes);
fclose(fid);

%// some office interop constants (extracted using IL DASM)
msoEncodingUTF8 = int32(hex2dec('0000FDE9'));         % MsoEncoding
wdOpenFormatUnicodeText = int32(hex2dec('00000005')); % WdOpenFormat
wdFormatDocumentDefault = int32(hex2dec('00000010')); % WdSaveFormat
wdDoNotSaveChanges = int32(hex2dec('00000000'));      % WdSaveOptions

%// start MS Word 
Word = actxserver('Word.Application');
%Word.Visible = true;

%// open text file in MS Word
doc = Word.Documents.Open(...
    fnameTXT, ...                % FileName
    [], ...                      % ConfirmConversions
    [], ...                      % ReadOnly
    [], ...                      % AddToRecentFiles
    [], ...                      % PasswordDocument
    [], ...                      % PasswordTemplate
    [], ...                      % Revert
    [], ...                      % WritePasswordDocument
    [], ...                      % WritePasswordTemplate
    wdOpenFormatUnicodeText, ... % Format
    msoEncodingUTF8, ...         % Encoding
    [], ...                      % Visible
    [], ...                      % OpenAndRepair
    [], ...                      % DocumentDirection
    [], ...                      % NoEncodingDialog
    []);                         % XMLTransform

%// save it as docx
doc.SaveAs2(...
    fnameDOCX, ...               % FileName
    wdFormatDocumentDefault, ... % FileFormat
    [], ...                      % LockComments
    [], ...                      % Password
    [], ...                      % AddToRecentFiles
    [], ...                      % WritePassword
    [], ...                      % ReadOnlyRecommended
    [], ...                      % EmbedTrueTypeFonts
    [], ...                      % SaveNativePictureFormat
    [], ...                      % SaveFormsData
    [], ...                      % SaveAsAOCELetter
    msoEncodingUTF8, ...         % Encoding
    [], ...                      % InsertLineBreaks
    [], ...                      % AllowSubstitutions
    [], ...                      % LineEnding
    [], ...                      % AddBiDiMarks
    []);                         % CompatibilityMode

%// close doc, quit, and cleanup
doc.Close(wdDoNotSaveChanges, [], [])
Word.Quit()
clear doc Word
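
To make the first step of Method 2 more concrete, the following Python sketch shows what the UTF-8 text file should contain. It assumes that `unicode2native(str, 'UTF-8')` yields plain UTF-8 bytes without a byte-order mark, and the temporary file path is only for illustration:

```python
import os
import tempfile

text = "Have you seen my " + chr(9730) + "?"
data = text.encode("utf-8")  # plays the role of unicode2native(str, 'UTF-8')

# Write the raw bytes in binary mode, as the MATLAB code does with fwrite.
path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "wb") as f:
    f.write(data)

# Read the file back and decode it to check the round trip.
with open(path, "rb") as f:
    round_trip = f.read().decode("utf-8")

print(data[-4:-1].hex())   # e29882, the three UTF-8 bytes of the umbrella
print(round_trip == text)  # True
```

The umbrella takes three bytes in UTF-8, which is why Word has to be told the encoding explicitly when it opens the file.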
Amro
  • Using Python to demonstrate that this is fundamentally a MATLAB COM layer bug is very helpful, Thanks! – Matthew Simoneau May 18 '15 at 14:51
  • Lots of knowledge here! And it's interesting that it's a Matlab bug – Luis Mendo May 18 '15 at 22:34
  • A third option I forgot to mention is to give up working with COM/ActiveX, and use the [Open XML SDK](https://msdn.microsoft.com/en-us/library/office/bb448854.aspx) instead (which was [open-sourced](https://github.com/OfficeDev/Open-XML-SDK) by Microsoft). This enables direct manipulation of XML-based Office files, even cross-platform (using Mono, but that doesn't apply to the MATLAB case). There are also other libraries like [Apache POI](https://en.wikipedia.org/wiki/Apache_POI) (pure Java library) which you can invoke inside MATLAB. – Amro May 19 '15 at 07:12
  • I've created a MATLAB chat room for us to discuss things MATLAB related, or for discussions that span beyond the limitations of a single comment. Visit us when you have time! - http://chat.stackoverflow.com/rooms/81987/matlab – rayryeng Jun 30 '15 at 18:04