I tried this as well, and got the same issue you reported (I tested with MATLAB R2015a and Office 2013)...
I think something in the COM layer between MATLAB and Word is messing up the text encoding.
To confirm this is indeed a bug in MATLAB, I tried the same in Python, and it worked fine:
#!/usr/bin/env python
import os
import win32com.client
word = win32com.client.Dispatch("Word.Application")
word.Visible = True
doc = word.Documents.Add()
# Python 3; on Python 2 use unichr(9730) with u"" literals instead
text = "Have you seen my " + chr(9730) + "?"
word.Selection.TypeText(text)
fname = os.path.join(os.getcwd(), "out.docx")
doc.SaveAs2(fname)
doc.Close()
word.Quit()
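If you want to check the saved file without eyeballing it in Word: a .docx is just a zip archive, and the body text lives in word/document.xml. Here is a minimal sketch of such a check. The helper name docx_contains is mine, it assumes the XML is UTF-8 (which is what Word writes as far as I know), and it assumes the text is not split across several runs, which is fine for a single character:

```python
import zipfile


def docx_contains(path, text):
    """Return True if `text` appears literally in the main body XML of a .docx."""
    # A .docx file is a zip archive; the document body is word/document.xml
    with zipfile.ZipFile(path) as zf:
        xml = zf.read("word/document.xml").decode("utf-8")
    return text in xml
```

For the script above you would call docx_contains("out.docx", chr(9730)) after Word has closed the file.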
I came up with two workarounds for MATLAB:
Method 1 (preferred):
The idea is to create a .NET assembly that uses Office Interop. It would receive any Unicode string and write it to some specified Word document.
This assembly can then be loaded in MATLAB and used as a wrapper around MS Office.
Example in C#:
MSWord.cs
using System;
using Microsoft.Office.Interop.Word;

namespace MyOfficeInterop
{
    public class MSWord
    {
        // this is very basic, but you can expose anything you want!
        public void AppendTextToDocument(string filename, string str)
        {
            Application app = null;
            Document doc = null;
            try
            {
                app = new Application();
                doc = app.Documents.Open(filename);
                app.Selection.TypeText(str);
                app.Selection.TypeParagraph();
                doc.Save();
            }
            finally
            {
                // either one may still be null if startup or Open() failed
                if (doc != null) doc.Close();
                if (app != null) app.Quit();
            }
        }
    }
}
We compile it first:
csc.exe /nologo /target:library /out:MyOfficeInterop.dll /reference:"C:\Program Files (x86)\Microsoft Visual Studio 12.0\Visual Studio Tools for Office\PIA\Office15\Microsoft.Office.Interop.Word.dll" MSWord.cs
Then we test it from MATLAB:
%// load assembly
NET.addAssembly('C:\path\to\MyOfficeInterop.dll')
%// the method opens an existing document, so create an empty file first
fname = fullfile(pwd,'test.docx');
fclose(fopen(fname,'w'));
%// some text
str = ['Have you seen my ' char(9730) '?'];
%// add text to Word document
word = MyOfficeInterop.MSWord();
word.AppendTextToDocument(fname, str);
Method 2:
This is more of a hack! We simply write the text from MATLAB directly to a text file (encoded correctly as UTF-8). Then we use the COM/ActiveX interface to open that file in MS Word, and re-save it as a proper .docx Word document.
Example:
%// params
fnameTXT = fullfile(pwd,'test.txt');
fnameDOCX = fullfile(pwd,'test.docx');
str = ['Have you seen my ' char(9730) '?'];
%// create UTF-8 encoded text file
bytes = unicode2native(str, 'UTF-8');
fid = fopen(fnameTXT, 'wb');
fwrite(fid, bytes);
fclose(fid);
%// some office interop constants (extracted using IL DASM)
msoEncodingUTF8 = int32(hex2dec('0000FDE9')); % MsoEncoding
wdOpenFormatUnicodeText = int32(hex2dec('00000005')); % WdOpenFormat
wdFormatDocumentDefault = int32(hex2dec('00000010')); % WdSaveFormat
wdDoNotSaveChanges = int32(hex2dec('00000000')); % WdSaveOptions
%// start MS Word
Word = actxserver('Word.Application');
%Word.Visible = true;
%// open text file in MS Word
doc = Word.Documents.Open(...
    fnameTXT, ...                 % FileName
    [], ...                       % ConfirmConversions
    [], ...                       % ReadOnly
    [], ...                       % AddToRecentFiles
    [], ...                       % PasswordDocument
    [], ...                       % PasswordTemplate
    [], ...                       % Revert
    [], ...                       % WritePasswordDocument
    [], ...                       % WritePasswordTemplate
    wdOpenFormatUnicodeText, ...  % Format
    msoEncodingUTF8, ...          % Encoding
    [], ...                       % Visible
    [], ...                       % OpenAndRepair
    [], ...                       % DocumentDirection
    [], ...                       % NoEncodingDialog
    []);                          % XMLTransform
%// save it as docx
doc.SaveAs2(...
    fnameDOCX, ...                % FileName
    wdFormatDocumentDefault, ...  % FileFormat
    [], ...                       % LockComments
    [], ...                       % Password
    [], ...                       % AddToRecentFiles
    [], ...                       % WritePassword
    [], ...                       % ReadOnlyRecommended
    [], ...                       % EmbedTrueTypeFonts
    [], ...                       % SaveNativePictureFormat
    [], ...                       % SaveFormsData
    [], ...                       % SaveAsAOCELetter
    msoEncodingUTF8, ...          % Encoding
    [], ...                       % InsertLineBreaks
    [], ...                       % AllowSubstitutions
    [], ...                       % LineEnding
    [], ...                       % AddBiDiMarks
    []);                          % CompatibilityMode
%// close doc, quit, and cleanup
doc.Close(wdDoNotSaveChanges, [], [])
Word.Quit()
clear doc Word
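For what it's worth, the UTF-8 step in Method 2 is easy to sanity-check outside MATLAB. Below is a small Python sketch that writes the same string as raw UTF-8 bytes without a BOM (which is what I understand the unicode2native/fwrite pair to produce) and reads it back. The helper name write_utf8 and the temp file name are made up for illustration:

```python
import os
import tempfile


def write_utf8(path, text):
    """Write text as raw UTF-8 bytes (no BOM) and return the bytes written."""
    data = text.encode("utf-8")
    with open(path, "wb") as f:
        f.write(data)
    return data


text = "Have you seen my " + chr(9730) + "?"
path = os.path.join(tempfile.gettempdir(), "test_utf8_check.txt")
data = write_utf8(path, text)

# U+2602 (decimal 9730) takes three bytes in UTF-8: E2 98 82
print(data[-4:-1].hex())

# reading the file back as UTF-8 gives the original string
with open(path, "rb") as f:
    assert f.read().decode("utf-8") == text

# msoEncodingUTF8 (0xFDE9) is the Windows code page number for UTF-8
assert 0xFDE9 == 65001
```

If the bytes in test.txt match this, then any remaining garbling comes from how Word is told to open the file, not from the file itself.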