1. Export to plaintext CSV
If all you're trying to do is extract data from Excel to use elsewhere, as opposed to capturing Excel formulas and formatting, then you probably should not try to read the .xls file. XLS is a complex format. It's good for Excel, not for general data interchange.
Similarly, you probably don't need to use AppleScript or anything else to integrate with Excel, if all you want to do is save the data as plaintext. Excel already knows how to save data as plaintext. Just use Excel's "Save As" command. (That's what it's called on the Mac. I don't know about PCs.)
The question is then what plaintext format to use. One obvious choice is a plaintext comma-separated values (CSV) file, because it's a simple de facto standard (as opposed to a complex official standard like XML). This will make it easy to consume in Swift, or in any other language.
2. Export in UTF-8 encoding if possible, otherwise as UTF-16
So how do you do that exactly? Plaintext is wonderfully simple, but one subtlety that you need to keep track of is the text encoding. A text encoding is a way of representing characters in a plaintext file. Unfortunately, you cannot reliably tell the encoding of a file just by inspecting the file, so you need to choose an encoding when you save it and remember to use that encoding when you read it. If you mess this up, accented characters, typographer's quotation marks, dashes, and other non-ASCII characters will get mangled. So what text encoding should you use? The short answer is, you should always use UTF-8 if possible.
But if you're working with an older version of Excel, then you may not be able to use UTF-8. In that case, you should use UTF-16. In particular, UTF-16 is, I believe, the only export option in Excel 2011 for Mac which produces a predictable result which will not depend in surprising ways on obscure locale settings or Microsoft-specific encodings.
So if you're on Excel 2011 for Mac, for instance, choose "UTF-16 Unicode Text" from Excel's Save As command.
This will cause Excel to save the file so that every row is a line of text, and every column is separated by a tab character. (So technically, this is a tab-separated values file rather than a comma-separated values file.)
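To make the "mangled characters" warning above concrete, here is a minimal sketch of what happens when bytes written in one encoding are read back in another. The sample word and the choice of ISO Latin-1 as the "wrong" encoding are just assumptions for illustration; the same thing happens with other mismatched pairs.

```swift
import Foundation

// "café": the last character is U+00E9, which UTF-8 stores as two bytes (C3 A9).
let original = "caf\u{E9}"
let utf8Bytes = original.data(using: .utf8)!

// Decoding with the encoding that was used to write the bytes round-trips cleanly.
let decodedCorrectly = String(data: utf8Bytes, encoding: .utf8)

// Decoding the same bytes as ISO Latin-1 treats each byte as its own character.
let decodedWrongly = String(data: utf8Bytes, encoding: .isoLatin1)

print(decodedCorrectly ?? "nil")  // café
print(decodedWrongly ?? "nil")    // cafÃ©
```

Nothing fails or throws in the wrong case, which is why this kind of bug tends to go unnoticed until someone looks at the data.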
3. Import with Swift
Now you have a plaintext file, which you know was saved in a UTF-8 (or UTF-16) encoding. So now you can read it and parse it in Swift.
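Reading the file is mostly a matter of passing the same encoding you picked when exporting. The sketch below writes a tiny UTF-16 sample to a temporary file first, purely so that it is self-contained; the file name and sample contents are made up, and in practice you would point `fileURL` at the file Excel saved.

```swift
import Foundation

// Stand-in for the file Excel exported: a small tab-separated sample we write ourselves.
let fileURL = FileManager.default.temporaryDirectory
    .appendingPathComponent("exported-sample.txt")
let sample = "name\tcity\nZo\u{EB}\tM\u{E1}laga\n"

var contents = ""
do {
    try sample.write(to: fileURL, atomically: true, encoding: .utf16)
    // Pass the encoding chosen in Excel's Save As dialog: .utf8 or .utf16.
    contents = try String(contentsOf: fileURL, encoding: .utf16)
    print(contents)
} catch {
    print("Could not read or write \(fileURL.path): \(error)")
}
```

If the export was UTF-8, swapping `.utf16` for `.utf8` in the read call should be the only change needed.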
If your Excel data is complicated, you may need a full-featured CSV parser. The best choice is probably CHCSVParser.
Using CHCSVParser, you can parse the file with the following Objective-C code:
    // The path to the tab-separated file that Excel exported.
    NSString * const inputFilePath = @"/path/to/exported/file.txt";
    // Excel's Unicode text export separates columns with tabs, not commas.
    unichar tabCharacter = '\t';
    NSArray *rows = [NSArray arrayWithContentsOfCSVFile:inputFilePath
                                                options:CHCSVParserOptionsSanitizesFields
                                              delimiter:tabCharacter];
(You could also call it from Swift, of course.)
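To give a feel for what "complicated" means here, this small sketch shows how naive splitting goes wrong on a row where one cell itself contains a tab. It assumes the exporter wraps such a cell in double quotes, as CSV-style formats usually do; the sample row is invented.

```swift
import Foundation

// One exported row with three cells. The middle cell contains a tab,
// so (by assumption) the exporter wrapped it in double quotes.
let row = "1\t\"Smith\tJohn\"\t42"

// Naive approach: split on every tab, ignoring the quotes.
let naiveFields = row.components(separatedBy: "\t")

print(naiveFields.count)  // 4, although the row has only 3 cells
print(naiveFields)
```

A real parser tracks whether it is inside a quoted field before treating a tab (or a newline) as a separator, which is the part you get for free from a library.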
On the other hand, if your data is relatively simple (for instance, it has no escaped characters), then you might not need to use an external library at all. You can write some Swift code that parses tab-separated values just by reading in the file as a string, splitting on newlines, and then splitting on tabs.
This function will take a String representing TSV data and return an array of dictionaries:
    import Foundation

    /**
     Reads a multiline, tab-separated String and returns an Array<NSDictionary>,
     taking column names from the first line or from an explicit parameter.
     */
    func JSONObjectFromTSV(_ tsvInputString: String, columnNames optionalColumnNames: [String]? = nil) -> Array<NSDictionary>
    {
        let lines = tsvInputString.components(separatedBy: "\n")
        guard lines.isEmpty == false else { return [] }

        // Use the explicit column names if given; otherwise treat the first line as the header.
        let columnNames = optionalColumnNames ?? lines[0].components(separatedBy: "\t")
        var lineIndex = (optionalColumnNames != nil) ? 0 : 1
        let columnCount = columnNames.count
        var result = Array<NSDictionary>()

        for line in lines[lineIndex ..< lines.count] {
            let fieldValues = line.components(separatedBy: "\t")
            if fieldValues.count != columnCount {
                // Skip lines whose field count does not match the number of columns.
                // NSLog("WARNING: header has %d columns but line %d has %d columns. Ignoring this line", columnCount, lineIndex, fieldValues.count)
            }
            else {
                result.append(NSDictionary(objects: fieldValues, forKeys: columnNames as [NSString]))
            }
            lineIndex = lineIndex + 1
        }
        return result
    }
So you only need to read the file into a string and pass it to this function. That snippet comes from this gist for a tsv-to-json converter. And if you need to know more about which text encodings Microsoft products produce, and which ones Cocoa can auto-detect, then this repo on text encoding contains the research on export specimens which led to the conclusion that UTF-16 is the way to go for old Microsoft products on the Mac.
(I realize I'm linking to my own repos here. Apologies?)
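For completeness, here is a rough sketch of the last step, going from a TSV string to JSON text. To keep the block self-contained it uses a small stand-in parser, `rowsFromSimpleTSV`, which is a hypothetical helper and not part of the gist; with the function above you would serialize its return value instead. The sample data is invented.

```swift
import Foundation

/// Hypothetical stand-in parser: handles simple TSV only (no quoting or escapes)
/// and takes the dictionary keys from the first line.
func rowsFromSimpleTSV(_ tsv: String) -> [[String: String]] {
    let lines = tsv.components(separatedBy: "\n").filter { !$0.isEmpty }
    guard let header = lines.first else { return [] }
    let columnNames = header.components(separatedBy: "\t")

    var rows: [[String: String]] = []
    for line in lines.dropFirst() {
        let fields = line.components(separatedBy: "\t")
        // Skip rows whose field count does not match the header.
        guard fields.count == columnNames.count else { continue }
        var row: [String: String] = [:]
        for (name, value) in zip(columnNames, fields) {
            row[name] = value
        }
        rows.append(row)
    }
    return rows
}

// In practice this string would come from reading the exported file.
let tsv = "name\tcity\nAda\tLondon\nGrace\tNew York\nbroken row\n"
let rows = rowsFromSimpleTSV(tsv)

do {
    let jsonData = try JSONSerialization.data(withJSONObject: rows, options: [.prettyPrinted])
    print(String(data: jsonData, encoding: .utf8) ?? "")
} catch {
    print("JSON serialization failed: \(error)")
}
```

The malformed last row is silently dropped here, which mirrors the skip-on-mismatch behavior of the function above; whether that is acceptable depends on how much you trust the export.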