8

My current workflow involves using Applescript to essentially delimit Excel data and format it into plain text files. We're pushing towards an all Swift environment, but I haven't yet found any sort of kits for parsing my Excel data into Swift.

The only thing I can think of is to use C or something and wrap it, but that's not ideal. Any better suggestions for parsing this data for use in Swift?

The goal is to eliminate Applescript, but I'm not sure if that will be possible while still interacting with Excel files. Scripting Excel via Applescript seems to be the only method.

EDIT: I don't have the option of eliminating Excel from this workflow. This is how the data will be coming to the application, thus I have to include it.

Being able to streamline the process of parsing this data then processing it will be paramount. I know Applescript has been good in the past with helping me to process it; however, it's getting a little too closed-off for me.

I've been looking at writing something in Swift/Cocoa, but that still might require the data to be extracted with an Applescript, right?

A big plus for pushing Swift is the readability. I don't know Objective-C all that well, and swift would be an easier transition, I feel.

My workflow on PC has been using the COM object, which as has been said, isn't available in the Mac Excel app. I'm only looking for data extraction at this point. Some previous apps did processing within the app, but I'm looking to make this very self-contained, thus all processing within the app I'm developing. Once the data is extracted from the .XLS or .XLSX files, I'll be doing some text editing via RegEx and perhaps a little number crunching. Nothing too crazy. As of now, it will run on the client side, but I'm looking to extend this to a server process.

pianoman
  • 815
  • 2
  • 16
  • 36
  • 1
    Are there not better command line utilities for parsing Excel files? I'd look there. You don't say enough about your workflow to say more. – jweaks Aug 18 '15 at 21:12
  • See updated/edited question. – pianoman Aug 20 '15 at 02:02
  • 1
    If the data is pulled from Excel and not from an Excel file, then you can simply pull the data from Excel using Applescript or VBA script in any useful format you want. Not understanding. – jweaks Aug 20 '15 at 02:40
  • Is it possible to parse the file, not using the Excel application? I'm not really familiar with how Mac does this, as I am coming from a PC background. – pianoman Aug 21 '15 at 01:08
  • 1
    A quick search shows that OpenOffice includes command line utility "unoconv" that can convert simple excel files, and that perl includes a "perl -MSpreadsheet::XLSX" function to work directly with files as well. But, I still don't see what your workflow is. Is this an app you have that is running that opens a spreadsheet? Is this a one-time conversion of files you're needing to do? – jweaks Aug 21 '15 at 04:37
  • 1
    Ditto the libraries [I've already mentioned](http://stackoverflow.com/a/32105442/2298645). And ditto for you need to tell us a bit more about what you're doing with these Excel files if you want to get better advice. Are you only extracting data out of them, and if so are you just getting simple cell values or do you need more advanced information such as text styles or joins? Or are you manipulating their contents as well? Which file format(s) do you use: .xsl, .xslx, or both? – foo Aug 21 '15 at 14:50
  • 1
    One Mac/PC difference you should be aware of is that Mac Excel (unlike PC Excel) isn't available as a COM component that you can embed directly within your own app. You can only have your app talk to Excel.app over Apple event IPC ("AppleScript"). So if that's how you were hoping to do it on Mac, you're SOOL. Still, as I've already said, if you're only extracting basic data from .xsl[x] files, you probably don't need Excel.app anyway as there are plenty third-party libraries which should do that job just as well. – foo Aug 21 '15 at 14:57
  • Yes, to that end... my workflow on PC has been using the COM object, which as you said, isn't available in the Mac Excel app. I'm only looking for data extraction at this point. Some previous apps did processing within the app, but I'm looking to make this very self-contained, thus all processing within the app I'm developing. Once the data is extracted from the .XLS or .XLSX files, I'll be doing some text editing via RegEx and perhaps a little number crunching. Nothing too crazy. As of now, it will run on the client side, but I'm looking to extend this to a server process. – pianoman Aug 23 '15 at 18:40

5 Answers5

6

In Mac OS X 10.6 Snow Leopard Apple introduced the AppleScriptObjC framework which makes it very easy to interact between Cocoa and AppleScript. AppleScript code and a Objective-C like syntax can be used in the same source file. It's much more convenient than Scripting Bridge and NSAppleScript.

AppleScriptObjC cannot be used directly in Swift because the command loadAppleScriptObjectiveCScripts of NSBundle is not bridged to Swift.

However you can use a Objective-C bridge class for example

ASObjC.h

@import Foundation;
@import AppleScriptObjC;

@interface NSObject (Excel)
- (void)openExcelDocument:(NSString *)filePath;
- (NSArray *)valueOfUsedRange;

@end

@interface ASObjC : NSObject

+ (ASObjC *)sharedASObjC;

@property id Excel;

@end

ASObjC.m

#import "ASObjC.h"

@implementation ASObjC

+ (void)initialize
{
    if (self == [ASObjC class]) {
        [[NSBundle mainBundle] loadAppleScriptObjectiveCScripts];
    }
}

+ (ASObjC *)sharedASObjC
{
    static id sharedInstance = nil;
    static dispatch_once_t onceToken;
    dispatch_once(&onceToken, ^{
        sharedInstance = [[ASObjC alloc] init];
    });

    return sharedInstance;
}

- (instancetype)init
{
    self = [super init];
    if (self) {
        _Excel = NSClassFromString(@"ASExcel");
    }
    return self;
}

@end

Create a AppleScript source file form the AppleScriptObjC template

ASExcel.applescript

script ASExcel
  property parent: class "NSObject"

  on openExcelDocument:filePath
    set asFilePath to filePath as text
    tell application "Microsoft Excel"
      set sourceBook to open workbook workbook file name asFilePath
      repeat
        try
          get workbooks
          return
        end try
        delay 0.5
      end repeat
    end tell
  end openDocument

  on valueOfUsedRange()
    tell application "Microsoft Excel"
      tell active sheet
        set activeRange to used range
        return value of activeRange
      end tell
    end tell
  end valueOfUsedRange

end script

Link to the AppleScriptObjC framework if necessary.
Create the Bridging Header and import ASObjC.h

Then you can call AppleScriptObjC from Swift with

 ASObjC.sharedASObjC().Excel.openExcelDocument("Macintosh HD:Users:MyUser:Path:To:ExcelFile.xlsx")

or

let excelData = ASObjC.sharedASObjC().Excel.valueOfUsedRange() as! Array<[String]>
vadian
  • 274,689
  • 30
  • 353
  • 361
  • This is very cool, @vadian. Do you know if it's straightforward to use the AppleScriptObjC framework in the other direction, so that invoking an AppleScript command causes an ObjC method (and then possibly a Swift method) to execute? Seems like the main benefits of using AppleScript instead of just dumping a CSV file (like my answer suggested), are first that with AppleScript you can capture the range of selected cells, and second that you might preserve more type information, like string vs number vs date. The workflow would also be better if you could invoke the operation from within Excel. – algal Sep 04 '15 at 17:18
  • The easiest way is to create a ObjC or Swift class and instantiate this class with an object (blue cube) in Interface Builder. Then declare a property `property MyClass: class "MyClass"` in the `.applescript` source file and call methods with `MyClass's method()`. The methods in the ObjC/Swift class must be class methods in this case. Otherwise `alloc` and `init` the class normally as in ObjC – vadian Sep 04 '15 at 17:28
  • possibilities in swift? – Dylan Feb 09 '17 at 14:08
  • @dylan The answer **are** the possibilities in Swift. – vadian Feb 09 '17 at 14:11
3

It's somewhat unclear if you're trying to eliminate Excel as a dependency (which is not unreasonable: it costs money and not everyone has it) or AppleScript as a language (totally understandable, but a bad practical move as Apple's alternatives for application automation all suck).

There are third-party Excel-parsing libraries available for other languages, e.g. I've used Python's openpyxl (for .xlsx files) and xlrd (for .xsl) libraries successfully in my own projects. And I see through the magicks of Googles that someone's written an ObjC framework, DHlibxls, which [assuming no dynamic trickery] should be usable directly from Swift, but I've not used it myself so can't tell you anything more.

foo
  • 3,171
  • 17
  • 18
  • 1
    From your post, it sounds like all you're doing is a simple extraction of cell values from .xsl/.xslx files, and a third-party library will do that just fine, allowing you to ditch both AppleScript and Excel.app dependencies. I've been doing it myself for years without problem. OTOH, if your workflow is modifying spreadsheets or performing complex Excel queries then you should definitely stick to automating Excel via AppleScript. [Here's a detailed explanation](http://macscripter.net/viewtopic.php?id=43127) of how to use AppleScript-ObjC with Swift. – foo Aug 20 '15 at 14:51
2

You can use ScriptingBridge or NSAppleScript to interact with Apple Scriptable stuff

ScriptingBridge can generate a header file from the Apple Script dictionary.

NSAppleScript can execute any AppleScript for you by passing a String

Kametrixom
  • 14,673
  • 7
  • 45
  • 62
  • I wrote an [article](http://brightdigit.com/blog/17/06/09/scriptingbridge-with-swift-communicating-with-apps-using-applescript-and-swift/) on how to do this with ScriptBridge without having to deal with any Objective-C or AppleScript. – leogdion Jun 09 '17 at 20:13
1

1. Export to plaintext CSV

If all you're trying to do is extract data from Excel to use elsewhere, as opposed to capturing Excel formulas and formatting, then you probably should not try to read the .xls file. XLS is a complex format. It's good for Excel, not for general data interchange.

Similarly, you probably don't need to use AppleScript or anything else to integrate with Excel, if all you want to do is save the data as plaintext. Excel already knows how to save data as plaintext. Just use Excel's "Save As" command. (That's what it's called on the Mac. I don't know about PCs.)

The question is then what plaintext format to use. One obvious choice for this is a plaintext comma-separated value file (CSV) because it's a simple de facto standard (as opposed to a complex official standard like XML). This will make it easy to consume in Swift, or in any other language.

2. Export in UTF-8 encoding if possible, otherwise as UTF-16

So how do you do that exactly? Plaintext is wonderfully simple, but one subtlety that you need to keep track of is the text encoding. A text encoding is a way of representing characters in a plaintext file. Unfortunately, you cannot reliably tell the encoding of a file just by inspecting the file, so you need to choose an encoding when you save it and remember to use that encoding when you read it. If you mess this up, accented characters, typographer's quotation marks, dashes, and other non-ASCII characters will get mangled. So what text encoding should you use? The short answer is, you should always use UTF-8 if possible.

But if you're working with an older version of Excel, then you may not be able to use UTF-8. In that case, you should use UTF-16. In particular, UTF-16 is, I believe, the only export option in Excel 2011 for Mac which produces a predictable result which will not depend in surprising ways on obscure locale settings or Microsoft-specific encodings.

So if you're on Excel 2011 for Mac, for instance, choose "UTF-16 Unicode Text" from Excel's Save As command.

This will cause Excel to save the file so that every row is a line of text, and every column is separated by a tab character. (So technically, this is a tab-separated value files, rather than a comma-separated value file.)

3. Import with Swift

Now you have a plaintext file, which you know was saved in a UTF-8 (or UTF-16) encoding. So now you can read it and parse it in Swift.

If your Excel data is complicated, you may need a full-featured CSV parser. The best choice is probably CHCSVParser.

Using CHCSV, you can parse the file with the following code:

NSURL * const inputFileURL = [NSURL fileURLWithPath:@"/path/to/exported/file.txt"];
unichar tabCharacter = '\t';
NSArray *rows = [NSArray arrayWithContentsOfCSVFile:inputFilePath options:CHCSVParserOptionsSanitizesFields
                                          delimiter:tabCharacter];

(You could also call it from Swift, of course.)

On the other hand, if you're data is relatively simple (for instance, it has no escaped characters), then you might not need to use an external library at all. You can write some Swift code that parses tab-separated values just by reading in the file as a string, splitting on newlines, and then splitting on tabs.

This function will take a String representing TSV data and return an array of dictionaries:

/**
Reads a multiline, tab-separated String and returns an Array<NSictionary>, taking column names from the first line or an explicit parameter
*/
func JSONObjectFromTSV(tsvInputString:String, columnNames optionalColumnNames:[String]? = nil) -> Array<NSDictionary>
{
  let lines = tsvInputString.componentsSeparatedByString("\n")
  guard lines.isEmpty == false else { return [] }

  let columnNames = optionalColumnNames ?? lines[0].componentsSeparatedByString("\t")
  var lineIndex = (optionalColumnNames != nil) ? 0 : 1
  let columnCount = columnNames.count
  var result = Array<NSDictionary>()

  for line in lines[lineIndex ..< lines.count] {
    let fieldValues = line.componentsSeparatedByString("\t")
    if fieldValues.count != columnCount {
      //      NSLog("WARNING: header has %u columns but line %u has %u columns. Ignoring this line", columnCount, lineIndex,fieldValues.count)
    }
    else
    {
      result.append(NSDictionary(objects: fieldValues, forKeys: columnNames))
    }
    lineIndex = lineIndex + 1
  }
  return result
}

So you only need to read the file into a string and pass it to this function. That snippet comes from this gist for a tsv-to-json converter. And if you need to know more about which text encodings Microsoft products produce, and which ones Cocoa can auto-detect, then this repo on text encoding contains the research on export specimens which led to the conclusion that UTF-16 is the way to go for old Microsoft products on the Mac.

(I realize I'm linking to my own repos here. Apologies?)

algal
  • 27,584
  • 13
  • 78
  • 80
  • In order to keep the workflow automated, I'll have to do something with the .xls files programmatically. Having them opened and resaved client-side wouldn't be an option. Additionally, this is the format we're given. I'm leaning towards a script to generate XML, and just incorporate that Applescript into my main application. – pianoman Sep 11 '15 at 22:43
  • So you need to do automated processing of .xls files on OS X? How automated? Can you script the MS Excel GUI, or does this need to be able to run as a process without a window? – algal Sep 11 '15 at 23:24
  • I suppose I could Applescript Excel for this, but I was really trying to discover if there's any ways besides doing this. I'm very familiar with the associated Applescript method, but what other options exist if nixing the AS method? – pianoman Sep 16 '15 at 01:39
  • If you need to get data out of XLS files and you cannot open the Excel GUI, then you've answered your own question. You need a command line tool that can read XLS. As MS does not provide one for OS X, then you'll have to rely on a third (fourth?) party library. Here's one: http://sourceforge.net/p/libxls/code/HEAD/tree/ . And I'd expect Python Ruby etc have libraries that would help making an XLS to csv converter easy. You could even embed that in and call it from a cocoa app I'd need be. – algal Sep 16 '15 at 02:41
0

There is no need to export Excel files to CSV for Swift as you can use an existing open-source library for parsing XLSX files. If you use CocoaPods or Swift Package Manager for integrating 3rd-party libraries, CoreXLSX supports those. After the library is integrated, you can use it like this:

import CoreXLSX

guard let file = XLSXFile(filepath: "./file.xlsx") else {
  fatalError("XLSX file corrupted or does not exist")
}

for path in try file.parseWorksheetPaths() {
  let ws = try file.parseWorksheet(at: path)
  for row in ws.sheetData.rows {
    for c in row.cells {
      print(c)
    }
  }
}

This will open file.xlsx and print all cells within that file. You can also filter cells by references and access only cell data that you need for your automation.

Max Desiatov
  • 5,087
  • 3
  • 48
  • 56