4

Background

The main application where I work is based heavily on the MUMPS-esque Caché database engine from InterSystems. Everything is stored in global arrays. Getting data out of the system for external reporting ranges from simply being a pain to being egregiously slow and painful.

Caché provides an ODBC driver for the database but unless the global arrays involved happen to be keyed by the selection criteria, it resorts to scans and a simple query will take hours to run. For scale, the entire Caché production namespace is about 100GB. I can write ObjectScript (Intersystems' dialect of MUMPS) programs that pull data much faster than the ODBC driver in these cases.

Part of the problem I think is that the application vendor doesn't use Caché's object persistence support but instead has the SQL tables defined as a façade over the global arrays, and it often doesn't work well for batch requests.

I built a reporting database in MS SQL Server that pulls the most common data (2.5GB worth) and even if it has to scan every table, all results are returned within 3 seconds. Unfortunately, it takes a long time to refresh the data so I can only do a full refresh once a week and an active refresh once a day. This is good enough for most needs, but I want to do better.

I'm on Caché 2007, SQL Server 2008 R2, VS2010 on Windows 7 and Windows Server 2008 R2.

Scope of question

I need a way to integrate live data from the source Caché database with other data on SQL Server. I want to be able to integrate views or table valued functions into a SQL query and have it pull live data from the source db.

  • The live data must be available within SQL Server for processing. Doing it with a secondary application would be a huge pain and wouldn't work with reporting tools that just expect to push a query over ODBC and get a final dataset in the right format.

  • I understand that there are ways to get data into SQL Server or accomplish the same general things I want to do. That's not what this question is about.

  • The data needs to come from ObjectScript programs run on Caché since not all the data I need is exposed through the SQL defined tables and I get the control I need to make the performance usable with ObjectScript.

  • I am looking for advice on any new options or how I can improve one of the options I've tried or considered or other pros or cons for those approaches.

What I've tried so far

This project has been an exercise in frustration where each promising avenue I've looked into is either terrible or doesn't work for some reason. Often the reason is some unnecessary restriction on SQLCLR assemblies.

  1. Pulling everything through InterSystem's Caché ODBC driver via a linked server. SQL Server often resorts to scans if it can't push conditions to the remote server or has to perform a join locally. A scan of any nontrivial table takes many hours and is unacceptable. Also, the length of many columns is incorrectly defined by the SQL table definitions in Caché; SQL Server doesn't like that and aborts the query. See this SO question. I can't change the table defs and the vendor doesn't think it's a problem because it works with MS Access.

  2. Using OPENQUERY on demand. This works to some extent but I can still have the column length problem from the previous item and there's no way to parameterize OPENQUERY queries so that makes it pretty useless to pull contextual data.

  3. Using SQLCLR to call the ODBC data provider through CLR table valued functions. This takes care of the parameterization and data length issues, although it does require me to define or modify a function each time I need a new piece of data. Unfortunately, not all the data elements I'm interested in are available through SQL. For some things, I need to access the global arrays directly.

  4. Intersystems provides an ActiveX control that lets you run ObjectScript programs over TCP on the server and get the results. This works great in a stand-alone C# app but as soon as I try to make a connection from a SQLCLR assembly I get a ridiculous URI error:

    A .NET Framework error occurred during execution of user-defined routine or aggregate "GetActiveAccounts": System.UriFormatException: Invalid URI: The URI is empty. System.UriFormatException: at System.Uri.CreateThis(String uri, Boolean dontEscape, UriKind uriKind) at System.Uri..ctor(String uriString) at System.ComponentModel.Design.RuntimeLicenseContext.GetLocalPath(String fileName) at System.ComponentModel.Design.RuntimeLicenseContext.GetSavedLicenseKey(Type type, Assembly resourceAssembly) at System.ComponentModel.LicenseManager.LicenseInteropHelper.GetCurrentContextInfo(Int32& fDesignTime, IntPtr& bstrKey, RuntimeTypeHandle rth) at FacsAccess.GetActiveAccounts.Client.connect() at FacsAccess.GetActiveAccounts.Client..ctor() at FacsAccess.GetActiveAccounts.E1.GetEnumerator()

    See this unanswered SO question. There are other postings about it on the net but no one seems to have a clue. This is an extremely simple COM wrapper over a C++ DLL; it's not doing anything with licensing and has no reason to be in the managed licensing libraries. I wonder if this is some kind of boilerplate that's trying to get the name for an assembly that doesn't have a name because it's been loaded into the SQL database.

  5. Intersystems also provides a more direct unmanaged interface but those interfaces are all C++, which I can't use through P/Invoke and I can't load a C++/CLI mixed mode impure assembly in SQLCLR.

Options I've considered but seem kind of terrible

  1. I've considered trying the ActiveX control through SQL Server's COM support but that's terribly slow and really cumbersome.

  2. I could create an out of process service to proxy the traffic but I can't use .NET remoting from SQLCLR and you're not supposed to use WCF and it would be really heavy weight for a such a simple interface anyway. I'd sooner roll my own IPC interface.

  3. I could write some kind of extra unmanaged wrapper with a C style interface for the VisM or CacheDirect interfaces and access THAT through P/Invoke.

It doesn't seem like this should be so hard but it's really driving me up the wall and I need some perspective.

Community
  • 1
  • 1
Chris Smith
  • 5,326
  • 29
  • 29

1 Answers1

2

I think you can use ODBC via a linked server accessing stored procedures on the Cache database that are visible to the ODBC driver and that return result sets, but are not implemented using SQL.

I am 100% certain you can create such stored procedures and access them via ODBC, but I have never tried accessing them from SQL server as a linked server. Even if the linked server doesn't work, it seems like it would be preferable to access via the Intersystem's ODBC driver rather than the Active X control or CacheDirect.

I have an example of such a procedure for this question.

In case that link dies, here is the code:

Query OneGlobal(GlobalName As %String) As %Query(ROWSPEC = "NodeValue:%String,Sub1:%String,Sub2:%String,Sub3:%String,Sub4:%String,Sub5:%String,Sub6:%String,Sub7:%String,Sub8:%String,Sub9:%String") [SqlProc]
{
}

ClassMethod OneGlobalExecute(ByRef qHandle As %Binary, GlobalName As %String) As %Status
{
    S qHandle="^"_GlobalName
    Quit $$$OK
}

ClassMethod OneGlobalClose(ByRef qHandle As %Binary) As %Status [ PlaceAfter = OneGlobalExecute ]
{
    Quit $$$OK
}

ClassMethod OneGlobalFetch(ByRef qHandle As %Binary, ByRef Row As %List, ByRef AtEnd As %Integer = 0) As %Status [ PlaceAfter = OneGlobalExecute ]
{

    S Q=qHandle  
    S Q=$Q(@Q)  b  
    I Q="" S Row="",AtEnd=1 Q $$$OK
    S Depth=$QL(Q)
    S $LI(Row,1)=$G(@Q)
    F I=1:1:Depth S $LI(Row,I+1)=$QS(Q,I)
    F I=Depth+1:1:9 S $LI(Row,I+1)=""
    S AtEnd=0
    S qHandle=Q
    Quit $$$OK
}

FYI, that is not an example to use in production, since it exposes all the data from all the globals.

However, it sounded like you were prepared to write Cache Object Script to directly get the data anyway - this is a template for doing so. The main thing to understand is that qHandle will be passed back by the ODBC driver on each call, so you can use it to store state. If there is a lot of state, make qHandle be an integer index into a temporary global holding the "real" state, and clean that up in the close method.

Since you are concerned about performance, you may want to also implement a

MyQueryFetchRows (ByRef qHandle As %Binary, FetchCount As %Integer = 0, ByRef RowSet As %List, ByRef ReturnCount As %Integer, ByRef AtEnd As %Integer) As %Status

method - see the documentation for %Library.Query for more details.

If you really need this to appear to ODBC as a (read-only) table rather than a stored procedure I think it might be possible - but I've never before tried to see if an arbitrary stored procedure can be exposed as a read-only table and I'm not sure how easy it is, or if it's actually always possible.

Community
  • 1
  • 1
psr
  • 2,870
  • 18
  • 22
  • Interesting. I'll definitely look into it. Thanks! – Chris Smith Jun 20 '12 at 04:08
  • None of the subscripts in the globals I looked at have names, so $QSUBSCRIPT is always returning "". Does this dedicate a Caché process to the entire ODBC request, and if so can I just use process private globals for state? Do I need to worry about implementing GetODBCInfo? – Chris Smith Jun 20 '12 at 13:46
  • The subscript can't be an empty string (though $QSUB can return an empty string) so I'm not sure what you mean - perhaps it's a number? The code should still work though. Anyway, why not implement a for loop using this method - you won't have to debug slightly weird things like $QSUB. And no, you can't safely assume process affinity. You shouldn't need to implement GetODBCInfo. – psr Jun 20 '12 at 18:38
  • I mean that for the second paramater of $QS, I get "" for every input except 0, which repeats the global name back to me. That's not really important though. – Chris Smith Jun 20 '12 at 20:03
  • I decided to go with the out of process proxy service using shared memory for communication. That way, I can store as much state as I want in the process and not have to cram into qHandle. You were the only one to answer my question and that method does work, so I accept your answer. Thanks. – Chris Smith Jun 25 '12 at 18:20
  • @ChrisSmith - Maybe you can post some code as an answer once you've implemented it. – psr Jun 25 '12 at 18:54
  • @psr At the moment, I am using the VisM as an activeX control over vb.net. How do I pull a list of GLOBALS (even if it is unformatted) with a specific name (example: ^BACKTR) using this active X control? – Malcolm Salvador Mar 26 '16 at 21:44
  • 1
    @Malky.Kid - That sounds like you should post a separate question. Note that it's unclear to me from your comment what pulling a list of GLOBALS means. Do you mean all the key value pairs in a specific global? Including the global nodes with multiple keys? Or do you want a list of global names (I don't think so, but I'm not sure)? The return value for VisM is basically a string - what do you want to do if the results are too big for a string? How do you want the string formatted? Unclear questions usually get closed, so think about how to explain exactly what you are trying to do. – psr Mar 28 '16 at 15:45