
I'd like to use the SAS JSON LIBNAME engine instead of PROC GROOVY to import the JSON file I get from the Twitter API. I am running SAS 9.4M4 on openSUSE Leap 42.3.

I followed Falko Schulz's description of how to access the Twitter API, and everything worked fine up to the point where I wanted to import the JSON file into SAS. The last working piece of code is:

proc http method="get"
out=res headerin=hdrin
url="https://api.twitter.com/1.1/search/tweets.json?q=&TWEET_QUERY.%nrstr(&)count=1"
ct="application/x-www-form-urlencoded;charset=UTF-8";
run;

which writes a JSON file to the location referenced by the fileref RES.

Falko Schulz uses PROC GROOVY. In SAS 9.4M4, however, there is this mysterious JSON LIBNAME engine that makes life easier. It works for simple JSON files, but not for the Twitter data. After downloading the JSON data from Twitter, running

libname test JSON fileref=res;

gives me the following error:

Invalid JSON in input near line 1 column 751: Some code points did not transcode.

I suspected that something was wrong with the file's encoding, so I used a FILENAME statement of the form:

filename res TEMP encoding="utf-8";

without luck...

I also tried increasing the record length:

filename res TEMP encoding="utf-8" lrecl=1000000;

and played around with the record format (RECFM=), to no avail...

Can somebody help? What am I missing? How can I use the JSON engine in a LIBNAME statement without running into this error?

Johannes Bleher
  • What encoding is your SAS session running in? I.e., what does this return: `proc options option=encoding; run;` – Joe Oct 16 '17 at 18:17
  • ENCODING=LATIN9, I should probably change that to UTF-8 – Johannes Bleher Oct 16 '17 at 18:19
  • Yes, there's a good chance that's at least part of your issue. Most SAS 9.4+ installations also include a UTF-8 startup option automatically (it's probably a separate shortcut in the Start menu, etc.). – Joe Oct 16 '17 at 18:20
  • Thanks. This fixed it! Sorry for bothering! – Johannes Bleher Oct 16 '17 at 18:23
  • Unfortunately a common issue in SAS thanks to SAS's complexity in dealing with encoding; probably a useful question for others in the future! – Joe Oct 16 '17 at 18:46

1 Answer


Run your SAS session in UTF-8 mode if you're reading UTF-8 files into SAS datasets. While it's possible to run SAS in another encoding and still read UTF-8 encoded files to some extent, you will generally run into a lot of difficulties.

You can tell what encoding your session is in with this code:

proc options option=encoding;
run;

If it returns this:

 ENCODING=WLATIN1  Specifies the default character-set encoding for the SAS session.

Then you're not in UTF-8 encoding.

SAS 9.4 and later on the desktop are typically installed with the UTF-8 option automatically selected in addition to the default WLATIN1 (when installed in English, anyway). You can find it in the Start menu under SAS 9.4 (Unicode Support), or by using the sasv9.cfg file in the 9.4\nls\u8\ subfolder of your SAS Foundation folder. Earlier versions may also have that subfolder/language installed, but it was not always installed by default.
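
As a minimal sketch of this advice, assuming the fileref RES from the question and SAS 9.4M4 (where %IF is only allowed inside a macro), the hypothetical macro %assign_json_lib below checks the session encoding via the automatic macro variable SYSENCODING before assigning the JSON libref. It cannot switch the encoding itself: ENCODING is a startup option, so a non-UTF-8 session only gets an explanatory message and still has to be restarted with a UTF-8 configuration.

```sas
/* Hypothetical helper (not part of SAS): assign a JSON libref only when the
   session encoding is UTF-8. ENCODING can only be set when SAS starts, so a
   non-UTF-8 session just gets an explanatory message instead. */
%macro assign_json_lib(libref=test, fileref=res);
  /* &SYSENCODING holds the session encoding, e.g. utf-8 or latin9.   */
  /* %INDEX avoids %EVAL treating the hyphen in UTF-8 as a minus sign. */
  %if %index(%upcase(&sysencoding),UTF) > 0 %then %do;
    libname &libref JSON fileref=&fileref;
  %end;
  %else %do;
    %put ERROR: Session encoding is &sysencoding, not UTF-8.;
    %put ERROR: Restart SAS with a UTF-8 configuration before reading &fileref..;
  %end;
%mend assign_json_lib;

/* Usage: RES is the fileref written by PROC HTTP in the question. */
%assign_json_lib(libref=test, fileref=res)
```

Checking first turns the late "Some code points did not transcode" failure into an up-front message that names the actual cause.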

Joe
  • Whenever someone brings up the documentation in SAS it should just open a chat window with you instead. – Robert Penridge Oct 16 '17 at 22:54
  • @RobertPenridge You haven't been on the SAS Community forums recently have you... I'm fairly sure Reeza wins that one hands down over there ;) – Joe Oct 16 '17 at 23:00
  • The solution works very nicely. I am wondering, however, whether there is a solution that does not involve changing the default session encoding. Can I change the encoding option from within my code? What has not worked for me so far is changing the encoding options in a FILENAME statement... – Johannes Bleher Oct 17 '17 at 13:46
  • @JohannesBleher Changing the file encoding permits SAS to read it, but the session encoding defines how the SAS datasets are encoded. If your Twitter feed has only characters that neatly transcode to WLATIN1, well, you're fine; but if it doesn't, what else could SAS do? – Joe Oct 17 '17 at 14:05
  • @Joe: Maybe I do not understand the issue quite as well as you do. But my issue is that the JSON LIBNAME engine has no encoding option. So the text file may be encoded in UTF-8, but I can't read the text file with SAS and store it in a SAS dataset with the same encoding (UTF-8) unless I change the encoding of my entire session to UTF-8. So what I would like to do is something along the lines of `option encoding=UTF8; /*Read in JSON data that require UTF-8 encoding option to be on*/ option encoding=WLATIN1; /*Do all kinds of other tasks that require WLATIN1 encoding*/` – Johannes Bleher Oct 17 '17 at 14:36
  • Realistically, you have to run the SAS session in UTF-8 to get the dataset to work in UTF-8. I'm not sure that's *officially* true, but it seems to be generally the case nonetheless. – Joe Oct 17 '17 at 15:02