
I have a Java web agent in Lotus Domino which is invoked through the ?OpenAgent URL command (https://link.com/db.nsf/agentName?openagent). This agent is built to receive a POST with XML content. Before the agent even parses or saves the (XML) content, Domino stores the request in an in-memory document:

For an agent run from a browser with the OpenAgent URL command, the in-memory document is a new document containing an item for each CGI (Common Gateway Interface) variable supported by Domino®. Each item has the name and current value of a supported CGI variable. (No design work on your part is needed; the CGI variables are available automatically.) https://www.ibm.com/support/knowledgecenter/en/SSVRGU_9.0.1/basic/H_DOCUMENTCONTEXT_PROPERTY_JAVA.html
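For reference, this is roughly how the agent gets at that in-memory document and its items (a minimal sketch; the class name is just for illustration):

    import lotus.domino.AgentBase;
    import lotus.domino.AgentContext;
    import lotus.domino.Document;
    import lotus.domino.Session;

    public class ReceiveXmlAgent extends AgentBase {
        public void NotesMain() {
            try {
                Session session = getSession();
                AgentContext agentContext = session.getAgentContext();

                // The in-memory document Domino creates for a ?OpenAgent request
                Document doc = agentContext.getDocumentContext();

                // CGI variables are available automatically as items
                String method = doc.getItemValueString("REQUEST_METHOD");
                String contentType = doc.getItemValueString("CONTENT_TYPE");

                // The POST body as Domino stored it
                String body = doc.getItemValueString("REQUEST_CONTENT");

                System.out.println(method + " " + contentType + ": " + body);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }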

The content of the POST is saved (by Lotus) into the request_content field. When the received content contains the character é, for example:

 <Name xml:lang="en">tést</Name>

The é is changed by Lotus into ?®. This is also what I see when I read the request_content field in the document properties. Is it possible to save the é as an é, and not as ?®, in Lotus?

Solution:

The way I fixed it is via this post:

Link which helped me solve this problem

The same solution, but in Java:

    /****** INITIALIZATION ******/
    Session session = getSession();
    AgentContext agentContext = session.getAgentContext();

    // Write REQUEST_CONTENT to a temporary file as LMBCS (Domino's native charset)...
    Stream stream = session.createStream();
    stream.open("C:\\Temp\\test.txt", "LMBCS");
    stream.writeText(agentContext.getDocumentContext().getItemValueString("REQUEST_CONTENT"));
    stream.close();

    // ...then read the same file back as UTF-8 to get the correctly decoded text
    stream.open("C:\\Temp\\test.txt", "UTF-8");
    String content = stream.readText();
    stream.close();

    System.out.println("Content: " + content);
Nuri Ensing
  • We don't know how you're saving the string to start with - but I'd *strongly* recommend you to use an XML API instead of building the string up manually, escaping `&` yourself etc. – Jon Skeet Feb 27 '18 at 12:53
  • @JonSkeet Thanks for commenting. This String variable will be used to save the XML to a new XML file. Sometimes we receive an .XML without proper encoding in it, like the &. If we then try to open the XML file, for example in a browser, it will give an error message because of the & sign, and that is why I replace all & manually. Would it also be better to use an XML API first and then save the content to a new XML file? – Nuri Ensing Feb 27 '18 at 13:16
  • "This String variable will be used to save the XML to a new XML file." - yes, but how? You haven't shown the code you use to save it, or what you're doing to get the string from Lotus, or how you're observing the results. All of this can vary by encodings all over the place. Fundamentally, if you're receiving a document that has unescaped `&`, then it sounds like you're not receiving valid XML to start with, and that's a potentially bigger problem. – Jon Skeet Feb 27 '18 at 13:39
  • Edited and altered the whole question @JonSkeet, thanks for implicitly helping me on creating better questions :) – Nuri Ensing Feb 27 '18 at 14:01

2 Answers


I've dealt with this before, but I no longer have access to the code so I'm going to have to work from memory.

This looks like a UTF-8 vs UTF-16 issue, but there are up to five charsets that can come into play: the charset used in the code that does the POST, the charset of the JVM the agent runs in, the charset of the Domino server code, the charset of the NSF - which is always LMBCS, and the charset of the Domino server's host OS.

If I recall correctly, REQUEST_CONTENT is treated as raw data, not character data. To get it right, you have to handle the conversion of REQUEST_CONTENT yourself.

The Notes API calls that you use to save data in the Java agent will automatically convert from Unicode to LMBCS and vice versa, but this only works if Java has interpreted the incoming data stream correctly. I think in most cases, the JVM running under Domino is configured for UTF-16 - though that may not be the case. (I recall some issue with a server in Japan, and one of the charsets that came into play was one of the JIS standard charsets, but I don't recall if that was in the JVM.)

So if I recall correctly, you need to get the REQUEST_CONTENT string into a byte array with getBytes("UTF-8") and then construct a new String from that byte array with new String(bytes, "UTF-16"). That's assuming the JVM really is treating the data as UTF-16, as described above. Then pass that string to NotesDocument.replaceItemValue() so the Notes API calls interpret it correctly.
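Taken literally, the steps described above would look something like this in the agent (a sketch of the from-memory recipe, not a verified fix; the form and item names are placeholders, and the exact charset pair may need adjusting for your environment):

    // Sketch of the conversion described above
    Document doc = agentContext.getDocumentContext();
    String raw = doc.getItemValueString("REQUEST_CONTENT");

    byte[] bytes = raw.getBytes("UTF-8");            // re-encode what Domino handed us
    String converted = new String(bytes, "UTF-16");  // reinterpret the bytes

    // Store the converted text; the Notes API converts to LMBCS on save
    Document target = agentContext.getCurrentDatabase().createDocument();
    target.replaceItemValue("Form", "xmlImport");
    target.replaceItemValue("request_content", converted);
    target.save(true, false);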

I may have some details wrong here. It's been a while. Years ago I built a database that shows the LMBCS, UTF-8 and UTF-16 values for all Unicode characters. If you can get down to the byte values, it can be a useful tool for looking at data like this and figuring out what's really going on. It's downloadable from OpenNTF here. In a situation like this, I recall writing code that took the byte array, converted it to hex and wrote it to a NotesItem so that I could see exactly what was coming in and compare it to the database entries.
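For that kind of inspection, a small helper along these lines (my own sketch; the form and item names are arbitrary) writes the byte values out as hex:

    // Dump the bytes of REQUEST_CONTENT as hex into an item for inspection
    String raw = agentContext.getDocumentContext().getItemValueString("REQUEST_CONTENT");
    byte[] bytes = raw.getBytes("UTF-8");   // or whichever charset you are testing

    StringBuilder hex = new StringBuilder();
    for (byte b : bytes) {
        hex.append(String.format("%02X ", b));
    }

    Document debugDoc = agentContext.getCurrentDatabase().createDocument();
    debugDoc.replaceItemValue("Form", "debug");
    debugDoc.replaceItemValue("RequestContentHex", hex.toString());
    debugDoc.save(true, false);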

And yes, as per the comments, it's much better to let the XML tools on both sides handle the charset and encoding issues - but even that isn't foolproof. You're adding another layer of charsets into the process, and you still have to get it right. If the goal is to store data in NotesItems, you have to make sure that the server-side XML tools decode into the correct charset, which may not be the default.
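As an illustration of that approach (my own sketch, not from the original answer): if you can get at the POST body as raw bytes, handing them straight to an XML parser lets the parser honour the encoding declared in the document itself:

    import java.io.ByteArrayInputStream;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;

    public class ParseExample {
        public static void main(String[] args) throws Exception {
            // Stand-in for the raw POST body; in the agent this would be the
            // bytes actually received, not a literal
            byte[] bytes = ("<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                    + "<Name xml:lang=\"en\">tést</Name>").getBytes("UTF-8");

            // The parser reads the charset from the XML declaration itself
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            org.w3c.dom.Document xml = builder.parse(new ByteArrayInputStream(bytes));

            System.out.println(xml.getDocumentElement().getTextContent()); // prints: tést
        }
    }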

Richard Schwartz
  • Thanks @Richard. I could not fix it with the code above, but you pointed me in the right direction. I have found a solution and put it in my question. The only thing I do not love is that a text file is created to solve this. – Nuri Ensing Mar 02 '18 at 07:30
  • Glad you figured it out. – Richard Schwartz Mar 02 '18 at 14:51

My heart breaks looking at this. I also just went through this hell and found the old advice, but... I just could not write to disk to solve this trivial matter.

    // Requires lotus.domino.Item and java.nio.charset.Charset imports
    Item item = agentContext.getDocumentContext().getFirstItem("REQUEST_CONTENT");
    byte[] bytes = item.getValueCustomDataBytes("");
    String content = new String(bytes, Charset.forName("UTF-8"));

Edited in response to a comment by the OP: There is an old post on this theme: http://www-10.lotus.com/ldd/nd85forum.nsf/DateAllFlatWeb/ab8a5283e5a4acd485257baa006bbef2?OpenDocument (the same thread that the OP used for his workaround).

The poster there claims that the method fails when he sends a particular HTTP header. He was working with 8.5 and using LotusScript, though. In my case I cannot make it fail by sending an additional header (or by varying the string argument).

How I Learned to Stop Worrying and Love the Notes/Domino: for what it's worth, getValueCustomDataBytes() works only with very short payloads - and it depends on the content! Starting your text with an accented character such as 'é' increases the length it still works with... But whatever I tried, I could not get past 195 characters. Am I surprised? After all these years with Notes, I must admit I still am...

Well, admittedly it should not have worked in the first place as it is documented to be used only with User Defined Data fields.

Finally: use IBM's icu4j and icu4j-charset packages - drop them in jvm/lib/ext. Then the code becomes:

    // Re-encode the (mis-decoded) item text back to LMBCS bytes, then decode those bytes as UTF-8
    byte[] bytes = item.getText().getBytes(CharsetICU.forNameICU("LMBCS"));
    String content = new String(bytes, Charset.forName("UTF-8"));

and yes, it will need a permission in java.policy:

permission java.lang.RuntimePermission "charsetProvider"; 
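Put together, a self-contained sketch of this ICU-based variant might look like the following (assuming icu4j and icu4j-charset are in jvm/lib/ext and the policy grant above is in place; the class name is just for illustration):

    import java.nio.charset.Charset;

    import com.ibm.icu.charset.CharsetICU;

    import lotus.domino.AgentBase;
    import lotus.domino.AgentContext;
    import lotus.domino.Item;
    import lotus.domino.Session;

    public class ReadRequestContentAgent extends AgentBase {
        public void NotesMain() {
            try {
                Session session = getSession();
                AgentContext agentContext = session.getAgentContext();

                Item item = agentContext.getDocumentContext().getFirstItem("REQUEST_CONTENT");

                // The idea, as I read it: Domino decoded the UTF-8 POST body as if
                // it were LMBCS, so encoding the string back to LMBCS recovers the
                // original bytes...
                byte[] bytes = item.getText().getBytes(CharsetICU.forNameICU("LMBCS"));

                // ...which can then be decoded as the UTF-8 they really are
                String content = new String(bytes, Charset.forName("UTF-8"));

                System.out.println("Content: " + content);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }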

Is this any better than passing through the file system? Don't know. But kinda looks cleaner.

Normunds Kalnberzins
  • Sorry for breaking your heart. I tried your code, and on the line byte[] bytes = item.getValueCustomDataBytes(""); I get this error: HTTP JVM: NotesException: Supplied Data type name does not match stored CustomData type – Nuri Ensing Mar 05 '18 at 07:06
  • hm, it works for me. 9.0.1 FP4. And no, it's not you who is breaking my heart - it's the fact that IBM cannot convert the encoding: you get "binary data" as a text item (or a text item converted using the wrong encoding, whichever way you look at it) with no obvious way to access or convert it – Normunds Kalnberzins Mar 05 '18 at 09:44