2

I'm trying to create an "XLS" file, but it will be available as txt file. (It means when you right click and select open with in windows and choose notepad it will show as tab separated file)

This is the sample file that I'm trying to create (I have to remove some entry because it is big). When you turn it into UTF-8 in notepad++, you will see "hidden characters" https://docs.google.com/spreadsheets/d/1q_AkGaQK8Glc6OzmVl4gRmItO4Ojnq7G/edit?usp=sharing&ouid=113904619378239546124&rtpof=true&sd=true

When you download this file and open it in notepad++. Choose the encoding as UTF-8. You will able to see those hidden characters: Sample file

This is what it looks like if you open it with excel: enter image description here As you can see at the beginning of the content you will see those xA0 as the hidden characters. Now I know that AS400 use EBCDIC characters code.

This is the code that they put in the cobol program:

HSPACE PIC X VALUE X'41'

What are the equivalent of that hidden characters in Java?

I have create a test program below:

 List<Object[]> data = new ArrayList<>();
            data.add(new Object[]{"\u0020 AS1", "185914", "\u0020 NETHERLANDS", "NL", "A0", "\u00202023714", "\u00A02023714", "27-AUG-2022", "03-FEB-2023", "\u00A0", "\u00A04", "00000000", "\u00A0IF-ADAMAS", "\u00A0", "\u00A0PTF166091NL00", "\u00A0P166091NL00", "\u00A0", "\u00A0", "\u00A0", "\u00A0", "\u00A0IF ADAMAS B V"});
            data.add(new Object[]{"\u0020 AS1", "\u0020200893", "\u0020 GERMANY", "DE", "\u00A0", "\u00A013801864.3", "\u00A02915188", "05-NOV-2022", "22-FEB-2023", "\u00A0R80049", "\u00A010", "00000434", "\u00A0MICRONIT M", "\u00A0", "\u00A0PTF124241DEEP", "\u00A0P118354DEEP", "\u00A0", "\u00A0", "\u00A0", "\u00A0", "\u00A0MICRONIT MICROFLUIDICS B.V."});

            FileWriter writer = new FileWriter("output.XLS", StandardCharsets.UTF_8);
            
            writer.write("\"Client\"\t\"Case Number\"\t\"Country\"\t\"WIPO\"\t\"Subcase\"\t\"Application Number\"\t\"Patent Number\"\t\"Due Date\"\t\"Paid Date\"\t\"Invoice Number\"\t\"Annuity Number\"\t\"Invoice Amount\"\t\"Client/Division\"\t\"Client Ref(Inv)\"\t\"Client Ref#1(Ctry)\"\t\"Client Ref#2(Ctry)\"\t\"Attorney(Inv)\"\t\"Attorney(Ctry)\"\t\"Remarks\"\t\"Local Title\"\t\"Title Holder\"\n");

            for (Object[] row : data) {
                for (int i = 0; i < row.length; i++) {
                    writer.write("\"" + row[i].toString() + "\"");
                    if (i < row.length - 1) {
                        writer.write("\t");
                    }
                }
                writer.write("\n");
            }

            writer.close();
            System.out.println("Done");

However, the file when I open in notepad++ with encoding UTF-8, I see nothing enter image description here

Although you will see there is white space like in that text file. However, if you open this file in my generated file in excel: enter image description here You can see it has weird characters that I put in my code! How can I create a text file in Java that output a "XLS / TXT" file like the IBM I series(AS400) cobol program? Can someone help me with this, please?

EthanP
  • 55
  • 8
  • 1
    X'40' is the EBCDIC space character, which converts to an ASCII space (x'20'). X'41' is undefined in EBCDIC, but the original COBOL coders (I'm guessing) used the character to represent a horizontal space. In other words, drop to the next line on a terminal. I have no idea what your ASCII to EBCDIC converter does with an x'41'. – Gilbert Le Blanc Mar 30 '23 at 19:27
  • Thank you! But then what should I do? Because I can't seem to find the EBCDIC and ASCII side by side! Also I was trying to have those xA0 when I open as a text file for that XLS file! – EthanP Mar 30 '23 at 19:29
  • 1
    I used [Decimal - HEXadecimal - EBCDIC - ASCII - Bit Conversion Tables](https://www.ibm.com/support/pages/decimal-hexadecimal-ebcdic-ascii-bit-conversion-tables). But you have to be careful. This table shows what happens if there is no conversion at all. Every EBCDIC to ASCII conversion program will have its own list. One obvious example is A (x'C1') to A (x'41'), which might be what you're seeing if the x'41' comes across as unconverted. – Gilbert Le Blanc Mar 30 '23 at 19:35
  • Ok so I tried a white space already but then when I open the XLS file as a text file. The character xA0 doesn't show up like the file that is generated by the IBM I(AS400) – EthanP Mar 30 '23 at 19:35
  • This is exactly what I'm scared. I also aware that there is some characters that is not converted, but I'm just really trying! :( – EthanP Mar 30 '23 at 19:37
  • ASCII x'A0" is an undefined ASCII character. You can try doing a `replaceAll` on x'A0' and see what happens. Try the null `String` "" and a space `String` " " separately and see if either works. – Gilbert Le Blanc Mar 30 '23 at 19:44
  • Sorry, I don't really understand so I will change. Ex: data.add(new Object[]{"AS1", "185914", "NETHERLANDS", Then writer.write("\"" + " " +row[i].toString() + "\""); Is that what you are trying to say? – EthanP Mar 30 '23 at 19:46
  • Try opening file with charset `windows-1252`, because the file in Google docs is not encoded with utf-8, and replacing \" with \u00a0. – nfgl Mar 30 '23 at 20:01
  • You may misunderstand because \" is to represent the double quote in text file in java. Ex: "Hello" I just realize that the google docs automatically stripped away my double quotation! Sorry because I don't have a way to send it for you guys. But if someone has recommendation. I would follow! – EthanP Mar 30 '23 at 20:04
  • OK so place \u00A0 where you want to see XAO when notepad++ show the file as utf-8, but user windows-1252 charset – nfgl Mar 30 '23 at 20:09
  • I was trying to let you guys notice that hidden characters. Because in windows, all you see is a white space. However, when you open it in notepad++ and encode it as utf8. You can see the xA0 clearly as a hidden character. Sorry for the confusion! As long as when I open the .xls file I see a white space, and that white space is xA0 in notepad++ encoding utf8. It is what I'm trying to do! – EthanP Mar 30 '23 at 20:12
  • And that's what you want to see with the file you write, isn't it ? – nfgl Mar 30 '23 at 20:16
  • By the way, @GilbertLeBlanc I tried to add one of the content from "AS1" to " AS1". It never adds the xA0 for me like the file generated by the IBM i – EthanP Mar 30 '23 at 20:16
  • @nfgl yes sir yes sir – EthanP Mar 30 '23 at 20:17
  • So just try `new FileWriter("output.XLS", "windows-1252");` and write a line with \u00a0, I think you will have what you want – nfgl Mar 30 '23 at 20:20
  • @nfgl oh wao!!!! I just tried it! It now is shows up on the file. Thank you so much! But can you explain to me about when I put "windows-1252" in, then I can see it, please? – EthanP Mar 30 '23 at 20:29
  • just added and answer with a short explanation – nfgl Mar 30 '23 at 20:38

1 Answers1

3

The file produce by AS400 is encoded (probably) with windows 1252 charset, notepad++ names it ansi. When you display it as utf8 you see XA0 because the way it is encode is illegal in utf-8.

So to produce a similar file you have to write it with charset 1252 too and use \u00A0 in your java strings, so that when writen java nio translates it from \u00a0 to \xa0

FileWriter writer = new FileWriter("output.XLS", Charset.forName("windows-1252"));
writer.write("\u00a0");
nfgl
  • 2,812
  • 6
  • 16
  • Thank you so so much! I was banging my head for couple hours now! – EthanP Mar 30 '23 at 20:40
  • Oh I want to point this out too! I have to use this OutputStreamWriter writer = new OutputStreamWriter(new FileOutputStream("output.XLS"),"windows-1252"); because the FileWriter doesn't let me specify the "windows-1252" as an argument – EthanP Mar 30 '23 at 20:42
  • Yes sorry I forgot that, just made an edit. Glad that helped, you can accept the answer if you want. – nfgl Mar 30 '23 at 20:49
  • Yes sir yes sir! – EthanP Mar 30 '23 at 20:52
  • oh hi @nfgl, I just realize the space between them is not as a simple tab ("\t") that I did. It was actually DCL VAR(&FLDDLM) TYPE(*CHAR) LEN(1) VALUE(X'05'). Do you know what is the equivalence of X'05'? Because it gave the ENQ in my text file, although it should not? – EthanP Mar 30 '23 at 23:03
  • Should I make a separated post for this matter? – EthanP Mar 30 '23 at 23:03
  • 1
    You'd better ask a new question when you have one, even close to the first one. So anyone can answer it. To answer your question x'05' in EBCDIC is horizontal tab (see [wikipedia](https://en.wikipedia.org/wiki/EBCDIC)), it matches with \t or \u0009 in ansi and utf-8, you can't see a difference. – nfgl Mar 31 '23 at 14:05
  • Oh my I just post a new question about this! Thank you thank you, this was my first job, and I'm new to posting question! I will try it now! :D – EthanP Mar 31 '23 at 14:22
  • oh wait! I already put in as \t in the code. That means I guess it's all good now? – EthanP Mar 31 '23 at 14:23
  • 1
    yes I think it's all good with \t – nfgl Mar 31 '23 at 14:26
  • I will try and email the generated file for the client. When they respond, I will keep updating this question! Again, thank you so so much! – EthanP Mar 31 '23 at 14:27
  • So just an update! Thank you for helping me out! However, the client never responds, so we never know if it work or not! However, this is a great knowledge to know! Thank you thank you! – EthanP Jun 02 '23 at 20:03