
I want to write an Ada program that replaces Latin-1 characters with the applicable HTML entities, but my code does not work: test.txt and converted.txt are always identical. My tutor said the code is correct. Thanks in advance!

Here is my code:

with Ada.Text_IO;
procedure Entity_Converter is
   use Ada.Text_IO;

   Source : File_Type;
   Target : File_Type;
   Source_Char : Character;
begin
   Open (Source, In_File, "test.txt");
   Create (Target, Out_File, "converted.txt");
   while not End_Of_File (Source) loop
      Get (Source, Source_Char);
      case Source_Char is
         when 'ä' =>
            Put (Target, "&auml;");
         when 'Ä' =>
            Put (Target, "&Auml;");
         when 'ö' =>
            Put (Target, "&ouml;");
         when 'Ö' =>
            Put (Target, "&Ouml;");
         when 'ü' =>
            Put (Target, "&uuml;");
         when 'Ü' =>
            Put (Target, "&Uuml;");
         when 'ß' =>
            Put (Target, "&szlig;");
         when others =>
            Put (Target, Source_Char);
      end case;
   end loop;
   Close (Source);
   Close (Target);
end Entity_Converter;
trashgod
stardust

2 Answers


The result depends on the encoding of both the source text and the test file.

To address the former, use the constants of the package Ada.Characters.Latin_1:

with Ada.Characters.Latin_1;
use Ada.Characters.Latin_1;
...
   case Source_Char is
      when LC_A_Diaeresis =>
         Put (Target, "&auml;");
      when UC_A_Diaeresis =>
         Put (Target, "&Auml;");
      ...
      when LC_German_Sharp_S =>
         Put (Target, "&szlig;");
      when others =>
         Put (Target, Source_Char);
   end case;

The latter depends on your editor.
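
As a rough sketch of the first point, the mapping could be pulled into a small function that never depends on how the compiler reads non-ASCII literals in the source file. The name To_Entity is hypothetical (it is not in the question), and the sketch assumes the input really is Latin-1, one byte per character:

```ada
with Ada.Characters.Latin_1;

--  Hypothetical helper, not part of the original program: returns the
--  HTML entity for the seven German Latin-1 characters from the question
--  and returns any other character unchanged.
function To_Entity (C : Character) return String is
   use Ada.Characters.Latin_1;
begin
   case C is
      when LC_A_Diaeresis    => return "&auml;";
      when UC_A_Diaeresis    => return "&Auml;";
      when LC_O_Diaeresis    => return "&ouml;";
      when UC_O_Diaeresis    => return "&Ouml;";
      when LC_U_Diaeresis    => return "&uuml;";
      when UC_U_Diaeresis    => return "&Uuml;";
      when LC_German_Sharp_S => return "&szlig;";
      when others            => return (1 => C);  --  one-character string
   end case;
end To_Entity;
```

With something like this in place, the body of the read loop would shrink to `Put (Target, To_Entity (Source_Char));`. Because the source then contains only ASCII, the editor's encoding no longer matters for the program text, only for test.txt.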

trashgod

I’m running on a Mac and I copied your source. When I compiled it, the compiler complained that (for example) 'ä' needed double quotes, a hint that the source uses wide characters. It seems to be in UTF-8[1], so I compiled with -gnatW8, which appeared to be successful.

I then ran the program on a copy of its own source text, and it failed to transform the text.

Compiling with -gnatdg, which makes GNAT produce a representation of its internal source tree, I get

  ada__text_io__get (source, source_char);
  case source_char is
     when '["e4"]' =>
        ada__text_io__put__3 (target, "&auml;");
     when '["c4"]' =>
        ada__text_io__put__3 (target, "&Auml;");

which looks to me as though GNAT has read the UTF-8 encoding of ä and used the Latin-1 version for the case statement; not unreasonable given that it says Character, and quite enough to explain why it failed to convert itself.
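
To make the mismatch concrete, here is a small sketch that lists the byte values a Latin-1 string takes on once it is encoded as UTF-8. It assumes an Ada 2012 compiler, since it relies on Ada.Strings.UTF_Encoding.Strings; the function name UTF8_Codes is made up for illustration:

```ada
with Ada.Strings.Unbounded;
with Ada.Strings.UTF_Encoding.Strings;

--  Hypothetical helper: encodes a Latin-1 string as UTF-8 and returns the
--  decimal code of every resulting byte, each preceded by a space.
function UTF8_Codes (Latin_1_Text : String) return String is
   use Ada.Strings.Unbounded;
   Encoded : constant Ada.Strings.UTF_Encoding.UTF_8_String :=
     Ada.Strings.UTF_Encoding.Strings.Encode (Latin_1_Text);
   Result  : Unbounded_String;
begin
   for I in Encoded'Range loop
      --  Integer'Image puts a leading space in front of non-negative values
      Append (Result, Integer'Image (Character'Pos (Encoded (I))));
   end loop;
   return To_String (Result);
end UTF8_Codes;
```

Calling UTF8_Codes with the single character 16#E4# (ä) gives two codes, 195 and 164, that is 16#C3# and 16#A4#. If test.txt is UTF-8, Get therefore delivers those two Characters one after the other, and neither matches the '["e4"]' choice in the case statement.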

I then tried using Ada.Wide_Text_IO and Wide_Character. Sadly the program failed, for the same reason as before. Could we be looking at a feature, or even a bug?

[1] The file may have ended up in UTF-8 because of the roundabout way I downloaded it, of course.

Simon Wright
  • Interesting. On Mac, BBEdit/TextWrangler is handy for setting encoding, but I finally just used `echo "\0304\0344…\0334\0374\0337" > test.txt` to create the test input. – trashgod Dec 27 '11 at 16:33