HTML Entity Converter in Ada

Question

I want to write an Ada program that replaces Latin-1 characters with the corresponding HTML entities, but my code does not work: test.txt and converted.txt are always identical. My tutor said the code is correct. Thanks in advance!

Here is my code:

with Ada.Text_IO;
procedure Entity_Converter is
   use Ada.Text_IO;

   Source : File_Type;
   Target : File_Type;
   Source_Char : Character;
begin
   Open (Source, In_File, "test.txt");
   Create (Target, Out_File, "converted.txt");
   while not End_Of_File (Source) loop
      Get (Source, Source_Char);
      case Source_Char is
         when 'ä' =>
            Put (Target, "&auml;");
         when 'Ä' =>
            Put (Target, "&Auml;");
         when 'ö' =>
            Put (Target, "&ouml;");
         when 'Ö' =>
            Put (Target, "&Ouml;");
         when 'ü' =>
            Put (Target, "&uuml;");
         when 'Ü' =>
            Put (Target, "&Uuml;");
         when 'ß' =>
            Put (Target, "&szlig;");
         when others =>
            Put (Target, Source_Char);
      end case;
   end loop;
   Close (Source);
   Close (Target);
end Entity_Converter;

score 3 · Accepted Answer · answered Dec 27 '11 at 15:32

The result depends on the encoding of both the source text and the test file.

To address the former, use the constants of the package Ada.Characters.Latin_1:

with Ada.Characters.Latin_1;
use Ada.Characters.Latin_1;
...
   case Source_Char is
      when LC_A_Diaeresis =>
         Put (Target, "&auml;");
      when UC_A_Diaeresis =>
         Put (Target, "&Auml;");
      ...
      when LC_German_Sharp_S =>
         Put (Target, "&szlig;");
      when others =>
         Put (Target, Source_Char);
   end case;

The latter depends on your editor.
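
For completeness, here is a minimal sketch of the whole program with the Latin_1 constants filled in for all seven characters from the question. The helper To_Entity is a name introduced here for illustration, and the sketch assumes test.txt really is stored as Latin-1 (one byte per character).

```ada
with Ada.Text_IO;
with Ada.Characters.Latin_1;

procedure Entity_Converter is
   use Ada.Text_IO;
   use Ada.Characters.Latin_1;

   Source : File_Type;
   Target : File_Type;

   --  Hypothetical helper (not in the question): maps one Latin-1
   --  character to its HTML entity, or returns it unchanged.
   function To_Entity (C : Character) return String is
   begin
      case C is
         when LC_A_Diaeresis    => return "&auml;";
         when UC_A_Diaeresis    => return "&Auml;";
         when LC_O_Diaeresis    => return "&ouml;";
         when UC_O_Diaeresis    => return "&Ouml;";
         when LC_U_Diaeresis    => return "&uuml;";
         when UC_U_Diaeresis    => return "&Uuml;";
         when LC_German_Sharp_S => return "&szlig;";
         when others            => return (1 => C);
      end case;
   end To_Entity;
begin
   --  Assumption: test.txt is Latin-1 encoded, one byte per character.
   Open (Source, In_File, "test.txt");
   Create (Target, Out_File, "converted.txt");
   while not End_Of_File (Source) loop
      declare
         Text : constant String := Get_Line (Source);
      begin
         for I in Text'Range loop
            Put (Target, To_Entity (Text (I)));
         end loop;
         New_Line (Target);
      end;
   end loop;
   Close (Source);
   Close (Target);
end Entity_Converter;
```

The sketch reads with Get_Line rather than Get because Get on a Character skips line terminators, which would otherwise drop the line breaks from converted.txt.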

Awesome! There was a problem with encoding. Thank you! – stardust Dec 27 '11 at 16:18

score 1 · Answer 2 · answered Dec 27 '11 at 16:21

I’m running on a Mac and I copied your source. When I compiled it, it complained that (for example) 'ä' needed double quotes, a hint that the source uses wide characters. It seems to be in UTF-8 [1], so I compiled with -gnatW8, which appeared to be successful.

I then ran the program on a copy of its own source text, and it failed to transform the text.

Compiling with -gnatdg, which makes GNAT produce a representation of its internal source tree, I get

  ada__text_io__get (source, source_char);
  case source_char is
     when '["e4"]' =>
        ada__text_io__put__3 (target, "&auml;");
     when '["c4"]' =>
        ada__text_io__put__3 (target, "&Auml;");

which looks to me as though GNAT has read the UTF-8 encoding of ä and used the Latin-1 version for the case statement; not unreasonable given that it says Character, and quite enough to explain why it failed to convert itself.
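
To make the mismatch concrete: UTF-8 stores ä (U+00E4) as the two bytes 16#C3# 16#A4#, while the case statement tests for the single Latin-1 code 16#E4#. A small sketch (the procedure name Show_Mismatch is made up here) that compares those two bytes with LC_A_Diaeresis:

```ada
with Ada.Text_IO;
with Ada.Characters.Latin_1;

procedure Show_Mismatch is
   use Ada.Text_IO;

   --  UTF-8 stores U+00E4 (a with diaeresis) as the bytes 16#C3# 16#A4#.
   UTF_8_Bytes : constant String :=
     (1 => Character'Val (16#C3#), 2 => Character'Val (16#A4#));
begin
   for I in UTF_8_Bytes'Range loop
      --  Neither byte equals the Latin-1 code 16#E4# that the case
      --  statement tests for, so "when others" is taken each time.
      Put_Line
        (Integer'Image (Character'Pos (UTF_8_Bytes (I)))
         & " matches LC_A_Diaeresis: "
         & Boolean'Image
             (UTF_8_Bytes (I) = Ada.Characters.Latin_1.LC_A_Diaeresis));
   end loop;
end Show_Mismatch;
```

Both comparisons report FALSE, so a UTF-8 input falls through to when others for every byte and is copied unchanged.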

I then tried using Ada.Wide_Text_IO and Wide_Character. Sadly the program failed, for the same reason as before. Could we be looking at a feature? Or even a bug?

[1] The file may have ended up in UTF-8 because of the roundabout way I downloaded it, of course.
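
If the input file really is UTF-8, one option is to decode each line before matching. The following is only a sketch under that assumption. It relies on the Ada 2012 package Ada.Strings.UTF_Encoding.Wide_Strings, and the names Entity_Converter_UTF_8 and To_Entity are introduced here for illustration. Character codes are written with Wide_Character'Val so the result does not depend on the encoding of the Ada source file.

```ada
with Ada.Text_IO;
with Ada.Strings.UTF_Encoding.Wide_Strings;

procedure Entity_Converter_UTF_8 is
   use Ada.Text_IO;
   package UTF renames Ada.Strings.UTF_Encoding.Wide_Strings;

   Source : File_Type;
   Target : File_Type;

   --  Hypothetical helper: maps one decoded character to its HTML
   --  entity, or re-encodes it as UTF-8 if no entity applies.
   function To_Entity (C : Wide_Character) return String is
   begin
      case C is
         when Wide_Character'Val (16#E4#) => return "&auml;";
         when Wide_Character'Val (16#C4#) => return "&Auml;";
         when Wide_Character'Val (16#F6#) => return "&ouml;";
         when Wide_Character'Val (16#D6#) => return "&Ouml;";
         when Wide_Character'Val (16#FC#) => return "&uuml;";
         when Wide_Character'Val (16#DC#) => return "&Uuml;";
         when Wide_Character'Val (16#DF#) => return "&szlig;";
         when others => return UTF.Encode (Wide_String'(1 => C));
      end case;
   end To_Entity;
begin
   --  Assumption: test.txt is UTF-8 encoded.
   Open (Source, In_File, "test.txt");
   Create (Target, Out_File, "converted.txt");
   while not End_Of_File (Source) loop
      declare
         --  Text_IO delivers the raw bytes; Decode turns them into
         --  one Wide_Character per code point.
         Decoded : constant Wide_String := UTF.Decode (Get_Line (Source));
      begin
         for I in Decoded'Range loop
            Put (Target, To_Entity (Decoded (I)));
         end loop;
         New_Line (Target);
      end;
   end loop;
   Close (Source);
   Close (Target);
end Entity_Converter_UTF_8;
```

Input that is not valid UTF-8 makes Decode raise Ada.Strings.UTF_Encoding.Encoding_Error, which this sketch does not handle.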

Interesting. On Mac, BBEdit/TextWrangler is handy for setting encoding, but I finally just used `echo "\0304\0344…\0334\0374\0337" > test.txt` to create the test input. — trashgod, Dec 27 '11 at 16:33
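
The same kind of test input can also be produced from Ada, which sidesteps the editor's encoding entirely. A minimal sketch, assuming Text_IO writes each Character as a single byte (GNAT's default behaviour, as far as I know); Make_Test_Input is a made-up name:

```ada
with Ada.Text_IO;
with Ada.Characters.Latin_1;

procedure Make_Test_Input is
   use Ada.Text_IO;
   use Ada.Characters.Latin_1;

   Target : File_Type;
begin
   Create (Target, Out_File, "test.txt");
   --  The seven characters handled by the question's case statement,
   --  written by their Latin-1 codes rather than as source literals.
   Put_Line
     (Target,
      UC_A_Diaeresis & LC_A_Diaeresis
      & UC_O_Diaeresis & LC_O_Diaeresis
      & UC_U_Diaeresis & LC_U_Diaeresis
      & LC_German_Sharp_S);
   Close (Target);
end Make_Test_Input;
```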
