
I'm creating an interface between SWI-Prolog and PHP. The PHP script writes the commands it wants Prolog to run to a file, then makes a system call so that Prolog runs the file. The problem is that when the file contains special characters (such as á, í, ã, ê), they are replaced by \uFFFD in Prolog's output. I know this code point is the replacement for unknown or unrecognized characters, but nothing I found on the Internet has solved the issue. If I run the file from the terminal myself, the correct characters are shown; they are only lost when PHP runs it through exec or shell_exec.

Here's the code used, first the php:

        $arquivo = fopen("/home/giz/prologDB/run.pl", "w");
        $run = <<<EOT
    go :-
        consult('/home/giz/prologDB/pessoasOps.pl'),
        addPessoa(0,'$name','$posicao','$resume','$unidade','$curso','$disciplina',$alunos,[]),
        halt.
    EOT;

        echo $run;
        fwrite($arquivo, $run);
        fclose($arquivo); // flush and close the file before Prolog reads it

        $cmd = "prolog -f /home/giz/prologDB/run.pl -g go";
        exec($cmd, $output);
        echo "\n";
        print_r($output);
        echo "\n";

And the Prolog code:

addPessoa(LOCAL, NOME, POSICAO, RESUMO, UNIDADE, CURSO, DISCIPLINA, ALUNOS, REFERENCIA):-
    write( 'Prolog \nwas called \nfrom PHP \nsuccessfully.\n' ),    
    write('pessoa('),
    write(LOCAL),
    write(',\''),   
    write(NOME),
    write('\',\''),
    write(POSICAO),
    write('\',\''),
    write(RESUMO),
    write('\',\''),
    write(UNIDADE),
    write('\',\''),
    write(CURSO),
    write('\',\''),
    write(DISCIPLINA),
    write('\','),
    write(ALUNOS),
    write(','),
    write(REFERENCIA),
    write(').\n'),
    make.

Does anyone know how to make Prolog interpret the strings properly?

outis
    Apart from the fact that you have to escape quotes in your variables, where do the characters you are talking about enter the scene? And did you check that all encodings match (terminal, IDE, etc.)? – Walter Tross Jul 05 '12 at 20:12

1 Answer


Most probably Prolog expects UTF-8 encoded characters, and you are feeding it ISO-8859-n characters, where n is most probably 1 or 15. In UTF-8, a byte >= 128 is either the first byte of a multibyte sequence (if it is >= 192) or a continuation byte (128 to 191). If the first byte of a multibyte sequence is not followed by a continuation byte, or if a sequence starts with a continuation byte, the decoder sees an unrecognized byte sequence and substitutes, in your case, a U+FFFD code point. All characters with diacritics are encoded as single bytes above 128 in ISO-8859-n, so each of them triggers this.
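To make the byte-level argument concrete, here is a minimal PHP sketch. It assumes the mbstring extension is available, and it assumes the input really is ISO-8859-1, which the question does not confirm.

```php
<?php
// "á" as a single ISO-8859-1 byte (0xE1) and as the two-byte UTF-8 sequence (0xC3 0xA1).
$latin1 = "\xE1";
$utf8   = "\xC3\xA1";

// 0xE1 is >= 192, so a UTF-8 decoder treats it as the start of a multibyte
// sequence. No continuation bytes follow, so the string is not valid UTF-8.
var_dump(mb_check_encoding($latin1, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($utf8, 'UTF-8'));   // bool(true)

// Converting from ISO-8859-1 (an assumption about the source encoding)
// produces the valid UTF-8 form.
$converted = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
var_dump($converted === $utf8); // bool(true)
```

If the form values fail the same check, converting them before writing run.pl may be enough, provided Prolog then reads the file as UTF-8.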

Also check SWI-Prolog's manual page on encoding, especially the paragraph that starts with these two sentences:

The default encoding for files is derived from the Prolog flag encoding, which is initialised from the environment. If the environment variable LANG ends in "UTF-8", this encoding is assumed.

A likely reason for SWI-Prolog behaving differently when called from a shell and from within PHP is a different setting of the LANG environment variable in the two cases. The same paragraph of the manual also mentions ways of forcing the encoding.
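One way to test the LANG hypothesis from PHP is sketched below. The helper name buildPrologCommand and the locale en_US.UTF-8 are assumptions for illustration; the locale has to exist on the server (check with locale -a), and the exec call is left commented out so the snippet runs without SWI-Prolog installed.

```php
<?php
// Hypothetical helper: build a shell command that sets LANG for the child
// process only, leaving the web server's own environment untouched.
function buildPrologCommand(string $script, string $goal, string $locale = 'en_US.UTF-8'): string
{
    return 'LANG=' . escapeshellarg($locale)
        . ' prolog -f ' . escapeshellarg($script)
        . ' -g ' . escapeshellarg($goal);
}

// Compare this value with the output of `echo $LANG` in the terminal
// where the characters are displayed correctly.
var_dump(getenv('LANG'));

$cmd = buildPrologCommand('/home/giz/prologDB/run.pl', 'go');
echo $cmd, "\n";
// exec($cmd, $output); // uncomment on a machine where SWI-Prolog is installed
```

If getenv('LANG') returns false or a value that does not end in UTF-8 under the web server, that would be consistent with the explanation above.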

In a shell, the quickest way to see the bytes contained in a file is to run od -tx1z filename | less (leave out the z in case of hard-to-print characters).
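The same inspection can be done from inside PHP, which shows the bytes exactly as the script wrote them. This is a small sketch; hexBytes is a hypothetical helper name, and the sample string stands in for the real contents of run.pl.

```php
<?php
// Hypothetical helper: render a byte string as space-separated hex pairs.
function hexBytes(string $bytes): string
{
    if ($bytes === '') {
        return '';
    }
    return implode(' ', str_split(bin2hex($bytes), 2));
}

// Stand-in sample: "á" in UTF-8 followed by "á" in ISO-8859-1.
$sample = "\xC3\xA1" . "\xE1";
echo hexBytes($sample), "\n"; // c3 a1 e1

// For the real file, something like:
// echo hexBytes(file_get_contents('/home/giz/prologDB/run.pl')), "\n";
```

A lone e1 (or e9, ed, and so on) in the dump suggests ISO-8859-n data, while pairs starting with c3 suggest UTF-8.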

Walter Tross