
How can we retrieve the actual Unicode string from the content fields of TWebRequest? When I read the content fields of TWebRequest to get the Unicode value I entered in a text box, I see a scrambled value instead of the actual one. The input I gave was Добро, but the content fields contain garbled characters instead. The Response content type is set to text/html with charset=UTF-8. Can anybody tell me why it doesn't show the value entered in the text box, and how this can be corrected?

Sample code I was testing:

procedure TWebModule1.WebModule1HelloAction(Sender: TObject;
  Request: TWebRequest; Response: TWebResponse; var Handled: Boolean);
var
  s : string;
  PageProducer1 : TPageProducer;
begin
  Response.ContentType := 'text/html;charset=UTF-8';
  s := Request.ContentFields.Text;
  PageProducer1 := TPageProducer.Create(nil);
  try
    PageProducer1.HTMLFile := 'C:\Hello.tmpl';
    PageProducer1.OnHTMLTag := PageProducer1HTMLTag;
    Response.Content := PageProducer1.Content + ' ' + 'Entered string:' + s;
  finally
    PageProducer1.Free;
  end;
end;

Hello.tmpl just has a text box and a submit button.

LU RD
ravi12

2 Answers


You can use the UTF8ToString function to convert your UTF-8 string to a UnicodeString.
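
A minimal sketch of how this could look for the handler in the question, assuming Delphi 2009 or later (where `string` is a UnicodeString) and assuming the content fields hold the UTF-8 bytes widened through the system ANSI code page. `DecodeUtf8Field` is a hypothetical helper name, not part of the RTL, and the round trip only works if the ANSI code page maps every byte back unchanged, which may not hold on every system.

```delphi
// Hypothetical helper: reinterpret a content-field value as UTF-8.
function DecodeUtf8Field(const Value: string): string;
begin
  // The explicit AnsiString cast narrows the string back to bytes using the
  // default ANSI code page (assumption: this restores the original bytes).
  // UTF8ToString then decodes those bytes as UTF-8 into a UnicodeString.
  Result := UTF8ToString(AnsiString(Value));
end;

// Inside the action handler from the question:
//   s := DecodeUtf8Field(Request.ContentFields.Text);
```

The explicit cast is there to make the narrowing conversion visible in the code rather than leaving it implicit.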

Ondrej Kelle
  • Thanks for the reply. UTF8ToString works perfectly, but is it the case that, since the charset is set to UTF-8, the string is UTF-8 encoded and we need to convert it to a Unicode string? Is there any setting on the Request object that makes it do the conversion automatically? – ravi12 Jan 20 '12 at 12:44
  • An HTTP payload is an arbitrary sequence of octets. The Content-Type (and Content-Encoding) field tells you how to interpret it. It's your application's responsibility to read the proper header fields and handle the payload the correct way. – Mad Hatter Jan 20 '12 at 13:09
  • This will work correctly, but give a warning *W1058 Implicit string cast with potential data loss from 'string' to 'RawByteString'*. Combine it with using RawContent instead of Content and the warning is gone. – Jan Doggen Jan 07 '16 at 11:18

You just need to use TWebRequest.RawContent, which returns an AnsiString with the code page based on the charset defined in the request header. Unfortunately, you will have to process the content manually.

To get a string (UnicodeString), use TEncoding.UTF8.GetString(BytesOf(Request.RawContent)) if you are sure that the charset is UTF-8. Alternatively, you can check the original Content-Type header:

var
  ct: string;
...
ct := string(Request.GetFieldByName('Content-type')).ToUpper;
if (Pos('CHARSET', ct) > 0) and (Pos('UTF-8', ct) > 0) then
  Result := TEncoding.UTF8.GetString(BytesOf(Request.RawContent))
else
  Result := TEncoding.ANSI.GetString(BytesOf(Request.RawContent));

TWebRequest.Content and TWebRequest.ContentFields are bugged in my current version of (): they are always encoded in ANSI. TWebRequest.EncodingFromContentType tries to extract the charset from TWebRequest.ContentType, but the charset part of the content type has already been removed by earlier code at that point.

sunix