
Further to my old question, we are generating XML using the following code:

download_xml('GET', []) ->
    Xml = generateXML(123445),
    %% The generated Xml data is a string without any of the 400, ... values.
    Filename = export_xml:get_file_name(?SESSION_ID1, ?SESSION_ID2),
    Filepath = "./priv/static/" ++ Filename,
    TotalSize = filelib:file_size(Filepath),
    {ok, FP} = file:open(Filepath, [read]),
    Generator = fun(FH) ->
                    %% But this line seems to cause something we never wanted.
                    case file:read(FH, 1024) of
                        eof ->
                            file:close(FH),
                            done;
                        {ok, Data} ->
                            {output, Data, FH}
                    end
                end,
    {stream, Generator, FP, [{"Content-Type", "application/force-download"},
                             {"Content-Disposition", "attachment; filename=" ++ Filename},
                             {"Content-length", TotalSize}]}.
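One way to check whether file:read/2 itself adds the numbers is to drive a generator of the same shape by hand, outside any web framework, and look at exactly what it emits. The sketch below is hypothetical: the module chunk_reader_demo and the helper collect/2 are made-up names. It also opens the file in binary mode, which the original code does not do.

```erlang
-module(chunk_reader_demo).
-export([make_generator/0, collect/2]).

%% Same shape as the Generator in the question: read up to 1024 bytes,
%% emit them as {output, Data, FH}, and close the file on eof.
make_generator() ->
    fun(FH) ->
        case file:read(FH, 1024) of
            eof ->
                ok = file:close(FH),
                done;
            {ok, Data} ->
                {output, Data, FH}
        end
    end.

%% Call the generator repeatedly (as a streaming framework would) and
%% collect every chunk it emits, in order, until it returns done.
collect(Generator, State) ->
    collect(Generator, State, []).

collect(Generator, State, Acc) ->
    case Generator(State) of
        done ->
            lists:reverse(Acc);
        {output, Data, NextState} ->
            collect(Generator, NextState, [Data | Acc])
    end.
```

Consider a 2,994-byte file, which is the total implied by 400 + 400 + 3b2 in hex (1024 + 1024 + 946). Calling `chunk_reader_demo:collect(chunk_reader_demo:make_generator(), FP)` on that file should give chunks of 1024, 1024 and 946 bytes. Their concatenation equals the file contents and contains no size markers. If that holds, the markers are probably being added later, by whatever sends the stream over HTTP.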

We read the file in chunks with file:read(FH, 1024). But this line also seems to add a number (400, 400, 3b2) before each chunk. We have observed that these values are simply the size of each chunk in hexadecimal. Here is a sample of the XML:
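Hex chunk sizes, each followed by its data, with a final 0, match the framing of HTTP/1.1 chunked transfer coding (RFC 7230, section 4.1). In that framing each chunk is sent as its size in hexadecimal, CRLF, the data, and CRLF, and a zero-size chunk ends the body. The sketch below reproduces that framing. It assumes, and the question does not confirm this, that the framework streams the response this way. The module name chunked_demo is hypothetical.

```erlang
-module(chunked_demo).
-export([encode_chunk/1, last_chunk/0, encode_body/1]).

%% One HTTP/1.1 chunk: size in lowercase hex, CRLF, the data, CRLF.
encode_chunk(Data) when is_binary(Data) ->
    SizeHex = string:lowercase(integer_to_list(byte_size(Data), 16)),
    [SizeHex, "\r\n", Data, "\r\n"].

%% The zero-size chunk that terminates the body (no trailer fields).
last_chunk() ->
    <<"0\r\n\r\n">>.

%% Frame a list of binary chunks the way a chunked response body looks on the wire.
encode_body(Chunks) ->
    iolist_to_binary([[encode_chunk(C) || C <- Chunks], last_chunk()]).
```

For example, `chunked_demo:encode_body([<<"<info/>">>])` returns `<<"7\r\n<info/>\r\n0\r\n\r\n">>`. A client that understands chunked coding strips this framing before saving the body. If the framing ends up inside the downloaded file, one thing worth checking is whether the response actually carries a `Transfer-Encoding: chunked` header.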

sample.xml

400
<?xml version="1.0" encoding="UTF-8"?>.....</info><inf
400
tel>4444</tel>...<address></address>
3b2
<name> Abc</name><surname>EFg</surname><city>XYZ</city>....
</DATA>
0

When we change the chunk size from 1024 to 2048 (i.e. file:read(FH, 2048)), the values also change, to 808, 365, 0.

What we don't understand is this: while the file contents are streamed in chunks, each chunk's size is written into the XML first, and then the actual chunk is inserted.

Here is a small XML document we wanted to generate; its size is 93 bytes:

<?xml version="1.0">
<info>
<name> Abc</name>
<surname>EFg</surname>
<city>XYZ</city>
</info>

After generating it, we get this output:

5d
<?xml version="1.0">
<info>
<name> Abc</name>
<surname>EFg</surname>
<city>XYZ</city>
</info>
0 

5d is hexadecimal for 93, the chunk size, which in this case is the whole file size.
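The arithmetic can be checked in an Erlang shell: `list_to_integer("5d", 16)` is 93, `list_to_integer("400", 16)` is 1024 and `list_to_integer("3b2", 16)` is 946. To recover the original file from a body that was saved with the size lines still in it, a decoder only needs to read each hex size line and then take that many bytes. The helper below (chunked_decode_demo:decode/1) is hypothetical. It assumes strict CRLF line endings, no chunk extensions and no trailers, and it is only a diagnostic aid. It does not fix anything on the server side.

```erlang
-module(chunked_decode_demo).
-export([decode/1]).

%% Remove HTTP/1.1 chunk framing from a body captured verbatim.
%% Assumes "SIZE\r\nDATA\r\n" chunks, no ";ext" parts, no trailers.
decode(Body) ->
    decode(Body, []).

decode(Body, Acc) ->
    %% The size line runs up to the first CRLF.
    [SizeLine, Rest] = binary:split(Body, <<"\r\n">>),
    case binary_to_integer(SizeLine, 16) of
        0 ->
            %% Zero-size chunk: end of body, so join the collected data.
            iolist_to_binary(lists:reverse(Acc));
        Size ->
            %% Take exactly Size bytes, then skip the CRLF after them.
            <<Data:Size/binary, "\r\n", Tail/binary>> = Rest,
            decode(Tail, [Data | Acc])
    end.
```

For instance, `chunked_decode_demo:decode(<<"3\r\nabc\r\n2\r\nde\r\n0\r\n\r\n">>)` returns `<<"abcde">>`.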

The question is:

  • Why is each chunk's size prepended to it when we stream the file with the Generator?

NOTE: We also tried removing the {"Content-length", TotalSize} header from the list, but it did not help :(

trex
  • Did you check whether there is a problem with the data encoding (different encodings for the read and write operations?) – Pascal Jan 27 '15 at 08:11
  • @Pascal - Okay, let me check. – trex Jan 28 '15 at 06:54
  • @Pascal - I wonder if you deserve the bounty. I haven't found the final solution, but I'm not getting the codes now. I tried one change: `{ok, FP} = file:open(Filepath, [read, {encoding, utf8}])` – trex Jan 28 '15 at 12:26
  • I don't know how bounties work, but the good thing is that your question popped to the top of the list :o). I suggested this track because I saw encoding="UTF-8" in your example, and 400 cannot be encoded in 8 bits, so I thought the extra values could be misinterpreted characters. But what I don't understand is why it occurs only at the end of the binary. – Pascal Jan 28 '15 at 21:26
  • @Pascal - Sorry, the codes are still appearing. Today we observed that they seem to be the hexadecimal size of each chunk. Once I understand more, I will update my question. – trex Jan 29 '15 at 05:44

1 Answer


I saw an exchange on the Erlang bugs mailing list that looks related to your problem: "Misleading docs or implementation of file:read/2 and friends".

It seems that file:read/2 does not behave 100% cleanly with the utf8 option. They recommend using io:read/3 instead, but I don't see how to deal with chunks and potential newlines.
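On the chunking question, one io-module option for a file opened with {encoding, utf8} is io:get_chars/3. It reads a given number of characters rather than bytes, so a multi-byte UTF-8 character is never split across two chunks. This sketch deliberately uses io:get_chars/3 instead of io:read/3, because io:read/3 parses Erlang terms. The module name utf8_chunk_demo is hypothetical. The sketch addresses only the character-boundary concern and does not by itself remove the hex size lines described in the question.

```erlang
-module(utf8_chunk_demo).
-export([read_chars/2]).

%% Read a UTF-8 file N characters (not bytes) at a time.
%% In binary mode each chunk comes back as a UTF-8 encoded binary.
read_chars(Path, N) ->
    {ok, FH} = file:open(Path, [read, binary, {encoding, utf8}]),
    try
        read_loop(FH, N, [])
    after
        %% Close the file even if reading fails.
        file:close(FH)
    end.

read_loop(FH, N, Acc) ->
    case io:get_chars(FH, "", N) of
        eof -> lists:reverse(Acc);
        {error, Reason} -> erlang:error(Reason);
        Chars -> read_loop(FH, N, [Chars | Acc])
    end.
```

With a file containing the three characters "éaé", `read_chars(Path, 2)` should return two chunks: 3 bytes ("éa") and 2 bytes ("é"). The file:read/2 call in the question works in bytes instead.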

Pascal