I have a text file (about 300 MB) containing a nested list, similar to this one:

[[4, 9, 11, 28, 30, 45, 55, 58, 61, 62, 63, 69, 74, 76, 77, 82, 87, 92, 93, 94, 95], [4, 9, 11, 28, 30, 45, 55, 58, 61, 62, 63, 69, 74, 76, 77, 82, 87, 92, 93, 94],[4, 9, 11, 28, 30, 45, 55, 58, 61, 62, 63, 69, 74, 76, 77, 82, 85, 87, 92, 93, 94, 95]]

Here is my program to read the file into a Haskell list of Integer lists:

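import qualified Data.ByteString as ByteStr

-- Placeholder path (assumed; the original post does not show filePath).
filePath :: FilePath
filePath = "nums.txt"

-- Assumed definition (the original post does not show readNums).
readNums :: String -> [[Integer]]
readNums = read

main :: IO ()
-- HOW to do the same thing but using ByteStr.readFile for file access?
main = do fContents <- readFile filePath
          let numList = readNums fContents
          putStrLn (show numList)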

This works for small text files, but I want to use ByteString to read the file quickly. I found out that there is no read function for ByteString. Instead, you are supposed to write your own parser with attoparsec, since it supports parsing ByteStrings.

How can I use attoparsec to parse the nested list?
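
For reference, a parser of the kind asked about here might look roughly like the sketch below. It is only a minimal illustration, not a tuned solution: the names lexeme, listOf and nestedList and the path "nums.txt" are made up for this example, and it assumes the numbers are non-negative (decimal does not accept a sign; wrapping it in signed would).

```haskell
import qualified Data.ByteString.Char8 as B
import Data.Attoparsec.ByteString.Char8
  (Parser, char, decimal, endOfInput, parseOnly, sepBy, skipSpace)

-- Run a parser, then skip any whitespace that follows it.
lexeme :: Parser a -> Parser a
lexeme p = p <* skipSpace

-- A bracketed, comma-separated list of whatever `item` parses.
listOf :: Parser a -> Parser [a]
listOf item =
  lexeme (char '[') *> (item `sepBy` lexeme (char ',')) <* lexeme (char ']')

-- The whole input: leading whitespace, one list of lists, then end of input.
nestedList :: Parser [[Int]]
nestedList = skipSpace *> listOf (listOf (lexeme decimal)) <* endOfInput

main :: IO ()
main = do
  -- "nums.txt" is a placeholder path, not taken from the question.
  contents <- B.readFile "nums.txt"
  case parseOnly nestedList contents of
    Left err   -> putStrLn ("parse error: " ++ err)
    Right nums -> print (length nums)
```

Note that B.readFile is strict, so the whole file is held in memory while parsing, and the result is still an ordinary list of lists.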

mrsteve

1 Answer

The data seems to be in JSON format, so you can use the decode function from Data.Aeson, which works on a lazy ByteString:

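import qualified Data.ByteString.Lazy as BL
import Data.Aeson (decode)
import Data.Maybe (fromJust)

main :: IO ()
-- filePath is the same path as in the question's program
main = do fContents <- BL.readFile filePath
          -- decode returns Nothing if the input is not valid JSON of this shape
          let numList = decode fContents :: Maybe [[Int]]
          -- fromJust throws an error on Nothing
          putStrLn (show $ fromJust numList)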
Ankur
  • It's now at least 50% faster for small files, perhaps even more for bigger ones. Great! – mrsteve Nov 11 '13 at 06:38
  • For a 50MB file this quickly uses 10GB of memory. How can I improve the memory usage? – mrsteve Nov 11 '13 at 07:59
  • Well, [[Integer]] isn't the most memory-efficient format (there probably isn't anything worse)... [[Int]] would be a bit better, but a Vector of Vector of Int would probably be the right answer (and it should be faster too). Aeson should know how to decode that perfectly well. – Jedai Nov 11 '13 at 16:55
  • I am using Int at the moment. Will try to use a Vector. I wonder if Aeson is part of the problem too, as I read somewhere that Aeson is very memory-inefficient because it was meant to parse small JSON messages very quickly. I have 100GB of RAM, so at the moment it works. – mrsteve Nov 11 '13 at 18:25
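
The Vector suggestion from the comments could be tried along these lines. This is a hedged sketch rather than a measured fix: decodeRows and "nums.txt" are names invented for the example, and it picks a boxed Vector of unboxed Vector Int as one reasonable reading of "Vector of Vector of Int". It only changes the final representation; whatever intermediate structures Aeson builds while decoding are not addressed here, so peak memory may still be high.

```haskell
import qualified Data.ByteString.Lazy as BL
import qualified Data.Vector as V
import qualified Data.Vector.Unboxed as VU
import Data.Aeson (eitherDecode)

-- Decode a JSON array of arrays of numbers into a boxed vector of
-- unboxed Int vectors. eitherDecode reports a message on failure
-- instead of returning Nothing.
decodeRows :: BL.ByteString -> Either String (V.Vector (VU.Vector Int))
decodeRows = eitherDecode

main :: IO ()
main = do
  -- "nums.txt" is a placeholder path, not taken from the thread.
  contents <- BL.readFile "nums.txt"
  case decodeRows contents of
    Left err   -> putStrLn ("decode error: " ++ err)
    -- Print the number of rows and the total number of elements.
    Right rows -> print (V.length rows, V.sum (V.map VU.length rows))
```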