
I have a text file containing several integer matrices of different dimensions that I want to parse into an hmatrix representation, but I can't find any suitable functions. The file has the following layout:

[single-value]
[single-row 1x10 matrix]
[16x16 square-matrix]
repeats unknowingly often

e.g.

9
1 2 3 ..
9 8 7 6 5 ...
.
.
4 3 2 1 0 ..
...

The closest thing I found was readMatrix at:

https://hackage.haskell.org/package/hmatrix-0.17.0.1/docs/Numeric-LinearAlgebra-Devel.html#v:readMatrix

but since there is no documentation and I'm fairly new to Haskell I have no idea how to use it.

Peter David Carter
manews

1 Answer


As long as performance isn't crucial, it's easiest to first preprocess the data as simple lists before introducing any special types like matrices. (And if performance does matter, you shouldn't be using text files!)

So first

readAllNumbers :: String -> [[Double]]
readAllNumbers = map (map read . words) . lines
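
To make the intermediate shape concrete, here is a minimal sketch that applies `readAllNumbers` to a scaled-down input (2x2 blocks instead of 16x16). The names `sampleInput` and `sampleParsed` are made up for illustration and are not part of the original answer.

```haskell
-- Same definition as above: one inner list per line of the file.
readAllNumbers :: String -> [[Double]]
readAllNumbers = map (map read . words) . lines

-- Hypothetical scaled-down input: a single value, a 1x3 row, then two 2x2 blocks.
sampleInput :: String
sampleInput = unlines ["9", "1 2 3", "9 8", "7 6", "5 4", "3 2"]

-- The parsed structure, ready to be split into header lines and matrix rows.
sampleParsed :: [[Double]]
sampleParsed = readAllNumbers sampleInput
-- [[9.0],[1.0,2.0,3.0],[9.0,8.0],[7.0,6.0],[5.0,4.0],[3.0,2.0]]
```

Note that `read` throws an exception on any token that is not a number, so this stage assumes a well-formed file.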

Then you separate the structure. In this case, you simply treat the first two lines of the list specially, then chunk up the remaining lines à 16. That's about it; you can then simply convert the [nested] Double lists to matrices:

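import qualified Numeric.LinearAlgebra as HMat

-- Returns the single value, the single-row matrix, and every 16x16 matrix that follows.
parseMContents :: String -> (Double, (HMat.Matrix Double, [HMat.Matrix Double]))
parseMContents s = case readAllNumbers s of
     [singleValue] : singleRow : rest
           -> (singleValue, ( HMat.fromLists [singleRow]
                            , HMat.fromLists <$> chunksÀ 16 rest ) )
     _ -> error "Matrix file has wrong format!"

-- Split a list into consecutive chunks of n elements (the last one may be shorter).
chunksÀ :: Int -> [a] -> [[a]]
chunksÀ _ [] = []
chunksÀ n ls = case splitAt n ls of
            (hs, ts) -> hs : chunksÀ n ts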
leftaroundabout
  • And what about 'repeats unknowingly often' part? – ДМИТРИЙ МАЛИКОВ Apr 24 '16 at 11:28
  • What “what about” about it? – leftaroundabout Apr 24 '16 at 11:56
  • @ДМИТРИЙМАЛИКОВ The code in this answer operates on a string that represents the entire contents of the file. The function 'consumes' the string (i.e. examines every character of it) unless it encounters an error, so logically speaking it will parse all of the occurrences of the pattern. As an aside, I'm a little confused about the naming of `chunksÀ` but I guess it *is* a value identifier with `UnicodeSyntax`. – user2407038 Apr 24 '16 at 15:20
  • @ДМИТРИЙ МАЛИКОВ I meant that the file ends with multiple 16x16 matrices and that I don't know the exact number of those. leftaroundabout got it right. – manews Apr 26 '16 at 18:24
  • @leftaroundabout Many thanks. That gives me a good idea of how to do it. Performance is not that crucial, but I had hoped that there would be an existing matrix parser (e.g. for `Parsec`, one that doesn't operate on `String`) that I could combine more elegantly. Like `parseMat rowCount colCount` – manews Apr 26 '16 at 18:31
  • Well, a `Parsec` solution would definitely be preferable, in particular _safer_. It's not very hard to code that up yourself, either, but if you mainly need this for a quick one-time data processing script then I'd say the list-based approach is just fine. – leftaroundabout Apr 26 '16 at 18:33
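
For readers who want the extra safety discussed in these comments without pulling in `Parsec`, the following is a rough sketch that stays with plain lists and `base` only. It is not a `Parsec` parser and not an existing hmatrix function: `readRows`, `takeMat`, `takeMats` and `parseFile` are hypothetical names, with `takeMat rowCount colCount` playing the role of the `parseMat rowCount colCount` the asker had in mind. It assumes blank lines can be ignored and reports problems through `Either` instead of throwing.

```haskell
import Text.Read (readMaybe)

-- Parse every token on every non-blank line; Left names the first bad line.
readRows :: String -> Either String [[Double]]
readRows = traverse parseLine . filter (not . null . words . snd) . zip [1 :: Int ..] . lines
  where
    parseLine (n, l) =
      maybe (Left ("line " ++ show n ++ ": not a number")) Right
            (traverse readMaybe (words l))

-- Take one rowCount x colCount block off the front and return the remaining rows.
takeMat :: Int -> Int -> [[Double]] -> Either String ([[Double]], [[Double]])
takeMat rowCount colCount rows
  | length block /= rowCount           = Left "too few rows"
  | any ((/= colCount) . length) block = Left "row has wrong number of columns"
  | otherwise                          = Right (block, rest)
  where
    (block, rest) = splitAt rowCount rows

-- Apply takeMat repeatedly until no rows are left.
takeMats :: Int -> Int -> [[Double]] -> Either String [[[Double]]]
takeMats _ _ [] = Right []
takeMats r c rows = do
  (m, rest) <- takeMat r c rows
  (m :) <$> takeMats r c rest

-- Whole file: a single value, one row of rowLen numbers, then any number of n x n blocks.
parseFile :: Int -> Int -> String -> Either String (Double, [Double], [[[Double]]])
parseFile rowLen n s = do
  rows <- readRows s
  case rows of
    [v] : r : rest
      | length r == rowLen -> (,,) v r <$> takeMats n n rest
    _ -> Left "bad header"
```

For the file described in the question this would be called as `parseFile 10 16 contents`; each resulting block of rows can then be handed to hmatrix's `fromLists` as in the answer.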