I'm implementing a pattern mining algorithm, and the input data is usually a file in the following format:

item1 item2 item3
item0 item3 item10
....
item30 item40 item30

where itemx is usually a String. For efficiency, I read the file as a ByteString, which is faster than the default String. Since the main cost in pattern mining algorithms is comparing item sets, I wonder how much faster or slower my program would be if I changed the input file format so that comparisons are made between Int values instead of between ByteString values. Here is the new format:

1 2 3
0 3 10
....
30 40 30

Thanks!

Fopa Léon Constantin
  • 3
    are you familiar with the excellent [criterion library](http://hackage.haskell.org/package/criterion)? I would suggest doing a few quick benchmarks that represent your use case and answer your own question – jberryman Jan 02 '13 at 04:04
  • 2
    Or better yet, don't use a text-based file format at all and just store sets of X bit words, how ever small you can make them. – Thomas M. DuBuisson Jan 02 '13 at 05:25

1 Answer

If you restrict yourself to just asking whether the equality function on Int - given by the ==# primop (IntEqOp) - is faster than the equality function on bytestrings -

primop   IntEqOp  "==#"   Compare
   Int# -> Int# -> Bool
   with commutable = True

vs

eq :: ByteString -> ByteString -> Bool
eq a@(PS fp off len) b@(PS fp' off' len')
  | len /= len'              = False    -- short cut on length
  | fp == fp' && off == off' = True     -- short cut for the same string
  | otherwise                = compareBytes a b == EQ
{-# INLINE eq #-}

Then the Int case will be faster. No doubt.

However, if you have to parse your bytestring input (or String input) into Int tokens first, you might lose.
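
To make that parsing cost concrete, here is a minimal sketch of turning the numeric file format into Int tokens with readInt from Data.ByteString.Char8. The names parseLine and parseTransactions are hypothetical, and silently dropping malformed tokens is an assumption made for brevity, not something the question specifies.

```haskell
import qualified Data.ByteString.Char8 as B
import Data.Maybe (mapMaybe)

-- Parse one whitespace-separated line such as "0 3 10" into Ints.
-- Tokens that are not entirely a decimal integer are dropped
-- (an assumption; a real miner might prefer to report them).
parseLine :: B.ByteString -> [Int]
parseLine = mapMaybe toInt . B.words
  where
    toInt tok = case B.readInt tok of
      Just (n, rest) | B.null rest -> Just n
      _                            -> Nothing

-- One transaction (list of item ids) per input line.
parseTransactions :: B.ByteString -> [[Int]]
parseTransactions = map parseLine . B.lines

main :: IO ()
main = print (parseTransactions (B.pack "1 2 3\n0 3 10\n30 40 30\n"))
-- prints [[1,2,3],[0,3,10],[30,40,30]]
```

This pass touches every byte of the input once, so whether it pays off depends on how many comparisons each item takes part in afterwards.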

The only way to really know here is to measure.
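
As a starting point for such a measurement, the following is a rough sketch using the criterion library mentioned in the comments. The helper countEqual and the synthetic data are assumptions for illustration only; a meaningful benchmark should use your real item sets and your real comparison pattern, and should be compiled with optimisation (for example -O2).

```haskell
import Criterion.Main (bench, bgroup, defaultMain, nf)
import qualified Data.ByteString.Char8 as B

-- Count how many adjacent pairs in a list are equal.
-- This forces one (==) call per adjacent pair.
countEqual :: Eq a => [a] -> Int
countEqual xs = length (filter id (zipWith (==) xs (drop 1 xs)))

main :: IO ()
main = do
  -- Synthetic data: the same item ids, once as Int and once as "itemN" strings.
  let ints  = concat (replicate 1000 [1, 2, 3, 0, 3, 10, 30, 40, 30]) :: [Int]
      names = map (\n -> B.pack ("item" ++ show n)) ints
  -- The lists are built lazily on first use; criterion runs each benchmark
  -- many times, so that one-off cost is spread over the whole sample.
  defaultMain
    [ bgroup "equality"
        [ bench "Int"        (nf countEqual ints)
        , bench "ByteString" (nf countEqual names)
        ]
    ]
```

To include the parsing cost in the comparison, a third benchmark could time reading and converting the numeric file as well.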

Don Stewart