1

I'm walking a directory recursively, in a conventional way. This is a working prototype:

traverseFlatDst :: FilePath -> Int -> Int -> FilePath -> IO ()
traverseFlatDst dstRoot total totw srcDir = do
  (dirs, files) <- listDir srcDir
  mapM_ (\file -> putStrLn (printf "%s" (strp file))) files    -- tracing
  let traverse = traverseFlatDst dstRoot total totw
  mapM_ traverse dirs

I have a not so unusual request: every trace line should be numbered (it is not really for tracing). Like this:

traverseFlatDst :: FilePath -> Int -> Int -> FilePath -> IO ()
traverseFlatDst dstRoot total totw srcDir = do
  (dirs, files) <- listDir srcDir
  mapM_ (\file -> putStrLn (printf "%d: %s" counterFromNowhere (strp file))) files
  let traverse = traverseFlatDst dstRoot total totw
  mapM_ traverse dirs

All the solutions I've seen so far are ugly beyond imagination, if applicable at all. Is there a nice way to manage it?

Alexey Orlov
  • 2,412
  • 3
  • 27
  • 46

5 Answers5

3

You can accomplish this by adding an additional effect to your function; namely, the state effect.

import Control.Monad.State

printPath :: (PrintfArg t, Show a) => (t, a) -> IO ()
printPath (l, file) = printf "%d : %s\n" l (show file)

traverseFlatDst :: Path Abs Dir -> IO ()
traverseFlatDst =
  let loop srcDir = do
        (dirs, files) <- listDir srcDir
        i <- get
        put (i + length files)
        mapM_ (liftIO . printPath) $ zip [i..] files
        mapM_ loop dirs
  in \s -> evalStateT (loop s) 0

(NB: I've also removed unused parameters for clarity).

However, I don't suggest writing this function this way. Semantically, your function is collecting a bunch of file paths. Instead of printing them, you should just return them from the function; you can always print them later! The logic of the modified version is actually quite simpler:

traverseFlatDst' :: Path Abs Dir -> IO [Path Abs File]
traverseFlatDst' srcDir = do
  (dirs, files) <- listDir srcDir
  (concat . (files:)) <$> mapM traverseFlatDst' dirs

You can use this function to print the files with numbers without keeping track of some state explicitly, since you have access to all the files 'at once':

> traverseFlatDst' somePath >>= mapM_ printPath . zip [0..]

Note also that the 2nd version is much more strict than the first; it will traverse the entire directory tree before starting to print anything. As a general rule, the strict version is better anyways, but if you want the lazy version, you can write it using unsafeInterleaveIO:

import System.IO.Unsafe (unsafeInterleaveIO)

traverseFlatDst' :: Path Abs Dir -> IO [Path Abs File]
traverseFlatDst' srcDir = do
  (dirs, files) <- listDir srcDir
  files' <- unsafeInterleaveIO $ mapM traverseFlatDst' dirs
  return $ concat $ files:files'
user2407038
  • 14,400
  • 3
  • 29
  • 42
  • Thanks! Your suggestion is quite reasonable. I'd like to pursue also the "invasive" take. Any ideas how to perfect it? (See the UPD) – Alexey Orlov Dec 11 '17 at 18:07
  • @AlexeyOrlov Not sure what you mean by "the "invasive" take". The first function in this answer is the one which solves the problem as originally stated. – user2407038 Dec 11 '17 at 18:19
  • The UPD part above. Is it a dead end? – Alexey Orlov Dec 11 '17 at 18:21
  • I'm not sure what that code is trying to do. You don't need the entire `[Int]` list of indices as a function parameter, just the next index. Furthermore, `cnt` never changes in that function. Do you see how the recursive call to `loop` is called with `i+length files` instead of the previous value `i`? – user2407038 Dec 11 '17 at 18:26
  • Would you please add type annotation to you uppermost version of `traverseFlatDst` (and check syntax again)? – Alexey Orlov Dec 11 '17 at 18:59
  • ` parse error on input ‘<-’ Perhaps this statement should be within a 'do' block?` – Alexey Orlov Dec 11 '17 at 19:14
  • I've verified there are no syntax errors and added type annotations. [Here](http://lpaste.net/3278097500493316096) is the complete file – user2407038 Dec 11 '17 at 19:23
  • Thank you very much! I grokked the syntax, but there is still a hiccup (two number one files, UPD2). – Alexey Orlov Dec 11 '17 at 19:31
  • So it turns out the logic of my function wasn't correct when there are multiple subdirectories to be traversed. See edit for the fixed version. Also note that the version which returns the list of files doesn't suffer this problem. – user2407038 Dec 11 '17 at 19:39
  • Thanks! I'll have to think it over. – Alexey Orlov Dec 11 '17 at 19:56
3

I would probably use a streaming library like streaming to separate enumerating the files from adding the numbers and from printing the decorated entries:

import Streaming
import qualified Streaming.Prelude as S

traverseFlatDst :: FilePath -> Int -> Int -> FilePath -> Stream (Of FilePath) IO ()
traverseFlatDst dstRoot total totw srcDir = do
  (dirs, files) <- liftIO $ listDir srcDir
  S.each files
  mapM_ (traverseFlatDst dstRoot total totw) dirs

decorate :: Stream (Of FilePath) IO r -> Stream (Of (Int,FilePath)) IO r
decorate stream = S.zip (S.enumFrom 1) stream

display:: Stream (Of (Int,FilePath)) IO () -> IO ()
display = S.mapM_ $ \(index,path) ->  putStrLn $ show index ++ " " ++ path

Where S.each, S.zip, S.mapM_ are from streaming.

danidiaz
  • 26,936
  • 4
  • 45
  • 95
0

From nowhere? Of course not.

You can zip your files with numbers, though, and then mapM over them:

mapM_ (\(file, counter) -> putStrLn (printf "%d: %s" counter (strp file))) (zip [0..] files)
arrowd
  • 33,231
  • 8
  • 79
  • 110
  • Your solution have at least syntax. Also I am not getting the \. Thanks, anyway :) . – Alexey Orlov Dec 11 '17 at 17:08
  • This won't work because each recursive call to `traverseFlatDst` will number the printed lines starting at 0. – user2407038 Dec 11 '17 at 17:14
  • @AlexeyOrlov fixed typo. If you want counter to preserve its value for nested calls, there is nothing you can do except introducing an additional argument. – arrowd Dec 11 '17 at 17:17
  • Some trouble is still there. Never mind, I see the idea now. And yes, I want to keep the counter right through. But where can I introduce that magic argument? – Alexey Orlov Dec 11 '17 at 17:24
  • In `traverseFlatDst`. – arrowd Dec 11 '17 at 17:42
0

Final solution, borrowed from How to implement a global counter using Monad?

import Data.IORef

type Counter = Int -> IO Int

makeCounter :: IO Counter
makeCounter = do
  r <- newIORef 0
  return (\i -> do modifyIORef r (+i)
                   readIORef r)


printPath :: Counter -> FilePath -> IO ()
printPath counter file = do
  n <- counter 1
  putStrLn (printf "%d : %s" n (strp file))


traverseFlatDst :: FilePath -> Int -> Int -> Counter -> FilePath -> IO ()
traverseFlatDst dstRoot total totw cnt srcDir = do
  (dirs, files) <- listDir srcDir
  let iterate = printPath cnt
  mapM_  iterate files                -- tracing
  let traverse = traverseFlatDst dstRoot total totw cnt
  mapM_ traverse dirs


groom :: FilePath -> FilePath -> Int -> IO ()
groom src dst total = do
  counter <- makeCounter
  let totWidth = length $ show total
  traverseFlatDst dst total totWidth counter src
  putStrLn (printf "total: %d, width: %d" total totWidth)

Still caged and can't be used anywhere, but it's OK. Not ugly.

Alexey Orlov
  • 2,412
  • 3
  • 27
  • 46
  • Using an `IORef` to implement something which can be written as a pure function is typically considered 'ugly' (i.e. not idiomatic). – user2407038 Dec 12 '17 at 00:29
  • How do you mean? If pure functions were not useless for the job, this class of topics wouldn't have existed. – Alexey Orlov Dec 12 '17 at 17:39
  • 1
    I think @user2407038 is suggesting that `IORef`s should only be used when `StateT` can't do the job. For example, if you build data structures that include `IORef`s (especially if they have lots of them, and/or also have `IOArray`s), then you're probably doing something legit. The same holds if you're using an `IORef` to implement inter-thread communication using CAS. But if you just have one counter doing counter business, it smells a bit off. – dfeuer Dec 12 '17 at 21:11
  • I see. On the other hand, the very existence of this particular counter is due to i/o needs. So it belongs where it should. – Alexey Orlov Dec 13 '17 at 05:34
0

This solution doesn't require additional libraries, processes each file the moment it is found and, for the sake of separating concerns, doesn't require traverseFlatDst to know what is being done with the produced files.

This last feature is achieved by passing a small effectful state machine (a step function, really) as a parameter to traverseFlatDst, and making traverseFlatDst polymorphic in the machine state, so it doesn't know anything about it:

{-# language RankNTypes #-}
import Control.Monad (foldM)

type Source e = forall s. (s -> e -> IO s) -> s -> IO s

traverseFlatDst :: FilePath -> Int -> Int -> FilePath -> Source FilePath
traverseFlatDst dstRoot total totw srcDir step state = do
  (dirs, files) <- listDir srcDir
  state' <- foldM step state files
  foldM (\s path -> traverseFlatDst dstRoot total totw path step s) state' dirs

-- Pass this as the step argument to traverseFlatDst
-- The counter is the state.
step :: Int -> FilePath -> IO Int
step index path = do
    putStrLn $ show index ++ " " ++ path
    return $ succ index
danidiaz
  • 26,936
  • 4
  • 45
  • 95