1

This feels like kind of a long shot, but I wrote a pipe to connect to a database, get a list of databases on the server, connect to each one, then perform a query on each one (a user count), then print those counts. Unfortunately this is about as much as I can simplify it from my real example. I'm using pipes-4.1.0, pipes-safe-2.0.2, and mysql-simple-0.2.2.4. Here is the code:

{-# LANGUAGE RankNTypes, OverloadedStrings #-}

import Pipes
import qualified Pipes.Safe as PS
import qualified Pipes.Prelude as P
import Database.MySQL.Simple

import qualified Data.Text as T

import Control.Monad.Catch as MC

import Control.Monad (forever)

import Database.MySQL.Simple.QueryParams
import Database.MySQL.Simple.QueryResults

data DBName = DBName T.Text deriving Show

-- connect to a database and use a table.
mydb :: T.Text -> ConnectInfo
mydb = undefined

-- Quirk of (mysql|postgresql)-simple libraries
unOnly (Only a) = a

queryProducer :: (MonadIO m, QueryParams params, QueryResults r) => Connection -> Query -> params -> Pipes.Producer' r m ()
queryProducer = undefined

myDBNames :: (PS.MonadSafe m, MonadIO m) => Producer DBName m ()
myDBNames = PS.bracket (liftIO $ connect $ mydb "sometable") (liftIO . close) $ \db ->
  queryProducer db "show databases" () >-> P.map (DBName . unOnly)

-- I realize this is inefficient, one step at a time.
connectToDB :: (PS.MonadSafe m, MonadIO m) => Pipe DBName Connection m ()
connectToDB = forever $ do
      (DBName dbname) <- await
      PS.bracket
        (liftIO . connect . mydb $ dbname)
        (liftIO . close)
        yield

userCount :: (PS.MonadCatch m, MonadIO m) => Pipe Connection Int m ()
userCount = forever $ do
  db <- await
  queryProducer db "select count(*) from user" () >-> P.map unOnly

main :: IO ()
main = PS.runSafeT $ runEffect $ myDBNames >-> P.tee P.print >-> connectToDB >-> userCount >-> P.print

This works fine. However, let's say in one of those databases, the user table is named users instead of user, therefore mysql-simple will throw an exception when that query is run. I want to catch that error inline, and just return 0 users for those queries, but keep going. Things I've tried:

(queryProducer db "select count(*) from user" () `PS.catchAll` (\e -> (liftIO $ putStrLn "failure") >> yield (Only 0))) >-> P.map unOnly 

This doesn't work. Sometimes it will print failure and yield a 0, only to immediately terminate on exception. I thought maybe it is because I broke out of queryProducer with the exception, and I should call it again so I tried this recursive version:

thequery db >-> P.map unOnly
where
  thequery db = queryProducer db "select count(*) from user" () `PS.catchAll` (\e -> (liftIO $ putStrLn "failure") >> yield (Only 0) >> thequery db)

But this also fails. However sometimes it will actually perform a several queries, printing out failure a few times and yielding a few 0's before terminating with an exception again. I'm really confused about why this is happening.

According to the async library, exceptions should be send up into the thread that the pipe is running in, so it doesn't seem like it could be a threading problem.

In case the implementation of my queryProducer matters, it is modeled after the pipes-postgresql query function, generalized to Producer' so I can embed it in other combinators. Below mysql-simple, in the mysql library there is a throw which throws a ConnectionError if your sql doesn't make sense, that percolates all the way up through this function.

{-# LANGUAGE RankNTypes #-}

import qualified Control.Concurrent.Async as Async
import qualified Control.Concurrent.STM as STM
import qualified Database.MySQL.Simple as My
import Database.MySQL.Simple.QueryParams
import Database.MySQL.Simple.QueryResults
import qualified Pipes
import qualified Pipes.Concurrent as Pipes

--------------------------------------------------------------------------------
-- | Convert a query to a 'Producer' of rows.
--
-- For example,
--
-- > pg <- connectToMysql
-- > query pg "SELECT * FROM widgets WHERE ID = ?" (Only widgetId) >-> print
--
-- Will select all widgets for a given @widgetId@, and then print each row to
-- standard output.
queryProducer 
    :: (MonadIO m, QueryResults r, QueryParams params)
    => My.Connection -> My.Query -> params -> Pipes.Producer' r m ()
queryProducer c q p = do
    (o, i, seal) <- liftIO (Pipes.spawn' Pipes.Single)
    worker <- liftIO $ Async.async $ do
        My.fold c q p () (const $ void . STM.atomically . Pipes.send o)
        STM.atomically seal
    liftIO $ Async.link worker
    Pipes.fromInput i

I also attempted to use EitherT to try and catch exceptions since that seems to be the way it was done in the past in pipes. But the documentation for that in pipes' tutorial disappeared between 3 and 4 making me wonder if that technique is still recommended or not. Unfortunately I could not get it to work because the way I am using queryProducer instead of singular await/yields, I'm not sure how to structure it.

David McHealy
  • 2,471
  • 18
  • 34
  • What happens if, in `queryProducer`, you remove the line `liftIO $ Async.link worker` and change the last line to something like `Pipes.fromInput i <* liftIO (wait worker)` ? – danidiaz Mar 19 '14 at 21:44
  • 1
    This is still supported and you can find documentation for this in `Pipes.Lift`. Use `Pipes.Lift.catchError` for now and I will write up an answer later today. – Gabriella Gonzalez Mar 19 '14 at 22:13
  • 1
    So I'm not sure if this is related or not, but there is a known race condition in the `a <- async ...; link a` idiom, which is that an exception could be raised in between the `async` and `link` commands. – Gabriella Gonzalez Mar 20 '14 at 02:52
  • Can you also open an issue for this on the `pipes-safe` issue tracker on Github? I think this will take me a while to solve and I want to keep track of this outside of Stack Overflow. – Gabriella Gonzalez Mar 20 '14 at 02:54
  • It is definitely a race condition in the async library as you said. When I put a small threadDelay before the fold the problem goes away. – David McHealy Mar 20 '14 at 15:20
  • Oh and I tried the <* liftIO (wait worker). It runs very slow and flips out with a different mysql exception. I don't know why. – David McHealy Mar 20 '14 at 15:33

1 Answers1

2

Based on Gabe's comment, I fixed my queryProducer function by making sure the query cannot happen until the link function has fired.

query :: (MonadIO m, QueryResults r, QueryParams params) => My.Connection -> My.Query -> params -> Pipes.Producer' r m ()
query c q p = do
    (o, i, seal) <- liftIO (Pipes.spawn' Pipes.Single)
    mvar <- liftIO $ newEmptyMVar
    worker <- liftIO $ Async.async $ do
        takeMVar mvar
        My.fold c q p () (const $ void . STM.atomically . Pipes.send o)
        STM.atomically seal
    liftIO $ Async.link worker
    liftIO $ putMVar mvar ()
    Pipes.fromInput i

I've tested this and it seems to work.

David McHealy
  • 2,471
  • 18
  • 34