
I'm still relatively new to TPL Dataflow, and I'm not 100% sure whether I'm using it correctly or whether I'm even supposed to use it for this.

I'm trying to use the library to help with file copying and file upload.

Basically the structure/process of handling files in our application is as follows:

1) The end user will select a bunch of files from their disk and choose to import them into our system.

2) Some files have higher priority, while the others can complete at their own pace.

3) When a bunch of files is imported here is the process:

  • Queue these import requests, one request maps to one file
  • These requests are stored into a local sqlite db
  • These requests also explicitly indicate if it demands higher priority or not
  • We currently have two active threads running (one to manage higher priority and one for lower)
  • They go into a waiting state until signalled.
  • When new requests come in, they get signalled to dig into the local db to process the requests.
  • Both threads are responsible for copying the file to a separate cached location, so just a simple File.Copy call. The difference is that one thread makes the File.Copy call immediately, while the other just enqueues all of the copies onto the ThreadPool.
  • Once the files are copied, the request gets updated. The request has a Status enum property with states like Copying, Copied, etc.
  • The request also requires a ServerTimestamp to be set. The ServerTimestamp is important because a user may save several changes to what is essentially the same file, producing different versions, so the order matters.
  • Another separate thread gets signalled to fetch requests from the local DB where the status is Copied. It then pings an endpoint to ask for a ServerTimestamp and updates the request with it.
  • Lastly, once the file copy for a request is complete and its ServerTimestamp is set, we can upload the physical file to the server.

So I'm toying around with using TransformBlocks:

1- File Copy TransformBlock

  • I'm thinking there could be two File Copy TransformBlocks, one for higher priority and one for lower priority.

My understanding is that a block runs its work on TaskScheduler.Default unless told otherwise, which uses the ThreadPool behind the scenes. I was thinking of writing a custom TaskScheduler that spawns a new thread on the fly and using it for the higher-priority file copy block.

2- ServerTimestamp TransformBlock

  • This one will be linked to the first block. It takes in all the copied files, gets the server timestamp, and sets it in the request.

3- UploadFile TransformBlock

  • This will upload the file.
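A minimal sketch of the three-stage layout described above, under several assumptions: the copy, timestamp, and upload work is simulated, requests are plain strings, and the last stage is an `ActionBlock` rather than a `TransformBlock` because nothing consumes its output here. All names are hypothetical and not taken from the application.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Records what the simulated upload stage received, in arrival order.
var log = new List<string>();

// Stage 1: copy the file to the cache (simulated; real code would call File.Copy).
// Copies may run in parallel, but a TransformBlock still emits results in input order by default.
var copyBlock = new TransformBlock<string, string>(
    path => path + ".cached",
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });

// Stage 2: attach a server timestamp (simulated with a counter instead of an endpoint call).
// The default MaxDegreeOfParallelism of 1 keeps the counter increment safe.
long nextTimestamp = 0;
var timestampBlock = new TransformBlock<string, (string Path, long ServerTimestamp)>(
    cached => (cached, ++nextTimestamp));

// Stage 3: upload (simulated by appending to the log).
var uploadBlock = new ActionBlock<(string Path, long ServerTimestamp)>(
    item => log.Add($"{item.ServerTimestamp}:{item.Path}"));

// Link the stages and let completion flow down the pipeline.
var link = new DataflowLinkOptions { PropagateCompletion = true };
copyBlock.LinkTo(timestampBlock, link);
timestampBlock.LinkTo(uploadBlock, link);

foreach (var request in new[] { "File1", "File2", "File3-v1", "File3-v2", "File3-v3" })
    copyBlock.Post(request);

copyBlock.Complete();
await uploadBlock.Completion;

foreach (var line in log)
    Console.WriteLine(line);
```

This sketch has no failure handling at all: an exception in any stage faults that block and, with `PropagateCompletion`, the stages after it.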

Problems I'm facing:

Say for example we have 5 file requests enqueued in the local db.

File1, File2, File3-v1, File3-v2, File3-v3

We Post/SendAsync all 5 requests to the first block.

If File1, File2, File3-v1 and File3-v3 succeed but File3-v2 fails, I don't want the File3 requests to flow on to the ServerTimestamp block. It's important that all File3 versions are completely copied before proceeding, or else they will go out of order.

But that raises the next question: how would the failed copy be retried correctly, and how would the other four files that had already been copied move on to the next block with it?

I'm not sure if I am structuring this correctly or if TPL Dataflow supports my use case.
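One way to picture the constraint is to make the unit of work "all pending versions of one file" rather than a single request. The following is only a sketch under assumptions: the part of the name before `-v` is treated as the file key, `TryCopy` is a hypothetical stand-in that fails File3-v2 once, and the retry limit of 3 is arbitrary. A group is forwarded only when every version in it has been copied.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

var pending = new[] { "File1", "File2", "File3-v1", "File3-v2", "File3-v3" };
var attempts = new ConcurrentDictionary<string, int>(); // copy attempts per request
const int maxAttempts = 3;                               // assumed retry limit

// Hypothetical copy routine: File3-v2 fails on its first attempt, everything else succeeds.
bool TryCopy(string request)
{
    var attempt = attempts.AddOrUpdate(request, 1, (_, n) => n + 1);
    return !(request == "File3-v2" && attempt == 1);
}

// Copies the versions of one file in order, retrying each up to maxAttempts times.
// Returns the whole group on success, or nothing so the group is held back for later.
IEnumerable<string> CopyGroup(string[] versions)
{
    foreach (var request in versions)
    {
        var copied = false;
        for (var i = 0; i < maxAttempts && !copied; i++)
            copied = TryCopy(request);
        if (!copied)
            return Array.Empty<string>();
    }
    return versions;
}

// Different files are copied in parallel; versions of the same file stay sequential.
var copyGroups = new TransformManyBlock<string[], string>(
    versions => CopyGroup(versions),
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });

// Stand-in for the ServerTimestamp stage: just records what it receives.
var forwarded = new List<string>();
var next = new ActionBlock<string>(request => forwarded.Add(request));
copyGroups.LinkTo(next, new DataflowLinkOptions { PropagateCompletion = true });

// GroupBy keeps the original order inside each group.
foreach (var group in pending.GroupBy(r => r.Split("-v")[0]))
    copyGroups.Post(group.ToArray());

copyGroups.Complete();
await next.Completion;

Console.WriteLine(string.Join(", ", forwarded));
```

With this shape, a failing version delays only its own file's group while File1 and File2 move on independently. Whether that matches the real retry policy depends on details not stated here.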

wpa
  • It seems TPL Dataflow would not help you. You seem to want to apply a lot of custom rules on how and when to execute work. TPL Dataflow will not assist but resist you in doing this. Maybe you should represent each file as a simple `async Task ProcessFile(string path)` method that sequentially performs all the steps incl. the local database updates. Then you need one high and one low prio thread/task to find work and just call that method. Can you speak to the feasibility of this? What parts of your requirements would not be solved by this? – usr Feb 03 '18 at 15:47
  • `TPL Dataflow` doesn't work well when the queue has priorities, because it processes messages sequentially. Moreover, it processes each item independently of the others, so it will be easier for you to keep a collection with consumers that make all the checks you need. You can start with `BlockingCollection`, for example: create one per priority and consume messages from them, with some notification logic for a new version of a file – VMAtm Feb 06 '18 at 00:30
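The two comments can be combined into a rough sketch: one `BlockingCollection` per priority, each drained by a single consumer that runs a sequential `ProcessFile` routine. This assumes the per-file work is simulated, and the lane names and the `processed` queue are hypothetical and exist only to make the ordering visible.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

var processed = new ConcurrentQueue<string>();

// Hypothetical per-file routine; real code would copy, fetch the timestamp,
// upload, and update the local database here, one step after another.
async Task ProcessFile(string path, string lane)
{
    await Task.Yield(); // placeholder for the real asynchronous work
    processed.Enqueue($"{lane}:{path}");
}

using var high = new BlockingCollection<string>();
using var low = new BlockingCollection<string>();

// One consumer per lane; each handles its items strictly one at a time,
// so versions queued on the same lane keep their order.
Task Consume(BlockingCollection<string> queue, string lane) => Task.Run(async () =>
{
    foreach (var path in queue.GetConsumingEnumerable())
        await ProcessFile(path, lane);
});

var highWorker = Consume(high, "high");
var lowWorker = Consume(low, "low");

foreach (var path in new[] { "File3-v1", "File3-v2", "File3-v3" })
    high.Add(path);
foreach (var path in new[] { "File1", "File2" })
    low.Add(path);

// Signal that no more work is coming so the consumers can finish.
high.CompleteAdding();
low.CompleteAdding();
await Task.WhenAll(highWorker, lowWorker);

// Order within a lane is deterministic; interleaving between lanes is not.
var highOrder = processed.Where(p => p.StartsWith("high:")).ToArray();
Console.WriteLine(string.Join(", ", highOrder));
```

The waiting and signalling that the existing threads do by hand is covered here by `GetConsumingEnumerable`, which blocks until an item arrives or adding is completed.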
