I need help arranging my .NET TPL Dataflow code better. Here is the code:

    var finalBlock = new ActionBlock<Category_KeywordsToMatch>(x =>
    {
        List<Resume> Resumes = new List<Resume>();

        using (var context = new IndepthRecruitDbContext())
        {
            Resumes = context.Resumes.Include("Candidate").ToList();
        }
        foreach (var res in Resumes)
        {
            var keywords = FindKeywords(x.KeywordsToMatch, res);
            if (keywords.Count > 0)
            {
                matchedCandidates_dataflow.Add(new MatchedCandidate
                {
                    Id = res.CandidateId,
                    Name = res.Candidate.Name,
                    Url = res.Url,
                    Uploaded = res.DateUploaded.ToShortDateString(),
                    MatchedKeywordsList = keywords
                });
            }
        }
    });

This is the final block of my chain. Its input is Category_KeywordsToMatch, a class containing a job category and the list of keywords to match in a resume: {Category, List<Keywords>}. Inside the block I use a foreach loop to enumerate a list of resumes. Is there a better design using Dataflow, for example one where the resumes are supplied as a separate input? The final block is the last block for one category, and I need to search keywords for multiple categories.

  • If your `Resumes` doesn't change, you can use a `WriteOnceBlock`, or you may try a `BroadcastBlock`, which accepts a delegate for copying the value. – VMAtm Aug 19 '17 at 11:16
  • Thanks for the reply. I want to know whether I can split this final block into a chain of blocks. I am new to TPL Dataflow and there are not many tutorials on it. Resumes does not change. I was thinking of supplying a tuple of a resume and a Category_KeywordsToMatch to another block and then building the final result in further blocks. @VMAtm can you please give me your Skype or any other ID so I can contact you? – user3522311 Aug 19 '17 at 15:04
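
The tuple idea from the comment above could look roughly like the sketch below. It is only an illustration under assumptions: `Resume`, `CategoryKeywords`, `Match`, and the substring-based `FindKeywords` are simplified, hypothetical stand-ins for the question's types and helper, and the resumes are assumed to be loaded once up front because they do not change. A `TransformManyBlock` pairs each incoming category with every resume, and a linked `ActionBlock` does the matching in parallel.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical, simplified stand-ins for the question's types.
public class Resume
{
    public int CandidateId { get; set; }
    public string Text { get; set; }
}

public class CategoryKeywords
{
    public string Category { get; set; }
    public List<string> KeywordsToMatch { get; set; }
}

public class Match
{
    public int CandidateId { get; set; }
    public string Category { get; set; }
    public List<string> Keywords { get; set; }
}

public static class TuplePipelineSketch
{
    // Placeholder for the question's FindKeywords: case-insensitive substring search.
    public static List<string> FindKeywords(List<string> keywords, Resume resume)
    {
        return keywords
            .Where(k => resume.Text.IndexOf(k, StringComparison.OrdinalIgnoreCase) >= 0)
            .ToList();
    }

    public static async Task<List<Match>> RunAsync(
        IReadOnlyList<Resume> resumes, IEnumerable<CategoryKeywords> categories)
    {
        // Thread-safe collection, because the matching block runs in parallel.
        var matches = new ConcurrentBag<Match>();

        // Fan out: each category becomes one (category, resume) tuple per resume.
        var pairUp = new TransformManyBlock<CategoryKeywords, (CategoryKeywords Category, Resume Resume)>(
            category => resumes.Select(r => (Category: category, Resume: r)));

        // Match each tuple independently.
        var match = new ActionBlock<(CategoryKeywords Category, Resume Resume)>(
            pair =>
            {
                var keywords = FindKeywords(pair.Category.KeywordsToMatch, pair.Resume);
                if (keywords.Count > 0)
                {
                    matches.Add(new Match
                    {
                        CandidateId = pair.Resume.CandidateId,
                        Category = pair.Category.Category,
                        Keywords = keywords
                    });
                }
            },
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = Environment.ProcessorCount });

        // PropagateCompletion lets awaiting the last block cover the whole chain.
        pairUp.LinkTo(match, new DataflowLinkOptions { PropagateCompletion = true });

        foreach (var category in categories)
        {
            pairUp.Post(category);
        }

        pairUp.Complete();
        await match.Completion;
        return matches.ToList();
    }
}
```

Linking with `PropagateCompletion = true` means only the first block has to be completed by hand. Because the matching runs in parallel, the order of the results is not guaranteed.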

1 Answer

Assuming matchedCandidates_dataflow is a thread-safe collection, you can match the resumes in parallel and add each result to it as soon as it is ready. Below I attached sample code showing one way to do it. For example, calling BuildMatchingBlock(4) gives you a block that processes up to 4 WorkItems in parallel. Assuming you do only CPU-bound work here, I recommend keeping the parallelism level no greater than your number of cores. I chose to use the "SendAsync" interface, but you could use "Post" as well; make sure you understand the differences between them and pick the one that suits you best. You could also set the BoundedCapacity property on ExecutionDataflowBlockOptions to limit how many items a block buffers.

    private class WorkItem
    {
        public Category_KeywordsToMatch CategoryKeywordsToMatch { get; set; }
        public Resume Resume { get; set; }
        public WorkItem(Category_KeywordsToMatch c, Resume r)
        {
            CategoryKeywordsToMatch = c;
            Resume = r;
        }
    }

    private ActionBlock<Category_KeywordsToMatch> BuildMatchingBlock(int matchingParallelism)
    {
        var finalBlock = new ActionBlock<WorkItem>(
            workItem =>
            {
                var keywords = FindKeywords(workItem.CategoryKeywordsToMatch.KeywordsToMatch, workItem.Resume);
                if (keywords.Count > 0)
                {
                    // match...
                }
            },
            new ExecutionDataflowBlockOptions() { MaxDegreeOfParallelism = matchingParallelism });

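        // Loads the resumes for each category and fans them out as work items.
        var preparatorBlock = new ActionBlock<Category_KeywordsToMatch>(
            async x =>
            {
                List<Resume> Resumes = new List<Resume>();
                // load resumes...

                foreach (var res in Resumes)
                {
                    // SendAsync waits without blocking a thread if finalBlock is bounded and full.
                    await finalBlock.SendAsync(new WorkItem(x, res)).ConfigureAwait(false);
                }
            });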

        return preparatorBlock;
    }
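
To illustrate the difference between "Post" and "SendAsync" mentioned above, here is a minimal sketch. It assumes a block created with BoundedCapacity = 1 whose single item is held "in progress" by a gate; `BoundedCapacitySketch`, `RunAsync`, and the gate are hypothetical names used only for this demonstration. On a full block, Post returns false immediately, while SendAsync returns a task that completes once the block accepts (or permanently declines) the item.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class BoundedCapacitySketch
{
    public static async Task<(bool FirstPost, bool SecondPost, bool Sent, int Processed)> RunAsync()
    {
        // The gate keeps the first item "in progress" so the block stays full.
        var gate = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);
        int processed = 0;

        var block = new ActionBlock<int>(
            async item =>
            {
                await gate.Task;
                Interlocked.Increment(ref processed);
            },
            new ExecutionDataflowBlockOptions { BoundedCapacity = 1 });

        // Accepted: the block has room for one item.
        bool firstPost = block.Post(1);

        // Declined at once: the block is at capacity, and Post never waits.
        bool secondPost = block.Post(2);

        // SendAsync does not fail here; its task completes when the block takes the item.
        Task<bool> pendingSend = block.SendAsync(3);

        // Let the first item finish so capacity frees up.
        gate.SetResult(true);
        bool sent = await pendingSend;

        block.Complete();
        await block.Completion;

        return (firstPost, secondPost, sent, processed);
    }
}
```

In this sketch the item passed to the second Post call is simply dropped, which is why SendAsync tends to be the safer choice once BoundedCapacity is set.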
Zorik