I have a 6 GB ".txt" data file to read and sort. To do this fast enough, I thought about using multiple threads, each reading a different chunk of the same file at the same time and sorting the lines it reads. Is there any way to do this?

To visualize the task:

Dim SalaryTXTPath As String = "C:\6_GB_Salary_Data.txt"

Dim Threads As Integer = 3

Dim RichCount As Integer = 0
Dim PoorCount As Integer = 0
Dim MidCount As Integer = 0
Dim ReadTHR As System.Threading.Thread

Private Sub Button1_Click() Handles Button1.Click
    Dim CHUNKS() As DataChunk = Get.DataChunks(SalaryTXTPath, Threads) '<-- Separating file into 3 chunks'
    For i = 0 To Threads - 1 'Indices run from 0 to Threads - 1'
        Dim chunk As DataChunk = CHUNKS(i) 'Local copy so each lambda captures its own chunk, not the loop variable'
        ReadTHR = New System.Threading.Thread(Sub() ReadTXTChunks(chunk)) '<--- Send each chunk to new thread'
        ReadTHR.Start()
    Next
End Sub


Private Sub ReadTXTChunks(CHUNK As DataChunk)
    Me.CheckForIllegalCrossThreadCalls = False

    For Each line As String In File.ReadAllLines(CHUNK) 'Reading lines of chunk'
        SyncLock "Sorting"
            Select Case Convert.ToInt32(line)
                Case Is < 100
                    PoorCount += 1
                Case Is < 1000
                    MidCount += 1
                Case Is < 100000
                    RichCount += 1
            End Select
        End SyncLock
    Next
End Sub

Note: The code above is hypothetical and only meant to visualize the task. It may contain incorrect usages.

Edit: I solved the problem by using Parallel.ForEach. The Parallel class was new to me, and since there are far fewer VB.NET examples than C# ones, figuring out the syntax took a while, but thanks to your comments I discovered this method.

Parallel.ForEach syntax (VB.NET):

' Requires: Imports System.IO, System.Threading, System.Threading.Tasks

' Token source for cancelling the loop if needed
Dim CancelToken As New CancellationTokenSource()

' Options argument for Parallel.ForEach
Dim POptions As New ParallelOptions()
POptions.MaxDegreeOfParallelism = Environment.ProcessorCount ' maximum number of concurrent tasks
POptions.CancellationToken = CancelToken.Token ' attach the cancellation token

Parallel.ForEach(File.ReadAllLines("Filepath"), POptions,
    Sub(ReadedLine)
        ' YOUR CODE, FOR EXAMPLE:
        ' Richtextbox1.Invoke(Sub()
        '                         Richtextbox1.Text += ReadedLine
        '                     End Sub)
    End Sub)
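
For the counting task from the question, here is a minimal sketch of how the same Parallel.ForEach call could tally the three bands. It assumes one integer per line and reuses the thresholds from the hypothetical code above. The module name `SalaryCounter` and the function `CountBands` are made up for illustration. It streams the file with File.ReadLines instead of loading all 6 GB with File.ReadAllLines, and gives each worker its own local counters so a lock is only taken once per worker rather than once per line. Treat it as a starting point, not a tuned implementation.

```vb
Imports System.IO
Imports System.Threading.Tasks

Module SalaryCounter

    ' Hypothetical helper: returns {poor, mid, rich} line counts for the file at path.
    Function CountBands(path As String) As Integer()
        Dim totals(2) As Integer          ' 0 = poor, 1 = mid, 2 = rich
        Dim totalsLock As New Object()    ' guards totals during the final merge

        Parallel.ForEach(Of String, Integer())(
            File.ReadLines(path),             ' lazily streams lines; the file is read sequentially
            Function() New Integer(2) {},     ' per-worker counters, no sharing
            Function(line, state, local)
                Dim value As Integer
                If Integer.TryParse(line, value) Then
                    If value < 100 Then
                        local(0) += 1
                    ElseIf value < 1000 Then
                        local(1) += 1
                    ElseIf value < 100000 Then
                        local(2) += 1
                    End If
                End If
                Return local
            End Function,
            Sub(local)
                ' Merge each worker's counters into the shared totals once.
                SyncLock totalsLock
                    For i As Integer = 0 To 2
                        totals(i) += local(i)
                    Next
                End SyncLock
            End Sub)

        Return totals
    End Function

    Sub Main(args As String())
        Dim counts = CountBands(args(0))
        Console.WriteLine("Poor: {0}, Mid: {1}, Rich: {2}", counts(0), counts(1), counts(2))
    End Sub

End Module
```

Lines that are not valid integers are skipped here, and values of 100000 or more are not counted, which mirrors the Select Case in the question. Whether this beats a plain sequential loop depends on how expensive the per-line work is compared with reading the file.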


Thanks for helping...

Sean Flake
  • 3
    Although I'm no C# nor .NET expert, parallelizing stream input/output is likely to present more issues than it solves. And the same thing goes for many (most) sorting algorithms. – Adrian Mole Jul 25 '20 at 16:10
  • 1
    This: `Me.CheckForIllegalCrossThreadCalls = False` is one of the worst things you could possibly do, in any scenario. `File.ReadLines()` is your friend here. Then you have to partition the elaboration of your data; that's where threading can come in handy, if you get it right. Reading from a file, not so much: a hard drive (SSD drives make it a little better) can only read from one position at a time, and the heads need to be moved to read from another position, which can cost more than just reading as sequentially as possible (depending on the fragmentation level of the device). – Jimi Jul 25 '20 at 16:50
  • 1
    Why in the world would this type of data be stored in a text file? – Mary Jul 25 '20 at 23:02
  • @Jimi, I totally agree that `Me.CheckForIllegalCrossThreadCalls = False` is a really messy thing. However, the code I gave is just an example to visualize the task. To anybody reading this comment: I strongly suggest using Delegates/Invoke. – Sean Flake Jul 26 '20 at 04:51
  • @Mary My program does not necessarily deal with this kind of data. The reason I am coding this task is actually to support multiple file extensions. – Sean Flake Jul 26 '20 at 04:57

0 Answers