Take a look at this:
private static void Search(string[] input)
{
    string datafile = @"C:\Users\User\Documents\text.txt";
    string inputfile = @"C:\Users\User\Documents\input.txt";
    string outputfile = @"C:\Users\User\Documents\output.txt";

    // Read each file exactly once
    string[] parameters = System.IO.File.ReadAllLines(inputfile);
    string[] data = System.IO.File.ReadAllLines(datafile);

    // Index every data line (the even-numbered lines) by its key
    var index = new Dictionary<string, int>();
    for (int i = 0; i < data.Length - 1; i += 2)
    {
        string currentline = data[i];
        string[] splitline = currentline.Split(' ');
        index[splitline[0].Trim('>')] = i;
    }

    // Look up each parameter in the index
    foreach (var p in parameters)
    {
        if (index.ContainsKey(p))
            Console.WriteLine($"Found {p} at line {index[p]}");
        else
            Console.WriteLine($"File doesn't contain {p}");
    }
}
You're looking for exact text matches, so a Dictionary is ideal: the keys are hashed, lookup is fast, and you can store the line number where each item was found.
Your original code made far too many file reads, including inside loops; don't do that. You read the parameters file into an array, then read the file again just to count its lines (hint: the array already has a Length).
I stripped it down to a single read of each file: load the dictionary with all the search data, then look up each parameter from the parameters file and output whether it was found. I don't know what your intention is with the output file, so it is unused here.
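As a small illustration of the same idea, here is a minimal sketch that uses in-memory arrays in place of the two files. The class name `LookupSketch`, the helper `BuildIndex` and the sample lines are made up for this example. It also swaps `ContainsKey` plus the indexer for `TryGetValue`, which does the check and the fetch in one lookup; that is an optional tweak, not something the code above needs.

```csharp
using System;
using System.Collections.Generic;

static class LookupSketch
{
    // Builds key -> line number from alternating ">key name" / info lines.
    public static Dictionary<string, int> BuildIndex(string[] data)
    {
        var index = new Dictionary<string, int>();
        for (int i = 0; i < data.Length - 1; i += 2)
            index[data[i].Split(' ')[0].Trim('>')] = i;
        return index;
    }

    static void Main()
    {
        // Made-up stand-ins for the two files, each "read" exactly once.
        string[] data = { ">a first", "info a", ">b second", "info b" };
        string[] parameters = { "b", "z" };

        var index = BuildIndex(data);

        // The count comes from the array, not from a second read.
        Console.WriteLine($"{parameters.Length} parameters to look up");

        foreach (var p in parameters)
        {
            // TryGetValue checks for the key and fetches the value in one lookup.
            if (index.TryGetValue(p, out int line))
                Console.WriteLine($"Found {p} at line {line}");
            else
                Console.WriteLine($"File doesn't contain {p}");
        }
    }
}
```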
Other hints: you could gain a bit by skipping the Split. Breaking a string into an array of strings is more work than necessary when all you really want is the text between index 1 and the first space. As such, the first loop could be reduced to:
for (int i = 0; i < data.Length - 1; i += 2)
    index[data[i].Substring(1, data[i].IndexOf(' ') - 1)] = i;
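To check that the two ways of pulling out the key agree, here is a minimal sketch. The class and method names and the sample line are invented for illustration, and it assumes every data line starts with '>' and contains at least one space; without a space, `IndexOf` returns -1 and `Substring` throws.

```csharp
using System;

static class KeyExtraction
{
    // Split-based version: allocates an array holding every space-separated token.
    public static string KeyBySplit(string line) => line.Split(' ')[0].Trim('>');

    // Substring-based version: copies only the characters between '>' and the first space.
    // Assumes the line starts with '>' and contains at least one space.
    public static string KeyBySubstring(string line) =>
        line.Substring(1, line.IndexOf(' ') - 1);

    static void Main()
    {
        string line = ">abc123 datapointname0"; // made-up sample line
        Console.WriteLine(KeyBySplit(line));     // abc123
        Console.WriteLine(KeyBySubstring(line)); // abc123
    }
}
```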
You don't have to store just an int in the dictionary. You could store a class, a tuple or an anonymous type (see "A dictionary where value is an anonymous type in C#") and then track more than the line number: the whole data line, its number and the info line related to it, for example. Let's upgrade it. I've added a routine at the top to generate some fake data, plus a parameters list containing some keys that are in the file and some that are not:
// Needs: using System; using System.Collections.Generic; using System.IO; using System.Linq; using System.Text;
static void Main()
{
    Console.WriteLine(DateTime.Now + " generate some fake data");
    StringBuilder datasb = new StringBuilder(100 * 1024 * 1024); //pre-size for ~100 million chars
    var para = new List<Guid>();
    for (int i = 0; i < 500000; i++)
    {
        var g = Guid.NewGuid();
        datasb.AppendFormat(">{0} datapointname{1}\r\nInformation; generated at {2}\r\n", g, i, DateTime.Now);
        if (i % 20000 == 0) //25 items in 500,000
            para.Add(g);
        if (i % 40000 == 0) //13 items not findable in 500,000
            para.Add(Guid.NewGuid());
    }
    var pfile = string.Join("\r\n", para.OrderBy(g => g.ToString()));
    string datafile = @"C:\temp\text.txt";
    string inputfile = @"C:\temp\input.txt";
    string outputfile = @"C:\temp\output.txt";

    //write fake files
    File.WriteAllText(datafile, datasb.ToString());
    File.WriteAllText(inputfile, pfile);

    var start = DateTime.Now;
    Console.WriteLine(DateTime.Now + " begin loading dictionary");

    //BEGIN USEFUL PART
    string[] parameters = System.IO.File.ReadAllLines(inputfile);
    string[] data = System.IO.File.ReadAllLines(datafile);
    var index = new Dictionary<string, Thing>();
    for (int i = 0; i < data.Length - 1; i += 2)
    {
        string currentline = data[i];
        string[] splitline = currentline.Split(' ');
        Thing t = new Thing()
        {
            DataPointNumber = splitline[0].Trim('>'),
            DataPointName = splitline[1],
            Information = data[i + 1],
            LineNumber = i
        };
        index[t.DataPointNumber] = t;
    }

    Console.WriteLine(DateTime.Now + " begin searching dictionary");
    int found = 0, notFound = 0;
    foreach (var p in parameters)
    {
        if (index.ContainsKey(p))
        {
            Console.WriteLine($" Found {p}: {index[p]}"); //ToString will be called
            found++;
        }
        else
        {
            Console.WriteLine($" File doesn't contain {p}");
            notFound++;
        }
    }
    Console.WriteLine($"{DateTime.Now} search complete, searched {index.Count} items looking for {parameters.Length} items, found {found}, didnt find {notFound}, took {(DateTime.Now - start).TotalSeconds} seconds");
}
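The `Thing` class itself isn't shown here. A plausible reconstruction, assuming plain auto-properties and a `ToString` override guessed from the "Line:... with info ..." format in the sample output, might look like this (the `ThingDemo` class and its sample values are only there to make the sketch runnable on its own):

```csharp
using System;

// Hypothetical reconstruction: property names come from the object initializer
// in Main; the ToString format is inferred from the sample output.
class Thing
{
    public string DataPointNumber { get; set; }
    public string DataPointName { get; set; }
    public string Information { get; set; }
    public int LineNumber { get; set; }

    public override string ToString() =>
        $"Line:{LineNumber}-{DataPointNumber} with info {Information}";
}

static class ThingDemo
{
    static void Main()
    {
        var t = new Thing
        {
            DataPointNumber = "abc",          // made-up values
            DataPointName = "datapointname0",
            Information = "some info",
            LineNumber = 4
        };
        Console.WriteLine(t); // Line:4-abc with info some info
    }
}
```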
The bit you'll want for your program starts at //BEGIN USEFUL PART. Take a look at the timings for loading the file into a dictionary and searching it: on my machine it takes 1.5 seconds to look up 38 items in half a million (~50 MB text file), and this includes the time taken to load the data into the dictionary in the first place:
2019-09-05 07:54:17 generate some fake data
2019-09-05 07:54:19 begin loading dictionary
2019-09-05 07:54:21 begin searching dictionary
Found 0ae4b83a-95f0-46e1-acc2-fe802f51441b: Line:240000-0ae4b83a-95f0-46e1-acc2-fe802f51441b with info Information; generated at 2019-09-05 07:54:17
Found 0d007ca2-f21c-4d3c-b52d-fcd3833d31a7: Line:480000-0d007ca2-f21c-4d3c-b52d-fcd3833d31a7 with info Information; generated at 2019-09-05 07:54:18
Found 16849c07-c7a4-4b8b-b0fa-9ed8fd8dedde: Line:200000-16849c07-c7a4-4b8b-b0fa-9ed8fd8dedde with info Information; generated at 2019-09-05 07:54:17
Found 1afdc959-297d-43fe-8106-58c648c25d76: Line:400000-1afdc959-297d-43fe-8106-58c648c25d76 with info Information; generated at 2019-09-05 07:54:18
Found 21dcb6fd-1bd5-4920-b3fa-fd1a908f153d: Line:560000-21dcb6fd-1bd5-4920-b3fa-fd1a908f153d with info Information; generated at 2019-09-05 07:54:18
File doesn't contain 2944f7f7-2fa8-425a-bbf9-f833cfdb1fd2
Found 3b1c0712-2211-4a36-b6dd-739619142fa5: Line:80000-3b1c0712-2211-4a36-b6dd-739619142fa5 with info Information; generated at 2019-09-05 07:54:17
Found 3b2fb141-61e9-4b2d-8ad5-44171648ac03: Line:840000-3b2fb141-61e9-4b2d-8ad5-44171648ac03 with info Information; generated at 2019-09-05 07:54:19
File doesn't contain 487bc8d3-708d-40bc-9278-79ae34fb9732
File doesn't contain 4a9b40b4-fe53-4ba8-9842-6f8d99dd405a
Found 528943c2-d243-4963-b98d-6c60f9c5e118: Line:600000-528943c2-d243-4963-b98d-6c60f9c5e118 with info Information; generated at 2019-09-05 07:54:18
Found 53f60bb6-cf12-4ac7-a0c0-c0d0daf9571f: Line:760000-53f60bb6-cf12-4ac7-a0c0-c0d0daf9571f with info Information; generated at 2019-09-05 07:54:18
File doesn't contain 574d8611-eeec-4ea4-882e-9ea6f8c3a553
Found 591c5ce9-c32f-4f88-a620-6f2b9f90de35: Line:120000-591c5ce9-c32f-4f88-a620-6f2b9f90de35 with info Information; generated at 2019-09-05 07:54:17
Found 60ecbf50-c362-42d2-80e2-666c339b87cc: Line:0-60ecbf50-c362-42d2-80e2-666c339b87cc with info Information; generated at 2019-09-05 07:54:17
Found 63a07cb7-e416-4da6-8a2f-a33fafc6c5d7: Line:720000-63a07cb7-e416-4da6-8a2f-a33fafc6c5d7 with info Information; generated at 2019-09-05 07:54:18
File doesn't contain 69083432-4cb9-484c-8cd2-6b5412b1fccf
Found 705dae03-54d8-48b0-a2ab-7d82d8afc59c: Line:40000-705dae03-54d8-48b0-a2ab-7d82d8afc59c with info Information; generated at 2019-09-05 07:54:17
Found 7182cb17-5070-4801-92d4-bc01bc05e851: Line:960000-7182cb17-5070-4801-92d4-bc01bc05e851 with info Information; generated at 2019-09-05 07:54:19
Found 71dbc2a3-4a40-4ce3-b3c2-1039aa866bf8: Line:360000-71dbc2a3-4a40-4ce3-b3c2-1039aa866bf8 with info Information; generated at 2019-09-05 07:54:18
File doesn't contain 7cc9f35b-9524-4f95-b580-fbeef80c0557
Found 8f8e89ae-3dcf-4a8a-bf34-36a1078c88c6: Line:800000-8f8e89ae-3dcf-4a8a-bf34-36a1078c88c6 with info Information; generated at 2019-09-05 07:54:18
File doesn't contain 9807a242-48dc-47f2-8963-af323bf61b5c
Found 9c8ccbfd-ff70-4fc5-b3a7-02872a9c731c: Line:680000-9c8ccbfd-ff70-4fc5-b3a7-02872a9c731c with info Information; generated at 2019-09-05 07:54:18
File doesn't contain a3f6d083-588e-4337-b800-56af12bde5a9
Found abe63355-6df4-452c-9b56-9879961cba38: Line:440000-abe63355-6df4-452c-9b56-9879961cba38 with info Information; generated at 2019-09-05 07:54:18
File doesn't contain b709726c-e5f6-432e-8e22-4cea924ae29b
Found b7c040c5-b5f9-4744-a0ec-61744c2f65d6: Line:640000-b7c040c5-b5f9-4744-a0ec-61744c2f65d6 with info Information; generated at 2019-09-05 07:54:18
File doesn't contain bbdc590a-bbb6-42b0-8ba3-00c4bffa0a2a
File doesn't contain bd0c1164-d754-41f4-afd3-bed92d4063f4
Found c351a7ee-f4b8-449b-86d4-6d663942939f: Line:880000-c351a7ee-f4b8-449b-86d4-6d663942939f with info Information; generated at 2019-09-05 07:54:19
Found c9296a3c-2167-4c40-b4dc-3d25b9fa285a: Line:320000-c9296a3c-2167-4c40-b4dc-3d25b9fa285a with info Information; generated at 2019-09-05 07:54:18
Found cdfbce4a-cb9a-4617-a6c5-366bbbd6872f: Line:160000-cdfbce4a-cb9a-4617-a6c5-366bbbd6872f with info Information; generated at 2019-09-05 07:54:17
File doesn't contain d057453e-c91b-4f11-8770-600780200835
Found e0361b8a-25e1-4d6c-ae17-0f3ccb6f85fa: Line:280000-e0361b8a-25e1-4d6c-ae17-0f3ccb6f85fa with info Information; generated at 2019-09-05 07:54:18
Found e38f4fb8-fc51-40af-bc06-60188139a0ba: Line:920000-e38f4fb8-fc51-40af-bc06-60188139a0ba with info Information; generated at 2019-09-05 07:54:19
Found f4794410-7873-4fc5-adc2-7750667f88a7: Line:520000-f4794410-7873-4fc5-adc2-7750667f88a7 with info Information; generated at 2019-09-05 07:54:18
File doesn't contain f736b5e2-acea-44f2-89eb-090fbe6cc50c
2019-09-05 07:54:21 search complete, searched 500000 items looking for 38 items, found 25, didnt find 13, took 1.5084395 seconds
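Subtracting two DateTime.Now values is fine for a rough figure like this. If you want a tighter measurement, `System.Diagnostics.Stopwatch` is built for timing. A minimal sketch, where the `Measure` helper and the stand-in workload are hypothetical:

```csharp
using System;
using System.Diagnostics;

static class TimingSketch
{
    // Hypothetical helper: runs the action and returns how long it took.
    public static TimeSpan Measure(Action work)
    {
        var sw = Stopwatch.StartNew();
        work();
        sw.Stop();
        return sw.Elapsed;
    }

    static void Main()
    {
        TimeSpan elapsed = Measure(() =>
        {
            // Stand-in workload; in the program above this would be
            // everything after //BEGIN USEFUL PART.
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) sum += i;
        });
        Console.WriteLine($"took {elapsed.TotalSeconds} seconds");
    }
}
```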