
I'm trying to create an MD5 malware scanner in C#. Comparing against a normal `Dictionary` has a fatal flaw: identical files with the same hash can exist in several directories, so one key (the MD5) would have to map to many file paths. I tried switching to `KeyValuePair<>`, but due to my inexperience I still can't figure out how to turn the lambda output into `KeyValuePair<>` entries (the spot is marked `Idon'tknowwhatshouldbehere` in the code below).
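A small sketch of the problem described above, using made-up sample paths: `Dictionary.Add` throws `ArgumentException` when a second file has the same hash, while a `Dictionary<string, List<string>>` keeps every path. It uses top-level statements, so it needs C# 9 or later.

```csharp
using System;
using System.Collections.Generic;

// Two files with identical content share one MD5 (sample paths are hypothetical).
var pairs = new List<(string path, string hash)>
{
    (@"C:\a\setup.exe", "a384ff0a72a89028fc5edc894309ce81"),
    (@"C:\b\setup.exe", "a384ff0a72a89028fc5edc894309ce81"),
    (@"C:\c\tool.dll",  "62cd2374d3a2bbeb888a078dc20e6b18"),
};

// A Dictionary keyed by hash cannot hold both setup.exe copies: Add throws on the second one.
var byHash = new Dictionary<string, string>();
bool duplicateRejected = false;
try
{
    foreach (var (path, hash) in pairs)
        byHash.Add(hash, path);
}
catch (ArgumentException)
{
    duplicateRejected = true;
}

// Mapping each hash to a list of paths keeps every copy.
var pathsByHash = new Dictionary<string, List<string>>();
foreach (var (path, hash) in pairs)
{
    if (!pathsByHash.TryGetValue(hash, out var list))
    {
        list = new List<string>();
        pathsByHash[hash] = list;
    }
    list.Add(path);
}

Console.WriteLine(duplicateRejected);                                      // True
Console.WriteLine(pathsByHash["a384ff0a72a89028fc5edc894309ce81"].Count);  // 2
```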

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net;
using System.Security.Cryptography;
using System.Diagnostics;
using System.Text;
using System.Web;
using static System.Net.WebRequestMethods;

namespace RiskRemover
{
    class Program
    {
        private static void Main(string[] args)
        {
            Stopwatch sw = new Stopwatch();
            sw.Start();
            var currDir = Directory.GetCurrentDirectory();
            Console.WriteLine("Stage 1: Update");
            HttpWebRequest updRq = (HttpWebRequest)WebRequest.Create("https://www.googleapis.com/drive/v3/files/15WR2yTVJzgwg2pn64IhxFUbfy2BmmsdL?alt=media&key=APIKey");
            updRq.Referer = "referrer";
            HttpWebResponse updRqF = (HttpWebResponse)updRq.GetResponse();
            using (Stream output = System.IO.File.OpenWrite("virushashesL.txt"))
            using (Stream input = updRqF.GetResponseStream())
            {
                input.CopyTo(output);
            }
            bool dbExist = System.IO.File.Exists($"{currDir}\\virushashesL.txt");
            if (!dbExist)
            {
                Console.WriteLine("Database Doesn't exist, Terminating...");
                return;
            }
            var lineCount = System.IO.File.ReadLines($"{currDir}\\virushashesL.txt").Count();
            Console.WriteLine(" ");
            Console.WriteLine($"Database Hash Count: {lineCount}");
            Console.WriteLine(" ");
            Console.Write("Press any key to continue...");
            Console.ReadKey();
            Console.Clear();
            Console.Write("Scan Path:");
            string pathScan = @Console.ReadLine();
            Console.Clear();
            Console.WriteLine("Stage 2: MD5 Hashing");
            var data = GetHasList(@pathScan, false).Select(x => $"\"{x.fileName}\" {x.hash}");
            System.IO.File.WriteAllLines("output.txt", data);
            Console.Clear();
            Console.WriteLine("Stage 3: Comparing MD5 hashes to DB");
            KeyValuePair<string, string> dic = new KeyValuePair<string, string>();
             dic = System.IO.File.ReadAllLines("output.txt")
              .Select(l => l.Split(new[] { '<' }))
              .Idon'tknowwhatshouldbehere(s => s[1].Trim().Substring(0, 10), s => s[0].Trim());
            List<string> lines = System.IO.File.ReadAllLines("virushashesL.txt").ToList();
            foreach (var line in lines)
            {
                bool malicious = dic.ContainsKey(line);
                if (malicious)
                {
                    string malPath = dic[line];
                    System.IO.File.Delete(malPath);
                }
            }
            Console.Clear();
            sw.Stop();
            Console.Write($"Done in {sw.Elapsed}...");
            Console.ReadKey();
            return;
        }
        public static IEnumerable<(string fileName, string hash)> GetHasList(string path, bool isRelative)
        {
            foreach (var file in Directory.GetFiles(path, "*.*", SearchOption.AllDirectories))
            {
                string hash;
                using (var md5 = MD5.Create())
                using (var stream = System.IO.File.OpenRead(file))
                {
                    hash = BitConverter.ToString(md5.ComputeHash(stream)).ToLower();
                }
                hash = hash.Replace("-", "");
                if (isRelative)
                    yield return (file.Remove(0, path.TrimEnd('/').Length + 1), hash);
                else
                    yield return ($"{file}<", hash);
            }
        }
    }
}

Example output.txt

"D:\EvaxHybrid\Downloads\CS8\insdir\CSMediaLibParser.dll<" a384ff0a72a89028fc5edc894309ce81
"D:\EvaxHybrid\Downloads\CS8\insdir\CSMediaLibTools.dll<" 62cd2374d3a2bbeb888a078dc20e6b18
...

Example virushashesL.txt

2d3f18345c
2d427ec2c7
...
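To show how the `"` and `<` characters in the `output.txt` format can be undone, here is a sketch that splits one line back into path and hash. `ParseLine` is a hypothetical helper, and it assumes the exact `"\"{file}<\" {hash}"` layout produced by the code above. It uses top-level statements, so it needs C# 9 or later.

```csharp
using System;

// Hypothetical helper: splits one output.txt line of the form  "C:\path\file.dll<" <md5>  into its parts.
// Relies on '<' never appearing in a path (Windows does not allow '<' in file names).
static (string path, string hash) ParseLine(string line)
{
    int sep = line.IndexOf('<');
    if (sep < 0)
        throw new FormatException($"No '<' separator in: {line}");
    // Strip the surrounding quotes and spaces left over from the "\"{file}<\" {hash}" format.
    string path = line.Substring(0, sep).Trim(' ', '"');
    string hash = line.Substring(sep + 1).Trim(' ', '"');
    return (path, hash);
}

var (p, h) = ParseLine("\"D:\\EvaxHybrid\\Downloads\\CS8\\insdir\\CSMediaLibParser.dll<\" a384ff0a72a89028fc5edc894309ce81");
Console.WriteLine(p);                  // D:\EvaxHybrid\Downloads\CS8\insdir\CSMediaLibParser.dll
Console.WriteLine(h);                  // a384ff0a72a89028fc5edc894309ce81
Console.WriteLine(h.Substring(0, 10)); // a384ff0a72 (same length as the database entries)
```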
kornkaobat
  • Please take the [tour] to learn how Stack Overflow works and read [ask] on how to improve the quality of your question. Then [edit] your question to include the source code you have as a [mcve], which can be compiled and tested by others. It is not quite clear what you want to do or how MD5 is related to what you are doing. Also explain what "file directories" are. And your file paths in your `output.txt` has a `<` in it. Is that on purpose or what is the meaning of that character? – Progman Jul 12 '20 at 12:47
  • @Progman The whole source code is required for it to compile. What I want to do is compare the malicious MD5s with the MD5s of all files in a specified directory and its subdirectories, and delete any malicious file. (The malware MD5s are 10 characters long; the files' MD5s are full length.) As for the "<", it is the separator between the path and the MD5. – kornkaobat Jul 12 '20 at 12:55
  • The `GetHasList()` method (an `s` is missing in the method name?) is returning a tuple of `fileName` and `hash`, separated. But you put them together in a string and moments later, use `Split()` to split them up again. Why do you put them in a string and split them again? Why not keep them separated? Maybe even return an `IDictionary` which maps the filepath to the hash. Based on that you might do stuff like `.Any()` or `.Where()` to find the files for a given hash. Or you can "swap" the dictionary with `GroupBy()`, see https://stackoverflow.com/questions/13410590/grouping-dictionary-by-value – Progman Jul 12 '20 at 13:03
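A sketch of the suggestion in the last comment: keep the tuples separate and "swap" them with `GroupBy`. The sample list is hypothetical and stands in for the `(fileName, hash)` tuples that `GetHasList` yields (without the `<` suffix). Grouping on the first 10 hex characters is an assumption based on the asker's note that the database stores 10-character hashes. It uses top-level statements, so it needs C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the (fileName, hash) tuples, kept separate instead of joined into a string.
var files = new List<(string fileName, string hash)>
{
    (@"D:\x\a.dll", "a384ff0a72a89028fc5edc894309ce81"),
    (@"D:\y\a.dll", "a384ff0a72a89028fc5edc894309ce81"),
    (@"D:\z\b.dll", "62cd2374d3a2bbeb888a078dc20e6b18"),
};

// "Swap" the mapping: group by the 10-character hash prefix (the database format)
// so each prefix maps to every path that has it.
Dictionary<string, List<string>> pathsByPrefix = files
    .GroupBy(f => f.hash.Substring(0, 10), f => f.fileName)
    .ToDictionary(g => g.Key, g => g.ToList());

foreach (var entry in pathsByPrefix)
    Console.WriteLine($"{entry.Key}: {string.Join(", ", entry.Value)}");
```

Checking a database line then becomes `pathsByPrefix.TryGetValue(line, out var paths)`, which returns all matching paths at once.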

2 Answers


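I think you want to delete every path whose hash appears in the malware database. An `ILookup` maps one key to several values, so duplicate hashes are not a problem: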

        ILookup<string, string> lookup = System.IO.File.ReadAllLines("output.txt")
         .Select(l => l.Split(new[] { '<' }))
         // create a value tuple (string key, string value);
         // Trim(' ', '"') also strips the quotes written around the path in output.txt
         .Select(s => (key: s[1].Trim(' ', '"').Substring(0, 10), value: s[0].Trim(' ', '"')))
         .ToLookup(s => s.key, s => s.value); // make a lookup from the tuples

        List<string> lines = System.IO.File.ReadAllLines("virushashesL.txt").ToList();
        foreach (var line in lines)
        {
            var malPaths = lookup[line];
            // if the key is not found, an empty sequence is returned,
            // so no further checks are necessary
            foreach (var malPath in malPaths)
            {
                // delete every path that has this hash
                System.IO.File.Delete(malPath);
            }
        }
Lev
  • `System.IO.IOException: 'The filename, directory name, or volume label syntax is incorrect. : 'D:\EvaxHybrid\Mywork\RiskRemover\C#\RiskRemover\bin\Release\netcoreapp3.1\"D:\EvaxHybrid\Downloads\adobe\acrobat\programdirready\Designer 9.0` Seems like I left some `"` – kornkaobat Jul 12 '20 at 13:12
  • trim the quote with: .Trim('"') – Lev Jul 12 '20 at 13:14
  • you probably want to trim the quote from the key/hash – Lev Jul 12 '20 at 13:15
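A standalone sketch of the `ToLookup` approach on two sample lines in the question's `output.txt` format (the paths are hypothetical), with the quote-trimming from the comments applied. It shows that one key can return several paths and that a missing key returns an empty sequence rather than throwing. It uses top-level statements, so it needs C# 9 or later.

```csharp
using System;
using System.Linq;

// Two sample lines in the question's output.txt format (hypothetical paths, same hash).
string[] outputLines =
{
    "\"D:\\x\\a.dll<\" a384ff0a72a89028fc5edc894309ce81",
    "\"D:\\y\\a.dll<\" a384ff0a72a89028fc5edc894309ce81",
};

ILookup<string, string> lookup = outputLines
    .Select(l => l.Split('<'))
    // Trim(' ', '"') removes the stray quotes that caused the IOException mentioned above.
    .Select(s => (key: s[1].Trim(' ', '"').Substring(0, 10), value: s[0].Trim(' ', '"')))
    .ToLookup(t => t.key, t => t.value);

// One key, several values: both copies come back, in source order.
foreach (var path in lookup["a384ff0a72"])
    Console.WriteLine(path);

// A key that is not present yields an empty sequence.
Console.WriteLine(lookup["2d3f18345c"].Any()); // False
```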

D://output.txt


D:\EvaxHybrid\Downloads\CS8\insdir\CSMediaLibParser.dll< a384ff0a72a89028fc5edc894309ce81
D:\EvaxHybrid\Downloads\CS8\insdir\CSMediaLibTools.dll< 62cd2374d3a2bbeb888a078dc20e6b18 



D://virushashesL.txt
a384ff0a72a89028fc5edc894309ce81
62cd2374d3a2bbeb888a078dc20e6b18



private void fileintodis()
{
    // Each line of output.txt looks like: <path>< <hash>
    // A List<KeyValuePair> (unlike a Dictionary) accepts duplicate keys.
    List<KeyValuePair<string, string>> dic = System.IO.File.ReadAllLines("D://output.txt")
        .Select(l => l.Split('<'))
        .Select(p => new KeyValuePair<string, string>(p[1].Trim(), p[0].Trim()))
        .ToList();

    List<string> lines = System.IO.File.ReadAllLines("D://virushashesL.txt").ToList();
    foreach (var line in lines)
    {
        // Collect every path with this hash, not just the first one
        List<string> malPaths = dic.Where(s => s.Key == line.Trim())
                                   .Select(s => s.Value)
                                   .ToList();
        foreach (string malPath in malPaths)
        {
            System.IO.File.Delete(malPath);
        }
    }
}
LDS
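The `List<KeyValuePair>` version above filters the whole list once per database line. A sketch of an alternative, assuming the scan results and the database are already in memory (the sample values are hypothetical): load the database hashes into a `HashSet<string>` once, then make a single pass over the pairs, which still keeps duplicate paths. It uses top-level statements, so it needs C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the parsed output.txt (hash -> path) and virushashesL.txt.
var pairs = new List<KeyValuePair<string, string>>
{
    new KeyValuePair<string, string>("a384ff0a72a89028fc5edc894309ce81", @"D:\x\a.dll"),
    new KeyValuePair<string, string>("a384ff0a72a89028fc5edc894309ce81", @"D:\y\a.dll"),
    new KeyValuePair<string, string>("62cd2374d3a2bbeb888a078dc20e6b18", @"D:\z\b.dll"),
};
string[] dbLines = { "a384ff0a72a89028fc5edc894309ce81" };

// Build the set once: each Contains check is O(1) on average.
// OrdinalIgnoreCase tolerates upper-case hex in the database file.
var malwareHashes = new HashSet<string>(dbLines.Select(l => l.Trim()), StringComparer.OrdinalIgnoreCase);

// One pass over the pairs picks up every path whose hash is in the database, duplicates included.
List<string> toDelete = pairs
    .Where(p => malwareHashes.Contains(p.Key))
    .Select(p => p.Value)
    .ToList();

foreach (var path in toDelete)
    Console.WriteLine(path); // in a real scan: System.IO.File.Delete(path);
```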