
I'm a beginner programmer and I've been banging my head on my desk for a while because of this problem. Basically, I'm trying to write some code for an application I'm making that needs to quickly read the rating of several thousand files inside a specified folder.

I was actually able to write something that works; the problem is the performance. Here is the code in its entirety. I will explain why it is problematic in more detail below:

using System;
using System.Collections.Generic;
using System.Windows.Forms;
using Microsoft.WindowsAPICodePack.Shell;
using System.Diagnostics;

namespace Tests
{
    public partial class Form1 : Form
    {
        List<string> Files = new List<string>();

        public Form1()
        {
            InitializeComponent();
        }

        private void event_Form1_Shown(object sender, EventArgs e)
        {
            string File = @"D:\Downloads\1.png";
            int NumberOfLoops = 5000;

            Stopwatch sw = new Stopwatch();
            sw.Start();

            for (int i = 0; i < NumberOfLoops; i++)
            {
                var file = ShellFile.FromFilePath(File);
                int Rating = Convert.ToInt32(file.Properties.System.Rating.Value);

                if (Rating == 0)
                {
                    Files.Add(File);
                }
            }

            sw.Stop();
            MessageBox.Show("Time: " + sw.ElapsedMilliseconds.ToString() + "ms (" + NumberOfLoops.ToString() + "x)");
        }
    }
}

On my system, reading the rating of this one file 5000 times takes around 6200ms (local hard drive) and 21500ms if the file is on a network share.

The problem is, as I alluded to earlier, that this code will be used to read the rating of far more than 5000 files (sometimes hundreds of thousands), and the performance is absolutely abysmal. I have also learned that Windows uses some form of caching to read this kind of metadata more rapidly once a file has been read before, so reading a specific file's metadata over and over is the absolute best-case scenario in terms of performance.

But even though it might not be accurate, it is still a useful test to do in order to have some kind of benchmark to compare different methods of reading file extended attributes to see which one takes the least amount of time to complete. In real-world use, the app will actually have to read the ratings of a gigantic pool of different files, which slows things down by a factor of around 25 times by my testing (the 21500ms operation takes 578000ms for example, which is around 10 minutes so you can see why this is becoming a problem).
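
To compare the different approaches on equal footing, the timing loop can be factored out of the form. The following is only a sketch: `RatingBenchmark`, `Run` and the `readRating` delegate are names made up for illustration and are not part of the code in this question. It treats a rating of 0 as unrated, as the first listing does.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class RatingBenchmark
{
    // Hypothetical helper: calls readRating once per path, adds every path
    // whose rating is 0 to 'unrated', and returns the elapsed milliseconds.
    public static long Run(
        IEnumerable<string> paths,
        Func<string, int> readRating,
        List<string> unrated)
    {
        var sw = Stopwatch.StartNew();

        foreach (string path in paths)
        {
            if (readRating(path) == 0)
            {
                unrated.Add(path);
            }
        }

        sw.Stop();
        return sw.ElapsedMilliseconds;
    }
}
```

With the first listing, the call might look like `RatingBenchmark.Run(Enumerable.Repeat(File, NumberOfLoops), p => Convert.ToInt32(ShellFile.FromFilePath(p).Properties.System.Rating.Value), Files)` (this needs `using System.Linq;`). Passing a list of distinct paths instead would exercise the uncached case.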

Since I know I'm a beginner and that my code is probably super inefficient, I started looking around for other methods of doing the same thing. So using this solution from a thread on a similar problem, I came up with this code:

using System;
using System.Collections.Generic;
using System.Windows.Forms;
using System.Diagnostics;

namespace Tests
{
    public partial class Form1 : Form
    {
        List<string> Files = new List<string>();
        Shell32.Shell app = new Shell32.Shell();

        public Form1()
        {
            InitializeComponent();
        }

        private void event_Form1_Shown(object sender, EventArgs e)
        {
            string Folder = @"D:\Downloads\";
            string File = "1.png";
            int NumberOfLoops = 5000;

            Stopwatch sw = new Stopwatch();
            sw.Start();

            for (int i = 0; i < NumberOfLoops; i++)
            {
                var folderObj = app.NameSpace(Folder);
                var filesObj = folderObj.Items();

                var headers = new Dictionary<string, int>();
                for (int j = 0; j < short.MaxValue; j++)
                {
                    string header = folderObj.GetDetailsOf(null, j);
                    if (String.IsNullOrEmpty(header))
                        break;
                    if (!headers.ContainsKey(header)) headers.Add(header, j);
                }

                var testFile = filesObj.Item(File);

                if (folderObj.GetDetailsOf(testFile, headers["Rating"]) == "Unrated")
                {
                    Files.Add(Folder + File);
                }
            }

            sw.Stop();
            MessageBox.Show("Time: " + sw.ElapsedMilliseconds.ToString() + "ms (" + NumberOfLoops.ToString() + "x)");
        }
    }
} 

Unfortunately, this method is even slower than the one before, clocking in at around 6700ms on my local hard drive and 23000ms on a network share. I also found these other solutions, which seemed to be doing something similar to what I want, but I couldn't get them to work for various reasons:

  • https://stackoverflow.com/a/65349545 : the StorageFile.GetFileFromPathAsync call gives me an error even though I added Microsoft.Windows.SDK.Contracts to the project's NuGet packages.

  • https://stackoverflow.com/a/48096438 : Using the popular TagLib-Sharp library, but unfortunately even though I was able to compile the code using this solution, I was not able to read the rating from a file (I was able to read the tags though, which are similar but not quite the thing I was looking for).

  • https://stackoverflow.com/a/29308647/19518435: This solution looked promising, but as another commenter mentioned, I have no idea what the FolderItem2 is supposed to be referencing. EDIT: Got this solution to work with some help, but unfortunately it is not really on par in terms of performance, see EDIT1 below for more details.
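
One thing worth noting about the second listing (the `Shell32` one) is that the column headers are re-read on every iteration, so part of the measured time goes into rebuilding the same dictionary 5000 times. A possible way to isolate that step is sketched below. `ShellColumns` and `BuildHeaderIndex` are hypothetical names, and the `getHeader` delegate stands in for `j => folderObj.GetDetailsOf(null, j)`. Whether hoisting this step changes the timings meaningfully has not been measured here.

```csharp
using System;
using System.Collections.Generic;

public static class ShellColumns
{
    // Hypothetical helper: scans column indexes until the first empty header
    // (the same stop condition as the loop in the listing) and returns a
    // header name -> column index map. The first index wins on duplicates.
    public static Dictionary<string, int> BuildHeaderIndex(Func<int, string> getHeader)
    {
        var headers = new Dictionary<string, int>();

        for (int j = 0; j < short.MaxValue; j++)
        {
            string header = getHeader(j);
            if (string.IsNullOrEmpty(header))
                break;

            if (!headers.ContainsKey(header))
                headers.Add(header, j);
        }

        return headers;
    }
}
```

In the benchmark this would be called once per folder, before the loop, for example `int ratingColumn = ShellColumns.BuildHeaderIndex(j => folderObj.GetDetailsOf(null, j))["Rating"];`, leaving only the per-file `GetDetailsOf(testFile, ratingColumn)` call inside the loop.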

Ideally, I would like to find a way for this "benchmark" I've made to take around 1000ms or less (so in the realms of around 6-7 times faster than the first two methods).

I am really motivated to get this to work, but frankly I am out of ideas. It's kind of a frustrating situation because I know my code is probably very unoptimised or there might be a way more obvious way to do what I'm trying to achieve, but since I am very inexperienced I don't really know what else to try. So that's why I'm turning to you, any help would be greatly appreciated!

EDIT1: Was able to make two more methods work with some help, but unfortunately both are not very good in terms of performance. I compiled all 4 in this GitHub repo if anyone wants to take a second look at them, because I feel like there's a good chance my bad implementation is affecting performance: https://github.com/user-727/FileRatingReader

user_727
  • For #3 I think you need to look here https://learn.microsoft.com/en-us/windows/win32/api/shobjidl_core/nf-shobjidl_core-ishellitem2-getproperty it's a COM interface, hence the use of `Activator.CreateInstance` etc. Yeah, Windows doesn't make this easy. Even that might not help you much if you want outright performance, you might need to work out the format of the metadata and read it directly, possibly it's a secondary NTFS stream. – Charlieface Jul 10 '22 at 02:23
  • What *rating* are you looking for? The MetaData value, used in specific file formats (Image, Video etc.)? -- About IFolderItem2, that's the Interface that exposes the `ExtendedProperty()` method, used to query `System.*` properties. For example, you get the Folder Item: `Folder folder = [Shell].NameSpace([Some path]);` and its Items: `FolderItems items = folder.Items();`. Then you can enumerate the folder's Items as `foreach (FolderItem2 item in items) { }`. An Item's rating is then `var itemRating = (int?)item.ExtendedProperty("System.Rating");` (not to be confused with `System.Media.Rating`) – Jimi Jul 10 '22 at 02:59
  • @Jimi that is indeed the rating I'm looking to get (https://learn.microsoft.com/en-us/windows/win32/properties/props-system-rating). Thank you very much for the tips, with your information, I was able to make the third solution work with `FolderItem2` (the one described in https://stackoverflow.com/a/29308647/19518435), but unfortunately the performance isn't very good (around 122000ms). I also fiddled around for a bit and came up with a simpler method using the other examples you provided, and was able to take down the time to around 17500ms. – user_727 Jul 10 '22 at 19:08
  • So in total I now have 4 methods that work but are not satisfactory in terms of the performance I'm looking for: Method 1: 6200ms; Method 2: 6700ms; Method 3: 122000ms; Method 4: 17500ms. It might just be my implementation that is bad and affecting performance, though, so I'll try to put up the four methods on GitHub and edit the link in here if anyone wants to take a look at them. – user_727 Jul 10 '22 at 19:10
  • If you just want the Rating value of images, try the `ImagingBitmapInfo` class shown here: [How to determine if an Image is Grayscale](https://stackoverflow.com/a/49481035/7444103). Initialize the class as `var bitmapInfo = BitmapFormatInfo("[Image Path]");` (or pass a Stream), then read the `bitmapInfo.MetaData.Rating` value. If you can use it, let me know how it goes in your context. -- As a note, you can execute that code in a Thread other than the UI Thread (setting the Thread's `ApartmentState` to `STA`). If my code is of some use, you can run a couple of Tasks. – Jimi Jul 10 '22 at 23:32
  • I fiddled around for a bit and was able to add the System.Windows.Media namespace required for the code in the link you provided to run ([Ref1](https://stackoverflow.com/q/3154198/19518435), [Ref2](https://docs.microsoft.com/en-us/visualstudio/ide/how-to-add-or-remove-references-by-using-the-reference-manager?view=vs-2022)) but unfortunately I wasn't able to get it to compile. I get 41 errors, with the first one being `The name 'BitmapFormatInfo' does not exist in the current context`. I don't understand a lot of what the original poster wrote, so I'm not too sure where to add their code – user_727 Jul 11 '22 at 19:07
  • Messed around a bit more with it, after including the references to `System.Xaml` and `WindowsBase` I "only" get 23 errors, with the first one being the same as the one I mentioned in my previous comment. I've added my code in the GitHub I linked in the main post if you want to check it out. – user_727 Jul 11 '22 at 21:25
  • @user_727 Is this something that definitely has to be done within your forms themselves? This sounds like an ideal situation for a Windows service (https://learn.microsoft.com/en-us/dotnet/core/extensions/windows-service). You can probably do the processing asynchronously in batches and just have your UI elements respond to the results, which would be shuttled back. – jonsca Jul 11 '22 at 22:13
  • You have copied the `BitmapPixelFormat()` methods outside a class (it's in the `namespace` scope). You need to move those methods to the Form. Then, in the `Shown` handler, you have `var bitmapInfo = BitmapPixelFormat(File);` and in the loop `int Rating = bitmapInfo.Metadata.Rating;`. Call the MessageBox with its fully qualified name: `System.Windows.Forms.MessageBox.Show(...)` – Jimi Jul 11 '22 at 22:15
  • You can also move the `BitmapPixelFormat` methods to the `ImagingBitmapInfo` class (since those are static methods), then you'd have: `var bitmapInfo = ImagingBitmapInfo.BitmapPixelFormat(File);` – Jimi Jul 11 '22 at 22:19
  • BTW, remember to prefix a nickname with `@` if you want to ping someone, otherwise the comment is not delivered to a specific recipient (i.e., nobody receives it). – Jimi Jul 11 '22 at 22:35
  • @jonsca it doesn't need to be done within the form itself, I don't mind the form being unresponsive while it's gathering the ratings but if it can improve performance it might be something worth taking a look at. But honestly I'm not sure if I could even get that to work because it looks very complicated from the guide you've linked and I'm already struggling to do simple stuff as you can see. Also, would it even be possible to launch different numbers of services depending on the number of files to read? From what I can see it looks like it has to be a hardcoded amount... – user_727 Jul 11 '22 at 23:14
  • @Jimi thanks for the reminder about the pings! I'm very sorry I should probably take an actual programming class or do more research online but I can't figure out what you mean by your last comment. Or I thought I did, but when I tried to do it I still got a lot of errors (down to only 19 though!). I feel like I'm just missing something really obvious but no matter what I do it doesn't seem like I'm putting the code at the right place. I've updated the GitHub once again if you'd be so kind as to take another look at it. Thanks! – user_727 Jul 12 '22 at 00:11
  • @user_727 I think you're doing just fine with the "simple stuff"! You could probably spin up multiple instances of the service if you found that's what you really need. Just start simple. The only part you need to worry about is filling out the `ExecuteAsync` method, which should be something like 1) Receive the filename from the Forms application, 2) Read the file off disk 3) Do processing 4) Send the result back to the Forms application. This is definitely just a suggestion and does make your life more difficult at the onset, but you need to examine what would happen if your – jonsca Jul 12 '22 at 01:08
  • app starts taking 10X the number of images you have now or 100X that amount. – jonsca Jul 12 '22 at 01:08
  • You're probably just missing `using System.Linq;` and changing `MessageBox` to `System.Windows.Forms.MessageBox`. Then you need to write, e.g., `int rating = bitmapInfo.Metadata?.Rating ?? 0;` -> e.g., a PNG file doesn't have Metadata. – Jimi Jul 12 '22 at 01:47
  • Note that when a Type or a Method etc. cannot be found, you just need to put the caret inside the underlined member and press `ALT+ENTER`. Visual Studio should suggest to add the missing namespace / assembly. – Jimi Jul 12 '22 at 01:56
  • @Jimi it was indeed the `using System.Linq;` that was giving me issues, I feel really stupid now! I've never seen a line like this one before `int rating = bitmapInfo.Metadata?.Rating ?? 0;` but if I understand correctly it just makes it so that the variable is set to 0 if the value of the rating is null? Also, I must say that this method you've discovered is simply astonishing in terms of performance. In fact, the elapsed milliseconds reported by the stopwatch is actually 0 so I can't even tell exactly how much faster it is compared to the other 4 methods! – user_727 Jul 12 '22 at 18:36
  • Unfortunately, though, there is a big issue with it compared to the other methods. As you alluded to in your previous comment, PNG files can't have metadata, but that isn't quite right. NTFS actually supports storing metadata alongside any type of file. Windows OS also supports it to some degree, although that functionality has been greatly cut back ever since Windows Vista. Windows Explorer and Windows Search for example both have the ability to see metadata for files that normally wouldn't be allowed to have any on Windows OS like PNGs or GIFs. – user_727 Jul 12 '22 at 18:36
  • More information [here (in the readme)](https://github.com/Dijji/FileMeta) and [here](https://github.com/Dijji/FileMeta/wiki/Using-the-File-Meta-Association-Manager). I'm aware this might be a super niche use case but I didn't think it would be an issue since all the other methods didn't have any issues reading the metadata of files that weren't "supposed" to have any. Although I'm (maybe naively) hopeful that there could be a way to modify the code made by the author of the solution you linked to be able to read such metadata. – user_727 Jul 12 '22 at 18:37
  • I'll fiddle around with it a little bit to try and get more familiar with the code, and I might also try to contact the original author to ask them if they think it might be something that is possible to do. – user_727 Jul 12 '22 at 18:38
  • That's not Metadata, that's an Alternate Stream attached to a file. It was removed (it's not, it's still there, there's just no Shell interface for it) because, when you copy the file in another machine, the Alternate Stream is still attached but without reference, so the file is marked as *insecure*, you have to open up its properties and allow the feature. In some cases, the System wouldn't let you open the file and some anti-virus would complain, so some users (not devs) complained. Mark Russinovich wrote [Streams](https://docs.microsoft.com/en-us/sysinternals/downloads/streams) for this. – Jimi Jul 13 '22 at 04:58
  • That utility lets you see the content of the Alternate Stream attached to files, works in batch and also allows bulk-erase them. – Jimi Jul 13 '22 at 05:00
  • I can't seem to be able to read the ratings of png/gif files with this tool unfortunately. No matter what rating I give to the file, the two streams detected stay the same: `SebiesnrMkudrfcoIaamtykdDa:$DATA 112` and `{4c8cc155-6c1e-11d1-8e41-00c04fb9386d}:$DATA 0`. I like this idea though, I'll try to do some more digging in the next couple of days – user_727 Jul 15 '22 at 02:28
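
On the `int rating = bitmapInfo.Metadata?.Rating ?? 0;` line asked about in the comments: `?.` evaluates to null instead of throwing when the object on its left is null, and `??` supplies a fallback when the expression on its left is null. The sketch below shows this with stand-in types. `SampleBitmapInfo` and `SampleMetadata` are hypothetical and are not the `ImagingBitmapInfo` class from the linked answer, whose exact property types are not shown in this thread.

```csharp
using System;

// Hypothetical stand-ins, only here to demonstrate the two operators.
public class SampleMetadata
{
    public int? Rating { get; set; }
}

public class SampleBitmapInfo
{
    // Left null when the file carries no metadata block.
    public SampleMetadata Metadata { get; set; }
}

public static class NullOperatorsDemo
{
    public static int ReadRating(SampleBitmapInfo info)
    {
        // info.Metadata?.Rating is null if Metadata is null or Rating is null;
        // ?? 0 then replaces that null with 0.
        return info.Metadata?.Rating ?? 0;
    }
}
```

Here `ReadRating(new SampleBitmapInfo())` returns 0 because `Metadata` is null, and an instance whose `Metadata.Rating` is 75 returns 75. So the fallback covers a missing metadata object as well as a missing rating value.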

0 Answers