
I'm developing a DLL under Win32 that does a simple job: it scans the host's virtual memory for a substring. For some reason it is very slow compared to Cheat Engine, ArtMoney, or even OllyDbg, which uses a single thread to scan. Below is the code of the function that scans a single memory section obtained with VirtualQuery(). The host (an .exe application) commits about 300-400 MiB of memory, and I have to scan about 170 memory sections ranging in size from 4 KiB to 32 MiB. I scan only MEM_PRIVATE, MEM_COMMIT regions; I skip PAGE_GUARD, PAGE_NOACCESS and PAGE_READONLY pages, as well as the DLL's own memory.

For some reason the performance is terrible: it takes 10-12 seconds to find a single string, whereas OllyDbg finds the same string in about 2-3 seconds.

UINT __stdcall ScanAndReplace(UCHAR* pStartAddress, UCHAR* pEndAddress, const char* csSearchFor, const char* csReplaceTo, UINT iLength)
{
    // This function runs inside the single memory section and looks for a specific substring

    // pStartAddress: UCHAR* - The beginning of the memory section
    // pEndAddress: UCHAR* - The ending of the memory section
    // csSearchFor: const char* - The pointer to the substring to search for
    // csReplaceTo: const char* - The pointer to the substring to replace with
    // iLength: UINT - max length of csSearchFor substring

    // Total number of matches found
    UINT iHits = 0;

    // Scan from pStartAddress to (pEndAddress - iLength) and don't overrun memory section
    for (pStartAddress; pStartAddress < (pEndAddress - iLength); ++pStartAddress)
    {
        UINT iIterator = 0;

        // Scan for specific string that begins at current address (pStartAddress) until condition breaks
        for (iIterator; (iIterator < iLength) && (pStartAddress[iIterator] == csSearchFor[iIterator]); ++iIterator);

        // String matches if iIterator == iLength
        if (iIterator == iLength)
        {
            // Found, do something (edit/replace, etc), increment counter...
            ++iHits;
        }

        /*
        // Even if you search for single byte it's very slow
        if (*pStartAddress == 'A')
            ++iHits;
        */
    }

    return iHits;
}

I'm using MSVS 2010.

Compiler command line:

/nologo /W3 /WX- /O2 /Os /Oy- /GL /D "WIN32" /D "NDEBUG" /D "_WINDOWS"
  /D "_USRDLL" /D "MYDLL_EXPORTS" /D "_WINDLL" /GF /Gm- /MD /GS- /Gy
  /fp:precise /Zc:wchar_t /Zc:forScope /Fp"Release\MyDll.pch" /FAcs
  /Fa"Release\" /Fo"Release\" /Fd"Release\vc100.pdb" /Gd /TC /analyze-
  /errorReport:queue

Linker command line:

/OUT:"D:\MyDll\Release\MyDll.dll" /INCREMENTAL:NO /NOLOGO /DLL "Dbghelp.lib"
  "msvcrt.lib" "kernel32.lib" "user32.lib" "gdi32.lib" "winspool.lib"
  "comdlg32.lib" "advapi32.lib" "shell32.lib" "ole32.lib" "oleaut32.lib"
  "uuid.lib" "odbc32.lib" "odbccp32.lib" /NODEFAULTLIB /MANIFEST:NO
  /ManifestFile:"Release\MyDll.dll.intermediate.manifest" /ALLOWISOLATION
  /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /DEBUG
  /PDB:"D:\MyDll\Release\MyDll.pdb" /SUBSYSTEM:WINDOWS /OPT:REF /OPT:ICF
  /PGD:"D:\MyDll\Release\MyDll.pgd" /LTCG /TLBID:1 /ENTRY:"DllMain"
  /DYNAMICBASE /NXCOMPAT /MACHINE:X86 /ERRORREPORT:QUEUE

What am I doing wrong? Is my algorithm bad, or is there some sort of "magic" that other memory scanners use?

deadbeef
Dr Morgan
  • How long does it take you to enumerate all the sections? I mean - if you reduce your ScanAndReplace() function to nothing, will it still take considerable time? Maybe you are looking for a problem in the wrong place? – Vlad Feinstein Sep 16 '15 at 20:01
  • If I remove all the code from this function (or just the part where the actual reading happens, the inner for loop), it walks the whole virtual memory almost instantly. I think I have found the root of the problem: it is not related to the algorithm; the host application somehow affects memory read speed. I need a bit more time to figure out what happens and why this DLL works as fast as expected when loaded into a "test" exe application with the same amount of committed memory, but is very slow in the "real" application. It's probably try/catch {}. – Dr Morgan Sep 17 '15 at 21:28

1 Answer


Other memory scanners may be using "magic" in the form of better search algorithms, such as Boyer-Moore. They may also apply additional micro-optimizations to their search, but I would guess that the choice of algorithm accounts for most of the difference you're seeing.
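
To illustrate the kind of algorithm meant here, below is a minimal sketch of Boyer-Moore-Horspool, a simplified variant of Boyer-Moore, written in C. It is not taken from any of the scanners mentioned, and the function name `horspool_count` and its parameters are hypothetical. It only counts matches in a plain buffer and assumes the caller has already verified that the whole range is readable.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts occurrences (overlapping ones included) of
   needle in hay using Boyer-Moore-Horspool. */
size_t horspool_count(const unsigned char *hay, size_t hay_len,
                      const unsigned char *needle, size_t needle_len)
{
    size_t shift[UCHAR_MAX + 1];
    size_t i, pos = 0, hits = 0;

    if (needle_len == 0 || hay_len < needle_len)
        return 0;

    /* A byte that does not occur in the pattern (ignoring the pattern's
       last byte) lets the window jump by the full pattern length. */
    for (i = 0; i <= UCHAR_MAX; ++i)
        shift[i] = needle_len;

    /* For every pattern byte except the last one, store the distance from
       its rightmost occurrence to the end of the pattern. */
    for (i = 0; i + 1 < needle_len; ++i)
        shift[needle[i]] = needle_len - 1 - i;

    /* pos is the start of the current window; the last valid start is
       hay_len - needle_len, so the window never leaves the buffer. */
    while (pos <= hay_len - needle_len) {
        if (memcmp(hay + pos, needle, needle_len) == 0)
            ++hits;
        /* Shift according to the haystack byte under the window's end. */
        pos += shift[hay[pos + needle_len - 1]];
    }
    return hits;
}
```

For a pattern of length m the window can advance by up to m bytes per step, so long patterns benefit most. For a single-byte pattern every shift is 1 and the loop degenerates to a plain linear scan, so this alone would not explain a slow single-byte search.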

mattnewport
  • Thanks for the useful link. But even if I scan for a single byte it is slow. I have checked the source code of the [memedit0x0 project](http://code.google.com/p/memedit0x0/), which works pretty fast, and I didn't find anything related to the Boyer–Moore string search algorithm. It scans for a single [value in memory](http://code.google.com/p/memedit0x0/source/browse/MemoryCell.cpp) in the function `MemoryCell::update()`. As you can see, it uses the same algorithm as I do, but it is orders of magnitude faster. – Dr Morgan Sep 16 '15 at 00:10
  • @DrMorgan hmm, I don't see any obvious reason why your code would be slower. Have you tried replacing the scan code in memedit0x0 with your code to see whether it performs slower there? If it doesn't, perhaps the difference is due to some other aspect of your program. – mattnewport Sep 16 '15 at 00:34
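
One way to run the comparison suggested in the comments without pulling in another project is to swap the hand-written byte loop for the C runtime's `memchr` and `memcmp` and time both variants on the same region. The sketch below is only an assumption about how such a baseline could look. The name `memchr_count` is hypothetical, and the buffer is assumed to be fully readable. If this version is just as slow inside the real host, the loop itself is probably not the bottleneck.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical baseline: counts occurrences (overlapping ones included) of
   needle in hay, letting memchr find candidate positions for the first
   byte and memcmp verify the rest. */
size_t memchr_count(const unsigned char *hay, size_t hay_len,
                    const unsigned char *needle, size_t needle_len)
{
    const unsigned char *p = hay;
    const unsigned char *end;
    size_t hits = 0;

    if (needle_len == 0 || hay_len < needle_len)
        return 0;

    /* One past the last position at which a full match can still start. */
    end = hay + (hay_len - needle_len) + 1;

    while (p < end) {
        /* Jump to the next occurrence of the first pattern byte. */
        p = memchr(p, needle[0], (size_t)(end - p));
        if (p == NULL)
            break;
        /* The whole pattern fits at p because p < end. */
        if (memcmp(p, needle, needle_len) == 0)
            ++hits;
        ++p;
    }
    return hits;
}
```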