
That's how I wrote your code (with some simple changes for easier understanding):

    private void Form1_Load(object sender, EventArgs e)
    {

        prev = GetDesktopImage();//get a screenshot of the desktop;
        cur = GetDesktopImage();//get a screenshot of the desktop;


        var locked1 = cur.LockBits(new Rectangle(0, 0, cur.Width, cur.Height),
                                    ImageLockMode.ReadWrite, PixelFormat.Format32bppArgb);
        var locked2 = prev.LockBits(new Rectangle(0, 0, prev.Width, prev.Height),
                                    ImageLockMode.ReadWrite, PixelFormat.Format32bppArgb);
        ApplyXor(locked1, locked2);
        compressionBuffer = new byte[1920 * 1080 * 4];

        // Compressed buffer -- where the data goes that we'll send.
        int backbufSize = LZ4.LZ4Codec.MaximumOutputLength(this.compressionBuffer.Length) + 4;

        backbuf = new CompressedCaptureScreen(backbufSize);

        MessageBox.Show(compressionBuffer.Length.ToString());
        int length = Compress();

        MessageBox.Show(backbuf.Data.Length.ToString());//prints the new buffer size

    }

The compression buffer length is, for example, 8294400, while `backbuf.Data.Length` is 8326947.

  • Try http://codereview.stackexchange.com/ – Maxime Peloquin Jul 21 '15 at 15:56
  • I'd recommend using some sort of adaptive image compression. Basically, when you detect "large" changes, send a low-res approximation and queue up high-res details to refine the image to be sent later. That way you don't have giant bursts of traffic, and if there are several large changes in a row, you'd only send low-res frames. You can probably find some technical articles/documents on how streaming works, as that's effectively what you're trying to do. – Drew McGowen Jul 21 '15 at 16:02
  • Is there any special reason why you're trying to rebuild Teamviewer or its alternatives? ;-) – Tobias Knauss Jul 21 '15 at 17:25
  • @TobiasKnauss no, haha. I really liked the idea and the image-processing subject; I also got to know sockets and networking, so I thought it would be a challenge to try making something like this. –  Jul 21 '15 at 17:27
  • @DrewMcGowen thanks for your comment. I just tried that using JPEG encoding (quality 50 for the first attempt), but now on the server side, when I try to merge it, I'm getting an `Attempted to read or write protected memory. This is often an indication that other memory is corrupt.` **exception** on this line: `if (p2[0] == 0 && p2[1] == 0 && p2[2] == 0 && p2[3] == 0)` –  Jul 21 '15 at 17:40
  • JPEGs are not RGB. You need to convert before you can access them as ARGB. – Mitch Jul 21 '15 at 22:18
  • @Mitch okay... but convert to what, PNG format? Or convert the pixel format using the `Clone` method? –  Jul 22 '15 at 07:45
  • No, you have to pass the pixel format to the `LockBits` call. Currently you are passing `bmp2.PixelFormat` which means you are accessing whatever the current format is. You need to specify it as `PixelFormat.Something`. As far as performance improvement, you are still limited by the delta compression. Video codecs are not easy to write, and you often end up with a tradeoff between latency and bandwidth. Consider using a standard codec like h.264 or some other [screen compression algorithm](https://www.google.com/?q=screen+compression+algorithm). You may look at VNC for inspiration. – Mitch Jul 22 '15 at 15:41
  • You could reuse DirectShow. Check this answer http://stackoverflow.com/questions/3167032/real-time-video-encoding-in-directshow (and especially Daniel Mošmondor's one) – Simon Mourier Jul 23 '15 at 17:26
  • You should consider Windows Media Encoder http://gopalakrishna.palem.in/screencap.html#Windows-Media-Encoder-Screen-Capture – drooksy Jul 23 '15 at 23:14
  • @itapi the other way around is the exact same transformation. You will figure it out. – atlaste Jul 27 '15 at 21:37

3 Answers


I didn't like the compression suggestions, so here's what I would do.

You don't want to compress a video stream (so MPEG, AVI, etc. are out of the question -- those don't have to be real-time) and you don't want to compress individual pictures (since that throws away all the frame-to-frame redundancy).

Basically what you want to do is detect if things change and send the differences. You're on the right track with that; most video compressors do that. You also want a fast compression/decompression algorithm; especially if you go to more FPS that will become more relevant.

Differences. First off, eliminate all branches in your code, and make sure memory access is sequential (e.g. iterate x in the inner loop). The latter will give you cache locality. As for the differences, I'd probably use a 64-bit XOR; it's easy, branchless and fast.

If you want performance, it's probably better to do this in C++: the current C# implementation doesn't vectorize your code, and that would help you a great deal here.

Do something like this (I'm assuming 32bit pixel format):

for (int y=0; y<height; ++y) // change to PFor if you like
{
    ulong* row1 = (ulong*)(image1BasePtr + image1Stride * y);
    ulong* row2 = (ulong*)(image2BasePtr + image2Stride * y);
    for (int x=0; x<width/2; ++x) // one ulong covers 2 pixels
        row2[x] ^= row1[x];
}

Fast compression and decompression usually means simpler compression algorithms. https://code.google.com/p/lz4/ is such an algorithm, and there's a proper .NET port available for that as well. You might want to read on how it works too; there is a streaming feature in LZ4 and if you can make it handle 2 images instead of 1 that will probably give you a nice compression boost.

All in all, if you're trying to compress white noise, it simply won't work and your frame rate will drop. One way to solve this is to reduce the colors if you have too much 'randomness' in a frame. A measure for randomness is entropy, and there are several ways to get a measure of the entropy of a picture ( https://en.wikipedia.org/wiki/Entropy_(information_theory) ). I'd stick with a very simple one: check the size of the compressed picture -- if it's above a certain limit, reduce the number of bits; if below, increase the number of bits.
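To see why the XOR step and the entropy argument matter, here's a small sketch comparing how well noise compresses versus a mostly-zero delta. LZ4 is a third-party library, so the BCL's `DeflateStream` stands in here purely to show the effect; the buffer sizes and the "changed region" are illustrative assumptions.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Compress a buffer with Deflate and return the compressed bytes.
static byte[] Compress(byte[] input)
{
    using var ms = new MemoryStream();
    using (var deflate = new DeflateStream(ms, CompressionLevel.Fastest))
        deflate.Write(input, 0, input.Length);
    return ms.ToArray();
}

var rng = new Random(42);
byte[] rawPixels = new byte[256 * 256 * 4];
rng.NextBytes(rawPixels);                        // worst case: pure noise

byte[] xorDelta = new byte[rawPixels.Length];    // mostly-zero XOR delta
for (int i = 0; i < 64; i++) xorDelta[i] = 0xFF; // one small changed region

int rawSize = Compress(rawPixels).Length;        // barely shrinks at all
int deltaSize = Compress(xorDelta).Length;       // shrinks dramatically
```

The same asymmetry holds for LZ4: an unchanged screen produces an almost-all-zero delta that compresses to nearly nothing, while noise stays incompressible no matter the codec.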

Note that increasing and decreasing bits is not done with shifting in this case; you don't need your bits to be removed, you simply need your compression to work better. It's probably just as good to use a simple 'AND' with a bitmask. For example, if you want to drop 2 bits, you can do it like this:

for (int y=0; y<height; ++y) // change to PFor if you like
{
    ulong* row1 = (ulong*)(image1BasePtr + image1Stride * y);
    ulong* row2 = (ulong*)(image2BasePtr + image2Stride * y);
    ulong mask = 0xFFFCFCFCFFFCFCFC; // keeps alpha, drops 2 bits per color channel
    for (int x=0; x<width/2; ++x) // one ulong covers 2 pixels
        row2[x] = (row2[x] ^ row1[x]) & mask;
}

PS: I'm not sure what I would do with the alpha component, I'll leave that up to your experimentation.

Good luck!


The long answer

I had some time to spare, so I just tested this approach. Here's some code to support it all.

This code normally runs at over 130 FPS with nice, constant memory pressure on my laptop, so the bottleneck shouldn't be here anymore. Note that you need LZ4 to get this working and that LZ4 is aimed at high speed, not high compression ratios. A bit more on that later.

First we need something that we can use to hold all the data we're going to send. I'm not implementing the socket code itself here (although that should be pretty simple, using this as a starting point); I mainly focused on producing the data you need to send.

// The thing you send over a socket
public class CompressedCaptureScreen
{
    public CompressedCaptureScreen(int size)
    {
        this.Data = new byte[size];
        this.Size = 4;
    }

    public int Size;
    public byte[] Data;
}

We also need a class that will hold all the magic:

public class CompressScreenCapture
{

Next, if I'm running high performance code, I make it a habit to preallocate all the buffers first. That'll save you time during the actual algorithmic stuff. 4 buffers of 1080p is about 33 MB, which is fine - so let's allocate that.

public CompressScreenCapture()
{
    // Initialize with black screen; get bounds from screen.
    this.screenBounds = Screen.PrimaryScreen.Bounds;

    // Initialize 2 buffers - 1 for the current and 1 for the previous image
    prev = new Bitmap(screenBounds.Width, screenBounds.Height, PixelFormat.Format32bppArgb);
    cur = new Bitmap(screenBounds.Width, screenBounds.Height, PixelFormat.Format32bppArgb);

    // Clear the 'prev' buffer - this is the initial state
    using (Graphics g = Graphics.FromImage(prev))
    {
        g.Clear(Color.Black);
    }

    // Compression buffer -- we don't really need this but I'm lazy today.
    compressionBuffer = new byte[screenBounds.Width * screenBounds.Height * 4];

    // Compressed buffer -- where the data goes that we'll send.
    int backbufSize = LZ4.LZ4Codec.MaximumOutputLength(this.compressionBuffer.Length) + 4;
    backbuf = new CompressedCaptureScreen(backbufSize);
}

private Rectangle screenBounds;
private Bitmap prev;
private Bitmap cur;
private byte[] compressionBuffer;

private int backbufSize;
private CompressedCaptureScreen backbuf;

private int n = 0;

First thing to do is capture the screen. This is the easy part: simply fill the bitmap of the current screen:

private void Capture()
{
    // Fill 'cur' with a screenshot
    using (var gfxScreenshot = Graphics.FromImage(cur))
    {
        gfxScreenshot.CopyFromScreen(screenBounds.X, screenBounds.Y, 0, 0, screenBounds.Size, CopyPixelOperation.SourceCopy);
    }
}

As I said, I don't want to compress 'raw' pixels. Instead, I'd much rather compress the XOR mask of the previous and the current image. Most of the time this will give you a whole lot of 0's, which is easy to compress:

private unsafe void ApplyXor(BitmapData previous, BitmapData current)
{
    byte* prev0 = (byte*)previous.Scan0.ToPointer();
    byte* cur0 = (byte*)current.Scan0.ToPointer();

    int height = previous.Height;
    int width = previous.Width;
    int halfwidth = width / 2;

    fixed (byte* target = this.compressionBuffer)
    {
        ulong* dst = (ulong*)target;

        for (int y = 0; y < height; ++y)
        {
            ulong* prevRow = (ulong*)(prev0 + previous.Stride * y);
            ulong* curRow = (ulong*)(cur0 + current.Stride * y);

            for (int x = 0; x < halfwidth; ++x)
            {
                *(dst++) = curRow[x] ^ prevRow[x];
            }
        }
    }
}

For the compression algorithm I simply pass the buffer to LZ4 and let it do its magic.

private int Compress()
{
    // Grab the backbuf in an attempt to update it with new data
    var backbuf = this.backbuf;

    backbuf.Size = LZ4.LZ4Codec.Encode(
        this.compressionBuffer, 0, this.compressionBuffer.Length, 
        backbuf.Data, 4, backbuf.Data.Length-4);

    Buffer.BlockCopy(BitConverter.GetBytes(backbuf.Size), 0, backbuf.Data, 0, 4);

    return backbuf.Size;
}

One thing to note here is that I make it a habit to put everything in my buffer that I need to send over the TCP/IP socket. I don't want to move data around if I can easily avoid it, so I'm simply putting everything that I need on the other side there.

As for the sockets themselves, you can use async TCP sockets here (I would), but if you do, you will need to add an extra buffer.
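On the receiving side, the 4-byte length prefix that `Compress()` writes into the front of `backbuf.Data` tells you how much payload to read. Here's a sketch of that framing logic, with a `MemoryStream` standing in for the network stream (the helper names are my own):

```csharp
using System;
using System.IO;

// Read one length-prefixed frame: a 4-byte little-endian length header
// followed by that many payload bytes (the compressed XOR delta).
static byte[] ReadFrame(Stream stream)
{
    byte[] header = ReadExactly(stream, 4);
    int payloadLength = BitConverter.ToInt32(header, 0);
    return ReadExactly(stream, payloadLength);
}

// TCP reads may return fewer bytes than requested, so loop until done.
static byte[] ReadExactly(Stream stream, int count)
{
    byte[] buffer = new byte[count];
    int offset = 0;
    while (offset < count)
    {
        int read = stream.Read(buffer, offset, count - offset);
        if (read == 0) throw new EndOfStreamException("connection closed");
        offset += read;
    }
    return buffer;
}

// Demo: write a header + payload, then read the frame back.
var ms = new MemoryStream();
byte[] payload = { 10, 20, 30 };
ms.Write(BitConverter.GetBytes(payload.Length), 0, 4);
ms.Write(payload, 0, payload.Length);
ms.Position = 0;
byte[] frame = ReadFrame(ms);
```

On the sending side this pairs with writing `backbuf.Data` from offset 0 for `4 + backbuf.Size` bytes, since the header and payload already sit in one buffer.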

The only thing that remains is to glue everything together and put some statistics on the screen:

public void Iterate()
{
    Stopwatch sw = Stopwatch.StartNew();

    // Capture a screen:
    Capture();

    TimeSpan timeToCapture = sw.Elapsed;

    // Lock both images:
    var locked1 = cur.LockBits(new Rectangle(0, 0, cur.Width, cur.Height), 
                               ImageLockMode.ReadWrite, PixelFormat.Format32bppArgb);
    var locked2 = prev.LockBits(new Rectangle(0, 0, prev.Width, prev.Height),
                                ImageLockMode.ReadWrite, PixelFormat.Format32bppArgb);
    try
    {
        // Xor screen:
        ApplyXor(locked2, locked1);

        TimeSpan timeToXor = sw.Elapsed;

        // Compress screen:
        int length = Compress();

        TimeSpan timeToCompress = sw.Elapsed;

        if ((++n) % 50 == 0)
        {
            Console.Write("Iteration: {0:0.00}s, {1:0.00}s, {2:0.00}s " + 
                          "{3} Kb => {4:0.0} FPS     \r",
                timeToCapture.TotalSeconds, timeToXor.TotalSeconds, 
                timeToCompress.TotalSeconds, length / 1024,
                1.0 / sw.Elapsed.TotalSeconds);
        }

        // Swap buffers:
        var tmp = cur;
        cur = prev;
        prev = tmp;
    }
    finally
    {
        cur.UnlockBits(locked1);
        prev.UnlockBits(locked2);
    }
}

Note that I reduce Console output to ensure that's not the bottleneck. :-)

Simple improvements

It's a bit wasteful to compress all those 0's, right? Inside the inner loop it's pretty easy to track whether a row has any data with a simple boolean - and from that, the min and max y positions that changed.

bool hasdata = false;
// inside the inner XOR loop:
ulong tmp = curRow[x] ^ prevRow[x];
*(dst++) = tmp;

hasdata |= tmp != 0;

You also probably don't want to call Compress if you don't have to.

After adding this feature you'll get something like this on your screen:

Iteration: 0.00s, 0.01s, 0.01s 1 Kb => 152.0 FPS

Using another compression algorithm might also help. I stuck to LZ4 because it's simple to use, it's blazing fast and compresses pretty well -- still, there are other options that might work better. See http://fastcompression.blogspot.nl/ for a comparison.

If you have a bad connection or if you're streaming video over a remote connection, all this won't work. Best to reduce the pixel values here. That's quite simple: apply a simple 64-bit mask during the xor to both the previous and current picture... You can also try using indexed colors - anyhow, there's a ton of different things you can try here; I just kept it simple because that's probably good enough.

You can also use Parallel.For for the xor loop; personally I didn't really care about that.

A bit more challenging

If you have 1 server that is serving multiple clients, things get a bit more challenging, as they will refresh at different rates. We want the fastest-refreshing client to determine the server speed - not the slowest. :-)

To implement this, the relation between the prev and cur has to change. If we simply 'xor' away like here, we'll end up with a completely garbled picture at the slower clients.

To solve that, we don't want to swap prev anymore, as it should hold key frames (that you'll refresh when the compressed data becomes too big) and cur will hold incremental data from the 'xor' results. This means you can basically grab an arbitrary 'xor'red frame and send it over the line - as long as the prev bitmap is recent.
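To make that concrete, here's a minimal sketch (plain byte arrays standing in for bitmaps) of why deltas against a fixed key frame are safe for slow clients: every delta is independent, so a client that skips frames can still reconstruct any frame it does receive.

```csharp
using System;

// XOR two equal-length buffers; used both to build and to undo a delta,
// since a = b ^ c implies c = a ^ b.
static byte[] Xor(byte[] a, byte[] b)
{
    byte[] r = new byte[a.Length];
    for (int i = 0; i < a.Length; i++)
        r[i] = (byte)(a[i] ^ b[i]);
    return r;
}

byte[] keyFrame = { 1, 2, 3, 4 };  // the 'prev' bitmap all clients hold
byte[] frame1   = { 1, 9, 3, 4 };
byte[] frame2   = { 7, 2, 3, 0 };

byte[] delta1 = Xor(frame1, keyFrame); // a fast client receives both deltas
byte[] delta2 = Xor(frame2, keyFrame); // a slow client may get only this one

// Reconstructing frame2 needs only the key frame, not delta1:
byte[] restored = Xor(delta2, keyFrame);
```

Contrast this with swapping `prev` every iteration: there, each delta depends on the one before it, and a missed frame garbles everything after it.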

atlaste
  • Thank you very, very much, but I'm writing C# code... I don't really have a clue about C++; my project is in C#, so I would really appreciate it if you could help in C# (although they're similar). @atlaste –  Jul 24 '15 at 08:47
  • About the 64-bit XOR: what is it actually? –  Jul 24 '15 at 08:48
  • Oh, I'm sorry, I just got confused for a moment because I'm pretty new to these operators `^`, `^=`, haha, thank you. What's actually the mask's purpose? To copy only the different parts? And don't you think dividing it into blocks would be better? @atlaste –  Jul 24 '15 at 10:52
  • @itapi The mask is there to remove accuracy (e.g. reduce the number of bits). It forces bits to 0 - but because the entropy is reduced as a result, it'll probably compress better (with less pretty images). As for dividing into blocks: I'm not sure the performance you lose will make up for the few extra bytes you have to send... LZ basically finds the same sequences of bytes in previously encountered bytes - I think it'll have approximately the same end result without the (performance) overhead of creating the blocks. – atlaste Jul 24 '15 at 11:03
  • The main reason these compression algorithms use blocks is because of the _order_, not because of the _diff_. The idea is that within a block you have similar colors. See also this image for how this works: https://en.wikipedia.org/wiki/JPEG#/media/File:JPEG_ZigZag.svg . I do think this idea will work better in terms of compression rate. – atlaste Jul 24 '15 at 11:05
  • One last thing: I need to send these ulong rows converted to a byte array over a socket, right? How do I extract it on the other side? I mean, how would I know how to fill the image accordingly? @atlaste –  Jul 24 '15 at 11:11
  • @itapi You should read what 'xor' does (exclusive or) on Wikipedia. Basically you need to uncompress the data (LZ4) and xor again. The initial image is all zeros. Or more formally: `a = b ^ c;` -> `c = a ^ b;` You always know the previous state, so you can calculate the new state. – atlaste Jul 24 '15 at 11:58
  • I just tried the LZ4 from here: https://lz4net.codeplex.com/ I used `byte[] rest = LZ4Codec.Encode(mybuffer, 0, mybuffer.Length)` and I didn't actually get such a good result... for a 340 KB image it output a 270 KB array... then I tried GZipStream and it was even better - 230 KB... Second thing: is that all LZ4 does? What about the streaming feature you mentioned? @atlaste –  Jul 27 '15 at 17:05
  • @itapi Ehm that means you're doing it wrong. I just tested it (will post), and I'm getting about 32 KB/image (1080p resolution) and >100 FPS (without sockets so yes I'm cheating). The main thing about LZ4 is about _high performance_ compression; as I said, google for LZ4 on what it does exactly, it's described in quite a bit of detail. I haven't tried decompression, it should have an even higher performance. – atlaste Jul 27 '15 at 18:22
  • You're awesome, man! You just gave me what I need. You taught me in 10 minutes things that would have taken me hours to learn! You undoubtedly deserve the accepted answer! :):) Right now I'm not at home, so I'll try it later. I would really appreciate it if you could answer my questions (if any come up :)) after I implement your methods. One last thing: on the server side, after I decode the data using LZ4, do I have to apply the inverse of the XOR to get the bitmap object and display it in a PictureBox? Thank you very much! @atlaste –  Jul 27 '15 at 20:36
  • Something very weird: I'm trying to output `backbuf.Data.Length.ToString()` and I'm just getting a bigger number... I mean a bigger int than `compressionBuffer.Length.ToString()`... although when I print `backbuf.Size` it's smaller. I'm not sure whether I misunderstood something or there's a little mistake in the code? @atlaste –  Jul 28 '15 at 13:27
  • Look at the post, I'll edit it with the problem @atlaste –  Jul 28 '15 at 13:29
  • @itapi Buffers are allocated to have _at least_ the size that's required. In other words, they're usually overallocated. The length that's stored is the data that you need. – atlaste Jul 28 '15 at 13:32
  • What do I need to do with the length? I need a buffer to send over the net... I'm sorry, I'm just a bit confused. Edited post ^ @atlaste –  Jul 28 '15 at 13:34
  • Read my last comment again, @itapi . I already gave you the answer to that. – atlaste Jul 28 '15 at 13:42
  • Ohh, I understand ;) So I need to convert the length integer into a byte array and send that over a socket, right? @atlaste –  Jul 28 '15 at 13:45
  • Maybe we can continue this little conversation in chat? I won't bother you much :) @atlaste –  Jul 28 '15 at 19:37
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/84504/discussion-between-itapi-and-atlaste). –  Jul 28 '15 at 19:37
  • What's the point, man? I really appreciate what you've done so far, and I guess you don't owe me anything, but being ignored is such a hurting pain XD @atlaste –  Jul 28 '15 at 20:42
  • @itapi The point was that socket write gets a byte[], an index and a length. The byte[] is fixed, the offset is 0 and the length is the length you get back. That's the way to reuse a buffer over and over again. – atlaste Jul 28 '15 at 20:53
  • Yeah, but which buffer do I send? The compression buffer? What about the `length`? @atlaste –  Jul 28 '15 at 20:56
  • The only relevant thing here is the backbuf... But really @itapi , there's no sense in this if you don't understand what you're working with; I'm not here to give you endless answers on questions you should be able to figure out by yourself (so yea this is my last response on this subject). If necessary, take a debugger and go through it line by line if you need that. – atlaste Jul 28 '15 at 21:12
  • Thanks for everything. Seems like next time I should finish all my questions before giving a bounty. @atlaste –  Jul 28 '15 at 21:13
  • @itapi That wouldn't have helped; I gave you the answers you're looking for. I find it okay to help you understand how to do things; I'm just not here to do your work for you - that's also not what stackoverflow is about. – atlaste Jul 29 '15 at 05:52

H.264 or Equivalent Codec Streaming

There are various compressed streaming formats available that do almost everything you could do to optimize screen sharing over a network, and there are many open-source and commercial libraries for streaming them.

Screen transfer in Blocks

H.264 already does this, but if you want to do it yourself, you have to divide your screen into smaller blocks (say 100x100 pixels), compare each block with its previous version, and send only the changed blocks over the network.
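A sketch of that block comparison, using plain byte arrays (the row-major 32bpp layout, the method name, and the demo sizes are my assumptions; the 100x100 block size is the answer's suggestion):

```csharp
using System;
using System.Collections.Generic;

// Split a 32bpp frame into fixed-size blocks, compare each against the
// previous frame, and return the indices of blocks that changed.
static List<int> ChangedBlocks(byte[] prev, byte[] cur,
                               int width, int height, int blockSize)
{
    var changed = new List<int>();
    int blocksPerRow = (width + blockSize - 1) / blockSize;
    for (int by = 0; by < height; by += blockSize)
    for (int bx = 0; bx < width; bx += blockSize)
    {
        bool differs = false;
        int maxY = Math.Min(by + blockSize, height);
        int maxX = Math.Min(bx + blockSize, width);
        for (int y = by; y < maxY && !differs; y++)
        for (int x = bx; x < maxX && !differs; x++)
        {
            int i = (y * width + x) * 4; // 4 bytes per pixel
            differs = prev[i] != cur[i] || prev[i + 1] != cur[i + 1]
                   || prev[i + 2] != cur[i + 2] || prev[i + 3] != cur[i + 3];
        }
        if (differs)
            changed.Add((by / blockSize) * blocksPerRow + bx / blockSize);
    }
    return changed;
}

// Demo: a 4x4 frame with one pixel changed in the bottom-right 2x2 block.
int w = 4, h = 4;
byte[] a = new byte[w * h * 4];
byte[] b = new byte[w * h * 4];
b[(3 * w + 3) * 4] = 0xFF;
List<int> dirty = ChangedBlocks(a, b, w, h, 2);
```

Each changed block would then be compressed and sent with its index, so the receiver knows where to paste it.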

Window Render Information

Microsoft RDP does a lot better: it does not send the screen as a raster image. Instead, it analyzes the screen and creates blocks based on the windows on screen. It then analyzes the contents and sends an image only when needed; if there is a text box with some text in it, RDP sends the instructions to render a text box with that text, the font information, and other details. So instead of sending an image, it sends a description of what to render.

You can combine all these techniques into a mixed protocol that sends screen blocks as images alongside other rendering information.

Akash Kava
  • Hey, I'm also working on a screen-sharing project, and I found your answer the most interesting here. I have not found an H.264 or similar library in C#, so I'm actually aiming for your second suggestion. Just to make sure I got your idea, the steps are: divide the screen into small blocks, compare each block against the block from the previous shot, and send the changed ones to the server? Any suggestion for the block size? Is 100x100 the best division? Thank you so much! I'd really appreciate it if you'd explain a little bit more. – Slashy Aug 14 '16 at 19:14
  • any chance for that? :) @Akash Kava – Slashy Aug 16 '16 at 16:45
  • @Slashy, you can use ffmpeg, which will capture the screen, encode it, and stream it! – Akash Kava Aug 16 '16 at 17:05
  • ffmpeg would encode it into a video file... that's not my purpose. I don't want to save a video recording of the screen; it's a little different. I want to constantly send updates from the screen to a remote client... that's how the basic screen-sharing concept works. I don't want to send a video file after the capture is done or something. @Akash Kava – Slashy Aug 16 '16 at 17:08
  • @Slashy ffmpeg can record your screen and stream it; you can use an H.264 stream over HTTP to display it. Search for "ffmpeg stream desktop". – Akash Kava Aug 16 '16 at 18:54

Instead of handling data as an array of bytes, you can handle it as an array of integers.

int* p = (int*)((byte*)scan0.ToPointer() + y * stride);
int* p2 = (int*)((byte*)scan02.ToPointer() + y * stride2);

for (int x = 0; x < nWidth; x++)
{
    // always copy the complete pixel when a difference is found
    if (*p2 != 0)
        *p = *p2;

    ++p;
    ++p2;
}
GeirGrusom
  • You shouldn't do it like that. Basically it's better to use `byte*` until you need the row and then cast it to `uint* row = (uint*)(y*stride + basePtr);`. After all, there's no guarantee that the stride is divisible by 4 (it's likely, though). – atlaste Jul 24 '15 at 05:58
  • Updated the answer with regards to stride and offset. – GeirGrusom Jul 24 '15 at 05:59
  • @GeirGrusom is it supposed to be faster? I mean, what's the benefit of using this method? –  Jul 24 '15 at 08:45
  • @itapi Instead of four copy instructions, it has been reduced to one. Four branches have been reduced to one as well. The branching is fairly significant because if you actually hit 0, 0, 0, 1 it will do three branch tests and still fail. This does only one test in any case and still performs the same task. Branch prediction done by the processor also has a better chance of doing the right thing. – GeirGrusom Jul 24 '15 at 08:53
  • @GeirGrusom okay... but how am I supposed to extract an image from these int pointers? –  Jul 24 '15 at 09:00
  • @itapi If you want to extract red for example you can shift the value 24 bits to the right. To get green you shift it 16 to the right and do `& 0xff`. You could also make a `struct` that has the correct pixel format as well. – GeirGrusom Jul 24 '15 at 09:12
  • @GeirGrusom sorry... I just can't understand that solution... I think it's too complicated for me... thanks anyway –  Jul 24 '15 at 09:13
  • @itapi It's not that complicated. Each pixel requires four bytes: red, green, blue, and alpha, correct? The `int` datatype takes four bytes, so each pixel fits neatly inside it. The advantage is that your processor's general-purpose registers can process 4 or, more commonly, 8 bytes in a single go. Use that fact. – GeirGrusom Jul 24 '15 at 09:15
  • @GeirGrusom if so, shouldn't there be 4 int variables, one for each byte (red, green, blue, alpha)? And what do you mean by "shift"? –  Jul 24 '15 at 09:19
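The channel extraction discussed in these comments can be sketched as follows. One caveat to the comment above: with `PixelFormat.Format32bppArgb`, the packed int reads as 0xAARRGGBB, so shifting 24 bits right yields alpha, not red (the ">> 24 for red" wording assumes a format without alpha). The sample pixel value here is illustrative.

```csharp
using System;

// Extract the four channels from one packed 32bpp ARGB pixel by
// shifting and masking, instead of reading four separate bytes.
int pixel = unchecked((int)0x80FF4020); // A=0x80, R=0xFF, G=0x40, B=0x20
byte a = (byte)((pixel >> 24) & 0xFF);  // alpha: top byte
byte r = (byte)((pixel >> 16) & 0xFF);  // red
byte g = (byte)((pixel >> 8) & 0xFF);   // green
byte b = (byte)(pixel & 0xFF);          // blue: bottom byte
```

Repacking goes the other way: `(a << 24) | (r << 16) | (g << 8) | b`.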