
Lately, I have been working on a simple screen-sharing program.

The program works over TCP and uses the Desktop Duplication API - a cool service that supports very fast screen capture and also provides information about MovedRegions (areas that only changed their position on the screen but still exist) and UpdatedRegions (changed areas).

The Desktop Duplication API exposes two important byte arrays: one holding the previous frame's pixels and a NewPixels array holding the current frame. Every 4 bytes represent a pixel in RGBA form, so for example if my screen is 1920 x 1080, the buffer size is 1920 * 1080 * 4 bytes.

Below are the important highlights of my strategy:

  1. In the initial state (the first time) I send the entire pixel buffer (in my case that's 1920 * 1080 * 3 bytes) - the alpha component is always 255 on screens :)
  2. From then on, I iterate over the UpdatedRegions (an array of rectangles), send each region's bounds, and XOR its pixels against the previous frame, something like this:
writer.Position = 0;
var n = frame._newPixels;      // current frame's pixels
var p = frame._previousPixels; // previous frame's pixels
var w = 1920 * 4;              // row stride in bytes (1920 pixels * 4 bytes each)

foreach (var region in frame.UpdatedRegions)
{
    writer.WriteInt(region.Top);
    writer.WriteInt(region.Height);
    writer.WriteInt(region.Left);
    writer.WriteInt(region.Width);

    for (int y = region.Top, yOffset = y * w; y < region.Bottom; y++, yOffset += w)
    {
        for (int x = region.Left, xOffset = x * 4, i = yOffset + xOffset; x < region.Right; x++, i += 4)
        {
            // XOR new against previous so unchanged bytes become zero,
            // which compresses very well. The alpha byte (i + 3) is skipped.
            writer.WriteByte((byte)(n[i] ^ p[i]));
            writer.WriteByte((byte)(n[i + 1] ^ p[i + 1]));
            writer.WriteByte((byte)(n[i + 2] ^ p[i + 2]));
        }
    }
}
  3. I compress the buffer using an LZ4 wrapper written in C# (see lz4.NET), then write the data to a NetworkStream (a short sketch of this step follows the list).
  4. I merge the regions on the receiver side to reconstruct the updated image - this is not our problem today :)
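
For reference, the compress-and-send part of step 3 looks roughly like this. This is a sketch, not my exact code: it assumes lz4.NET's LZ4Codec.Wrap, a networkStream variable for the open TCP connection, and the Buffer property on QuickBinaryWriter shown below.

// Sketch of step 3: compress everything written so far with LZ4 and
// send it length-prefixed over the TCP stream. LZ4Codec.Wrap is from
// lz4.NET; the 4-byte length prefix is an illustrative framing choice.
byte[] compressed = LZ4.LZ4Codec.Wrap(writer.Buffer, 0, writer.Position);
networkStream.Write(BitConverter.GetBytes(compressed.Length), 0, 4);
networkStream.Write(compressed, 0, compressed.Length);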

'writer' is an instance of the 'QuickBinaryWriter' class I wrote (simply to reuse the same buffer instead of allocating a new one every frame).

public class QuickBinaryWriter
{
    private readonly byte[] _buffer;
    private int _position;

    public QuickBinaryWriter(byte[] buffer)
    {
        _buffer = buffer;
    }

    public int Position
    {
        get { return _position; }
        set { _position = value; }
    }

    // Exposes the underlying buffer so the written data can be
    // compressed and sent without an extra copy.
    public byte[] Buffer
    {
        get { return _buffer; }
    }

    public void WriteByte(byte value)
    {
        _buffer[_position++] = value;
    }

    public void WriteInt(int value)
    {
        // BitConverter allocates a small array per call; fine for region
        // headers, but worth inlining if it ever shows up in profiling.
        byte[] arr = BitConverter.GetBytes(value);

        for (int i = 0; i < arr.Length; i++)
            WriteByte(arr[i]);
    }
}
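
On the receiving side, the region headers can be decoded with a matching reader. A minimal sketch (the QuickBinaryReader name and class are illustrative, not part of my program):

public class QuickBinaryReader
{
    private readonly byte[] _buffer;
    private int _position;

    public QuickBinaryReader(byte[] buffer)
    {
        _buffer = buffer;
    }

    public byte ReadByte()
    {
        return _buffer[_position++];
    }

    public int ReadInt()
    {
        // Mirrors WriteInt: reads 4 bytes in BitConverter's byte order.
        int value = BitConverter.ToInt32(_buffer, _position);
        _position += 4;
        return value;
    }
}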

From many measurements, I've seen that the data sent is really large; a single frame update can reach 200 KB (after compression!). Let's be honest - 200 KB is really nothing on its own, but if I want to stream the screen smoothly and watch at a high FPS, I will have to work on this a little bit - to minimize the network traffic and bandwidth usage.

I'm looking for suggestions and creative ideas to improve the efficiency of the program - mainly the data sent over the network (by packing it in other ways or any other idea). I'll appreciate any help and ideas. Thanks!

Slashy
  • Your question is a bit vague. You should specify which part you would want to optimize. Right now the question has too many potential answers, which may result in down votes and it being put on hold for being too broad. I'll give an example of how broad. Do you want to optimize the code, how it sends data, the compression, or how it updates the screen? – dakre18 Dec 22 '15 at 17:45
  • @dakre18 thanks for the attention, I'm mainly looking at data compression - I need to focus on minimizing the network traffic, maybe by packing the graphic data in another way... that's what I wrote my question about :) – Slashy Dec 22 '15 at 18:00
  • You've asked this question before. – harold Dec 22 '15 at 19:41
  • @harold I have asked something similar to this - you may notice there is a change in the approach to organizing the data here. – Slashy Dec 22 '15 at 19:44
  • Is it a generic caster or is it intended for a particular kind of application/desktop to be streamed? E.g. when you know that there will be lots of patches of the same color simple compression might be a fast and well compressing option. If it's 3D games where potentially all pixels change and hardly any patches of the same color exist a jpg or H.264 encoding might be better. – Emond Dec 24 '15 at 17:59
  • @ErnodeWeerd actually it's generic streaming of the entire screen, but there is a feature for streaming a certain area :) I'm asking about the whole desktop stream – Slashy Dec 24 '15 at 18:16
  • Why not just use/develop OBS? It's an open source project that supports many screen streaming options, including monitor, region, window, game. – Lunyx Dec 30 '15 at 21:41
  • "I'm looking for suggestion and creative ideas" is not a clear question. Please read [How to create a Minimal, Complete, and Verifiable example](http://stackoverflow.com/help/mcve). – Blackwood Jan 11 '16 at 05:11

2 Answers


For your screen of 1920 x 1080, with 4-byte color, you are looking at approximately 8 MB per frame. At 20 FPS, that is 160 MB/s. So getting from 8 MB down to 200 KB (4 MB/s at 20 FPS) is a great improvement.

I would like to draw your attention to certain aspects that I am not sure you are focusing on, and hopefully it helps.

  1. The more you compress your screen image, the more processing it might need.
  2. You actually need to focus on compression mechanisms designed for series of continuously changing images, similar to video codecs (sans audio though). For example: H.264.
  3. Remember, you need to use some kind of real-time protocol for transferring your data. The idea behind that is, if one of your frames makes it to the destination machine with a lag, you might as well drop the next few frames to play catch-up. Otherwise you will be in a perennially lagging situation, which I doubt the users are going to enjoy.
  4. You can always sacrifice quality for performance. The simplest such mechanism that you see in similar technologies (like MS Remote Desktop, VNC, etc.) is to send an 8-bit color (ARGB, 2 bits each) instead of the 3-byte color that you are using (see the sketch after this list).
  5. Another way to improve your situation would be to focus on a specific rectangle of the screen that you want to stream, instead of streaming the whole desktop. This will reduce the size of the frame itself.
  6. Another way would be to scale your screen image down to a smaller image before transmitting, and then scale it back to normal before displaying.
  7. After sending the initial screen, you can always send the diff between newpixels and previouspixels. Needless to say, the original screen and the diff screen will both be LZ4 compressed/decompressed. Every so often you should send the full array instead of the diff, especially if you use some lossy algorithm to compress the diff.
  8. Does UpdatedRegions have overlapping areas? Can that be optimized to not send duplicate pixel information?
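
To illustrate point 4, a minimal sketch of quantizing a 32-bit ARGB pixel down to one byte by simple bit truncation (the crudest form of color quantization; a palette-based quantizer would look much better):

// Keep only the top 2 bits of each channel, packing A, R, G, B into
// a single byte. E.g. (R,G,B) = (45, 220, 85) maps to channel levels
// (0, 3, 1). The receiver shifts the bits back up to approximate the
// original values.
public static byte QuantizeArgb2222(byte a, byte r, byte g, byte b)
{
    return (byte)(((a >> 6) << 6) | ((r >> 6) << 4) | ((g >> 6) << 2) | (b >> 6));
}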

The ideas above can be applied one on top of the other to get a better user experience. Ultimately, it depends on the specifics of your application and end-users.


Vikhram
  • Thanks for the help. A few points as a comment: 1. I thought of implementing simple RLE compression for the different regions (after XORing their buffers) - do you think it can get good results? 2. About the quality part - did you really mean 2 bits per channel? That means a maximum of 2^2 values per channel; the pixels are going to be very corrupted compared to their source. – Slashy Dec 29 '15 at 10:21
  • 3. I'm aiming at full desktop streaming at this moment. ;) 4. Scaling the image can also massively degrade the quality - maybe transferring the region as JPEG would be a good idea. 8. There are no overlapping areas ;) the Desktop Duplication API works pretty well - this is the only thing I'm satisfied with. – Slashy Dec 29 '15 at 10:35
  • You will have to see if RLE works better than LZ4 in your case, but I doubt it. If you use, say, 2 bits per color, it will not look as good as 8 bits per color, but you will have cut your bandwidth needs by 4. You could use, say, 4 bits per color and see if that is acceptable. You will have to make that call. As I mentioned in the post, to reduce bandwidth, you should look at RTP with lossy video encoding options – Vikhram Dec 29 '15 at 11:27
  • Actually I thought of using RLE (a really basic one, not at the bit level) plus LZ4. I'll have to check this. And let me ask: if, for example, there is a pixel on the screen (45,220,85), how would it look after the bit-reduction operation? (4,2,8)? It would look completely different... A few pixels would look fine, but beyond that the image will twitch and it results in very corrupted data. The most I can do in this case is something like LSB encoding... – Slashy Dec 29 '15 at 12:01
  • On top of the compression, you could send a single set of pixels per horizontal line at one time, and adjust framerates to only allow 15/30 frames per second. This will take a lot of the load off the network traffic. – zackery.fix Dec 29 '15 at 13:05
  • @zackery.fix what is the actual meaning of "set of pixels per horizontal line"? – Slashy Dec 29 '15 at 14:07
  • Client side buffer request based on the horizontal resolution of the screen being shared. If your resolution is 1024x768 only send over 1 pixel by 1024 pixels at a time starting from the top row of pixels, to the last row of pixels per frame. If you do this correctly you will not get V-sync tearing and the moderate frame-rate is sure to increase performance and reduce network traffic (at a given moment, because no matter what you do after compression, you will still have to send the entire frame of pixels over the line). – zackery.fix Dec 29 '15 at 18:34
  • @Slashy: [Color Quantization](https://en.wikipedia.org/wiki/Color_quantization) is used to reduce the number of bits used for a color. Usually the quantized colors are stored in a [Color Palette](https://en.wikipedia.org/wiki/Indexed_color) and only the index into this palette is given to the decoding logic – Vikhram Dec 29 '15 at 19:39
  • @zackery.fix not sure I've got the idea... are you actually saying that I should send single pixels, or even one pixel at a time? – Slashy Dec 30 '15 at 15:53
  • @Vikhram thank you so much for the help, but could you point me to a fast algorithm for that color quantization part? Or shall I do it by myself? :( – Slashy Dec 31 '15 at 10:09
  • You send a row of pixels at a time... 1024 pixels (4096 bytes without compression) at a time, per row of pixels being rendered by the client. – zackery.fix Dec 31 '15 at 12:41
  • @Vikhram thank you so much for your help. I've just tried the nQuant implementation; unfortunately the encoding algorithm takes very long and its performance is really low - 1.58 s to encode one medium (900x580) image – Slashy Dec 31 '15 at 18:28
  • Then you should try the Palette-based Quantization from the MSDN link – Vikhram Dec 31 '15 at 19:27
  • @Vikhram I've just looked at the MSDN solution - they just don't explain anything... Look, in the code part they call `destinationPixel[x,y] = Quantize(sourcePixel[x,y]);` and there is no documentation for that method... – Slashy Jan 02 '16 at 18:32
  • @Slashy The piece of code you are looking at explains how to handle the bitmap processing. The `Quantize` function has 2 possible implementations that they have provided below. I would recommend you to try the palette based approach if speed is what you are looking at. – Vikhram Jan 03 '16 at 00:35

Slashy,

Since you are using high-res frames and you want a good frame rate, you're likely going to be looking at H.264 encoding. I've done some work in HD/SDI broadcast video, which is totally dependent on H.264, and a little is now moving to H.265. Most of the libraries used in broadcast are written in C++ for speed.

I'd suggest looking at something like this https://msdn.microsoft.com/en-us/library/windows/desktop/dd797816(v=vs.85).aspx
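
If you want to experiment before committing to a native codec library, one rough alternative sketch is to pipe raw captured frames into an external ffmpeg process and let it do the H.264 encoding. This assumes ffmpeg is on the PATH, and frameBuffer stands in for your captured BGRA pixel array:

using System.Diagnostics;

// Sketch: spawn ffmpeg as a separate process, feed raw BGRA frames to
// its stdin, and read the encoded H.264 (MPEG-TS) stream from stdout.
var psi = new ProcessStartInfo
{
    FileName = "ffmpeg",
    Arguments = "-f rawvideo -pix_fmt bgra -s 1920x1080 -r 20 -i - " +
                "-c:v libx264 -preset ultrafast -tune zerolatency -f mpegts pipe:1",
    RedirectStandardInput = true,
    RedirectStandardOutput = true,
    UseShellExecute = false
};

using (var ffmpeg = Process.Start(psi))
{
    // Write each captured frame to ffmpeg's stdin; a separate thread
    // should read ffmpeg.StandardOutput.BaseStream and push the encoded
    // bytes onto the network.
    ffmpeg.StandardInput.BaseStream.Write(frameBuffer, 0, frameBuffer.Length);
    ffmpeg.StandardInput.BaseStream.Flush();
}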

Aaron Thomas
  • I've heard about that encoder a long time ago, but how can I actually implement it in my C# project? Thanks for the attention :) – Slashy Dec 24 '15 at 18:18
  • The problem that I've run into with HiDef video streaming in C# is that the framework is just too slow when you are talking about massive amounts of real-time data being streamed over copper. I recently tested some frame rates for output of HD/SDI data over a very expensive HD/SDI encoder card using a provided C# library. My max framerate was around 30fps without any graphics or overlays being generated in my code. Most hardware vendors don't provide C# libraries for this reason, which means you're stuck writing a CLI wrapper for a C++ library, or separating the encoding process into a C++ app – Aaron Thomas Dec 24 '15 at 18:24
  • @AaronThomas I understand... but what do you mean by "separate the encoding process"? I don't think there is a real performance problem here, because the processing part is done quite fast... – Slashy Dec 24 '15 at 20:27
  • Performing H.264 encoding for a hi res live video stream can be quite heavy. If you want to use something like http://www.ffmpeg-csharp.com/ and see if the framerate is good enough for you that's an option. The reason you aren't seeing heavy load during encoding/decoding is because of the nature of LZ4, it is a lossless standard designed for encoding/decoding speed, not maximum data compression. The only significant way to shrink your frames is to use a lossy compression like H.264 which will force you to think about encoding/decoding workloads but should compress much smaller. – Aaron Thomas Dec 24 '15 at 23:48
  • By separate process I mean you could write a C# UI and launch a C++ exe that performs the frame compression and spits it out to the network. Likely a C++/CLI wrapper around an H.264 codec will be your "best of both worlds" scenario. This may be a good informative read for you too: http://stackoverflow.com/questions/5724423/h-264-or-similar-encoder-in-c – Aaron Thomas Dec 24 '15 at 23:52
  • I do want to mark your answer, but I need more information and explanation about it... can we discuss in private? – Slashy Dec 28 '15 at 12:53
  • What do you need more information on? I'd be happy to edit my original answer with some more examples and explanations. – Aaron Thomas Dec 28 '15 at 19:27
  • @AaronThomas it's something more particular, for private – Slashy Dec 28 '15 at 19:54
  • @AaronThomas CLI C++ applications run just as slow as C# applications... Honestly, I haven't seen a big difference between the two. Of course we can do far more in C++ than in C#, but both VC++ and C# use the .NET framework's CLI and memory management buffers. It's like using C++ and COM, or whatever... – zackery.fix Dec 29 '15 at 18:37