
I am working with a 50 fps camera (on Ubuntu, with the Qt framework), so every 20 ms I get a frame to process.

I wrote code to read images from the camera and then store them on the hard drive.

while (true)
{
 cv::Mat Camera_Image = Capture_Image();
 double T1 = (double)cv::getTickCount();
 cv::imwrite(STORE_ADDRESS, Camera_Image);
 T1 = (((double)cv::getTickCount() - T1) * 1000) / cv::getTickFrequency();
 std::cout << T1 << std::endl;
}

When I look at the output, the time to write a single image to the hard disk is around 30 ms for a 2048×1080 image. Each image is single channel (grayscale), but I'm writing them to disk in .jpg format; each file on disk is approximately 500 Kbytes.

Since I'm capturing a frame every 20 ms or so, I'm not able to write them all to the hard disk in real time. I've rewritten my code using QThread and created a queue to see if there was any improvement, but the results were the same and it only added memory overhead.

Is it possible to change this situation, or to use some other library to write these images to the hard disk much faster? I would also prefer a Qt solution if one is available.

Also, I need to write every single frame to the hard disk, so please do not propose motion-compression algorithms; they don't apply to my situation.

Code: mainwindow.cpp

QList<cv::Mat> FINAL_IM_VEC;   // shared frame queue, accessed from both threads

MainWindow::MainWindow(QWidget *parent) :
  QMainWindow(parent),
  ui(new Ui::MainWindow)
{
  ui->setupUi(this);

  // The worker objects must be created without a parent:
  // QObject::moveToThread() fails for objects that have one.
  IMREAD *IMR = new IMREAD;    // an instance of the IMREAD class, which reads camera frames
  IMWRITE *IMW = new IMWRITE;  // an instance of the IMWRITE class, which writes camera frames to the hard disk
  QThread *IMAGE_READ_Thread = new QThread(this);
  QThread *Image_Store_Thread = new QThread(this);
  connect(IMAGE_READ_Thread, SIGNAL(started()), IMR, SLOT(IMREAD_Process()));
  connect(Image_Store_Thread, SIGNAL(started()), IMW, SLOT(IMWrite_Process()));
  IMR->moveToThread(IMAGE_READ_Thread);
  IMW->moveToThread(Image_Store_Thread);
  IMAGE_READ_Thread->start();
  Image_Store_Thread->start();
}

imread.hpp

class IMREAD : public QObject
{
    Q_OBJECT
public:
    explicit IMREAD(QObject *parent = 0);

signals:

public slots:
    void IMREAD_Process();
private:
    bool Stop;
};

imread.cpp

IMREAD::IMREAD(QObject *parent) :
    QObject(parent)
{
  this->Stop = false;
}

void IMREAD::IMREAD_Process()
{
  while (!Stop)
    {
      cv::Mat Image = CAM::Campture_F(25); // wait a maximum of 25 ms to grab a new frame
      if (!Image.empty())
        {
          FINAL_IM_VEC.push_back(Image);   // note: the shared list is accessed without locking
        }
    }
}

imwrite.hpp

#ifndef IMWRITE_H
#define IMWRITE_H
#pragma once
#include <QObject>
class IMWRITE : public QObject
{
    Q_OBJECT
public:
    explicit IMWRITE(QObject *parent = 0);
signals:

public slots:
    void IMWrite_Process();
private:
    bool Stop;
};

imwrite.cpp

IMWRITE::IMWRITE(QObject *parent) :
    QObject(parent)
{
  this->Stop =false;
}
void IMWRITE::IMWrite_Process()
{
    static int counter = 0;
    while (!Stop)
      {
        while (!FINAL_IM_VEC.isEmpty())
          {
            QString address = "/home/Provisioner/ThreadT/Results/" + QString::number(counter++) + ".jpg";
            cv::imwrite(address.toUtf8().constData(), FINAL_IM_VEC.first());
            FINAL_IM_VEC.removeFirst();    // note: the shared list is accessed without locking
          }
      }
}

Since this is just part of the whole project, I've removed some irrelevant parts, but it shows the big picture of how I wrote my multithreaded code, so if there's anything wrong with it, please let me know.

Thanks in advance.

PsP
  • "qt" tag removed as there is no Qt code in the question. – László Papp Jan 05 '14 at 08:20
  • Dear Laszlo I used qt tag since I expected other people to propose qt functions instead of opencv imwrite function .... – PsP Jan 05 '14 at 08:26
  • Then, please write that into the question what you wish. I cannot read your mind. :) – László Papp Jan 05 '14 at 08:28
  • @EmilioGaravaglia: you should have extended the question content, too. It is still unclear without reading the comments... – László Papp Jan 05 '14 at 08:33
  • Please see here: http://meta.stackexchange.com/questions/158450/retagging-c-questions-as-c-without-consulting-asker This is the same type of problem: the user is asking answer also in the Qt domain. – Emilio Garavaglia Jan 05 '14 at 08:34
  • I think the only way to speed it up is some hefty compression, but then you would lose the quality as a price. Would that be acceptable? – László Papp Jan 05 '14 at 08:38
  • What format are you writing in? – David Schwartz Jan 05 '14 at 08:40
  • PANAHI, why don't you use an MPEG stream, which records only the pixel differences between frames, as opposed to always storing a full image? You could even record the differences only, not the whole images. – László Papp Jan 05 '14 at 09:04
  • Looks more like an algorithm question rather than programming, so perhaps math subsite would potentially better suit for this question. – László Papp Jan 05 '14 at 10:14
  • You don't state: Do you require JPEG output? Because much of your time is consumed by the compression(or you have a 16 Mbyte/s drive). The image stream is 2048x1080x1x50 = 110 Mbyte/s. For 2013, most spinning disk drives can support that write throughput. Save as BMP. – jdr5ca Jan 06 '14 at 09:06
  • @jdr5ca that was another good solution ... I'll try yours too ... thanks man – PsP Jan 06 '14 at 20:05

5 Answers


Let's see: 2048*1080*3 (number of channels)*50 fps ≈ 316 MB/s, if you were writing the images raw. If you're using JPEG, depending on the compression parameters you may get a substantial reduction, but even at 1/5th you're still writing a lot of data to the hard drive, especially if you're using a 5400 rpm drive on a laptop.

Things you could do:

  1. As David Schwartz suggests, you should use queues and multiple threads.
  2. If you're effectively writing an image sequence, save a video instead. The data is compressed much more and the writing to disk is faster.
  3. Check the specs of your current drive and get an estimate of the maximum image size you can sustain writing to it. Choose compression parameters to fit that size constraint.
user1906
  • How would queues help, really? The OP's point he cannot fulfil this operation within the obtaining interval. – László Papp Jan 05 '14 at 08:37
  • Depending on what image format he's using, the call to `cv::imwrite` has the compression component and the writing to harddrive component. The compression does benefit from using a secondary thread. – user1906 Jan 05 '14 at 08:40
  • @LaszloPapp Queues would increase the permitted interval. For example, if his write takes 30 ms and he needs to process an image every 20 ms, a queue with two workers servicing it would increase the permitted interval to 40 ms, which is more than 30 ms. – David Schwartz Jan 05 '14 at 08:42
  • As the OP wrote, he already uses multiple threads, so to me, it looks red herring. I believe, the OP is asking about a different technique he has not explored yet like a nice compression algorithm usable by Qt classes, etc. – László Papp Jan 05 '14 at 08:42
  • @DavidSchwartz: you continously skip the point, he was already mentioning multi-threads. – László Papp Jan 05 '14 at 08:42
  • @LaszloPapp Mentioning is not the same as using correctly. That's why I'm trying to explain to him how to use them correctly and why that will fix his problem. – David Schwartz Jan 05 '14 at 08:44
  • @DavidSchwartz: you are explaining general threading. If you think, he has coding problems, ask for details in comments. Answers are better avoided until you can point out the exact issue. – László Papp Jan 05 '14 at 08:45
  • Minor nitpick: OP specified 50 fps, not 30. – user2802841 Jan 05 '14 at 08:46
  • @LaszloPapp I know his exact issue to near certainty. He doesn't have one thread that captures images and multiple threads that concurrently service the same queue. (After you've seen the same problem description ten or twenty times, you start to get good at figuring out the explanation.) – David Schwartz Jan 05 '14 at 08:49
  • "I've written my code using Qthread and created a queue to see if there's any improvement but the results were the same and it was only a memory head over." -> Please consider deleting your answer for the time being. You can undelete it later if you have something more concrete to point the issue out. – László Papp Jan 05 '14 at 08:50
  • @LaszloPapp We now know that user1906 and I were precisely correct. And you know how we knew that before -- because it was the *only* explanation for the OP's problem. (Or, at least, the only one that's even remotely probable.) – David Schwartz Jan 05 '14 at 18:42

There are multiple solutions generally possible, but you need to specify the format of your images - grayscale what? 8 bits? 12 bits? 16 bits?

Most other answers completely miss the mark by ignoring the physical reality of what you're trying to do: the bandwidth, both in terms of I/O and processing, is of primary importance.

Did you verify the storage bandwidth available on your system, under realistic conditions? It is generally a bad idea to store this stream on the same drive your operating system lives on, because seeks forced by other applications will eat into your bandwidth. Remember that on a modern 50+ Mbyte/s hard drive with 5 ms seeks, one seek costs you 0.25 Mbytes of bandwidth, and that's rather optimistic, since modern run-of-the-mill hard drives read faster and seek slower, on average. I'd say 1 Mbyte lost per seek is a conservative estimate on yesteryear's consumer drives.

  1. If you need to write raw frames and don't want to compress them even in a lossless fashion, then you need a storage system that can support the requisite bandwidth. Assuming 8 bit grayscale, you'll be dumping 2Mbytes/frame, at 50Hz that's 100Mbytes/s. A striped RAID 0 array of two contemporary off-the-shelf drives should be able to cope with it without problems.

  2. If you are OK with burning some serious CPU or GPU for compression, but still want lossless storage, then JPEG2000 is the default choice. If you use a GPU implementation, it will leave your CPU alone for other things. I'd think the expected bandwidth reduction is 2x, so your RAID 0 will have plenty of bandwidth to spare. That would be the preferred way to use it - it will be very robust and you won't be losing any frames no matter what else the system is doing (within reason, of course).

  3. If you are OK with lossy compression, then off-the-shelf JPEG libraries will do the trick. You'd probably want at least a 4x reduction in size, and the resulting 25 Mbytes/s data stream can be handled by the hard drive the OS lives on.

As for the implementation: two threads are enough if there's no compression. One thread captures the images, another one dumps them to the drive. If you see no improvement compared to a single thread, then it's solely due to the bandwidth limitations of your drive. If you use GPU for compression, then one thread that handles compression is enough. If you use CPU for compression, then you need as many threads as there are cores.

There is no issue at all with storing image differences; in fact JPEG2k loves this, and you may get an overall 2x compression improvement (for a total factor of 4x) if you're lucky. What you do is store a bunch of difference frames for each reference frame stored in full. The ratio is based solely on the needs of the processing done afterwards: you're trading off resilience to data loss and interactive processing latency for decreased storage-time bandwidth.

I'd say anywhere between 1:5 and 1:50 ratio is reasonable. With the latter, the loss of the reference frame knocks out 1s worth of data, and randomly seeking anywhere in the data requires on average a read of a reference frame and 24 delta frames, plus the cost of decompressing 25 frames.
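The storage-bandwidth check suggested above can be approximated from code. A standard-library-only sketch (the path and sizes are illustrative): it times sequential 2 MB writes, roughly the size of one 2048×1080 8-bit frame. Note that the OS page cache absorbs the writes, so without `O_DIRECT` or an `fsync` this gives an optimistic upper bound:

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <vector>

// Writes `total_mb` megabytes in 2 MB chunks (about one 2048x1080 8-bit
// frame each) and returns the apparent throughput in MB/s. The page cache
// makes this an upper bound; flush/fsync or O_DIRECT for a real figure.
double measure_write_mbps(const char* path, int total_mb) {
    const size_t chunk = 2 * 1024 * 1024;
    std::vector<unsigned char> frame(chunk, 0x7f);  // dummy frame data

    std::FILE* f = std::fopen(path, "wb");
    if (!f) return -1.0;

    auto t0 = std::chrono::steady_clock::now();
    for (size_t written = 0; written < size_t(total_mb) * 1024 * 1024;
         written += chunk)
        std::fwrite(frame.data(), 1, chunk, f);
    std::fflush(f);
    std::fclose(f);
    auto t1 = std::chrono::steady_clock::now();

    double secs = std::chrono::duration<double>(t1 - t0).count();
    return total_mb / (secs > 0 ? secs : 1e-9);
}
```

The raw stream in point 1 above needs roughly 100 MB/s sustained; if the measured figure (with caching defeated) is well below that, no amount of threading will help, and compression or a faster array becomes mandatory.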

Kuba hasn't forgotten Monica
  • thanks ... I'll try your proposed solutions and see if it works ... ooh , I have a question about your explanation on writing in OS drive (Speed limitation of 25 MB/s)... I only have one drive and my operating system is ubuntu 12.04 ? Do you think this might be the source of evil for my program ? I thought in linux based OS's, there aint any issues like that – PsP Jan 06 '14 at 05:22
  • @PANAHI: The OS doesn't have much to do with it. It is a physical limitation of the hard drive that you are using. You need to check the actual bandwidth of the drive! Whatever benchmark you use, ensure that direct drive access is used, since presence of caches will give you results influenced by cached data. – Kuba hasn't forgotten Monica Jan 06 '14 at 05:28
  • It's basically the trade-off between compression and writing/hd speed, but there is one more thing to add: The OS MIGHT give some additional problems. I had a Debian environment where I wanted to write 30 fps, that worked well until some random time where many images were dropped (maybe a caching/defrag problem or anything else, still don't know). Adding a threaded queue there made it work again. – Micka Jan 06 '14 at 09:15
  • @PANAHI: You also must check the size of the image actually written to disk by OpenCV. We're just assuming that it's single-channel image, it may happen to be stored in three (or four!) channels for some reason, thereby tripling or quadrupling the necessary bandwidth. At the moment, you simply don't know what's going on - you neither know the true size of the image, nor the writing speed the disk can cope with. – Kuba hasn't forgotten Monica Jan 06 '14 at 15:08

Compression is the key here.

Imwrite docs

For JPEG, it can be a quality ( CV_IMWRITE_JPEG_QUALITY ) from 0 to 100 (the higher is the better). Default value is 95.

For PNG, it can be the compression level ( CV_IMWRITE_PNG_COMPRESSION ) from 0 to 9. A higher value means a smaller size and longer compression time. Default value is 3.

For PPM, PGM, or PBM, it can be a binary format flag ( CV_IMWRITE_PXM_BINARY ), 0 or 1. Default value is 1.

For the .bmp format, no compression is needed, since it writes the bitmap directly.

In summary, image write time: png > jpg > bmp.

If you don't care about disk size, I would say go with the .bmp format, which is almost 10 times faster than writing a png and 6 times faster than writing a jpg.
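Following the PXM line in the docs quoted above: for single-channel frames, a binary PGM is just a short ASCII header followed by the raw pixels, so writing one is a single sequential write with zero compression cost, the same property that makes BMP fast. A stdlib-only sketch (it assumes 8-bit, row-major, densely packed pixels, i.e. what a continuous single-channel cv::Mat holds):

```cpp
#include <cassert>
#include <cstdio>

// Writes an 8-bit grayscale image as binary PGM ("P5"): a short text header
// followed by width*height raw bytes. No compression work at all, so the
// cost is pure disk bandwidth (~2.2 MB per 2048x1080 frame).
bool write_pgm(const char* path, const unsigned char* pixels,
               int width, int height) {
    std::FILE* f = std::fopen(path, "wb");
    if (!f) return false;
    std::fprintf(f, "P5\n%d %d\n255\n", width, height);
    const size_t n = static_cast<size_t>(width) * height;
    const bool ok = std::fwrite(pixels, 1, n, f) == n;
    return std::fclose(f) == 0 && ok;
}
```

OpenCV's cv::imread reads PGM back, so later offline processing is unaffected; the trade-off is ~2.2 MB per frame on disk instead of the ~500 KB JPEG.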

Ankur Jain
  • Image write time for bmp is more than the other two! I think you made a mistake here! – PsP Aug 08 '17 at 14:48
  • I think you have some mistakes in your code. I checked using kernprof, and writing bmp is always faster than writing jpg or png. – Ankur Jain Aug 08 '17 at 18:12

You should have a queue of images to be processed. You should have a capture thread that captures images and places them on the queue. You should have a few compress/write threads that take images off the queue and compress/write them.

There's a reason CPUs have more than one core these days -- so you don't have to finish one thing before you start the next.

If you believe that this was what you were doing and you are still seeing the same issue, show us your code. You are most likely doing this incorrectly.

Update: As I suspected, you are using threads in a way that doesn't accomplish the objective for using threads in the first place. The whole point was to compress more than one image at a time because we know it takes 30 ms to compress an image and we know that compressing one image every 30 ms is insufficient. The way you are using threads, you still only try to compress one image at a time. So 30 ms to compress/write an image is still too long. The queue serves no purpose, since only one thread reads from it.
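The shape being described, one capture thread feeding several compress/write workers through a shared queue, can be sketched with the standard library (std::thread standing in for QThread; `Frame` and the worker body are placeholders for cv::Mat and the cv::imwrite call):

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Placeholder for cv::Mat; compressing/writing one is the ~30 ms operation.
struct Frame { int id; };

// Minimal thread-safe queue: one capture thread pushes, several
// compress/write workers pop concurrently.
class FrameQueue {
public:
    void push(Frame f) {
        std::lock_guard<std::mutex> lk(m_);
        q_.push(f);
        cv_.notify_one();
    }
    // Blocks until a frame is available; returns false once the queue has
    // been shut down and fully drained.
    bool pop(Frame& f) {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return !q_.empty() || done_; });
        if (q_.empty()) return false;
        f = q_.front();
        q_.pop();
        return true;
    }
    void shutdown() {
        std::lock_guard<std::mutex> lk(m_);
        done_ = true;
        cv_.notify_all();
    }
private:
    std::queue<Frame> q_;
    std::mutex m_;
    std::condition_variable cv_;
    bool done_ = false;
};

// Runs `n_workers` compress/write threads against one queue and returns how
// many frames they handled. In real code the worker body would call
// cv::imwrite (or a JPEG encoder) instead of just counting.
int run_pipeline(int n_frames, int n_workers) {
    FrameQueue queue;
    std::atomic<int> written{0};
    std::vector<std::thread> workers;
    for (int i = 0; i < n_workers; ++i)
        workers.emplace_back([&queue, &written] {
            Frame f;
            while (queue.pop(f))
                ++written;          // compress + write would happen here
        });
    for (int id = 0; id < n_frames; ++id)
        queue.push(Frame{id});      // the capture thread's role
    queue.shutdown();               // workers drain the queue, then exit
    for (auto& w : workers)
        w.join();
    return written.load();
}
```

With, say, four workers each taking 30 ms per frame, the pipeline averages one frame every ~7.5 ms, comfortably under the 20 ms capture interval. In practice the queue should also be bounded so it cannot grow without limit when the disk falls behind.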

David Schwartz
  • This is roughly pseudocode here ... I've written my code in a multithreaded way, but there was no sign of any speed improvement ... – PsP Jan 05 '14 at 08:34
  • it is worth noting that creating this queue is dangerous since its size is going to increase every second ... and there's no any benefit in it ... (Memory filling and other issues) – PsP Jan 05 '14 at 08:35
  • @David, how does your answer speed up the disk write? I do not think the OP can do significant improvements. – László Papp Jan 05 '14 at 08:36
  • @LaszloPapp The "disk write" includes image compression. It's not the I/O that's taking time. – David Schwartz Jan 05 '14 at 08:38
  • @PANAHI There was probably something wrong with the multi-threaded code. And the benefit is huge -- you can use multiple cores to compress images concurrently. You do have to bound the queue, of course. If it's growing, that means you can't keep up with the image sampling rate and need to drop images or adjust it. – David Schwartz Jan 05 '14 at 08:39
  • I do not follow. The OP gets images per 20 ms, and the disk write require 30 ms. The question was how to make it faster. Threading is not a solution for this. That does not make the write faster itself. IMO, the threading is red herring here. To me, it seems he is more into compression techniques and trade-offs. – László Papp Jan 05 '14 at 08:40
  • if you're going to do the compression yourself, you should know how to do it .. since I don't know the details of JPEG compression algorithm I don't know what to do !!! – PsP Jan 05 '14 at 08:43
  • @LaszloPapp If he had two threads **correctly** pulling from a queue, then the compress/write taking 30 ms wouldn't be a problem. (Either the code waits for I/O to complete or it doesn't. If it doesn't, then it must not be the I/O that's taking time, so threads will help. If it is, then pending more I/Os at a time will improve throughput, so again threads will help.) – David Schwartz Jan 05 '14 at 08:44
  • He **already** had threads. You do not seem to wish to accept it. Ask for details in comments to proceed if you think it is something about the not-yet-shown code. – László Papp Jan 05 '14 at 08:46
  • @LaszloPapp Threads are not some magic dust you can sprinkle on code to make it go faster. You must use them appropriately, as I explained in my answer. And you must understand *why* they must work, so if they don't you can figure out what is wrong. The OP clearly doesn't understand why threads will solve his problem, so obviously trying to get them to solve it isn't going to work for him. He must first understand what the source of his problem is and how and why threads will solve it. – David Schwartz Jan 05 '14 at 08:47
  • @DavidSchwartz: your claims are unfounded to me at this point. It is possible the OP uses everything correct, but it just takes that much time with multi-threading in which case the solution is a better compression algorithm, or something else, but definitely not threading since that is solved. I do not think you can be a mind reader without asking for more information. – László Papp Jan 05 '14 at 08:48
  • @LaszloPapp Explain why multiple threads, if used correctly, wouldn't speed up the compression. I don't have to read minds, I just have to have seen the same problem a dozen times. (See above for my explanation of why it will improve the I/O.) – David Schwartz Jan 05 '14 at 08:50
  • You are **still** stuck to multi-threading despite what the OP writes: "I've written my code using Qthread and created a queue to see if there's any improvement but the results were the same and it was only a memory head over." – László Papp Jan 05 '14 at 08:51
  • @LaszloPapp Thus demonstrating that he did so incorrectly because, as I explained, if done correctly this will make a huge difference. – David Schwartz Jan 05 '14 at 08:53
  • @DavidSchwartz thanks for your comments ... but I tried my code in multithreaded way .. And I know how to do threading in appropriate way but believe me it didn't worked for me .... I didn't use any semaphore or mutex in threads to make sure that they will run simultaneously but at the end there were not any significant improvement (it became faster but not fast enough to compensate the 20 ms frame rate) – PsP Jan 05 '14 at 08:54
  • @DavidSchwartz: me and the OP are telling you that your answer is irrelevant based on the technical content. I would have suggested improvement, but IMO it is fundamentally wrong, and would need deletion for now, at least, but then again, this is just IMHO. – László Papp Jan 05 '14 at 08:56
  • @PANAHI Can you give us the details? How many threads did you use? How long did the compression/write take? What you're saying is almost impossible. – David Schwartz Jan 05 '14 at 09:10
  • @DavidSchwartz: I _really_ do not get what is so unbelievable about an operation taking 30 ms in a multi-threaded application. Btw, I already asked for details as you should have done in the first place. ;-) – László Papp Jan 05 '14 at 09:16
  • @DavidSchwartz I've added the code to the question please tell me if there's anything wrong with it ... Thank you David – PsP Jan 05 '14 at 10:03
  • @PANAHI You only have one store thread, so you still are only compressing one image at a time. The whole point of using threads was to allow you to compress more than one image at a time. As I said in my answer: "**You should have a few compress/write threads that take images off the queue and compress/write them.**" – David Schwartz Jan 05 '14 at 18:37
  • Thanks Dear David .. I'll try your solution and inform you if it'd work ... Thanks again – PsP Jan 06 '14 at 05:48

I would suggest taking a look into the QtMultimedia module, and if you are dealing with streams as opposed to images, try to convert your code to MPEG.

That will avoid dealing with every pixel all the time, as only the pixel differences will be processed. That could potentially give a performance increase for the processing.

You could, for sure, also take a look at heftier compression algorithms, but that is outside the scope of Qt; Qt's part would probably just be interfacing with those algorithms.

László Papp
  • Thanks but I need to have access to each single frame in future time ... I've added this to my question and put the code to be analyzed ... – PsP Jan 05 '14 at 10:02
  • Why do you need each single frame of a stream? It sounds unusual. Could you please elaborate? – László Papp Jan 05 '14 at 10:03
  • They are not some normal images as you think ... these camera frames are used to detect some specific abnormal creatures which only exist for 1 frame and then disappears in the next frame .. and after storing these moments , we will process these images to count these creatures – PsP Jan 05 '14 at 10:06
  • Can you elaborate why you could not do that with pixel difference storage? – László Papp Jan 05 '14 at 10:07
  • Sorry it is not as easy as you think ... the quality of difference output depends on many factors .. many subtraction algorithms have been proposed for that and we tested more than 40 algorithms !!!.... also existence of some noise e.g Gaussian and pepper and salt noise may interfere in obtaining our desired objects ... – PsP Jan 05 '14 at 10:10
  • I do not follow. You do not want to compress, but you wanna get it faster? Sounds like a contradiction right in there. If you want to speed it up, use compression without storing every pixel each time, etc. Otherwise, I would suggest re-architecture your project with hardware speed up like using dedicated DSPs with instrinsics and VLW architecture, etc. – László Papp Jan 05 '14 at 10:12
  • Dear Laszlo... my question is about faster writing functions and not compressing (Mytopic says this :) ... I want to make faster my algorithm by writing faster into the hard disk and not avoiding compression ... the compression is not the issue in here ... I hope you get it now ... Motion compression is not good for me .. I just want to store the frames as soon as I get them from the camera .. – PsP Jan 05 '14 at 10:34
  • I do not get how you wish to speed an operation that requires a fix amount of time (disk io). To my knowledge and experience with image processing, the only place for improvement is compression to write less. If you write "compression is not the issue", you have to stick to the slow speed. Moreover, your question is off-topic on SO. SO is for programming, not algorithm development. There is math SE and other sites for that. – László Papp Jan 05 '14 at 10:36
  • @LaszloPapp Then you don't have much experience with image processing. The best place for improvement in compression is to parallelize and do more things at a time. – David Schwartz Jan 06 '14 at 05:46
  • @LaszloPapp We now know that he did the threading wrong, precisely as I said all along. See his code, my updated answer, and his comment to it. And it had to be this way -- it was the only plausible explanation for his symptoms. You were wrong all along. You are the one continuing to refuse to listen. If you still think I'm wrong, provide another plausible explanation for the OP's problem. When there is only one possibility -- that is the explanation. – David Schwartz Jan 06 '14 at 06:08
  • @LaszloPapp If you must, conclude that I'm a lucky guesser then. But the truth is that having seen dozens of issues dozens of times each, you get to the point where you can see the way people explain a problem and know what the problem is. Yes, many people would not be able to reach that conclusion without more details. But as I've explained, there is no other plausible explanation, even given just what he gave initially. If you think there's another explanation, tell me what it is. You continue to stick to your position despite now overwhelming contrary evidence. – David Schwartz Jan 06 '14 at 06:23
  • Fwiw, I have been working with DSPs for 4-5+ years in the Image Processing field. Either way, the OP explicitly mentioned what to do on top of threading? I think, it is not an accident all the answers (except yours) mention that. Even if threading could be improved, compressing is another place to take a look at, so is intrinsitics, VLVW architectures, dedicated DSPs, et cetera. – László Papp Jan 06 '14 at 10:14
  • @LaszloPapp We heard hoofs. I said it was probably horses. We now know it was horses. Why are you still confusing everyone with irrelevant nonsense about zebras and hippopotami? This isn't some complicated, obscure question. The OP clearly didn't understand how and why threading was vital to solving his problem because if he had, he wouldn't have said, "I tried threading and it didn't help". It's not something you try to see if it helps, it's the crux to solving his problem. You must get more than one core compressing, and if that doesn't speed it up at all, you must be doing it wrong. – David Schwartz Jan 06 '14 at 11:20
  • Dear Laszlo and David please stop arguing ... I feel ashamed for what I've done by asking this question ... I promise you two, to test and write a proper code and tell you if multiple threads for writing and compressing solve my problem or not .... Believe me reading your comments hurt me and make me to feel guilty !!!! – PsP Jan 06 '14 at 19:57
  • beside that, if you two are familiar with opencv and qt (I'm certain that you are familiar) you can test and see what would be the results I think its less than an hour for you two to see if your solutions work or not !!! , Dear laszlo I'm familiar with C6000 TI DSPs and Also Davinci and OMAP series but I'm very busy at the moment and I'll try your solution in an appropriate time ... thank you – PsP Jan 06 '14 at 20:01