
Cobalt hangs (blocks) indefinitely after calling ApplicationDirectFB::Get()->Stop() and cannot exit. The backtrace at the time of the hang is shown below. Could anyone help take a look?

        <unknown> [0xb5d988f4]
        SbConditionVariableWait [0xbd598]
        base::WaitableEvent::TimedWait() [0xa0f1c]
        base::WaitableEvent::Wait() [0xa0ff8]
        cobalt::storage::StorageManager::FinishIO() [0x374454]
        cobalt::storage::StorageManager::~StorageManager() [0x374750]
        cobalt::storage::StorageManager::~StorageManager() [0x374750]
        cobalt::storage::StorageManager::~StorageManager() [0x374750]
        cobalt::storage::StorageManager::~StorageManager() [0x374750]
        cobalt::storage::StorageManager::~StorageManager() [0x374750]
        cobalt::storage::StorageManager::~StorageManager() [0x374750]
        cobalt::storage::StorageManager::~StorageManager() [0x374750]
        cobalt::storage::StorageManager::~StorageManager() [0x374750]
        cobalt::storage::StorageManager::~StorageManager() [0x374750]
        cobalt::storage::StorageManager::~StorageManager() [0x374750]
        cobalt::storage::StorageManager::~StorageManager() [0x374750]
        cobalt::storage::StorageManager::~StorageManager() [0x374750]
        cobalt::storage::StorageManager::~StorageManager() [0x374750]
        cobalt::storage::StorageManager::~StorageManager() [0x374750]
        cobalt::storage::StorageManager::~StorageManager() [0x374750]
        cobalt::storage::StorageManager::~StorageManager() [0x374750]

If I comment out the no_flushes_pending_.Wait(); call in StorageManager::FinishIO in src/cobalt/storage/storage_manager.cc, it no longer hangs and exits successfully:

void StorageManager::FinishIO() {
  TRACE_EVENT0("cobalt::storage", __FUNCTION__);
  DCHECK(!sql_message_loop_->BelongsToCurrentThread());
  // The SQL thread may be communicating with the savegame I/O thread still,
  // flushing all pending updates.  This process can require back and forth
  // communication.  This method exists to wait for that communication to
  // finish and for all pending flushes to complete.
  // Start by finishing all commands currently in the sql message loop queue.
  // This method is called by the destructor, so the only new tasks posted
  // after this one will be generated internally.  We need to do this because
  // it is possible that there are no flushes pending at this instant, but there
  // are tasks queued on |sql_message_loop_| that will begin a flush, and so
  // we make sure that these are executed first.
  base::WaitableEvent current_queue_finished_event_(true, false);
  sql_message_loop_->PostTask(
      FROM_HERE,
      base::Bind(&base::WaitableEvent::Signal,
                 base::Unretained(&current_queue_finished_event_)));
  current_queue_finished_event_.Wait();
  // Now wait for all pending flushes to wrap themselves up.  This may involve
  // the savegame I/O thread and the SQL thread posting tasks to each other.
  // no_flushes_pending_.Wait();  // <-- commented out to avoid the hang
}
bitchainer

2 Answers


This is not the best answer because I only vaguely remember encountering this before, but I couldn't find the reference to it anywhere to confirm. I believe this happens when one of the SbStorage APIs doesn't return the right value, perhaps on an error?

David Ghandehari
  • @David, I cannot find the SbStorage keyword anywhere in the source code. Which file is the SbStorage API located in? – bitchainer Mar 13 '17 at 01:23
  • Do you know where the data in StorageManager gets stored? – bitchainer Mar 13 '17 at 03:29
  • I added some logging and found that StorageManager::OnFlushIOCompletedSQLCallback is never called, so no_flushes_pending_.Signal(); is never called to unblock the wait. – bitchainer Mar 13 '17 at 03:37
  • The storage interface is in src/starboard/storage.h – David Ghandehari Mar 13 '17 at 06:29
  • The data flow is: Cobalt stores data in SQLite, which writes it out to an in-memory virtual file system. That buffer goes through StorageManager, which then writes it out through SbStorageWriteRecord. – David Ghandehari Mar 13 '17 at 06:33
  • I think some response code from some SbStorage API will cause the completed callback to fail. And if the API fails, I think it just doesn't call that callback. Which version of Cobalt are you on? This may have been fixed upstream. – David Ghandehari Mar 13 '17 at 06:35
  • Our Cobalt version is 8.21796. I also found SbStorageWriteRecord and added some logging to it; it seems it is not called. Some partitions are read-only, so maybe Cobalt is writing a file to a read-only partition and that makes something fail. Is there a way to change the file path so I can try that? – bitchainer Mar 13 '17 at 06:46
  • After tracing and debugging the code, I found the root cause. Thanks so much for your kind help! – bitchainer Mar 13 '17 at 08:16
  • It seems like there is still a bug here in Cobalt. If the write fails, it should not hang indefinitely. This is probably worth adding to our public bug tracker, if you have time. – David Ghandehari Mar 13 '17 at 15:13
  • The root cause is that it writes data to $HOME/.starboard.storage (which is set in starboard/shared/linux/get_home_directory.cc), but on some platforms that partition is read-only, which makes the write fail and hang indefinitely. But as you said, even if the write fails, it should not hang. Maybe you can improve it in the future. – bitchainer Mar 14 '17 at 00:10
  • I have posted the issue to the Google public bug tracker: https://issuetracker.google.com/issues/36200954. You can check it there. Thank you so much for your kind help! – bitchainer Mar 14 '17 at 00:40
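
To illustrate the point raised in the comments (a failed write should not leave the waiting thread blocked), here is a minimal sketch in standard C++. It does not use Cobalt's base::WaitableEvent or the real StorageManager code; the Event class and FlushAndSignal function are hypothetical stand-ins. The idea is simply that the completion event gets signaled on both the success and the failure path, and that the waiter can use a timeout as a safety net.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical manual-reset event; a stand-in, not Cobalt's WaitableEvent.
class Event {
 public:
  void Signal() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      signaled_ = true;
    }
    cv_.notify_all();
  }

  // Returns true if the event was signaled before the timeout elapsed.
  bool TimedWait(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mutex_);
    return cv_.wait_for(lock, timeout, [this] { return signaled_; });
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  bool signaled_ = false;
};

// Hypothetical flush step. |write_succeeds| stands in for the result of the
// real storage write. The event is signaled whether or not the write worked,
// so a thread waiting on |done| is always released.
bool FlushAndSignal(bool write_succeeds, Event* done) {
  assert(done != nullptr);
  const bool ok = write_succeeds;
  done->Signal();  // Signal on success and on failure.
  return ok;
}
```

With this shape, a caller that waits on the event after a failed flush returns promptly and can inspect the result, instead of blocking forever as in the backtrace above.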

The root cause is that Cobalt writes data to $HOME/.starboard.storage (which is set in starboard/shared/linux/get_home_directory.cc), but on some platforms that partition is read-only. The write then fails and Cobalt hangs indefinitely. The fix is to change the file path of .starboard.storage to a writable partition.
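
As a rough sketch of how one might pick a writable location before pointing the storage file at it, the helper below checks candidate directories with the POSIX access() call. The function name PickWritableDir and the candidate paths are assumptions for illustration only; this is not what get_home_directory.cc does.

```cpp
#include <string>
#include <vector>
#include <unistd.h>  // access(), W_OK, X_OK (POSIX)

// Hypothetical helper: returns the first directory in |candidates| that the
// current process may write to and traverse, or an empty string if none
// qualifies. access() fails for paths on a read-only file system when W_OK
// is requested, which is the situation described above.
std::string PickWritableDir(const std::vector<std::string>& candidates) {
  for (const std::string& dir : candidates) {
    if (access(dir.c_str(), W_OK | X_OK) == 0) {
      return dir;
    }
  }
  return std::string();
}
```

For example, passing the home directory first and a known-writable data partition second would fall through to the second entry when the home partition is read-only.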

bitchainer