
The application translates the lemmas of the words in a sentence from Russian to English. I do this with an sdict-formatted dictionary, which is queried by a Python script that is in turn called from a C++ program.

I want to get the following output:

Выставка/exhibition::1 конгресс/congress::2 организаторами/organizer::3 которой/ which::4 являются/appear::5 РАО/NONE::6 ЕЭС/NONE::7 России/NONE::8 EESR/NONE::9 нефтяная/oil::10 компания/company::11 ЮКОС/NONE::12 YUKOS/NONE::13 и/and::14 администрация/administration::15 Томской/NONE::16 области/region::17 продлится/last::18 четыре/four::19 дня/day::20

The code below works for the first sentence, but for the second and all subsequent sentences I get wrong output:

Егор/NONE::1 Гайдар/NONE::2 возглавлял/NONE::3 первое/head::4 российское/first::5 правительство/NONE::6 которое/government::7 называли/which::8 правительством/call::9 камикадзе/government::10

Note: NONE is used for words lacking translation.

I'm running the following C++ code excerpt which actually calls PyRun_SimpleString:

for (unsigned int i = 0; i < theSentenceRows->size(); i++){

  stringstream ss;
  ss << (i + 1);
  parsedFormattedOutput << theSentenceRows->at(i)[FORMINDEX] << "/";
  getline(lemmaOutFileForTranslation, lemma);

  PyObject *main_module, *main_dict;
  PyObject *toTranslate_obj, *translation, *emptyString;
  /* Setup the __main__ module for us to use */
  main_module = PyImport_ImportModule("__main__");
  main_dict   = PyModule_GetDict(main_module);

  /* Inject a variable into __main__, in this case toTranslate */
  toTranslate_obj = PyString_FromString(lemma.c_str());
  PyDict_SetItemString(main_dict, "start_word", toTranslate_obj);

  /* Run the code snippet above in the current environment */
  PyRun_SimpleString(pycode);
  usleep(2);
  translation = PyDict_GetItemString(main_dict, "translation");
  Py_XDECREF(toTranslate_obj);

  /* writing results */
  parsedFormattedOutput << PyString_AsString(translation) << "::" << ss.str() << " ";

Where pycode is defined as:

const char *pycode =
    "import sys\n"
    "import re\n"
    "import sdictviewer.formats.dct.sdict as sdict\n"
    "import sdictviewer.dictutil\n"
    "dictionary = sdict.SDictionary( 'rus_eng_full2.dct' )\n"
    "dictionary.load()\n"
    "translation = \"*NONE*\"\n"
    "p = re.compile('( )([a-z]+)(.*?)( )')\n"
    "for item in dictionary.get_word_list_iter(start_word):\n"
    "        try:\n"
    "            if start_word == str(item):\n"
    "                instance, definition = item.read_articles()[0]\n"
    "                translation = p.findall(definition)[0][1]\n"
    "        except:\n"
    "            continue\n";

I noticed a delay in the second sentence's output, so I added usleep(2); to the C++ code, suspecting that the call to PyRun_SimpleString is not synchronous. It didn't help, however, and I'm not sure that this is the cause. The delay appears in every sentence after the first and keeps growing.

So, is the call to PyRun_SimpleString synchronous? Or is the way I share variable values between C++ and Python wrong? Thank you in advance.

rok
    Get rid of the `except:\ncontinue`; it's probably masking whatever the real problem is. Add error handling code so that Python will tell you what errors arise. There's no point in trying to debug code when you aren't letting your code speak to you. – Winston Ewert Oct 24 '12 at 19:47

1 Answer


According to the docs, it is synchronous.

I would advise you to test the Python code separately from the C++ code; that would make debugging it much easier. One way of doing that is pasting the code into the interactive interpreter and executing it line by line. And when debugging, I would second Winston Ewert's comment to not discard exceptions.
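As a rough sketch of that advice, the lookup from the question can be moved into a plain Python function that reports exceptions instead of swallowing them. The function name `translate` and the classes `FakeItem` and `FakeDictionary` are hypothetical; the fakes only imitate the calls the question's snippet uses (`get_word_list_iter`, `read_articles`) so the logic can be exercised without the real dictionary file or the C++ host. This is not the actual sdictviewer implementation.

```python
import re
import sys
import traceback

# Same pattern as in the question: the first lowercase word between spaces.
FIRST_WORD = re.compile(r'( )([a-z]+)(.*?)( )')


def translate(dictionary, start_word, default="*NONE*"):
    """Return the first English word of the matching article, else default."""
    translation = default
    for item in dictionary.get_word_list_iter(start_word):
        if start_word != str(item):
            continue
        try:
            instance, definition = item.read_articles()[0]
            translation = FIRST_WORD.findall(definition)[0][1]
        except Exception:
            # Report the failure on stderr instead of silently continuing.
            traceback.print_exc(file=sys.stderr)
    return translation


class FakeItem:
    """Hypothetical stand-in for a dictionary entry (testing only)."""

    def __init__(self, word, definition):
        self.word = word
        self.definition = definition

    def __str__(self):
        return self.word

    def read_articles(self):
        return [(None, self.definition)]


class FakeDictionary:
    """Hypothetical stand-in that exposes only get_word_list_iter."""

    def __init__(self, entries):
        self.entries = entries

    def get_word_list_iter(self, start_word):
        return (FakeItem(word, definition)
                for word, definition in self.entries.items()
                if word.startswith(start_word))
```

With the real dictionary, the same function could be called from the interactive interpreter as `translate(dictionary, word)` after `dictionary.load()`. Any traceback printed there points at the failing line, which the bare `except: continue` in the embedded snippet would have hidden.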

Roland Smith