I need advice on the following.

Below are the request and response XMLs. The request XML contains the words to be translated from the foreign language [the string elements inside the Texts node], and the response XML contains the English translation of these words [inside TranslatedText].

REQUEST XML

    <TranslateArrayRequest>
        <AppId />
        <From>ru</From>
        <Options>
            <Category xmlns="http://schemas.datacontract.org/2004/07/Microsoft.MT.Web.Service.V2" ></Category>
            <ContentType xmlns="http://schemas.datacontract.org/2004/07/Microsoft.MT.Web.Service.V2">text/plain</ContentType>
            <ReservedFlags xmlns="http://schemas.datacontract.org/2004/07/Microsoft.MT.Web.Service.V2" />
            <State xmlns="http://schemas.datacontract.org/2004/07/Microsoft.MT.Web.Service.V2" ></State>
            <Uri xmlns="http://schemas.datacontract.org/2004/07/Microsoft.MT.Web.Service.V2" ></Uri>
            <User xmlns="http://schemas.datacontract.org/2004/07/Microsoft.MT.Web.Service.V2" ></User>
        </Options>
        <Texts>
            <string xmlns="http://schemas.microsoft.com/2003/10/Serialization/Arrays">вк азиза и ринат</string>
            <string xmlns="http://schemas.microsoft.com/2003/10/Serialization/Arrays">скачать кайда кайдк кайрат нуртас бесплатно</string>
        </Texts>
        <To>en</To>
    </TranslateArrayRequest>

RESPONSE XML

    <ArrayOfTranslateArrayResponse xmlns="http://schemas.datacontract.org/2004/07/Microsoft.MT.Web.Service.V2" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
        <TranslateArrayResponse>
            <From>ru</From>
            <OriginalTextSentenceLengths xmlns:a="http://schemas.microsoft.com/2003/10/Serialization/Arrays"><a:int>16</a:int>
            </OriginalTextSentenceLengths>
            <State/>
            <TranslatedText>BK Aziza and Rinat</TranslatedText>
            <TranslatedTextSentenceLengths xmlns:a="http://schemas.microsoft.com/2003/10/Serialization/Arrays"><a:int>18</a:int>
            </TranslatedTextSentenceLengths>
        </TranslateArrayResponse>
        <TranslateArrayResponse>
            <From>ru</From>
            <OriginalTextSentenceLengths xmlns:a="http://schemas.microsoft.com/2003/10/Serialization/Arrays"><a:int>43</a:int>  </OriginalTextSentenceLengths>
            <State/>
            <TranslatedText>Kairat kajdk Qaeda nurtas download free</TranslatedText>
            <TranslatedTextSentenceLengths xmlns:a="http://schemas.microsoft.com/2003/10/Serialization/Arrays"><a:int>39</a:int></TranslatedTextSentenceLengths>
        </TranslateArrayResponse>
    </ArrayOfTranslateArrayResponse>
user321956
  • What did you try already? What exactly does not work? – Joram Jun 03 '14 at 12:43
  • import xml.etree.ElementTree as ET #root = ET.fromstring(MS_Temp) tree = ET.ElementTree(file='MS_Temp.xml') root=tree.getroot() texts = root.find('Texts') for data in texts: print data.text – user321956 Jun 03 '14 at 13:10
  • I tried extracting the strings from the request XML. I'm struggling with how to get the translated term for each search term from the response XML. Please advise. – user321956 Jun 03 '14 at 13:12

1 Answer

So there are two ways to relate the translated text to the original text:

  1. Length of the original text; and
  2. Order in the XML file

Relating by length is probably unreliable, because there is a significant chance that two or more phrases have the same number of characters.
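
To illustrate why length is a weak key, here is a minimal sketch, assuming Python 3 (where `len` counts characters). The first phrase comes from the request XML and matches the 16 reported in `OriginalTextSentenceLengths`; the second phrase is a made-up placeholder, not part of the question's data.

```python
# Group phrases by character count; any group holding more than one phrase
# cannot be matched back to its translation by length alone.
phrases = [
    "вк азиза и ринат",   # from the request XML (16 characters)
    "abcdefghijklmnop",   # hypothetical second phrase, also 16 characters
]

by_length = {}
for phrase in phrases:
    by_length.setdefault(len(phrase), []).append(phrase)

# Keep only the lengths shared by two or more phrases.
ambiguous = dict((n, group) for n, group in by_length.items() if len(group) > 1)
print(ambiguous)
```

Both phrases land in the same bucket, so a length of 16 in the response would not say which of them was translated.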

So it comes down to order. I think it is relatively safe to assume that the translations were written in the same order as the request strings were processed. So I'll show you a way to relate the phrases using the order of the elements in the two XML files.


This is relatively simple. We iterate through each tree and collect the text of the relevant elements into a list. Because every element in the translated XML lives in the root's default namespace, we also need to grab that namespace from the root tag:

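import re
import xml.etree.ElementTree as ElementTree

def map_translations(origin_file, translate_file):
    origin_tree = ElementTree.parse(origin_file)
    origin_root = origin_tree.getroot()

    # Collect the request strings, in document order, from every Texts node.
    origin_text = [string.text for text_elem in origin_root.iter('Texts')
                   for string in text_elem]

    translate_tree = ElementTree.parse(translate_file)
    translate_root = translate_tree.getroot()

    # The root tag looks like '{namespace-uri}ArrayOfTranslateArrayResponse';
    # keep the '{...}' prefix so it can be prepended to child tag names.
    namespace = re.match(r'\{.*\}', translate_root.tag).group()

    # Collect the translations, in document order.
    translate_text = [text.text for text in translate_root.findall(
                          './/{}TranslatedText'.format(namespace))]

    # Pair the two lists by position: origin_text[i] -> translate_text[i].
    return dict(zip(origin_text, translate_text))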


origin_file = 'some_file_path.xml'
translate_file = 'some_other_path.xml'

mapping = map_translations(origin_file, translate_file)

print(mapping)

Update

The above code works on Python 2.7+. On Python 2.6 it needs two small changes:

  • Element objects do not have an iter method. Instead they have a getiterator method.

    Change the appropriate line above to this:

    origin_text = [string.text for text_elem in origin_root.getiterator('Texts')
                   for string in text_elem]
    
  • XPath syntax is (most likely) not supported. In order to get down to the TranslatedText nodes we need to use the same strategy as we do above. Also note that str.format in Python 2.6 requires explicit field indices, so '{}' becomes '{0}':

    Change the appropriate line above to this:

    translate_text = [string.text for text in translate_root.getiterator(
                          '{0}TranslateArrayResponse'.format(namespace))
                          for string in text.getiterator(
                          '{0}TranslatedText'.format(namespace))]
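
If the request and response XML are already in memory as strings rather than files, the same order-based pairing can be done with `ElementTree.fromstring`. The following is a minimal sketch, not a drop-in replacement: the function name `pair_translations` is made up for illustration, the sample XML is a trimmed copy of the documents in the question, and it assumes Python 3 with the XML held in `str` objects. It returns a list of pairs instead of a dict, so that repeated source phrases are not collapsed into one key.

```python
import xml.etree.ElementTree as ElementTree

# Namespace URIs copied from the request and response XML in the question.
ARRAYS_NS = 'http://schemas.microsoft.com/2003/10/Serialization/Arrays'
V2_NS = 'http://schemas.datacontract.org/2004/07/Microsoft.MT.Web.Service.V2'


def pair_translations(request_xml, response_xml):
    """Pair each request string with its translation, by document order.

    Hypothetical helper: assumes the service returns one
    TranslateArrayResponse per request string, in the same order.
    """
    request_root = ElementTree.fromstring(request_xml)
    response_root = ElementTree.fromstring(response_xml)

    # Texts has no namespace; its string children use the Arrays namespace.
    originals = [elem.text for elem in
                 request_root.findall('Texts/{%s}string' % ARRAYS_NS)]

    # Every element in the response sits in the V2 default namespace.
    translations = [elem.text for elem in response_root.findall(
        '{%s}TranslateArrayResponse/{%s}TranslatedText' % (V2_NS, V2_NS))]

    if len(originals) != len(translations):
        raise ValueError('request and response contain different item counts')

    return list(zip(originals, translations))


# Trimmed copies of the XML from the question, for demonstration only.
REQUEST_XML = """<TranslateArrayRequest>
  <From>ru</From>
  <Texts>
    <string xmlns="%s">вк азиза и ринат</string>
    <string xmlns="%s">скачать кайда кайдк кайрат нуртас бесплатно</string>
  </Texts>
  <To>en</To>
</TranslateArrayRequest>""" % (ARRAYS_NS, ARRAYS_NS)

RESPONSE_XML = """<ArrayOfTranslateArrayResponse xmlns="%s">
  <TranslateArrayResponse>
    <TranslatedText>BK Aziza and Rinat</TranslatedText>
  </TranslateArrayResponse>
  <TranslateArrayResponse>
    <TranslatedText>Kairat kajdk Qaeda nurtas download free</TranslatedText>
  </TranslateArrayResponse>
</ArrayOfTranslateArrayResponse>""" % V2_NS

pairs = pair_translations(REQUEST_XML, RESPONSE_XML)
for original, translated in pairs:
    print(original, '->', translated)
```

Raising on a count mismatch is a deliberate choice here: `zip` would otherwise silently truncate to the shorter list and hide a missing translation.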
    
BeetDemGuise
  • Thank you. For testing purposes, I have the request and response XML in files. How do I pass the origin_file and translate_file? Is the following the right way to do it: python MS_Match.py MS_Temp.xml MS_Res.xml? Another question: if the request and response XML are generated dynamically in another script, how would I plug in this code? – user321956 Jun 04 '14 at 14:45
  • All you need to do is place this code inside a python script, change out the `origin_file` and `translate_file` definitions, then run the script using the `python` command. – BeetDemGuise Jun 04 '14 at 14:48
  • Thank you Darin. I'm getting an error while executing. import re import xml.etree.ElementTree as ElementTree def map_translations(origin_file, translate_file): ............above code.......................... map_translations("MS_Temp.xml","MS_Res.xml"); Traceback (most recent call last): File "MS_Match.py", line 21, in map_translations("MS_Temp.xml","MS_Res.xml"); File "MS_Match.py", line 9, in map_translations origin_text = [string.text for text_elem in origin_root.iter('Texts') AttributeError: _ElementInterface instance has no attribute 'iter' – user321956 Jun 04 '14 at 18:09
  • I have updated my code above. I'm not sure it will fix your bug because I was unable to reproduce the bug. Is your `MS_Temp.xml` the same as your above **REQUEST XML**? – BeetDemGuise Jun 04 '14 at 19:17
  • Hi Darin, Yes MS_Temp.xml is the same as Request XML and MS_Res.xml is Response XML mapping = map_translations(origin_file, translate_file) File "funcprime.py", line 7, in map_translations origin_text = [string.text for text_elem in origin_root.iter('Texts') AttributeError: _ElementInterface instance has no attribute 'iter' – user321956 Jun 04 '14 at 19:51
  • Just for your info, I'm using Python 2.6 – user321956 Jun 04 '14 at 19:52
  • I fixed the issue by changing the iter function to getiterator(). The code is now giving issues with the translate_tree which I m working on. – user321956 Jun 04 '14 at 20:08
  • Another error in the translate response code['Hello', 'World'] Traceback (most recent call last): File "funcprime.py", line 30, in mapping = map_translations(origin_file, translate_file) File "funcprime.py", line 19, in map_translations '{}TranslateArrayResponse'.format(namespace)) ValueError: zero length field name in format – user321956 Jun 04 '14 at 20:53
  • Take a look [here](http://stackoverflow.com/questions/5446964/valueerror-zero-length-field-name-in-format-error-in-python-3-0-3-1-3-2) to fix your problem. – BeetDemGuise Jun 05 '14 at 13:25
  • Thank you Darin. I fixed it. One more question for you. Now I need to integrate this logic to my existing program. My existing program produces the XML request and response dynamically. So instead of passing the XML request and response from files, I need to pass the input as variables. Could you please suggest on how I should approach? – user321956 Jun 05 '14 at 15:59
  • You can look at the documentation [here](https://docs.python.org/2.6/library/xml.etree.elementtree.html). If your XML is in a string, you can use `ElementTree.fromstring(xml)` to get the object you are expecting. – BeetDemGuise Jun 05 '14 at 16:59
  • Hi Darin, Could you please advise me on the below issue. http://stackoverflow.com/questions/24112731/filtering-the-nones-before-updating-the-table – user321956 Jun 09 '14 at 15:37