I am using labelImg to draw a rectangle around the rows of a table in an image, which gives me an XML file. How can I use this XML to extract the text from the table in the image? I first tried horizontal and vertical line detection to extract the text, but the results were not good. labelImg now gives me the coordinates of the text I want to extract, but I do not know how to use them. Please tell me how to do that.

My XML file:

    <annotation>
      <folder>Test Images</folder>
      <filename>FreKa.jpg</filename>
      <path>/home/sumit/Desktop/office_works/Fusion_Code/BIS_Final/Test Images/FreKa.jpg</path>
      <source>
        <database>Unknown</database>
      </source>
      <size>
        <width>679</width>
        <height>341</height>
        <depth>3</depth>
      </size>
      <segmented>0</segmented>
      <object>
        <name>Contact Type</name>
        <pose>Unspecified</pose>
        <truncated>1</truncated>
        <difficult>0</difficult>
        <bndbox>
          <xmin>1</xmin>
          <ymin>100</ymin>
          <xmax>678</xmax>
          <ymax>157</ymax>
        </bndbox>
      </object>
    </annotation>

My input image:

[Image: Input images]

How can I extract the contract type row from the table with the help of the XML file? Thanks.

Amit Saini
  • Which value in the XML do you want to get? You can use `xpath()` from the `lxml` module for this, e.g. `'//annotation/object/bndbox/xmin'` – furas May 13 '21 at 19:44
  • Or you could use a regex: `re.findall('(\d+)', text)` – furas May 13 '21 at 19:48
  • With the help of the XML, I want to extract the third row of the image. – Amit Saini May 14 '21 at 03:35
  • Can you tell me how to get the object name (such as contract, contract description, etc.) from the XML? – Amit Saini May 14 '21 at 08:00
  • I got all the points and extracted the text. It works perfectly. One more question: how do I count the number of object names in the XML file? – Amit Saini May 14 '21 at 13:17
  • If you mean `<object>`, then `//annotation/object` should give you a list with all `<object>` elements, and you can use `len(list_with_objects)`. Or you can do the same with `//annotation/object/name`: it should give you a list with all names, and you can use `len(list_with_names)` – furas May 14 '21 at 13:21
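
The counting and name lookup discussed in the comments above could be sketched roughly as follows. This is only an illustration: it uses the standard-library `xml.etree.ElementTree` instead of `lxml` so that it runs without extra packages, and the second `<object>` (its name and coordinates) is made up for the example, as is the helper name `read_objects`.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in the labelImg layout; the second object is invented.
SAMPLE_XML = """
<annotation>
  <object>
    <name>Contact Type</name>
    <bndbox><xmin>1</xmin><ymin>100</ymin><xmax>678</xmax><ymax>157</ymax></bndbox>
  </object>
  <object>
    <name>Contact Description</name>
    <bndbox><xmin>1</xmin><ymin>160</ymin><xmax>678</xmax><ymax>210</ymax></bndbox>
  </object>
</annotation>
"""


def read_objects(xml_text):
    """Return a list of (name, (xmin, ymin, xmax, ymax)) for every <object>."""
    root = ET.fromstring(xml_text)  # root is the <annotation> element
    result = []
    for obj in root.findall("object"):
        name = obj.findtext("name")
        bndbox = obj.find("bndbox")
        box = tuple(
            int(bndbox.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        result.append((name, box))
    return result


objects = read_objects(SAMPLE_XML)
print("number of objects:", len(objects))
for name, box in objects:
    print(name, box)
```

With `lxml`, the same information should be reachable through `tree.xpath("//object")` and `tree.xpath("//object/name/text()")`.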

1 Answer

To get `xmin`, you can use `xpath()` with `'//annotation/object/bndbox/xmin'` or the even shorter `'//xmin'`.

`xpath()` always returns a list (even if there is only one element, or none), so you need `[0]` to get the first element or a `for` loop to work with all elements.

With `if list_of_elements: ...` you can run code only when the list has some elements.

You can also use `len()` to check how many elements you got.

    import lxml.etree

    text = '''
    <annotation>
      <folder>Test Images</folder>
      <filename>FreKa.jpg</filename>
      <path>/home/sumit/Desktop/office_works/Fusion_Code/BIS_Final/Test Images/FreKa.jpg</path>
      <source>
         <database>Unknown</database>
      </source>
      <size>
         <width>679</width>
         <height>341</height>
         <depth>3</depth>
      </size>
      <segmented>0</segmented>
      <object>
         <name>Contact Type</name>
         <pose>Unspecified</pose>
         <truncated>1</truncated>
         <difficult>0</difficult>
         <bndbox>
           <xmin>1</xmin>
           <ymin>100</ymin>
           <xmax>678</xmax>
           <ymax>157</ymax>
         </bndbox>
      </object>
    </annotation>
    '''

    tree = lxml.etree.fromstring(text)

    # different paths that select the same element
    print('xmin:', tree.xpath("//annotation/object/bndbox/xmin")[0].text)
    print('xmin:', tree.xpath("//bndbox/xmin")[0].text)
    print('xmin:', tree.xpath("//object//xmin")[0].text)
    print('xmin:', tree.xpath("//xmin")[0].text)

    print('xmin:', tree.xpath("//xmin/text()")[0])  # with `text()` instead of `.text`

    for item in tree.xpath("//xmin/text()"):
        print('xmin:', item)  # with `text()` instead of `.text`

    # count all <object> elements
    objects = tree.xpath("//object")
    print('len(objects):', len(objects))

    # an empty list is falsy, so `if` skips elements that do not exist
    other = tree.xpath("//bndbox/other")
    if other:
        print('found', len(other), 'elements')
    else:
        print('there are no "other" elements')
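
Once the four `bndbox` values are read, they can be used to cut the row out of the image and pass it to an OCR engine. The sketch below assumes Pillow and `pytesseract` (with the Tesseract binary) are installed; neither is mentioned in the question, and `clamp_box` and `ocr_box` are hypothetical helper names.

```python
def clamp_box(box, width, height):
    """Clip (xmin, ymin, xmax, ymax) so it stays inside a width x height image."""
    xmin, ymin, xmax, ymax = box
    return (max(0, xmin), max(0, ymin), min(width, xmax), min(height, ymax))


def ocr_box(image_path, box, lang="eng"):
    """Crop `box` from the image and return the text Tesseract finds in it."""
    # Imported here so clamp_box() can be used without the OCR dependencies.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as image:
        # Image.crop() expects (left, upper, right, lower), the same order
        # as (xmin, ymin, xmax, ymax) in the XML.
        region = image.crop(clamp_box(box, image.width, image.height))
        return pytesseract.image_to_string(region, lang=lang)


# The box from the question already fits inside the 679x341 image.
print(clamp_box((1, 100, 678, 157), 679, 341))
```

For the annotation in the question the call would look like `ocr_box("FreKa.jpg", (1, 100, 678, 157))`. For Hindi text, `lang="hin"` should work, provided that Tesseract language pack is installed.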
furas
  • Hi @furas, can you tell me how to count the number of lines inside the bounding boxes that contain some words (Hindi)? – Amit Saini May 19 '21 at 11:36
  • I don't know if I understand the problem: first you would have to convert the boxes to strings, and then check whether they contain Hindi characters. Or check for characters other than English ones, which could be simpler. – furas May 19 '21 at 12:23
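
One possible reading of the suggestion in the last comment, as a rough sketch: after the box has been converted to a string, count the lines that contain at least one character from the Unicode Devanagari block (U+0900 to U+097F), which is the script used for Hindi. The sample string stands in for OCR output and is invented for the example, as are the function names.

```python
def has_devanagari(line):
    """True if the line has at least one character in U+0900..U+097F."""
    return any("\u0900" <= ch <= "\u097f" for ch in line)


def count_hindi_lines(text):
    """Count the lines of `text` that contain Devanagari characters."""
    return sum(1 for line in text.splitlines() if has_devanagari(line))


# Hypothetical OCR output: two Hindi lines, one empty line, one English line.
sample = "नमस्ते दुनिया\n\nContact Type\nसंपर्क प्रकार"
print(count_hindi_lines(sample))
```

Empty lines and lines with only Latin characters are skipped, so the count covers only the lines that actually hold Hindi words.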