How can I train the EAST text detector on my custom data? I could not find any blog posts online that show a step-by-step procedure for this. Here is what I currently have.
I have a folder that contains all the images, along with a corresponding XML file for each image that specifies where the text is located.
Example:
<annotation>
<folder>Dataset</folder>
<filename>FFDDAPMDD1.png</filename>
<path>C:\Users\HPO2KOR\Desktop\Work\venv\Patent\Dataset\Dataset\FFDDAPMDD1.png</path>
<source>
<database>Unknown</database>
</source>
<size>
<width>839</width>
<height>1000</height>
<depth>3</depth>
</size>
<segmented>0</segmented>
<object>
<name>text</name>
<pose>Unspecified</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>522</xmin>
<ymin>29</ymin>
<xmax>536</xmax>
<ymax>52</ymax>
</bndbox>
</object>
<object>
<name>text</name>
<pose>Unspecified</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>510</xmin>
<ymin>258</ymin>
<xmax>521</xmax>
<ymax>281</ymax>
</bndbox>
</object>
<object>
<name>text</name>
<pose>Unspecified</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>546</xmin>
<ymin>528</ymin>
<xmax>581</xmax>
<ymax>555</ymax>
</bndbox>
</object>
<object>
<name>text</name>
<pose>Unspecified</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>523</xmin>
<ymin>646</ymin>
<xmax>555</xmax>
<ymax>674</ymax>
</bndbox>
</object>
<object>
<name>text</name>
<pose>Unspecified</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>410</xmin>
<ymin>748</ymin>
<xmax>447</xmax>
<ymax>776</ymax>
</bndbox>
</object>
<object>
<name>text</name>
<pose>Unspecified</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>536</xmin>
<ymin>826</ymin>
<xmax>567</xmax>
<ymax>851</ymax>
</bndbox>
</object>
<object>
<name>text</name>
<pose>Unspecified</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>792</xmin>
<ymin>918</ymin>
<xmax>838</xmax>
<ymax>945</ymax>
</bndbox>
</object>
</annotation>
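For reference, here is a minimal sketch of how an annotation file like the one above can be read into a list of boxes using only the Python standard library. The function names `voc_boxes` and `to_annotation_line` are hypothetical, and the class id `0` is an assumption taken from the single `text` class in my data.

```python
import xml.etree.ElementTree as ET


def voc_boxes(xml_text):
    """Return (filename, [(xmin, ymin, xmax, ymax), ...]) for one annotation."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    boxes = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        # Read the four corner values in a fixed order.
        box = tuple(
            int(bndbox.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append(box)
    return filename, boxes


def to_annotation_line(image_path, boxes, class_id=0):
    """Build one 'path x1,y1,x2,y2,cls ...' line (class_id is an assumption)."""
    parts = [",".join(str(v) for v in box + (class_id,)) for box in boxes]
    return " ".join([image_path] + parts)
```

Calling `voc_boxes(open(xml_path).read())` on each XML file and passing the result to `to_annotation_line` gives one text line per image.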
I also have the parsed XML annotations for each image in the format used to train YOLO models (one line per image: the image path followed by `xmin,ymin,xmax,ymax,class` for each box).
Example:
C:\Users\HPO2KOR\...\text\FFDDAPMDD1.png 522,29,536,52,0 510,258,521,281,0 546,528,581,555,0 523,646,555,674,0 410,748,447,776,0 536,826,567,851,0 792,918,838,945,0 660,918,706,943,0 63,1,108,24,0 65,51,110,77,0 65,101,109,126,0 63,151,110,175,0 63,202,109,228,0 63,252,110,276,0 63,303,110,330,0 62,353,110,381,0 65,405,109,434,0 90,457,110,482,0 59,505,101,534,0 64,565,107,590,0 61,616,107,644,0 62,670,103,694,0 62,725,104,753,0 63,778,104,804,0 62,831,100,857,0 87,887,106,912,0 98,919,144,943,0 240,916,284,943,0 378,915,420,943,0 520,918,565,942,0
C:\Users\HPO2KOR\...\text\FFDDAPMDD2.png 91,145,109,171,0 68,192,106,218,0 92,239,111,265,0 69,286,108,311,0 92,333,107,357,0 66,379,110,405,0 90,424,111,451,0 69,472,107,497,0 91,518,109,545,0 66,564,109,590,0 90,613,110,637,0 121,644,140,670,0 279,643,322,671,0 446,645,490,668,0 615,642,661,669,0 786,643,831,667,0 954,643,997,672,0 820,22,866,50,0 823,73,866,103,0
C:\Users\HPO2KOR\...\text\FFDDAPMDD3.png 648,1,698,30,0 68,64,129,91,0 55,144,128,168,0 70,218,129,247,0 56,300,127,326,0 71,377,125,404,0 58,459,127,482,0 109,535,130,560,0 140,568,160,594,0 344,568,382,594,0 563,566,581,591,0 760,568,800,593,0 982,569,1000,591,0
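In case it helps, here is a minimal sketch of reading these lines back into boxes and expanding each axis-aligned box into four corner points, listed clockwise from the top-left as in ICDAR-style ground truth. Whether a given EAST training script expects exactly this layout is an assumption on my part, not something I have confirmed. The function names and the `"text"` label are hypothetical, and the parser assumes the image path contains no spaces.

```python
def parse_line(line):
    """Split 'path x1,y1,x2,y2,cls ...' into (path, [(x1, y1, x2, y2, cls), ...])."""
    # Assumption: the path has no spaces, so the first token is the whole path.
    path, *raw_boxes = line.split()
    boxes = [tuple(int(v) for v in raw.split(",")) for raw in raw_boxes]
    return path, boxes


def box_to_quad(xmin, ymin, xmax, ymax):
    """Corner points clockwise from top-left: TL, TR, BR, BL."""
    return [xmin, ymin, xmax, ymin, xmax, ymax, xmin, ymax]


def quad_lines(boxes, label="text"):
    """One 'x1,y1,x2,y2,x3,y3,x4,y4,label' line per box (label is an assumption)."""
    lines = []
    for box in boxes:
        quad = box_to_quad(*box[:4])  # drop the trailing class id
        lines.append(",".join(str(v) for v in quad + [label]))
    return lines
```

Writing the output of `quad_lines` to one `.txt` file per image would give a per-image ground-truth file with eight coordinates and a label on each line.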
What is the procedure to train the EAST text detector on my custom dataset? I am on Windows.