
I am working on a Hindi dataset for a project. After pre-processing the data, I am creating a word cloud from it. I am using the "gargi" font to render the Hindi words, but I have a problem with the vowel sign (मात्रा) "ि". In the word cloud, this sign is drawn next to the letter it is supposed to be on; for example, पुलिस is rendered as पुलसि. (Please refer to the attached image, in particular the word किसान and the position of its मात्रा relative to the letter (वर्ण).) Several other words in this word cloud show a similar issue. I have also tried other fonts, such as "lohit-devnagri" and "samyak-devnagri".

import matplotlib.pyplot as plt
from wordcloud import WordCloud

font = "gargi.ttf"

figure, axis = plt.subplots(2, 2, figsize=(16, 10))

# counter_kisaan maps each Hindi word to its frequency (built during pre-processing)
wordcloud_kisaan = WordCloud(width=1000, height=700,
                background_color='white',
                min_font_size=10, font_path=font).generate_from_frequencies(counter_kisaan)

axis[0][0].imshow(wordcloud_kisaan, interpolation="bilinear")
axis[0][0].axis('off')
axis[0][0].set_title('Kisaan Andolan', fontsize=22)

# plt.axis acts on the current axes only (here the last subplot)
plt.axis("off")
figure.tight_layout(pad=5.0)

plt.show()
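
To rule out the pre-processing step as the source of the problem, a small diagnostic along these lines may help. It is only a sketch, assuming the keys of counter_kisaan are ordinary Unicode strings; describe_codepoints is a hypothetical helper written for this check, not part of wordcloud or matplotlib. Unicode stores Devanagari in logical order, so the vowel sign ि (U+093F) comes after its consonant in the string even though it is drawn to the left of it. If the output lists ल directly followed by DEVANAGARI VOWEL SIGN I, the data itself is fine, and the misplacement more likely comes from how the text is laid out when it is drawn than from the font or the dataset.

```python
import unicodedata


def describe_codepoints(word):
    """Return (character, code point, Unicode name) for every code point in word."""
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in word]


# Example word from the question; any key of counter_kisaan could be used instead.
word = "पुलिस"

for ch, codepoint, name in describe_codepoints(word):
    # Print the storage order of the string, one code point per line.
    print(codepoint, name)
```
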
Samya Jain
  • Do you have the same problem if you plot the words in English? – s510 Sep 28 '22 at 10:27
  • @the_ordinary_guy No, I haven't plotted them in English, since I am working on a Hindi dataset. – Samya Jain Sep 29 '22 at 08:09
  • What I meant was that it might just be a transformation issue! If so, you just need to find a way to do the transformation correctly, without the accent issue. – s510 Sep 29 '22 at 09:31

0 Answers