In R, I am extracting tables from a PDF using the tabulizer library. The names in the tables are in Nepali, and after extraction I get this table: [1]: https://i.stack.imgur.com/Ltpqv.png

Now I want to convert the names in column 2 to their English equivalents.

Is there any way to do this in R?

The R code I wrote is:

```
library(tabulizer)

location <- "https://citizenlifenepal.com/wp-content/uploads/2019/10/2nd-AGM.pdf"
out <- extract_tables(location, pages = 113)
## write.table(out, file = "try.txt")

# Stack the extracted matrices and convert the result to a data frame
final <- do.call(rbind, out)
final <- as.data.frame(final)

col_name <- c("S.No.", "Types of Insurance", "Inforce Policy Count", "",
              "Sum Assured of Inforce Policies", "", "Sum at Risk", "",
              "Sum at Risk Transferred to Re-Insurer", "",
              "Sum At Risk Retained By Insurer", "")
names(final) <- col_name

# Drop the first row
final <- final[-1, ]
write.csv(final, file = "/cloud/project/Extracted_data/Citizen_life.csv", row.names = FALSE)
View(final)
```

zx8754
Rustam
  • This looks like a hard problem. Do you know what encoding is used for Nepali text in PDF files? What comes out looks like plain ascii; I wouldn't expect that if the file is using UTF-8 or other Unicode encodings. – user2554330 Jan 13 '21 at 11:08

1 Answer

It appears that the document uses a non-Unicode encoding. The website https://www.ashesh.com.np/preeti-unicode/ can convert some Nepali encodings to Unicode, which would display properly in R, assuming you have the right fonts installed. When I tried it on the output of your code, the result looked plausible to me, but I don't know Nepali:

> out[[1]][1,2]
[1] ";fjlws hLjg aLdf"

When I convert the contents of that string, I get

सावधिक जीवन बीमा

which looks to me something like the text on that page in the document. If it's actually written correctly, then converting it to English will need some Nepali speaker to do the translation: hopefully that's you, but if I use Google Translate, it gives

Term life insurance
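
If the table only contains a handful of distinct insurance types, the manual translation can be kept as a lookup table keyed on the raw strings that tabulizer returns, so no encoding conversion is needed at all. The following is a minimal sketch: the names `translations` and `translate_type` are hypothetical, and the single entry is the unverified Google Translate result quoted above. The remaining entries would have to be supplied by a Nepali speaker.

```r
# Lookup table: raw extracted string -> English label.
# Only one pair is known from this answer; the translation is unverified.
translations <- c(
  ";fjlws hLjg aLdf" = "Term life insurance"
)

# Replace known strings with their English label; leave unknown ones unchanged.
translate_type <- function(x, dict = translations) {
  x <- trimws(as.character(x))
  hit <- unname(dict[x])
  ifelse(is.na(hit), x, hit)
}

translate_type(c(";fjlws hLjg aLdf", "something else"))
```

Applied to the question's data frame, this could look like `final[[2]] <- translate_type(final[[2]])`. Strings without an entry pass through unchanged, which makes it easy to see what still needs translating.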

So here's my suggestion: contact the owner of the www.ashesh.com.np website and ask whether they can give you the conversion rules. If you can't find an existing R function that implements them, write one yourself. Then do the English translations manually.
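
To give an idea of what such a conversion function might look like, here is a rough sketch. The character map is an assumption: it was inferred only from the single string and its converted form shown above, so it covers just eleven characters and is not the real rule set. The names `preeti_map` and `preeti_to_unicode` are hypothetical. The one structural rule visible in the example is that the short "i" vowel sign is stored before its consonant in the extracted text, while Unicode expects it after.

```r
# Partial character map, inferred from one example only (an assumption,
# not the converter's actual rules).
preeti_map <- c(
  ";" = "\u0938",  # sa
  "f" = "\u093e",  # vowel sign aa
  "j" = "\u0935",  # va
  "l" = "\u093f",  # vowel sign i (stored before its consonant)
  "w" = "\u0927",  # dha
  "s" = "\u0915",  # ka
  "h" = "\u091c",  # ja
  "L" = "\u0940",  # vowel sign ii
  "g" = "\u0928",  # na
  "a" = "\u092c",  # ba
  "d" = "\u092e"   # ma
)

# Convert a single string; characters missing from the map are kept as-is.
preeti_to_unicode <- function(x) {
  chars <- strsplit(x, "", fixed = TRUE)[[1]]
  # Move each short-i sign behind the character that follows it.
  i <- 1
  while (i < length(chars)) {
    if (chars[i] == "l") {
      chars[c(i, i + 1)] <- chars[c(i + 1, i)]
      i <- i + 2
    } else {
      i <- i + 1
    }
  }
  mapped <- unname(preeti_map[chars])
  unknown <- is.na(mapped)
  mapped[unknown] <- chars[unknown]
  paste(mapped, collapse = "")
}

preeti_to_unicode(";fjlws hLjg aLdf")
```

The function handles one string at a time; for a whole column something like `vapply(as.character(final[[2]]), preeti_to_unicode, character(1), USE.NAMES = FALSE)` should work. A complete converter would also need rules for conjuncts, half letters, and the remaining characters, which this sketch does not attempt.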

user2554330