How can we store the value of tag हिन्दी in a variable

Question

How can we store the value of the tag <span class="sdr-full-width">हिन्दी</span>, that is "हिन्दी", in a variable? I tried to extract it with an XPath expression, but I get the Unicode escapes \u0939\u093f\u0928\u094d\u0926\u0940 instead.

Serge Ballesta · Accepted Answer · 2017-08-18T09:27:42.707


Then you got it right!

If your environment can display Devanagari characters, this code:

t = u"\u0939\u093f\u0928\u094d\u0926\u0940"
print(t)

should display

हिन्दी

With the help of the unicodedata module, I could even break it down one character at a time:

>>> import unicodedata
>>> for c in t:
...     print(c, unicodedata.name(c))

ह DEVANAGARI LETTER HA
ि DEVANAGARI VOWEL SIGN I
न DEVANAGARI LETTER NA
् DEVANAGARI SIGN VIRAMA
द DEVANAGARI LETTER DA
ी DEVANAGARI VOWEL SIGN II

I cannot say more because I really do not understand the meaning of the word...
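To make the point concrete, here is a minimal sketch, assuming Python 3 (where every str is Unicode). It shows that the escaped spelling and the literal word are the same six-character string, and that the backslashes only appear in the ASCII-safe representation. The variable names are illustrative, not taken from the question.

```python
# Assumes Python 3; on Python 2 the literals would need a u"" prefix
# and a source encoding declaration.
escaped = "\u0939\u093f\u0928\u094d\u0926\u0940"  # written with escapes
literal = "हिन्दी"                                  # written directly

# Both spellings produce the same string object contents.
same = escaped == literal

# The string holds six code points, not thirty-six ASCII characters.
code_points = ["U+{:04X}".format(ord(c)) for c in escaped]

# ascii() gives the escaped, ASCII-only representation (what a repr
# on Python 2 looks like); print() of the string itself shows the glyphs.
ascii_form = ascii(escaped)

print(same)
print(code_points)
print(ascii_form)
print(escaped)
```

If a value shows up with backslashes in a log or a shell but prints correctly with a plain print, the stored string is already the right one.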

edited Aug 18 '17 at 09:27

answered Aug 18 '17 at 08:53


It is the name of a language (हिन्दी) spoken in India. When I tried t = "\u0939\u093f\u0928\u094d\u0926\u0940" I got \u0939\u093f\u0928\u094d\u0926\u0940, but when I added the u prefix, as in u"\u0939\u093f\u0928\u094d\u0926\u0940", it gave the proper result. The thing is that the value is already stored in a variable and I want to display it as हिन्दी. The actual problem is this: item['Languages'] = response.xpath('//p/b/text()').extract_first() and I got \u0939\u093f\u0928\u094d\u0926\u0940 as item['language'] – Pradeep Mishra Aug 18 '17 at 09:19
@PradeepMishra: My bad, I forgot the u because I did my test with Python 3.6, which treats all strings as Unicode. You should try `print(item['Languages'])` and `print(repr(item['Languages']))` and tell me *exactly* what values are displayed. – Serge Ballesta Aug 18 '17 at 09:37
I am using the item in a spider, so I do yield item, and it gives me the same \u0939\u093f\u0928\u094d\u0926\u0940. I also tried print(repr(item['Language'])) in the scrapy shell and it gives me the same output. – Pradeep Mishra Aug 18 '17 at 09:48
@PradeepMishra: at least `print(repr(item['Language']))` should contain quotation marks. The problem could be caused by extra quotation marks, which is why I asked you for the *exact* output. – Serge Ballesta Aug 18 '17 at 09:53
print(repr(item['Language'])) gives u'\u0939\u093f\u0928\u094d\u0926\u0940' – Pradeep Mishra Aug 18 '17 at 09:56
@PradeepMishra: and what *exactly* does `print(item['Language'])` give? – Serge Ballesta Aug 18 '17 at 10:03
Oh great! It gives the output **हिन्दी**. But how can I write this to JSON or a database? In the spider I create a dictionary with item = {}, store each value in it (item['name'], item['language'], etc.), and yield the item at the end. I run `scrapy crawl spiderName -o name.json` to get the item values in a JSON file. – Pradeep Mishra Aug 18 '17 at 10:12
@PradeepMishra: If print gives the correct output, it just means that you have extracted the correct unicode string. According to [this other SO post](https://stackoverflow.com/a/4908960/3545273), JSON strings are Unicode and a \uXXXX escape is a valid way to write a character, so in JSON, "\u0939\u093f\u0928\u094d\u0926\u0940" is a correct representation of हिन्दी. IMHO, you really have extracted correctly what you want. – Serge Ballesta Aug 18 '17 at 10:24
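
The last comment can be checked with the standard json module. This is a minimal sketch, assuming Python 3; the item dictionary is a stand-in for the spider's item, not code from the thread. It shows that the escaped and the readable JSON text decode to the same value, and that `ensure_ascii=False` is what switches json.dumps to the readable form.

```python
import json

# Hypothetical stand-in for the item yielded by the spider.
item = {"Language": "\u0939\u093f\u0928\u094d\u0926\u0940"}

# Default: non-ASCII characters are written as \uXXXX escapes.
escaped_json = json.dumps(item)

# ensure_ascii=False keeps the characters as they are.
readable_json = json.dumps(item, ensure_ascii=False)

# Any JSON parser reads both texts back to the same string.
round_trip_ok = json.loads(escaped_json) == json.loads(readable_json) == item

print(escaped_json)
print(readable_json)
print(round_trip_ok)
```

For the `scrapy crawl spiderName -o name.json` case, Scrapy documents a `FEED_EXPORT_ENCODING` setting; setting it to `'utf-8'` should make the JSON feed contain the characters themselves rather than escapes. That setting is not mentioned in the thread, so treat it as a suggestion to verify against the Scrapy version in use.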
