
I'm trying to parse dates in R with parse_date() from readr (loaded via tidyverse). Dates in English, German, French, etc. parse fine. For example:

    parse_date("1 Januar 2015", "%d %B %Y", locale = locale("de"))
    [1] "2015-01-01"

But when the month names are in a language written in Cyrillic (uk, ru, bg, be, etc.), parsing fails. For example:

    parse_date("1 січня  2015", "%d %B %Y", locale = locale("uk"))
    Warning: 1 parsing failure.
    row col           expected                           actual
    1  -- date like %d %B %Y 1 <f1><U+00B3><f7><ed><ff>  2015
    [1] NA

Or this one:

    parse_date("31 януари 2011", "%d %B %Y", locale = locale("bg"))
    Warning: 1 parsing failure.
    row col           expected                           actual
    1  -- date like %d %B %Y 31 <ff><ed><f3><e0><f0><e8> 2011
    [1] NA

The date names for all these locales are available. For example:

    date_names_lang("bg")
    <date_names>
    Days:   неделя (нд), понеделник (пн), вторник (вт), сряда (ср), четвъртък (чт), петък
    (пт), събота (сб)
    Months: януари (ян.), февруари (февр.), март (март), април (апр.), май (май), юни (юни),
    юли (юли), август (авг.), септември (септ.), октомври (окт.), ноември
    (ноем.), декември (дек.)
    AM/PM:  пр.об./сл.об.

What should I do to fix this? Thanks.

I found a solution (on Windows): convert the strings from Windows-1251 to UTF-8 with iconv() before parsing. Other approaches are welcome too. For example:

    date_test <- iconv("1 януари 2021","Windows-1251","UTF-8")
    date_test
    [1] "1 януари 2021"
    parse_date(date_test, "%d %B %Y", locale = locale("bg"))
    [1] "2021-01-01"
    date_test <- iconv("1 січня 2021","Windows-1251","UTF-8")
    date_test
    [1] "1 січня 2021"
    parse_date(date_test, "%d %B %Y", locale = locale("uk"))
    [1] "2021-01-01"
    date_test <- iconv("1 января 2021","Windows-1251","UTF-8")
    date_test
    [1] "1 января 2021"
    parse_date(date_test, "%d %B %Y", locale = locale("ru"))
    [1] "2021-01-01"
    date_test <- iconv("1 янв. 2021","Windows-1251","UTF-8")
    date_test
    [1] "1 янв. 2021"
    parse_date(date_test, "%d %b %Y", locale = locale("ru"))
    [1] "2021-01-01"
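The iconv() calls above assume the literals are stored as Windows-1251, which matches the Windows session they were run in. On a system whose native encoding is already UTF-8 (typical for Linux and macOS), the same literals are already UTF-8, and converting them from Windows-1251 again would turn them into mojibake. A minimal sketch of a guarded conversion, using a hypothetical helper `cp1251_to_utf8()` that only touches elements failing `validUTF8()`:

```r
# Hypothetical helper (not part of base R or readr): convert to UTF-8 only
# the elements that are not already valid UTF-8, assuming those are
# Windows-1251 (CP1251) text.
cp1251_to_utf8 <- function(x) {
  needs_fix <- !is.na(x) & !validUTF8(x)
  x[needs_fix] <- iconv(x[needs_fix], from = "Windows-1251", to = "UTF-8")
  x
}

# Example: the same date once as Windows-1251 bytes and once as UTF-8.
cp1251_input <- rawToChar(as.raw(c(0x31, 0x20, 0xf1, 0xb3, 0xf7, 0xed, 0xff,
                                   0x20, 0x32, 0x30, 0x32, 0x31)))
utf8_input <- "1 \u0441\u0456\u0447\u043d\u044f 2021"  # "1 січня 2021"

fixed <- cp1251_to_utf8(c(cp1251_input, utf8_input))
all(fixed == utf8_input)
#> [1] TRUE

# With readr loaded, `fixed` can then be passed to
# parse_date(fixed, "%d %B %Y", locale = locale("uk"))
```

The `validUTF8()` check is only a heuristic: a short Windows-1251 string can occasionally also be valid UTF-8 (the two bytes of "Рё", for instance, also form the UTF-8 letter "и"), so if you know the source encoding for certain, converting unconditionally is safer. If the strings come straight from the console or a script in the session's native encoding, base R's `enc2utf8()` may be an alternative that avoids hard-coding the code page.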
manro
  • Technically, Cyrillic is not a language, but several versions of alphabets used by several languages. – utubun Aug 23 '21 at 21:10
  • @utubun yes, you are right. I'll fix my question a little, one moment. But I think the main idea of the question is clear? – manro Aug 23 '21 at 21:14
  • Yup, the question is clear. – utubun Aug 23 '21 at 21:20
  • @Henrik yes, this is from the tidyverse package; readr is included there. I'll take a look and report back tomorrow on whether it works. – manro Aug 23 '21 at 21:39
  • @manro I know, parsing Cyrillic is hell, so congrats! Use backticks to embed code in comments. There is only one mistake in your code: everywhere, no matter what the language is, it must be [24 серпня](https://en.wikipedia.org/wiki/Independence_Day_of_Ukraine) (24 August). The rest is very good. – utubun Aug 23 '21 at 23:29
  • @utubun, Henrik, guys, I found a solution. Maybe someone can propose another one: `date_test <- iconv("1 січня 2021","Windows-1251","UTF-8")` gives `[1] "1 січня 2021"`, and `parse_date(date_test, "%d %B %Y", locale = locale("uk"))` gives `[1] "2021-01-01"`; `date_test <- iconv("1 янв. 2021","Windows-1251","UTF-8")` gives `[1] "1 янв. 2021"`, and `parse_date(date_test, "%d %b %Y", locale = locale("ru"))` gives `[1] "2021-01-01"`. – manro Aug 23 '21 at 23:31
  • @utubun oh, I can't get the code to embed, sorry ((( `code` – manro Aug 23 '21 at 23:35
  • @manro `code`, refer to the [markdown docs](https://www.markdownguide.org/basic-syntax/#code) – utubun Aug 23 '21 at 23:38
  • @utubun `One line of code works right`. I hope the moderators can correct my comment above ;) ^ – manro Aug 23 '21 at 23:41

1 Answer


I forgot to add my answer.

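When you are working with Cyrillic text, the base R function iconv() really helps: converting the strings from Windows-1251 to UTF-8 before calling parse_date() lets readr recognize the month names.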

See the iconv() documentation for more information about its functionality.

I added some examples to the question above. Here is one more, for Belarusian (locale("be")):

    library(tidyverse)

    date_test <- iconv("24 жніўня 2021", "Windows-1251", "UTF-8")
    parse_date(date_test, "%d %B %Y", locale = locale("be"))

    # [1] "2021-08-24"

    date_test <- iconv("11:15:10.12 пасля палудня", "Windows-1251", "UTF-8")
    parse_time(date_test, "%H:%M:%OS %p", locale = locale("be"))

    # [1] "23:15:10.12"

I would be glad to see other solutions too.

utubun
manro