I wrote an R program that reads a report available online. Its first two lines are:

page1 <- readLines("http://reportviewer.tce.mg.gov.br/default.aspx?server=noruega&relatorio=SICOM_Consulta/2013_2014/Modulo_AM/UC03-LeisOrc-RL&municipioSelecionado=3100203&exercicioSelecionado=2014")
line1 <- grep("Leis Autorizativas",page1)

The rest of the program worked fine and I got the data I needed. Then I tried to adapt it to read a different report, but this time the second line didn't work:

page2 <- readLines("http://reportviewer.tce.mg.gov.br/default.aspx?server=noruega&relatorio=SICOM_Consulta/2013_2014/Modulo_AM/UC08-ConsultarDecretos-RL&municipioSelecionado=3101607&exercicioSelecionado=2013")
line2 <- grep("Decretos de Alterações",page2)

In the 1st case 'page1' is a character vector and in the 2nd case 'page2' is a large character vector. Is it possible that this difference caused the problem? If so, does anybody have a hint on how to fix it?

(Using htmltab() or readHTMLTable() didn't produce good results.)

Thank you.

Cyrus
ViniLima

1 Answer

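That's because "Decretos de Alterações" is not composed entirely of ASCII characters. In the page source, ç and õ are written as HTML numeric character references (&#231; and &#245;), so the literal string never matches.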

If you try with

page2 <- readLines("http://reportviewer.tce.mg.gov.br/default.aspx?server=noruega&relatorio=SICOM_Consulta/2013_2014/Modulo_AM/UC08-ConsultarDecretos-RL&municipioSelecionado=3101607&exercicioSelecionado=2013")

grep("Decretos de Altera&#231;&#245;es ", page2)

[1] 366

It works.
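A quick way to see why the encoded pattern matches is to search for an ASCII-only fragment and print the matching line itself. The sketch below uses a small made-up vector, `sample_page`, standing in for `page2` (its contents are an assumption, modeled on the pattern above), so it runs without network access:

```r
# Hypothetical stand-in for page2; the real page source is assumed to
# store the accented letters as numeric character references.
sample_page <- c(
  "<html>",
  "<span>Decretos de Altera&#231;&#245;es </span>",
  "</html>"
)

# Search with the ASCII-only part and return the matching text, not the index
hits <- grep("Decretos de", sample_page, value = TRUE, fixed = TRUE)
print(hits)
```

Running the same `grep(..., value = TRUE)` call on `page2` should show how the title is actually written in the downloaded source.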

To find the number to use for each non-ASCII character, call utf8ToInt() on it:

utf8ToInt("ç")
[1] 231

Then put the resulting number between &# and ; (here, &#231;) and use that in place of the non-ASCII letter in your pattern.
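The replacement can also be automated. The following is a minimal sketch, not part of the code above: `to_html_entities()` is a hypothetical helper name, and it assumes the page writes every non-ASCII character as a decimal reference of the form `&#N;`.

```r
# Hypothetical helper: rewrite each non-ASCII character as &#N;
to_html_entities <- function(x) {
  codes <- utf8ToInt(enc2utf8(x))             # one code point per character
  chars <- intToUtf8(codes, multiple = TRUE)  # split back into single characters
  non_ascii <- codes > 127
  chars[non_ascii] <- paste0("&#", codes[non_ascii], ";")
  paste(chars, collapse = "")
}

# \u escapes keep this script ASCII-only; the string is "Decretos de Alterações"
pattern <- to_html_entities("Decretos de Altera\u00e7\u00f5es")
print(pattern)
```

The result can then be passed to `grep(pattern, page2, fixed = TRUE)`; `fixed = TRUE` makes grep treat the pattern as a literal string rather than a regular expression.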

Best

Colin

Colin FAY