-3

How do I extract the address (39/4B.......700025) without the \r\n from the text below?

Text<-"From :\r\nName         : NAMITA ROY\r\nAddress       : 39/4B\r\n                 GOPALNAGAR ROAD\r\n                 ALIPORE\r\n                 KOLKATA,WEST BENGAL\r\n                 700027\r\nEntity \r\nName         : SWARNABARSA PROJECTS PRIVATE LIMITED\r\nAddress       : 90A\r\n                 RAJ SEKHAR BOSE SARANI, FLAT NO.1D, 1ST FLOOR\r\n                 KOLKATA,WEST BENGAL\r\n                 INDIA - 700025\r\nFull Particulars of Remittance\r\nService Type: eFiling\r\n"
Has QUIT--Anony-Mousse
kroy

3 Answers

3

Try

trimws(unlist(strsplit(unlist(strsplit(gsub("\r\n|\\s+", " ", Text), ":"))[4], "Entity Name"))[1])

# [1] "39/4B GOPALNAGAR ROAD ALIPORE KOLKATA,WEST BENGAL 700027"
hpesoj626
  • Thank you for the code, but for this file you have matched up to "Entity". What about other files where the word "Entity" might not be present? I think we should probably match up to the ZIP code instead. – kroy Apr 27 '18 at 12:47
2

My code takes everything after "Address :" up to and including the first 6 digits (the ZIP code).

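 library(magrittr) # provides %>%

 strsplit(Text,"Name(\\s+)?:")[[1]][-1] %>% list %>%
    # keep the text after "Address :" through the first 6-digit run
    lapply(function(x)gsub(x=x,pattern="[\\s\\S]*?Address\\s+:([\\s\\S]*?\\d{6})[\\s\\S]*?$",replacement="\\1",perl=TRUE)) %>%
    # drop line breaks, then collapse whitespace and trim
    lapply(function(x)gsub(x=x,pattern="\\r|\\n",replacement="",perl=TRUE)) %>%
    lapply(function(x)trimws(gsub(x=x,pattern="\\s+",replacement=" ",perl=TRUE)))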

result:

[[1]]
[1] "39/4B GOPALNAGAR ROAD ALIPORE KOLKATA,WEST BENGAL 700027"                            
[2] "90A RAJ SEKHAR BOSE SARANI, FLAT NO.1D, 1ST FLOOR KOLKATA,WEST BENGAL INDIA - 700025"
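The same "stop at the ZIP code" idea can also be sketched in base R without the pipe, which avoids relying on the word "Entity" being present. This is only a sketch: the helper name `extract_addresses` and the shortened `sample_text` are made up for illustration, and it assumes every address ends in a 6-digit ZIP code.

```r
# Hypothetical helper (not from the answers above): return every address in
# `text`, from just after "Address :" through the first 6-digit ZIP code.
extract_addresses <- function(text) {
  # (?s) lets "." match line breaks; the lazy ".*?" stops at the first 6 digits
  pattern <- "(?s)Address\\s*:\\s*(.*?\\d{6})"
  hits <- regmatches(text, gregexpr(pattern, text, perl = TRUE))[[1]]
  # strip the "Address :" label, then collapse all whitespace (including \r\n)
  hits <- sub("^Address\\s*:\\s*", "", hits, perl = TRUE)
  trimws(gsub("\\s+", " ", hits, perl = TRUE))
}

# Shortened, made-up sample in the same layout as the question's text
sample_text <- "Name : A\r\nAddress       : 39/4B\r\n                 ALIPORE\r\n                 700027\r\nEntity \r\n"
extract_addresses(sample_text)
# [1] "39/4B ALIPORE 700027"
```

If no address is found, the function returns an empty character vector, so the caller can check `length()` before using the result.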
Andre Elrico
1

Try this way:

Text<-"From :\r\nName         : NAMITA ROY\r\nAddress       : 39/4B\r\n                 GOPALNAGAR ROAD\r\n                 ALIPORE\r\n                 KOLKATA,WEST BENGAL\r\n                 700027\r\nEntity \r\nName         : SWARNABARSA PROJECTS PRIVATE LIMITED\r\nAddress       : 90A\r\n                 RAJ SEKHAR BOSE SARANI, FLAT NO.1D, 1ST FLOOR\r\n                 KOLKATA,WEST BENGAL\r\n                 INDIA - 700025\r\nFull Particulars of Remittance\r\nService Type: eFiling\r\n"

library(stringr)

# Collapse every run of whitespace (including \r\n) into a single space
Text<-gsub("\\s+", " ", str_trim(Text))

# Keep what follows the first "Address : " label
address_dirty<-unlist(strsplit(Text,split = "Address : "))[2]
posiz<-regexpr("[0-9]{6,}",address_dirty) # Find the ZIP code position
address<-substr(address_dirty,1,posiz[1]+5) # Cut after the 6th digit of the ZIP code
address
# [1] "39/4B GOPALNAGAR ROAD ALIPORE KOLKATA,WEST BENGAL 700027"

The code extracts the address between the string "Address" and the ZIP code.
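One thing to watch for in files that differ from this one: `regexpr` returns -1 when there is no match, so without a ZIP code `substr` would silently return only the first few characters. A minimal sketch of a guarded version follows, using base R only. The helper name `first_address` is hypothetical, and returning `NA` on failure is an assumption about what the caller wants.

```r
# Hypothetical helper: first address in `text`, or NA if none can be found
first_address <- function(text) {
  # collapse \r\n and padding into single spaces
  flat <- trimws(gsub("\\s+", " ", text))
  parts <- strsplit(flat, "Address : ", fixed = TRUE)[[1]]
  # no "Address : " label in the text
  if (length(parts) < 2) return(NA_character_)
  # start of the first 6-digit run, or -1 if there is none
  zip <- as.integer(regexpr("[0-9]{6}", parts[2]))
  if (zip == -1) return(NA_character_)
  # keep everything through the 6th digit of the ZIP code
  substr(parts[2], 1, zip + 5)
}
```

Calling `first_address(Text)` on the original, unmodified text should give the same string as above, while a text without a label or without a ZIP code gives `NA`.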

Terru_theTerror
  • Updated with the new input formatting that you provided. – Terru_theTerror Apr 27 '18 at 10:16
  • Thank you for the code, but for this file you have matched up to "Entity". What about other files where the word "Entity" might not be present? I think we should probably match up to the ZIP code instead. – kroy Apr 27 '18 at 12:45