Good afternoon. I am not an expert on atomic vectors, but I would like some ideas on the following problem.

I have the script for the movie "Coco". Each scene is marked by a line that holds a number in the form 1., 2., ... (130 scenes throughout the movie). I want to convert each of those scene lines into "Scene 1", "Scene 2", up to "Scene 130", numbered sequentially.

url <- "https://www.imsdb.com/scripts/Coco.html"

library(readr)
coco <- read_lines("coco2.txt") # after cleaning
class(coco)
typeof(coco)

"                                                                        48."      
 [782] "     arms full of offerings."                                                     
 [783] "      Once the family clears, Miguel is nowhere to be seen."                      
 [784] "      INT. NEARBY CORRIDOR"                                                       
 [785] "     Miguel and Dante hide from the patrolman.     But Dante wanders"             
 [786] "     off to inspect a side room."                                                 
 [787] "      INT. DEPARTMENT OF CORRECTIONS"                                             
 [788] "     Miguel catches up to Dante.      He overhears an exchange in a"              
 [789] "     nearby cubicle."                                                             

 [797] "                                                          49."                    
 [798] "                 And amigos, they help their amigos."                             
 [799] "                 worth your while."                                               
 [800] "     workstation."                                                                
 [801] "      Miguel perks at the mention of de la Cruz."                                 


 [809] "      Miguel follows him."                                                        
 [810] "                                                                     50." # Its scene number     
 [811] "      INT. HALLWAY"      


s <- grep(coco, pattern = "[^Level].[0-9].$", value = TRUE)

My solution is wrong because the result is not sequential:

v <- gsub(s, pattern = "[^Level].[0-9].$", replacement = paste("Scene", sequence(1:130)))


[1] "                                                                   Scene1"          
  [2] "                                                                   Scene1"          
  [3] "                                                                  Scene1"           
  [4] "                                                                       Scene1"      
  [5] "                                                                    Scene1"         
  [6] "                                                                   Scene1"          
phiver
  • What strings in the text you show are you trying to find with `grep(coco, pattern = "[^Level].[0-9].$", value = TRUE)`? – WaltS Mar 04 '20 at 17:50
  • library(readr)
    coco <- read_lines("https://www.imsdb.com/scripts/Coco.html", skip = 238, skip_empty_rows = TRUE, locale = default_locale())
    coco <- iconv(coco, from = "Latin1", to = "UTF-8") # for diacritics
    coco <- gsub("<[^<>]+>", "", coco)
    coco <- gsub("^ ", "", coco)
    coco <- gsub("\n", "", coco)
    coco <- as.vector(coco[grep("[\t.+]", as.character(coco))])
    coco <- as.vector(coco[grep("[.*]", as.character(coco))])
    s <- grep(coco, pattern = ".[0-9].$", value = TRUE) #out level
    – Carlos Garibotto Mar 04 '20 at 20:05
  • Please run the previous code and you will see why I use [^Level]: it is there so that "Level" is not included by my R script. – Carlos Garibotto Mar 04 '20 at 20:07

1 Answer

I'm not clear on what [^Level] represents. However, if the numbers at the end of lines in the text represent the Scene numbers, then you can use ( ) to capture the numbers and substitute them in your replacement text as shown below:

 v <- gsub(s, pattern = " ([0-9]{1,3})\\.$", replacement = "Scene \\1")
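
A likely reason the original attempt produced the same label on every row: gsub() is not vectorised over replacement, so only the first element of the pasted vector is used (R warns about this). In addition, sequence(1:130) returns 1, 1, 2, 1, 2, 3, ... rather than 1 to 130. The sketch below is only an illustration under stated assumptions: the small coco vector is a made-up sample standing in for the cleaned script, and the pattern assumes a scene line contains nothing but padding, one to three digits, and a period.

```r
# Hypothetical sample standing in for the cleaned script vector
coco <- c("      Miguel follows him.",
          "                                        48.",
          "      INT. HALLWAY",
          "                                   49.",
          "     nearby cubicle.",
          "                                      50.")

# Positions of lines that hold only padding, digits and a final period
idx <- grep("^ *[0-9]{1,3}\\.$", coco)

# Option 1: keep the number printed in the script via the capture group
kept <- sub("^ *([0-9]{1,3})\\.$", "Scene \\1", coco[idx])

# Option 2: ignore the printed number and count matches in order
renumbered <- paste("Scene", seq_along(idx))

# Write one of the two results back into the full vector
coco[idx] <- renumbered
```

Option 1 trusts the numbers already in the text; Option 2 guarantees a gap-free 1, 2, 3, ... sequence even if some printed numbers are really page numbers or are out of order.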
WaltS