I have many text files, each containing text like the following.

\\ Paper: hep-th/9201003

From: DIJKGRAAF%IASSNS.BITNET@pucc.PRINCETON.EDU

Date: Thu, 2 Jan 92 14:06 EST (54kb)

Title: Intersection Theory, Integrable Hierarchies and Topological Field Theory

Authors: Robbert Dijkgraaf

Comments: 73 pages, most figures are not included. Lectures given at the Cargese Summer School on `New Symmetry Principles in Quantum Field Theory,' July 16-27, 1991.

\\ In these lecture notes we review the various relations between intersection theory on the moduli space of Riemann surfaces, integrable hierarchies of KdV type, matrix models, and topological quantum field theories. We explain in particular why matrix integrals of the type considered by Kontsevich naturally appear as tau-functions associated to minimal models. Our starting point is the extremely simple form of the string equation for the topological (p,1) models, where the so-called Baker-Akhiezer function is given by a (generalized) Airy function. \\

I have 10 folders in the range 1992 to 2003. Every folder contains thousands of files, and every file has the structure shown above. I want to extract the last portion of every file, which is the abstract of the paper, and save it in a new file. Every file has a different abstract. I have written the following code, but it does not produce the result I want.

for(j in 1992:1992)
{
    dir.create(paste("C:\\Users\\Abdul Samad Alvi\\Desktop\\mydata\\",j, sep = ""))
    setwd(paste("C:\\Users\\Abdul Samad Alvi\\Desktop\\dataset\\",j, sep = ""))
    listoffile=list.files()
    for(i in 1:length(listoffile))
    {
        setwd(paste("C:\\Users\\Abdul Samad Alvi\\Desktop\\dataset\\",j, sep = ""))
        filetext=readLines(listoffile[i])
        newtext=unlist(strsplit(filetext,'\\\\'))[3]
        setwd(paste("C:\\Users\\Abdul Samad Alvi\\Desktop\\mydata\\",j, sep = ""))
        write.table(newtext,file = listoffile[i],sep = "")

    }
}
Alvi
  • You need to extract "In these lecture notes we review the various relations between intersection theory on the moduli space of Riemann surfaces, integrable hierarchies of KdV type, matrix models, and topological quantum field theories. We explain in particular why matrix integrals of the type considered by Kontsevich naturally appear as tau-functions associated to minimal models. Our starting point is the extremely simple form of the string equation for the topological (p,1) models, where the so-called Baker-Akhiezer function is given by a (generalized) Airy function." ? – amarchin Oct 30 '17 at 13:14
  • Yes, that is what I need. – Alvi Oct 30 '17 at 13:16
  • I have many files. Every file contains different text, but the text is always between the last two \\ delimiters. – Alvi Oct 30 '17 at 13:18

2 Answers

strsplit should help!

text <- "\\ Paper: hep-th/9201003 From: DIJKGRAAF%IASSNS.BITNET@pucc.PRINCETON.EDU Date: Thu, 2 Jan 92 14:06 EST (54kb) Title: Intersection Theory, Integrable Hierarchies and Topological Field Theory Authors: Robbert Dijkgraaf Comments: 73 pages, most figures are not included. Lectures given at the Cargese Summer School on `New Symmetry Principles in Quantum Field Theory,'
July 16-27, 1991. \\ In these lecture notes we review the various relations between intersection theory on the moduli space of Riemann surfaces, integrable hierarchies of KdV type, matrix models, and topological quantum field theories. We explain in particular why matrix integrals of the type considered by Kontsevich naturally appear as tau-functions associated to minimal models. Our starting point is the extremely simple form of the string equation for the topological (p,1) models, where the so-called Baker-Akhiezer function is given by a (generalized) Airy function. \\"


unlist(strsplit(text,'\\\\'))[3]

More generally, if the number of sections varies, take the last element:

tail(unlist(strsplit(text,'\\\\')), 1)
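
One caveat when moving from the string literal above to real files: inside an R string literal each "\\" is an escaped single backslash, whereas text read from disk with readLines() holds two backslash characters per delimiter, and it arrives as one vector element per line. Splitting that on a single backslash yields extra empty pieces, so index [3] no longer points at the abstract. A minimal sketch of one way around this, assuming the abstract is always the last non-empty section; last_section and the sample lines are made up for illustration:

```r
# Return the last non-empty section delimited by two literal backslashes.
# `lines` is the character vector produced by readLines().
last_section <- function(lines) {
  # Join the per-line vector into a single string first.
  text <- paste(lines, collapse = " ")
  # fixed = TRUE treats the split string literally: "\\\\" is two backslashes.
  parts <- trimws(strsplit(text, "\\\\", fixed = TRUE)[[1]])
  # Drop the empty pieces left by leading/adjacent delimiters.
  parts <- parts[nzchar(parts)]
  if (length(parts) == 0) return(NA_character_)
  parts[length(parts)]
}

# Hypothetical sample with the same layout as the files in the question.
sample_lines <- c("\\\\", "Paper: hep-th/0000000", "Title: Example title",
                  "\\\\", "  First abstract line", "second abstract line.", "\\\\")
abstract <- last_section(sample_lines)
print(abstract)
```

Splitting with fixed = TRUE avoids regex escaping altogether, which makes the number of backslashes easier to reason about.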
amarchin
amrrs

If the pattern in your text is always an empty line followed by \\, then you could extract the text like this (assuming that your_text is a single string):

library(stringr)
str_extract(string = your_text, pattern = "(?<=\n\\\\)(.*)(?=\\\\)")

This should solve the biggest problem you are struggling with.

Addition in response to comment: In order to get one large string, instead of a vector of strings, you can use paste0() with the collapse argument:

filetext <- readLines("001.txt")
filetext <- paste0(filetext, collapse = " ")

Afterwards, you can apply the general case described in the beginning of the answer:

newtext <- str_extract(string = filetext, pattern = "(?<=\\s{2}\\\\\\\\)(.*)(?=\\\\\\\\)") 
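
Putting the pieces together for the whole dataset could look roughly like the sketch below. It is not a tested solution for the real files: it assumes every year folder contains only plain text files, uses base R regmatches()/regexec() in place of str_extract() so that it runs without extra packages, and the function names abstract_from_lines and extract_abstracts are made up for illustration.

```r
# Pull the text between the last two "\\" delimiters out of one file's lines.
# Returns NA when the delimiters are not found.
abstract_from_lines <- function(lines) {
  text <- paste0(lines, collapse = " ")
  # Eight backslashes in the R string match two literal backslashes.
  # The greedy ^.* skips ahead so the group captures the last delimited section.
  pattern <- "^.*\\\\\\\\(.*?)\\\\\\\\\\s*$"
  m <- regmatches(text, regexec(pattern, text, perl = TRUE))[[1]]
  if (length(m) < 2) return(NA_character_)
  trimws(m[2])
}

# Read every file under in_root/<year> and write its abstract to
# out_root/<year>/<same file name>. Files without a match are skipped.
extract_abstracts <- function(in_root, out_root, years) {
  for (year in years) {
    in_dir  <- file.path(in_root, as.character(year))
    out_dir <- file.path(out_root, as.character(year))
    dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
    for (path in list.files(in_dir, full.names = TRUE)) {
      abstract <- abstract_from_lines(readLines(path, warn = FALSE))
      if (!is.na(abstract)) {
        writeLines(abstract, file.path(out_dir, basename(path)))
      }
    }
  }
  invisible(NULL)
}
```

With the folders from the question, the call would pass the dataset folder as in_root, the mydata folder as out_root, and 1992:2003 as years. Building full paths with file.path() removes the need for the repeated setwd() calls, and writeLines() writes only the abstract, whereas write.table() would also add a header line, row names and quotes.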
KenHBS
  • > filetext [13] "\\\\" [14] " It is argued that the recently proposed Kazakov-Migdal model of induced gauge" [15] "theory, at large $N$, involves only the zero area Wilson loops that are" [16] "effectively trees in the gauge action induced by the scalars. This retains only" [17] "a constant part of the gauge action excluding plaquettes or anything like them" [18] "and the gauge variables drop out." [19] "\\\\" – Alvi Nov 01 '17 at 10:44
  • This is the vector: every line of the file is one vector element, so I cannot identify the abstract with your code. – Alvi Nov 01 '17 at 10:45
  • @Alvi check out the changes I made in the answer. This should help – KenHBS Nov 01 '17 at 11:31