I have a text corpus that contains many lines of sentences, and I want to extract the lines that contain any of my keywords.
I wrote a simple Python script, but it produces no output at all.
My Python script:
corpus = []
with open('CatList2.text') as f:
    for line in f:
        corpus.append(line.rstrip())
with open('Test.text') as f1:
    with open('Text', 'a') as f2:
        for line in f1.readlines():
            for phrase in corpus:
                if phrase in line:
                    f2.write(line)
The following is an example of wiki.en.text:
Alluvium (from the Latin, alluvius, from alluere, "to wash against") is loose, unconsolidated (not cemented together into a solid rock) soil or sediments, which has been eroded, reshaped by water in some form, and redeposited in a non-marine setting
Geoarchaeology is a multi-disciplinary approach which uses the techniques and subject matter of geography, geology and other Earth sciences to examine topics which inform archaeological knowledge and thought. Geoarchaeologists study the natural physical processes that affect archaeological sites such as geomorphology, the formation of sites through geological processes and the effects on buried sites and artifacts post-deposition. Geoarchaeologists' work frequently involves studying soil and sediments as well as other geographical concepts to contribute an archaeological study. Geoarchaeologists may also use computer cartography, geographic information systems (GIS) and digital elevation models (DEM) in combination with disciplines from human and social sciences and earth sciences.[1] Geoarchaeology is important to society because it informs archaeologists about the geomorphology of the soil, sediments and the rocks on the buried sites and artifacts they're researching on. By doing this we are able locate ancient cities and artifacts and estimate by the quality of soil how "prehistoric" they really are.
A Geopark is a unified area that advances the protection and use of geological heritage in a sustainable way, and promotes the economic well-being of the people who live there.[1] There are Global Geoparks and National Geoparks.
Spatial analysis or spatial statistics includes any of the formal techniques which study entities using their topological, geometric, or geographic properties. Spatial analysis includes a variety of techniques, many still in their early development, using different analytic approaches and applied in fields as diverse as astronomy, with its studies of the placement of galaxies in the cosmos, to chip fabrication engineering, with its use of "place and route" algorithms to build complex wiring structures. In a more restricted sense, spatial analysis is the technique applied to structures at the human scale, most notably in the analysis of geographic data.
Spatial mismatch is the mismatch between where low-income households reside and suitable job opportunities. In its original formulation (see below) and in subsequent research, it has mostly been understood as a phenomenon affecting African-Americans, as a result of residential segregation, economic restructuring, and the suburbanization of employment.
Distance decay is a geographical term which describes the effect of distance on cultural or spatial interactions. The distance decay effect states that the interaction between two locales declines as the distance between them increases. Once the distance is outside of the two locales' activity space, their interactions begin to decrease.
Cold is the presence of low temperature, especially in the atmosphere.[4] In common usage, cold is often a subjective perception. A lower bound to temperature is absolute zero, defined as 0.00 °K on the Kelvin scale, an absolute thermodynamic temperature scale. This corresponds to −273.15 °C on the Celsius scale, −459.67 °F on the Fahrenheit scale, and 0.00 °R on the Rankine scale.
My CatList, which contains my search phrases, is as follows:
Alluvium
Anatopism
The result I am hoping for is:
Alluvium (from the Latin, alluvius, from alluere, "to wash against") is loose, unconsolidated (not cemented together into a solid rock) soil or sediments, which has been eroded, reshaped by water in some form, and redeposited in a non-marine setting
This is because Alluvium is the only phrase in CatList that also appears in wiki.en.text.
I have no idea why I am not able to get the result. Please help me. Thank you.
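For reference, the filtering I am after can be sketched as below. This is only a minimal sketch, assuming one search phrase per line in the phrase list and one article per line in the corpus. The function name `matching_lines` and the blank-phrase check are illustrative assumptions, not part of the original script.

```python
def matching_lines(corpus_lines, phrases):
    """Return the corpus lines that contain at least one search phrase."""
    # Strip whitespace and skip blank phrases (an assumption of this sketch):
    # an empty string is a substring of every line and would match everything.
    cleaned = [p.strip() for p in phrases if p.strip()]
    # Keep each line once, even if several phrases occur in it.
    return [line for line in corpus_lines if any(p in line for p in cleaned)]


# Small in-memory stand-ins for the corpus and the phrase list.
sample_corpus = [
    "Alluvium is loose, unconsolidated soil or sediments\n",
    "Cold is the presence of low temperature\n",
]
sample_phrases = ["Alluvium\n", "Anatopism\n", "\n"]
result = matching_lines(sample_corpus, sample_phrases)
```

With real files, the open file handles could be passed in place of the lists and the returned lines written out with `writelines`. It is probably also worth checking that the file names opened in the script (`CatList2.text`, `Test.text`) are the files described here (CatList, wiki.en.text).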
Strangely, I got this error:
Traceback (most recent call last):
  File "JRTry.py", line 2, in <module>
    phrases = open("Test.text").readLines()
AttributeError: 'file' object has no attribute 'readLines'
I read the question "Error while using '<file>.readlines()' function" and changed my loop to for line in f1.readlines():
yet it still gives me an error. Any idea?
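One thing that stands out in the traceback is that the failing call is spelled `readLines`, with a capital L. Python attribute names are case-sensitive, and the file method is `readlines`, all lowercase. A small sketch of the difference, using `io.StringIO` as an in-memory stand-in for the opened file (the helper name `read_phrases` is an assumption made for illustration):

```python
import io


def read_phrases(handle):
    """Read one phrase per line from an open file-like object."""
    # readlines() is all lowercase; handle.readLines() would raise
    # AttributeError, as in the traceback above.
    return [line.rstrip("\n") for line in handle.readlines()]


# In-memory stand-in for open("Test.text").
sample = io.StringIO(u"Alluvium\nAnatopism\n")
phrases = read_phrases(sample)

# File-like objects expose readlines but not readLines.
has_lowercase = hasattr(sample, "readlines")
has_camel_case = hasattr(sample, "readLines")
```

Since the traceback points at line 2 of JRTry.py rather than at the `for` loop, the remaining `readLines` call is probably on that line.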