0

What is the most efficient way to get the range of indices for which the corresponding column content satisfies a condition, such as rows starting with a "<body>" tag and ending with a "</body>" tag?

For example, with the data frame built in the code below, I want to get the row indices 1 to 3.

Can anyone suggest the most pythonic way to achieve this?

import pandas as pd

df = pd.DataFrame(
    [['This is also a interesting topic', 2],
     ['<body> the valley of flowers ...', 1],
     ['found in the hilly terrain', 5],
     ['we must preserve it </body>', 6]],
    columns=['description', 'count'])

print(df.head())
user765160
  • Please don't post images of code or data. See [Why is “Can someone help me?” not an actual question?](https://meta.stackoverflow.com/questions/284236/why-is-can-someone-help-me-not-an-actual-question). – wwii Nov 28 '17 at 19:34
  • @wwii I'll keep that in mind. Thanks for the info. – user765160 Nov 28 '17 at 20:47

2 Answers

1

What condition are you looking to satisfy?

import pandas as pd

df = pd.DataFrame(
    [['This is also a interesting topic', 2],
     ['<body> the valley of flowers ...', 1],
     ['found in the hilly terrain', 5],
     ['we must preserve it </body>', 6]],
    columns=['description', 'count'])
print(df)
print(len(df[df['count'] != 2].index))

Here, df['count'] != 2 builds a boolean mask, df[...] keeps only the rows where that mask is True, and len(....index) returns the number of rows kept.

Updated; note that I used str.contains(), rather than explicitly looking for starting or ending strings.

df2 = df[(df.description.str.contains('<body>') | (df.description.str.contains('</body>')))]
print(df2)
print(len(df2.index))

help from: Check if string is in a pandas dataframe
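
If the tags must sit at the very start and end of the text, the anchored string methods can stand in for str.contains(). The following is only a sketch against the sample frame from the question; the names text, starts, ends, anchored and anchored_index are made up for illustration, and the strip() call assumes that stray whitespace around the text should be ignored.

```python
import pandas as pd

df = pd.DataFrame(
    [['This is also a interesting topic', 2],
     ['<body> the valley of flowers ...', 1],
     ['found in the hilly terrain', 5],
     ['we must preserve it </body>', 6]],
    columns=['description', 'count'])

# Assumption: surrounding whitespace should not affect the anchored checks.
text = df['description'].str.strip()

# True only where the text begins with the opening tag.
starts = text.str.startswith('<body>')
# True only where the text ends with the closing tag.
ends = text.str.endswith('</body>')

# Keep the rows that open or close the block and collect their index labels.
anchored = df[starts | ends]
anchored_index = anchored.index.tolist()
print(anchored_index)
```

Unlike str.contains(), these checks would not match a row where the tag appears in the middle of the text.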

Evan
  • Apologies, I forgot to add the condition in the question. The condition I am looking for is rows starting with the <body> tag and ending with the </body> tag. – user765160 Nov 28 '17 at 20:45
0

You can also find the index of the start row and of the end row, then take every row from the first to the second to get all the content in between:

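# na=False makes rows with a missing description count as non-matches
start_index = df[df['description'].str.contains("<body>", na=False)].index[0]
end_index = df[df['description'].str.contains("</body>", na=False)].index[0]

# .loc slices by label and includes the end label, so no +1 is needed
print(df.loc[start_index:end_index, "description"].sum())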
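
Since the question asks for the range of row indices rather than the joined text, the same start and end lookup can return the labels themselves. This is a minimal sketch under the assumption that the frame holds exactly one "<body>" ... "</body>" block; start_label, end_label, block and row_range are illustrative names, not part of pandas.

```python
import pandas as pd

df = pd.DataFrame(
    [['This is also a interesting topic', 2],
     ['<body> the valley of flowers ...', 1],
     ['found in the hilly terrain', 5],
     ['we must preserve it </body>', 6]],
    columns=['description', 'count'])

desc = df['description']

# regex=False matches the tags as plain substrings;
# na=False treats missing values as non-matches.
is_open = desc.str.contains('<body>', regex=False, na=False)
is_close = desc.str.contains('</body>', regex=False, na=False)

# Assumption: one block only, so the first match of each tag is the one we want.
start_label = desc[is_open].index[0]
end_label = desc[is_close].index[0]

# Label-based slicing with .loc includes both endpoints.
block = df.loc[start_label:end_label]
row_range = block.index.tolist()
print(row_range)  # [1, 2, 3]
```

If no row matches either tag, index[0] raises an IndexError, so a real script would probably want to check for that first.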
Shahir Ansari