I'm using python 3.5 and pandas 0.19.2.
Using pandas.read_table, is there a way to filter when reading data?
In my example below, I read in my initial data frame and then subset the rows I want based on a condition. Is there a way to do this, or any way to dramatically speed the example below up? I couldn't see anything in the pandas.read_table docs (link), that showed how to speed this up.
Currently it takes around 3 minutes.
import pandas as pd
from datetime import datetime
start_time = datetime.now()
# reading table
df = pd.read_table('https://download.bls.gov/pub/time.series/ce/ce.data.0.AllCESSeries', sep='\t', header=0)
# subsetting
df = df[df['series_id'].str.contains("CEU0000000001")]
end_time = datetime.now()
run_time = end_time-start_time
print(run_time)