0

I have a massive 8GB CSV file that contains information about companies created in France. I managed to read the file in Python using:

df = pd.read_csv('File', sep=";", encoding="latin", iterator = True, chunksize=1000)

I know that this code worked, because when I run the code below, I get a dataframe-like output with 3 rows:

df.get_chunk(3)

The problem is that I now want to manipulate the data and extract rows that match certain criteria, as I would with a regular dataframe, like:

df[(df.Country == "France")]

For example, in a dataframe, the code above would only keep the rows for France. But when I tried:

df.read(df[(df.Country == "France")])

I got: 'TextFileReader' object has no attribute 'Country'

How can I filter rows using criteria like "==" or ">" or "<" and store the result in a dataframe?

Thank you,

Siva Kg

2 Answers

1

Thank you for your answer, and sorry if my question was confusing.

I just found a solution, which is to define an empty list (df_s), append to it the rows of each chunk that match my criteria (only French companies) in a for loop, and then concatenate everything into one dataframe, like this:

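import pandas as pd

df_s = []  # list that collects the filtered chunks

file = r'Path\file.csv'  # raw string, so "\f" is not read as an escape sequence
txt = pd.read_csv(file, sep=";", encoding="latin", iterator=True, chunksize=2000)

for df in txt:
    # each chunk is a regular dataframe; keep only its rows for France
    df_s.append(df[df['Country'] == "France"])

# combine the filtered chunks into a single dataframe
df_f = pd.concat(df_s, ignore_index=True)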
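
As a self-contained sketch of the same pattern: with chunksize set, pd.read_csv returns a reader that yields one ordinary dataframe per chunk, so the filter has to be applied to each chunk rather than to the reader itself. The in-memory sample data, the filter_csv helper name, and the column names below are assumptions for illustration, not taken from the real file.

```python
import io
import pandas as pd

def filter_csv(source, keep, chunksize=2000, **read_kwargs):
    """Read `source` chunk by chunk and keep only rows where keep(chunk) is True.

    `filter_csv` is a hypothetical helper name, not part of pandas.
    """
    matches = []
    for chunk in pd.read_csv(source, chunksize=chunksize, **read_kwargs):
        # `chunk` is a regular DataFrame, so boolean indexing works here
        matches.append(chunk[keep(chunk)])
    return pd.concat(matches, ignore_index=True)

# Made-up stand-in for the real file (4 rows, read 2 at a time)
sample = io.StringIO("Name;Country\nA;France\nB;Germany\nC;France\nD;Spain\n")
french = filter_csv(sample, lambda c: c["Country"] == "France", chunksize=2, sep=";")
```

Only the matching rows of each chunk are kept in memory, so the peak footprint depends on the chunk size and on how many rows pass the filter, not on the size of the whole file.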
Siva Kg
0

Unfortunately, I didn't fully understand the last part of your question.

But if I guess correctly, something like this could work on a regular dataframe (for example, on each chunk):

 new_df = df[(df.Country == "France") |
             (df.Country == "Germany") |
             (df.population >= 1000000)]
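
A small runnable sketch of combining conditions, using made-up data (the population column and its values are purely illustrative). pandas needs the element-wise operators | and & with each condition in parentheses, because Python's or/and try to reduce a whole column to a single True/False and raise a ValueError.

```python
import pandas as pd

# Made-up data for illustration only
df = pd.DataFrame({
    "Country": ["France", "Germany", "Spain", "Italy"],
    "population": [500, 2_000_000, 3_000_000, 100],
})

# Rows that satisfy at least one condition (element-wise OR)
either = df[(df["Country"] == "France") | (df["population"] >= 1_000_000)]

# Rows that satisfy both conditions (element-wise AND);
# isin() is a compact way to test against several values at once
both = df[df["Country"].isin(["France", "Germany"]) & (df["population"] >= 1_000_000)]
```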