Say we have a CSV file with the following data:

Name    Age Gender
Bob     23  Male
Ahmed   45  Male
Alice   37  Female
Ahmed   34  Male
Mariyya 10  Female
Bilal   23  Male

How can I do the following:

  • Count the number of occurrences of each name.
  • Order the values in each field by frequency and show that frequency. For instance, Ahmed should come first with 2 occurrences, and Male should come first with 4 occurrences.
  • Order the rows by name, and show the Gender combined with the Name in the result.

Thanks for your support.

Simplicity

1 Answer

You can use Pandas:

import pandas as pd
from io import StringIO

csv_file = StringIO("""Name    Age Gender
Bob     23  Male
Ahmed   45  Male
Alice   37  Female
Ahmed   34  Male
Mariyya 10  Female
Bilal   23  Male""")

# Raw string so that \s is passed to the regex engine unchanged
df = pd.read_csv(csv_file, sep=r"\s+", index_col=None)

df['Name'].value_counts()

Output:

Ahmed      2
Mariyya    1
Bilal      1
Bob        1
Alice      1
Name: Name, dtype: int64


df['Gender'].value_counts()

Output:

Male      4
Female    2
Name: Gender, dtype: int64

df.sort_values(by='Name')

Output:

      Name  Age  Gender
1    Ahmed   45    Male
3    Ahmed   34    Male
2    Alice   37  Female
5    Bilal   23    Male
0      Bob   23    Male
4  Mariyya   10  Female
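
For the third item (name combined with gender), one possible approach is to group on both columns. This is only a sketch and not part of the original answer: the variable names `data` and `name_gender_counts` and the column label `Count` are made up for illustration. It relies on `groupby` sorting its keys by default, so the result comes out ordered by name.

```python
import pandas as pd
from io import StringIO

# Same sample data as in the question, kept inline so the example is self-contained
data = StringIO("""Name    Age Gender
Bob     23  Male
Ahmed   45  Male
Alice   37  Female
Ahmed   34  Male
Mariyya 10  Female
Bilal   23  Male""")

df = pd.read_csv(data, sep=r"\s+")

# Count rows per (Name, Gender) pair; groupby sorts the keys, so rows are ordered by Name
name_gender_counts = (
    df.groupby(["Name", "Gender"])
      .size()
      .reset_index(name="Count")  # "Count" is an illustrative column label
)

print(name_gender_counts)
```

If the most frequent pairs should come first instead, `name_gender_counts.sort_values("Count", ascending=False)` should reorder the same table by frequency.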
Scott Boston
  • Thanks for your nice answer. Just a quick question: why did you use three double quotation marks after StringIO? – Simplicity Jul 11 '17 at 04:21
  • See this [discussion](https://stackoverflow.com/questions/10840357/string-literal-with-triple-quotes-in-function-definitions) on the triple quotes. It handles the line breaks. – Scott Boston Jul 11 '17 at 04:46
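
To illustrate the triple-quote point from the comments, here is a minimal sketch; the variable names `text` and `lines` are hypothetical. A triple-quoted string keeps its line breaks, so `StringIO` can hand it to a reader as if it were a multi-line file.

```python
from io import StringIO

# A triple-quoted literal may span several lines; the newlines become part of the string
text = """Name Age
Bob 23
Ahmed 45"""

# StringIO wraps the string in a file-like object, so it can be read line by line
lines = StringIO(text).readlines()

print(lines)
```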