Count separators in CSV rows with Pandas

Question

I have a csv file as follows:

name,age
something
tom,20

And when I put it into a dataframe it looks like:

df = pd.read_csv('file', header=None)

     0           1
1    name        age
2    something   NaN
3    tom         20

How would I get the number of commas in each raw row? For example, the result should look like:

# in pseudocode
df['_count_separators'] = len(df.raw_value.count(','))

     0           1      _count_separators
1    name        age   1
2    something   NaN   0
3    tom         20    1

do you also want to count the commas if they're in the column value? — Omkar Sabade, Dec 20 '18 at 05:22
@OmkarSabade preferably just to get the number of separators that `pandas` inferred -- but either way is acceptable. — , Dec 20 '18 at 05:24

cs95 · Accepted Answer · 2018-12-20T08:37:12.077

Read your data as a single-column Series, then split on the comma and concatenate the result with the separator count.

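# s = pd.read_csv(io.StringIO(text), sep='|', header=None).squeeze('columns')  # needs `import io`
# '|' must not occur in the data; `squeeze=True` was removed in pandas 2.0, so squeeze the frame instead
s = pd.read_csv('/path/to/file.csv', sep='|', header=None).squeeze('columns')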

pd.concat([
      s.str.split(',', expand=True), 
      s.str.count(',').rename('_count_sep')
   ], axis=1)

           0     1  _count_sep
0       name   age           1
1  something  None           0
2        tom    20           1

Another way to concatenate is to join on the index (a neat one-liner):

s.str.split(',', expand=True).join(s.str.count(',').rename('_count_sep'))

           0     1  _count_sep
0       name   age           1
1  something  None           0
2        tom    20           1
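
If no character can be guaranteed absent from the data, a variant of the same idea is to skip `read_csv` for the raw pass and read the lines directly. This is only a sketch, not the method from the answer: `io.StringIO(text)` stands in for an opened file, and dropping blank lines is an assumption made to mirror the default behaviour of `read_csv`.

```python
import io
import pandas as pd

text = "name,age\nsomething\ntom,20\n"

# io.StringIO(text) stands in for open('/path/to/file.csv').
# Blank lines are dropped here (an assumption, to match read_csv's default).
with io.StringIO(text) as handle:
    raw = pd.Series([line.rstrip("\r\n") for line in handle if line.strip()])

# Same split-and-join as above, applied to the raw lines.
result = raw.str.split(",", expand=True).join(raw.str.count(",").rename("_count_sep"))
print(result)
```

Because the lines never pass through a CSV parser, no placeholder separator such as `|` is needed.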

@W-B yup did not see until I posted... great minds.. huh? ;) — cs95, Dec 20 '18 at 05:39

score 6 · Answer 2 · answered Dec 20 '18 at 05:31

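Read the file twice: once normally, and once with a separator that does not occur in the data, so that each row lands in a single cell.

df = pd.read_csv('file', header=None)
df2 = pd.read_csv('file', header=None, sep='|')  # '|' is absent from the data, so each row is read into one cell

df2[0].str.findall(',').str.len()  # count the commas in each raw row (the column label is the integer 0)
0    1
1    0
2    1
3    5
Name: 0, dtype: int64

df['_count_separators'] = df2[0].str.findall(',').str.len()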

Data

name,age
something
tom,20
something,,,,,somethingelse
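
A self-contained run on this sample might look like the sketch below. Two things in it are assumptions rather than part of the answer: `io.StringIO(text)` replaces the file, and `names=` is passed on the second read so that the six-field row fits, since a plain `read_csv` that sizes the frame from the first row is likely to raise a parser error on it.

```python
import io
import pandas as pd

text = "name,age\nsomething\ntom,20\nsomething,,,,,somethingelse\n"

# First pass: '|' does not occur in the data, so each row is read into one cell.
rows = pd.read_csv(io.StringIO(text), header=None, sep="|")[0]
counts = rows.str.count(",")  # same numbers as .str.findall(',').str.len()

# Second pass: size the frame from the widest row so ragged rows still parse.
n_cols = int(counts.max()) + 1
df = pd.read_csv(io.StringIO(text), header=None, names=list(range(n_cols)))
df["_count_separators"] = counts
print(df)
```

Shorter rows are padded with NaN in the second read, while the count column still reflects the commas in each raw line.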

score 0 · Answer 3 · answered Dec 20 '18 at 05:31

One line of code: len(df) - df[1].isna().sum()

Quang Hoang

Ohk if the nan itself is a part of the dataset then? like `something,,,something`? – Bharath M Shetty Dec 20 '18 at 05:32
i'm not sure in which instance would `df = pd.read_csv('file.csv', header=None)` give a `nan` in his sample. – Quang Hoang Dec 20 '18 at 05:35
This assumes there are only two columns...? – cs95 Dec 20 '18 at 05:40
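
To make the points raised in these comments concrete, here is a small sketch. The per-row expression is an assumed generalisation, not something the answer proposes, and the second sample (`a,b,c` / `x,,z`) is made up for illustration.

```python
import io
import pandas as pd

text = "name,age\nsomething\ntom,20\n"
df = pd.read_csv(io.StringIO(text), header=None)

# The one-liner from the answer yields a single number for the whole frame.
total = len(df) - df[1].isna().sum()

# Assumed per-row variant: non-missing cells in the row, minus one.
per_row = df.notna().sum(axis=1) - 1

# Caveat: an empty field inside a row is read as NaN and goes uncounted.
gappy = pd.read_csv(io.StringIO("a,b,c\nx,,z\n"), header=None)
gappy_per_row = gappy.notna().sum(axis=1) - 1  # each raw row contains 2 commas

print(total, per_row.tolist(), gappy_per_row.tolist())
```

On the question's sample the per-row variant agrees with the raw comma count, but for `x,,z` it reports one separator where the raw line has two, so counting missing values is only reliable when no field is empty.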

score 0 · Answer 4 · answered Dec 20 '18 at 23:39

You can use the csv module to count the delimiters. This is a two-pass solution, but it is not necessarily less efficient than the one-pass alternatives.

from io import StringIO
import csv, pandas as pd, numpy as np

x = """name,age
something
tom,20"""

# replace StringIO(x) with open('file.csv', 'r')
with StringIO(x) as fin:
    delim_counts = np.fromiter(map(len, csv.reader(fin)), dtype=int)

# replace StringIO(x) with 'file.csv'
df = pd.read_csv(StringIO(x), header=None)
df['_count_separators'] = delim_counts - 1

print(df)

           0    1  _count_separators
0       name  age                  1
1  something  NaN                  0
2        tom   20                  1
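
One practical difference from counting commas in the raw string shows up when a quoted field contains a comma, which is the case raised in the comments on the question. The sketch below uses a made-up sample (`"Smith, J"` is not from the original data) to compare the two counts.

```python
import csv
import io
import pandas as pd

# Hypothetical sample: the second row has a comma inside a quoted field.
text = 'name,note\n"Smith, J",ok\nplain\n'

# csv.reader honours quoting, so the comma inside "Smith, J" is not a separator.
parsed_counts = [len(row) - 1 for row in csv.reader(io.StringIO(text))]

# A plain string count sees every comma, quoted or not.
raw_counts = pd.Series(text.splitlines()).str.count(",").tolist()

print(parsed_counts, raw_counts)
```

The csv-based count corresponds to the separators a parser would actually act on, while the string count treats every comma alike. Which one is wanted depends on the answer to that comment.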
