13

I have a csv file as follows:

name,age
something
tom,20

And when I put it into a dataframe it looks like:

df = pd.read_csv('file', header=None)

     0           1
1    name        age
2    something   NaN
3    tom         20

How would I get the count of a comma in the raw row data. For example, the answer should look like:

# in pseudocode
df['_count_separators'] = len(df.raw_value.count(','))

     0           1      _count_separators
1    name        age   1
2    something   NaN   0
3    tom         20    1
cs95
  • 379,657
  • 97
  • 704
  • 746

4 Answers4

8

Very simply, read your data as a single column series, then split on comma and concatenate with separator count.

# s = pd.read_csv(pd.compat.StringIO(text), sep=r'|', squeeze=True, header=None)
s = pd.read_csv('/path/to/file.csv', sep=r'|', squeeze=True, header=None)

pd.concat([
      s.str.split(',', expand=True), 
      s.str.count(',').rename('_count_sep')
   ], axis=1)

           0     1  _count_sep
0       name   age           1
1  something  None           0
2        tom    20           1

Another solution for concatenation is to join on the index (this is a neat one liner):

s.str.split(',', expand=True).join(s.str.count(',').rename('_count_sep'))

           0     1  _count_sep
0       name   age           1
1  something  None           0
2        tom    20           1
cs95
  • 379,657
  • 97
  • 704
  • 746
6

Doing this

df = pd.read_csv('file', header=None)
df2 = pd.read_csv('file', header=None,sep='|') # using another sep for read your csv again 

df2['0'].str.findall(',').str.len() # then one row into one cell , using str find 
0    1
1    0
2    1
3    5
Name: 0, dtype: int64

df['_count_separators']=df2['0'].str.findall(',').str.len()

Data

name,age
something
tom,20
something,,,,,somethingelse
BENY
  • 317,841
  • 20
  • 164
  • 234
0

One line of code: len(df) - df[1].isna().sum()

Quang Hoang
  • 146,074
  • 10
  • 56
  • 74
0

You can use the csv module for the counting delimiters. This is a two-pass solution, but not necessarily inefficient versus alternative one-pass solutions.

from io import StringIO
import csv, pandas as pd, numpy as np

x = """name,age
something
tom,20"""

# replace StringIO(x) with open('file.csv', 'r')
with StringIO(x) as fin:
    delim_counts = np.fromiter(map(len, csv.reader(fin)), dtype=int)

# replace StringIO(x) with 'file.csv'
df = pd.read_csv(StringIO(x), header=None)
df['_count_separators'] = delim_counts - 1

print(df)

           0    1  _count_separators
0       name  age                  1
1  something  NaN                  0
2        tom   20                  1
jpp
  • 159,742
  • 34
  • 281
  • 339