I have an array, imported from a CSV file, that holds multiple data types. I would like to remove all commas (,) and dollar signs ($). Three of the columns contain commas and dollar signs.

When I create a new array from a single column that contains commas and dollar signs, the code below works:

using CSV, DataFrames
df = DataFrame!(CSV.File("F:SampleFile.csv"))
dfmo = Array(df[!,30])
dfmo = collect(skipmissing(dfmo))
dfmo = replace.(dfmo,"\$"=>"")
dfmo = replace.(dfmo,","=>"")

When I try to apply the replacement across the entire data frame with the line below,

df=replace.(df,","=>"")

I get an error:

MethodError: no method matching similar(::Int64, ::Type{Any})
Closest candidates are:
  similar(!Matched::ZMQ.Message, ::Type{T}, !Matched::Tuple{Vararg{Int64,N}} where N) where T at C:\Users\

I then tried indexing with the code below and also got an error about indexing into a string.

for i in df
    for j in df
        if datatype(df[i,j]) == String
            df=replace(df[i,j],","=>"")
        end
    end
end
MethodError: no method matching similar(::Int64, ::Type{Any})
Closest candidates are:
  similar(!Matched::ZMQ.Message, ::Type{T}, !Matched::Tuple{Vararg{Int64,N}} where N) where T at C:\Users\

What is the most efficient way to replace substrings across an array of multiple datatypes?

phipsgabler
cupoftea21

1 Answer

From your code, I understand that you want an in-place operation (i.e., one that changes the original data frame).

Using the loop approach from your code, you can do this:

for col in axes(df, 2)
    for row in axes(df, 1)
        cell = df[row, col]
        if cell isa AbstractString
            df[row, col] = replace(cell, "," => "")
        end
    end
end
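
The same loop can be tried without a CSV file. The following is a minimal sketch on a plain `Matrix{Any}` standing in for the data frame. The sample values and the name `strip_symbols!` are made up for illustration, and the extra `replace` call for `$` is an assumption based on the question rather than part of the loop above.

```julia
# Hypothetical stand-in for the imported data: mixed types in one container.
data = Any["\$1,200" 3; "plain" 4.5; missing "7,000"]

# Illustrative helper (the name is an assumption): strips "," and "$" in place.
function strip_symbols!(A)
    for col in axes(A, 2)
        for row in axes(A, 1)
            cell = A[row, col]
            if cell isa AbstractString
                # Two chained replace calls, one per character to remove.
                A[row, col] = replace(replace(cell, "," => ""), "\$" => "")
            end
        end
    end
    return A
end

strip_symbols!(data)
```

Only cells that pass the `isa AbstractString` check are rewritten, so numbers and `missing` values are left as they are.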

Using broadcasting you can achieve the same with:

helper_fun(cell) = cell isa AbstractString ? replace(cell, "," => "") : cell

df .= helper_fun.(df)
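
To see how the broadcast version treats non-string cells, here is a small sketch on a plain `Vector{Any}` rather than a data frame. The sample values and the name `clean` are hypothetical, and removing `$` as well as `,` is an assumption taken from the question.

```julia
# Hypothetical mixed-type column, as might come out of a CSV import.
col = Any["\$10,500", 42, missing, "1,000", 3.14]

# Non-strings (numbers, missing) fall through unchanged.
clean(cell) = cell isa AbstractString ? replace(replace(cell, "," => ""), "\$" => "") : cell

# `.=` writes the results back into the existing container
# instead of binding `col` to a new array.
col .= clean.(col)
```

Because the assignment uses `.=`, the original container is updated, which matches the in-place behaviour described above.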
Bogumił Kamiński