0

I have a dataframe with two columns a and b and at the moment both are looking like column a, but I want to add separators so that column b looks like below. I have tried using the package format.jl. But I haven't gotten the result I'm afte. Maybe worth mentioning is that both columns is Int64 and the column names a and b is of type symbol.

 a      |    b
150000  | 1500,00 
27      | 27,00
16614   | 166,14

Is there some other way to solve this than using format.jl? Or is format.jl the way to go?

hbrovell
  • 547
  • 6
  • 17

2 Answers2

3

Assuming you want the commas in their typical positions rather than how you wrote them, this is one way:

julia> using DataFrames, Format

julia> f(x) = format(x, commas=true)
f (generic function with 1 method)

julia> df = DataFrame(a = [1000000, 200000, 30000])
3×1 DataFrame
 Row │ a       
     │ Int64
─────┼─────────
   1 │ 1000000
   2 │  200000
   3 │   30000

julia> transform(df, :a => ByRow(f) => :a_string)
3×2 DataFrame
 Row │ a        a_string  
     │ Int64    String
─────┼────────────────────
   1 │ 1000000  1,000,000
   2 │  200000  200,000
   3 │   30000  30,000

If you instead want the row replaced, use transform(df, :a => ByRow(f), renamecols=false). If you just want the output vector rather than changing the DataFrame, you can use format.(df.a, commas=true)

You could write your own function f to achieve the same behavior, but you might as well use the one someone already wrote inside the Format.jl package.

However, once you transform you data to Strings as above, you won't be able to filter/sort/analyze the numerical data in the DataFrame. I would suggest that you apply the formatting in the printing step (rather than modifying the DataFrame itself to contain strings) by using the PrettyTables package. This can format the entire DataFrame at once.

julia> using DataFrames, PrettyTables

julia> df = DataFrame(a = [1000000, 200000, 30000], b = [500, 6000, 70000])
3×2 DataFrame
 Row │ a        b     
     │ Int64    Int64 
─────┼────────────────
   1 │ 1000000    500
   2 │  200000   6000
   3 │   30000  70000

julia> pretty_table(df, formatters = ft_printf("%'d"))
┌───────────┬────────┐
│         a │      b │
│     Int64 │  Int64 │
├───────────┼────────┤
│ 1,000,000 │    500 │
│   200,000 │  6,000 │
│    30,000 │ 70,000 │
└───────────┴────────┘
Nathan Boyer
  • 1,412
  • 1
  • 6
  • 20
2

(Edited to reflect the updated specs in the question)

julia> df = DataFrame(a = [150000, 27, 16614]);

julia> function insertdecimalcomma(n)
         if n < 100
           return string(n) * ",00"
         else
           return replace(string(n), r"(..)$" => s",\1")
         end
       end
insertdecimalcomma (generic function with 1 method)

julia> df.b = insertdecimalcomma.(df.a)

julia> df
3×2 DataFrame
 Row │ a       b       
     │ Int64   String  
─────┼─────────────────
   1 │ 150000  1500,00
   2 │     27  27,00
   3 │  16614  166,14

Note that the b column will necessarily be a String after this change, as integer types cannot store formatting information in them.

If you have a lot of data and find that you need better performance, you may also want to use the InlineStrings package:

julia> #same as before upto the function definition

julia> using InlineStrings

julia> df.b = inlinestrings(insertdecimalcomma.(df.a))
3-element Vector{String7}:
 "1500,00"
 "27,00"
 "166,14"

This stores the b column's data as fixed-size strings (String7 type here), which are generally treated like normal Strings, but can be significantly better for performance.

Sundar R
  • 13,776
  • 6
  • 49
  • 76
  • It's not really a typo. 1000000 should translate to 10 000,00. So that's ten thousand with two decimals at the end. 200000 should two thousands and 30000 should be three hundred. Now I live in Sweden so maybe it's a bit different how you separate numbers. – hbrovell Jul 06 '22 at 19:56
  • That makes sense. So the decimal point should be a comma, and the thousands separator should be just a space? The `10 000,00` seems to imply that, but the question has `1,0000,00`; that part at least seems to be a typo. Could you update the question with how exactly you want the numbers to look i.e. is it `10 000,00`, `2 000,00` and `300,00`? – Sundar R Jul 07 '22 at 07:49
  • I have updated the question. My first example maybe wasn't that clear, hope this is better. I really just want two decimal, and maybe I mixed that up with thousands separator. – hbrovell Jul 07 '22 at 09:10
  • I've updated the answer to match the updated examples. – Sundar R Jul 07 '22 at 10:58
  • I get a strange result. It looks like all numbers falls under ```if n < 100``` because 150000 becomes 150000,00. – hbrovell Jul 07 '22 at 12:09
  • Can you post the code you're using, at the end of the question perhaps? The output in my answer is all directly from the REPL, so it works as shown; either the code or the data must be different in some way on your end. – Sundar R Jul 07 '22 at 13:43