
After a show command, Spark prints the following:

+-----------------------+---------------------------+
|NameColumn             |NumberColumn               |
+-----------------------+---------------------------+
|name                   |4.3E-5                     |
+-----------------------+---------------------------+

Is there a way to change NumberColumn format to something like 0.000043?

philantrovert
Cherry

3 Answers


You can use the format_number function:

import org.apache.spark.sql.functions.format_number
df.withColumn("NumberColumn", format_number($"NumberColumn", 5))

Here, 5 is the number of decimal places you want to show.

As the documentation notes, the format_number function returns a string column:

format_number(Column x, int d)
Formats numeric column x to a format like '#,###,###.##', rounded to d decimal places, and returns the result as a string column.
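
For instance, a quick sketch (the sample values here are hypothetical) shows both the rounding and the thousands separators that format_number applies:

import spark.implicits._
import org.apache.spark.sql.functions.format_number

// hypothetical sample data to illustrate both effects
val sample = Seq(0.000043, 1234567.891).toDF("n")
sample.select(format_number($"n", 5)).show()
// prints 0.00004          (rounded to 5 decimal places)
// and    1,234,567.89100  (note the thousands separators)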

If you don't want the commas, you can call the regexp_replace function, which is defined as

regexp_replace(Column e, String pattern, String replacement)
Replace all substrings of the specified string value that match regexp with rep.

and use it as

import org.apache.spark.sql.functions.regexp_replace
df.withColumn("NumberColumn", regexp_replace(format_number($"NumberColumn", 5), ",", ""))

This removes the commas that format_number inserts for large numbers.
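
A quick check with a hypothetical large value confirms the combined effect:

import spark.implicits._
import org.apache.spark.sql.functions.{format_number, regexp_replace}

val big = Seq(1234567.000043).toDF("NumberColumn")
big.withColumn("NumberColumn",
    regexp_replace(format_number($"NumberColumn", 6), ",", ""))
  .show(false)
// prints 1234567.000043 (a plain string, no separators)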

Ramesh Maharjan
  • But this replaces `NumberColumn` with a string-typed column. E.g. if I order by `NumberColumn` it will be ordered like a string. – Cherry Jul 13 '17 at 04:41
  • Yes @Cherry you are correct. You can cast it to Double as `df.withColumn("NumberColumn", format_number($"NumberColumn", 6).cast("Double"))` but doing so would just produce the original exponential value. So to show all the decimal values you will have to change the datatype to string. – Ramesh Maharjan Jul 13 '17 at 05:09

You can use a cast operation, as below:

val df = sc.parallelize(Seq(0.000043)).toDF("num")

df.createOrReplaceTempView("data")
spark.sql("select CAST(num AS DECIMAL(8,6)) from data").show()

Adjust the precision and scale accordingly.
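
The same cast can also be expressed in the DataFrame API instead of SQL (a sketch, reusing the df and the precision/scale from above):

import org.apache.spark.sql.types.DecimalType

df.select($"num".cast(DecimalType(8, 6))).show()
// prints 0.000043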

vdep

In newer versions of PySpark you can use the round() or bround() functions. These functions return a numeric column, which avoids the "," problem altogether.

It would look like this:

df.withColumn("NumberColumn", bround("NumberColumn",5))