
After a show command, Spark prints the following:

+-----------------------+---------------------------+
|NameColumn             |NumberColumn               |
+-----------------------+---------------------------+
|name                   |4.3E-5                     |
+-----------------------+---------------------------+

Is there a way to change NumberColumn format to something like 0.000043?

philantrovert
Cherry

3 Answers


You can use the format_number function:

import org.apache.spark.sql.functions.format_number
df.withColumn("NumberColumn", format_number($"NumberColumn", 5))

Here, 5 is the number of decimal places you want to show.

As the documentation notes, the format_number function returns a string column:

format_number(Column x, int d)
Formats numeric column x to a format like '#,###,###.##', rounded to d decimal places, and returns the result as a string column.
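
For instance, a quick sketch (the sample values here are hypothetical) shows both the rounding and the thousands separators that format_number applies:

import spark.implicits._
import org.apache.spark.sql.functions.format_number

// hypothetical sample data to illustrate both effects
val sample = Seq(0.000043, 1234567.891).toDF("n")
sample.select(format_number($"n", 5)).show()
// prints 0.00004          (rounded to 5 decimal places)
// and    1,234,567.89100  (note the thousands separators)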

If you don't want the commas, you can call the regexp_replace function, which is defined as

regexp_replace(Column e, String pattern, String replacement)
Replace all substrings of the specified string value that match regexp with rep.

and use it as

import org.apache.spark.sql.functions.regexp_replace
df.withColumn("NumberColumn", regexp_replace(format_number($"NumberColumn", 5), ",", ""))

This removes the commas that format_number inserts for large numbers.
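
A quick check with a hypothetical large value confirms the combined effect:

import spark.implicits._
import org.apache.spark.sql.functions.{format_number, regexp_replace}

val big = Seq(1234567.000043).toDF("NumberColumn")
big.withColumn("NumberColumn",
    regexp_replace(format_number($"NumberColumn", 6), ",", ""))
  .show(false)
// prints 1234567.000043 (a plain string, no separators)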

Ramesh Maharjan
  • But this replaces `NumberColumn` with a string-typed column. E.g. if I order by `NumberColumn` it will be ordered like a string. – Cherry Jul 13 '17 at 04:41
  • Yes @Cherry you are correct. You can cast it to Double as `df.withColumn("NumberColumn", format_number($"NumberColumn", 6).cast("Double"))` but doing so would just produce the original exponential value. So to show all the decimal values you will have to change the datatype to string. – Ramesh Maharjan Jul 13 '17 at 05:09

You can use a cast operation, as below:

val df = sc.parallelize(Seq(0.000043)).toDF("num")

df.createOrReplaceTempView("data")
spark.sql("select CAST(num AS DECIMAL(8,6)) from data").show()

Adjust the precision and scale accordingly.
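
The same cast can also be expressed in the DataFrame API instead of SQL (a sketch, reusing the df and the precision/scale from above):

import org.apache.spark.sql.types.DecimalType

df.select($"num".cast(DecimalType(8, 6))).show()
// prints 0.000043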

vdep

In newer versions of PySpark you can use the round() or bround() functions. These functions return a numeric column, which avoids the "," problem altogether.

It would look like this:

df.withColumn("NumberColumn", bround("NumberColumn",5))