Here is what you can do:
import org.apache.spark.sql.functions._
import spark.implicits._  // needed for toDF and the $"col" column syntax

// create a DataFrame with demo data
val df = spark.sparkContext.parallelize(Seq(
  (1, "Fname1", "Lname1", "Belarus"),
  (2, "Fname2", "Lname2", "Belgium"),
  (3, "Fname3", "Lname3", "Austria"),
  (4, "Fname4", "Lname4", "Australia")
)).toDF("id", "fname", "lname", "country")

// create a new column holding the first letter of country
val result = df.withColumn("countryFirst", split($"country", "")(0))

// save the data partitioned by the first letter of country
// (on Spark 2.x+ the built-in format("csv") replaces com.databricks.spark.csv)
result.write.partitionBy("countryFirst").format("com.databricks.spark.csv").save("outputpath")
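Note that partitionBy creates one subdirectory per distinct value of the partition column, and that column is dropped from the data files themselves. With the demo data above, the placeholder "outputpath" directory would look roughly like:

outputpath/countryFirst=A/part-...
outputpath/countryFirst=B/part-...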
Edit:
You can also use substring, which may perform better, as suggested by Raphel:
substring(Column str, int pos, int len)

Substring starts at pos and is of length len when str is String type, or returns the slice of the byte array that starts at pos and is of length len when str is Binary type.
// substring positions are 1-based: take one character starting at position 1
val result = df.withColumn("firstCountry", substring($"country", 1, 1))

and then use partitionBy in the write, as shown below.
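For completeness, a minimal sketch of the write step with the substring variant ("outputpath" is the same placeholder path as above):

// write partitioned by the substring-based column
result.write
  .partitionBy("firstCountry")
  .format("com.databricks.spark.csv")
  .save("outputpath")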
Hope this solves your problem!