My assignment is to store the result of the following function in an array-type column:

def sample_udf(df:SparkDataFrame):
  device_issues = []
  if (df['altitude'] == 0):      
    return "alt" 
  elif (df['latitude'] <= -90
      or df['latitude'] >=90):      
    return "gps_lat"
  elif (df['longitude'] <= -180
      or df['longitude'] >= 180):      
    return "gps_long"
  elif (df['direction'] < 0
      or df['direction'] > 359):      
    return "gps_direction"
  else:
    return device_issues

df_new = df.withColumn("deviceIssues", sample_udf(f.col("altitude"), f.col("latitude")))

When I run that command, I get this error:

TypeError: anomaly_detections() takes 1 positional argument but 2 were given

Any help would be appreciated.

I expect the "deviceIssues" column to be an ArrayType column.

Corralien
  • Hi @Corralien, thanks for your effort. When I tried that, I got this error: ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions. – Jilinnie Park Jan 17 '23 at 09:12
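
For readers who hit the same ValueError: a Spark Column describes an expression rather than holding a value, so Python's `or` (and `if`) cannot turn it into True or False. The sketch below reproduces the mechanism without Spark. `FakeColumn` is a hypothetical stand-in written only for this illustration, not a PySpark class, and its error text is shortened.

```python
class FakeColumn:
    """Hypothetical stand-in for a Spark Column: it holds an expression, not a value."""

    def __init__(self, expr: str):
        self.expr = expr

    def __le__(self, other):
        return FakeColumn(f"({self.expr} <= {other})")

    def __ge__(self, other):
        return FakeColumn(f"({self.expr} >= {other})")

    def __or__(self, other):
        # '|' builds a new expression instead of asking for a truth value
        return FakeColumn(f"({self.expr} OR {other.expr})")

    def __bool__(self):
        # Python's 'or', 'and', 'not' and 'if' all end up here
        raise ValueError("Cannot convert column into bool")


latitude = FakeColumn("latitude")

try:
    bad = latitude <= -90 or latitude >= 90  # 'or' needs a bool, so this raises
except ValueError as exc:
    error_message = str(exc)

combined = (latitude <= -90) | (latitude >= 90)  # parentheses are required around each side
print(error_message)
print(combined.expr)
```

The parentheses matter because `|` binds more tightly than `<=` and `>=` in Python.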

1 Answer

Try:

from pyspark.sql.functions import udf
from pyspark.sql.types import ArrayType, StringType

def sample_udf(altitude, longitude, latitude, direction):
    device_issues = []
    if altitude == 0:
        device_issues.append('alt')
    if (latitude <= -90) or (latitude >=90):
        device_issues.append('gps_lat')
    if (longitude <= -180) or (longitude >= 180):
        device_issues.append('gps_lon')
    if (direction < 0) or (direction > 359):
        device_issues.append('gps_direction')
    return device_issues

# the output is an ArrayType of StringType
f_sample_udf = udf(sample_udf, ArrayType(StringType()))

df_new = df.withColumn('deviceIssues', f_sample_udf('altitude', 'longitude', 'latitude', 'direction'))

Output:

>>> df_new.show()
+--------+---------+--------+---------+--------------------+
|altitude|longitude|latitude|direction|        deviceIssues|
+--------+---------+--------+---------+--------------------+
|       0|       10|      10|      380|[alt, gps_direction]|
|      20|      187|      20|       20|           [gps_lon]|
+--------+---------+--------+---------+--------------------+
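
One thing to watch: Spark passes NULL column values to a Python UDF as None, and in Python 3 a comparison such as `None <= -90` raises a TypeError. Below is a minimal sketch of a None-tolerant variant that can be checked in plain Python before wrapping it in `udf`. The name `detect_issues` is made up for this example, and treating a missing value as "no issue" is an assumption, not something the question specifies.

```python
from typing import List, Optional


def detect_issues(altitude: Optional[float], longitude: Optional[float],
                  latitude: Optional[float], direction: Optional[float]) -> List[str]:
    """Same thresholds as sample_udf above, but None inputs are skipped."""
    issues = []
    if altitude == 0:  # None == 0 is simply False, so no guard is needed here
        issues.append("alt")
    if latitude is not None and (latitude <= -90 or latitude >= 90):
        issues.append("gps_lat")
    if longitude is not None and (longitude <= -180 or longitude >= 180):
        issues.append("gps_lon")
    if direction is not None and (direction < 0 or direction > 359):
        issues.append("gps_direction")
    return issues


# The two rows from the output above, plus an all-NULL row
print(detect_issues(0, 10, 10, 380))          # ['alt', 'gps_direction']
print(detect_issues(20, 187, 20, 20))         # ['gps_lon']
print(detect_issues(None, None, None, None))  # []
```

It can then be registered the same way as above, with `ArrayType(StringType())` as the return type.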
Corralien
  • Thanks again for your effort. However, the deviceIssues column must be an ArrayType. This is the output I expected, with columns deviceId, licensePlate, owner, deviceIssues: (asdffghJkk111, NP-PerfTest, Owner1, ["gps_lat", "gps_long", "alt", "gps_direction"]) and (asdffghJkk122, NP-Testing, Owner2, ["gps_long", "gps_direction"]) :( – Jilinnie Park Jan 17 '23 at 10:15
  • Yes!!! Sorry for the late reply. I didn't see your edited answer at first. Thanks for this one. – Jilinnie Park Jan 17 '23 at 11:42