
I came across the lambda line below in PySpark while browsing a long Python Jupyter notebook, and I am trying to understand it. Can you explain what it does as clearly as possible?

parse = udf(lambda x: (datetime.datetime.utcnow() - timedelta(hours=x)).isoformat()[:-3] + 'Z', StringType())
ZygD
Rahul Diggi
    You should try running it in plain Python to test it, since these are not Spark-specific functions. `utcnow()` gives the current UTC datetime, and `timedelta()` creates a delta object that can be added to or subtracted from a datetime. The rest formats the datetime as required. – samkart Jun 17 '22 at 04:39

1 Answer

udf(
    lambda x: (datetime.datetime.utcnow() - timedelta(hours=x)).isoformat()[:-3] + 'Z',
    StringType()
)

udf in PySpark wraps a Python function so that it is run for every row of a Spark DataFrame.

Creates a user defined function (UDF).

New in version 1.3.0.

Parameters:

The returnType here is StringType(), so the UDF returns a string. Setting it aside, we get the function body we're interested in:

lambda x: (datetime.datetime.utcnow() - timedelta(hours=x)).isoformat()[:-3] + 'Z'

In order to find out what the given lambda function does, you can create a regular function from it. You may need to add imports too.

import datetime
from datetime import timedelta

def func(x):
    return (datetime.datetime.utcnow() - timedelta(hours=x)).isoformat()[:-3] + 'Z'
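
As a rough illustration of the "run for every row" behaviour, the sketch below applies the same function to a plain Python list standing in for a DataFrame column. The `now` argument, the `hours_column` values and the fixed timestamp are assumptions added only to make the output reproducible; they are not part of the original notebook.

```python
import datetime
from datetime import timedelta

def func(x, now=None):
    # `now` is a hypothetical extra argument, added only so the output is reproducible
    if now is None:
        now = datetime.datetime.utcnow()
    return (now - timedelta(hours=x)).isoformat()[:-3] + 'Z'

# Stand-in for a DataFrame column of hour offsets (made-up values)
hours_column = [0, 3, 24]
fixed_now = datetime.datetime(2022, 6, 17, 7, 16, 36, 212566)

# Spark would call the UDF once per row; a list comprehension imitates that
results = [func(h, now=fixed_now) for h in hours_column]
for r in results:
    print(r)
# 2022-06-17T07:16:36.212Z
# 2022-06-17T04:16:36.212Z
# 2022-06-16T07:16:36.212Z
```

Each input number of hours produces one string: the timestamp that many hours before "now", in ISO 8601 form with millisecond precision and a trailing 'Z'.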

To really see what's going on you can create variables out of every element and print them.

import datetime
from datetime import timedelta

def my_func(x):
    v1 = datetime.datetime.utcnow()
    v2 = timedelta(hours=x)
    v3 = v1 - v2
    v4 = v3.isoformat()
    v5 = v4[:-3]
    v6 = v5 + 'Z'

    for e in (v1, v2, v3, v4, v5):
        print(e)
    
    return v6

print(my_func(3))

# 2022-06-17 07:16:36.212566
# 3:00:00
# 2022-06-17 04:16:36.212566
# 2022-06-17T04:16:36.212566
# 2022-06-17T04:16:36.212
# 2022-06-17T04:16:36.212Z

This way you see how the result changes after every step. You can print whatever you want at any step you need, e.g. print(type(v4)).
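
One detail of the `[:-3]` step may be worth checking the same way. `isoformat()` leaves out the fractional part when the microsecond value is exactly 0, so the slice then cuts off the seconds instead of the last three microsecond digits. The sketch below compares the original expression with `isoformat(timespec='milliseconds')`, which is an alternative and not what the notebook uses. The function names and the `now` parameter are hypothetical, introduced only for this comparison.

```python
import datetime
from datetime import timedelta

def to_iso_slice(x, now):
    # Same expression as the lambda, with `now` passed in (hypothetical parameter)
    return (now - timedelta(hours=x)).isoformat()[:-3] + 'Z'

def to_iso_timespec(x, now):
    # Alternative (not from the notebook): let isoformat() truncate to milliseconds
    return (now - timedelta(hours=x)).isoformat(timespec='milliseconds') + 'Z'

with_micros = datetime.datetime(2022, 6, 17, 7, 16, 36, 212566)
no_micros = datetime.datetime(2022, 6, 17, 7, 16, 36)  # microsecond == 0

print(to_iso_slice(3, with_micros))     # 2022-06-17T04:16:36.212Z
print(to_iso_timespec(3, with_micros))  # 2022-06-17T04:16:36.212Z
print(to_iso_slice(3, no_micros))       # 2022-06-17T04:16Z
print(to_iso_timespec(3, no_micros))    # 2022-06-17T04:16:36.000Z
```

With `utcnow()` the microsecond value is rarely exactly 0, so the original line will almost always produce the millisecond format shown above.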

ZygD