
I am working with TensorFlow Extended (TFX) to preprocess data that includes date values (e.g. 16-04-2019). I need to apply some preprocessing to these dates, such as computing the difference between two dates and extracting the day, month and year.

For example, I might need the difference in days between 01-04-2019 and 16-04-2019, but the two dates could also be months or years apart.

In plain Python this is easy to do, but I am wondering whether it is also possible in TensorFlow. It's important for my use case to do this within TensorFlow, because the transform needs to be part of the graph so that I can serve the model with the transformations inside the pipeline.

I am using TensorFlow 1.13.1, TensorFlow Extended and Python 2.7 for this.

Martijncl

2 Answers


Posting from a similar issue on the tft GitHub.

Here's a way to do it:

import tensorflow_addons as tfa
import tensorflow as tf

@tf.function(experimental_follow_type_hints=True)
def fn_seconds_since_1970(date_time: tf.Tensor, date_format: str = "%Y-%m-%d %H:%M:%S %Z"):
    seconds_since_1970 = tfa.text.parse_time(date_time, date_format, output_unit='SECOND')
    seconds_since_1970 = tf.cast(seconds_since_1970, dtype=tf.int64)
    return seconds_since_1970

string_date_tensor = tf.constant("2022-04-01 11:12:13 UTC")

seconds_since_1970 = fn_seconds_since_1970(string_date_tensor)

seconds_in_hour, hours_in_day = tf.constant(3600, dtype=tf.int64), tf.constant(24, dtype=tf.int64)
# Floor division keeps everything in int64; `/` would produce float64 first.
hours_since_1970 = seconds_since_1970 // seconds_in_hour
hour_of_day = hours_since_1970 % hours_in_day
days_since_1970 = seconds_since_1970 // (seconds_in_hour * hours_in_day)
# Jan 1st 1970 was a Thursday, which is 4 when Sunday is 0.
day_of_week = (days_since_1970 + 4) % 7

print(f"On {string_date_tensor.numpy().decode('utf-8')}, {seconds_since_1970.numpy()} seconds had elapsed since 1970.")

My two cents on the broader underlying issue: the question here is about computing time differences, and we want to do those computations on tensors. The first question is then "What are the units of these tensors?", which is a question of granularity. The next question is "What are the data types involved?": most likely we start with a string and end with a numeric type. Finally, is there a "native" TensorFlow function that can do this? Enter TensorFlow Addons!

Just as we try to optimize training by doing everything as tensor operations within the graph, we also need to optimize "getting to the graph". I have seen the way datetime would work with Python functions here, and I would do everything I could to avoid going into Python-function land, as the code becomes complex and performance suffers as well. It's a lose-lose in my opinion.

PS: as per this, the op is not yet implemented on Windows, maybe because it only returns Unix timestamps :)
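For setups where the Addons op is unavailable (such as the Windows case just mentioned), here is a minimal sketch that uses only core TensorFlow ops. It assumes TensorFlow 2.x and fixed-width DD-MM-YYYY strings as in the question. The names split_date, days_since_epoch and date_diff_in_days are hypothetical helpers, not part of any library. The day count uses the well-known "days from civil" arithmetic for the proleptic Gregorian calendar.

```python
import tensorflow as tf


def split_date(date_str):
    """Extracts int64 (day, month, year) from fixed-width 'DD-MM-YYYY' strings."""
    day = tf.strings.to_number(tf.strings.substr(date_str, 0, 2), out_type=tf.int64)
    month = tf.strings.to_number(tf.strings.substr(date_str, 3, 2), out_type=tf.int64)
    year = tf.strings.to_number(tf.strings.substr(date_str, 6, 4), out_type=tf.int64)
    return day, month, year


def days_since_epoch(day, month, year):
    """Days since 1970-01-01, computed with integer tensor ops only."""
    # Shift the year so that it starts in March; the leap day then falls last.
    y = year - tf.cast(month <= 2, tf.int64)
    era = y // 400                       # 400-year Gregorian cycle
    year_of_era = y - era * 400          # in [0, 399]
    shifted_month = (month + 9) % 12     # March = 0, ..., February = 11
    day_of_year = (153 * shifted_month + 2) // 5 + day - 1
    day_of_era = (year_of_era * 365 + year_of_era // 4
                  - year_of_era // 100 + day_of_year)
    return era * 146097 + day_of_era - 719468


@tf.function
def date_diff_in_days(start, end):
    """Difference in whole days between two tensors of date strings."""
    return days_since_epoch(*split_date(end)) - days_since_epoch(*split_date(start))


start = tf.constant(["01-04-2019", "28-02-2020"])
end = tf.constant(["16-04-2019", "01-03-2020"])
diff = date_diff_in_days(start, end)
print(diff.numpy())
```

Because everything is expressed as tensor ops, the same helpers also give the day, month and year as separate integer features. The sketch does no input validation, so malformed strings would need to be handled upstream.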

Pritam Dodeja

I had a similar problem. The issue is caused by an if-check within TFX that doesn't take date types into account. As far as I've been able to figure out, there are two options:

  1. Preprocess the date column and cast it to an int field (e.g. by calling toordinal() on each element) before reading it into TFX.

  2. Edit the TFX function that checks types to account for date-like types and cast them to ordinal on the fly.

For the second option, you can navigate to venv/lib/python3.7/site-packages/tfx/components/example_gen/utils.py and look for the function dict_to_example. You can add a datetime check there like so:

def dict_to_example(instance: Dict[Text, Any]) -> tf.train.Example:
  """Converts dict to tf example."""
  feature = {}
  for key, value in instance.items():
    # TODO(jyzhao): support more types.
    # Requires `import datetime` at the top of utils.py.
    if isinstance(value, datetime.datetime):  # <---- Check here
      value = value.toordinal()
    if value is None:
      feature[key] = tf.train.Feature()
    ...

value will become an int, and the int will be handled and cast to a Tensorflow type later on in the function.
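The first option above (converting dates to ints before the data reaches TFX) can be sketched in plain Python as follows. This is only an illustration: the row layout, the column names start and end, and the helper date_to_ordinal are assumptions, and the DD-MM-YYYY format is taken from the question.

```python
import datetime


def date_to_ordinal(text, fmt="%d-%m-%Y"):
    """Parses a date string and returns its proleptic Gregorian ordinal."""
    return datetime.datetime.strptime(text, fmt).date().toordinal()


# Hypothetical input rows; in practice these would come from the raw dataset.
rows = [{"start": "01-04-2019", "end": "16-04-2019"}]

for row in rows:
    row["start"] = date_to_ordinal(row["start"])
    row["end"] = date_to_ordinal(row["end"])
    # Ordinals count days, so a plain subtraction gives the gap in days.
    row["days_between"] = row["end"] - row["start"]

print(rows)
```

Note that toordinal() discards any time-of-day component, so this only suits features with day granularity.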

Jon.H