0

I wonder how to go about creating an expression in Java that takes a date variable formatted as "1999-12-30 12:34:45" and converts it to an Int holding "19991230".

This is probably something others would appreciate, especially people extracting and cleaning up data for loading into a Data Warehouse.

My guess is one need to change date pattern, convert to string, truncate first part (8 chars) and convert to int.

Ideas?

(update) My bad in explaining the issue. The resulting int will service as a key/Fk id to a Date Dimension. A similar operation should be done for creating a key to a Time-of-Day Dimension. The int will reduce the size consumed in the facts table, in which these int keys would be stored. See related answer: https://stackoverflow.com/a/8416548/1132571

Community
  • 1
  • 1
Geuder
  • 15
  • 1
  • 5
  • 2
    The operation you described is trivial. However, I don't think anybody will use this. Most people use either true data or unixtime. – Sergio Tulentsev Jan 05 '12 at 16:17
  • Why would you store a `Date` as an `Integer`? – mre Jan 05 '12 at 16:18
  • @mre: integer is integer. Easy, small (depends), cross-platform. – Sergio Tulentsev Jan 05 '12 at 16:35
  • 1
    I think you mean truncate the first 10 characters or the first 8 digits. If you are starting with a date instead of a string, why not format it as "yyyyMMdd" from the start? – Peter Lawrey Jan 05 '12 at 16:35
  • What is delivered from the source system is in the format yyyy-MM-dd hh:mm:ss, and I would love to have a "one liner" converting that into (1) a date key (int) and (2 - separate issue) another time-of-day key (int) consisting of "hhmm" as int – Geuder Jan 05 '12 at 23:53

10 Answers10

2

Pretty easy to do this using the SimpleDateFormat class. However, as others mentioned you probably should be storing the epoch timestamp as Database TimeStamp column instead of storing the date in an int form. It will cause problem later when you are manipulating the data with SQL.

public static void main(String[] args) throws ParseException {
        String dateStr = "1999-12-30 12:34:45";
        String formatStr = "yyyy-MM-dd HH:mm:ss";
        String formatStrOther = "yyyyMMdd";
        Date testDate = null;
        SimpleDateFormat sdf= new SimpleDateFormat(formatStr);
        SimpleDateFormat sdfOther= new SimpleDateFormat(formatStrOther);
        sdf.setLenient(false);
        testDate = sdf.parse(dateStr);
        Integer otherDate = Integer.valueOf(sdfOther.format(testDate));
        System.out.println("Newly formatted date in int is: " + otherDate); //prints Newly formatted date in int is: 19991230

    }
CoolBeans
  • 20,654
  • 10
  • 86
  • 101
  • instead of storing the timestamp the foreign key to a Date Dimension will give me access to several other things than just the datetime. Thats why I want to transform the date into an int. Some external refs on date key a la Kimball: http://forum.kimballgroup.com/t334-eliminate-date-dimension-surrogate-key – Geuder Jan 05 '12 at 23:58
  • @Geuder - okay. What I provided above should do the trick for you. – CoolBeans Jan 06 '12 at 00:01
  • @Geuder Please accept an answer if it helped you. You can accept by clicking the tick mark next to the answer. This will encourage future answers for your questions, gain you reputation points, and most importantly you will have a good acceptance percentage. – CoolBeans Jan 06 '12 at 16:55
1

Simply remove special chars and then convert to int

Integer.parseInt("1999-12-30 12:34:45".replaceAll("[- :]", ""));

in your case you want only the date, not the time so it would be:

Integer.parseInt("1999-12-30 12:34:45".substring(0,10).replaceAll("-", ""));

I should mention that this only works if the original format is correct... and it is actually quite ugly, I would rather use a long and getTime - and maybe simpleDateFormat. You really do not want to throw away useful data!

Niko
  • 6,133
  • 2
  • 37
  • 49
  • Don't forget substring(0,8) after replaceAll() or substring(0,10) on the date string before replaceAll() to just return the date and not the time. – Spencer Kormos Jan 05 '12 at 16:28
  • I also wonder how this compares with SimpleDateFormat.parse(), this is less verbose for sure, but RegEx is not always performant in Java. – Spencer Kormos Jan 05 '12 at 16:32
  • 2147483647 is the maximum value of Integer, so calling parseInt on both date and time will result in a NumberFormatException. Of course, you could go and replace the int with a long or BigInteger when using the method. – Maarten Bodewes Jan 05 '12 at 23:57
0

getTime will give you a long with this information, int won't be large enough.

http://docs.oracle.com/javase/1.4.2/docs/api/java/util/Date.html#getTime()

KingCronus
  • 4,509
  • 1
  • 24
  • 49
  • The user wants a yearmonthday format in int, getTime() returns the milliseconds since the last epoch. Not the same thing. – Spencer Kormos Jan 05 '12 at 16:25
  • um, the idea is to "clean" data for use with a datawarehouse...storing the date in a format that cannot be trivially used in SQL doesn't quite make sense to me – Julien Chappuis Jan 05 '12 at 16:29
0

While I agree with the commenters finding this is strange, if I really needed to transform a date into a long I'd use the date.getTime() which returns a long representing the amount of millisec since 01.01.1970.

The good stuff is that it can easily be turned back into a date (Java has a constructor for this, I suspect other languages do as well)

Julien Chappuis
  • 349
  • 1
  • 11
0

Assuming that the format remains the same:

String dateStr = "1999-12-30 12:34:45";
dateStr = dateStr.substring(0, 10);
dateStr = dateStr.replace("-", "");
int n = Integer.parseInt(dateStr);
System.out.println(n);

Prints: 19991230

Bhesh Gurung
  • 50,430
  • 22
  • 93
  • 142
0

you could just use something like:

var dateObj=new Date("1999-12-30 12:34:45");

and then all you need to do is get the value of the date object, like so:

var timeStamp=dateObj.valueOf();

this will give you milliseconds since 1970, a universal time system which is used by most computer systems.

to convert back use:

var newTime=new Date(timeStamp);
Jimmery
  • 9,783
  • 25
  • 83
  • 157
0

Use SimpleDateFormat, SimpleDateFormat.parse, SimpleDateFormat.format and Integer.parseInt [ http://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html ]

1) create a SimpleDateFormat to parse the string into a Date object

2) create a SimpleDateFormat to format the Date object into the String representation of the in

3) use Integer.parseInt to change the newly formatted String into an int.

However, storing the date as an int for warehousing does not make much sense to me.

I guess that there is some restriction preventing you using either Date or a String in ISO 8601 format.

graney
  • 1,365
  • 2
  • 13
  • 20
  • In my case I already have a date object. How wold you create a one liner including step 2 and 3 of your solution? – Geuder Jan 06 '12 at 10:07
  • Also, it's not a restriction "forcing" this requirement. Having the foreign key to the date dimension as an int memory effective and indexable -> speed in analysis. – Geuder Jan 06 '12 at 10:09
0

And another one. (I don't like parsing strings, too error prone with dates. ;-))

DateFormat format = new SimpleDateFormat("dd-MM-yyyy");

Date d;
try {

    d = (Date)format.parse("30-12-1999");
    Calendar c = Calendar.getInstance();
    c.setTime(d);

    int result = c.get(Calendar.YEAR) * 100*100 + 
                (c.get(Calendar.MONTH) + 1) * 100 + 
                 c.get(Calendar.DAY_OF_MONTH);

    System.out.println(result);

} catch (ParseException e) {
    e.printStackTrace();
}
afrischke
  • 3,836
  • 17
  • 30
0

Thanks for all suggestions, input and suggestions. This was what I was looking for:

Integer.parseInt(new SimpleDateFormat("yyyyMMdd").format(row.date));

Where row.date (from Talend Open Studio) holds the date I wanted to transform into an int date key.

Marcus
  • 12,296
  • 5
  • 48
  • 66
Geuder
  • 15
  • 1
  • 5
0

In many ways you're better off leaving date keys as dates, although you may wish to truncate at days for the key. If you need the actual date-time as well, store it separately. There are a number of reasons for this, mainly to do with date arithmetic.

  • Calculating date ranges on the fact table can be done without joining against the date dimension.

  • Queries will return the date - this is quite efficient if you're just doing an ad-hoc query on the table.

  • Special values for 'before', 'after', 'error' can be selected so they're still fairly obvious ('1800-01-01', '9000-01-01' '6666-01-01' etc.).

  • Self-joins by date or date range are more efficient if you don't have to involve a date dimension.

  • Most importantly, date arithmetic can be done directly on the table without having to join against the dimension. Calculating a date 180 days before '2012-01-15' is easier than doing it for 20120115. Also, SQL does not have a native conversion from 20120015 to the corresponding date without looking it up in a dimension.

ConcernedOfTunbridgeWells
  • 64,444
  • 15
  • 143
  • 197