I am trying to do the following:

  1. load some data with string columns
import pandas as pd

measurement_df = pd.read_csv('data/tag_measurements/all_measurements.csv')
measurement_df.head(3)
>> prints
.  timestamp               tag_1      tag_2        tag_3    
0   2018-01-01 11:09:00 0.729193    -0.236627   -1.968651   
1   2018-01-02 05:56:00 -2.812988   0.394632    -1.151147   
2   2018-01-03 00:37:00 0.363185    -0.089076   -1.509133   

At this point the timestamp column is of type str:

type(measurement_df.iloc[0]['timestamp'])
>> prints
str
  2. convert it to Vaex
import vaex as vx

vdf = vx.from_pandas(measurement_df)
vdf.head(3)
>> prints
#           tag_1          tag_2                  tag_3           index
0   0.7291933972260769  -0.2366268009370677  -1.9686509728501898    0
1   -2.8129876800434737 0.3946317890604529   -1.1511473058592252    1
2   0.3631852302577519  -0.08907562484360453 -1.5091330993605443    2 

Somehow I lose the timestamp column. Any ideas what could be going wrong?

amirdel
  • what is the dtype of 'timestamp'? (did you make sure it's 'datetime64'?) – FObersteiner Jun 02 '20 at 07:02
  • It is `str` (see edit to the post). I converted it to `np.datetime64`: `measurement_df['timestamp'] = [np.datetime64(i) for i in measurement_df['timestamp'].values]`. After the conversion it still does not work, i.e. the timestamp column disappears. – amirdel Jun 03 '20 at 00:32

1 Answer

If you would like to preserve the date/time format, especially when reading CSVs, I suggest you do:

df = pd.read_csv('myfile.csv', parse_dates=['datetime_col_1', 'datetime_col_2'])

You can also do:

df = vaex.read_csv('myfile.csv', parse_dates=['datetime_col_1', 'datetime_col_2'])

The two are equivalent, since vaex uses the pandas method in the background.
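As a quick way to check that the column really is a datetime before handing the frame to vaex, here is a minimal sketch using pandas only. The in-memory CSV and its values are stand-ins for the file in the question (an assumption, not the real data), and `pd.to_datetime` is shown as an alternative for a frame that has already been loaded with string timestamps.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for the file from the question.
csv_text = """timestamp,tag_1,tag_2
2018-01-01 11:09:00,0.729193,-0.236627
2018-01-02 05:56:00,-2.812988,0.394632
"""

# Without parse_dates, the timestamp column is read as plain strings.
raw_df = pd.read_csv(io.StringIO(csv_text))

# With parse_dates, pandas converts the column while reading.
parsed_df = pd.read_csv(io.StringIO(csv_text), parse_dates=["timestamp"])

# For an already-loaded frame, pd.to_datetime converts the whole column at once.
converted = pd.to_datetime(raw_df["timestamp"])

is_datetime = pd.api.types.is_datetime64_any_dtype
print(is_datetime(raw_df["timestamp"]))     # False
print(is_datetime(parsed_df["timestamp"]))  # True
print(is_datetime(converted))               # True
```

If the check prints `True` and the column still disappears after the conversion to vaex, the installed vaex version is a likely suspect.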

Joco
  • Looking at [Vaex APIs](https://vaex.readthedocs.io/en/latest/api.html), I think the closest thing they have is `from_csv`. I tried calling `vx.from_csv('my_files.csv', parse_dates=['timestamp'])`, which leads to the following error, and the `timestamp` column is dropped: `could not convert column timestamp, error: AssertionError("dtype not supported: dtype('",), will try to convert it to string` – amirdel Jun 22 '20 at 21:23
  • I just checked, and I was using an old version of vaex. After updating to 3.0.0 (`conda install -c conda-forge vaex=3.0.0`), your proposed solution works! Thank you @Joco. – amirdel Jun 22 '20 at 21:43
  • FYI for anyone reading this: vaex is currently at version 4.5.x. I don't know why, but sometimes pip/conda do not install the latest version. – Joco Oct 28 '21 at 09:48
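Since the fix in this thread came down to the installed vaex version, a small sketch for checking what is actually installed may help. It uses only the standard library's `importlib.metadata` (Python 3.8+). The helper name `installed_version` is made up for illustration.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string, or None when vaex is not installed in this environment.
print("vaex:", installed_version("vaex"))
```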