2

I am saving a kdb table to a dataframe then saving the dataframe to a csv. This works, however, the csv file and if i print(dataframe); null values are showing as " b" ", and all other string values are showing as " b'STRING' ".

Running Python 3.7, pandas 0.24.2 and qpython 2.0.0.

df = pandas.DataFrame(qpython query)
df.to_csv(path_or_buf="",
          sep=",", na_rep='',
          float_format=None,
          columns=None,
          header=True, index=False,
          index_label=None, mode='w+', compression=None, quoting=None, quotechar='"',
          line_terminator="\n", chunksize=50, tupleize_cols=None, date_format=None,
          doublequote=True,
          escapechar=None, decimal='.', encoding='utf-8')

I expected the KDB table to output to the csv correctly, with nulls being an empty column and strings just showing the string, without " b'STRING' ".

Any advice or help would be greatly appreciated. If anyone needs any more information, I'd be happy to provide.

Example in csv:

Null cells show as : b"

Cells containing strings show as:" b'Euro' " when in fact should just show "Euro"

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
  • Might be a silly question but why not write directly to csv from kdb? Why do it via a dataframe? – terrylynch Jul 30 '19 at 09:54
  • @terrylynch I'm quite new to QPython, do you think you could point me in the right direction to do this or let me know how - I naturally just use pandas where i can due to familiarity – breddie forden Jul 30 '19 at 10:06
  • You can find pandas solution here: https://stackoverflow.com/questions/53884308/qpython-type-conversion-of-kdb-response-data – Rahul Jul 30 '19 at 10:10
  • Thanks Rahul, this was beneficial as well – breddie forden Jul 30 '19 at 10:21

2 Answers2

4

qPython has some functionality for converting a kdb table to a pandas dataframe. I begin by creating a table in kdb "t" that has 4 columns where the third column is a column of symbols and the 4th is a column of characters. The entrys in the first row are entirely nulls.

t:([] a: 0N, til 99; b: 0Nf, 99?1f; c: `, 99?`3; d: " ", 99?" ")
a b         c   d
-----------------

0 0.4123573 iee x
1 0.8397208 app l
2 0.3392927 ncm w
3 0.285506  pjn c

The table can then be read into Python using QConnection. If we convert the table to a dataframe after it is read in we can see that the symbols and chars are converted to bytes and the nulls are not converted correctly.

df=pandas.DataFrame(q('t'))
df.head()
                       a           b         c     d
0   -9223372036854775808         NaN       b''  b' '
1                      0    0.412357    b'iee'  b'x'
2                      1    0.839721    b'app'  b'l'
3                      2    0.339293    b'ncm'  b'w'
4                      3    0.285506    b'pjn'  b'c'

However if we use the pandas=True argument with our q query then most of the table is converted appropriately as desired:

df=q('t', pandas=True)
df.head()
      a        b         c  d
0   NaN      NaN       b''  
1   0.0 0.412357    b'iee'  x
2   1.0 0.839721    b'app'  l
3   2.0 0.339293    b'ncm'  w
4   3.0 0.285506    b'pjn'  c

However notice that entries stored as symbols in kdb are not converted as desired. In this case the following code will manually decode any columns specified in string_cols from bytes into strings using a similar method to the one suggested by Callum.

string_cols = ['c']
df[string_cols] = df[string_cols].applymap(lambda s : s.decode('utf-8')) 

giving an end result of:

df.head()
      a        b      c d
0   NaN      NaN        
1   0.0 0.412357    iee x
2   1.0 0.839721    app l
3   2.0 0.339293    ncm w
4   3.0 0.285506    pjn c

Which can easily be converted to a csv file.

Hope this helps

  • Thanks Jamie, this was insightful. I definitely should of used this method to start and then followed by Callums decoding functions. Would of been a lot easier! – breddie forden Jul 31 '19 at 13:18
1

I would have expected strings in kdb to be handled fine, as QPYTHON should convert null strings to python null strings. Null symbols, however, are converted to _QNULL_SYM. In this case, I think the 'b' prefix indicates a byte literal. You can try to decode the byte objects before saving to a csv

Normally in python I would do something along the following

df['STRINGCOL'] = df['STRINGCOL'].apply(lambda s: s.decode('utf-8'))

I don't have much experience with QPYTHON but I believe using qnull() will convert the null to a pythonic value.

df['STRINGCOL'] = df['STRINGCOL'].apply(lambda s: qnull(s))
Callum Biggs
  • 1,539
  • 5
  • 13
  • The first suggestion works great. Thanks a lot. So I'm presuming I will have to create a loop to apply this to every collumn? – breddie forden Jul 30 '19 at 10:19
  • 1
    Yeah mate, for all the columns that are having this issue. To take a hammer to it you can just apply a lambda to the whole dataframe along the column axis, something along the lines of ```df.apply(lambda s: s.decode('utf-8'), axis=1)``` – Callum Biggs Jul 30 '19 at 10:33
  • Thats what I attempted to do, however it is throwing up an error: AttributeError: ("'Series' object has no attribute 'decode'", 'occurred at index 0').. Any thoughts? – breddie forden Jul 30 '19 at 11:26
  • Some of the columns have already been decoded. Try a for loop with a try except catch, with the except just returning the original series. – Callum Biggs Jul 30 '19 at 11:33