
I am struggling to understand the following anomaly, though I am sure it is ridiculously simple.

I have a list raw_ticker, with the following values:

raw_ticker = ['t1INCH:USD', 3.5881, 17907.680602819994, 3.5945, 10610.799208380002, -0.0172, -0.0048, 3.5982, 300068.50303883, 3.9639, 2.7685]

I want to convert it to a numpy array ticker, and specify the data types:

import numpy as np

ticker = np.array(raw_ticker,
        dtype=[
        ('symbol', str), 
        ('bid', float), 
        ('bid_size', float), 
        ('ask', float), 
        ('ask_size', float), 
        ('day_chg', float), 
        ('day_chg_p', float), 
        ('last', float), 
        ('vol', float), 
        ('high', float), 
        ('low', float)])

I get the following error:

could not convert string to float: 't1INCH:USD'

which I don't understand, since I explicitly specified that this field is a string, not a float.
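The error appears to follow from how NumPy reads a flat list under a compound dtype: each list element is taken as a separate record, and a lone scalar assigned to a structured record is assigned to every field. So 't1INCH:USD' fits the 'symbol' field but then has to be converted to float for 'bid'. A minimal sketch of that behaviour, using a hypothetical two-field dtype (PAIR) for brevity:

```python
import numpy as np

# Hypothetical two-field compound dtype standing in for the full ticker dtype.
PAIR = np.dtype([('symbol', 'U20'), ('bid', float)])

def from_flat_list():
    # Flat list: each element becomes its own record, and NumPy tries to
    # assign that scalar to every field, so the string also hits 'bid'.
    return np.array(['t1INCH:USD', 3.5881], dtype=PAIR)

def from_tuple():
    # Tuple: read as ONE record, matched to the fields position by position.
    return np.array(('t1INCH:USD', 3.5881), dtype=PAIR)

try:
    from_flat_list()
except ValueError as exc:
    print("flat list fails:", exc)

rec = from_tuple()
print(rec.shape, rec['symbol'], rec['bid'])
```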

samuel guedon
  • numpy arrays only store values with the same data type. Are you sure that's what you want to create? – norie May 20 '21 at 16:44
  • @norie, he's specifying one `dtype`, a compound one. OP - the data for a structured array needs to be a tuple or a list of tuples. – hpaulj May 20 '21 at 16:47
  • But again, the question is: Do you actually want to use a _structured array_ here? – cadolphs May 20 '21 at 16:48
  • Indeed, what @norie said. You could do something like `ticker = np.array(raw_ticker[1:])` to load only the floats. You could also load multiple tickers into e.g. a dict[str, List[relevant datatype]] and use something like Pandas DataFrames or PyArrow Tables to organize your data on a columnar basis. – jrbergen May 20 '21 at 16:52
  • First see @norie's comment. If the values aren't all the same type, numpy will try to convert them to a common type. For example `a = np.array(['1', 2])` becomes `['1', '2']` – Have a nice day May 20 '21 at 16:54
  • Apparently I am not using the proper terminology, but np.array can handle heterogeneous data types (through a compound, i.e. structured, dtype), and yes, that's what I need. @hpaulj provided the answer. – samuel guedon May 20 '21 at 17:00
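Two points from the comments are easy to check: without a structured dtype, NumPy coerces mixed inputs to one common type, and jrbergen's alternative keeps the symbol outside the array and stores only the numbers as floats. A quick sketch of both, reusing the question's raw_ticker (the names mixed, symbol and values are illustrative only):

```python
import numpy as np

raw_ticker = ['t1INCH:USD', 3.5881, 17907.680602819994, 3.5945,
              10610.799208380002, -0.0172, -0.0048, 3.5982,
              300068.50303883, 3.9639, 2.7685]

# Without a dtype, NumPy picks one common type for all elements:
# mixing str and int yields a unicode string array.
mixed = np.array(['1', 2])
print(mixed, mixed.dtype)

# jrbergen's suggestion: keep the symbol as a plain Python str and
# store only the numeric fields in a homogeneous float64 array.
symbol = raw_ticker[0]
values = np.array(raw_ticker[1:], dtype=float)
print(symbol, values.shape, values.dtype)
```

This loses the field names, so you index by position (values[6] is 'last'); the structured array in the answer keeps them.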

1 Answer


Provide the data as a tuple (one record) or a list of tuples (several records):

In [363]: raw_ticker = ['t1INCH:USD', 3.5881, 17907.680602819994, 3.5945, 10610.799208380002, -
     ...: 0.0172, -0.0048, 3.5982, 300068.50303883, 3.9639, 2.7685]
In [364]: ticker = np.array(tuple(raw_ticker),
     ...:         dtype=[
     ...:         ('symbol', str),
     ...:         ('bid', float),
     ...:         ('bid_size', float),
     ...:         ('ask', float),
     ...:         ('ask_size', float),
     ...:         ('day_chg', float),
     ...:         ('day_chg_p', float),
     ...:         ('last', float),
     ...:         ('vol', float),
     ...:         ('high', float),
     ...:         ('low', float)])
In [365]: ticker
Out[365]: 
array(('', 3.5881, 17907.68060282, 3.5945, 10610.79920838, -0.0172, -0.0048, 3.5982, 300068.50303883, 3.9639, 2.7685),
      dtype=[('symbol', '<U'), ('bid', '<f8'), ('bid_size', '<f8'), ('ask', '<f8'), ('ask_size', '<f8'), ('day_chg', '<f8'), ('day_chg_p', '<f8'), ('last', '<f8'), ('vol', '<f8'), ('high', '<f8'), ('low', '<f8')])

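Note that 'symbol' came out as '': a bare str gives a zero-width string field ('<U' in the dtype above), so the field needs an explicit width, e.g. 'U20'.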
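Putting the answer together with a fixed width for 'symbol', here is a sketch that builds the same compound dtype with 'U20' (20 characters is an assumption; size it to your longest symbol) and shows both the single-tuple and list-of-tuples forms. FLOAT_FIELDS and TICKER_DTYPE are illustrative names, not part of NumPy:

```python
import numpy as np

raw_ticker = ['t1INCH:USD', 3.5881, 17907.680602819994, 3.5945,
              10610.799208380002, -0.0172, -0.0048, 3.5982,
              300068.50303883, 3.9639, 2.7685]

FLOAT_FIELDS = ['bid', 'bid_size', 'ask', 'ask_size', 'day_chg',
                'day_chg_p', 'last', 'vol', 'high', 'low']

# 'U20' gives 'symbol' room for up to 20 characters (assumed to be enough);
# a bare `str` has zero width, which is why the output above shows ''.
TICKER_DTYPE = np.dtype([('symbol', 'U20')] +
                        [(name, 'f8') for name in FLOAT_FIELDS])

# One record: pass a tuple -> 0-d structured array.
ticker = np.array(tuple(raw_ticker), dtype=TICKER_DTYPE)
print(ticker['symbol'], ticker['last'])

# Several records: pass a list of tuples -> 1-d structured array.
# (Only one row here; append more tuples for more tickers.)
tickers = np.array([tuple(raw_ticker)], dtype=TICKER_DTYPE)
print(tickers.shape, tickers['vol'])
```

Building the dtype from a list of names avoids repeating float for each of the ten numeric fields.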

hpaulj