
While doing data analysis in IPython, I often have to look at the data by just printing its contents to the shell. NumPy can show only the margins of huge objects when they are too long. I really like this feature of ndarrays, but when I print built-in Python objects (e.g. a dictionary with 15k items in it), they are dumped to the screen in full or sometimes truncated in a not very friendly fashion. For a huge dictionary, for example, I would like to see output something like this:

{ '39416' : '1397',  
  '39414' : '1397',  
  '7629'  : '7227',  
  ...,  
  '31058' : '9606',  
  '21097' : '4062',  
  '32040' : '9606' }  

It would be perfect if alignment and nested data structures could be taken care of too. Is there a special module that provides such functionality for Python's basic classes (list, dict)? Or are there some IPython configuration tricks I know nothing about?

vsminkov
Pommy

3 Answers


There is a good module for this in the standard library: pprint. Take a look at it.

>>> from pprint import pprint
>>> pprint({x: list(range(x)) for x in range(10)})
{0: [],
 1: [0],
 2: [0, 1],
 3: [0, 1, 2],
 4: [0, 1, 2, 3],
 5: [0, 1, 2, 3, 4],
 6: [0, 1, 2, 3, 4, 5],
 7: [0, 1, 2, 3, 4, 5, 6],
 8: [0, 1, 2, 3, 4, 5, 6, 7],
 9: [0, 1, 2, 3, 4, 5, 6, 7, 8]}
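pprint also takes `depth` and `width` arguments. A small sketch with made-up data: `depth` collapses containers nested deeper than the given level into `{...}` or `[...]`, and `width` sets the target line length (80 by default). Neither option shortens a long flat container; they only control nesting and wrapping.

```python
from pprint import pformat, pprint

# Made-up nested data for illustration.
nested = {"a": {"b": {"c": 1}}, "d": list(range(30))}

# Only the top level is expanded; nested containers become {...} / [...].
pprint(nested, depth=1)

# Same data wrapped to roughly 40 columns instead of the default 80.
pprint(nested, width=40)

# pformat returns the same text as a string instead of printing it.
collapsed = pformat(nested, depth=1)
```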
vsminkov

If your dictionary is well structured, you could convert it to a pandas DataFrame for viewing.

>>> import numpy as np
>>> import pandas as pd
>>> pd.DataFrame({'random int': np.random.randint(0, 10, 1000),
...               'random normal': np.random.randn(1000)})
     random int  random normal
0             6       0.850827
1             7       0.486551
2             4      -0.111008
3             9      -1.319320
4             6      -0.393774
5             1      -0.878507
..          ...            ...
995           2      -1.882813
996           3      -0.121003
997           3       0.155835
998           5       0.920318
999           2       0.216229

[1000 rows x 2 columns]
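For a flat mapping like the one in the question, a `pandas.Series` may be a lighter option than a full DataFrame. A sketch with synthetic data (the dictionary contents are made up): keys become the index, and once the length exceeds pandas' `display.max_rows` option, only a few head and tail rows are printed with `...` between them. The exact number of rows shown depends on your pandas version's display options.

```python
import pandas as pd

# Synthetic stand-in for a large flat str -> str dictionary (made-up data).
big = {str(i): str(i % 9973) for i in range(15000)}

# Keys become the index, values become the data.
s = pd.Series(big)

# Longer than display.max_rows, so the repr is truncated with '...'.
text = repr(s)
print(text)
```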
Alexander
I think this is overkill for viewing purposes. I could use a NumPy structured array for the same purpose, but I don't think it is a good solution. – Pommy Aug 29 '16 at 16:46

NumPy's array formatter has an ellipsis feature; by default it kicks in for arrays with more than 1000 items.
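A sketch of that behaviour, assuming NumPy 1.15 or later (where the `np.printoptions` context manager exists): the `threshold` print option is the size above which arrays are summarized, and `edgeitems` is how many items are kept at each end.

```python
import numpy as np

# The default summarization threshold (1000 unless it has been changed).
print(np.get_printoptions()["threshold"])

# 2000 elements exceeds the threshold, so the repr keeps only the first and
# last few items (edgeitems, 3 by default) with '...' in between.
long_arr = np.arange(2000)
long_repr = repr(long_arr)
print(long_repr)

# Temporarily tighten the limits: summarize anything over 10 elements and
# keep 2 items at each edge. The previous options are restored on exit.
with np.printoptions(threshold=10, edgeitems=2):
    short_repr = repr(np.arange(20))
print(short_repr)
```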

pprint can make the display nicer, but I don't think it has an ellipsis feature for long containers. Still, it is worth studying its docs.
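For length-based truncation, the standard library's `reprlib` module (rather than pprint) produces abbreviated representations of large containers. A minimal sketch with synthetic data; note that `reprlib` tries to sort dictionary keys before truncating, so you see the smallest keys rather than the first ones inserted.

```python
import reprlib

# Synthetic stand-in for a large str -> str dictionary (made-up data).
big = {str(i): str(i * 7 % 10000) for i in range(15000)}

# A Repr instance with its own limits; maxdict is the number of dict
# entries shown before the '...' placeholder (the default is 4).
r = reprlib.Repr()
r.maxdict = 6
short = r.repr(big)
print(short)
```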

With a list, I can use a slice

list(range(100))[:10]

to see a limited number of the values.

That's harder to do with a dictionary. With some trial and error, this works tolerably:

{k:dd[k] for k in list(dd.keys())[:10]}

(I'm on Python 3, where keys() returns a view, so I need the extra list() call before slicing.)
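A variant of the same idea uses `itertools.islice`, so the full key list is never built. Here `dd` is a made-up example dictionary; since dictionaries keep insertion order on Python 3.7+, these are the first ten items inserted.

```python
from itertools import islice

# Made-up example dictionary standing in for `dd`.
dd = {str(i): str(i * 3) for i in range(15000)}

# Lazily take the first 10 (key, value) pairs and rebuild a small dict.
first10 = dict(islice(dd.items(), 10))
print(first10)
```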

It wouldn't be hard to write your own utility functions if you can't find something in pprint. It's also possible that some package on PyPI does this. For example, a quick search turned up

https://pypi.python.org/pypi/pprintpp

pprintpp claims to be "actually pretty". But like the stock pprint, it seems to be more concerned with the nesting depth of lists and dictionaries than with their length.
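Along the lines of writing your own utility function, here is a minimal sketch of a hypothetical `head_tail_repr` helper that mimics the layout asked for in the question: the first and last `n` items, an ellipsis line in between, and keys padded to a common width. It assumes insertion order is the order you want to see (true for dicts on Python 3.7+) and does not try to handle nested containers; nested values are printed with their full repr.

```python
def head_tail_repr(d, n=3):
    """Hypothetical helper: format the first and last n items of dict d,
    with '...' in between and the keys padded to a common width."""
    if n < 1:
        raise ValueError("n must be at least 1")
    items = list(d.items())
    truncated = len(items) > 2 * n
    shown = items[:n] + items[-n:] if truncated else items
    if not shown:
        return "{}"
    keys = [repr(k) for k, _ in shown]
    width = max(len(k) for k in keys)
    lines = [f"{k.ljust(width)} : {v!r}" for k, (_, v) in zip(keys, shown)]
    if truncated:
        lines.insert(n, "...")  # marks where the middle items were skipped
    return "{ " + ",\n  ".join(lines) + " }"


example = {"39416": "1397", "39414": "1397", "7629": "7227",
           "1": "0", "2": "0",
           "31058": "9606", "21097": "4062", "32040": "9606"}
print(head_tail_repr(example))
```

For this example dictionary, the output has the same layout as the sample in the question.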

hpaulj