Is a Python list the same list as defined by my data structures and algorithms textbook?

Question

I am reading a book about data structures and basic algorithms to get a better grasp on programming. The book defines some data structures (e.g. list, array, etc.), specifies their interfaces, and shows how they can be implemented (or constructed) using existing data structures in different languages. For example, it discusses how a list can be constructed as an array or as linked cells.

However, when I start looking at these data types in different languages I find that they do not always follow the interfaces that are defined in the book.

For example, the book says that in a list you should be able to step through the different elements one at a time, starting only at one of the ends. If I want the third element, I would have to start with the first element and use a "next" operation until I get to the element I want. But lists in Python have integer indices, and I can access any element directly by using the corresponding index. In the book, the index type only has to be an ordered type with a comparison operation for equality. (I think the book tries to stay general by abstracting things.)

Does this mean that Python's lists are not lists in the "true sense"? Or is it that the data structures defined in the book are only suggestions on how the interfaces for these can look, and in practice there can be a lot of variation?

I understand that the idea of the book is to give a better understanding of how to think about data structures, and to introduce some typical ones, so I am inclined to think that the latter answer is the correct one.

Python lists are closer to arrays that have an auto-resize mechanism than they are linked-lists. Data structures, as taught, are a basis upon which many variations and extensions are created — Iain Shelvington, Dec 28 '19 at 22:34
OK, so the definition for a list (or perhaps linked list) is quite general then, with some variations? I think I might have picked a bad example since it seems like I was misled by the name. I have also seen some people call the usual Python list (e.g. `[1, 2, 'hat']`) an array. However, that did not fit the book's definition of an array either, since arrays are supposed to be static. I think your comment together with the reply by @Gelineau answered my question. — JezuzStardust, Dec 28 '19 at 22:40
Think of a Python list as an abstraction over an array. When you define a list the interpreter actually creates an array of pointers to Python objects and allows you to perform index based lookups, the abstraction maintains the current length of the list and when it grows to be bigger than the originally allocated array it reallocates the array to accommodate any future growth — Iain Shelvington, Dec 28 '19 at 22:48

ggorlen · Accepted Answer · 2019-12-29T20:56:10.310

There are multiple names for many data structures (hash[map/table]/dictionary/associative array comes to mind). You're referring to two data structures which are unrelated beyond the fact that they both store a linear sequence of items. The name similarity is an unfortunate source of confusion, but the TL;DR is that lists in Python's terminology are very similar to, and are built upon, arrays while linked lists, which your text refers to as "lists", aren't.

It's important to separate implementation from interface: the underlying data structure behind a queue, for example, could be an array or a linked list, but this should be an implementation detail that the client of the interface doesn't know or care about. The interface should enforce guarantees for the time complexity of various operations. It's common for data structure texts to implement abstract data types using various underlying "primitive" data structures like arrays or linked lists, and the choice of primitive may affect the resulting complexity of various operations (prompting student analysis).
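As a rough sketch of that separation, the two classes below expose the same `enqueue`/`dequeue` interface over different underlying structures. The names `ArrayQueue`, `LinkedQueue` and `drain` are invented for this illustration and are not from any library or from the textbook.

```python
class ArrayQueue:
    """Queue backed by a Python list (a dynamic array)."""

    def __init__(self):
        self._items = []

    def enqueue(self, item):
        self._items.append(item)       # amortized O(1)

    def dequeue(self):
        return self._items.pop(0)      # O(n): remaining elements shift left

    def __len__(self):
        return len(self._items)


class _Node:
    def __init__(self, value):
        self.value = value
        self.next = None


class LinkedQueue:
    """Queue backed by a singly linked list with head and tail pointers."""

    def __init__(self):
        self._head = None
        self._tail = None
        self._size = 0

    def enqueue(self, item):
        node = _Node(item)
        if self._tail is None:
            self._head = node          # first element: head and tail coincide
        else:
            self._tail.next = node
        self._tail = node
        self._size += 1

    def dequeue(self):
        if self._head is None:
            raise IndexError("dequeue from empty queue")
        node = self._head
        self._head = node.next         # O(1): just move the head pointer
        if self._head is None:
            self._tail = None
        self._size -= 1
        return node.value

    def __len__(self):
        return self._size


def drain(queue, items):
    """Client code: works with either implementation, unaware of which."""
    for item in items:
        queue.enqueue(item)
    return [queue.dequeue() for _ in range(len(queue))]
```

Both classes return items in first-in, first-out order from `drain`, but `dequeue` costs O(n) in the list-backed version and O(1) in the linked version, which is the kind of difference the interface's complexity guarantees are meant to pin down.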

Linked lists are simple, dynamic (expandable) data structures that offer fast O(1) add and removal operations at both ends, provided the list keeps head and tail pointers (O(1) removal at the tail additionally requires the list to be doubly-linked). Their nodes are usually structs or objects in memory with pointers or references to the next and (if doubly-linked) previous elements, as well as a data property per node. There is no random access for linked lists: you have to walk through the list from an endpoint to locate a single element by following the pointers.
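To make the "no random access" point concrete, here is a minimal sketch of a singly linked list. `Node`, `from_iterable` and `nth` are hypothetical names chosen for this example; Python has no built-in type like this.

```python
class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


def from_iterable(values):
    """Build a singly linked list and return its head node (or None)."""
    head = None
    for value in reversed(list(values)):
        head = Node(value, head)       # O(1) insertion at the front
    return head


def nth(head, index):
    """Return the value at position `index` by following `next` pointers."""
    if index < 0:
        raise IndexError("negative indices are not supported in this sketch")
    node = head
    for _ in range(index):             # O(index) steps; no shortcut exists
        if node is None:
            break
        node = node.next
    if node is None:
        raise IndexError("linked list index out of range")
    return node.value


head = from_iterable([1, 2, "hat"])
third = nth(head, 2)                   # walks two `next` links from the head
```

Compare this with `[1, 2, "hat"][2]` on a Python list, which reads the element directly from the underlying array without visiting the first two.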

Linked lists are typically used to implement queues, stacks and more complex data structures like Java's LinkedHashMap, which combines a doubly linked list and a hash map. Python offers `collections.deque` and `queue.Queue`, which are linked list-based data structures offering fast access to front and back elements (in CPython, `deque` is a doubly linked list of fixed-size blocks, and `queue.Queue` is built on top of it). Other libraries, like Java's ArrayDeque, use circular arrays to implement the same deque interface.
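A short usage sketch of the two standard-library types mentioned above (the variable names are just for illustration):

```python
from collections import deque
import queue

# deque: O(1) insertion and removal at both ends
d = deque([2, 3])
d.appendleft(1)        # add at the front
d.append(4)            # add at the back
front = d.popleft()    # remove from the front
back = d.pop()         # remove from the back

# queue.Queue: a thread-safe FIFO queue
q = queue.Queue()
q.put("a")
q.put("b")
first = q.get()        # items come out in the order they went in
```

Note that `deque` does support indexing such as `d[0]`, but access near the middle is O(n), unlike a list.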

Binary tree nodes are a variant of singly linked list nodes with two child pointers rather than a single next pointer. As with linked lists (and their nodes), tree nodes and trees will typically be an implementation detail behind an interface such as the C++ map. You can also implement trees with arrays, as is typically the case with heaps.
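Python's `heapq` module is one example of a tree stored in an array: it keeps a binary min-heap inside an ordinary list, with the children of the element at index `i` at indices `2*i + 1` and `2*i + 2`. The helpers `children` and `is_min_heap` below are written for this sketch only and are not part of `heapq`.

```python
import heapq

heap = []
for value in [5, 1, 4, 2, 3]:
    heapq.heappush(heap, value)     # `heap` stays a plain Python list


def children(i):
    """Indices of the two children of the node stored at index i."""
    return 2 * i + 1, 2 * i + 2


def is_min_heap(items):
    """Check that every parent is <= each of its children."""
    n = len(items)
    for i in range(n):
        for c in children(i):
            if c < n and items[i] > items[c]:
                return False
    return True


heap_ok = is_min_heap(heap)
smallest = heap[0]                  # the root of the tree is always at index 0
```

No node objects or pointers are involved; the parent/child relationships are implied entirely by index arithmetic.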

Python's lists (also known as ArrayLists (Java), arrays (JS/Ruby), vectors (C++/Haskell) and List (C#)) are a dynamic abstraction on primitive arrays, which are fixed in size. The list abstraction adds a length property and many functions for manipulating the underlying array like append/push/push_back, pop, shift, splice, etc. (names are dependent on language). The underlying array will be automatically resized to fit the number of elements it contains. Inserting or removing elements at the front or middle of the list is an O(n) operation since, in the worst case, every element needs to be shifted to accommodate the adjustment to the array. This shifting is part of the abstraction and is hidden from the client.
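The following is a toy sketch of such an abstraction, not how CPython actually implements `list`. The class name `DynamicArray`, the starting capacity of 4 and the doubling growth factor are all assumptions made for illustration (CPython's real over-allocation policy is different), and a preallocated Python list stands in for the fixed-size primitive array.

```python
class DynamicArray:
    def __init__(self):
        self._capacity = 4                       # assumed starting capacity
        self._length = 0
        self._slots = [None] * self._capacity    # stand-in for a fixed-size array

    def __len__(self):
        return self._length

    def __getitem__(self, index):
        if not 0 <= index < self._length:
            raise IndexError("index out of range")
        return self._slots[index]                # O(1) random access

    def _grow(self):
        # Allocate a bigger array and copy everything over: O(n).
        self._capacity *= 2
        new_slots = [None] * self._capacity
        for i in range(self._length):
            new_slots[i] = self._slots[i]
        self._slots = new_slots

    def append(self, value):
        self.insert(self._length, value)         # nothing to shift at the end

    def insert(self, index, value):
        if not 0 <= index <= self._length:
            raise IndexError("index out of range")
        if self._length == self._capacity:
            self._grow()
        # Shift everything from `index` onward one slot to the right.
        for i in range(self._length, index, -1):
            self._slots[i] = self._slots[i - 1]
        self._slots[index] = value
        self._length += 1


arr = DynamicArray()
for value in [2, 3, 4, 5]:
    arr.append(value)
arr.insert(0, 1)                                 # forces a resize and a full shift
contents = [arr[i] for i in range(len(arr))]
```

The caller only ever sees `append`, `insert`, indexing and `len`; the resize and the shifting loop happen behind the interface.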

The advantages of lists are the same as those of arrays: fast random access to any element. Additions to the end of a list are also fast and constant-time because the occasional reallocations of the underlying array are amortized across multiple append operations.
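To see why the reallocations average out, this small simulation counts how many element copies a sequence of appends would cause. It assumes the capacity doubles each time the array is full, which is a simplification rather than CPython's actual growth policy; `copies_for_appends` is a hypothetical helper name.

```python
def copies_for_appends(n, initial_capacity=1):
    """Count element copies caused by reallocation over n appends,
    assuming the capacity doubles whenever the array is full."""
    capacity = initial_capacity
    length = 0
    copies = 0
    for _ in range(n):
        if length == capacity:
            copies += length       # reallocation copies every existing element
            capacity *= 2
        length += 1
    return copies


n = 1000
total = copies_for_appends(n)
copies_per_append = total / n      # stays below 2 under the doubling assumption
```

Individual appends that trigger a reallocation are expensive, but the total copying work grows linearly with the number of appends, so the average cost per append is bounded by a constant.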

An important consideration for lists versus primitive arrays is the memory scheme. Python lists store references to heap-allocated objects rather than the values themselves, so they suffer from many of the problems of linked lists: each element access follows a pointer, and the referenced data may have poor locality. Using a NumPy array provides the advantages of a C array in that you get chunks of sequential memory which benefit from fast access (sweeping a memory-aligned offset over the data) and locality. Increased overhead is the typical cost of abstractions like lists, and it can have a major performance impact on certain applications.
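The difference in memory scheme can be glimpsed with the standard library alone. This sketch uses the built-in `array` module as a stand-in for a NumPy array (an assumption made to avoid a third-party dependency); both keep their elements as raw values of one type in a contiguous buffer.

```python
from array import array

# A list holds references to arbitrary objects, so mixed types are fine.
boxed = [1.0, 2.0, 3.0]
boxed.append("hat")

# An array of C doubles stores the raw values back to back.
packed = array("d", [1.0, 2.0, 3.0])
payload_bytes = packed.itemsize * len(packed)   # bytes used by the values

# Because the buffer is typed, it cannot hold an arbitrary object.
try:
    packed.append("hat")
    rejected = False
except TypeError:
    rejected = True
```

The flexibility of the list (any object, any mix of types) is exactly what forces it to store pointers instead of values.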

To add to the confusion, at least two high-level languages, C++ and Haskell, have a "list" data structure, but these are actually linked lists rather than dynamic arrays (array lists).

Regardless of the language you're using, it's important to distinguish what the data structures you're using really are (in terms of time complexity for typical operations, primarily) to avoid inadvertent misuse and select the correct tool for the job.

Ok, thanks! This clarifies a lot! The book is indeed talking about _linked_ lists (both singly and doubly). Also, I now more clearly see the point of studying the subject (what you wrote in your last paragraph). I also finally got the point of having linked lists to start with (to construct, e.g., queues, which is done later in the book). — JezuzStardust, Dec 28 '19 at 23:12
Just to clarify, you can make a queue with an array as well (check the `ArrayDeque` source, for example), so it's worth implementing it both ways. As you mention, sometimes texts will say "list" and you have to infer whether they're talking about linked lists or dynamic arrays based on context. For a foundational DS&A textbook, lists probably refer to linked lists as you mention. — ggorlen, Dec 28 '19 at 23:29

score 1 · Answer 2 · answered Dec 28 '19 at 22:35

Python lists are not linked lists. They are dynamic arrays (https://en.wikipedia.org/wiki/Dynamic_array#Language_support)

Gelineau
