Python/Matplotlib: Randomly select "sample" scatter points for different marker

Question

Pretty much exactly what the question states, but a little context:

I'm creating a program to plot a large number of points (~10,000, but it will be more later on). This is being done using matplotlib's plt.scatter. This command is part of a loop that saves the figure, so I can later animate it.

What I want to be able to do is randomly select a small portion of these particles (say, maybe 100?) and give them a different marker than the rest, even though they're part of the same data set. This is so I can use them as placeholders to see the motion of individual particles, as well as the bulk material.

Is there a way to use a different marker for a small subset of the same data?

For reference, the particles are initialized with numpy's random samplers (uniform in position, normal in velocity); my code for that is:

for i in range(N): # N number of particles
    particle_position[i] = np.random.uniform(0, xmax)  # Initialize in spatial domain
    particle_velocity[i] = np.random.normal(0, 5)      # Initialize in velocity space

for i in range(maxtime):
    plt.scatter(particle_position, particle_velocity, s=1, c=norm_xvel, cmap=br_disc, lw=0)

The position and velocity change on each iteration of the main loop (there's quite a bit of code), but these are the main initialization and plotting routines.

I had an idea that perhaps I could randomly select a bunch of i values from range(N), and use an ax.scatter() command to plot them on the same axes?

Have you tried using two plt.scatter() calls, where the second one contains the random particle subset? This answer might help: http://stackoverflow.com/questions/11190735/python-matplotlib-superimpose-scatter-plots?rq=1 – jtitusj, Apr 04 '16 at 08:02

Reblochon Masque · Answer 1 · 2016-04-04T13:53:31.967

Here is a possible solution to have a subset of your points identified with a different marker:

import matplotlib.pyplot as plt
import numpy as np

SIZE = 100
SAMPLE_SIZE = 10

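def select_subset(seq, size):
    """selects a subset of the data using ...
    """
    # Placeholder: takes the first `size` elements. Because the points are
    # generated independently at random, this is already an unbiased sample
    # here, but it would not be for sorted or otherwise ordered data.
    return seq[:size]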

points_x = np.random.uniform(-1, 1, size=SIZE)
points_y = np.random.uniform(-1, 1, size=SIZE)

plt.scatter(points_x, points_y, marker=".", color="blue")
plt.scatter(select_subset(points_x, SAMPLE_SIZE), 
            select_subset(points_y, SAMPLE_SIZE), 
            marker="o", color="red")

plt.show()

It calls plt.scatter twice: once on the full data set and once on the sample points, which are therefore drawn on top of their original markers.

You will have to decide how you want to select the sample of points; that choice is isolated in the select_subset function.
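If select_subset is made random, one thing to watch: calling a random selector separately on points_x and points_y would pair the sampled x values with unrelated y values. A minimal sketch, assuming NumPy's np.random.choice, that draws the indices once and reuses them for both coordinates (select_subset_indices is a hypothetical helper name, not part of the original answer):

```python
import numpy as np
import matplotlib.pyplot as plt

SIZE = 100
SAMPLE_SIZE = 10

def select_subset_indices(data_size, sample_size):
    """Hypothetical helper: sample_size distinct indices from range(data_size)."""
    # replace=False guarantees that no index is picked twice
    return np.random.choice(data_size, size=sample_size, replace=False)

points_x = np.random.uniform(-1, 1, size=SIZE)
points_y = np.random.uniform(-1, 1, size=SIZE)

# Draw the indices ONCE and apply them to both coordinates, so every
# sampled x stays paired with its own y.
idx = select_subset_indices(SIZE, SAMPLE_SIZE)

plt.scatter(points_x, points_y, marker=".", color="blue")
plt.scatter(points_x[idx], points_y[idx], marker="o", color="red")
plt.show()
```

Sampling without replacement also means exactly SAMPLE_SIZE distinct points are highlighted, rather than possibly fewer.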

You could also remove the sample points from the data set so they are not marked twice, but NumPy arrays have a fixed size, so deleting elements returns a new copy of the whole array.

A better method may be to use a boolean mask, which has the advantage of leaving your original data intact and in order.

Here is a way to proceed with masks:

import matplotlib.pyplot as plt
import numpy as np

SIZE = 100
SAMPLE_SIZE = 10

def make_mask(data_size, sample_size):
    """Boolean mask with exactly sample_size True entries at random positions."""
    mask = np.zeros(data_size, dtype=bool)
    mask[:sample_size] = True
    np.random.shuffle(mask)  # shuffle in place to randomize which entries are True
    return mask

points_x = np.random.uniform(-1, 1, size=SIZE)
points_y = np.random.uniform(-1, 1, size=SIZE)
mask = make_mask(SIZE, SAMPLE_SIZE)
not_mask = np.invert(mask)  # same as ~mask for a boolean array

# Unsampled points with one marker, sampled points with another
plt.scatter(points_x[not_mask], points_y[not_mask], marker=".", color="blue")
plt.scatter(points_x[mask], points_y[mask], marker="o", color="red")

plt.show()

As you can see, scatter is called once on the points not selected in the sample and a second time on the sampled subset, drawing each subset with its own marker. The original arrays are never modified or resized; only a boolean mask of the same length needs to be kept alongside them.
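For the animation described in the question, the mask only needs to be built once, before the time loop, so the same particles carry the special marker in every saved frame. A minimal sketch under stated assumptions: the particle count and velocity distribution come from the question, while N_TRACKED, XMAX, MAXTIME, DT and the free-streaming update are hypothetical placeholders for the real simulation, and the colormap arguments from the question are omitted for brevity:

```python
import os
import tempfile

import numpy as np
import matplotlib.pyplot as plt

N = 10_000        # number of particles, as in the question
N_TRACKED = 100   # hypothetical: particles drawn with a different marker
XMAX = 1.0        # hypothetical domain size
MAXTIME = 3       # hypothetical number of frames, kept small here
DT = 0.01         # hypothetical time step

position = np.random.uniform(0, XMAX, size=N)
velocity = np.random.normal(0, 5, size=N)

# Build the mask ONCE, before the time loop, so the same particles are
# highlighted in every frame and their motion can be followed.
tracked = np.zeros(N, dtype=bool)
tracked[np.random.choice(N, size=N_TRACKED, replace=False)] = True

out_dir = tempfile.mkdtemp()
frames = []
for step in range(MAXTIME):
    # Placeholder dynamics: free streaming with periodic wrap-around
    position = (position + velocity * DT) % XMAX

    fig, ax = plt.subplots()
    ax.scatter(position[~tracked], velocity[~tracked], s=1, lw=0, color="blue")
    ax.scatter(position[tracked], velocity[tracked], s=20, marker="^", color="red")
    path = os.path.join(out_dir, f"frame_{step:04d}.png")
    fig.savefig(path)
    plt.close(fig)  # release the figure so memory does not grow per frame
    frames.append(path)
```

Creating and closing one figure per frame keeps memory bounded; reusing a single figure and clearing it with ax.clear() between frames would work as well.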

I like the solution with the mask; in the end it is more memory efficient. You only have to store the mask, whereas I am storing a full copy of the data. – Chiel, Apr 04 '16 at 13:02
Yes, it is memory efficient and fast, and it preserves the data. – Reblochon Masque, Apr 04 '16 at 13:13

Chiel · Accepted Answer · 2016-04-04T08:22:28.393


The code below does what you want. I have selected a random set v_sub_index of N_sub indices in the range 0 to N-1 and used them to draw the subsets (with the _sub suffix) from the full arrays particle_position and particle_velocity. Note that you don't need a for loop to generate random samples: NumPy can draw an entire array in a single call.

import numpy as np
import matplotlib.pyplot as pl

N = 100
xmax = 1.
v_sigma = 2.5 / 2. # ~95% of samples fall within [0, 5] (mean +/- 2 sigma)
v_mean  = 2.5      # mean at 2.5

N_sub = 10
# Draw N_sub distinct indices from 0..N-1; replace=False avoids the
# duplicate indices that np.random.randint can produce
v_sub_index = np.random.choice(N, N_sub, replace=False)

particle_position = np.random.rand(N) * xmax                # uniform in [0, xmax)
particle_velocity = v_mean + v_sigma * np.random.randn(N)   # normal(v_mean, v_sigma)

# Fancy indexing already returns copies, so no np.array() wrapper is needed
particle_position_sub   = particle_position[v_sub_index]
particle_velocity_sub   = particle_velocity[v_sub_index]
# np.delete returns new arrays without the sampled points, so they are not drawn twice
particle_position_nosub = np.delete(particle_position, v_sub_index)
particle_velocity_nosub = np.delete(particle_velocity, v_sub_index)

pl.scatter(particle_position_nosub, particle_velocity_nosub, color='b', marker='o')
pl.scatter(particle_position_sub  , particle_velocity_sub  , color='r', marker='^')
pl.show()
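To illustrate the point about loops, the per-particle initialization from the question can be written as two vectorized calls. A sketch assuming the question's distributions (uniform position on [0, xmax), normal velocity with mean 0 and standard deviation 5); the value of xmax here is a hypothetical placeholder:

```python
import numpy as np

N = 10_000   # number of particles (~10,000 in the question)
xmax = 1.0   # hypothetical domain size; substitute your own value

# One call per array replaces the per-particle for loop from the question
particle_position = np.random.uniform(0, xmax, size=N)  # spatial domain
particle_velocity = np.random.normal(0, 5, size=N)      # velocity space
```

Besides being shorter, this avoids N separate Python-level calls, which matters as the particle count grows.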

edited Apr 04 '16 at 08:22

answered Apr 04 '16 at 08:13

So effectively, I'd be putting two markers on some particles, since the N_sub particles would still be contained in the main particle_* arrays? – Yoshi Apr 04 '16 at 08:18
I have edited the example, where I use the `delete` function from `numpy` to remove the samples. – Chiel Apr 04 '16 at 08:18
Thanks! This is probably the easiest way to do it. I did it a slightly different way, just experimenting, but that had more to do with selecting the random sample (the easy part) than with the plotting (the hard part). I'm glad to hear that 'scatter' superimposes automatically! Much easier than I thought. – Yoshi Apr 04 '16 at 08:25
Each numpy.delete forces a copy of the entire array. – Reblochon Masque Apr 06 '16 at 17:33
@ReblochonMasque I agree, I put a comment in the answer below, which is a better solution than mine. – Chiel Apr 06 '16 at 18:13
