I have a data set with 9 columns. Seven of them are used as features and one is used as the class label. I ran the tsne library on these features to see how well my data could be classified. The tsne result is shown in the picture.

However, I want to visualize my data in another way. I would like to set the color of each observation based on the column f1 (id). For example:

f1(id) f2 f3 ... f9(class label)
1      66 77 ... A
1      44 88 ... A
2      33 55 ... B
2      77 88 ... B

Colors should come from f1 and marker shapes from f9. I do not know how to do this. I would appreciate your comments or some references to learn more about visualization. This is my code:

plt.scatter(visualize_x, visualize_y, c=y, marker='^', cmap=plt.cm.get_cmap("jet", 10))
Elham
  • You'll have to explain more. How would you like to color the points? Each one a different color? Or all those with output variable == 1 as one color, and the rest as another? – bnaecker Dec 06 '17 at 22:44
  • so the color `c=y`, which contains `0` and `1`s? that's why you see only the colors at the top and bottom of your color bar. – innisfree Dec 06 '17 at 22:44
  • Which values of visualize_x and visualize_y correspond to each of the 7 features? You would like to see a scatter plot with 7 colors, one for each feature, right? A preliminary problem is then to get the x and y values associated with a given color – kevinkayaks Dec 06 '17 at 22:46
  • @bnaecker I would like a different color for each observation, with a shape to label the class. For example, the first person has the color red and belongs to class 1, which is represented by +. I hope that is clear – Elham Dec 06 '17 at 22:49
  • @AlterNative OK, it sounds like you want the color of the point to represent the observation (so each point a different color), and the marker style to represent the class. Is that right? – bnaecker Dec 06 '17 at 22:52
  • @innisfree Yes, y has two values, 0 and 1. I think I should change my color bar to 2 instead of 10 so that it makes more sense. – Elham Dec 06 '17 at 22:52
  • @bnaecker exactly – Elham Dec 06 '17 at 22:53
  • @AlterNative That sounds unlikely to be helpful. What extra information would the color tell you? Using symbols to indicate the class of every point is fine, but the color seems like it would be either unhelpful, or redundant with the spatial position of the point. Are you just trying to make each point more clearly distinct from its neighbors? – bnaecker Dec 06 '17 at 23:01
  • @bnaecker I would like to compare with other plots of the same inputs to see, when there is a misclassification, whether the points belong to the same person or not! – Elham Dec 06 '17 at 23:11
  • @AlterNative And you're OK with just visually scanning all those data points to find the one with *exactly* the same color? How do you expect to match up points from one observation across plots? – bnaecker Dec 06 '17 at 23:15
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/160662/discussion-between-alter-native-and-bnaecker). – Elham Dec 06 '17 at 23:52
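
As a small illustration of the exchange above about `c=y` holding only 0 and 1: with a 10-step colormap, only the two end colors ever appear. A minimal sketch of a two-color version follows. The data here is randomly generated stand-in data, not the asker's, and the choice of `ListedColormap` with these two colors is an assumption made for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

# Hypothetical stand-in data: 2-D coordinates and binary labels.
rng = np.random.default_rng(1)
visualize_x = rng.normal(size=40)
visualize_y = rng.normal(size=40)
y = rng.integers(0, 2, size=40)  # class labels, 0 or 1

# A colormap with exactly two entries, one per class value.
two_colors = ListedColormap(["tab:blue", "tab:orange"])

fig, ax = plt.subplots()
# vmin/vmax are set so that 0 falls in the first color bin and 1 in the second.
points = ax.scatter(visualize_x, visualize_y, c=y, marker="^",
                    cmap=two_colors, vmin=-0.5, vmax=1.5)
# Color bar with one tick per class value.
colorbar = fig.colorbar(points, ax=ax, ticks=[0, 1])
```

Calling `plt.show()` afterwards displays the figure. This only addresses the color bar; it does not yet color by id or vary the marker by class.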

1 Answer

Is this the type of thing you're after?

import numpy as np
from matplotlib import pyplot as plt

# one marker and one color per feature
markers = [".", ",", "o", "v", "^", "<", ">"]
colors = ['r', 'g', 'b', 'c', 'm', 'y', 'k']

# make a sample dataset
x = np.arange(0, 10)  # test x values; every feature shares the same x here, but you can generalize this
y = [s * x for s in np.arange(7)]  # generate 7 arrays of y values

for i in range(7):  # for each of the 7 features
    mi = markers[i]  # marker for the ith feature
    xi = x  # x array for the ith feature; this is where you would use a different x for every feature
    yi = y[i]  # y array for the ith feature
    ci = colors[i]  # color for the ith feature
    plt.scatter(xi, yi, marker=mi, color=ci)
plt.show()

kevinkayaks
  • I think it's safe to say with that vague reply I can't attempt to help you further at all. What do you need? – kevinkayaks Dec 06 '17 at 23:26
  • I understand you have seven categories, each with some set of (x,y,c), where x,y are coordinates in the plane and c is 0 or 1. Then you wanted different colors and markers for each category. Or do you want different markers for each category and different colors for each c? – kevinkayaks Dec 06 '17 at 23:32
  • @bnaecker Sorry for the confusion. Each color belongs to a group of rows with the same id in my dataset, and each shape belongs to a class. – Elham Dec 06 '17 at 23:36
  • class as in 0 or 1 ? You have 7 groups and two classes per group? – kevinkayaks Dec 06 '17 at 23:37
  • Consider this: f1, f2, f3, f4, f5, f6, f7, f8, f9. f1 is the id, f2 to f8 are the features for classification, and f9 is the label. I need to color by f1 and choose the shape by f9 – Elham Dec 06 '17 at 23:50
  • So construct lists of the appropriate points x,y corresponding to each marker and color combination. Then loop over the marker/color combinations, using the appropriate x, y, marker, and color values for each call to plt.scatter(). The matplotlib structure you need is shown above – kevinkayaks Dec 08 '17 at 01:12
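
The approach described in the last comment (one `scatter` call per marker/color combination) might look roughly like the sketch below, with color taken from the id column (f1) and marker shape from the class label (f9). The data is randomly generated stand-in data, and the names `ids`, `labels`, `marker_for_label`, and `color_for_id` are hypothetical, chosen only for this illustration. It also assumes no more than ten distinct ids, since it draws colors from the ten-entry `tab10` colormap; with more ids, colors would repeat.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical stand-ins for the real data: 2-D coordinates,
# the id column (f1) and the class label column (f9).
rng = np.random.default_rng(0)
n_points = 60
visualize_x = rng.normal(size=n_points)
visualize_y = rng.normal(size=n_points)
ids = rng.integers(1, 6, size=n_points)          # f1: ids between 1 and 5
labels = rng.choice(["A", "B"], size=n_points)   # f9: two classes

# Assumed mapping from class label to marker shape.
marker_for_label = {"A": "o", "B": "^"}

# One color per distinct id, taken from the tab10 colormap.
unique_ids = np.unique(ids)
cmap = plt.get_cmap("tab10")
color_for_id = {uid: cmap(k % cmap.N) for k, uid in enumerate(unique_ids)}

fig, ax = plt.subplots()
n_calls = 0
for uid in unique_ids:
    for label, marker in marker_for_label.items():
        # Select the points with this id AND this class label.
        mask = (ids == uid) & (labels == label)
        if not mask.any():
            continue  # skip combinations that do not occur in the data
        ax.scatter(visualize_x[mask], visualize_y[mask],
                   color=color_for_id[uid], marker=marker,
                   label=f"id {uid}, class {label}")
        n_calls += 1
ax.legend(fontsize="small")
```

Calling `plt.show()` afterwards displays the figure. A separate `scatter` call per combination is needed because `marker` takes a single style per call, whereas colors could also be passed per point through `c`.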