
I am currently working on the following, where `data` is a dataframe with the correct index:

import pandas as pd
from sklearn.cluster import KMeans

wcss = []  # within-cluster sum of squares for each k
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i, init='k-means++', max_iter=300, n_init=10, random_state=0)
    kmeans.fit(data_values)
    wcss.append(kmeans.inertia_)
    kmeans = KMeans(n_clusters=2)
    y = kmeans.fit_predict(data_values)  # cluster label for each row
    df = pd.DataFrame(y, index=data.index)
....

#got here multiple dicts

Example of y:

[1 2 3 4 5 2 2 5 1 0 0 1 0 0 1 0 1 4 4 4 3 1 0 0 1 0 0 ...]




f = pd.DataFrame(y, columns=[buster])
f.to_csv('busters.csv', mode='a')

Here `y` holds the cluster labels after fitting. I don't know how I got stuck on this. I am iterating over 20 dataframes, each consisting of one column with values from 1-9. The index is irrelevant. I am trying to join all the frames together, but instead they just end up one after the other. If I add `.T` to transpose, I still get rows with irrelevant values as the index, which I can't remove because they are actually the headers.

Needed result: [image]

TheUndecided
  • Can you please confirm that the `dict`s look like `{'Buster1': [0, 2, 2, 4, 5]}, {'Buster2': [1, 2, 3, 4, 5]} ...` (and that the values are lists with exactly 5 elements), or if not, how do they look? Ideally, please provide an example of `y` and where the column name is coming from. There may be a more efficient way to create your final dataframe. – Nikolaos Chatzis Nov 05 '20 at 14:53
  • The dict looks the same way you wrote, not only 5 elements, I edited the question. – TheUndecided Nov 05 '20 at 16:16

1 Answer

If the dicts produced in each iteration look like `{'Buster1': [0, 2, 2, 4, 5]}, {'Buster2': [1, 2, 3, 4, 5]} ...` (using 5 elements here for illustration), and all the lists, i.e., the values in the dicts, have the same number of elements (as is the case in your example), you could build a single dict and pass it to `pd.DataFrame` directly. (You may also want to take a look at `pandas.DataFrame.from_dict`.)

You may have lists with more than 5 elements, more than 3 dicts (and thus columns), and you will be generating the dicts with a loop, but the code below should be sufficient for getting the idea.

>>> import pandas as pd
>>> 
>>> d = {}
>>> # update d in every iteration
>>> d.update({'Buster 1': [0, 2, 2, 4, 5]})
>>> d.update({'Buster 2': [1, 2, 3, 4, 5]})
>>> # ...
>>> d.update({'Buster n': [0, 9, 3, 0, 0]})
>>>
>>> pd.DataFrame(d, columns=d.keys())
   Buster 1  Buster 2  Buster n
0         0         1         0
1         2         2         9
2         2         3         3
3         4         4         0
4         5         5         0

If you have the keys, e.g., `'Buster 1'`, and values, e.g., `[0, 2, 2, 4, 5]`, separated, as I believe is the case, you can simplify the above (and make it more efficient) by replacing `d.update({'Buster 1': [0, 2, 2, 4, 5]})` with `d['Buster 1'] = [0, 2, 2, 4, 5]`.
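As a minimal sketch of that loop: the `labels_per_run` list and the `Buster N` naming below are hypothetical stand-ins for whatever your clustering loop produces in each iteration, so adapt them to your own data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the label arrays produced in each iteration
# (e.g. the output of kmeans.fit_predict); all have the same length.
labels_per_run = [
    np.array([0, 2, 2, 4, 5]),
    np.array([1, 2, 3, 4, 5]),
    np.array([0, 9, 3, 0, 0]),
]

d = {}
for n, y in enumerate(labels_per_run, start=1):
    # Plain assignment; no temporary dict is built as with d.update({...}).
    d[f"Buster {n}"] = y

# One column per key, built in a single call after the loop.
df = pd.DataFrame(d)
print(df)
```

Building the frame once after the loop avoids appending to a CSV in every iteration, which is what stacks the frames one below the other.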

I included `columns=d.keys()` because, depending on your Python and pandas versions, the column order may not be what you expect (dicts only guarantee insertion order from Python 3.7 onward). You can control the column order through the order in which you provide the keys. For example:

>>> pd.DataFrame(d, columns=sorted(d.keys(),reverse=True))
   Buster n  Buster 2  Buster 1
0         0         1         0
1         9         2         2
2         3         3         2
3         0         4         4
4         0         5         5
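Returning to the `pandas.DataFrame.from_dict` pointer earlier in this answer: it can help if the lists do not all have the same length, which `pd.DataFrame(d)` would reject. The sketch below uses made-up data with deliberately unequal lengths. It relies on `orient="index"` padding short rows with `NaN`, so the resulting columns become floats.

```python
import pandas as pd

# Illustrative data only: the lists deliberately differ in length.
d = {
    "Buster 1": [0, 2, 2, 4, 5],
    "Buster 2": [1, 2, 3],
}

# orient="index" makes each key a row and pads short rows with NaN;
# transposing turns the keys back into column headers.
df_padded = pd.DataFrame.from_dict(d, orient="index").T
print(df_padded)
```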

Although it may not apply to your use case, if you do not want to print the index, you can take a look at How to print pandas DataFrame without index.
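For reference, a small sketch of two common ways to leave the index out, using a made-up two-column frame. The in-memory buffer is only a stand-in for a real file such as `busters.csv`.

```python
import io
import pandas as pd

df = pd.DataFrame({"Buster 1": [0, 2, 2], "Buster 2": [1, 2, 3]})

# Print without the row index.
text = df.to_string(index=False)
print(text)

# Write CSV without the row index (an in-memory buffer stands in for a file).
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```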

Nikolaos Chatzis
  • Thank you very much, it worked. I kept the "d.update" but I deleted the parentheses from the "y" so it will print only the values and not values inside a list, e.g, d.update({buster: y}). – TheUndecided Nov 05 '20 at 21:16