I want to compute the row-wise mean of several different sets of DataFrame columns and store the results in a new DataFrame

Question

To do so, I have a list of lists (my clusters), for example:

asset_clusts=[[0,1],[3,5],[2,4, 12],...]

The original DataFrame (called 'x' in my code) contains the return time series of S&P 500 companies.

I want to select columns [0, 1] of the original DataFrame, compute their row-wise mean, and store it in a new DataFrame; then compute the row-wise mean of columns [3, 5] and add it to the new DataFrame as another column, and so on.
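A small, self-contained sketch of the intended result may serve as the requested reproducible example. The 3 × 6 DataFrame below uses made-up numbers standing in for the real S&P 500 returns (an assumption for illustration only) and builds the target by hand for two clusters:

```python
import pandas as pd

# Hypothetical stand-in for 'x': 3 dates x 6 assets of made-up returns
# (not real S&P 500 data; values chosen only so the means are easy to check).
x = pd.DataFrame(
    [[0.01, 0.03, 0.00, 0.02, 0.05, 0.04],
     [0.02, 0.00, 0.01, -0.01, 0.03, 0.01],
     [-0.01, 0.01, 0.02, 0.00, 0.01, 0.02]],
)
asset_clusts = [[0, 1], [3, 5]]

# Target: one column per cluster, each the row-wise mean of that cluster's columns.
expected = pd.DataFrame({
    0: (x[0] + x[1]) / 2,  # cluster [0, 1]
    1: (x[3] + x[5]) / 2,  # cluster [3, 5]
})
print(expected)
```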

import pandas as pd

mu = pd.DataFrame()
for j in range(get_number_of_elements(asset_clusts)):
    # row-wise mean of the columns in cluster j
    mu = x.iloc[:, asset_clusts[j]].mean(axis=1)

But it only gives me one column, and when I checked, that column is the mean of the last cluster's columns.
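The symptom follows from the assignment inside the loop: `mu = ...` rebinds the name on every pass instead of adding a column, so only the last cluster's Series is left. A minimal sketch with toy data (values assumed for illustration) contrasting that loop with collecting one Series per cluster and concatenating once:

```python
import pandas as pd

x = pd.DataFrame([[1.0, 3.0, 5.0, 7.0],
                  [2.0, 4.0, 6.0, 8.0]])
clusters = [[0, 1], [2, 3]]

# Loop as in the question: '=' rebinds mu on every pass,
# so only the Series from the last cluster survives.
mu = pd.DataFrame()
for c in clusters:
    mu = x.iloc[:, c].mean(axis=1)
print(type(mu).__name__, mu.tolist())  # Series [6.0, 7.0]

# Collecting one Series per cluster and concatenating once keeps them all.
parts = [x.iloc[:, c].mean(axis=1) for c in clusters]
mu = pd.concat(parts, axis=1)
print(mu.shape)  # (2, 2)
```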

In case of ambiguity, here is the get_number_of_elements function:

def get_number_of_elements(clist):
    count = 0
    for element in clist:
        count += 1
    return count
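For a list, this helper returns the same value as the built-in `len()`. A quick check using the first three clusters from the example (the trailing `...` is left out so the list is valid Python):

```python
def get_number_of_elements(clist):
    count = 0
    for element in clist:
        count += 1
    return count

asset_clusts = [[0, 1], [3, 5], [2, 4, 12]]

# For a list, the built-in len() gives the same count without a helper.
print(get_number_of_elements(asset_clusts), len(asset_clusts))  # 3 3

# Iterating over the clusters directly avoids indexing by position at all.
for cluster in asset_clusts:
    print(cluster)
```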

Please provide a minimal reproducible example (MRE) with your example data as text, not as an image. – Michael Szczesny, Dec 31 '21 at 09:25
Rename the parameter "list" in the `get_number_of_elements` function: `list` is a built-in name, and using it as a variable shadows the built-in. – Deven Ramani, Dec 31 '21 at 10:22
I don't know what else I need to provide beyond this. Which part can't you reproduce? @MichaelSzczesny – Farhad, Dec 31 '21 at 12:13
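For the record, `list` is a built-in name rather than a reserved keyword, so Python accepts it as a parameter name; the actual problem is that it shadows the built-in type inside the function. A minimal demonstration (the function name `count_items` is hypothetical):

```python
import keyword

print(keyword.iskeyword("list"))  # False: 'list' is a built-in, not a keyword
print(keyword.iskeyword("for"))   # True

def count_items(list):
    # Inside this function 'list' is the argument, not the built-in type,
    # so list(...) tries to call the argument itself.
    try:
        return len(list(list))
    except TypeError as exc:
        return f"TypeError: {exc}"

print(count_items([1, 2, 3]))  # TypeError: 'list' object is not callable
```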

score 0 · Answer 1 · answered Dec 31 '21 at 10:23

def get_number_of_elements(clust_list):
    count = 0
    for element in clust_list:
        count += 1
    return count

Deven Ramani

Thank you, but my problem is not the function. I made the change you suggested, but the "mu" DataFrame is still only the mean of the last cluster's columns; it does not show the mean columns of the other clusters. @Deven – Farhad Dec 31 '21 at 11:59

score 0 · Accepted Answer · answered Dec 31 '21 at 13:42

I solved it; in case it helps others, here is the final function:

import pandas as pd

def clustered_series(x, org_asset_clust):
    """
    x: DataFrame of returns (one column per asset)
    org_asset_clust: list of clusters, each a list of column positions
    ----> DataFrame with the row-wise mean of each cluster's returns, one column per cluster
    """
    mu = []
    for cluster in org_asset_clust:
        # row-wise mean of this cluster's columns, as a Series
        mu.append(x.iloc[:, cluster].mean(axis=1))
    # concatenate once, after the loop, so each cluster's Series becomes one column
    cluster_mean = pd.concat(mu, axis=1)
    return cluster_mean
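A variant sketch of the same approach, assuming clusters are lists of column positions; `cluster_means` and its `labels` parameter are hypothetical names, not part of the answer above. Passing `keys=` to `pd.concat` names each output column after its cluster instead of leaving the default 0, 1, 2, ...:

```python
import pandas as pd

def cluster_means(x, clusters, labels=None):
    """Row-wise mean of each cluster of columns, one output column per cluster.

    clusters: list of lists of column positions (as used with .iloc).
    labels: optional column names for the result; defaults to 0, 1, 2, ...
    """
    if labels is None:
        labels = list(range(len(clusters)))
    # One Series per cluster, concatenated once; keys become the column labels.
    return pd.concat(
        [x.iloc[:, c].mean(axis=1) for c in clusters],
        axis=1,
        keys=labels,
    )

# Toy data with made-up dates as the index.
x = pd.DataFrame([[1.0, 3.0, 5.0, 7.0, 9.0],
                  [2.0, 4.0, 6.0, 8.0, 10.0]],
                 index=["2021-12-30", "2021-12-31"])
result = cluster_means(x, [[0, 1], [2, 3, 4]], labels=["c0", "c1"])
print(result)
```

The original row index (here the dates) is preserved. With an empty cluster list, `pd.concat` raises `ValueError`, so callers may want to guard against that case.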
