
I have a huge DataFrame with a lot of zero values, and I want to calculate the average of the numbers between the zero values. To keep it simple: the data shows, for example, 10 consecutive values, then zeros, then values again. I just want to tell Python to calculate the average of each patch of data.

The pic shows an example


ThePyGuy

1 Answer


First of all, I'm a little confused about why you are using a DataFrame. Data like this would more naturally be stored in a pd.Series, and I would suggest keeping purely numeric data in a NumPy array. Assuming you have a pd.Series and want to calculate the moving average between each pair of consecutive points, there are two approaches you can follow. They differ only in how the last value is handled:

  1. zero-padding: append a zero after the last value

  2. assuming circularity: average the last value with the first value

Here is the code for both approaches:


import numpy as np
import pandas as pd

data_series = pd.Series([0,0,0.76231, 0.77669,0,0,0,0,0,0,0,0,0.66772, 1.37964, 2.11833, 2.29178, 0,0,0,0,0])
np_array = np.array(data_series)


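# Approach 1: zero-padding (append a 0 so the last value has a right neighbour)
np_array_zero_pad = np.hstack((np_array, 0))
# Mean of each value and its right neighbour
mvavrg_zeropad = [np.mean([np_array_zero_pad[i], np_array_zero_pad[i+1]]) for i in range(len(np_array_zero_pad)-1)]


# Approach 2: circularity (append the FIRST value so the last value is averaged with it)
np_array_circ_arr = np.hstack((np_array, np_array[0]))
# Store the result under its own name instead of overwriting the padded array
mvavrg_circ = [np.mean([np_array_circ_arr[i], np_array_circ_arr[i+1]]) for i in range(len(np_array_circ_arr)-1)]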
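
For a huge array, the same pairwise averages can probably be computed without a Python loop. This is only a sketch, assuming "circularity" means the last value is averaged with the first one. The `values` array and the result names are made up for illustration.

```python
import numpy as np

# Hypothetical sample chosen so the two approaches differ in the last element
values = np.array([1.0, 3.0, 0.0, 2.0])

# Zero-padding: the right neighbour of the last value is 0
right_zero = np.append(values[1:], 0.0)
pair_mean_zero_pad = (values + right_zero) / 2

# Circularity: np.roll(values, -1) shifts left, wrapping the first value to the end
right_circ = np.roll(values, -1)
pair_mean_circ = (values + right_circ) / 2
```

Both results have the same length as the input and differ only in the last element.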
Sunshine
  • It is a very large dataset with multiple columns that I will later run other operations on. What I want to do here is simple: I want to calculate the average for each patch. The average of [0.762, 0.766] alone, and [0.66, 1.37, 2.11, 2.29] alone, and so forth until the end of the DataFrame. – Gamaal Heikal Sep 04 '21 at 11:39
  • OK, this is actually no problem. One question beforehand: does this repeat at the same indices? If yes, then it's pretty easy. You only select the specific indices and apply np.mean() – Sunshine Sep 04 '21 at 15:38
  • No, nothing here is repeating. I tried to find an anchor for the selection but I could not. I just want Python to create an extra column and put these averages in it accordingly, because later I will plot it with other columns and a lot more will happen. – Gamaal Heikal Sep 04 '21 at 17:19
  • Well, what you could do is keep adding the values as long as they are not zero and append them to a list. Is that something that helps you? It's difficult to give you advice if the minimal example doesn't really suit your problem. – Sunshine Sep 04 '21 at 17:29
  • Please have a look here, I reposted the question. I am new here and I am struggling :D https://stackoverflow.com/questions/69057749/calculating-average-for-segments-in-dataframe – Gamaal Heikal Sep 04 '21 at 17:55
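
For the per-patch average described in these comments, one possible approach is to label each run of non-zero values and group by that label. This is a minimal sketch, not code from the thread. The column names `value` and `patch_mean` are assumptions, and it treats an exact 0 as the separator between patches.

```python
import pandas as pd

# Sample data from the answer, placed in a hypothetical "value" column
df = pd.DataFrame({"value": [0, 0, 0.76231, 0.77669, 0, 0, 0, 0, 0, 0, 0, 0,
                             0.66772, 1.37964, 2.11833, 2.29178, 0, 0, 0, 0, 0]})

nonzero = df["value"].ne(0)

# A patch starts where a non-zero value follows a zero (or the start of the data)
patch_start = nonzero & ~nonzero.shift(fill_value=False)

# The running count of patch starts gives every row in a patch the same label
patch_id = patch_start.cumsum()

# Mean of each patch, written to every row of that patch; zero rows stay NaN
df["patch_mean"] = (
    df.loc[nonzero, "value"].groupby(patch_id[nonzero]).transform("mean")
)
```

For a DataFrame with several columns, the same steps could be repeated per column. If the gaps are NaN rather than 0, `notna()` would replace `ne(0)`.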