
I have a pandas dataframe and I want to add random NA values and random noise to the data.

    exp_TSPAN6  exp_TNMD    exp_DPM1    exp_SCYL3   exp_C1orf112
0   7.951917    3.524705    12.043700   7.605068    8.214067
1   8.079243    9.545859    5.6445321   8.509788    6.853905
2   11.335783   12.45859    12.254986   6.617365    8.196391

Example Output

    exp_TSPAN6  exp_TNMD    exp_DPM1    exp_SCYL3   exp_C1orf112
0   8.951917    4.524705    11.043700   7.605068    8.214067
1   8.079243    NA          NA          8.509788    6.853905
2   11.335783   NA          12.254986   6.617365    9.196391

I have tried the following code to add NA values, but I could not work out how to add random noise:

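import numpy as np
for col in data.columns:
    # set a random 10% of the rows in this column to NaN (pd.np was removed in pandas 2.0)
    data.loc[data.sample(frac=0.1).index, col] = np.nan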

1 Answer


Why don't you try what is suggested in "Adding gaussian noise to a dataset of floating points and save it (python)":

  1. Load the data into a pandas dataframe: clean_signal = pd.read_csv("data_file_name")
  2. Use numpy to generate Gaussian noise with the same dimensions as the dataset.
  3. Add the Gaussian noise to the clean signal: signal = clean_signal + noise
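
The steps above, combined with the random NA values from the question, could look roughly like the sketch below. The helper name `add_noise_and_na` and its parameters (`noise_std`, `na_frac`, `seed`) are made up for illustration and are not from the linked answer. The frame is built in memory from the question's sample rows rather than read from a CSV.

```python
import numpy as np
import pandas as pd


def add_noise_and_na(df, noise_std=1.0, na_frac=0.1, seed=None):
    """Hypothetical helper: return a copy of df with Gaussian noise and random NaNs."""
    rng = np.random.default_rng(seed)
    # Step 2: Gaussian noise with the same shape as the dataset
    noise = rng.normal(loc=0.0, scale=noise_std, size=df.shape)
    # Step 3: add the noise to the clean signal
    noisy = df + noise
    # Each cell is blanked independently with probability na_frac
    na_mask = rng.random(df.shape) < na_frac
    # DataFrame.mask replaces cells where the mask is True with NaN
    return noisy.mask(na_mask)


# Step 1 stand-in: sample values copied from the question instead of pd.read_csv
clean_signal = pd.DataFrame({
    "exp_TSPAN6": [7.951917, 8.079243, 11.335783],
    "exp_TNMD": [3.524705, 9.545859, 12.45859],
    "exp_DPM1": [12.043700, 5.6445321, 12.254986],
    "exp_SCYL3": [7.605068, 8.509788, 6.617365],
    "exp_C1orf112": [8.214067, 6.853905, 8.196391],
})

signal = add_noise_and_na(clean_signal, noise_std=0.5, na_frac=0.2, seed=0)
print(signal)
```

Note that this perturbs every cell, whereas the example output in the question changes only some of them. If that matters, one option is to multiply `noise` by a second random boolean mask before adding it. Also, `na_frac` is a per-cell probability, so the number of NaNs varies from run to run unless you fix `seed`.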