Pandas DataFrame: Grouping Rows?

Question

There are two parts to what I'm trying to accomplish.

I have a DataFrame where the same company is listed on 2 consecutive rows. The first row for each company is for Apple (iOS) and the second is for Android.

I need the 'App Views' column represented as an int, with the other columns expressed as a % of the views (so if there are 5000 app views, the next column for Apple would be installs, and I want to show the % of users who viewed the app and then installed it). I'll need several columns beyond install, but to keep it short I am leaving it like this!
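To make the "views as int, installs as a % of views" idea concrete, here is a minimal sketch. The raw install counts and the `apple_installs` column name are assumptions made up for illustration; only the view counts and resulting rates mirror the table further down.

```python
import pandas as pd

# Hypothetical raw counts; "apple_installs" is an assumed column name.
df = pd.DataFrame({
    "Company Name": ["Zynga", "EA Mobile"],
    "Apple App Views": [5000, 22000],
    "apple_installs": [2500, 12540],
})

# Keep the views as integers and express installs as a fraction of views.
df["Apple App Views"] = df["Apple App Views"].astype(int)
df["Apple Install"] = (df["apple_installs"] / df["Apple App Views"]).round(2)

print(df[["Company Name", "Apple App Views", "Apple Install"]])
```

The same division would apply to any further columns that should be shown relative to views.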

That's the first part of the challenge. For the 2nd part of the challenge:

I really need to be able to make a big DataFrame full of fake data. Maybe Faker? The fake data needs to be populated with random values. So for each company I need a random number for Apple views and a 0 for Android, and in the next row a random number for Android views and a 0 for Apple. Then I'll need to take a % of those views and have randomized %'s for the next column.

The table is the result I am looking for:

(If this seems like a terrible idea to do in Python and would be easier to do in Excel somehow, that's a great answer too. I just need someone to point me in the right direction; if that is the case, I could then import a .CSV into a DataFrame!)

   Company Name     Apple App Views  Apple Install   Droid View  Droid Install
0    Zynga               5000             0.50          0.00         0.00
1    Zynga               0                0             15000        0.33
2    EA Mobile           22000            0.57          0.00         0.00
3    EA Mobile           0                0             26000        0.49

Please edit the question to limit it to a specific problem with enough detail to identify an adequate answer. — Community, Jul 30 '22 at 12:03

Nemo20k · Accepted Answer · 2022-07-29T19:50:30.883

import numpy as np
import pandas as pd

# create lists with hand-picked values
app_views = [4000, 2222, 9999]
app_install = [0, 0.3, 0.83]

# generate a numpy array with 3 random integers from 1000 to 9999
# (the upper bound of np.random.randint is exclusive)
random_app_views = np.random.randint(1000, 10000, size=3)

# generate a numpy array with 3 random floats in the range [0, 1)
random_app_install = np.random.uniform(0, 1, size=3)

# each dict key becomes a column name
df = pd.DataFrame({
     'app_views': app_views,
     'app_install': app_install,
     'random_app_views': random_app_views,
     'random_app_install': random_app_install
})

This will produce a DataFrame like the following (the values in the two random columns will differ on each run):

	app_views	app_install	random_app_views	random_app_install
0	4000	0.00	2196	0.626350
1	2222	0.30	6917	0.412264
2	9999	0.83	3291	0.303517

Hope this is enough to get started, good luck!
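To get closer to the layout in the question (two consecutive rows per company, the first with Apple values and zeros for Android, the second the other way round), one possible sketch is below. The function name `make_fake_app_data`, the `seed` parameter, and the 1000 to 29999 range for views are assumptions chosen for illustration, not anything the question specifies.

```python
import numpy as np
import pandas as pd

def make_fake_app_data(companies, seed=None):
    """Build two rows per company: an Apple row, then an Android row."""
    rng = np.random.default_rng(seed)
    n = len(companies)

    # Apple rows: random views and install rate, zeros for Android.
    apple = pd.DataFrame({
        "Company Name": companies,
        "Apple App Views": rng.integers(1000, 30000, size=n),
        "Apple Install": rng.uniform(0, 1, size=n).round(2),
        "Droid View": 0,
        "Droid Install": 0.0,
    })

    # Android rows: zeros for Apple, random views and install rate.
    droid = pd.DataFrame({
        "Company Name": companies,
        "Apple App Views": 0,
        "Apple Install": 0.0,
        "Droid View": rng.integers(1000, 30000, size=n),
        "Droid Install": rng.uniform(0, 1, size=n).round(2),
    })

    # Both frames share the index 0..n-1, so a stable sort on the index
    # interleaves them while keeping the Apple row first for each company.
    return (
        pd.concat([apple, droid])
        .sort_index(kind="mergesort")
        .reset_index(drop=True)
    )

fake_df = make_fake_app_data(["Zynga", "EA Mobile"], seed=42)
print(fake_df)
```

Company names are passed in as a plain list here; for a big table they could come from Faker or any other source. Passing a `seed` makes the output repeatable, which can help when testing.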
