
How can I select all the values in column 'Funding' that end with "M", then remove the "M" and the "$" and add "0," before the value?

For example, from $535M to 0,535.

That's because I have both billion and million values. I've decided to format the column in billions, so values in millions must become 0,...
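A minimal sketch of the general conversion, assuming every `Funding` value has the form `$<number>B` or `$<number>M` (anything else comes out as NaN). It splits the number from the unit with `str.extract` and divides millions by 1,000. The column name `Funding_B` is a hypothetical choice for illustration, and the result is numeric (0.572) rather than the text "0,572".

```python
import pandas as pd

# Small sample in the same format as the Kaggle 'Funding' column
df = pd.DataFrame({"Funding": ["$8B", "$2B", "$572M", "$535M"]})

# Capture the numeric part (group 0) and the unit letter "B" or "M" (group 1)
parts = df["Funding"].str.extract(r"^\$([\d.]+)([BM])$")

# Divide millions by 1,000 so every value is expressed in billions
divisor = parts[1].map({"B": 1, "M": 1000})
df["Funding_B"] = parts[0].astype(float) / divisor

print(df)
```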

Here is the example. The df is the Unicorn Companies dataset from Kaggle:

df.head(10).to_dict()  

{'Company': {0: 'Bytedance',
1: 'SpaceX',
2: 'SHEIN',
3: 'Stripe',
4: 'Klarna',
5: 'Canva',
6: 'Checkout.com',
7: 'Instacart',
8: 'JUUL Labs',
9: 'Databricks'},
'Valuation': {0: '$180B',
1: '$100B',
2: '$100B',
3: '$95B',
4: '$46B',
5: '$40B',
6: '$40B',
7: '$39B',
8: '$38B',
9: '$38B'},
'Date Joined': {0: '2017-04-07',
1: '2012-12-01',
2: '2018-07-03',
3: '2014-01-23',
4: '2011-12-12',
5: '2018-01-08',
6: '2019-05-02',
7: '2014-12-30',
8: '2017-12-20',
9: '2019-02-05'},
'Industry': {0: 'Artificial intelligence',
1: 'Other',
2: 'E-commerce & direct-to-consumer',
3: 'Fintech',
4: 'Fintech',
5: 'Internet software & services',
6: 'Fintech',
7: 'Supply chain, logistics, & delivery',
8: 'Consumer & retail',
9: 'Data management & analytics'},
'City': {0: 'Beijing',
1: 'Hawthorne',
2: 'Shenzhen',
3: 'San Francisco',
4: 'Stockholm',
5: 'Surry Hills',
6: 'London',
7: 'San Francisco',
8: 'San Francisco',
9: 'San Francisco'},
'Country': {0: 'China',
1: 'United States',
2: 'China',
3: 'United States',
4: 'Sweden',
5: 'Australia',
6: 'United Kingdom',
7: 'United States',
8: 'United States',
9: 'United States'},
'Continent': {0: 'Asia',
1: 'North America',
2: 'Asia',
3: 'North America',
4: 'Europe',
5: 'Oceania',
6: 'Europe',
7: 'North America',
8: 'North America',
9: 'North America'},
'Year Founded': {0: 2012,
1: 2002,
2: 2008,
3: 2010,
4: 2005,
5: 2012,
6: 2012,
7: 2012,
8: 2015,
9: 2013},
'Funding': {0: '$8B',
1: '$7B',
2: '$2B',
3: '$2B',
4: '$4B',
5: '$572M',
6: '$2B',
7: '$3B',
8: '$14B',
9: '$3B'},
'Select Investors': {0: 'Sequoia Capital China, SIG Asia Investments, Sina Weibo, Softbank Group', 1: 'Founders Fund, Draper Fisher Jurvetson, Rothenberg Ventures', 2: 'Tiger Global Management, Sequoia Capital China, Shunwei Capital Partners', 3: 'Khosla Ventures, LowercaseCapital, capitalG', 4: 'Institutional Venture Partners, Sequoia Capital, General Atlantic', 5: 'Sequoia Capital China, Blackbird Ventures, Matrix Partners', 6: 'Tiger Global Management, Insight Partners, DST Global', 7: 'Khosla Ventures, Kleiner Perkins Caufield & Byers, Collaborative Fund', 8: 'Tiger Global Management', 9: 'Andreessen Horowitz, New Enterprise Associates, Battery Ventures'}}
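To reproduce the example locally, the dict printed by `to_dict()` can be passed straight back to `pd.DataFrame`. A sketch using only two rows and two columns of it:

```python
import pandas as pd

# A trimmed copy of the dict printed by df.head(10).to_dict() above
data = {
    "Company": {0: "Bytedance", 5: "Canva"},
    "Funding": {0: "$8B", 5: "$572M"},
}

# Outer keys become column names, inner keys become the row index
df = pd.DataFrame(data)
print(df)
```

The same call works on the full dict above.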

I did a similar manipulation with Valuation; here is how I did it. I hope it's right.

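# Strip the "B" suffix and the "$" prefix, then convert to int (billions).
# regex=False treats "$" as a literal character, not a regex end-of-string anchor.
df['Valuation'] = df['Valuation'].str.replace(
    "B", "", regex=False).str.replace(
    "$", "", regex=False).astype(int)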

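A minimal sketch of a sanity check one might run before the `Valuation` conversion, assuming every value should look like `$<integer>B`: a value such as `$1.5B` or `$950M` would make `astype(int)` raise. `str.strip("$B")` is shown as an alternative to the two chained `replace` calls; it removes those characters only from the two ends of each string.

```python
import pandas as pd

# Sample in the same format as the Valuation column shown above
df = pd.DataFrame({"Valuation": ["$180B", "$100B", "$38B"]})

# List any value that is not "$<integer>B" before converting,
# since such values would make astype(int) raise a ValueError
bad = df.loc[~df["Valuation"].str.fullmatch(r"\$\d+B"), "Valuation"]
print("Unexpected values:", bad.tolist())

# Remove leading/trailing '$' and 'B' characters (no regex involved), then convert
df["Valuation"] = df["Valuation"].str.strip("$B").astype(int)
print(df["Valuation"].tolist())
```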
I've tried several ways, but none of them works. Here are some of them:

# Attempt 1
df['Funding'] = np.where(df.Funding.str.contain("M"),
                         df['Funding'] = ('0,'+ df['Funding']),
                        pass)

# Attempt 2
df['Funding'] = df['Funding'].str.replace(
    "B", "").str.replace(
    "$","").str.replace(
    "M","0,")

# Attempt 3
if df['Funding'].str.contains("M").any():
    df['Funding'] = df['Funding'].str.replace("M", "")

# Attempt 4
asd = "M"
if any(("M" in asd) for M in df['Funding']):
    df['Funding'].join((df['Funding'][:0],'0,',df['Funding'][0:])) and replace("M", "")
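For comparison, a sketch of what the first attempt seems to be aiming for, assuming the goal is the literal text "0,572" rather than a number. The pandas method is `str.endswith` (or `str.contains`), not `str.contain`, and `np.where` takes a value for each branch, so there is no assignment or `pass` inside it. `zfill(3)` pads short values so that a hypothetical `$5M` becomes `0,005`; values of 1,000M or more would need different handling.

```python
import numpy as np
import pandas as pd

# "$5M" is a made-up value added to show why padding is needed
df = pd.DataFrame({"Funding": ["$8B", "$2B", "$572M", "$5M"]})

# Numeric part as text: drop the leading '$' and the trailing unit letter
num = df["Funding"].str[1:-1]

# Millions get "0," in front (padded to three digits); billions keep just the number
df["Funding_str"] = np.where(
    df["Funding"].str.endswith("M"),
    "0," + num.str.zfill(3),
    num,
)
print(df)
```

The result is a text column; for sorting or arithmetic, a numeric column in billions is usually more convenient.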

Thanks to everyone who wants to help me. It's my first time with Python; I'm more familiar with R.

1 Answer


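If you want all your column values in billions, you can use:

import pandas as pd

# Input df
df = pd.DataFrame({"Funding": ["$8B", "$2B", "$535M"]})

# Drop "$" and the unit letter; keep B values, divide M values by 1000.
# Written to a new column so an existing Valuation column is not overwritten.
df["Funding_B"] = df["Funding"].str[1:-1].astype(int).where(
    df["Funding"].str.endswith("B"),
    df["Funding"].str[1:-1].astype(int).div(1000),
)

>>> df
  Funding  Funding_B
0     $8B      8.000
1     $2B      2.000
2   $535M      0.535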
not_speshal
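If the goal is specifically the "0,535" form from the question (comma as the decimal separator), one option is to keep the numeric column from the answer and only format it when displaying or exporting. A sketch, assuming three decimal places are enough; `Funding_B` and `Funding_txt` are hypothetical column names, and `astype(float)` is used instead of `astype(int)` on the assumption that values like `$1.5B` might occur.

```python
import pandas as pd

df = pd.DataFrame({"Funding": ["$8B", "$2B", "$535M"]})

# Same idea as the answer: strip '$' and the unit, then scale millions to billions
num = df["Funding"].str[1:-1].astype(float)
df["Funding_B"] = num.where(df["Funding"].str.endswith("B"), num / 1000)

# Text version with a decimal comma, e.g. 0.535 -> "0,535"
df["Funding_txt"] = df["Funding_B"].map(lambda x: f"{x:.3f}".replace(".", ","))
print(df)

# Alternatively keep the floats and use a decimal comma only when exporting;
# sep=";" avoids a clash between the field separator and the decimal comma
csv_text = df[["Funding_B"]].to_csv(sep=";", decimal=",", index=False)
print(csv_text)
```

Keeping `Funding_B` as floats means it can still be summed or sorted, while the comma only appears in the text column or the exported file.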