How to capitalize middle row of a column in PySpark or Pandas

Question

I have a CSV file with three columns of values:

    1st Column   2nd Column   3rd Column
    ram          karthi       bruce
    RAM          KARTHI       BRUCE
    ram          karthi       bruce

I want to capitalize the 2nd row (ram, karthi and bruce) in PySpark or Pandas, but I have not been able to do it. Please help. A solution in PySpark would be more helpful.

Spark does not have a concept of an index, so '2nd row' is not well defined. It is not something that is meaningful in Spark. — mck, Nov 12 '20 at 09:01
No it cannot. See https://stackoverflow.com/questions/36938976/why-spark-sql-considers-the-support-of-indexes-unimportant — mck, Nov 12 '20 at 12:25
https://stackoverflow.com/questions/43406887/spark-dataframe-how-to-add-a-index-column-aka-distributed-data-index — himanish, Nov 12 '20 at 12:42
The code with zipWithIndex does not work for me in PySpark. Please help me. — himanish, Nov 12 '20 at 12:42

score 1 · Answer 1 · answered Nov 12 '20 at 07:55

In Pandas, you can do it using df.loc and Series.str.upper:

In [1619]: df
Out[1619]: 
  1st_Column 2nd_Column 3rd_Column
0        ram     karthi      bruce
1        ram     karthi      bruce
2        ram     karthi      bruce

In [1620]: df.loc[1] = df.loc[1].str.upper()

In [1621]: df
Out[1621]: 
  1st_Column 2nd_Column 3rd_Column
0        ram     karthi      bruce
1        RAM     KARTHI      BRUCE
2        ram     karthi      bruce
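
If the row label is not known in advance, the same idea can be applied by position instead of by label. This is a minimal sketch, assuming "middle row" means position `len(df) // 2` and that every column holds strings; the variable name `middle` is only illustrative.

```python
import pandas as pd

# Sample data matching the frame shown above.
df = pd.DataFrame(
    {
        "1st_Column": ["ram", "ram", "ram"],
        "2nd_Column": ["karthi", "karthi", "karthi"],
        "3rd_Column": ["bruce", "bruce", "bruce"],
    }
)

# Assumption: "middle row" is the row at position len(df) // 2 (1 for 3 rows).
middle = len(df) // 2

# iloc selects by position, so this works even if the index is not 0, 1, 2.
df.iloc[middle] = df.iloc[middle].str.upper()

print(df)
```

Using `iloc` rather than `loc` keeps the code independent of whatever index labels the CSV was loaded with.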

Mayank Porwal

I am not too sure how to do this in `PySpark`. – Mayank Porwal Nov 12 '20 at 08:04
