
I have a CSV file with three columns of values:

    1st Column   2nd Column   3rd Column
    ram          karthi       bruce
    RAM          KARTHI       BRUCE
    ram          karthi       bruce
    

I want to capitalize the values in the 2nd row (ram, karthi and bruce) in PySpark or Pandas, but I have not been able to do it. Please help. A solution in PySpark would be more helpful.

Mayank Porwal
himanish
  • Spark does not have a concept of index, so '2nd row' is not a well-defined row. It's not something that is meaningful in spark. – mck Nov 12 '20 at 09:01
  • I guess it can be done.. – himanish Nov 12 '20 at 11:48
  • No it cannot. See https://stackoverflow.com/questions/36938976/why-spark-sql-considers-the-support-of-indexes-unimportant – mck Nov 12 '20 at 12:25
  • https://stackoverflow.com/questions/43406887/spark-dataframe-how-to-add-a-index-column-aka-distributed-data-index – himanish Nov 12 '20 at 12:42
  • The code with zipWithIndex does not work for me in PySpark. Please help me. – himanish Nov 12 '20 at 12:42

1 Answer


In Pandas, you can do it using df.loc and Series.str.upper:

In [1619]: df
Out[1619]: 
  1st_Column 2nd_Column 3rd_Column
0        ram     karthi      bruce
1        ram     karthi      bruce
2        ram     karthi      bruce

In [1620]: df.loc[1] = df.loc[1].str.upper()

In [1621]: df
Out[1621]: 
  1st_Column 2nd_Column 3rd_Column
0        ram     karthi      bruce
1        RAM     KARTHI      BRUCE
2        ram     karthi      bruce
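
For a version that starts from CSV text rather than an existing DataFrame, here is a minimal sketch. It assumes the file has a header row and comma separators; the inline `csv_text` string and the underscore column names are stand-ins for the real file, not taken from the question. It uses `df.iloc[1]` so the second row is selected by position even when the index is not the default 0, 1, 2.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the real file on disk.
csv_text = """1st_Column,2nd_Column,3rd_Column
ram,karthi,bruce
ram,karthi,bruce
ram,karthi,bruce
"""

# With a real file, pass its path to read_csv instead of a StringIO buffer.
df = pd.read_csv(io.StringIO(csv_text))

# Upper-case every value in the second row (position 1), leaving other rows as-is.
df.iloc[1] = df.iloc[1].str.upper()

print(df)
```

Note that `.str.upper()` only works here because every column holds strings; with mixed column types you would need to restrict the assignment to the string columns.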
Mayank Porwal