5

This is my working code, which is downloading an excel file from the website. It takes about 40 seconds.

Once you run this code, you will notice that Key1, Key2 and Key3 columns are object dtypes. I cleaned up the dataframe such that key1 and key2 only have alphanumeric values. Still pandas is keeping it as object dtype. I need to concatenate (as in MS Excel) Key1 and Key2 to create a separate column called deviceid. I realize that I cannot join the two columns since they are object dtypes. How would I convert to string so that I can create my new column?

import pandas as pd
import urllib.request
import time

start=time.time()
url="https://www.misoenergy.org/Library/Repository/Market%20Reports/20170816_da_bcsf.xls"
cnstsfxls = urllib.request.urlopen(url)
xlsf = pd.ExcelFile(cnstsfxls)
dfsf = xlsf.parse("Sheet1",skiprows=3)
dfsf.drop(dfsf.index[len(dfsf)-1],inplace=True)
dfsf.drop(dfsf[dfsf['Device Type'] == 'UN'].index, inplace=True)
dfsf.drop(dfsf[dfsf['Device Type'] == 'UNKNOWN'].index, inplace=True)
dfsf.drop(['Constraint Name','Contingency Name', 'Constraint Type','Flowgate Name'],axis=1, inplace=True)
end=time.time()
print("The entire process took - ", end-start, " seconds.")
vestland
  • 55,229
  • 37
  • 187
  • 305
Shyama Sonti
  • 321
  • 1
  • 5
  • 16

1 Answers1

0

I may be missing the point here. But if what you want to do is to construct a column where, for example, deviceid = RCH417 when Key1 = RCH and Key2 = 417, then dfsf['deviceid'] = dfsf['Key1'] + dfsf['Key2'] will work fine even though both columns are of type object.

Try this:

# Check value types
dfsf.dtypes

# Add your desired column
dfsf['deviceid'] = dfsf['Key1']  + dfsf['Key2']

# Inspect columns of interest
keep = ['Key1', 'Key2', 'deviceid']
df_keys = dfsf[keep]
print(df_keys.dtypes)

enter image description here

print(df_keys.head())

enter image description here

vestland
  • 55,229
  • 37
  • 187
  • 305