4

I know that Tableau is fairly popular so I cannot be the only person who is using the Python TDE API to get data into the compressed Tableau format. Documentation

1) This API is only Python 2.7 and it is literally the only reason why I need 2.7 installed

2) I use pandas to process and clean raw data and prepare it for Tableau, but then I need to read an intermediate .csv in a 2.7 environment with csv reader and do row based operations (so pandas creates a .csv, then I open that with Python 2.7 and do my TDE stuff)

Here is an example code that I have. The first portion is just creating an empty .tde file and defining the data type per column which can be done separately. The last chunk is what I wish I could do in pandas with a to_tde method. It is inserting each row of data into the tde file along with the data type.

#this doesnt need to be included, it is creating the empty tde table
    ExtractAPI.initialize()
    start_time = time.clock()

    if os.path.isfile(extract_file):
        print('Deleting file', extract_file)
        os.remove(extract_file)

    try:
        tde_file = Extract(extract_file)
    except:
        os.remove(extract_file)
        tde_file = Extract(extract_file)

    csv = csv.reader(open(data_location + csv_filename, 'rb'), delimiter=',', quotechar='"')
    print('Done reading .csv')

    #this stuff doesn't really matter, this is just creating the empty tde file

    dates = [list of column names]
    strings = [list of column names]
    doubles = [list of column names]
    ints = [list of column names]

    def create_column(column, dtype):
        tabledef.addColumn(column, dtype)

    if tde_file.hasTable('Extract'):
        print('Error')
        table = tde_file.openTable('Extract')
        tabledef = table.getTableDefinition()
    else:
        tabledef = TableDefinition()
        for column in dates:
            create_column(column, Type.DATE)
        for column in strings:
            create_column(column, Type.CHAR_STRING)
        for column in doubles:
            create_column(column, Type.DOUBLE)
        for column in ints:
            create_column(column, Type.INTEGER)

    table = tde_file.addTable('Extract', tabledef)     
    print('Done')

#this is the part that I wish I could do in pandas with a to_tde function. it is inserting each row one at a time based on the datatype which was defined    

    newrow = Row(tabledef)
    next(csv)

    for line in csv:
        date = dt.datetime.strptime(line[0], "%Y-%m-%d")
        newrow.setDate(0, date.year, date.month, date.day)
        date = dt.datetime.strptime(line[1], "%Y-%m-%d")
        newrow.setDate(1, date.year, date.month, date.day)
        date = dt.datetime.strptime(line[2], "%Y-%m-%d")
        newrow.setDate(2, date.year, date.month, date.day)    

        newrow.setCharString(3, str(line[3]))
        newrow.setCharString(4, str(line[4]))
        newrow.setCharString(5, str(line[5]))
        newrow.setCharString(6, str(line[6]))
        newrow.setCharString(7, str(line[7]))
        newrow.setCharString(8, str(line[8]))
        newrow.setCharString(9, str(line[9]))
        newrow.setCharString(10, str(line[10]))

        newrow.setDouble(11, float(line[11]))
        newrow.setDouble(12, float(line[12]))
        newrow.setDouble(13, float(line[13]))
        newrow.setDouble(14, float(line[14]))
        newrow.setDouble(15, float(line[15]))

        newrow.setInteger(16, int(line[16]))
        newrow.setInteger(17, int(line[17]))
        newrow.setInteger(18, int(line[18]))
        newrow.setInteger(19, int(line[19]))
        newrow.setInteger(20, int(line[20]))
        table.insert(newrow)

    tde_file.close()
    ExtractAPI.cleanup()
    print('Extract created in {} seconds'.format(time.clock() - start_time))

Is this something which is being worked on by anyone? Does it seem like it would be easy to do (even if not directly added to the main pandas library)?

trench
  • 5,075
  • 12
  • 50
  • 80
  • I would love this feature! I have not seen anything like it in pandas. You might find the workarounds described in this conversation interesting/useful: https://community.tableau.com/thread/157193 – p_sutherland Mar 09 '18 at 22:08

2 Answers2

2

There are various open source repo's:

TableauSDKPy3

It's a python library that can write to tde or hyper from Pandas DataFrame or pyodbc cursor. It's shipped with Tableau API, you don't need to install them as a preparation

https://github.com/JunjieW/TableauSDKPy3

pandleau#

It's a python module that can write Pandas DataFrame to tde and hyper. However, you need to install Tableau API locally first:

https://github.com/bwiley1/pandleau

Community
  • 1
  • 1
jwbam
  • 21
  • 2
  • I would also add pantab to the list for Hyper extracts - it's implemented mostly in C so it has a huge focus on performance and scalability that other libraries or hand-written code might not offer https://pantab.readthedocs.io/en/latest/ – Will Ayd Jan 09 '20 at 21:48
-4

I stumbled upon your question searching for a solution myself. But, found another link. Maybe the following link could be of help: https://www.doingdata.org/blog/how-to-create-a-tableau-data-extract-using-python