5

I have a PostgreSQL database hosted on a Windows 2008 Server RT Virtual Machine (Yes I know it should be hosted on a Linux VM but this is what my organization has dictated it be on. Sigh...)

Our GIS guys dump a lot of shapefiles into a repository. We would like to have an autoprocess that walks through folder as a scheduled task. We would like to add these into our Postgres geodatabase for some other processes we are currently developing

I am looking to walk through large amounts of shapefiles and have their geometries and file names loaded into out database.

Here's the gist of the core portions of the ingest function I have working so far

import os, subprocess
base_dir = r"c:\shape_file_repository"
full_dir = os.walk(base_dir)
shapefile_list = []
for source, dirs, files in full_dir:
    for file_ in files:
        if file_[-3:] == 'shp':
            #print "Found Shapefile"
            shapefile_path = base_dir + '/' + file_
            shapefile_list.append(shapefile_path)
for paths in shapefile_list:  
    #This is the part where I keep running into trouble. os.system also didnt work
    temp_bat = open(r"c:\temp\temp_shp.bat", "w")
    temp_bat.write(r'start /D c:\Program Files (x86)\PostgreSQL\8.4\bin\shp2pgsql.exe' + \
                   paths + "new_shp_table | psql -d geometry_database")
    temp_bat.close()
    subprocess.Popen(r"c:\temp\temp_shp.bat")

Once the geometries are loaded into the new database table I already have code setup to pull the geometry out of the temporary table and load that plus the shapefile name into our main database table. My problem is I can do this through command prompt, however running the windows commands through python or outputting them to batch file and then running them does not seem to be working at all.

AlienAnarchist
  • 186
  • 4
  • 15

1 Answers1

4

Here are some modifications that should make thing work. Note that it would need further modification if you need to be notified if any of the commands fail. Note that it will fail for more than one shapefile, since a new_shp_table table will already exist until you have further logic to move or rename that table elsewhere, or to load it with a unique name.

Also, note that PostgreSQL 8.4 will reach it's end-of-life later this year, so you might want to plan to upgrade to a more recent release before it's too late.

import os, subprocess

# Choose your PostgreSQL version here
os.environ['PATH'] += r';C:\Program Files (x86)\PostgreSQL\8.4\bin'
# http://www.postgresql.org/docs/current/static/libpq-envars.html
os.environ['PGHOST'] = 'localhost'
os.environ['PGPORT'] = '5432'
os.environ['PGUSER'] = 'someuser'
os.environ['PGPASSWORD'] = 'clever password'
os.environ['PGDATABASE'] = 'geometry_database'

base_dir = r"c:\shape_file_repository"
full_dir = os.walk(base_dir)
shapefile_list = []
for source, dirs, files in full_dir:
    for file_ in files:
        if file_[-3:] == 'shp':
            shapefile_path = os.path.join(base_dir, file_)
            shapefile_list.append(shapefile_path)
for shape_path in shapefile_list:
    cmds = 'shp2pgsql "' + shape_path + '" new_shp_table | psql '
    subprocess.call(cmds, shell=True)
Mike T
  • 41,085
  • 18
  • 152
  • 203
  • I use psycopg2 to handle the SQL code for manipulating the database, so I have a handler to move the data from the new_shp_table to our master table then drop it as we iterate through each shapefiles. The code works perfectly, however as it iterates through is asks me for the postgres superuser password every single time. I would like this to run automatically. I updated my PGPASSFILE environ variable with the location of pgpass.conf but It's still not working. How do I account for this? – AlienAnarchist Mar 14 '14 at 16:50
  • You did answer my main question so I accepted the answer. However I'm still stuck on the password portion of the problem. – AlienAnarchist Mar 17 '14 at 17:38
  • It looks something like `os.environ['PGPASSFILE'] = r'C:\My Folder\pgpass.conf'`? I typically get pgAdmin to generate this file for me. – Mike T Mar 17 '14 at 20:51
  • I added that argument but it still requires a user to manually enter a password. Is there any way I can have postgres not require this? We are in an internal network so In terms of security this isnt much of an issue. We can always transfer the data to our public database later. – AlienAnarchist Mar 25 '14 at 15:45
  • That's odd. Another strategy is to `trust` the internal network IP address(es) in the pg_hba.conf file on the server. – Mike T Mar 25 '14 at 18:16
  • I figured it out thanks to this posting http://stackoverflow.com/questions/11949959/no-password-prompt-for-postgresql-superuser Specifically added this line of code os.environ['PGPASSWORD'] = 'insert password here' – AlienAnarchist Mar 25 '14 at 20:44
  • Also figured out that the double geometry in QGIS was because the SRID wasn't being set in the shp2pgsql command, so modifying to the following resolved that issue: cmds = 'shp2pgsql -s 4326 "' + shape_path + '" new_pgsql_table_name | psql ' – DPSSpatial Aug 07 '17 at 18:04