I have a fixed width file from the U.S. Census. It's the one in the zip called "orgeo2010.sf1". The zip is a big file. I want to read that file into a table in PostgreSQL 12.1. This is how I built the table.
create table census_2010.geo_header_sf1
(
fileid varchar(6),
stusab varchar(2),
sumlev varchar(3),
geocomp varchar(2),
chariter varchar(3),
cifsn varchar(2),
logrecno integer PRIMARY KEY,
region varchar(1),
division varchar(1),
state varchar(2),
county varchar(3),
countycc varchar(2),
countysc varchar(2),
cousub varchar(5),
cousubcc varchar(2),
cousubsc varchar(2),
place varchar(5),
placecc varchar(2),
placesc varchar(2),
tract varchar(6),
blkgrp varchar(1),
block varchar(4),
iuc varchar(2),
concit varchar(5),
concitcc varchar(2),
concitsc varchar(2),
aianhh varchar(4),
aianhhfp varchar(5),
aianhhcc varchar(2),
aihhtli varchar(1),
aitsce varchar(3),
aits varchar(5),
aitscc varchar(2),
ttract varchar(6),
tblkgrp varchar(1),
anrc varchar(5),
anrccc varchar(2),
cbsa varchar(5),
cbsasc varchar(2),
metdiv varchar(5),
csa varchar(3),
necta varchar(5),
nectasc varchar(2),
nectadiv varchar(5),
cnecta varchar(3),
cbsapci varchar(1),
nectapci varchar(1),
ua varchar(5),
uasc varchar(2),
uatype varchar(1),
ur varchar(1),
cd varchar(2),
sldu varchar(3),
sldl varchar(3),
vtd varchar(6),
vtdi varchar(1),
reserve2 varchar(3),
zcta5 varchar(5),
submcd varchar(5),
submcdcc varchar(2),
sdelem varchar(5),
sdsec varchar(5),
sduni varchar(5),
arealand integer,
areawatr integer,
name varchar(90),
funcstat varchar(1),
gcuni varchar(2),
pop100 integer,
hu100 integer,
intptlat varchar(11),
intptlon varchar(12),
lsadc varchar(2),
partflag varchar(1),
reserve3 varchar(6),
uga varchar(5),
statens varchar(8),
countyns varchar(8),
cousubns varchar(8),
placens varchar(8),
concitns varchar(8),
aianhhns varchar(8),
aitsns varchar(8),
anrcns varchar(8),
submcdns varchar(8),
cd113 varchar(2),
cd114 varchar(2),
cd115 varchar(2),
sldu2 varchar(3),
sldu3 varchar(3),
sldu4 varchar(3),
sldl2 varchar(3),
sldl3 varchar(3),
sldl4 varchar(3),
aianhhsc varchar(2),
csasc varchar(2),
cnectasc varchar(2),
memi varchar(1),
nmemi varchar(1),
puma varchar(5),
reserved varchar(18)
);
I tried to read from the file directly with:
Decennial_2010=# COPY census_2010.geo_header_sf1
Decennial_2010-# FROM 'D:\projects_and_data\data\PostgreSQL\data\data\or2010.sf1\orgeo2010.sf1';
ERROR: value too long for type character varying(6)
CONTEXT: COPY geo_header_sf1, line 1, column fileid: "SF1ST OR04000000 00000014941
When that didn't work, I thought maybe I could import it into R (I know enough R for manipulating R) and edit it and write a new F.W.F. file. I tried with my new file and got the same result.
Decennial_2010=# COPY census_2010.geo_header_sf1
Decennial_2010-# FROM 'D:/projects_and_data/data/PostgreSQL/data/data/geo_a' CSV HEADER;
ERROR: value too long for type character varying(6)
CONTEXT: COPY geo_header_sf1, line 2, column fileid: "SF1ST OR 40000 14941
This is the first line of the data frame that holds the file in R and the write.fwf()
to write the new file that didn't work. I know I should use dput(td[1,])
, but then it prints all the levels for all the factors in all the rows (200,000+) and then the output from dput()
doesn't even fit in all the printable space of the console. So I'll just copy and paste the line as it appears by default. Sorry.
td[1,]
fileid stusab sumlev geocomp chariter cifsn logrecno region division state
1 SF1ST OR 40 00 0 NA 1 4 9 41
county countycc countysc cousub cousubcc cousubsc place placecc placesc tract
1 NA NA NA NA NA NA NA
blkgrp block iuc concit concitcc concitsc aianhh aianhhfp aianhhcc aihhtli
1 NA NA NA NA NA NA NA NA
aitsce aits aitscc ttract tblkgrp anrc anrccc cbsa cbsac metdiv csa necta
1 NA NA NA NA NA NA NA NA NA NA
nectasc nectadiv cnecta cbsapci nectapci ua uasc uatype ur cd sldu sldl vtd
1 NA NA NA NA NA NA NA NA NA NA NA
vtdi reserve2 zcta5 submcd submcdcc sdelem sdsec sduni arealand
1 NA NA NA NA NA NA NA 248607802255
areawatr
1 6191433228
name
1 Oregon
funcstat gcuni pop100 hu100 intptlat intptlon lsadc partflag reserve3 uga
1 A ! 3831074 1675562 43.97152 -120.6226 00 NA NA
statens countyns cousubns placens concitns aianhhns aitsns anrcns submcdns
1 1155107 NA NA NA NA NA NA NA NA
cd113 cd114 cd115 sldu2 sldu3 sldu4 sldl2 sldl3 sldl4 aianhhsc csasc cnectasc
1 NA NA NA NA NA NA NA NA NA NA NA NA
memi nmemi puma reserved
1 NA NA NA NA
library(gdata)
write.fwf(td, "D:/projects_and_data/data/PostgreSQL/data/data/geo_a", sep="")
What needs to be changed in the table or the starting file to get the file to copy into the table in PostgreSQL?