0

Hello i am currently try different data automation processes with python and postgreSQL. I automated the cleaning and upload of a dataset with 40.000 data emtries into my Database. Due to some flaws in my process i had to truncate some tables or data entries.


i am using: python 3.9.7 / postgeSQL 13.3 / pgAdmin 4 v.5.7

Problem

Currently i have ID's of tables who start at the ID of 44700 instead of 1 (due do my editing).

For Example a table of train stations begins with the ID 41801 and ends with ID of 83599.

Question

How can reorganize my index so that the ID starts from 1 to 41801?

After looking online i found topics like "bloat" or "reindex". I tired Vacuum or Reindex but nothing really showed a difference in my tables? As far as now my tables have no relations to each other. What would be the approach to solve my problem in postgreSQL. Some hidden Function i overlooked? Maybe it's not a problem at all, but it definitely looks weird. At some point of time i end up with the ID of 250.000 while only having 40.000 data entries in my table.

black_hole_sun
  • 908
  • 11
  • 41

1 Answers1

1

Do you use a Sequence to generate ID column of your table? You can check it in pgAdmin under your database if you have a Sequence object in your database: Schemas -> public -> Sequences.

You can change the current sequence number with right-click on the Sequence and set it to '1'. But only do this if you deleted all rows in the table and before you start to import your data again.

As long as you do not any other table which references the ID column of your train station table, you can even update the ID with an update statement like:

UPDATE trainStations SET ID = ID - 41801 WHERE 1 = 1;
retoen
  • 379
  • 1
  • 7
  • Schemas -> public -> Sequences did not work for me, i had no Sequence. But I tried your Code Line with a small train agencies table (37 rows). The First time i ended up with `ID 0`. But i figured out no matter what your first `id` is, 500 for example take '- 499' then it starts with`ID 1`. I did not really understand the `WHERE 1 = 1`. I tried a bit do understand and mix a table up. But as a side effect i figured out `truncate restart identity` will do the job also, while `truncate`starts witit the last ID `truncate restart identity``starts with 1. Thanks – black_hole_sun Oct 01 '21 at 10:52
  • 1
    the `where 1 = 1` is just to enforce the update statement to all rows of the table, as 1 always equals to 1, obviously. – retoen Oct 01 '21 at 11:17
  • So `where 20 = 20` equals to update only every twentieth row? Thats why i ended up with unsorted numbers. Great inside will use it later. – black_hole_sun Oct 01 '21 at 11:23
  • 1
    No! thats not true, dont do this! the `20 = 20` behaves exactly like `1 = 1` as both result in `true`. It is common usage to add a WHERE clause to any UPDATE Statement. Usually, if you only want to update one specific row, you would have a WHERE clause like `WHERE ID = `. But since you want to update every row, you just give a condition which always evaluates to true, like `WHERE 1 = 1` or even `WHERE 841 = 841`. – retoen Oct 01 '21 at 11:37
  • Ok. Seems a odd condition. I mean i understand `WHERE ID = ` but `WHERE 1 = 1` simply to make it `true`. I just tried it without the condition and postgres added 100 or took 100 to all my rows. Is it some syntax for precaution? I may have to look further into it. – black_hole_sun Oct 01 '21 at 12:03
  • 1
    Yes, probably you can also use just `WHERE true`, but then this probably does not work in MsSQL. So the `WHERE 1 = 1` is more database independent. I have started to ALWAYS have a WHERE clause in my update statements, also as precaution, because if you forget to add it, you will manipulate the whole table, even if you just wanted to update only one row initially. And then you can just hope to have a backup available, or you're screwed. It only happens once to every one of us ;) – retoen Oct 01 '21 at 12:14