
Is there some sort of performance difference for inserting, updating, or deleting data when you use the TEXT data type?

I went to the PostgreSQL documentation on character types and found this:

Tip: There is no performance difference among these three types, apart from increased storage space when using the blank-padded type, and a few extra CPU cycles to check the length when storing into a length-constrained column. While character(n) has performance advantages in some other database systems, there is no such advantage in PostgreSQL; in fact character(n) is usually the slowest of the three because of its additional storage costs. In most situations text or character varying should be used instead.
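To sanity-check that claim, here is a minimal psql timing sketch (the table names are made up) that bulk-inserts the same data into a `text` column and into a `varchar` column; per the tip above, the two timings should come out essentially identical:

    -- hypothetical tables; run in psql so \timing works
    CREATE TABLE demo_text    (val text);
    CREATE TABLE demo_varchar (val varchar(100));

    \timing on

    -- one million identical rows into each table
    INSERT INTO demo_text    SELECT 'row ' || g FROM generate_series(1, 1000000) g;
    INSERT INTO demo_varchar SELECT 'row ' || g FROM generate_series(1, 1000000) g;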

This makes me believe there should not be a performance difference, but my friend, who is much more experienced than I am, says inserts, updates, and deletes are slower for the TEXT data type.

I had a table that was partitioned with a trigger and function and very heavily indexed, but the inserts were not that slow.

Now I have another table with 5 more columns, all of the text data type, with exactly the same trigger and function but no indexes, and the inserts are terribly slow.
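For reference, by "partitioned with a trigger and function" I mean the usual trigger-based partitioning pattern, which looks roughly like this sketch (all table, column, and function names are made up):

    -- parent table; rows are routed to children by date
    CREATE TABLE measurements (
        logdate date NOT NULL,
        payload text
    );

    -- one child table per year, constrained to its range
    CREATE TABLE measurements_2016 (
        CHECK (logdate >= DATE '2016-01-01' AND logdate < DATE '2017-01-01')
    ) INHERITS (measurements);

    -- routing function: send each new row to the matching child
    CREATE OR REPLACE FUNCTION measurements_insert_trigger()
    RETURNS trigger AS $$
    BEGIN
        IF NEW.logdate >= DATE '2016-01-01' AND NEW.logdate < DATE '2017-01-01' THEN
            INSERT INTO measurements_2016 VALUES (NEW.*);
        ELSE
            RAISE EXCEPTION 'no partition for date %', NEW.logdate;
        END IF;
        RETURN NULL;  -- suppress the insert into the parent itself
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER measurements_insert
        BEFORE INSERT ON measurements
        FOR EACH ROW EXECUTE PROCEDURE measurements_insert_trigger();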

From my experience, I think he is correct, but what do you guys think?

Edit #1: I am uploading exactly the same data; the 2nd version just has 5 more columns.

Edit #2: By "slow" I mean that in the first scenario I was able to insert 500 or more rows per second, but now I can only insert 20 rows per second.

Edit #3: I didn't add the indexes to the 2nd scenario that exist in the 1st scenario because, as I understand it, indexes slow down inserts, updates, and deletes.

Edit #4: I guarantee it is exactly the same data, because I'm the one uploading it. The only difference is that the 2nd scenario has 5 additional columns, all of the text data type.

Edit #5: Even when I removed all of the indexes on scenario 2 and left all of them on scenario 1, the inserts were still slower on scenario 2.

Edit #6: Both scenarios have the same exact trigger and function.

Edit #7: I am using an ETL tool, Pentaho, to insert the data, so there is no way for me to show you the code being used to insert the data.

I think I might have had too many transformation steps in the ETL tool. When I inserted the data in the same transformation as the steps that actually transform it, the inserts were massively slow; but when I loaded the already-transformed data into an empty table and then inserted from that table into the actual table I'm using, the inserts were much faster than scenario 1, at 4000 rows per second.

The only difference between scenario 1 and scenario 2, other than the additional columns in scenario 2, is the number of steps in the ETL transformation. Scenario 2 has about 20 or more steps in its transformation; in some cases, there are 50 or more.

I think I can solve my problem by reducing the number of transformation steps, or by putting the transformed data into an empty table first and then inserting from that table into the actual table I'm using.
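A rough sketch of that staging-table approach (all names are made up; the ETL tool does the bulk load into the staging table):

    -- UNLOGGED skips WAL writes, which is fine for disposable staging data
    CREATE UNLOGGED TABLE staging (LIKE actual_table INCLUDING DEFAULTS);

    -- (the ETL tool bulk-loads the already-transformed rows into staging here)

    -- one set-based statement moves everything into the real table
    INSERT INTO actual_table
    SELECT * FROM staging;

    TRUNCATE staging;  -- empty it for the next batch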

  • `... no indexes, but the inserts are terribly slow.` Maybe add the indexes, just like in the first table? (and don't forget to run `VACUUM ANALYZE` after that) – joop Sep 26 '16 at 14:16
  • Are you inserting exactly the same data? It's difficult to comment unless we see the actual test scripts. – David Aldridge Sep 26 '16 at 14:37
  • I responded to both of you by editing my post above. Edits 3 and 4. – Chicken Sandwich No Pickles Sep 26 '16 at 14:50
  • "*but my friend, who is much more experienced than I am, says inserts, updates, and deletes are slower for the TEXT data type*" - your friend is wrong. She/he is probably confusing that with the `text` data type in other DBMS (e.g. SQL Server or MySQL) –  Sep 26 '16 at 14:53
  • Horse_With_No_Name: That is what I am thinking, but I just wanted to see what others thought since it started going slower for some reason. – Chicken Sandwich No Pickles Sep 26 '16 at 15:21
  • @LunchBox: the "getting slower" is most probably caused by the indexes getting bigger and bigger and thus the index maintenance is getting more time consuming. But the data type (`text` vs `varchar`) won't make a difference. –  Sep 26 '16 at 17:26
  • I edited my post to respond to you horse with no name. – Chicken Sandwich No Pickles Sep 26 '16 at 17:57
  • You should add the scripts (or the program) you use to test this. When I run a million inserts into a table with 5 `text` columns, this takes exactly the same time as a million inserts with a table that has 5 `varchar` columns. So unless you show us _your_ code I am not going to believe you. –  Sep 26 '16 at 17:59
  • I edited my post to respond to your comment. – Chicken Sandwich No Pickles Sep 26 '16 at 18:08

1 Answer


PostgreSQL `text` and `character varying` are the same, with the exception of the (optional) length limit for the latter. They will perform identically.

The only reasons to prefer character varying are

  • you want to impose a length limit (see the sketch after this list)

  • you want to conform with the SQL standard
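A minimal sketch of that length limit, the only behavioral difference between the two types (hypothetical table):

    CREATE TABLE t (a text, b varchar(5));

    INSERT INTO t VALUES ('any length works here', 'abcde');  -- succeeds
    INSERT INTO t VALUES ('still fine', 'too long');
    -- ERROR:  value too long for type character varying(5)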

Laurenz Albe