Is there any reason to worry about the column order in a table?

Question

I know you can ALTER the column order in MySQL with FIRST and AFTER, but why would you want to bother? Since good queries explicitly name columns when inserting data, is there really any reason to care what order your columns are in in the table?

Andomar · Accepted Answer · 2009-06-01T09:23:04.457

108

Column order had a big performance impact on some of the databases I've tuned, spanning Sql Server, Oracle, and MySQL. This post has good rules of thumb:

Primary key columns first
Foreign key columns next.
Frequently searched columns next
Frequently updated columns later
Nullable columns last.
Least used nullable columns after more frequently used nullable columns

An example for difference in performance is an Index lookup. The database engine finds a row based on some conditions in the index, and gets back a row address. Now say you are looking for SomeValue, and it's in this table:

 SomeId int,
 SomeString varchar(100),
 SomeValue int

The engine has to guess where SomeValue starts, because SomeString has an unknown length. However, if you change the order to:

 SomeId int,
 SomeValue int,
 SomeString varchar(100)

Now the engine knows that SomeValue can be found 4 bytes after the start of the row. So column order can have a considerable performance impact.

EDIT: Sql Server 2005 stores fixed-length fields at the start of the row. And each row has a reference to the start of a varchar. This completely negates the effect I've listed above. So for recent databases, column order no longer has any impact.

edited Jun 01 '09 at 09:23

answered May 21 '09 at 19:04

Andomar

232,371
49
380
404

Every database engine I know of reserves 100 bytes for SomeString, even if it's null – James L May 21 '09 at 19:06
Wow, hadn't know this. Doesn't it have to fetch the entire block anyway, so you're not really saving any time in IO, just computation speed when calculating the offset. – Allain Lalonde May 21 '09 at 19:06
So this would have some impact on "select SomeValue from t"... (surely more to do with returning that value from lots of rows rather than about index lookups?) But how much impact? – araqnid May 21 '09 at 19:07
@TopBanana: Postgresql will not reserve 100 chars, but then with MVCC you don't rewrite the value if it gets updated so there wouldn't be value in doing it, I guess – araqnid May 21 '09 at 19:08
4

@TopBanana: not with varchars, that's what diffentiates them to normal char columns. – Allain Lalonde May 21 '09 at 19:09
How do you know the engine has to guess? It could simply reorder the columns on the disk storage. It could store the varchars as pointers to the end of the record where the variable data is stored. The whole row could be stuffed in a text file in a CSV format. You have no idea how a "generic" DB is going to store this, or what optimizations it may have to solve this exact problem. If you want to know for a specific database, contact the vendor. But it's a silly "guideline" or rule of thumb. – Will Hartung May 21 '09 at 19:10
1

I don't think the order of the columns IN THE TABLE makes any difference - it definitely makes a difference in the INDEXES you might create, true. – marc_s May 21 '09 at 19:11
@Allain: no, varchars and chars both reserve the space. (Although I don't know about Postgresql!) – James L May 21 '09 at 19:13
4

@TopBanana: not sure if you know Oracle or not, but it doesn't reserve 100 bytes for a VARCHAR2(100) – Quassnoi May 21 '09 at 19:15
@Andomar: I wouldn't call this impact "considerable", but there cetrainly is some impact on finding where the field starts. +1. – Quassnoi May 21 '09 at 19:19
@Quassnoi: Thanks! The "considerable" is something experience has taught me, I've seen column order can change the speed of a query by 2-3x. I might not be giving the best or right reason for it. – Andomar May 21 '09 at 19:22
@Andomar: if it's Oracle, it must be putting NULL columns to the end of the table. – Quassnoi May 21 '09 at 19:23
1

@Quassnoi: the biggest impact was on Sql Server, on a table with many nullable varchar() columns. – Andomar May 21 '09 at 19:25
@Andomar: this must be row chaining then – Quassnoi May 21 '09 at 19:26
@Andomar: the point on CLUSTERED INDEXES in not so good :) Physical table layout will not depend on the logical position of the PRIMARY KEY: it will always be stored first, despite the logical position of a column (of columns) in the table definition. – Quassnoi May 21 '09 at 19:37
@Quassnoi: Makes sense, I'll remove it. – Andomar May 21 '09 at 19:40
1

@TopBanana - I actually made the test with SQL Server, and Varchar's DON'T reserve the space (although I thought they would). I created 2 identical DBs, one table each, one with CHARs, one with VARCHARs, and I filled them up with identical random data (or random length). The VARCHAR one took about half the space in the hard drive. – Daniel Magliola May 21 '09 at 21:07
1

Do you have a reference for the edit at the end of your answer? How about scripts for SQL Server 2000 or MySQL the demonstrate the performance implications of column order? This is a fascinating answer. – Nick Chammas Nov 05 '11 at 01:12
I'm going to -1 this. "On disk" order for SQL Server is irrelevant to the order defined in the table. – gbn Jan 24 '12 at 11:17
@gbn: But it is relevant, as this thread demonstrates. In SQL server 2008, create a large table with 4 3900 byte columns. A query over the first two will be much slower than a query over the first and third. – Andomar Jan 24 '12 at 13:05
@Andomar: that is row overflow, not column order. And won't be seen in real life except for, say, OLAP type loads with wide tables – gbn Jan 24 '12 at 13:09
@gbn: Row overflow means column order does matter. Like the answer says, column order matters less in SQL2005+ than it does in MySQL, but it still matters a little bit. – Andomar Jan 24 '12 at 13:11
7

The URL in this answer no longer works, does anyone have an alternate? – scunliffe Jul 27 '13 at 21:53
1

i try this on Mysql Innodb . What i change put nullable column in last and time difference for same data. Before 0.18( ms) After 0.00(ms) – Navrattan Yadav Mar 28 '16 at 08:04

score 44 · Answer 2 · edited May 23 '17 at 11:55

Update:

In MySQL, there may be a reason to do this.

Since variable datatypes (like VARCHAR) are stored with variable lengths in InnoDB, the database engine should traverse all previous columns in each row to find out the offset of the given one.

The impact may be as big as 17% for 20 columns.

See this entry in my blog for more detail:

Choosing column order

In Oracle, trailing NULL columns consume no space, that's why you should always put them to the end of the table.

Also in Oracle and in SQL Server, in case of a large row, a ROW CHAINING may occur.

ROW CHANING is splitting a row that doesn't fit into one block and spanning it over the multiple blocks, connected with a linked list.

Reading trailing columns that didn't fit into the first block will require traversing the linked list, which will result in an extra I/O operation.

See this page for illustration of ROW CHAINING in Oracle:

That's why you should put columns you often use to the beginning of the table, and columns you don't use often, or columns that tend to be NULL, to the end of the table.

Important note:

If you like this answer and want to vote for it, please also vote for @Andomar's answer.

He answered the same thing, but seems to be downvoted for no reason.

So you're saying this would be slow : select tinyTable.id, tblBIG.firstColumn, tblBIG.lastColumn from tinyTable inner join tblBIG on tinyTable.id = tblBIG.fkID If tblBIG records are over 8KB (in which case some row chaining would occur) and the join would be synchronous... But this would be fast: select tinyTable.id, tblBIG.firstColumn from tinyTable inner join tblBIG on tinyTable.id = tblBIG.fkID Since I wouldn't use the column in other blocks hence no need to traverse the linked list Did I get this right? — jfrobishow, May 21 '09 at 20:15
I get only 6%, and that is for col1 versus _any_ other column. — Rick James, Jun 09 '15 at 18:40

score 7 · Answer 3 · answered May 21 '09 at 19:04

7

During Oracle training at a previous job, our DBA suggested that putting all the non-nullable columns before the nullable ones was advantageous... although TBH I don't remember the details of why. Or maybe it was just the ones that were likely to get updated should go at the end? (Maybe puts off having to move the row if it expands)

In general, it shouldn't make any difference. As you say, queries should always specify columns themselves rather than relying on the ordering from "select *". I don't know of any DB that allows them to be changed... well, I didn't know MySQL allowed it until you mentioned it.

answered May 21 '09 at 19:04

araqnid

127,052
24
157
134

5

He was right, Oracle does not write trailing NULL columns to disk, saving some bytes. See http://www.dba-oracle.com/oracle_tips_ault_nulls_values.htm – Andomar May 21 '09 at 19:11
absolutely, it can make a big difference in the size on disk – Alex May 21 '09 at 19:20
Is that the link you meant? It's related to the non-indexing of null in indices rather than column order. – araqnid May 21 '09 at 19:29
Wrong link, and can't find the original. Though you can google for it, for example http://www.tlingua.com/new/articles/Chapter2.html – Andomar May 21 '09 at 19:43

score 5 · Answer 4 · answered May 21 '09 at 19:02

5

Readability of the output when you have to type:

select * from <table>

in your database management software?

It's a very spurious reason, but at the moment I can't think of anything else.

answered May 21 '09 at 19:02

ChrisF

134,786
31
255
325

score 5 · Answer 5 · answered May 21 '09 at 19:03

5

Some badly-written applications might be dependent on column order / index instead of column name. They shouldn't be, but it does happen. Changing the order of the columns would break such applications.

answered May 21 '09 at 19:03

Craig Walker

49,871
54
152
212

4

Application developers that make their code dependent on column order in a table DESERVE to have their applications broken. But the users of the application don't deserve the outage. – spencer7593 May 23 '09 at 22:22

HoldOffHunger · Answer 6 · 2020-10-16T15:39:16.170

In 2002, Bill Thorsteinson posted on the Hewlett Packard forums his suggestions for optimizing MySQL queries by reordering the columns. His post has since been literally copied and pasted at least a hundred times on the Internet, often without citation. To quote him exactly...

General rules of thumb:

Primary key columns first.

Foreign key columns next.

Frequently-searched columns next.

Frequently-updated columns later.

Nullable columns last.

Least-used nullable columns after more-frequently used nullable columns.

Blobs in own table with few other columns.

Source: HP Forums.

But that post was made all the back in 2002! This advice was for MySQL version 3.23, more than six years before MySQL 5.1 would be released. And there are no references or citations. So, was Bill right? And how exactly does the storage engine work at this level?

Yes, Bill was right.
It all comes down to a matter of chained rows and memory blocks.

To quote Martin Zahn, an Oracle-certified professional, in an article on The Secrets of Oracle Row Chaining and Migration...

Chained rows affect us differently. Here, it depends on the data we need. If we had a row with two columns that was spread over two blocks, the query:

SELECT column1 FROM table

where column1 is in Block 1, would not cause any «table fetch continued row». It would not actually have to get column2, it would not follow the chained row all of the way out. On the other hand, if we ask for:

SELECT column2 FROM table

and column2 is in Block 2 due to row chaining, then you would in fact see a «table fetch continued row»

The rest of the article is a rather good read! But I am only quoting the part here that is directly relevant to our question at hand.

More than 18 years later, I gotta say it: thanks, Bill!

score 3 · Answer 7 · answered Dec 08 '16 at 17:56

Beyond the obvious performance tuning, I just ran into a corner case where reordering columns caused a (previously functional) sql script to fail.

From the documentation "TIMESTAMP and DATETIME columns have no automatic properties unless they are specified explicitly, with this exception: By default, the first TIMESTAMP column has both DEFAULT CURRENT_TIMESTAMP and ON UPDATE CURRENT_TIMESTAMP if neither is specified explicitly" https://dev.mysql.com/doc/refman/5.6/en/timestamp-initialization.html

So, a command ALTER TABLE table_name MODIFY field_name timestamp(6) NOT NULL; will work if that field is the first timestamp (or datetime) in a table, but not otherwise.

Obviously, you can correct that alter command to include a default value, but the fact that a query that worked stopped working due to a column reordering made my head hurt.

marc_s · Answer 8 · 2009-05-21T20:41:50.227

3

No, the order of the columns in a SQL database table is totally irrelevant - except for display / printing purposes. There's no point in reordering columns - most systems don't even provide a way to do that (except dropping the old table and recreating it with the new column order).

Marc

EDIT: from the Wikipedia entry on relational database, here's the relevant portion which to me clearly shows that column order should never be of concern:

A relation is defined as a set of n-tuples. In both mathematics and the relational database model, a set is an unordered collection of items, although some DBMSs impose an order to their data. In mathematics, a tuple has an order, and allows for duplication. E.F. Codd originally defined tuples using this mathematical definition. Later, it was one of E.F. Codd's great insights that using attribute names instead of an ordering would be so much more convenient (in general) in a computer language based on relations. This insight is still being used today.

edited May 21 '09 at 20:41

answered May 21 '09 at 19:01

marc_s

732,580
175
1,330
1,459

1

I've seen column difference have a big impact with my own eyes, so I can't believe this is the right answer. Even though the voting puts it first. Hrm. – Andomar May 21 '09 at 19:17
What SQL environment would that be in? – marc_s May 21 '09 at 19:46
1

The biggest impact I've seen was on Sql Server 2000, where moving a foreign key forward sped up some queries by 2 to 3 times. Those queries had big table scans (1M+ rows) with a condition on the foreign key. – Andomar May 21 '09 at 19:50
The whole concept of SQL and RDBMS revolves around "tuples", which are - by definition - unordered bags of elements. So I would almost say if a particular RDBMS *depends* on a given order of columns, it's not a *REAL* RDBMS. – marc_s May 21 '09 at 19:51
6

RDBMS don't depend on table ordering *unless you care for performance*. Different implementations will have different performance penalties for the order of the columns. It could be huge or it could be tiny, it depends on the implementation. Tuples are theoretical, RDBMS are practical. – Esteban Küber May 21 '09 at 20:51
I agree - tuples are a theoretical concept. And I agree with other folks in this thread that having a good consistent layout makes sense. But I still think neither performance nor functionality of any decent RDBMS should really depends on the order of your columns. At least not in any serious RDBMS in the 21st century. – marc_s May 21 '09 at 21:04
@marc_s: Sure, there are databases that perform equally poor despite the column order :) – Quassnoi May 21 '09 at 21:14
4

-1. All relational databases I've used DO have column ordering at some level. If you select * from a table, you don't tend to get columns back in random order. Now on-disk vs display is a different debate. And citing math theory to back up an assumption about practical implementations of databases is just nonsense. – DougW Sep 12 '13 at 19:54

score 2 · Answer 9 · answered May 21 '09 at 19:05

2

As is often the case, the biggest factor is the next guy who has to work on the system. I try to have the primary key columns first, the foreign key columns second, and then the rest of the columns in descending order of importance / significance to the system.

answered May 21 '09 at 19:05

James L

16,456
10
53
70

We typically start out with the last column being "created" (timestamp for when the row is inserted). With older tables, of course, it can have several columns added after that... And we have the occasional table where a compound primary key was changed to a surrogate key so the primary key is several columns over. – araqnid May 21 '09 at 19:10

score 2 · Answer 10 · answered May 21 '09 at 19:05

The only reason I can think about is for debugging and fire-fighting. We have a table whose "name" column's appears about 10th on the list. It's a pain when you do a quick select * from table where id in (1,2,3) and then you have to scroll across to look at the names.

But that's about it.

score 1 · Answer 11 · answered May 21 '09 at 19:05

1

If you're going to be using UNION a lot, it makes matching columns easier if you have a convention about their ordering.

answered May 21 '09 at 19:05

Allain Lalonde

91,574
70
187
238

Sounds like your database needs normalizing! :) – James L May 21 '09 at 19:09
Hey! Take it back, I didn't say my database. :) – Allain Lalonde May 21 '09 at 19:11
There are licit reasons to use UNION ;) See http://www.postgresql.org/docs/current/static/ddl-partitioning.html and http://stackoverflow.com/questions/863867/database-speed-optimization-few-tables-with-many-rows-or-many-tables-with-few-r – Esteban Küber May 21 '09 at 21:00
can you UNION with the order of columns in 2 tables being in different order? – Monica Heddneck Jul 01 '16 at 00:44
Yes, you just need to specify the columns explicitly when querying the tables. With tables A[a,b] B[b,a], that means (SELECT a.a, a.b FROM A) UNION (SELECT b.a, b.b FROM B) insead of (SELECT * FROM A) UNION (SELECT * FROM B). – Allain Lalonde Jul 02 '16 at 02:04

score 1 · Answer 12 · answered May 21 '09 at 22:37

As noted, there are numerous potential performance issues. I once worked on a database where putting very large columns at the end improved performance if you didn't reference those columns in your query. Apparently if a record spanned multiple disk blocks, the database engine could stop reading blocks once it got all the columns it needed.

Of course any performance implications are highly dependent not just on the manufacturer that you're using, but also potentially on the version. A few months ago I noticed that our Postgres could not use an index for a "like" comparison. That is, if you wrote "somecolumn like 'M%'", it wasn't smart enough to skip to the M's and quit when it found the first N. I was planning to change a bunch of queries to use "between". Then we got a new version of Postgres and it handled the like's intelligently. Glad I never got around to changing the queries. Obviously not directly relevant here but my point is that anything you do for efficiency considerations could be obsolete with the next version.

Column order is almost always very relevant to me because I routinely write generic code that reads the database schema to create screens. Like, my "edit a record" screens are almost always built by reading the schema to get the list of fields, and then displaying them in order. If I changed the order of columns, my program would still work, but the display might be strange to the user. Like, you expect to see name / address / city / state / zip, not city / address / zip / name / state. Sure, I could put the display order of the columns in code or a control file or something, but then every time we added or removed a column we'd have to remember to go update the control file. I like to say things once. Also, when the edit screen is built purely from the schema, adding a new table can mean writing zero lines of code to create an edit screen for it, which is way cool. (Well, okay, in practice usually I have to add an entry to the menu to call the generic edit program, and I've generally given up on generic "select a record to update" because there are too many exceptions to make it practical.)

score 0 · Answer 13 · answered May 21 '09 at 19:04

The only time you'll need to worry about column order is if your software specifically relies on that order. Typically this is due to the fact that the developer got lazy and did a select * and then referred to the columns by index rather than by name in their result.

score 0 · Answer 14 · answered May 21 '09 at 19:08

In general what happens in SQL Server when you change column order through Management Studio, is that it creates a temp table with the new structure, moves the data to that structure from the old table, drops the old table and renames the new one. As you might imagine, this is a very poor choice for performance if you have a large table. I don't know if My SQL does the same, but it is one reason why many of us avoid reordering columns. Since select * should never be used in a production system, adding columns at the end is not aproblem for a well-designed system. Order of columns inthe table should in genral not be messed with.

Is there any reason to worry about the column order in a table?

14 Answers14

Linked

Related