5

Hope you are all doing great. We have a huge MySQL table called 'posts'. It has about 70,000 records and has grown to about 10GB in size.

My boss says that something has to be done to make this huge table easier for us to handle, because if that table ever gets corrupted it would take us a lot of time to recover it. It is also slow at times.

What are the possible solutions so that handling this table becomes easier for us in all respects?

The structure of the table is as follows:

CREATE TABLE IF NOT EXISTS `posts` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `thread_id` int(11) unsigned NOT NULL,
  `content` longtext CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
  `first_post` mediumtext CHARACTER SET utf8 COLLATE utf8_unicode_ci,
  `publish` tinyint(1) NOT NULL,
  `deleted` tinyint(1) NOT NULL,
  `movedToWordPress` tinyint(1) NOT NULL,
  `image_src` varchar(500) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
  `video_src` varchar(500) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
  `video_image_src` varchar(500) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
  `thread_title` text CHARACTER SET utf8 COLLATE utf8_unicode_ci,
  `section_title` text CHARACTER SET utf8 COLLATE utf8_unicode_ci,
  `urlToPost` varchar(280) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
  `posts` int(11) DEFAULT NULL,
  `views` int(11) DEFAULT NULL,
  `forum_name` varchar(50) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
  `subject` varchar(150) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
  `visited` int(11) DEFAULT '0',
  `replicated` tinyint(4) DEFAULT '0',
  `createdOn` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  UNIQUE KEY `urlToPost` (`urlToPost`,`forum_name`),
  KEY `thread_id` (`thread_id`),
  KEY `publish` (`publish`),
  KEY `createdOn` (`createdOn`),
  KEY `movedToWordPress` (`movedToWordPress`),
  KEY `deleted` (`deleted`),
  KEY `forum_name` (`forum_name`),
  KEY `subject` (`subject`),
  FULLTEXT KEY `first_post` (`first_post`,`thread_title`)
) ENGINE=MyISAM  DEFAULT CHARSET=latin1 AUTO_INCREMENT=78773 ;

Thank you.

UPDATED

Note: although I am grateful for the replies, almost all the answers have been about optimizing the current database, not about how to handle large tables in general. Although I can optimize the database based on the replies I got, that does not really answer the question about handling huge databases. Right now we are talking about 70,000 records, but over the next few months, if not weeks, we are going to grow by an order of magnitude. Each record can be about 300KB in size.

Imran Omar Bukhsh
  • Why are you using MyISAM in the first place? If the only reason is the full-text index, consider migrating to InnoDB and handling full-text search either outside MySQL (Lucene/Solr, Sphinx) or by yourself. – Maxim Krizhanovsky Jun 22 '11 at 12:23
  • @Darhazer - thanks for your comment, though what would be the advantages of moving to InnoDB? I am not really aware of that. What are the disadvantages of MyISAM? Why should I handle full text outside MySQL? – Imran Omar Bukhsh Jun 23 '11 at 06:59
  • @Darhazer - do look at the updated section below in my question – Imran Omar Bukhsh Jun 23 '11 at 09:08
  • @Imran: 70K rows is not a huge table. Some would say it's tiny. Having 300KB in one row smells of bad DB design. I'd be interested to see the design of the other tables as well. – ypercubeᵀᴹ Jun 23 '11 at 09:17
  • @ypercube: we have a crawling engine. We store whole HTML pages in the database. That's the field which takes the most space. – Imran Omar Bukhsh Jun 23 '11 at 09:20
  • @Imran: I can see you have one longtext field, one mediumtext (`first_post`) and 2 text ones that take up a lot of space. This is one issue. Second: are `first_post`, `thread_title` and `section_title` unique? If not, then the table is not normalized. Same for `forum_name`. The third issue is the tinyint fields, and the fourth is the full-text search. – ypercubeᵀᴹ Jun 23 '11 at 09:30
  • MyISAM uses table-level locks, while InnoDB uses row-level locks. If there are a lot of inserts/updates, this has a huge impact on performance. Also, MyISAM is going to crash exactly in write-intensive applications (it's optimized for reading). We are handling millions of records and dozens-of-GB tables in MySQL with InnoDB. InnoDB has a number of features, like transactions, that are not supported by MyISAM. The disadvantage is that it does not support full-text indexes. Moving search outside the DB, however, frees the DB server from that operation, and it's a general performance tip. – Maxim Krizhanovsky Jun 23 '11 at 09:41

3 Answers

6

My answer's also an addition to two previous comments.

You've indexed half of your table. But if you take a look at some of those indexes (publish, deleted, movedToWordPress) you'll notice the columns only ever hold 1 or 0, so their selectivity is very low (selectivity being the number of distinct values divided by the number of rows). Those indexes are a waste of space.
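
As a rough check, you can compute the selectivity yourself and then drop the near-useless indexes. A sketch against the schema above; try it on a copy of the table first:

    SELECT COUNT(DISTINCT publish)          / COUNT(*) AS publish_sel,
           COUNT(DISTINCT deleted)          / COUNT(*) AS deleted_sel,
           COUNT(DISTINCT movedToWordPress) / COUNT(*) AS moved_sel
    FROM posts;

    -- Values close to 0 confirm the indexes rarely help; reclaim the space:
    ALTER TABLE posts
      DROP INDEX `publish`,
      DROP INDEX `deleted`,
      DROP INDEX `movedToWordPress`;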

Some things also make no sense. tinyint(4) - that doesn't actually make it a 4-digit integer; the number there is only the display width. tinyint is 1 byte, so it has 256 possible values. I'm assuming something went wrong there.
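
To illustrate that the display width is purely cosmetic, here's a throwaway demo table (not part of your schema):

    -- Both columns store exactly 1 byte and accept the same range (-128..127);
    -- the (1) vs (4) only affects client-side padding when ZEROFILL is used.
    CREATE TABLE width_demo (
      a tinyint(1),
      b tinyint(4)
    );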

Also, 10 gigs in size for just ~75k records? How did you measure the size? And what hardware have you got?

Edit, regarding your updated question:

There are many ways to scale databases. I'll link one SO question/answer so you can get an idea of what you can do: here it is. The other thing you could do is get better hardware. Usually, the reason databases get slow as they grow in size is the HDD subsystem and the amount of memory left over to work with the dataset. The more RAM you have, the faster it all gets.

Another thing you could do is split your table in two, so that one table holds the textual data and the other holds the data your system needs for searching and matching (you'd put the integer fields there); see the sketch below. Using InnoDB, you'd gain a huge performance boost if the two tables were connected by a foreign key pointing to the primary key, since InnoDB primary-key lookups are fast. That opens up several new possibilities for what you can do with your dataset. As your data grows, you can add enough RAM and InnoDB will try to buffer the dataset in memory. There's also an interesting tool called HandlerSocket that does some neat magic on servers that have enough RAM and use InnoDB.
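
To make that concrete, here is a minimal sketch of such a split (the table and constraint names are made up, and only a few of your columns are shown):

    CREATE TABLE posts_meta (
      `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
      `thread_id` int(11) unsigned NOT NULL,
      `publish` tinyint(1) NOT NULL,
      `deleted` tinyint(1) NOT NULL,
      `views` int(11) DEFAULT NULL,
      `createdOn` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
      PRIMARY KEY (`id`),
      KEY `thread_id` (`thread_id`)
    ) ENGINE=InnoDB;

    CREATE TABLE posts_content (
      `post_id` int(11) unsigned NOT NULL,
      `content` longtext CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
      `first_post` mediumtext CHARACTER SET utf8 COLLATE utf8_unicode_ci,
      PRIMARY KEY (`post_id`),
      CONSTRAINT `fk_posts_content` FOREIGN KEY (`post_id`)
        REFERENCES posts_meta (`id`) ON DELETE CASCADE
    ) ENGINE=InnoDB;

Queries then scan the small posts_meta table and only join posts_content on the primary key when the bulky text is actually needed.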

In the end it really boils down to what you need to do and how you are doing it. Since you didn't mention that, it's hard to give a good estimate here of what you should do. My first steps toward optimizing would definitely be to tweak the MySQL instance and to back that big table up.

Michael J.V.
  • The database files are available on the LAMP box; I can access them directly, and that's how I checked the file size. I am using just a regular HP Pavilion dm4, i7 2.33GHz, running Ubuntu with 4GB RAM – Imran Omar Bukhsh Jun 23 '11 at 07:02
  • Did you tweak MySQL settings? – Michael J.V. Jun 23 '11 at 08:41
  • @Michael J.V. - do look at the updated section below in my question – Imran Omar Bukhsh Jun 23 '11 at 09:08
  • @Michael: +1. I would consider splitting the table into more than 2 tables - what you advise, plus one table in MyISAM so as not to lose the full-text search. Or even splitting every bit field (wrongly implemented as tinyint) into a separate table to avoid any NULLs in those. And all of these definitely work better with the InnoDB engine. – ypercubeᵀᴹ Jun 23 '11 at 10:00
2

I guess you have to change some columns.

You can start by reducing your VARCHAR columns.

image_src / video_src / video_image_src as VARCHAR(500) is a little too much, I think (VARCHAR(100) should be enough, I would say).

thread_title is TEXT but should probably be a VARCHAR(200?); if you ask me, the same goes for section_title.

OK, here is your problem: content is LONGTEXT.

Do you really need LONGTEXT here? LONGTEXT can hold up to 4GB. I think if you change this column to TEXT it would be a lot smaller:

    TINYTEXT               255 bytes
    TEXT                65,535 bytes   ~64KB
    MEDIUMTEXT      16,777,215 bytes   ~16MB
    LONGTEXT     4,294,967,295 bytes    ~4GB
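
Put together, the suggested changes might look like this. The sizes here are my guesses - check your real data first, and be aware that converting content to TEXT will truncate anything over 64KB:

    -- Check the longest stored values before shrinking anything:
    SELECT MAX(LENGTH(image_src)), MAX(LENGTH(content)) FROM posts;

    ALTER TABLE posts
      MODIFY `image_src` varchar(100) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
      MODIFY `video_src` varchar(100) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
      MODIFY `video_image_src` varchar(100) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
      MODIFY `thread_title` varchar(200) CHARACTER SET utf8 COLLATE utf8_unicode_ci,
      MODIFY `section_title` varchar(200) CHARACTER SET utf8 COLLATE utf8_unicode_ci,
      MODIFY `content` text CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL;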

Edit: I see you use a FULLTEXT index. I am quite sure that index is taking up a lot, a lot of space. You should use another mechanism for full-text search.

Michael Koper
  • In addition, some of his INT(11) columns could be reduced, such as views, posts, visited... even here on S/O, how many posts have you seen that have more than 99,999 views, let alone that many posts against a question... – DRapp Jun 22 '11 at 10:39
  • VARCHARs do not take up all of their 500 bytes, just the stored content plus a few extra bytes. I believe the same goes for TEXT fields (although there are some memory issues with the larger field types, for example in prepared statements). MyISAM, however, is faster with fixed-width columns. As for reducing the INTs to, let's say, SMALLINTs - I agree. – Maxim Krizhanovsky Jun 22 '11 at 12:29
  • @Darhazer. You are right, it doesn't take extra HD space. But it does take extra RAM, so it is always better to keep columns as small as possible. Source: http://dev.mysql.com/doc/refman/5.0/en/char.html – Michael Koper Jun 22 '11 at 16:25
0

In addition to what Michael has commented, the slowness can be an issue of how well the queries are optimized and whether there are proper indexes to match. I would try to find some of the culprit queries that are taking longer than you would like and post them here at S/O to see if someone can help with optimizing options.
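
For example, one way to surface those culprit queries is the slow query log plus EXPLAIN. The threshold and the sample query below are made up; adjust them to your workload:

    -- Enable at runtime (MySQL 5.1+; on older setups put these in my.cnf):
    SET GLOBAL slow_query_log = 'ON';
    SET GLOBAL long_query_time = 1;   -- log statements slower than 1 second

    -- Then inspect the plan of a logged query:
    EXPLAIN SELECT `id`, `subject`
    FROM posts
    WHERE `forum_name` = 'some_forum'   -- hypothetical predicate
    ORDER BY `createdOn` DESC
    LIMIT 20;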

DRapp