17

I need to store a very large amount of text in a MySQL database. There will be millions of records with a LONGTEXT field, and the database size will be huge.

So I want to ask: is there a safe way to compress the text before storing it in a TEXT field to save space, with the ability to extract it back when needed?

Something like:

$archived_text = compress_text($huge_text);
// saving $archived_text to database here
// ...

// ...
// getting compressed text from database
$archived_text = get_text_from_db();
$huge_text = uncompress_text($archived_text);

Is there a way to do this with PHP or MySQL? All the texts are UTF-8 encoded.

UPDATE

My application is a large literature website where users can add their texts. Here is the table I have:

CREATE TABLE `book_parts` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `book_id` int(11) NOT NULL,
  `title` varchar(200) DEFAULT NULL,
  `content` longtext,
  `order_num` int(11) DEFAULT NULL,
  `views` int(10) unsigned DEFAULT '0',
  `add_date` datetime DEFAULT NULL,
  `is_public` tinyint(3) unsigned NOT NULL DEFAULT '1',
  `published_as_draft` tinyint(3) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`),
  KEY `key_order_num` (`order_num`),
  KEY `add_date` (`add_date`),
  KEY `key_book_id` (`book_id`,`is_public`,`order_num`),
  CONSTRAINT FOREIGN KEY (`book_id`) REFERENCES `books` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 

Currently it has about 800k records and weighs 4 GB; 99% of queries are SELECT. I have every reason to think these numbers will increase dramatically. I would prefer not to store the texts in files because there is quite heavy logic around them and my website gets quite a lot of hits.

Silver Light
  • If you're storing binary data, use a BLOB field, not text. – Brad Nov 22 '11 at 15:06
  • 1
    You may want to reconsider your veto on file-based storage. I'm not sure what you mean about 'heavy logic,' but I don't see how storing the texts in the database will automatically be better than files for a site with lots of traffic. – grossvogel Nov 22 '11 at 15:42

6 Answers

19

Are you going to index these texts? How big is the read load on these texts? The insert load?

You can use InnoDB data compression - a transparent and modern way. See the docs for more info.

If you have really huge texts (say, each text is above 10 MB), then it is a good idea not to store them in MySQL. Store gzip-compressed texts in the file system and keep only pointers and metadata in MySQL. You can easily expand your storage in the future and move it to, e.g., a DFS. A minimal sketch of this approach is below.
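For example, a rough sketch of the "pointer in MySQL, compressed file on disk" idea, assuming a mysqli connection in $db, the text in $huge_text, and a writable directory /var/texts (all placeholders; content_path is a hypothetical column replacing `content`):

// hypothetical values; in the real application they come from user input
$book_id = 1;
$title   = 'Chapter one';

// write the gzip-compressed text to disk, named after its hash
$path = '/var/texts/' . sha1($huge_text) . '.gz';
file_put_contents($path, gzencode($huge_text, 6));

// keep only the pointer and metadata in MySQL
$stmt = $db->prepare("INSERT INTO book_parts (book_id, title, content_path) VALUES (?, ?, ?)");
$stmt->bind_param('iss', $book_id, $title, $path);
$stmt->execute();

// later: fetch the pointer from MySQL, then read and decompress the file
$huge_text = gzdecode(file_get_contents($path));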

Update: another plus of storing texts outside MySQL: the DB stays small and fast. Minus: a higher probability of data inconsistency.

Update 2: if you have enough programming resources, please take a look at projects like this one: http://code.google.com/p/mysql-filesystem-engine/.

Final Update: according to your info, you can just use InnoDB compression - it uses the same zlib (deflate) algorithm as ZIP. You can start with these params:

CREATE TABLE book_parts
 (...) 
 ENGINE=InnoDB
 ROW_FORMAT=COMPRESSED 
 KEY_BLOCK_SIZE=8;

Later you will need to experiment with KEY_BLOCK_SIZE. Watch the COMPRESS_OPS and COMPRESS_OPS_OK counters (exposed in INFORMATION_SCHEMA.INNODB_CMP); the ratio of these two values should stay close to 1.0: Docs.
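For example, a quick check of that ratio from PHP might look like this (a sketch, assuming an existing mysqli connection in $db):

$res = $db->query("SELECT page_size, compress_ops, compress_ops_ok
                   FROM information_schema.INNODB_CMP");
while ($row = $res->fetch_assoc()) {
    // a ratio well below 1.0 means many failed compressions:
    // try a larger KEY_BLOCK_SIZE or rethink compression
    $ratio = $row['compress_ops'] > 0
        ? $row['compress_ops_ok'] / $row['compress_ops']
        : 1.0;
    printf("%d KB pages: ratio %.2f\n", $row['page_size'] / 1024, $ratio);
}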

Amal Murali
Oroboros102
  • If you're using InnoDB, that's the way to go. Using an engine that doesn't do compression, however... Also, I see a ton of reasons not to "only store pointers to files" in the database (and a ton of reasons to do it - TIMTOWTDI, and it really depends what you need) – Romain Nov 22 '11 at 15:12
  • @Oroboros102 please take a look at the update. Did I understand correctly that InnoDB compression compresses indexes only? Then this is not my case… – Silver Light Nov 22 '11 at 15:29
  • No, compression is used both for data and all indexes (PK, secondary, compound). – Oroboros102 Nov 22 '11 at 15:56
  • Thank you for the answer. I'll go this way, since this requires minimum changes in my application. – Silver Light Nov 22 '11 at 16:22
  • Please don't forget that compression has a lot of requirements that must be met, beginning with the MySQL version and ending with a special way of storing tables. You'll need "file per table" (innodb_file_per_table), as I remember. You can find all of this in the docs: http://dev.mysql.com/doc/innodb-plugin/1.0/en/innodb-compression-enabling.html – Oroboros102 Nov 22 '11 at 19:59
10

If you're compressing (e.g. gzip), then don't use TEXT fields of any sort. They're not binary-safe. Data going into/coming out of TEXT fields is subject to character set translation, which will probably (though not necessarily) mangle the compressed data and give you a corrupted result when you retrieve/uncompress the text.

Use BLOB fields instead, which are binary-transparent and do not do any translation of the data.

Marc B
  • Thank you for the info about datatype. But what about compressing? – Silver Light Nov 22 '11 at 15:08
  • Compressing would remove the ability to search the text, since you'd have to uncompress to get the raw text out again. If you're never going to use the DB to rummage around the text, then don't store the compressed (or raw) text in the DB in the first place. Store it externally in a file and store a few references (filename/path) in the DB instead. – Marc B Nov 22 '11 at 15:11
  • gzcompress works very well with this approach but please consider: http://www.mysqlperformanceblog.com/2008/01/11/mysql-blob-compression-performance-benefits/ – Narcis Radu Nov 22 '11 at 15:16
6

It might be better to define the text field as a BLOB and compress the data in PHP to save communication costs.

CREATE TABLE book_parts (
    ......
    content blob default NULL,
    ......
)

In PHP, use gzcompress and gzuncompress.

$content = '......';

// compress before inserting; escape because the compressed value is binary
$query = sprintf("REPLACE INTO book_parts (content) VALUES ('%s')",
        mysql_real_escape_string(gzcompress($content)));
mysql_query($query);

// read the row back and uncompress the content
$query = "SELECT * FROM book_parts WHERE id = 111";
$result = mysql_query($query);
if ($result && $row = mysql_fetch_assoc($result))
    $content = gzuncompress($row['content']);
user1931858
2

You may also want to use the COMPRESS option to enable compression of the packets sent between the client and the server; see the MySQL documentation for more information about this option.

For PHP, I have found MYSQLI_CLIENT_COMPRESS for the mysqli_real_connect function.
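For example, a minimal sketch (the host, credentials and database name are placeholders):

$db = mysqli_init();
// enable compression of the client/server protocol traffic
mysqli_real_connect($db, 'localhost', 'user', 'password', 'mydb',
                    3306, null, MYSQLI_CLIENT_COMPRESS);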

Devart
1

You could use the PHP functions gzdeflate and gzinflate for the text.
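For example (a minimal sketch; as noted in other answers, the compressed value should go into a BLOB column, not a TEXT one):

$compressed = gzdeflate($huge_text, 9);   // raw DEFLATE data, maximum compression
// ... store $compressed in a BLOB column, fetch it back later ...
$huge_text = gzinflate($compressed);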

user447951
0

There is no benefit to compressing large texts into a database.

Here are the problems you might face in the long run:

  • If the server crashes, the data may be hard to recover.
  • Not ideal for searching.
  • It takes additional time to transfer the data between the MySQL server and the browser.
  • Time-consuming to back up (if not using replication).

I think storing these large texts in disk files will make things easier for:

  • Distributed backup (rsync).
  • Handling file uploads in PHP.
Rosenthal
ajreal
  • 2
    I disagree. The time it takes to decompress a gzip stream is irrelevant on any normal server; compared with online latency you can completely ignore it. You do not need to search every text field; often you just need to access it. – John Oct 12 '16 at 16:10