
I have run into something I don't understand with MySQL Cluster.

I have one table:

  • 38 fields total
  • 22 of the fields are indexed (field type: int)
  • The other fields hold double and bigint values
  • The table has no primary key defined (an abbreviated schema sketch follows below)
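
For reference, here is an abbreviated sketch of such a schema (column names are hypothetical; the real table has 38 columns, 22 of them indexed ints). Note that NDB silently adds a hidden primary key when a table declares none, and that ordered indexes on NDB tables are stored in DataMemory, not IndexMemory:

-- Hypothetical, abbreviated schema sketch:
CREATE TABLE test_table (
  k1 INT, k2 INT, k3 INT,      -- ...22 indexed int columns in total
  v1 DOUBLE, v2 BIGINT,        -- ...remaining double/bigint columns
  INDEX (k1), INDEX (k2), INDEX (k3)
) ENGINE=NDBCLUSTER;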

My Environment (10 nodes):

  • data nodes: 8 (AWS EC2 instances, m4.xlarge 16GB RAM, 750GB HDD)
  • management nodes: 2 (AWS EC2 instances, m4.2xlarge 32GB RAM)
  • sql nodes: 2 (co-located on the same VMs as the management nodes)

MySQL Cluster settings (config.ini) are set to:

[NDBD DEFAULT]
NoOfReplicas=2
ServerPort=2200
Datadir=/storage/data/mysqlcluster/
FileSystemPathDD=/storage/data/mysqlcluster/
BackupDataDir=/storage/data/mysqlcluster/backup/
#FileSystemPathUndoFiles=/storage/data/mysqlcluster/
#FileSystemPathDataFiles=/storage/data/mysqlcluster/
DataMemory=9970M
IndexMemory=1247M
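# Note: with 8 data nodes and NoOfReplicas=2, the cluster-wide table
# capacity is (8 / 2) x 9970M = ~39GB before per-row and per-page overhead.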
LockPagesInMainMemory=1

MaxNoOfConcurrentOperations=100000
MaxNoOfConcurrentTransactions=16384

StringMemory=25
MaxNoOfTables=4096
MaxNoOfOrderedIndexes=2048
MaxNoOfUniqueHashIndexes=512
MaxNoOfAttributes=24576
MaxNoOfTriggers=14336

### Params for REDO LOG 
FragmentLogFileSize=256M
InitFragmentLogFiles=SPARSE
NoOfFragmentLogFiles=39
RedoBuffer=64M

TransactionBufferMemory=8M

TimeBetweenGlobalCheckpoints=1000
TimeBetweenEpochs=100

TimeBetweenEpochsTimeout=0

### Params for LCP 
MinDiskWriteSpeed=10M
MaxDiskWriteSpeed=20M
MaxDiskWriteSpeedOtherNodeRestart=50M
MaxDiskWriteSpeedOwnRestart=200M
TimeBetweenLocalCheckpoints=20

### Heartbeating 
HeartbeatIntervalDbDb=15000
HeartbeatIntervalDbApi=15000

### Params for setting logging 
MemReportFrequency=30
BackupReportFrequency=10
LogLevelStartup=15
LogLevelShutdown=15
LogLevelCheckpoint=8
LogLevelNodeRestart=15

### Params for BACKUP 
BackupMaxWriteSize=1M
BackupDataBufferSize=24M
BackupLogBufferSize=16M
BackupMemory=40M

### Params for ODIRECT 
#Reports indicates that odirect=1 can cause io errors (os err code 5) on some systems. You must test.
#ODirect=1

### Watchdog 
TimeBetweenWatchdogCheckInitial=60000

### TransactionInactiveTimeout  - should be enabled in Production 
TransactionInactiveTimeout=60000
### New 7.1.10 redo logging parameters 
RedoOverCommitCounter=3
RedoOverCommitLimit=20
### REALTIME EXTENSIONS 
#RealTimeScheduler=1
### REALTIME EXTENSIONS FOR 6.3 ONLY
#SchedulerExecutionTimer=80
#SchedulerSpinTimer=40

### DISK DATA 
SharedGlobalMemory=20M
DiskPageBufferMemory=64M
BatchSizePerLocalScan=512

After importing 75M records into the table, I get the error "The table 'test_table' is full" and cannot import any more data. I don't understand why this is so.

Looking at information_schema, I can see that the average row length is 244 bytes. The full table size is ~19GB.
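
The 244-byte figure is the Avg_row_length reported for the table (via SHOW TABLE STATUS, as mentioned in the comments below); the same numbers can be read from information_schema:

-- Both queries report the average row length; 'test' is the
-- database name used in the comments.
SHOW TABLE STATUS FROM test LIKE 'test_table';

SELECT TABLE_NAME, AVG_ROW_LENGTH, DATA_LENGTH
FROM information_schema.TABLES
WHERE TABLE_NAME = 'test_table';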

Also, if I look at the DataMemory used on each data node, I see ~94%; IndexMemory used is ~22%.
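
These percentages can be confirmed per data node through the ndbinfo database on any SQL node (available in MySQL Cluster 7.1 and later):

-- Per-node DataMemory / IndexMemory usage as the data nodes report it:
SELECT node_id, memory_type, used, total,
       ROUND(100 * used / total, 1) AS pct_used
FROM ndbinfo.memoryusage;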

But I have 8 data nodes with a total DataMemory of 8 × 9970M ≈ 80GB.

My table is only 19GB, so even with two replicas the memory used should be 19GB × 2 = 38GB.

Could somebody explain this situation to me, and how I can configure the cluster to import the maximum possible number of records? The full table in production will have 33 billion records.

For tests on the given cluster I need to load 100M and 1B row data sets.
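
One option I am considering for the non-indexed double/bigint columns is NDB disk data. A minimal sketch, assuming hypothetical file names and sizes (indexed columns and all indexes still have to stay in memory, so this only relieves DataMemory pressure from the payload columns, and throughput would depend on DiskPageBufferMemory, which is only 64M in the config above):

-- Hypothetical disk-data setup; names and sizes are placeholders:
CREATE LOGFILE GROUP lg_1
  ADD UNDOFILE 'undo_1.log'
  INITIAL_SIZE 1G
  UNDO_BUFFER_SIZE 64M
  ENGINE NDBCLUSTER;

CREATE TABLESPACE ts_1
  ADD DATAFILE 'data_1.dat'
  USE LOGFILE GROUP lg_1
  INITIAL_SIZE 32G
  ENGINE NDBCLUSTER;

-- Non-indexed columns of this table are stored on disk:
CREATE TABLE test_table_disk (
  k1 INT, v1 DOUBLE, v2 BIGINT,
  INDEX (k1)
) TABLESPACE ts_1 STORAGE DISK ENGINE NDBCLUSTER;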

Thanks.

  • 8 nodes / 2 replicas = 4 nodes' worth of space. 4 nodes * 9970M = 39GB for data. 39GB/244 = 160M rows should fit. Are the numbers and arithmetic correct? So, no I can't explain why 75M rows croaked. But it is clear that 33B rows will not fit! Is the data supposed to be RAM-resident? Or can it spill to disk? If so, you will need more than 2TB on each node. Or a _lot_ more nodes. – Rick James Feb 03 '16 at 23:33
  • Well, maybe... Where did the 244 come from? How much "overhead" is there per row, per block, etc? (I am not familiar with NDB at this level.) For InnoDB, the numbers are about right -- 75M rows could easily take ~18GB. – Rick James Feb 03 '16 at 23:35
  • > Are the numbers and arithmetic correct? - It looks OK. But I also don't understand why 75M records is the limit for this cluster. It is possible to spill some fields to disk, but as I understand it, NDB indexed fields must stay in memory. I would like to understand a few points: 1) is it possible to place replicas on disk only? 2) is it possible to hold indexed fields on disk? 3) is it possible to use MySQL Cluster for 33 billion records and 20-30 indexes in the table with high query performance? 244 is the Avg_row_length reported by the query SHOW TABLE STATUS FROM test; – vladimir-user5480135 Feb 04 '16 at 15:36
  • The "index" must be in RAM, not the "indexed fields". I'm pretty sure the index is a separate structure. The "data" contains all fields; it can spill to disk. – Rick James Feb 04 '16 at 16:24
  • 20-30 indexes is usually "mis-design". Look at your `SELECTs` to decide what indexes are useful. Consider "composite" indexes when appropriate. – Rick James Feb 04 '16 at 16:26
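
To illustrate the composite-index suggestion in the last comment (column names are hypothetical and follow the schema sketch above): if the SELECTs always filter on two of the int columns together, one composite index can replace two single-column indexes and roughly halve their memory footprint:

-- Hypothetical example; only useful if queries filter on k1 and k2 together:
ALTER TABLE test_table
  DROP INDEX k1, DROP INDEX k2,
  ADD INDEX idx_k1_k2 (k1, k2);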
