In this blog, mainly introduce 3 different Disk based DB engines of MySQL
- MyRocks(LSM tree)
In MySQL Server side, there are three important layers:
- SQL Layer: all operations before sending requests to DB engine are handled in this layer, including SQL parser, Query Optimizer etc.
- Store Engine Layer, Db Engine is plugable in MySQL, user can change different db engines flexibly.
- Data layer, connect with file system to write&read from disk or other devices.
MyISAM was the MySQL default storage engine prior to version 5.7
How could we retrieve records sequentially or randomly by one or more keys with good performance, rather than scan whole table?
Core solution: B+ Tree
MyISAM from ISAM = Indexed Sequential Access Method
Two main properties:
- Sequential query (Insertion order)
- Randomly access (Index based)
MyISAM use B+Tree as main index method, so it can support range query by index order.
In MyISAM, there are 2 types of file:
- Index file (key, pointer -> data file)
- Data file (rows by insertion order)
In index file, the leaf node will save the pointer of a real record in data file, and rows(records) are saved in data file by insertion order.
MyISAM is a naive solution to implement index method so that it can retrieve records sequentially or randomly with good performance.
But it doesn’t support more advanced but useful features like transaction, MVCC, Foreign key
InnoDB is a general-purpose storage engine that balances high reliability and high performance.
In MySQL 5.7, InnoDB is the default MySQL storage engine
What is a more general-purpose storage engine?
- Better performance for both read and write
- Support transaction(ACID)
- Support more concurrent requests
- Stronger Constraints
We can find the features not supported by MyISAM are exactly what optimization implemented by InnoDB
1. Clustered Index
Innodb leverage B+ tree as well, but it also import clustered Index
- Clustered Index (Primary Index): tuples(rows) sorted by primary key
The most difference between MyISAM and InnoDB’s B+ tree is the leaf node doesn’t save pointer to data file but the actual records. Comparing with MyISAM, one get operation in InnDB will save at least one disk IO because it doesn’t need go to another data file.
All the other indices in InnoDB are called Non-clustered index, and in the leaf node it will save the primary key in clustered index tree.
On the other side, there are no difference between primary index and secondary index in MyISAM
Multiversion Concurrency Control
- Improve concurrency performance but not break transaction isolation
- Try to reduce lock number
- Deal with read-write conflict
In InnoDB, MVCC is implemented by two hidden columns + undo log
- A 6-byte
DB_TRX_IDfield indicates the transaction identifier for the last transaction that inserted or updated the row. Also, a deletion is treated internally as an update where a special bit in the row is set to mark it as deleted.
- A 7-byte
DB_ROLL_PTRfield called the roll pointer. The roll pointer points to an undo log record written to the rollback segment. If the row was updated, the undo log record contains the information necessary to rebuild the content of the row before it was updated.
Every transaction will be assigned increasing and unique transaction id, and when we enable MVCC, the transaction can only read records with equal or smaller transaction id.
InnoDB also support transaction and foreign key, which are not involved in this post.
Comparation between MyISAM and InnoDB
OLTP (RW): Transaction, MVCC, Row-level lock
OLAP (Read only): Simple index, no transaction, very light. BUT MIGHT BE NOT?
There is an article showing some workload tests, and looks like even in most Read Only cases, InnoDB is also looks better than MyISAM.
MySQL DB Engine based on RocksDB, an LSM tree based embedded DB Created by FB
Facebook’s User Database(UDB) use MySQL as backend with InnoDB engine. And they met 2 main challenges:
- Write amplification
- Large space usage (Compression inefficiency)
They need some better solution with write and space optimization
Implement a Log-structured merge tree based DB engine instead of InnoDB
- Log-structure tree(LSM) Tree
- Disk file(Sorted Strings Tables)
- WAL for durability
MemTable is an in-memory data structure holding data before they are flushed to SST files. It serves both read and write.
In RocksDB, there are 4 kinds of memtable:
- Concurrent Insert(CAS, Compare and Set)
- In-place update(default false)
Sorted String Tables
In RocksDB, it use Sorted String Table as disk residency component with level compaction method
MyRocks vs InnoDB