MySQL MyISAM vs InnoDB vs Myrocks

In this blog, mainly introduce 3 different Disk based DB engines of MySQL

  • MyISAM(B+tree)
  • InnoDB(B+tree)
  • MyRocks(LSM tree)

MySQL Architecture

MySQL architecture diagram showing connectors, interfaces, pluggable storage engines, the file system with files and logs.

In MySQL Server side, there are three important layers:

  • SQL Layer: all operations before sending requests to DB engine are handled in this layer, including SQL parser, Query Optimizer etc.
  • Store Engine Layer, Db Engine is plugable in MySQL, user can change different db engines flexibly.
  • Data layer, connect with file system to write&read from disk or other devices.

MyISAM

MyISAM was the MySQL default storage engine prior to version 5.7

Motivations:

How could we retrieve records sequentially or randomly by one or more keys with good performance, rather than scan whole table?

Core solution: B+ Tree

MyISAM from ISAM =  Indexed Sequential Access Method

Two main properties:

  • Sequential query (Insertion order) 
  • Randomly access (Index based)

MyISAM use B+Tree as main index method, so it can support range query by index order.

In MyISAM, there are 2 types of file:

  • Index file (key, pointer -> data file)
  • Data file (rows by insertion order)

In index file, the leaf node will save the pointer of a real record in data file, and rows(records) are saved in data file by insertion order.

Summary

MyISAM is a naive solution to implement index method so that it can retrieve records sequentially or randomly with good performance.

But it doesn’t support more advanced but useful features like transaction, MVCC, Foreign key

InnoDB

InnoDB is a general-purpose storage engine that balances high reliability and high performance. 

In MySQL 5.7, InnoDB is the default MySQL storage engine

Motivation

What is a more general-purpose storage engine?

  • Better performance for both read and write
  • Support transaction(ACID)
  • Support more concurrent requests
  • Stronger Constraints

We can find the features not supported by MyISAM are exactly what optimization implemented by InnoDB

1. Clustered Index

Innodb leverage B+ tree as well, but it also import clustered Index

  • Clustered Index (Primary Index): tuples(rows) sorted by primary key

The most difference between MyISAM and InnoDB’s B+ tree is the leaf node doesn’t save pointer to data file but the actual records. Comparing with MyISAM, one get operation in InnDB will save at least one disk IO because it doesn’t need go to another data file.

All the other indices in InnoDB are called Non-clustered index, and in the leaf node it will save the primary key in clustered index tree.

On the other side, there are no difference between primary index and secondary index in MyISAM

2. MVCC

Multiversion Concurrency Control 

  • Improve concurrency performance but not break transaction isolation
    • Try to reduce lock number
    • Deal with read-write conflict

In InnoDB, MVCC is implemented by two hidden columns + undo log

  • A 6-byte DB_TRX_ID field indicates the transaction identifier for the last transaction that inserted or updated the row. Also, a deletion is treated internally as an update where a special bit in the row is set to mark it as deleted.
  • A 7-byte DB_ROLL_PTR field called the roll pointer. The roll pointer points to an undo log record written to the rollback segment. If the row was updated, the undo log record contains the information necessary to rebuild the content of the row before it was updated.

Every transaction will be assigned increasing and unique transaction id, and when we enable MVCC, the transaction can only read records with equal or smaller transaction id.

Other features:

InnoDB also support transaction and foreign key, which are not involved in this post.

Comparation between MyISAM and InnoDB

InnoDB:

OLTP (RW): Transaction, MVCC, Row-level lock

MyISAM:

OLAP (Read only): Simple index, no transaction, very light. BUT MIGHT BE NOT?

There is an article showing some workload tests, and looks like even in most Read Only cases, InnoDB is also looks better than MyISAM.

http://dimitrik.free.fr/blog/archives/2015/12/mysql-performance-revisiting-innodb-vs-myisam-with-mysql-57.html

MyRocks

MySQL DB Engine based on RocksDB, an LSM tree based embedded DB Created by FB

Motivactions

Facebook’s User Database(UDB) use MySQL as backend with InnoDB engine. And they met 2 main challenges:

  • Write amplification
  • Large space usage (Compression inefficiency)

They need some better solution with write and space optimization

Core Solutions

Implement a Log-structured merge tree based DB engine instead of InnoDB

  • Log-structure tree(LSM) Tree
    • Append-only
    • Immutable
  • Memtable
  • Disk file(Sorted Strings Tables)
  • WAL for durability

Memtable

MemTable is an in-memory data structure holding data before they are flushed to SST files. It serves both read and write.

In RocksDB, there are 4 kinds of memtable:

  • Skiplist(default)
    • Concurrent Insert(CAS, Compare and Set)
    • In-place update(default false)
  • HashSkiplist
  • HashLinkList
  • Vector

Sorted String Tables

In RocksDB, it use Sorted String Table as disk residency component with level compaction method

MyRocks vs InnoDB

References

  1. https://dev.mysql.com/doc/refman/8.0/en/
  2. https://www.xaprb.com/blog/2006/07/04/how-to-exploit-mysql-index-optimizations/
  3. https://www.zhihu.com/question/304037770
  4. https://phoenixnap.com/kb/myisam-vs-innodb
  5. https://juejin.cn/post/6871046354018238472
  6. http://dimitrik.free.fr/blog/archives/2015/12/mysql-performance-revisiting-innodb-vs-myisam-with-mysql-57.html
  7. https://github.com/facebook/rocksdb/wiki
  8. https://engineering.fb.com/2016/08/31/core-data/myrocks-a-space-and-write-optimized-mysql-database/
  9. https://scontent-sjc3-1.xx.fbcdn.net/v/t39.8562-6/240834586_4644225018921150_5834039345161000687_n.pdf?_nc_cat=110&ccb=1-5&_nc_sid=ad8a9d&_nc_ohc=mda2IW5xMT8AX_8iO7y&_nc_ht=scontent-sjc3-1.xx&oh=00_AT8-OU-xu9PZ8w7zl3qCBu7KdF6V3PkTKrAYsrCLXUHOoQ&oe=61E656C6