Are You Sure You Want to Use MMAP in Your Database Management System? https://db.cs.cmu.edu/mmap-cidr2022/
What's MMAP (💩)?
mmap is a system call that maps a file's contents into a process's virtual address space. Some database systems use it to access on-disk data instead of issuing explicit reads and writes.
MMAP Overview
Figure 1 shows a step-by-step overview of how to access a file (“cidr.db”) with mmap.
- A program calls mmap and receives a pointer to the memory-mapped file contents.
- The OS reserves part of the program’s virtual address space but does not load any part of the file.
- The program accesses the file’s contents using the pointer.
- The OS attempts to retrieve the page.
- Since no valid mapping exists for the specified virtual address, the OS triggers a page fault to load the referenced part of the file from secondary storage into a physical memory page.
- The OS adds an entry to the page table that maps the virtual address to the new physical address.
- The initiating CPU core also caches this entry in its local translation lookaside buffer (TLB) to accelerate future accesses.
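The steps above can be sketched with Python's standard `mmap` module. This is only an illustration: the helper name `read_via_mmap`, the temporary directory, and the two-page file layout are assumptions made for the demo; only the file name "cidr.db" comes from the text. Page faults, page-table updates, and TLB caching happen inside the OS and hardware and are not visible in the code.

```python
import mmap
import os
import tempfile


def read_via_mmap(path, offset, length):
    """Return `length` bytes starting at `offset`, read through a memory mapping."""
    with open(path, "rb") as f:
        # Steps 1-2: map the whole file read-only (length 0 means "entire file").
        # The OS reserves address space but does not load file contents yet.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Steps 3-7: touching the mapped range lets the OS fault in
            # the needed pages on demand; no explicit read() is issued here.
            return mm[offset:offset + length]


# Demo file (hypothetical layout): one 4 KiB block of "A" followed by one of "B".
demo_dir = tempfile.mkdtemp()
db_path = os.path.join(demo_dir, "cidr.db")
with open(db_path, "wb") as f:
    f.write(b"A" * 4096 + b"B" * 4096)

second_block_start = read_via_mmap(db_path, 4096, 4)
```

Here `second_block_start` holds the first four bytes of the second block, obtained purely through pointer-style access to the mapping.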
MMAP Benefits
- Memory management is easier to implement.
- No explicit system calls (i.e., read/write) are needed.
- It avoids copying data into a user-space buffer.
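To make the last two benefits concrete, here is a small sketch contrasting explicit file I/O with mmap access. The function names, `PAGE_SIZE`, and the three-block demo file are assumptions for illustration, not anything from the paper.

```python
import mmap
import os
import tempfile

PAGE_SIZE = 4096  # assumed block size for the demo


def first_bytes_with_read(path, num_pages):
    """Explicit I/O: seek and read per block, copying into a user-space buffer."""
    buf = bytearray(1)  # user-space buffer that each read copies into
    out = []
    # buffering=0 gives a raw file object, so each readinto() is a read system call.
    with open(path, "rb", buffering=0) as f:
        for page in range(num_pages):
            f.seek(page * PAGE_SIZE)
            f.readinto(buf)
            out.append(buf[0])
    return out


def first_bytes_with_mmap(path, num_pages):
    """mmap: index the mapping directly, with no read calls and no separate buffer."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return [mm[page * PAGE_SIZE] for page in range(num_pages)]


# Demo file: three blocks filled with the byte values 1, 2, and 3.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "pages.bin")
with open(demo_path, "wb") as f:
    for value in (1, 2, 3):
        f.write(bytes([value]) * PAGE_SIZE)

via_read = first_bytes_with_read(demo_path, 3)
via_mmap = first_bytes_with_mmap(demo_path, 3)
```

Both functions return the same values; the difference is that the mmap version contains no per-access I/O calls or buffer management in application code.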
Problems with MMAP
- Transactional safety
- I/O stalls (an I/O request that does not complete, or that takes excessive time to complete)
- Error handling
- Performance
Note that despite the benefits listed above, mmap has no clear performance advantage over regular file I/O.
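As a rough illustration of why transactional safety is awkward with mmap, the sketch below compares a shared writable mapping with a private copy-on-write one. With a shared mapping, changes go to the file and the application does not control when the OS writes a dirty page back. Python's `mmap.ACCESS_COPY` keeps changes in memory only. The helper names and the demo file are assumptions, and this is not a transaction mechanism in itself.

```python
import mmap
import os
import tempfile


def write_first_byte(path, value, access):
    """Set byte 0 through a mapping opened with `access`; return the value seen in memory."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0, access=access) as mm:
            mm[0] = value
            if access == mmap.ACCESS_WRITE:
                # Shared mapping: ask for the change to be written back to the file now.
                mm.flush()
            return mm[0]


def first_byte_on_disk(path):
    """Read byte 0 with ordinary file I/O."""
    with open(path, "rb") as f:
        return f.read(1)[0]


# Demo file: a single zero-filled 4 KiB block.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "txn.bin")
with open(demo_path, "wb") as f:
    f.write(b"\x00" * 4096)

# Private copy-on-write mapping: the update is visible in memory only.
private_view = write_first_byte(demo_path, 7, mmap.ACCESS_COPY)
disk_after_private = first_byte_on_disk(demo_path)

# Shared mapping: the update reaches the file itself.
write_first_byte(demo_path, 9, mmap.ACCESS_WRITE)
disk_after_shared = first_byte_on_disk(demo_path)
```

A DBMS that needs atomic, ordered updates has to build extra machinery around either mode, which is part of the engineering cost behind the "transactional safety" bullet.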
When you should not use mmap in your DBMS:
- You need to perform updates in a transactionally safe fashion.
- You want to handle page faults without blocking on slow I/O or need explicit control over what data is in memory.
- You care about error handling and need to return correct results.
- You require high throughput on fast persistent storage devices.
When you should maybe use mmap in your DBMS:
- Your working set (or the entire database) fits in memory and the workload is read-only.
- You need to rush a product to the market and do not care about data consistency or long-term engineering headaches.
- Otherwise, never.
Thoughts
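For some of the work I am currently doing on a time-series database (GreptimeDB), transactional safety is not a requirement. However, time-series data volumes are usually very large, the workload is typically write-heavy and read-light with append-only writes, and the data may also be stored in places such as Amazon S3, Azure Blob Storage, or GCS. Overall, mmap does not seem like a good fit for time-series databases. (By the way, InfluxDB stopped using mmap in 2020.)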