leveldb notes 16: seek_compaction && size_compaction

#leveldb

The first question for a major compaction is which file to compact. This is computed in the Version, and the two attributes involved are allowed_seeks and compaction_score.

1. seek_compaction

An intuitive idea: if a file is seeked repeatedly without any data being found, it should be compacted, otherwise more seeks will be wasted on it. Using one compaction to end a long run of empty seeks is, at heart, the same old idea of balancing reads against writes.

Concretely, when a new file enters version management, we compute the maximum number of seeks the file is allowed to serve without finding data; once that budget is used up, the file should be compacted.

In the code, this happens in the VersionSet::Builder::Apply phase: https://izualzhy.cn/leveldb-version#331-apply

The key variable is allowed_seeks:

      // We arrange to automatically compact this file after
      // a certain number of seeks.  Let's assume:
      //   (1) One seek costs 10ms
      //   (2) Writing or reading 1MB costs 10ms (100MB/s)
      //   (3) A compaction of 1MB does 25MB of IO:
      //         1MB read from this level
      //         10-12MB read from next level (boundaries may be misaligned)
      //         10-12MB written to next level
      // This implies that 25 seeks cost the same as the compaction
      // of 1MB of data.  I.e., one seek costs approximately the
      // same as the compaction of 40KB of data.  We are a little
      // conservative and allow approximately one seek for every 16KB
      // of data before triggering a compaction.
      //
      // 1. One seek costs 10ms.
      // 2. Reading or writing 1MB costs 10ms.
      // 3. Compacting 1MB of data costs 25MB of IO (reading and writing
      //    10-12MB at the next level). Why 10-12MB? Each level is roughly
      //    10x the size of the one above, so 1MB at level N overlaps about
      //    10MB at level N+1, and misaligned file boundaries can pull in a
      //    couple more megabytes.
      // So compacting 1MB takes as long as 25 seeks, i.e. 250ms; in other
      // words, compacting 40KB takes as long as 1 seek. To be conservative,
      // take 16KB: t = time to compact 16KB = time of 1 seek.
      // Time to compact this whole file: file_size / 16KB seeks' worth.
      // If the file has been seeked that many times without finding the key,
      // the accumulated seek time already exceeds the compaction time, so it
      // should be compacted. The budget is stored in f->allowed_seeks.
      f->allowed_seeks = (f->file_size / 16384);  // one seek per 16KB
      if (f->allowed_seeks < 100) f->allowed_seeks = 100;
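
To double-check the arithmetic, here is a minimal standalone sketch (the file sizes are made-up examples) that reproduces the formula:

#include <cstdint>
#include <cstdio>

// Same formula as above: one seek per 16KB of file, with a floor of 100
// so that small files are not compacted too eagerly.
static int AllowedSeeks(uint64_t file_size) {
  int allowed = static_cast<int>(file_size / 16384);
  return allowed < 100 ? 100 : allowed;
}

int main() {
  printf("%d\n", AllowedSeeks(2 * 1024 * 1024));  // 2MB file   -> 128
  printf("%d\n", AllowedSeeks(512 * 1024));       // 512KB file -> hits the floor, 100
  return 0;
}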

When a lookup seeks into the file but does not find the key there, allowed_seeks is decremented; once it drops to 0, the file is recorded in file_to_compact_:

bool Version::UpdateStats(const GetStats& stats) {
  FileMetaData* f = stats.seek_file;
  if (f != nullptr) {
    f->allowed_seeks--;
    // If allowed_seeks drops to 0, record that this file needs compaction
    if (f->allowed_seeks <= 0 && file_to_compact_ == nullptr) {
      file_to_compact_ = f;
      file_to_compact_level_ = stats.seek_file_level;
      return true;
    }
  }
  return false;
}
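
UpdateStats is driven from the read path. A condensed sketch of the call site in DBImpl::Get (paraphrased from db_impl.cc, with locking and error handling elided) shows how a seek miss eventually triggers a compaction:

// If the key is not in the memtables, Version::Get searches the SSTables
// and records in GetStats the first file it seeked without finding the key.
Version::GetStats stats;
bool have_stat_update = false;
if (mem->Get(lkey, value, &s)) {
  // hit in the memtable, no SSTable seeks
} else if (imm != nullptr && imm->Get(lkey, value, &s)) {
  // hit in the immutable memtable
} else {
  s = current->Get(options, lkey, value, &stats);
  have_stat_update = true;
}
// Decrement allowed_seeks for the seeked file; if some file's budget
// just hit zero, kick off a background compaction.
if (have_stat_update && current->UpdateStats(stats)) {
  MaybeScheduleCompaction();
}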

2. size_compaction

The other intuitive trigger for compaction: when a level grows very large, it usually means a lot of redundant data, which should be compacted to avoid both wasted disk space and slow reads.

The larger the level, the "colder" we can assume its data is and the less likely it is to be read, so larger levels can tolerate more and are given a larger size threshold.

Concretely, whenever a new version is produced, every level is scanned and its total file size is compared against the level's base size, yielding the level most in need of compaction.

This step is done in VersionSet::Finalize:

// Compute the level and score for the next compaction, stored in compaction_level_ && compaction_score_
void VersionSet::Finalize(Version* v) {
  // Precomputed best level for next compaction
  int best_level = -1;
  double best_score = -1;

  // Level 0 is judged by file count, to cut down on seeks and keep reads
  // fast: count / 4.
  // Levels > 0 are judged by total bytes, to bound disk usage:
  // size / (10^level MB).
  // For example:
  //   level 0 has 4 files     -> score = 1.0
  //   level 1 totals 9MB      -> score = 0.9
  // so the level to compact is 0, with score = 1.0.
  for (int level = 0; level < config::kNumLevels-1; level++) {
    double score;
    if (level == 0) {
      // We treat level-0 specially by bounding the number of files
      // instead of number of bytes for two reasons:
      //
      // (1) With larger write-buffer sizes, it is nice not to do too
      // many level-0 compactions.
      //
      // (2) The files in level-0 are merged on every read and
      // therefore we wish to avoid too many files when the individual
      // file size is small (perhaps because of a small write-buffer
      // setting, or very high compression ratios, or lots of
      // overwrites/deletions).
      score = v->files_[level].size() /
          static_cast<double>(config::kL0_CompactionTrigger);
    } else {
      // Compute the ratio of current size to size limit.
      const uint64_t level_bytes = TotalFileSize(v->files_[level]);
      score =
          static_cast<double>(level_bytes) / MaxBytesForLevel(options_, level);
    }

    if (score > best_score) {
      best_level = level;
      best_score = score;
    }
  }

  v->compaction_level_ = best_level;
  v->compaction_score_ = best_score;
}

// 10^level MB: level 1 = 10MB, level 2 = 100MB, ...
static double MaxBytesForLevel(const Options* options, int level) {
  // Note: the result for level zero is not really used since we set
  // the level-0 compaction threshold based on number of files.

  // Result for both level-0 and level-1
  double result = 10. * 1048576.0;  // 10MB
  while (level > 1) {
    result *= 10;
    level--;
  }
  return result;
}
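
Plugging in some hypothetical per-level sizes makes the scoring concrete; a minimal sketch (the numbers are made up for illustration):

#include <cstdio>

// Same shape as MaxBytesForLevel above: 10MB * 10^(level-1).
static double MaxBytes(int level) {
  double result = 10.0 * 1048576.0;
  while (level > 1) { result *= 10; level--; }
  return result;
}

int main() {
  // Hypothetical version: 3 files at level 0, 15MB at level 1, 30MB at level 2.
  double score0 = 3 / 4.0;                            // 0.75 (kL0_CompactionTrigger == 4)
  double score1 = (15.0 * 1048576.0) / MaxBytes(1);   // 1.5
  double score2 = (30.0 * 1048576.0) / MaxBytes(2);   // 0.3
  printf("%.2f %.2f %.2f\n", score0, score1, score2);
  // best_level = 1, best_score = 1.5: level 1 gets compacted first.
  return 0;
}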

As you can see, level 0 differs from the other levels in that file count is what matters: level-0 files overlap, so every read has to consult all of them, which makes the file count the dominant factor for performance.

The base size of each level is 10MB * 10^(level-1): level 1 gives MaxBytes = 10MB, level 2 gives MaxBytes = 100MB, and so on.

After comparing level by level, we get the highest score and its level: compaction_score_ and compaction_level_.

Both conditions are used when a major compaction picks its files.

  // We prefer compactions triggered by too much data in a level over
  // the compactions triggered by seeks.
  // score >= 1: some level is over its limit (file count at level 0, total bytes elsewhere)
  const bool size_compaction = (current_->compaction_score_ >= 1);
  // some file has been seeked repeatedly without a hit, recorded in file_to_compact_
  const bool seek_compaction = (current_->file_to_compact_ != nullptr);
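
The rest of VersionSet::PickCompaction then chooses the input file accordingly; condensed below (paraphrased, with input expansion and level-0 overlap handling elided):

  Compaction* c;
  int level;
  if (size_compaction) {
    // Size-triggered: take the precomputed level and pick the first file
    // after compact_pointer_[level], so compactions rotate through the level.
    level = current_->compaction_level_;
    c = new Compaction(options_, level);
    for (size_t i = 0; i < current_->files_[level].size(); i++) {
      FileMetaData* f = current_->files_[level][i];
      if (compact_pointer_[level].empty() ||
          icmp_.Compare(f->largest.Encode(), compact_pointer_[level]) > 0) {
        c->inputs_[0].push_back(f);
        break;
      }
    }
    if (c->inputs_[0].empty()) {
      // Wrap-around to the beginning of the level.
      c->inputs_[0].push_back(current_->files_[level][0]);
    }
  } else if (seek_compaction) {
    // Seek-triggered: compact exactly the file whose allowed_seeks ran out.
    level = current_->file_to_compact_level_;
    c = new Compaction(options_, level);
    c->inputs_[0].push_back(current_->file_to_compact_);
  } else {
    return nullptr;
  }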

So the third role of version management: while applying new files and traversing files during reads, it screens candidate files for major compaction.