MDEV-39092 Copy Aria data and logs as part of backup#4971
MDEV-39092 Copy Aria data and logs as part of backup#4971mariadb-andrzejjarzabek wants to merge 19 commits into
Conversation
|
|
4ef94be to
c143ff2
Compare
efbc62b to
c77278b
Compare
e89625e to
06fd556
Compare
4d6a19c to
d86a6a1
Compare
c6f5119 to
9ebb23e
Compare
8565956 to
04e3bc2
Compare
14ca552 to
c9429eb
Compare
d35cd47 to
824afeb
Compare
23ac784 to
2dfc033
Compare
This introduces a basic driver Sql_cmd_backup, storage engine interfaces, and basic copying of InnoDB data files. On Windows, we pass a target directory name; elsewhere, we pass a target directory handle. backup_target: A structured data type to represent a directory or a stream. On Microsoft Windows, we must use directory paths because there is no variant of CopyFileEx() that would work on file handles. copy_entire_file(): A file copying service for POSIX systems. copy_file(): A sparse file-copying service for all systems. backup_context: An InnoDB backup context, attached to trx->lock.backup so that context can exist between InnoDB_backup::end(), which is releasing all locks, and InnoDB_backup::fini() in the same thread, which is expected to finalize the backup without modifying files in the server data directory. fil_space_t::write_or_backup: Keep track of in-flight page writes and pending backup operation. We must not allow them concurrently, because that could lead into torn pages in the backup. fil_space_t::backup_end: The first page number that is not being backed up (by default 0, to indicate that no backup is in progress). fil_space_t::BACKUP_BATCH_SIZE: The number of preceding pages that will be covered by fil_space_t::backup_end. This is the unit of "page range locking" during InnoDB backup. TRX_STATE_BACKUP: A special InnoDB transaction state indicating association with BACKUP SERVER, which allows us to pass some context in trx_t from innodb_backup_end() to innodb_backup_finalize(). log_t::backup: Whether BACKUP SERVER is in progress. The purpose of this is to make BACKUP SERVER prevent the concurrent execution of SET GLOBAL innodb_log_archive=OFF or SET GLOBAL innodb_log_file_size when innodb_log_archive=OFF. log_sys.archived_checkpoint: Keep track of the earliest available checkpoint, corresponding to log_sys.archived_lsn. This reflects SET GLOBAL innodb_log_recovery_start (which is settable now), for incremental backup. buf_flush_list_space(): Check for concurrent backup before writing each page. This is inefficient, but this function may be invoked from multiple threads concurrently, and it cannot be changed easily, especially for fil_crypt_thread().
2dfc033 to
01a069c
Compare
| backup_target target; | ||
| #ifndef _WIN32 | ||
| const int datadir_fd; | ||
| #endif |
There was a problem hiding this comment.
Why is there a declaration of datadir_fd in the first place? It turns out that it is never being assigned, only passed to openat(2). This seems to rely on zero-initialization and some undefined behaviour. For example, in Linux, AT_FDCWD is defined as -100.
There was a problem hiding this comment.
I noticed later that this data member is being initialized from a call to open(…, O_DIRECTORY) in the constructor. I think that this is bad practice; we should allow the singleton object to be initialized statically.
It is unclear when if ever the directory handle would be closed. I suspect that we would hold the handle open until the server is shut down.
There was a problem hiding this comment.
datadir_fd is initialized in the Aria_backup constructor.
There was a problem hiding this comment.
Is it a reasonable trade-off to have a descriptor permanently opened for backups for the purpose of just one plugin? Should each plugin that performs the backup keep an open descriptor to the data directory? When adding another bit of functionality (not backup) to any part of the server or a plugin, with its own class/module/translation unit, which needs to open files in the data directory, should it also maintain a file descriptor to the data directory? There may be a case for defining an API for the server to maintain a single descriptor and make it available for other code, but given that there isn't one, I would argue that opening a descriptor for the duration of the backup and closing it on backup end. A backup isn't a fine-grained operation which we expect to be performed multiple times per second (or even minute), but we expect that many hours may pass between successive backups. And every backup will typically open hundreds or thousands of files, so saving the time and I/O of one open/close is not significant and it's not clear that maintaining an additional open descriptor the whole time the server is running is is balanced out by that.
In any case I propose putting that discussion off for a possible future improvement, where we could discuss defining an API to make a data directory descriptor available to the whole server.
This reverts commit 75ff32f.
6905bda to
b6549cc
Compare
4b496a7 to
9debff9
Compare
0cfc392 to
0de6368
Compare
Fix concurrent BACKUP_PHASE_FINISH issue.
0de6368 to
8b1c81b
Compare
| case BACKUP_PHASE_NO_DML_NON_TRANS: | ||
| /* FIXME: Would be better to selectively purge only the tables we need. */ | ||
| tc_purge(); | ||
| tdc_purge(true); | ||
| return 0; |
There was a problem hiding this comment.
The comment is missing a reference to a ticket where this performance problem will be fixed. As far as I understand, we only need such cleanup for ENGINE=Aria and ENGINE=MyISAM. We don’t want unnecessary disruption of ENGINE=InnoDB. The BACKUP_PHASE_NO_COMMIT that will be executed a couple of steps after this one will be disruptive enough.
An interim solution with some room for optimization: