MariaDB "Last checkpoint at" behavior broken when inserting data since 10.5.7 and nobody cares? This breaks mariabackup incremental backups (even in 10.9.x)!

Asked by Sebastian Bergmann

We have a major issue with mariadb-server starting from 10.5.7, as described below. Thank you for your time.

Mariabackup and the values in the file "xtrabackup_checkpoints" are based on the MariaDB values "Last checkpoint at", "Log sequence number" and so on. To keep it simple, I will first describe the behavior BEFORE mariadb-server 10.5.7 and then the behavior STARTING FROM 10.5.7 (up to the newest release).

BEFORE 10.5.7
The checkpoint value increases every time you insert/commit data.
Full backup -> size 70 GB
inc 1 backup: ~25 MB
inc 2 backup: ~25 MB
and so on

STARTING FROM 10.5.7 (up to even the newest 10.9.x) this changes
The checkpoint value is NOT INCREASING. So far, only a restart of the mariadb service changes it. That leaves us with gigantic incremental backups:
Full backup -> size 70 GB
inc 1 backup: size 70 GB + x
inc 2 backup: size 70 GB + xy
So every incremental backup is BIGGER than the full backup it is based on!

The faulty behavior is confirmed by the xtrabackup_checkpoints file of an incremental backup:
backup_type = incremental
from_lsn = 1049533741592
to_lsn = 1049533741592
last_lsn = 1050057849314

to_lsn and from_lsn do NOT INCREASE until we restart the whole database, which makes sense because mariadb-server doesn't advance the checkpoint values. Why is that?
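For anyone who wants to check their own backups for the same symptom, here is a minimal Python sketch that parses the file content shown above. It assumes the usual `key = value` layout of xtrabackup_checkpoints and that to_lsn is the checkpoint LSN while last_lsn is the last LSN copied by the backup; the function name `parse_checkpoints` is my own and not part of any MariaDB tool.

```python
def parse_checkpoints(text: str) -> dict:
    """Parse 'key = value' lines into a dict, converting numeric values to int."""
    info = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip blank or malformed lines
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        info[key] = int(value) if value.isdigit() else value
    return info


# The values quoted in the question above.
CHECKPOINTS = """\
backup_type = incremental
from_lsn = 1049533741592
to_lsn = 1049533741592
last_lsn = 1050057849314
"""

info = parse_checkpoints(CHECKPOINTS)

# If to_lsn equals from_lsn, the checkpoint did not advance between backups.
checkpoint_stalled = info["from_lsn"] == info["to_lsn"]

# Redo log written after the checkpoint, in bytes (assumption: LSNs count bytes).
redo_after_checkpoint = info["last_lsn"] - info["to_lsn"]

print(checkpoint_stalled)     # True
print(redo_after_checkpoint)  # 524107722
```

With the numbers from this thread, the checkpoint is unchanged between the two backups while roughly 500 MiB of redo log has accumulated after it.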
What am I missing? I noticed some parameter changes in 10.5.7 (listed in #1 below).

Question information

Language: English
Status: Answered
For: MariaDB
Assignee: No assignee
Revision history for this message
Sebastian Bergmann (ryanthur) said :
#1

Parameter changes in 10.5.7:

innodb_lru_flush_size 32
innodb_lru_scan_depth 1024 -> 1536
innodb_max_dirty_pages_pct 75 -> 90

Do those have an impact? When changing them we noticed no difference in behavior. If they are not the root cause, what gigantic secret is hidden in the code of mariadb-server 10.5.7?

Please help us out; we have been stuck on this topic for weeks.

Thanks!

Sebastian Bergmann (ryanthur) said :
#2

https://jira.mariadb.org/browse/MDEV-27295

We solved it for now with SET GLOBAL innodb_max_dirty_pages_pct_lwm=0.001;

If the performance impact is not too bad, we will use it as our solution.

But I still don't get how nobody cares about this default behavior. Weird.

Sergei Golubchik (sergii) said :
#3

Thank you for your feedback. We intend to run some performance tests to see if the default value can be changed without hurting performance.

Note that even if you find that `innodb_max_dirty_pages_pct_lwm=0.001` is too expensive to use all the time, you can still use it just for a backup, like this:
* SET GLOBAL innodb_max_dirty_pages_pct_lwm=0.001;
* Wait a few seconds for a checkpoint to happen.
* SET GLOBAL innodb_max_dirty_pages_pct_lwm=default;
* Perform a backup.
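The steps above can be sketched as a small Python wrapper. This is only an illustration under assumptions: `run_sql` and `run_backup` are hypothetical callables you would supply yourself (for example, thin wrappers around your SQL client and your mariabackup invocation), and the 5-second wait is a guess at "a few seconds", not a tested value.

```python
import time
from typing import Callable

LWM_VARIABLE = "innodb_max_dirty_pages_pct_lwm"


def force_checkpoint_then_backup(
    run_sql: Callable[[str], object],
    run_backup: Callable[[], object],
    wait_seconds: float = 5.0,  # assumption: "a few seconds"
    sleep: Callable[[float], object] = time.sleep,
) -> None:
    """Lower the dirty-page low-water mark, wait, restore it, then back up."""
    # Step 1: make page flushing (and thus checkpointing) aggressive.
    run_sql(f"SET GLOBAL {LWM_VARIABLE}=0.001")
    try:
        # Step 2: give the server time to flush and advance the checkpoint.
        sleep(wait_seconds)
    finally:
        # Step 3: always restore the default, even if the wait is interrupted.
        run_sql(f"SET GLOBAL {LWM_VARIABLE}=DEFAULT")
    # Step 4: take the backup only after the default has been restored.
    run_backup()


# Dry run with recording stubs instead of a real server.
calls = []
force_checkpoint_then_backup(
    run_sql=calls.append,
    run_backup=lambda: calls.append("<backup>"),
    sleep=lambda seconds: None,
)
for call in calls:
    print(call)
```

Passing the SQL and backup actions in as callables keeps the ordering logic testable without a running server; the `finally` block is there so the setting does not stay at 0.001 by accident.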

Sebastian Bergmann (ryanthur) said :
#4

Maybe some additional info for you: we narrowed it down to a change between 10.5.6 and 10.5.7.
We ruled out external influences (our parameters), since a clean installation shows the same change in behavior.
We are using 10.6.10 at the moment, but since the problem started with 10.5.7, I have referred to that version throughout because it might make the investigation easier.

thank you.

Sergei Golubchik (sergii) said :
#5

I've created https://jira.mariadb.org/browse/MDEV-30000 to implement the above workaround in mariadb-backup.

Please use it for further discussion. We don't really use Launchpad anymore: the code is on GitHub, and bug/feature management is done in Jira.

Sebastian Bergmann (ryanthur) said :
#6

Well I'm happy you saw it anyway.

Please also consider the possible performance impact if checkpoints happen only rarely. I don't entirely understand the mechanics there and just want to point it out in case it could be a problem.

thanks.

Sebastian Bergmann (ryanthur) said :
#7

A first round of tests shows 4-8 times more write I/O when innodb_max_dirty_pages_pct_lwm=0.001 is permanently active. Test time was ~15% longer.
CPU usage seemed a bit lower.
