#archlinux32 | Logs for 2018-04-06

[20:17:24] <deep42thought> Hi, we're going to attemt to set up group replication with one host in the us and the other(s) in europe. Anything, we should pay attention to make it play nicely with the large(r) latency?
[20:29:04] <mgriffin> deep42thought: why is other(s) seemingly optional, are you considering an even number of nodes in your cluster?
[20:29:45] <deep42thought> mgriffin: yes, we'd start with two, others might follow later
[20:29:47] <mgriffin> deep42thought: also, do you mean to talk about GR in #mysql or do you mean Galera?
[20:30:57] <deep42thought> I'm talking about GR with mariadb (archlinux' standard for mysqld)
[20:31:03] <deep42thought> I've not heard of Galera
[20:31:23] <mgriffin> deep42thought: with GR/Galera, you can't use two nodes, even if they are in the same DC. You need to allow majority consensus.
[20:32:00] <deep42thought> is this Galera: https://dev.mysql.com/doc/refman/5.7/en/group-replication.html ?
[20:32:11] <deep42thought> I wanted to try the tutorial linked.
[20:32:15] <mgriffin> i'm 90% sure that mariadb does not have "group replication", which you link to
[20:32:25] <deep42thought> oh
[20:32:50] <mgriffin> it has "Galera", which came out before GR, and is quite similar in operation (certification based replication)
[20:34:36] <deep42thought> When we set up a Galera cluster with 3 nodes and one fails - will this crash the cluster or will it come back when the 3rd node appears again?
[20:35:11] <mgriffin> galera with an odd number of nodes in a single datacenter will stay online if one fails
[20:35:22] <deep42thought> ok
[20:35:33] <mgriffin> with multiple datacenters, you have to still think about consensus.
[20:36:04] <deep42thought> by "datacenters" you mean geographical locations?
[20:36:27] <mgriffin> lets say i mean physical racks of servers
[20:36:37] <deep42thought> ok
[20:37:20] <mgriffin> you want two nodes in UK and one in US if UK is more "important", but having an even number of datacenters is still kinda risky
[20:37:42] <mgriffin> lets say you have uk-db1, uk-db2, and us-db3..
[20:38:06] <mgriffin> if uk-db2 fails, you still have majority since uk-db1 and us-db3 are still up
[20:38:21] <mgriffin> but then what if there is a network problem between uk/us.. now you are down
[20:39:02] <mgriffin> btw galera lets you optimize somewhat for wan replication, in that you can define all of the uk nodes in one "segment"
[20:39:31] <mgriffin> so if a write happens on us, the writeset is shipped to only one of the nodes over wan and it relays to the others in that segment over lan
[20:41:01] <mgriffin> you can use just two datacenters with four nodes total, if you use weighted quorum
[20:41:52] <deep42thought> I see
[20:42:19] <deep42thought> I think, we do not (yet) need to consider two simultanous failures
[20:42:29] <mgriffin> actually you can get away with just one node in uk/us with weighted quorum, as well
[20:42:41] <deep42thought> what if the "heavier" node fails?
[20:42:55] <mgriffin> http://galeracluster.com/documentation-webpages/weightedquorum.html
[20:43:55] <mgriffin> fwiw i typically treat galera as "master/slaves" for various reasons
[20:46:43] <tyzoid> So a weighted quorum replication essentially acts as master-slave, with the slave able to accept writes if connected to the node of higher weight
[20:46:54] <tyzoid> (assuming only 2 nodes)
[20:47:17] <deep42thought> sounds like exactly what we want
[20:47:23] <deep42thought> ... for the beginning
[20:47:44] <tyzoid> once I reprovision srv0, we could run another node on the US side on that
[20:48:02] <tyzoid> mgriffin: Can you confirm my understanding of that?
[20:48:40] <mgriffin> tyzoid: that sounds right but i still don't like writing to more than one node for performance reasons
[20:48:50] <mgriffin> tyzoid: unless you are writing to different tables
[20:49:08] <tyzoid> mgriffin: Well we've currently got a read-slave on the US side
[20:49:10] <deep42thought> we will mostly write on one server
[20:49:18] <tyzoid> but the problem is if the current master goes down
[20:49:26] <tyzoid> we want to be able to sync them back up easily
[20:49:35] <tyzoid> with the possibility of adding write slaves on the US side later
[20:49:39] <mgriffin> http://galeracluster.com/documentation-webpages/dealingwithmultimasterconflicts.html
[20:49:45] <tyzoid> only way I can see to do both is using multi-master
[20:52:45] <tyzoid> Unless I'm mistaken, having a conflict like that is only possible if transactions are allowed to commit prematurely, right?
[20:52:50] <tyzoid> mgriffin: ^
[20:53:28] <mgriffin> i don't know what you mean prematurely
[20:53:55] <tyzoid> if commit is accepted before consensus of nodes is reached
[20:53:59] <mgriffin> if two nodes try to modify the same row faster than it can be replicated, one needs to conflict and abort
[20:54:37] <mgriffin> update site set hits=hits+1 where property=42;
[20:54:49] <mgriffin> if you have hot rows like that, you can easily run in to a lot of conflicts
[20:55:01] <tyzoid> right, but then you trigger a failure in the client application, right?
[20:55:06] <MadMerlin|work> fyi, there's a couple outstanding bugs with galera (at least with 10.2.x) that may be relevant to you... https://jira.mariadb.org/browse/MDEV-9266 https://jira.mariadb.org/browse/MDEV-15252 https://jira.mariadb.org/browse/MDEV-14153 https://jira.mariadb.org/browse/MDEV-15227 https://jira.mariadb.org/browse/MDEV-15154
[20:55:13] * tyzoid isn't sure he sees the problem with that
[20:55:16] <MadMerlin|work> we're looking to deploy galera as well, but those are blockers for us
[20:55:31] <mgriffin> tyzoid: the performance from end user perspective is reduced if the application has to retry
[20:55:54] <tyzoid> mgriffin: I don't really care about retry performance, as long as the initial failure is reasonably quick
[20:56:01] <tyzoid> we're not doing massively parallel updates of tables
[20:56:31] <tyzoid> I just don't want the initial commit to take more than ~100ms
[20:56:52] <mgriffin> give it a shot
[20:57:06] <tyzoid> deep42thought: ^
[20:57:36] <tyzoid> mgriffin / MadMerlin|work: Mind if I post a log of this conv. publicly ?
[20:57:48] <mgriffin> be sure to configure sst to be fast and use a large enough gcache for ist
[20:58:06] <mgriffin> i assume anything i say on the internet will be used against me in a court of law
[20:58:11] <MadMerlin|work> fine by me
[20:58:42] <tyzoid> thanks for the help, then! I'm sure we'll be back if we hit any issues
[20:59:11] <mgriffin> for xtrabackup sst, use parallel io threads and parallel compression
[20:59:26] <mgriffin> on the receiver use parallel decompression and potentially parallel io threads
[20:59:52] <mgriffin> set gcache to be large enough to allow a node to be offline for an hour maybe but not so large that ist is slower than sst
[21:00:02] <mgriffin> ie your gcache should not be larger than your datadir
[21:00:29] <mgriffin> use a vpn if you aren't using ssl...
[21:00:46] <mgriffin> (for connections and group communication)
[21:01:40] <mgriffin> oh and don't forget to increase memory for innobackupex apply in sst
[21:02:06] <mgriffin> inno-apply-opts = "--use-memory=2G"
[21:03:29] <mgriffin> haven't dusted this off in a while but here is a starting point https://pastebin.com/ZDYwJKAh