sql server - The curious case of HADR_SYNC_COMMIT waits - Database Administrators Stack Exchange


We are noticing an interesting pattern of HADR_SYNC_COMMIT waits in our environment. We have 3 replicas in one datacenter (1 primary, 1 sync secondary, and 1 async secondary), and we added 3 more async replicas in another datacenter (~2400 miles apart).

Ever since, we have started to notice an enormous increase in HADR_SYNC_COMMIT waits. When we look at the active sessions, we see a bunch of COMMIT TRANSACTION queries waiting on the sync replica.

From the screenshot, you can see there is a jump in HADR_SYNC_COMMIT waits on June 29. We dropped two of the 3 async replicas in the remote datacenter sometime around noon on July 1st, and the wait times dropped considerably along with it.

[Screenshot: HADR_SYNC_COMMIT wait times]

What I have checked so far: log send queue, redo queue, last hardened time, and last commit time on the remote replicas. We have continuous bursts of small transactions during business hours, and therefore the send queues are pretty small at any given timestamp (anywhere between 60 KB and 1 MB).
The remote replicas are in sync; there is little difference between the last commit time and the last hardened time for an individual LSN on the replicas.
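
For readers who want to repeat these checks, the following is a minimal sketch of one way to pull the same figures from the availability group DMVs. It assumes it is run on the primary replica (a secondary only reports its own local rows) and on SQL Server 2012 or later; the column selection is illustrative, not the exact query used above.

```sql
-- Sketch: per-replica send/redo queues and hardened/commit times.
-- Run on the primary replica so rows for every replica are returned.
SELECT
    ar.replica_server_name,
    ar.availability_mode_desc,                 -- SYNCHRONOUS_COMMIT or ASYNCHRONOUS_COMMIT
    DB_NAME(drs.database_id) AS database_name,
    drs.synchronization_state_desc,
    drs.log_send_queue_size,                   -- KB of log not yet sent to this secondary
    drs.redo_queue_size,                       -- KB of log hardened but not yet redone
    drs.last_hardened_time,
    drs.last_commit_time
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_replicas AS ar
    ON ar.replica_id = drs.replica_id
   AND ar.group_id   = drs.group_id
ORDER BY ar.replica_server_name, database_name;
```

Sampling this at intervals during business hours gives a rough picture of how far each remote replica trails the primary.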

The network pipe is 10G, and we modified the transmit buffer size from 256 megs to 2 gigs. That change was made under the assumption that the network was dropping packets and re-transmitting them; either way, it didn't seem to help much.

So, I'm wondering: why would async replicas cause HADR_SYNC_COMMIT waits? Shouldn't this wait type depend on the sync replica alone? What am I missing here?

First, the description of the wait event your question is regarding is:

Waiting for transaction commit processing for the synchronized secondary databases to harden the log. This wait is also reflected by the Transaction Delay performance counter. This wait type is expected for synchronized availability groups and indicates the time to send, write, and acknowledge log to the secondary databases.
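
To put numbers on this wait, a minimal sketch along the following lines reads the cumulative totals from sys.dm_os_wait_stats. Including WRITELOG for comparison is my own choice here, so the local log flush can be set against the remote harden time. The values are cumulative since the last restart (or since the wait stats were last cleared), so two samples need to be diffed to see a rate.

```sql
-- Sketch: cumulative HADR_SYNC_COMMIT waits next to local WRITELOG waits.
SELECT
    wait_type,
    waiting_tasks_count,
    wait_time_ms,
    max_wait_time_ms,
    -- Average wait per task; NULLIF avoids a divide-by-zero on idle systems
    wait_time_ms * 1.0 / NULLIF(waiting_tasks_count, 0) AS avg_wait_ms
FROM sys.dm_os_wait_stats
WHERE wait_type IN (N'HADR_SYNC_COMMIT', N'WRITELOG');
```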

https://msdn.microsoft.com/en-us/library/ms179984.aspx

Digging into the mechanics of the wait, you have the log blocks being transmitted and hardened, but recovery not yet completed on the remote servers. This being the case, and given that you added additional replicas, it stands to reason that your HADR_SYNC_COMMIT waits may increase due to the increase in bandwidth requirements. In this case, Aaron Bertrand is correct in his comments on the question.

Source: http://blogs.msdn.com/b/psssql/archive/2013/04/26/alwayson-hadron-learning-series-hadr-sync-commit-vs-writelog-wait.aspx

Digging into the second part of your question, about how this wait could be related to application slowdowns: I believe this is a causality issue. You are looking at your waits increasing and at a recent user complaint, and drawing the conclusion, potentially incorrectly, that the two are related, when that may not be the case at all. The fact that you added tempdb files and your application became more responsive indicates to me that you may have had some underlying contention issues, which could have been exacerbated by the additional overhead of the implicit snapshot isolation level when a database is in an availability group. This may have had little or nothing to do with your HADR_SYNC_COMMIT waits.

If you wanted to test this, you can utilize an extended event trace that looks at the hadr_db_commit_mgr_update_harden XEvent on your primary replica and get a baseline. Once you have your baseline, you can add your replicas back in one at a time and see how the trace changes. I would encourage you to use a file that resides on a volume that does not contain any databases, and to set a rollover and maximum size. Please adjust the duration filter as needed to gather events that match up with your waits, so that you can further troubleshoot and correlate with any other teams that need to be involved.

CREATE EVENT SESSION [HADR_SYNC_COMMIT-Monitor] ON SERVER  -- Run this on the primary replica
ADD EVENT sqlserver.hadr_db_commit_mgr_update_harden(
    WHERE ([delay]>(10))) -- I encourage you to use a delay filter to avoid getting too many events back; this is measured in milliseconds
ADD TARGET package0.event_file(SET filename=N'<yourfilepathhere>')
WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=OFF,STARTUP_STATE=OFF)
GO
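
Once the session has been started and has collected events, the file target can be read back with sys.fn_xe_file_target_read_file. The following is a rough sketch only. It assumes the path is the same placeholder used in the session definition with a *.xel wildcard appended to pick up rollover files, and that the event exposes the delay field under the name used in the filter above. The output column names are illustrative.

```sql
-- Sketch: list captured harden events, longest delay first.
-- Replace <yourfilepathhere> with the path used in the session definition.
SELECT
    x.event_xml.value('(event/@timestamp)[1]', 'datetime2(3)') AS event_time_utc,
    x.event_xml.value('(event/data[@name="delay"]/value)[1]', 'bigint') AS delay_ms
FROM (
    SELECT CAST(event_data AS xml) AS event_xml
    FROM sys.fn_xe_file_target_read_file(N'<yourfilepathhere>*.xel', NULL, NULL, NULL)
) AS x
ORDER BY delay_ms DESC;
```

Comparing the distribution of delay_ms between the baseline and each run with an added replica should show whether the extra replicas shift the harden delay.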
