Bugzilla – Bug 327
gmond memory leak when deaf = no & no other gmond instances are sending data
Last modified: 2012-04-04 09:15:57
Configuration:
- gmond 3.3.1, built from source into an RPM using the specfile provided with the source
- OS: CentOS 5
- Using unicast, with all nodes in a cluster configured with udp_send_channel to send to designated aggregation nodes
- All nodes have deaf = no & both a tcp_accept_channel & udp_recv_channel configured, for simplicity

Problem:
When configured as above, those nodes which have deaf = no leak memory at a pretty significant rate, reaching hundreds of megabytes within 12 hours. The nodes which are aggregators, which are the destination of the udp_send_channel configured on all nodes, do not leak memory. So it appears that a gmond node which is not receiving metrics through a configured receive channel will leak memory.

Modifying the above configuration by removing the tcp_accept_channel & udp_recv_channel and setting mute = yes will cause the leak to stop.
Per a comment from Kostas Georgiou on IRC, I changed gmond.c from "if ((now - udp_last_heard) > 60 * APR_USEC_PER_SEC)" to 60000, and the problem "goes away". I am attaching an image that shows the memory consumption originally and with the above change.
Created an attachment (id=277): Gmond memory utilization, before and after the change
Just curious. You write "Modifying the above configuration by removing the tcp_accept_channel & udp_recv_channel and setting mute = yes will cause the leak to stop." But if you set "mute=yes", gmond will no longer send any data. Did you mean "deaf=yes"?
Good catch - yes, I meant that you must set deaf = yes.
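For anyone applying the corrected workaround, the relevant part of gmond.conf on a non-aggregator node might look roughly like the fragment below. This is only a sketch: the host name and port are placeholders rather than values from this report, and all other sections of the file are omitted.

```
/* Sketch of the workaround on a node that only sends metrics.
   Host and port are placeholders; adjust them for your cluster. */
globals {
  deaf = yes    /* do not listen for metrics from other nodes */
  mute = no     /* keep sending this node's own metrics */
}

udp_send_channel {
  host = aggregator.example.com
  port = 8649
}

/* No udp_recv_channel or tcp_accept_channel sections on this node. */
```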
Current attempt at a fix is at git://github.com/georgiou/monitor-core.git in the fixes/bz327 branch. The first commit only resets the channels once every 60 secs instead of every cycle, reducing the effects of the leak. The second commit fixes a leak in join_mcast and should also be safe. The third commit fixes the main leak, but it will need a second pair of eyes and some testing to make sure it doesn't break anything.
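To illustrate the idea behind the first commit (reset at most once every 60 seconds rather than on every cycle), here is a minimal standalone sketch. It is not the code from the fixes/bz327 branch. The struct, the function names and the one-second polling simulation are all hypothetical, and it uses plain 64-bit microsecond timestamps instead of APR types.

```c
#include <stdint.h>

#define USEC_PER_SEC   1000000LL             /* same value as APR_USEC_PER_SEC */
#define RESET_INTERVAL (60 * USEC_PER_SEC)   /* 60 seconds, in microseconds */

/* Hypothetical state; these names are not taken from gmond.c. */
typedef struct {
    int64_t last_heard;   /* time the last packet was received (usec) */
    int64_t last_reset;   /* time the channels were last reset (usec) */
    int     reset_count;  /* number of resets performed so far */
} channel_state;

/* Reset the channels only if nothing has been heard for more than 60 s
 * AND the previous reset was at least 60 s ago. Returns 1 on reset. */
static int maybe_reset_channels(channel_state *st, int64_t now)
{
    if (now - st->last_heard <= RESET_INTERVAL)
        return 0;                 /* data was heard recently */
    if (now - st->last_reset < RESET_INTERVAL)
        return 0;                 /* already reset within the last 60 s */

    st->last_reset = now;         /* a real daemon would rebuild sockets here */
    st->reset_count++;
    return 1;
}

/* Simulate a node that never hears anything, polled once per second. */
static int simulate_silent_seconds(int seconds)
{
    channel_state st = {0, 0, 0};
    for (int s = 1; s <= seconds; s++)
        maybe_reset_channels(&st, (int64_t)s * USEC_PER_SEC);
    return st.reset_count;
}
```

In this simulation a node that hears nothing for 300 seconds performs 4 resets. Without the second check it would reset on each of the 240 polls after the first 60 seconds, so any per-reset leak would accumulate far faster.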
In review, to be merged after 3.3.5 is released: https://github.com/ganglia/monitor-core/pull/30