Bugzilla – Bug 327
gmond memory leak when deaf = no & no other gmond instances are sending data
Last modified: 2012-04-04 09:15:57
- gmond 3.3.1 built from source into RPM using specfile provided with source
- OS Centos 5
- Using unicast with all nodes in a cluster configured with udp_send_channel to
send to designated aggregation nodes
- All nodes have deaf = no & both a tcp_accept_channel & udp_recv_channel
configured for simplicity
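For reference, a configuration of the kind described above might look roughly
like the fragment below. This is only a sketch, not the reporter's actual
gmond.conf: the host name is a placeholder, and port 8649 is assumed because it
is gmond's default.

```
globals {
  deaf = no    /* node still listens on its receive channels */
  mute = no    /* node still sends its own metrics */
}

/* Unicast to a designated aggregation node (placeholder host name). */
udp_send_channel {
  host = aggregator.example.com
  port = 8649
}

/* Configured on every node for simplicity, even on non-aggregators. */
udp_recv_channel {
  port = 8649
}

tcp_accept_channel {
  port = 8649
}
```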
When configured as above, those nodes which have deaf = no leak memory at a
pretty significant rate, reaching hundreds of megs within 12 hours. The nodes
which are aggregators, which are the destination of the udp_send_channel
configured on all nodes, do not leak memory.
So it appears that if you have a gmond node which is not receiving metrics
through a configured rx channel it will leak memory.
Modifying the above configuration by removing the tcp_accept_channel &
udp_recv_channel and setting mute = yes will cause the leak to stop.
Per a comment from Kostas Georgiou on IRC, I changed the check in
gmond.c from "if ((now - udp_last_heard) > 60 * APR_USEC_PER_SEC)" to 60000,
and the problem "goes away".
I am attaching an image that shows the memory consumption originally and with
the above change.
Created an attachment (id=277) [details]
Gmond memory utilization, before and after the change
Just curious. You write "Modifying the above configuration by removing the
tcp_accept_channel & udp_recv_channel and setting mute = yes will cause the
leak to stop."
But if you set "mute=yes", the gmond will not send any data any more. did you
Good catch - yes, I meant that you must change deaf = yes.
Current attempt for a fix at git://github.com/georgiou/monitor-core.git in the
The first commit resets the channels only once every 60 secs instead of every
cycle, reducing the effects of the leak.
The second commit fixes a leak in join_mcast and should also be safe.
The third commit fixes the main leak but it will need a second pair of eyes and
some testing to make sure it doesn't break anything.
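To illustrate the idea behind the first commit (reset at most once per interval
rather than on every cycle), here is a minimal standalone sketch. It is not the
actual patch: reset_throttle, throttle_init and throttle_should_reset are
hypothetical names, and plain int64_t microsecond timestamps stand in for APR's
time type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for APR's microseconds-per-second constant. */
#define USEC_PER_SEC INT64_C(1000000)

/* Hypothetical bookkeeping for throttled channel resets. */
typedef struct {
    int64_t  last_reset_usec; /* time of the last reset, in microseconds */
    int64_t  interval_usec;   /* minimum spacing between resets */
    unsigned reset_count;     /* how many resets have been allowed so far */
} reset_throttle;

static void throttle_init(reset_throttle *t, int64_t now_usec,
                          int64_t interval_sec)
{
    assert(interval_sec > 0);
    t->last_reset_usec = now_usec;
    t->interval_usec   = interval_sec * USEC_PER_SEC;
    t->reset_count     = 0;
}

/*
 * Call once per main-loop cycle. Returns true only when at least
 * interval_usec has elapsed since the previous reset, and records the
 * new reset time in that case.
 */
static bool throttle_should_reset(reset_throttle *t, int64_t now_usec)
{
    if (now_usec - t->last_reset_usec < t->interval_usec)
        return false;
    t->last_reset_usec = now_usec;
    t->reset_count++;
    return true;
}
```

In a loop that wakes up every second with a 60 second interval, a caller would
tear down and recreate the channels only when throttle_should_reset() returns
true. Any per-reset leak then happens once a minute instead of once a cycle,
which reduces the leak but does not remove it.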
in review to merge after 3.3.5 gets released in :