Bug 198

Summary: 3.1.0 gmetad segfaults when aggregating XML from another 3.1.0 gmetad
Product: Ganglia Monitoring System
Reporter: Bernard Li <bernard@vanhpc.org>
Component: gmetad
Assignee: Brad Nicholes <bnicholes@novell.com>
Status: RESOLVED FIXED
Severity: critical
CC: carenas@sajinet.com.pe
Priority: P2
Version: 3.1.x
Hardware: PC
OS: All
URL: http://www.mail-archive.com/ganglia-developers@lists.sourceforge.net/msg04584.html
Attachments: proposed backport patch for 3.1

Opened: 2008-08-08 16:49
Description:
When gmetad has a data_source which points to another gmetad's non-interactive
port (8651 by default), it will segfault.
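
For illustration, a configuration of the kind described above might look like the following gmetad.conf fragment. This is only a sketch: the host name is a placeholder, and the source name "Test" is taken from the debug output in this report, not from the reporter's actual configuration file.

```
# gmetad.conf on the aggregating gmetad
# (hypothetical host name; 8651 is the default non-interactive XML port)
data_source "Test" downstream-gmetad.example.org:8651
```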

gmetad debug output:

[Test] is a 2.5 or later data stream
hash_create size = 50
hash->size is 53
Found a <GRID>, depth is now 1
Segmentation fault

strace output:

0xfee2b0b4)          = ? ERESTART_RESTARTBLOCK (To be restarted)
+++ killed by SIGSEGV +++

gdb output:

#0  0x000000300160af10 in hash_lookup (key=0x44603a90, hash=0x0) at hash.c:304
#1  0x000000000040590b in startElement_EXTRA_ELEMENT (data=0x44604c50, el=<value optimized out>, attr=0x621980)
    at process_xml.c:731
#2  0x0000000000405eac in start (data=0x44604c50, el=0x61f9dc "EXTRA_ELEMENT", attr=0x621980) at process_xml.c:1010
#3  0x0000003005209e49 in strcmp () from /lib64/libexpat.so.1
#4  0x000000300520acf4 in strcmp () from /lib64/libexpat.so.1
#5  0x000000300520be19 in strcmp () from /lib64/libexpat.so.1
#6  0x000000300520ce2b in strcmp () from /lib64/libexpat.so.1
#7  0x0000003005203fb1 in XML_ParseBuffer () from /lib64/libexpat.so.1
#8  0x0000000000405150 in process_xml (d=0x60fb10,
    buf=0x610dd0 "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"
standalone=\"yes\"?>\n<!DOCTYPE GANGLIA_XML [\n   <!ELEMENT GANGLIA_XML
(GRID|CLUSTER|HOST)*>\n      <!ATTLIST GANGLIA_XML VERSION CDATA #REQUIRED>\n  
   <!ATTLIST"...)
    at process_xml.c:1186
#9  0x0000000000403f2d in data_thread (arg=<value optimized out>) at data_thread.c:160
#10 0x0000003001e06407 in start_thread () from /lib64/libpthread.so.0
#11 0x00000030012d4b0d in clone () from /lib64/libc.so.6
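
Frame #0 shows hash_lookup being called with hash=0x0 from startElement_EXTRA_ELEMENT, so the crash is a NULL table pointer being dereferenced during the lookup. The sketch below illustrates one generic way to guard such a lookup. It is an assumption-laden illustration, not the attached patch: table_t, table_lookup and handle_extra_element are hypothetical names and do not reflect Ganglia's hash.c or process_xml.c code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature key/value table; NOT Ganglia's hash_t. */
typedef struct {
    const char **keys;
    const char **values;
    size_t count;
} table_t;

/* Guarded lookup: returns NULL for a missing table or key instead of
 * dereferencing a NULL table pointer. */
const char *table_lookup(const table_t *table, const char *key)
{
    if (table == NULL || key == NULL)
        return NULL;
    for (size_t i = 0; i < table->count; i++) {
        if (strcmp(table->keys[i], key) == 0)
            return table->values[i];
    }
    return NULL;
}

/* Hypothetical element handler: returns 1 when the named entry is found,
 * 0 when there is no table for the current context or no such entry. */
int handle_extra_element(const table_t *metrics, const char *name)
{
    const char *value = table_lookup(metrics, name);
    return value != NULL;
}
```

With this guard, calling handle_extra_element(NULL, "cpu_num") returns 0 rather than crashing. Whether the real fix skips the element, creates the table, or does something else is determined by the attached patch, not by this sketch.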
------- Comment #1 From Carlo Marcelo Arenas Belon 2008-08-08 17:13:25 -------
This is partially a duplicate of BUG188, which already has the patch uploaded.

I will upload it here too for completeness, and to track it getting merged into
3.1.1, so that this bug is not wasted.
------- Comment #2 From Carlo Marcelo Arenas Belon 2008-08-08 17:14:37 -------
Created an attachment (id=152) [details]
proposed backport patch for 3.1

Also uploaded in BUG188.
------- Comment #3 From Carlo Marcelo Arenas Belon 2008-08-15 06:01:07 -------
Committed revision 1657 for 3.1.