Submitted by: Sylvain GALLIANO sg@efficientip.com
Steps to reproduce:
/dev/null
You should see:
==86177== 9,568 bytes in 26 blocks are definitely lost in loss record 337 of 339
==86177==    at 0x4C1EC90: calloc (in /data1/valgrind/vgpreload_memcheck-amd64-freebsd.so)
==86177==    by 0x5127B29: netsnmp_swrun_entry_create (swrun.c:306)
==86177==    by 0x5139FA4: netsnmp_arch_swrun_container_load (swrun_kinfo.c:203)
==86177==    by 0x512770D: netsnmp_swrun_container_load (swrun.c:224)
==86177==    by 0x51273E0: _cache_load (swrun.c:140)
==86177==    by 0x4E3C867: _cache_load (cache_handler.c:700)
==86177==    by 0x4E3C295: netsnmp_cache_check_and_reload (cache_handler.c:542)
==86177==    by 0x4E3C6DA: netsnmp_cache_helper_handler (cache_handler.c:638)
==86177==    by 0x4E52D61: netsnmp_call_handler (agent_handler.c:526)
==86177==    by 0x4E5328F: netsnmp_call_next_handler (agent_handler.c:640)
==86177==    by 0x4E45392: table_helper_handler (table.c:712)
==86177==    by 0x4E52D61: netsnmp_call_handler (agent_handler.c:526)
The issue comes from duplicate index insertion into hrSWRunTable: all threads
are retrieved, so the same PID is used as an index more than once.
diff -ru net-snmp-5.7.3-leak/agent/mibgroup/host/data_access/swrun_kinfo.c net-snmp-5.7.3/agent/mibgroup/host/data_access/swrun_kinfo.c
--- net-snmp-5.7.3-leak/agent/mibgroup/host/data_access/swrun_kinfo.c 2016-06-07 18:48:23.000000000 +0200
+++ net-snmp-5.7.3/agent/mibgroup/host/data_access/swrun_kinfo.c 2016-06-07 18:48:40.000000000 +0200
@@ -189,7 +189,7 @@
 #elif defined(HAVE_KVM_GETPROC2)
     proc_table = kvm_getproc2(kd, KERN_PROC_ALL, 0, sizeof(struct kinfo_proc2), &nprocs );
 #else
-    proc_table = kvm_getprocs(kd, KERN_PROC_ALL, 0, &nprocs );
+    proc_table = kvm_getprocs(kd, KERN_PROC_PROC, 0, &nprocs );
 #endif
     for ( i=0 ; i<nprocs; i++ ) {
         if ( 0 == proc_table[i].SWRUN_K_STAT )
This should not happen :-)
Can you give me the output from "procstat -at" and/or "ps -auxH -o lwp" when it happens?
I added a log in swrun_kinfo.c:
When running a 'snmpwalk -v 2c -c public 10.0.231.14 HOST-RESOURCES-MIB::hrSWRunTable'
I got these duplicate lines:
got ID 827
got ID 825
got ID 808
got ID 808
got ID 808
got ID 808
got ID 805
got ID 804
-> 808 is nfsd (multi-threaded)
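To put a number on the log above, here is a minimal sketch in plain C. It is not part of snmpd, and count_duplicate_ids is a hypothetical helper name. It counts how many IDs in a list repeat an earlier one; assuming the table is keyed on PID, each of those would be a duplicate index.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, for illustration only (not a net-snmp function).
 * Returns how many elements of ids[] repeat a value that already
 * appeared earlier in the array. */
static size_t count_duplicate_ids(const int *ids, size_t n)
{
    size_t i, j, dups = 0;

    assert(ids != NULL || n == 0);
    for (i = 1; i < n; i++) {
        for (j = 0; j < i; j++) {
            if (ids[j] == ids[i]) {
                dups++;     /* ids[i] was already seen at index j */
                break;
            }
        }
    }
    return dups;
}
```

For the eight IDs logged above (827, 825, 808 four times, 805, 804) this returns 3: the first 808 is a new key and the other three are repeats.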
OK, I see the problem. Yes, you are right that replacing KERN_PROC_ALL with KERN_PROC_PROC would fix it.
On the other hand, I would like to keep the kernel threads visible, but maybe that is not possible with one sweep of the proc table.
I will have to think a bit more about this.
Thank you for the patch.
We have heavily threaded Java applications and had to restart snmpd every couple of weeks because of this leak.
The patch works perfectly for us :-)
Actually, CONTAINER_INSERT returns -1 if the entry is a duplicate.
So the correct solution is:
--- swrun_kinfo.c.orig 2017-05-04 20:37:09.650803000 +0200
+++ swrun_kinfo.c 2017-05-04 20:38:46.640120000 +0200
@@ -176,7 +176,7 @@
 netsnmp_arch_swrun_container_load( netsnmp_container *container, u_int flags)
 {
     struct SWRUN_TABLE *proc_table;
-    int nprocs, i, rc;
+    int nprocs, i;
     char buf[BUFSIZ+1], **argv;
     netsnmp_swrun_entry *entry;
@@ -203,7 +203,12 @@
         entry = netsnmp_swrun_entry_create(proc_table[i].SWRUN_K_PID);
         if (NULL == entry)
             continue; /* error already logged by function */
-        rc = CONTAINER_INSERT(container, entry);
+        if ( -1 == CONTAINER_INSERT(container, entry)) {
+            // entry didn't get inserted (duplicate key)
+            free(entry);
+            continue;
+        }
+
That way kernel threads get inserted, normal processes get inserted (once), and memory doesn't leak.
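The same free-on-failed-insert pattern can be tried outside snmpd with a small stand-alone sketch. Everything here is an assumption made for illustration: toy_entry, toy_container, toy_insert, toy_load and toy_clear are hypothetical stand-ins, not net-snmp types or functions, and the toy container simply rejects a PID it already holds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are NOT the net-snmp types. */
typedef struct { int pid; } toy_entry;

typedef struct {
    toy_entry *items[64];
    size_t     count;
} toy_container;

/* Insert keyed on pid. Returns 0 on success, -1 if the pid is already
 * present or the container is full. */
static int toy_insert(toy_container *c, toy_entry *e)
{
    size_t i;

    assert(c != NULL && e != NULL);
    for (i = 0; i < c->count; i++)
        if (c->items[i]->pid == e->pid)
            return -1;                       /* duplicate key */
    if (c->count >= sizeof(c->items) / sizeof(c->items[0]))
        return -1;                           /* no room left */
    c->items[c->count++] = e;
    return 0;
}

/* Allocate one entry per pid and try to insert it. Returns how many
 * entries were allocated but ended up owned by nobody (leaked). */
static size_t toy_load(toy_container *c, const int *pids, size_t n,
                       int free_on_failure)
{
    size_t i, leaked = 0;

    for (i = 0; i < n; i++) {
        toy_entry *e = calloc(1, sizeof(*e));
        if (NULL == e)
            continue;
        e->pid = pids[i];
        if (-1 == toy_insert(c, e)) {
            if (free_on_failure)
                free(e);                     /* patched behaviour */
            else
                leaked++;                    /* pointer dropped: the leak */
        }
    }
    return leaked;
}

/* Release everything the container owns. */
static void toy_clear(toy_container *c)
{
    size_t i;

    for (i = 0; i < c->count; i++)
        free(c->items[i]);
    c->count = 0;
}
```

Loading the eight IDs from the log with free_on_failure set leaves five entries in the container and nothing leaked; with it cleared, the three rejected 808 entries are counted as leaked, which is the pattern valgrind reported.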
Will upload patch file.
Last edit: Markus Wennrich 2017-05-04
Fixed by commit [5846564f5be46e0e362be894d4cb57be383c5b3d].