Monitoring Oracle with Intelligent Agent and net-snmp or ucd-snmp

It's *pure* voodoo. You make Oracle's "normal" SNMP agent, called 'master_peer', act as the master agent for the system. Then Oracle's 'encap_peer' will look for an SNMP "subagent" on port UDP:1161. That's where you put ucd-snmp or net-snmp. Oracle will essentially proxy all requests that aren't its own to the other program. This lets you get to the normal SNMP, HOST, et al. MIBs. Read on...

The purpose of this document is to walk through installation, configuration, and examination of the SNMP capabilities of Oracle's Intelligent Agent. This document is specific to Unix, but directions exist for Windows. Oracle provides its own SNMP master agent, and you'll use an Oracle wrapper around net/ucd-snmp. I tried to do it the other way around, making net-snmp the master and *just* using dbsnmp's port 1748 (the SNMP port), but it didn't work. So this is the way for now.

First connect to the instance(s) in sqlplus as sysdba and run 'catsnmp.sql'. This script creates an SNMPAGENT role in the database that has privs on V$'s to harvest stats. It also creates a user called DBSNMP that is assigned the SNMPAGENT role. There is also an undo script called 'catnsnmp.sql'.


@?/rdbms/admin/catsnmp.sql
The MIBs end in .v1 and live in $ORACLE_HOME/network/doc. Read the top couple of lines of each file to see which MIB is which. The oracleDDB and rdbms MIBs were the most important for my needs. You'll probably want to copy those to your MIB directory (/usr/local/share/snmp/mibs, ymmv). One of the MIBs is broken for net-snmp. Edit onrs.v1:

        Counter
                FROM RFC1155-SMI

becomes
 
        Counter, enterprises
                FROM RFC1155-SMI
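If you'd rather script that edit than do it by hand, something like the sketch below should work. The function name fix_onrs_imports is my own invention, and it assumes the import looks exactly like the snippet above (a line holding only "Counter", followed by the FROM RFC1155-SMI line). Run it on your copy of the MIB, not on Oracle's original.

```shell
# Hypothetical helper: add "enterprises" to the RFC1155-SMI import in onrs.v1.
# $1 is the path to your *copy* of onrs.v1. Running it twice is harmless.
fix_onrs_imports() {
    mib=$1
    tmp="$mib.tmp.$$"
    awk '
        # A "Counter"-only line was held back; decide now that we see the next line.
        held != "" {
            if ($0 ~ /FROM[ \t]+RFC1155-SMI/)
                sub(/Counter/, "Counter, enterprises", held)
            print held
            held = ""
        }
        # Hold back a line that contains nothing but "Counter".
        /^[ \t]*Counter[ \t]*$/ { held = $0; next }
        { print }
        END { if (held != "") print held }
    ' "$mib" > "$tmp" && mv "$tmp" "$mib"
}
```

Usage would be something like 'fix_onrs_imports /usr/local/share/snmp/mibs/onrs.v1' (adjust the path to wherever you copied the MIBs).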
Now we need to make some small changes in oracle's configuration. Look in $ORACLE_HOME/network/snmp/peer. You'll see
  • snmp.conf - crap, they expect you to feed this to your SNMP daemon. We don't.
  • master_peer - the "master" agent. It's an SNMP server.
  • CONFIG.master - config for master server (oracle's snmp server)
  • encap_peer - the encapsulation server shim between master_peer and snmpd. It passes requests to native snmp on 1161
  • CONFIG.encap - config for the encap server
  • start_peer - a silly wrapper script to start master, encap, and then your native snmp server.
You can probably leave CONFIG.master alone, except maybe to change the IP of your SNMP trap server. I don't have one, so I just set up a read-only community that only accepts requests from localhost.

COMMUNITY public
ALLOW GET OPERATIONS
USE NO ENCRYPTION
MEMBERS localhost
Edit CONFIG.encap and add any extra OIDs you want encapsulated and passed to your native SNMP server. The extra OIDs shown are host, internet and rdbms.

AGENT AT PORT 1161 WITH COMMUNITY public 
SUBTREES        1.3.6.1.2.1.1, 
                1.3.6.1.2.1.2, 
                1.3.6.1.2.1.3, 
                1.3.6.1.2.1.4, 
                1.3.6.1.2.1.5, 
                1.3.6.1.2.1.6, 
                1.3.6.1.2.1.7, 
                1.3.6.1.2.1.8, 
                1.3.6.1.2.1.25, 
                1.3.6.1.2.1.39, 
                1.3.6.1.4.1.2021, 
                1.3.6.1.4.1.11.2 
FORWARD ALL TRAPS; 
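The comma placement in that SUBTREES list is easy to get wrong when you add or remove OIDs, so here is a small sketch that prints the block from a list of arguments. The function name gen_encap_config is made up for illustration, and the layout simply mirrors the file shown above. I haven't checked what else the CONFIG.encap grammar allows.

```shell
# Hypothetical helper: print a CONFIG.encap block in the layout shown above.
# Usage: gen_encap_config PORT COMMUNITY OID [OID...]
gen_encap_config() {
    port=$1
    community=$2
    shift 2
    printf 'AGENT AT PORT %s WITH COMMUNITY %s\n' "$port" "$community"
    prefix='SUBTREES        '
    while [ $# -gt 0 ]; do
        oid=$1
        shift
        # Every OID except the last one is followed by a comma.
        if [ $# -gt 0 ]; then sep=','; else sep=''; fi
        printf '%s%s%s\n' "$prefix" "$oid" "$sep"
        prefix='                '
    done
    printf 'FORWARD ALL TRAPS;\n'
}
```

For example, 'gen_encap_config 1161 public 1.3.6.1.2.1.1 1.3.6.1.2.1.25 > CONFIG.encap.new' writes a two-subtree config you can diff against the real file before swapping it in.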
Now to fix Oracle's start_peer script (another lame Oracle script), which starts master, encap, and snmpd. Make sure SNMPD= points to your snmpd daemon (/usr/local/sbin/snmpd, ymmv). You should point SNMPD_CONF= to a valid snmpd.conf file. I've found you can do without one, but then you get whatever the defaults are. Snmpd config files are out of scope here; see your snmpd.conf man page to write a suitable one. Then fix the line at the end that calls ucd/net snmp so that it uses (or doesn't use) the right config file. Note the difference in how the two programs get their listening addresses. Change:

        $SNMPD -c $SNMPD_CONFIG -p $NEW_SNMPD_PORT >snmpd.out 2>&1 &
to (ucd snmp)
        $SNMPD -C -c $SNMPD_CONFIG -p $NEW_SNMPD_PORT  >snmpd.out 2>&1 &
or to (net-snmp)
        $SNMPD -C -c $SNMPD_CONFIG $HOSTNAME:$NEW_SNMPD_PORT  >snmpd.out 2>&1 &
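To make the difference between the two flavors explicit, here is a sketch of a helper that prints the snmpd arguments for each. The function name snmpd_args and the "ucd"/"net" labels are my own. The argument forms are just the two lines above, so treat this as an illustration rather than anything start_peer itself does.

```shell
# Hypothetical helper: print the snmpd arguments for a given flavor.
# Usage: snmpd_args ucd|net CONFIG_FILE HOSTNAME PORT
snmpd_args() {
    flavor=$1
    conf=$2
    host=$3
    port=$4
    case "$flavor" in
        # ucd-snmp takes the listening port via -p
        ucd) printf '%s\n' "-C -c $conf -p $port" ;;
        # net-snmp takes the listening address as a trailing host:port argument
        net) printf '%s\n' "-C -c $conf $host:$port" ;;
        *)   echo "unknown flavor: $flavor" >&2; return 1 ;;
    esac
}
```

With that, 'snmpd_args net /usr/local/share/snmp/snmpd.conf localhost 1161' prints the net-snmp form, so you can see exactly what you're about to paste into start_peer.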
Make sure your native snmpd isn't running. Now you can start daemons with './start_peer -a'. You'll get output like this:

Starting master_peer ...
./master_peer CONFIG.master NOV >master_peer.out 2>&1 &
Done!

Starting encap_peer ...
./encap_peer -t 1162 -s 1160 -c CONFIG.encap >encap_peer.out 2>&1 &
Done!

Starting /usr/sbin/snmpd ...
/usr/sbin/snmpd -C -c /usr/local/share/snmp/snmpd.conf -p 1161 >snmpd.out 2>&1 &
Done!
Now you can start Oracle's dbsnmp processes. You use lsnrctl to manage this: start the dbsnmp service by typing 'lsnrctl dbsnmp_start' as the oracle user. Two dbsnmp processes start. One listens on TCP:1748 as the SNMP agent (it only communicates with iAgent). The other is the Intelligent Agent's RPC handler on port TCP:1754 (mainly for OEM). I'd like to turn the second one off, but haven't figured out how yet.
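A quick way to confirm that both dbsnmp processes came up is to look for the two listening ports. The sketch below reads 'netstat -an' output on stdin and checks for a given port in a LISTEN line. The function name port_is_listening is hypothetical, and the parsing assumes the usual Linux (addr:port) or Solaris (addr.port) column style. On older Solaris you may need nawk instead of awk for the -v option.

```shell
# Hypothetical helper: succeed if PORT shows up as a listening TCP socket.
# Usage: netstat -an | port_is_listening PORT
port_is_listening() {
    awk -v port="$1" '
        /LISTEN/ {
            # Match a field ending in ":PORT" (Linux) or ".PORT" (Solaris).
            for (i = 1; i <= NF; i++)
                if ($i ~ ("[.:]" port "$")) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}
```

For example: 'netstat -an | port_is_listening 1748 && echo "SNMP side of dbsnmp is up"', and the same again for 1754.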

Finally, you want to wrap this all up into a nice startup script. Remember that iAgent is now starting your snmpd daemon for you, so you'll want to remove snmpd from your rc scripts. Oracle doesn't have an SNMP trap daemon running, so if this is a trap station, keep snmptrapd in init. Here's my init script for snmp (oracle_intelligent_agent). It's not very elegant in the cleanup, but Oracle doesn't give you much to work with. Make sure this script starts after Oracle, and is stopped before Oracle is stopped. You have to hack the Sun init scripts to get the right environment. The one below is for Linux. If you need Sun help, contact me.


#!/bin/bash

# Simple script to start oracle's SNMP subsystem
# which also manages the native snmpd.
#
# This script gets run with /bin/sh on solaris
# so watch your syntax bash-boy
#

ORACLE_HOME=/oracle/product/8.1.7
ORACLE_OWNER=oracle
export ORACLE_HOME
export ORACLE_OWNER

case "$1" in
 'start')
    echo "Starting Oracle Intelligent Agent"
    cd "$ORACLE_HOME/network/snmp/peer/" || exit 1
    ./start_peer -a
    sleep 2
    su - $ORACLE_OWNER -c "$ORACLE_HOME/bin/lsnrctl dbsnmp_start"
    ;;
 'stop')  # Stop the Intelligent Agent, then the peers and the native snmpd
    echo "Stopping oracle intelligent agent"
    su - $ORACLE_OWNER -c "$ORACLE_HOME/bin/lsnrctl dbsnmp_stop"
    pkill master_peer 
    pkill encap_peer
    pkill snmpd
    sleep 2
    if pgrep "master_peer|encap_peer|snmpd" >/dev/null
    then
      pkill -KILL master_peer 
      pkill -KILL encap_peer
      pkill -KILL snmpd
    fi
    ;;
 *)
    echo "Usage: $0 start|stop"
    ;;
esac
Now check that everything is working. Can you snmpwalk the system? On Solaris with net-snmp I used 'snmpwalk -Of -m all -c public -v 1 localhost .', which basically says: use all MIBs to give fully qualified descriptions of all OIDs on public@localhost using SNMPv1, the only version I can find that Oracle supports. Using ucd-snmp on Linux I used 'snmpwalk -Of -m all localhost public .'. You should get a HUGE list that includes HOST and many of the Oracle OIDs.

[...]
application.applTable.applEntry.applIndex.5 = INTEGER: 5
application.applTable.applEntry.applName.2 = STRING: "bb6"
[...]
rdbmsMIB.rdbmsObjects.rdbmsDbTable.rdbmsDbEntry.rdbmsDbIndex.2 = INTEGER: 2
rdbmsMIB.rdbmsObjects.rdbmsDbTable.rdbmsDbEntry.rdbmsDbIndex.5 = INTEGER: 5
[...]
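Since the list is huge, it helps to check for the subtrees you care about instead of eyeballing it. Here is a sketch of a filter that reads walk output on stdin and succeeds if a given name appears as a component of any OID. The name walk_has is hypothetical, and it assumes the textual "name.name.name = TYPE: value" output shown above.

```shell
# Hypothetical helper: succeed if NAME is a dot-delimited component of any
# OID in snmpwalk output read from stdin.
# Usage: snmpwalk ... | walk_has NAME
walk_has() {
    awk -v name="$1" '
        {
            split($0, parts, " = ")          # parts[1] is the OID side
            n = split(parts[1], comp, "[.]") # split the OID into components
            for (i = 1; i <= n; i++)
                if (comp[i] == name) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}
```

For instance, 'snmpwalk -Of -m all -c public -v 1 localhost . | walk_has rdbmsMIB' tells you whether the Oracle side answered. Repeat with a name from the native agent's MIBs to confirm the proxying works too.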

Cool huh? You're basically back where you started with a working SNMP agent, but with Oracle doing proxy magic to net/ucd snmp. You should shore up the security by tailoring the config files and your firewall to your needs. Now you just need to figure out which datums are important to you and throw them at MRTG or HPOV or whatever.

Problems? The 'peers' have some kind of lock on a socket or file that takes 5 minutes to release. So you either have to wait or reboot to start master_peer or encap_peer. These are files to check for messages:

  • $ORACLE_HOME/network/logs/dbsnmpw.log
  • $ORACLE_HOME/network/snmp/peer/master_peer.out
  • $ORACLE_HOME/network/snmp/peer/encap_peer.out
  • $ORACLE_HOME/network/snmp/peer/snmpd.out
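To save some typing when something goes wrong, this sketch prints the tail of each of those files that exists and isn't empty. The function name agent_log_tails and the default of 20 lines are my own choices. The paths are the ones listed above, relative to whatever ORACLE_HOME you pass in.

```shell
# Hypothetical helper: show the last N lines of each agent log that has content.
# Usage: agent_log_tails ORACLE_HOME [N]
agent_log_tails() {
    home=$1
    lines=${2:-20}
    for f in \
        "$home/network/logs/dbsnmpw.log" \
        "$home/network/snmp/peer/master_peer.out" \
        "$home/network/snmp/peer/encap_peer.out" \
        "$home/network/snmp/peer/snmpd.out"
    do
        # Skip files that are missing or empty.
        if [ -s "$f" ]; then
            echo "==> $f <=="
            tail -n "$lines" "$f"
        fi
    done
}
```

Run it as 'agent_log_tails "$ORACLE_HOME" 5' right after a failed start_peer.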

Some references I used
http://download-west.oracle.com/docs/cd/A87860_01/doc/em.817/a85249/toc.htm
Google groups discussion
http://sourceforge.net/docman/display_doc.php?docid=5201&group_id=12694

4 Comments

I've been looking for a reference like this. Do you have anything similar for Solaris 9's default snmpdx daemon?

Nice write up. Only problem I'm having (on SuSE 9.1, Oracle 9.0.2.4) is that start_peer starts the master on udp/161, starts the peer, then starts the native snmpd (/usr/sbin/snmpd) which then tries to bind to udp/161 and tcp/1161. The udp bind fails, of course. Any ideas/suggestions?

I am trying to set up Oracle so that we can monitor its performance using SNMP data. You are correct that Oracle support is horrible. Would it be possible for you to guide me? Our platform is Oracle 9i on Solaris 2.8.

Hello guys. I've got a recipe for Oracle 9.2 on Debian GNU/Linux 3.1 (Sarge).

Thanks to Jim Weller for his article - it helped a lot.

The procedure is as follows ('#' is root's command prompt, '$' is oracle's command prompt):

* install Oracle 9.2

* install SNMP daemon and stop it immediately:
# aptitude install snmpd
# /etc/init.d/snmpd stop

* edit Oracle config files in $ORACLE_HOME/network/snmp/peer:

- CONFIG.encap:

AGENT AT PORT 1161 WITH COMMUNITY public
SUBTREES 1.3.6.1.2.1.1,
1.3.6.1.2.1.2,
1.3.6.1.2.1.3,
1.3.6.1.2.1.4,
1.3.6.1.2.1.5,
1.3.6.1.2.1.6,
1.3.6.1.2.1.7,
1.3.6.1.2.1.8,
1.3.6.1.2.1.25,
1.3.6.1.2.1.39,
1.3.6.1.4.1.77,
1.3.6.1.4.1.2021,
1.3.6.1.4.1.11.2
FORWARD ALL TRAPS;

- CONFIG.master:

COMMUNITY public
ALLOW GET OPERATIONS
USE NO ENCRYPTION
MEMBERS localhost

- start_peer (change the lines that start snmpd to the following):

echo "$SNMPD -p $NEW_SNMPD_PORT >snmpd.out 2>&1 &"
$SNMPD -p $NEW_SNMPD_PORT >snmpd.out 2>&1 &

* copy Oracle MIBs into snmpd mib's directory:

# cp $ORACLE_HOME/network/doc/*.v1 /usr/share/snmp/mibs/

* fix /usr/share/snmp/mibs/onrs.v1:
old content:
Counter
FROM RFC1155-SMI
fixed content:
Counter, enterprises
FROM RFC1155-SMI

* add missing mib RFC1316-MIB

# cd /usr/share/snmp/mibs
# wget http://www.simpleweb.org/ietf/mibs/modules/IETF/txt/RFC1316-MIB

* run Oracle's catsnmp.sql script on behalf of SYSDBA

$ sqlplus "/ as sysdba" @$ORACLE_HOME/rdbms/admin/catsnmp.sql

* start 'start_peer':

# cd $ORACLE_HOME/network/snmp/peer
# ./start_peer -a

* make 'dbsnmp' executable (for some strange reason my 'dbsnmp' was not executable):
$ chmod +x $ORACLE_HOME/bin/dbsnmp

* then start Oracle Intelligent Agent:

$ agentctl start

* check agent status:

$ agentctl status

* check how the agent is working:

$ snmpwalk -t 30 -M /usr/share/snmp/mibs/ -m all -c public localhost rdbmsMIB

$ snmpwalk -t 30 -M /usr/share/snmp/mibs/ -m all -c public localhost enterprises

(notice that you have to use long timeouts, 30 seconds in this case, because the agent responds very slowly).

* create /etc/init.d/orasnmp script:

#!/bin/bash
#
# orasnmp Start/Stop Oracle SNMP agent
#

ORACLE_HOME=
ORACLE_OWNER=oracle
export ORACLE_HOME
export ORACLE_OWNER

case "$1" in
 'start')
    echo "Starting Oracle Intelligent Agent"
    cd $ORACLE_HOME/network/snmp/peer/
    ./start_peer -a
    sleep 2
    su - $ORACLE_OWNER -c "$ORACLE_HOME/bin/agentctl start"
    ;;
 'stop')
    echo "Stopping Oracle Intelligent Agent"
    su - $ORACLE_OWNER -c "$ORACLE_HOME/bin/agentctl stop"
    pkill master_peer
    pkill encap_peer
    pkill snmpd
    sleep 2
    # Anything still alive after the grace period gets SIGKILL
    if pgrep "master_peer|encap_peer|snmpd"
    then
      pkill -KILL master_peer
      pkill -KILL encap_peer
      pkill -KILL snmpd
    fi
    ;;
 'status')
    echo "Oracle Intelligent Agent status:"
    su - $ORACLE_OWNER -c "$ORACLE_HOME/bin/agentctl status"
    ;;
 *)
    echo "Usage: $0 start|stop|status"
    exit 1
    ;;
esac

* then turn off snmpd and turn on orasnmp script in startup:

# update-rc.d -f snmpd remove
# update-rc.d orasnmp defaults 50 35

(notice that orasnmp has to start after the Oracle database has started)