Topics: AIX, System Administration

Configuring dsh

The dsh (distributed shell) is a very useful (and powerful) utility that can be used to run commands on multiple servers at the same time. By default it is not installed on AIX, but you can install it yourself:

First, install the dsm filesets. DSM is short for Distributed Systems Management, and these filesets include the dsh command. They can be found on the AIX installation media. Once installed, lslpp should list them as follows:

# lslpp -l | grep -i dsm
  dsm.core  COMMITTED  Distributed Systems Management
  dsm.dsh  COMMITTED  Distributed Systems Management
  dsm.core  COMMITTED  Distributed Systems Management
Next, set up the environment variables that dsh uses. The best way to do this is to put them in the .profile of the root user (~root/.profile), so you won't have to set them manually every time you log in:
# cat .profile
alias bdf='df -k'
alias cls="tput clear"
stty erase ^?
export TERM=vt100

# For DSH
export DSH_NODE_RSH=/usr/bin/ssh
export DSH_NODE_LIST=/root/hostlist
export DSH_NODE_OPTS="-q"
export DSH_REMOTE_CMD=/usr/bin/ssh
export DCP_NODE_RCP=/usr/bin/scp
In the .profile listing above, the variable DSH_NODE_LIST is set to /root/hostlist. You can change this to any file name you like. DSH_NODE_LIST points to a text file containing server names, one per line. Every host listed in that file becomes a target of the dsh command. So, if you put 3 hostnames in the file and then run a dsh command, that command will be executed on these 3 hosts in parallel.

Note: You may also use the environment variable WCOLL instead of DSH_NODE_LIST.

So, create the file /root/hostlist (or whichever file you've configured in environment variable DSH_NODE_LIST), and add hostnames to it. For example:
# cat /root/hostlist
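A host list is plain text with one hostname per line. As a minimal sketch, the following writes three placeholder hostnames (host1 through host3 are hypothetical) to a scratch file and counts them; the scratch path is an assumption chosen so that /root/hostlist is not overwritten by accident.

```shell
# Write a sample host list; host1..host3 are hypothetical names.
# HOSTLIST defaults to a scratch file so /root/hostlist is left alone.
HOSTLIST=${HOSTLIST:-/tmp/hostlist.example}
cat > "$HOSTLIST" <<'EOF'
host1
host2
host3
EOF

# Count the non-empty lines, i.e. the number of target hosts.
count_hosts() {
    grep -c . "$1"
}
count_hosts "$HOSTLIST"
```

Once the names are correct, copy the file to the location that DSH_NODE_LIST points to.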
Next, you'll have to set up the ssh keys for every host in the hostlist file. The dsh command uses ssh to run commands, so you'll have to enable password-less ssh communication from the host where you've installed dsh (let's call that the source host) to all the hosts where you want to run commands using dsh (we'll call those the target hosts).

To set this up, follow these steps:
  • Run "ssh-keygen -t rsa" as user root on the source and all target hosts.
  • Next, copy the contents of the public key file in ~root/.ssh/ from the source host into file ~root/.ssh/authorized_keys on all the target hosts.
  • Test if you can ssh from the source host to all the target hosts, by running: "ssh host1 date", for each target host. If you're using DNS, and have fully qualified domain names configured for your hosts, you will want to test by performing an ssh to the fully qualified domain name instead, for example: "ssh". This is because dsh will also resolve hostnames through DNS, and thus use these instead of the short host names. You will be asked a question when you run ssh for the first time from the source host to the target host. Answer "yes" to add an entry to the known_hosts file.
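The per-host ssh test can be generated from the host list instead of typed by hand. The sketch below is a dry run: it only prints one ssh command per host and does not connect anywhere. The function name is hypothetical, and the BatchMode option (an OpenSSH client option that makes ssh fail rather than prompt for a password) is an addition that is not part of the steps above.

```shell
# Print one ssh test command per host in the list (dry run only).
# Blank lines and lines starting with '#' are skipped by this sketch.
ssh_test_commands() {
    while IFS= read -r host || [ -n "$host" ]; do
        case "$host" in
            ''|'#'*) continue ;;
        esac
        printf 'ssh -o BatchMode=yes %s date\n' "$host"
    done < "$1"
}

# Example: ssh_test_commands /root/hostlist
```

Review the printed commands first; piping them to sh then runs the tests one by one. Run the first connection to each new host interactively, because BatchMode cannot answer the host key question.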
Now, log out from the source host, and log back in again as root. The environment variables you've added to .profile for user root only take effect once .profile is read, which happens when root logs in.

At this point, you should be able to issue a command on all the target hosts, at the same time. For example, to run the "date" command on all the servers:
# dsh date
Also, you can now copy files using dcp (notice the similarity between ssh and dsh, and scp and dcp), for example to copy a file /etc/exclude.rootvg from the source host to all the target hosts:
# dcp /etc/exclude.rootvg /etc/exclude.rootvg
Note: dsh and dcp are very powerful tools for running commands on multiple servers, or for copying files to multiple servers. However, keep in mind that they can be very destructive as well. A command such as "dsh halt -q" will halt all the servers at the same time. So, you will probably want to triple-check any dsh or dcp command before actually running it. That is, if you value your job, of course.

Topics: AIX, System Administration

Copy printer configuration from one AIX system to another

The following procedure can be used to copy the printer configuration from one AIX system to another AIX system. This has been tested using different AIX levels, and has worked great. This is particularly useful if you have more than just a few printer queues configured, and configuring all printer queues manually would be too cumbersome.

  1. Create a full backup of your system, just in case something goes wrong.
  2. Run lssrc -g spooler and check if qdaemon is active. If not, start it with startsrc -s qdaemon.
  3. Copy /etc/qconfig from the source system to the target system.
  4. Copy /etc/hosts from the source system to the target system, but be careful to not lose important entries in /etc/hosts on the target system (e.g. the hostname and IP address of the target system should be in /etc/hosts).
  5. On the target system, refresh the qconfig file by running: enq -d
  6. On the target system, remove all files in /var/spool/lpd/pio/@local/custom, /var/spool/lpd/pio/@local/dev and /var/spool/lpd/pio/@local/ddi.
  7. Copy the contents of /var/spool/lpd/pio/@local/custom on the source system to the target system into the same folder.
  8. Copy the contents of /var/spool/lpd/pio/@local/dev on the source system to the target system into the same folder.
  9. Copy the contents of /var/spool/lpd/pio/@local/ddi on the source system to the target system into the same folder.
  10. Create the following script and run it:
    let counter=0
    # Restore the default smit files; the destination is inferred from the inst_root source path
    cp /usr/lpp/printers.rte/inst_root/var/spool/lpd/pio/@local/smit/* \
       /var/spool/lpd/pio/@local/smit
    cd /var/spool/lpd/pio/@local/custom || exit 1
    chmod 775 /var/spool/lpd/pio/@local/custom
    # Each file in this directory is named queue:device
    for FILE in * ; do
       let counter="$counter+1"
       chmod 664 "$FILE"
       QNAME=`echo "$FILE" | cut -d':' -f1`
       DEVICE=`echo "$FILE" | cut -d':' -f2`
       echo $counter : chvirprt -q $QNAME -d $DEVICE
       chvirprt -q "$QNAME" -d "$DEVICE"
    done
  11. Test and confirm printing is working.
  12. Remove the script file created in step 10.
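The loop in step 10 derives the queue and device names from each file name in the custom directory, which has the form queue:device. As a dry-run sketch of just that parsing step, the function below prints the chvirprt command for one file name without running it. The function name and the sample file name are hypothetical.

```shell
# Print the chvirprt command that step 10 would run for one file name.
# The file name is expected to look like queue:device.
chvirprt_command() {
    qname=$(printf '%s\n' "$1" | cut -d: -f1)
    device=$(printf '%s\n' "$1" | cut -d: -f2)
    printf 'chvirprt -q %s -d %s\n' "$qname" "$device"
}

chvirprt_command "printerq:hp@printer"
```

Running this for every file in /var/spool/lpd/pio/@local/custom gives a preview of what the script will change.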

Topics: AIX, System Administration

Running bootp in debug mode to troubleshoot NIM booting

If you have an LPAR that is not booting from your NIM server, and you're certain the IP configuration on the client is correct (for example, because a ping test completed successfully), then have a look at the bootp process on the NIM server as a possible cause of the issue.

To accomplish this, you can put bootp into debug mode. Edit file /etc/inetd.conf, and comment out the bootps entry with a hash mark (#). This prevents inetd from starting bootpd in response to a bootp request. Then refresh the inetd daemon, to pick up the changes to file /etc/inetd.conf:

# refresh -s inetd
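The edit to /etc/inetd.conf can also be scripted. The following is a minimal sketch, assuming the active entry starts with the word bootps at the beginning of the line. The function name is hypothetical, and it prints the edited text to standard output rather than modifying the file in place.

```shell
# Print the given inetd.conf with an active bootps entry commented out.
# Lines that are already commented out are left unchanged.
comment_out_bootps() {
    sed 's/^bootps/#bootps/' "$1"
}

# Example (keeps a backup to restore later):
#   cp /etc/inetd.conf /etc/inetd.conf.orig
#   comment_out_bootps /etc/inetd.conf.orig > /etc/inetd.conf
```

Keeping the backup copy makes it easy to put the file back the way it was once debugging is done.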
Now check whether any bootpd processes are running. If necessary, use kill -9 to kill them, and check again that no bootpd processes remain. Now that bootp has stopped, open another PuTTY window on your NIM master. You'll need a separate window, because running bootpd in debug mode occupies the window for as long as it is active. Run the following command in that window:
# bootpd -d -d -d -d -s
Now you can retry to boot the LPAR from your NIM master, and you should see information scrolling by of what is going on.

Afterwards, once you've identified the issue, make sure to stop the bootpd process (just hit ctrl-c to make it stop), and change file /etc/inetd.conf back the way it was, and run refresh -s inetd to refresh it again.

Topics: AIX, Storage, System Administration

Allocating shared storage to VIOS clients

The following is a procedure to add shared storage to a clustered, virtualized environment. This assumes the following: You have a PowerHA cluster on two nodes, nodeA and nodeB. Each node is on a separate physical system, and each node is a client of a VIOS. The storage from the VIOS is mapped as vSCSI to the client. Client nodeA is on viosA, and client nodeB is on viosB. Furthermore, this procedure assumes you're using SDDPCM for multi-pathing on the VIOS.

First of all, have your storage admin allocate and zone the shared LUN(s) to the two VIOS. Each LUN must be zoned to both VIOS. This procedure assumes you will be zoning 4 LUNs of 128 GB.

Once that is completed, move on to the work on the VIOS:


First, gather some system information as user root on the VIOS, and save this information to a file for safe-keeping.

# lspv
# lsdev -Cc disk
# /usr/ios/cli/ioscli lsdev -virtual
# lsvpcfg
# datapath query adapter
# datapath query device
# lsmap -all
Discover new SAN LUNs (4 * 128 GB) as user padmin on the VIOS. This can be accomplished by running cfgdev, the alternative to cfgmgr on the VIOS. Once that has run, identify the 4 new hdisk devices on the system, and run the "bootinfo -s" command to determine the size of each of the 4 new disks:
# cfgdev
# lspv
# datapath query device
# bootinfo -s hdiskX
Assign a PVID to each of the disks (repeat for all the LUNs):
# chdev -l hdiskX -a pv=yes
Next, map the new LUNs from viosA to the nodeA lpar. You'll need to know 2 things here: [a] what vhost adapter (or "vadapter") to use, and [b] what name to give the new device (or "virtual target device"). Have a look at the output of the "lsmap -all" command that you ran previously. That will show you the current naming scheme for the virtual target devices, as well as which vhost adapters already exist and are in use for the client. In this case, we'll assume the vhost adapter is vhost0, and there are already some virtual target devices, called nodeA_vtd0001 through nodeA_vtd0019. The four new LUNs will therefore be named nodeA_vtd0020 through nodeA_vtd0023. We'll also assume the new disks are numbered hdisk44 through hdisk47.
# mkvdev -vdev hdisk44 -vadapter vhost0 -dev nodeA_vtd0020
# mkvdev -vdev hdisk45 -vadapter vhost0 -dev nodeA_vtd0021
# mkvdev -vdev hdisk46 -vadapter vhost0 -dev nodeA_vtd0022
# mkvdev -vdev hdisk47 -vadapter vhost0 -dev nodeA_vtd0023
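With more than a handful of LUNs, typing the mkvdev commands by hand invites numbering mistakes. As a sketch, the function below only prints the commands for a run of consecutive hdisk and virtual target device numbers, so they can be reviewed before being run as padmin. The function name and its argument order are hypothetical.

```shell
# Print mkvdev commands for consecutive hdisks and virtual target devices.
# Arguments: node prefix, vhost adapter, first hdisk number,
#            first VTD number (plain decimal, e.g. 20), number of LUNs.
print_mkvdev() {
    prefix=$1 vadapter=$2 disk=$3 vtd=$4 count=$5
    i=0
    while [ "$i" -lt "$count" ]; do
        printf 'mkvdev -vdev hdisk%d -vadapter %s -dev %s_vtd%04d\n' \
            "$((disk + i))" "$vadapter" "$prefix" "$((vtd + i))"
        i=$((i + 1))
    done
}

print_mkvdev nodeA vhost0 44 20 4
```

Called with these arguments, it prints the same four commands that are listed above for viosA.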
Now the mapping of the LUNs is complete on viosA. You'll have to repeat the same process on viosB:


First, gather some system information as user root on the VIOS, and save this information to a file for safe-keeping.
# lspv
# lsdev -Cc disk
# /usr/ios/cli/ioscli lsdev -virtual
# lsvpcfg
# datapath query adapter
# datapath query device
# lsmap -all
Discover new SAN LUNs (4 * 128 GB) as user padmin on the VIOS. This can be accomplished by running cfgdev, the alternative to cfgmgr on the VIOS. Once that has run, identify the 4 new hdisk devices on the system, and run the "bootinfo -s" command to determine the size of each of the 4 new disks:
# cfgdev
# lspv
# datapath query device
# bootinfo -s hdiskX
There is no need to set the PVID this time. It was already set on viosA, and after running the cfgdev command, the PVIDs should be visible on viosB and should match the PVIDs on viosA. Make sure this is correct:
# lspv
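Comparing PVIDs by eye across two terminals is error-prone. The sketch below compares them from two saved lspv listings, assuming the output of lspv on each VIOS was saved to a file and that the disks carry the same hdisk names on both VIOS (as assumed in this procedure). The function names and file names are hypothetical.

```shell
# Print the PVID (column 2) of one disk from a saved lspv listing.
pvid_of() {
    awk -v d="$2" '$1 == d {print $2}' "$1"
}

# Succeed only if the disk has a real PVID and it is identical in both listings.
# Arguments: listing from viosA, listing from viosB, hdisk name.
same_pvid() {
    p1=$(pvid_of "$1" "$3")
    p2=$(pvid_of "$2" "$3")
    [ -n "$p1" ] && [ "$p1" != "none" ] && [ "$p1" = "$p2" ]
}

# Example:
#   for d in hdisk44 hdisk45 hdisk46 hdisk47; do
#       same_pvid lspv.viosA lspv.viosB "$d" || echo "$d: PVID mismatch"
#   done
```

If the hdisk numbering differs between the two VIOS, compare the PVID columns directly instead.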
Map the new LUN from viosB to the nodeB lpar. Again, you'll need to know the vadapter and the virtual target device names to use, and you can derive that information by looking at the output of the "lsmap -all" command. If you've done your work correctly in the past, the naming of the vadapter and the virtual target devices will probably be the same on viosB as on viosA:
# mkvdev -vdev hdisk44 -vadapter vhost0 -dev nodeB_vtd0020
# mkvdev -vdev hdisk45 -vadapter vhost0 -dev nodeB_vtd0021
# mkvdev -vdev hdisk46 -vadapter vhost0 -dev nodeB_vtd0022
# mkvdev -vdev hdisk47 -vadapter vhost0 -dev nodeB_vtd0023
Now that the mapping on both the VIOS has been completed, it is time to move to the client side. First, gather some information about the PowerHA cluster on the clients, by running as root on the nodeA client:
# clstat -o
# clRGinfo
# lsvg |lsvg -pi
Run cfgmgr on nodeA to discover the mapped LUNs, and then on nodeB:
# cfgmgr
# lspv
Ensure that the disk attributes are correctly set on both servers. Repeat the following command for all 4 new disks:
# chdev -l hdiskX -a algorithm=fail_over -a hcheck_interval=60 -a queue_depth=20 -a reserve_policy=no_reserve
Now you can add the 4 newly added physical volumes to a shared volume group. In our example, the shared volume group is called sharedvg, the newly discovered disks are called hdisk55 through hdisk58, and the concurrent resource group is called concurrent_rg.
# /usr/es/sbin/cluster/sbin/cl_extendvg -cspoc -g'concurrent_rg' -R'nodeA' sharedvg hdisk55 hdisk56 hdisk57 hdisk58
Next, you can move forward to creating logical volumes (and file systems if necessary), for example, when creating raw logical volumes for an Oracle database:
# /usr/es/sbin/cluster/sbin/cl_mklv -TO -t raw -R'nodeA' -U oracle -G dba -P 600 -y asm_raw5 sharedvg 1023 hdisk55
# /usr/es/sbin/cluster/sbin/cl_mklv -TO -t raw -R'nodeA' -U oracle -G dba -P 600 -y asm_raw6 sharedvg 1023 hdisk56
# /usr/es/sbin/cluster/sbin/cl_mklv -TO -t raw -R'nodeA' -U oracle -G dba -P 600 -y asm_raw7 sharedvg 1023 hdisk57
# /usr/es/sbin/cluster/sbin/cl_mklv -TO -t raw -R'nodeA' -U oracle -G dba -P 600 -y asm_raw8 sharedvg 1023 hdisk58
Finally, verify the volume group:
# lsvg -p sharedvg
# lsvg sharedvg
# ls -l /dev/asm_raw*
If necessary, these are the steps to complete, if the addition of LUNs has to be backed out:
  1. Remove the raw logical volumes (using the cl_rmlv command)
  2. Remove the added LUNs from the volume group (using the cl_reducevg command)
  3. Remove the disk devices on both client nodes: rmdev -dl hdiskX
  4. Remove LUN mappings from each VIOS (using the rmvdev command)
  5. Remove the LUNs from each VIOS (using the rmdev command)

Topics: AIX, System Administration

Export and import PuTTY sessions

PuTTY itself does not provide a means to export the list of sessions, nor a way to import the sessions from another computer. However, it is not so difficult, once you know that PuTTY stores the session information in the Windows Registry.

To export the PuTTY sessions, run:

regedit /e "%userprofile%\desktop\putty-sessions.reg" HKEY_CURRENT_USER\Software\SimonTatham\PuTTY\Sessions
Or, to export all settings (and not only the sessions), run:
regedit /e "%userprofile%\desktop\putty.reg" HKEY_CURRENT_USER\Software\SimonTatham
This will create either a putty-sessions.reg or a putty.reg file on your Windows desktop. You can transfer the file to another computer and, after installing PuTTY there, simply double-click the reg file to have the Windows Registry entries added. Then, when you start PuTTY, all the session information should be there.

Topics: AIX, Storage, System Administration

Identifying a Disk Bottleneck Using filemon

This article describes the steps required to identify an I/O problem in the storage area network and/or disk arrays on AIX.

Note: Do not execute filemon with AIX 6.1 Technology Level 6 Service Pack 1 if WebSphere MQ is running. WebSphere MQ will abnormally terminate with this AIX release.

Running filemon: As a rule of thumb, a write to a cached fiber attached disk array should average less than 2.5 ms and a read from a cached fiber attached disk array should average less than 15 ms. To confirm the responsiveness of the storage area network and disk array, filemon can be utilized. The following example will collect statistics for a 90 second interval.

# filemon -PT 268435184 -O pv,detailed -o /tmp/filemon.rpt;sleep 90;trcstop

Run trcstop command to signal end of trace.
Tue Sep 15 13:42:12 2015
System: AIX 6.1 Node: hostname Machine: 0000868CF300
[filemon: Reporting started]
# [filemon: Reporting completed]

[filemon: 90.027 secs in measured interval]
Then, review the generated report (/tmp/filemon.rpt).
# more /tmp/filemon.rpt
Detailed Physical Volume Stats   (512 byte blocks)

VOLUME: /dev/hdisk11  description: XP MPIO Disk P9500   (Fibre)
reads:                  437296  (0 errs)
  read sizes (blks):    avg     8.0 min       8 max       8 sdev     0.0
  read times (msec):    avg   11.111 min   0.122 max  75.429 sdev   0.347
  read sequences:       1
  read seq. lengths:    avg 3498368.0 min 3498368 max 3498368 sdev     0.0
seeks:                  1       (0.0%)
  seek dist (blks):     init 3067240
  seek dist (%tot blks):init 4.87525
time to next req(msec): avg   0.206 min   0.018 max 461.074 sdev   1.736
throughput:             19429.5 KB/sec
utilization:            0.77

VOLUME: /dev/hdisk12  description: XP MPIO Disk P9500   (Fibre)
writes:                 434036  (0 errs)
  write sizes (blks):   avg     8.1 min       8 max      56 sdev     1.4
  write times (msec):   avg   2.222 min   0.159 max  79.639 sdev   0.915
  write sequences:      1
  write seq. lengths:   avg 3498344.0 min 3498344 max 3498344 sdev     0.0
seeks:                  1       (0.0%)
  seek dist (blks):     init 3067216
  seek dist (%tot blks):init 4.87521
time to next req(msec): avg   0.206 min   0.005 max 536.330 sdev   1.875
throughput:             19429.3 KB/sec
utilization:            0.72
In the above report, hdisk11 was the busiest disk on the system during the 90 second sample. The reads from hdisk11 averaged 11.111 ms. Since this is less than 15 ms, the storage area network and disk array were performing within the expected range for reads.

Also, hdisk12 was the second busiest disk on the system during the 90 second sample. The writes to hdisk12 averaged 2.222 ms. Since this is less than 2.5 ms, the storage area network and disk array were performing within the expected range for writes.
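On a system with many disks, the same check can be applied to every volume in the report at once. The following is a sketch that assumes the detailed physical volume layout shown above, with the average in the fifth field of the "read times" and "write times" lines, and that uses the rule-of-thumb limits of 15 ms for reads and 2.5 ms for writes. The function name is hypothetical.

```shell
# Flag average read/write times in a filemon report against the
# rule-of-thumb limits: reads under 15 ms, writes under 2.5 ms.
check_filemon_times() {
    awk '
        $1 == "VOLUME:" { vol = $2 }
        $2 == "times" && ($1 == "read" || $1 == "write") {
            limit  = ($1 == "read") ? 15 : 2.5
            status = ($5 + 0 < limit) ? "OK" : "SLOW"
            print vol, $1, $5, status
        }
    ' "$1"
}

# Example: check_filemon_times /tmp/filemon.rpt
```

Each output line names the volume, the operation, the average time in milliseconds and the verdict, so a quick grep for SLOW lists the disks worth a closer look.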

Other methods to measure similar information:

You can use the topas command with the -D option to get an overview of the busiest disks on the system:
# topas -D
In the output, columns ART and AWT provide similar information. ART stands for the average time to receive a response from the hosting server for the read request sent. And AWT stands for the average time to receive a response from the hosting server for the write request sent.

You can also use the iostat command, using the -D (for drive utilization) and -l (for long listing mode) options:
# iostat -Dl 60
This will provide an overview over a 60 second period of your disks. The "avg serv" column under the read and write sections will provide you average service times for reads and writes for each disk.

An occasional peak value recorded on a system doesn't immediately mean there is a disk bottleneck. It takes longer periods of monitoring to determine whether a certain disk is indeed a bottleneck for your system.

Topics: AIX, System Administration

Commands to create printer queues

Here are some commands to add a printer to an AIX system. Let's assume that the hostname of the printer is "printer", and that you've added an entry for this "printer" in /etc/hosts, or that you've added it to DNS, so it can be resolved to an IP address. Let's also assume that the queue you wish to make will be called "printerq", and that your printer can communicate on port 9100.

In that case, to create a generic printer queue, the command will be:

# /usr/lib/lpd/pio/etc/piomkjetd mkpq_jetdirect -p 'generic' -D asc \
-q 'printerq' -h 'printer' -x '9100'

In case you wish to set it up as a postscript printer, called "printerqps", then the command will be:
# /usr/lib/lpd/pio/etc/piomkjetd mkpq_jetdirect -p 'generic' -D ps \
-q 'printerqps' -h 'printer' -x '9100'

Topics: AIX, Monitoring, Networking, Red Hat, Security, System Administration

Determining type of system remotely

If you run into a system that you can't access, but that is available on the network, and you have no idea what type of system it is, then there are a few tricks you can use to determine the type of system remotely.

The first one is to look at the TTL (Time To Live) when pinging the system's IP address. For example, a ping to an AIX system may look like this:

# ping
PING ( 56(84) bytes of data.
64 bytes from ( icmp_seq=1 ttl=253 time=0.394 ms
TTL (Time To Live) is a value included in IP packets that limits how long a packet may stay on the network: each router that forwards the packet decreases the value by one, and the packet is discarded when the value reaches zero. The initial TTL value differs between operating systems, so you can make an educated guess at the OS based on the TTL value in the reply. Basically, a UNIX/Linux system has a TTL of 64. Windows uses 128, and AIX/Solaris uses 254.

Now, in the example above, you can see "ttl=253". It's still an AIX system, but there's most likely a router in between, decreasing the TTL by one.
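The reasoning above can be turned into a small helper. This sketch extracts the TTL from a ping reply line and maps it to the nearest initial value at or above it, using the three values mentioned in the text (64, 128 and 254). It assumes the reply crossed only a modest number of routers, and the function names are hypothetical. Like the TTL method itself, it gives a hint and not proof.

```shell
# Extract the ttl=N value from a ping reply line read on standard input.
ttl_from_ping() {
    sed -n 's/.*ttl=\([0-9][0-9]*\).*/\1/p'
}

# Guess the OS family from an observed TTL, allowing for router hops.
guess_os_from_ttl() {
    if [ "$1" -le 64 ]; then
        echo "UNIX/Linux"
    elif [ "$1" -le 128 ]; then
        echo "Windows"
    else
        echo "AIX/Solaris"
    fi
}

guess_os_from_ttl 253
```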

Another good method is by using nmap. The nmap utility has a -O option that allows for OS detection:
# nmap -O -v | grep OS
Initiating OS detection (try #1) against (
OS details: IBM AIX 5.3
OS detection performed.
Okay, so it isn't a perfect method either. We ran the nmap command above against an AIX 7.1 system, and it came back as AIX 5.3 instead. And sometimes, you'll have to run nmap a couple of times, before it successfully discovers the OS type. But still, we now know it's an AIX system behind that IP.

Another option is to query SNMP information. If the device is SNMP enabled (it is running an SNMP daemon and it allows you to query SNMP information), then you may be able to run a command like this:
# snmpinfo -h -m get -v sysDescr.0
sysDescr.0 = "IBM PowerPC CHRP Computer
Machine Type: 0x0800004c Processor id: 0000962CG400
Base Operating System Runtime AIX version: 06.01.0008.0015
TCP/IP Client Support  version: 06.01.0008.0015"
By the way, the SNMP example above is exactly why AIX Health Check generally recommends disabling SNMP, or at least disallowing access to such system information through SNMP by updating the /etc/snmpdv3.conf file appropriately, because this information can be really useful to hackers. On the other hand, your organization may use monitoring that relies on SNMP, in which case it needs to be enabled. But then you still have the option of changing the SNMP community name to something else (the default is "public"), which also limits the possibilities for remote information gathering.

Topics: AIX, System Administration

Resolving IBM.DRM software errors

If you see several SRC_RSTRT errors in the error report regarding IBM.DRM or IBM.AuditRM, with identifiers CB4A951F or BA431EB7 and detecting module "srchevn.c", then you probably have a system that was cloned from another system in the past, and the RSCT software is still using the keys of the original system.

The solution is this:

# /usr/sbin/rsct/bin/rmcctrl -z 
# /usr/sbin/rsct/bin/rmcctrl -d 
# /usr/sbin/rsct/install/bin/recfgct -s 
# /usr/sbin/rsct/bin/rmcctrl -A 
# /usr/sbin/rsct/bin/rmcctrl -p 
This will generate new keys, and will solve the errors in the error report. Just to make sure, reboot your system, and they should no longer show up in the error report after the reboot.

Topics: AIX, Red Hat, Security, System Administration

System-wide separated shell history files for each user and session

Here's how you can set up your /etc/profile in order to create a separate shell history file for each user and each login session. This is very useful when you need to know who exactly ran a specific command at a point in time. For Red Hat Linux, put the updates in either /etc/profile or /etc/bashrc.

Put this in /etc/profile on all servers:

# execute only if interactive
if [ -t 0 -a "${SHELL}" != "/bin/bsh" ]
then
 d=`date "+%H%M.%m%d%y"`
 t=`tty | cut -c6-`
 u=`who am i | awk '{print $1}'`
 w=`who -ms | awk '{print $NF}' | sed "s/(//g" | sed "s/)//g"`
 y=`tty | cut -c6- | sed "s/\//-/g"`
 mkdir $HOME/.history.$USER 2>/dev/null
 export HISTFILE=$HOME/.history.$USER/.sh_history.$USER.$u.$w.$y.$d
 # Clean up history files older than about 3 months
 find $HOME/.history.$USER/.s* -type f -ctime +91 -exec rm {} \;

 H=`uname -n | cut -f1 -d'.'`
 # Effective user name, used to pick the prompt character
 mywhoami=`whoami`
 if [ "${mywhoami}" = "root" ] ; then
  PS1='${USER}@(${H}) ${PWD##/*/} # '
 else
  PS1='${USER}@(${H}) ${PWD##/*/} $ '
 fi
fi

# Time out after 60 minutes
# Use readonly if you don't want users to be able to change it.
# readonly TMOUT=3600
TMOUT=3600
export TMOUT
For AIX, put this in /etc/environment, to turn on time stamped history files:
# Added for extended shell history
For Red Hat, put this in /etc/bashrc, to enable time-stamped output when running the "history" command:
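As a minimal sketch of these two settings: the Korn shell on AIX records time stamps in the history file when EXTENDED_HISTORY is set to ON, and bash prints time stamps in the output of "history" when HISTTIMEFORMAT is set. The format string below is an assumption chosen for illustration; any strftime format works.

```shell
# /etc/environment on AIX: record a time stamp with each ksh history entry
EXTENDED_HISTORY=ON

# /etc/bashrc on Red Hat: time stamp format used by the "history" built-in
# (the format string is an example value, not a required setting)
export HISTTIMEFORMAT="%F %T "
```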
This way, *every* user on the system will have a separate shell history in the .history directory of their home directory. Each shell history file name shows you which account was used to login, which account was switched to, on which tty this happened, and at what date and time this happened.
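To make the naming scheme concrete, the sketch below assembles a history file path from its parts in the same order as the HISTFILE line in /etc/profile above. The function name and all sample values (home directory, user names, origin address, tty and time stamp) are hypothetical.

```shell
# Build a history file path from its parts, mirroring the HISTFILE line.
# Arguments: home dir, current user, login user, origin host, tty, time stamp.
history_file_name() {
    # Replace slashes in the tty name, as the profile does for $y
    y=$(printf '%s\n' "$5" | sed 's/\//-/g')
    printf '%s/.history.%s/.sh_history.%s.%s.%s.%s.%s\n' \
        "$1" "$2" "$2" "$3" "$4" "$y" "$6"
}

history_file_name /home/oracle oracle jdoe 10.0.0.5 pts/3 1342.091515
```

Reading such a name from left to right tells you that jdoe logged in from 10.0.0.5 on pts/3, switched to oracle, and started the session at 13:42 on September 15, 2015.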

Shell history files are also time-stamped internally. For AIX, you can run "fc -t" to show the shell history time-stamped. For Red Hat, you can run: "history". Old shell history files are cleaned up after 3 months, because of the find command in the example above. Plus, user accounts will log out automatically after 60 minutes (3600 seconds) of inactivity, by setting the TMOUT variable to 3600. You can avoid running into a time-out by simply typing "read" or "\" followed by ENTER on the command line, or by adding "TMOUT=0" to a user's .profile, which essentially disables the time-out for that particular user.

One issue that you may now run into on AIX is that, because a separate history file is created for each login session, "fc -t" becomes less useful: the fc command only lists the commands from the current session, and not those written to a different history file. To overcome this issue, you can set the HISTFILE variable to the file you want to run "fc -t" for:
# export HISTFILE=.sh_history.root.user.
Then, to list all the commands for this history file, make sure you start a new shell and run the "fc -t" command:
# ksh "fc -t -10"
This will list the last 10 commands for that history file.
