To understand or troubleshoot a system, you need to know how to gather information from it. I have divided this article into two parts, hardware and software, and I finish with a section on monitoring, maintenance and networking that offers some options for your maintenance routine and some handy networking tools.
Gather Hardware information
A system administrator should always know the hardware in the systems they work with. If you did the initial system set-up then you probably know the hardware already, but what if you take over a system that has been maintained by someone else? The following commands will help you.
lscpu gives you information about the CPU in your system. Let’s have a look at the output of lscpu from one of my systems. Just type lscpu and hit enter and you will see something similar to this:
CPU op-mode(s): 64-bit
Thread(s) per core: 1
Core(s) per socket: 2
CPU socket(s): 1
Vendor ID: AuthenticAMD
CPU family: 16
CPU MHz: 800.000
L1d cache: 64K
L1i cache: 64K
L2 cache: 1024K
In this case it is a 64-bit AMD CPU with two cores in a single socket, currently running at 800 MHz. Note that the CPU MHz value is the current clock frequency, which power management may have scaled down, and not the bus speed. On this machine the full lscpu output also reports the virtualization instruction set AMD-V. We now have a good idea of the features of the CPU but still don’t know exactly which CPU it is. The next command will tell us more, much more. For now we focus on the CPU part. The command is lshw, and we are going to use it with the -short option and pipe the output through grep to look for the processor. Type in the following command and hit enter.
sudo lshw -short |grep processor
/0/4 processor AMD Athlon(tm) II X2 240 Processor
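If you have no root access for lshw, the model string can usually be read from /proc/cpuinfo as well. The following is only a minimal sketch, assuming an x86 system where /proc/cpuinfo contains a "model name" line; cpu_model is a hypothetical helper name made up for this illustration.

```shell
#!/bin/sh
# cpu_model (hypothetical helper): print the first "model name" value
# found in /proc/cpuinfo-style text read from standard input.
cpu_model() {
    sed -n 's/^model name[[:space:]]*:[[:space:]]*//p' | head -n 1
}

# Only query the live system if the file is readable.
if [ -r /proc/cpuinfo ]; then
    cpu_model < /proc/cpuinfo
fi
```

Because the helper reads from standard input, you can also feed it a copy of /proc/cpuinfo saved from another machine.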
Great, now we know exactly which CPU it is. Let’s go on and run lshw with just the -short option to see what other useful information we can gather from it. Type in the following command and hit enter.
sudo lshw -short
H/W path Device Class Description
system Inspiron 546
/0 bus 0F896N
/0/0 memory 64KiB BIOS
/0/4 processor AMD Athlon(tm) II X2 240 Processor
/0/4/5 memory 128KiB L1 cache
/0/4/6 memory 1MiB L2 cache
/0/26 memory 4GiB System Memory
/0/26/0 memory 2GiB DIMM DDR2 Synchronous 667…
/0/26/1 memory 2GiB DIMM DDR2 Synchronous 667…
/0/26/2 memory [empty]
/0/26/3 memory [empty]
/0/1/0 memory 128KiB L1 cache
/0/1/1 memory 1MiB L2 cache
/0/100 bridge RS780 Host Bridge
/0/100/2 bridge RS780 PCI to PCI bridge (ext gfx port
/0/100/2/0 display G86 [GeForce 8400 GS]
/0/100/7 bridge RS780 PCI to PCI bridge (PCIE port 3)
/0/100/7/0 eth0 network RTL8101E/RTL8102E PCI Express
/0/100/11 scsi2 storage SB700/SB800 SATA Controller
/0/100/11/0 /dev/sda disk 320GB WDC WD3200AAKS-7
/0/100/11/0/1 /dev/sda1 volume 288GiB EXT4 volume
/0/100/11/0/2 /dev/sda2 volume 9598MiB Extended partition
/0/100/11/0/2/5 /dev/sda5 volume 9598MiB Linux swap / Solaris partitio
/0/100/11/1 /dev/cdrom disk DVD+-RW DH-16AAS
/0/100/11/1/0 /dev/cdrom disk
/0/100/11/0.0.0 /dev/sdb disk 500GB ST3500641AS
/0/100/11/0.0.0/1 /dev/sdb1 volume 465GiB EXT3 volume
/0/100/12 bus SB700/SB800 USB OHCI0 Controller
/0/100/12.1 bus SB700 USB OHCI1 Controller
/0/100/12.2 bus SB700/SB800 USB EHCI Controller
/0/100/13 bus SB700/SB800 USB OHCI0 Controller
/0/100/13.1 bus SB700 USB OHCI1 Controller
/0/100/13.2 bus SB700/SB800 USB EHCI Controller
/0/100/14 bus SBx00 SMBus Controller
/0/100/14.1 storage SB700/SB800 IDE Controller
/0/100/14.2 multimedia SBx00 Azalia (Intel HDA)
/0/100/14.3 bridge SB700/SB800 LPC host controller
/0/100/14.4 bridge SBx00 PCI to PCI Bridge
/0/101 bridge Family 10h Processor HyperTransport
/0/102 bridge Family 10h Processor Address Map
/0/103 bridge Family 10h Processor DRAM Controller
/0/104 bridge Family 10h Processor Miscellaneous Co
/0/105 bridge Family 10h Processor Link Control
/0/2 scsi6 storage
/0/2/0.0.0 /dev/sdc disk SCSI Disk
/0/2/0.0.1 /dev/sdd disk SCSI Disk
/0/2/0.0.2 /dev/sde disk SCSI Disk
/0/2/0.0.3 /dev/sdf disk SCSI Disk
/0/3 scsi7 storage
/0/3/0.0.0 /dev/sdg disk 1TB SCSI Disk
/0/3/0.0.0/1 /dev/sdg1 volume 931GiB Windows FAT volume
/0/5 scsi8 storage
/0/5/0.0.0 /dev/sdh disk SCSI Disk
Wow, that’s a lot of useful information. If you enter the command without the -short option you will receive even more, but I am sure you will agree that the format above is easier on the eyes than the full output. Now let’s go a little further and see what kind of USB devices we have connected. Type in the command lsusb and hit enter; you should see something similar to the output below:
Bus 006 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 005 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 004 Device 002: ID 413c:2105 Dell Computer Corp. Model L100 Keyboard
Bus 004 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 003 Device 002: ID 413c:3012 Dell Computer Corp. Optical Wheel Mouse
Bus 003 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 002 Device 007: ID 0fca:8004 Research In Motion, Ltd.
Bus 002 Device 006: ID 13fd:1340 Initio Corp. Hi-Speed USB to SATA Bridge
Bus 002 Device 005: ID 0409:005a NEC Corp. HighSpeed Hub
Bus 002 Device 004: ID 058f:6362 Alcor Micro Corp. Flash Card Reader/Writer
Bus 002 Device 003: ID 0424:2504 Standard Microsystems Corp. USB 2.0 Hub
Bus 002 Device 002: ID 046d:09a1 Logitech QuickCam Communicate MP/S5500
Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Now we see all USB devices and their ID numbers. With the ID numbers we can look up whether a device is supported and which kernel module makes it work on your Ubuntu system. Just search the web for the USB ID and you should find the information you are looking for. There are two more commands that provide hardware information: lspci and lspcmcia. The commands above largely cover the same ground, so just run them and decide for yourself whether the information they display is useful to you.
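To collect the IDs for such a search, the lsusb output can be reduced to one ID and description per device. This is a small sketch that assumes the usual "Bus ... Device ...: ID vendor:product description" line format shown above; usb_ids is a hypothetical helper name.

```shell
#!/bin/sh
# usb_ids (hypothetical helper): read lsusb output on standard input and
# print "vendor:product<TAB>description" for every device except root hubs.
usb_ids() {
    awk '$5 == "ID" && !/root hub/ {
        id = $6
        $1 = $2 = $3 = $4 = $5 = $6 = ""   # drop the bus/device/ID columns
        sub(/^ +/, "")                     # trim the blanks left behind
        print id "\t" $0
    }'
}

# Run against the live system only if lsusb is installed.
if command -v lsusb >/dev/null 2>&1; then
    lsusb | usb_ids
fi
```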
Gather Software information (Kernel, Driver Module etc.)
Now that we have covered the hardware in your system, let’s have a look at the software side. I like to start by determining the kernel version, and for that we use the command uname. Type in uname and hit enter.
Well, that was not the information we were looking for: without options, uname prints only the kernel name, Linux. Let’s try the command line switch -a, which displays all information. Type in uname -a and hit enter.
Linux turrican 2.6.35-23-generic-pae #37-Ubuntu SMP Fri Nov 5 20:57:06 UTC 2010 i686 GNU/Linux
So we got all the information, but what if you need just the kernel release, for example to incorporate it into a script? There are more command line switches for that. The switch -r displays only the kernel release, which on the system above is 2.6.35-23-generic-pae. Type in uname -r and hit enter.
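As a rough illustration of using the release in a script, the sketch below stores the output of uname -r in a variable and checks for the matching module directory. The variable names are my own, and /lib/modules/<release> is the usual location on Ubuntu rather than a guarantee.

```shell
#!/bin/sh
# Store the kernel release once and reuse it.
kernel_release=$(uname -r)

# /lib/modules/<release> is where the modules for that kernel normally live.
module_dir="/lib/modules/${kernel_release}"

if [ -d "$module_dir" ]; then
    echo "Kernel ${kernel_release}: modules found in ${module_dir}"
else
    echo "Kernel ${kernel_release}: no module directory at ${module_dir}"
fi
```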
Great, now we are covered. Next, let’s see how to display the kernel modules loaded on your running system. You are in luck: the command lsmod does just that. Type in lsmod and hit enter.
Module Size Used by
nls_iso8859_1 3261 1
nls_cp437 4931 1
vfat 9201 1
fat 48240 1 vfat
nls_utf8 1069 1
udf 79366 1
crc_itu_t 1383 1 udf
binfmt_misc 6599 1
parport_pc 26378 0
ppdev 5556 0
nvidia 9331115 40
snd_usb_audio 86544 1
snd_hda_codec_via 51755 1
snd_hda_intel 22203 2
snd_hda_codec 87552 2 snd_hda_codec_via,snd_hda_intel
snd_pcm 71603 3 snd_usb_audio,snd_hda_intel,snd_hda_codec
snd_hwdep 5040 2 snd_usb_audio,snd_hda_codec
snd_usbmidi_lib 17413 1 snd_usb_audio
snd_seq_midi 4588 0
snd_rawmidi 17783 2 snd_usbmidi_lib,snd_seq_midi
snd_seq_midi_event 6047 1 snd_seq_midi
snd_seq 47174 3 snd_seq_midi,snd_seq_midi_event
uvcvideo 55911 0
snd_timer 19067 2 snd_pcm,snd_seq
snd_seq_device 5744 3 snd_seq_midi,snd_rawmidi,snd_seq
videodev 43098 1 uvcvideo
v4l1_compat 13359 2 uvcvideo,videodev
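In a script you often only want to know whether one particular module is loaded. A minimal sketch, assuming the three-column lsmod format shown above; module_loaded is a hypothetical helper name, and vfat is just an example module.

```shell
#!/bin/sh
# module_loaded (hypothetical helper): read lsmod output on standard input
# and succeed only if the named module appears in the first column.
module_loaded() {
    awk -v name="$1" 'NR > 1 && $1 == name { found = 1 } END { exit !found }'
}

# Run against the live system only if lsmod is installed.
if command -v lsmod >/dev/null 2>&1; then
    if lsmod | module_loaded vfat; then
        echo "vfat is loaded"
    else
        echo "vfat is not loaded"
    fi
fi
```

Comparing the whole first column avoids the false hits that a plain grep for "fat" would produce on vfat.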
Now that you see all the modules loaded into your running system, you would probably like to know more about some of them. modinfo displays information about a kernel module, such as its author, version, license, dependencies and the parameters it accepts, which helps when you load a module with specific parameters or troubleshoot it.
Type in modinfo followed by the name of a module and hit enter. Let’s use the snd module for this demonstration, so the command is modinfo snd.
description: Advanced Linux Sound Architecture driver for soundcards.
author: Jaroslav Kysela
description: Jack detection support for ALSA
author: Mark Brown
vermagic: 2.6.35-23-generic-pae SMP mod_unload modversions 686
parm: slots: Module names assigned to the slots. (array of charp)
parm: major: Major # for sound driver. (int)
parm: cards_limit: Count of auto-loadable soundcards. (int)
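If you are only interested in the parameter names, the parm lines can be filtered out of the modinfo output. This is a sketch under the assumption that each parameter is printed as "parm: name:description", as in the listing above; param_names is a hypothetical helper name.

```shell
#!/bin/sh
# param_names (hypothetical helper): read modinfo output on standard input
# and print only the names of the module parameters.
param_names() {
    sed -n 's/^parm:[[:space:]]*\([^:]*\):.*/\1/p'
}

# Run against the live system only if modinfo is installed.
if command -v modinfo >/dev/null 2>&1; then
    modinfo snd 2>/dev/null | param_names
fi
```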
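We now know more about a certain module and its parameters. What to do with it now? On Linux you can load kernel modules by hand with the insmod command, which also passes parameters on to the module. Note that insmod expects the path to the module file rather than just the module name, and it does not load any modules that the new one depends on. modinfo -n prints the path for you. For example:
sudo insmod "$(modinfo -n lp)" reset=1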
Let’s imagine now that you have trouble getting a module loaded because a ton of modules it depends on are missing. You can use modprobe, which takes the module name and automatically loads all dependencies. For example:
sudo modprobe lp
Think of the situation where your system starts up but some functionality is missing. Most likely a module was not loaded at boot time or it reported an error. As you may remember, your Ubuntu system does not display all messages during boot-up, but there is a command that will display them afterwards. The command is called dmesg. If you type in dmesg and hit enter, all the information will scroll down the screen, but what if you would like to view those messages page by page? No problem, we used something like that earlier in this article. We just pipe the output of dmesg through another program, more or less, or we redirect it to a text file and use an editor to review the file. Here are three ways:
dmesg | more
dmesg | less
dmesg > bootup.messages
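When the log is long, it can help to look only at lines that hint at trouble. The following sketch simply filters for a few keywords, which is my own rough heuristic rather than a complete error check; boot_problems is a hypothetical helper name.

```shell
#!/bin/sh
# boot_problems (hypothetical helper): keep only lines that mention an
# error, failure or warning, ignoring upper/lower case.
boot_problems() {
    grep -i -E 'error|fail|warn'
}

# On some newer systems reading the kernel log requires root (sudo dmesg).
if command -v dmesg >/dev/null 2>&1; then
    dmesg 2>/dev/null | boot_problems || echo "No matching messages found"
fi
```

The same filter works on a saved file, for example boot_problems < bootup.messages.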
This should give you enough information about your Ubuntu system. In the next section I will show some monitoring tools and some tools and files that will help you with maintenance and network troubleshooting.
Monitoring, Maintenance and Network Troubleshooting
On every Ubuntu Linux system there are some nice tools that help you monitor your system. The most used one is probably the process status (ps) command. Type in ps and hit enter. Without any command line switches, ps displays only the processes running under your login on your current terminal, which is not really helpful for a system administrator who wants to see what is going on across the entire system. To see all processes on the system, type in ps axu or ps ax and hit enter. The first listing below comes from ps axu and the second from ps ax.
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 2868 1648 ? Ss 08:39 0:00 /sbin/init
root 2 0.0 0.0 0 0 ? S 08:39 0:00 [kthreadd]
root 3 0.0 0.0 0 0 ? S 08:39 0:00 [ksoftirqd/0]
root 4 0.0 0.0 0 0 ? S 08:39 0:00 [migration/0]
root 5 0.0 0.0 0 0 ? S 08:39 0:00 [watchdog/0]
root 6 0.0 0.0 0 0 ? S 08:39 0:00 [migration/1]
root 7 0.0 0.0 0 0 ? S 08:39 0:07 [ksoftirqd/1]
root 8 0.0 0.0 0 0 ? S 08:39 0:00 [watchdog/1]
root 9 0.0 0.0 0 0 ? S 08:39 0:00 [events/0]
root 10 0.0 0.0 0 0 ? S 08:39 0:03 [events/1]
PID TTY STAT TIME COMMAND
1 ? Ss 0:00 /sbin/init
2 ? S 0:00 [kthreadd]
3 ? S 0:00 [ksoftirqd/0]
4 ? S 0:00 [migration/0]
5 ? S 0:00 [watchdog/0]
6 ? S 0:00 [migration/1]
7 ? S 0:07 [ksoftirqd/1]
8 ? S 0:00 [watchdog/1]
9 ? S 0:00 [events/0]
10 ? S 0:03 [events/1]
I usually use ps ax because in most cases it provides just enough information: the process ID (PID), the terminal the process is running on, and the process name itself. If you would like to know who is running a certain process, I recommend ps axu.
The process status is important if you want to stop a process. You could use the service’s start-up script to start, stop or restart it, but that can be disruptive for other users who depend on the same service. Instead we use the kill command to stop one specific process, and for that we need ps to determine the process ID. As soon as you know the process ID, type in kill followed by the process ID, or if the process won’t stop, use kill -9 followed by the process ID, and hit enter. Keep in mind that kill -9 ends the process immediately without giving it a chance to clean up, so try plain kill first. Let’s say an accounting user is running the application accounting with the process ID 23498 and it no longer responds. Type in kill -9 23498 and hit enter and the process will be killed. There is also another way to look at the utilisation of the system and the processes running on it. The tool is called top. Type in top and hit enter and you will see a regularly refreshed list of processes together with the overall CPU and memory usage.
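The kill sequence described above, plain kill first and kill -9 only as a last resort, can be sketched as a small shell function. stop_process and its default grace period of five seconds are assumptions of mine, not a standard tool.

```shell
#!/bin/sh
# stop_process (hypothetical helper): ask a process to terminate and use
# kill -9 only if it is still alive after a grace period.
# Usage: stop_process PID [SECONDS]
stop_process() {
    pid=$1
    grace=${2:-5}                          # seconds to wait before forcing

    kill "$pid" 2>/dev/null || return 1    # send SIGTERM; fail if no such process

    while [ "$grace" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0   # process has exited
        sleep 1
        grace=$((grace - 1))
    done

    kill -0 "$pid" 2>/dev/null || return 0
    kill -9 "$pid" 2>/dev/null             # still alive: send SIGKILL
}
```

For the example above you would call stop_process 23498, with sudo if the process belongs to another user.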
The next tool I would like to show is ‘df’. It displays the disk space usage of file systems. Type in ‘df -ah’ and hit enter. The option ‘a’ includes all file systems, even pseudo file systems such as /proc that have a size of zero, and the option ‘h’ converts the sizes into human-readable values.
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 285G 111G 160G 42% /
proc 0 0 0 - /proc
none 0 0 0 - /sys
fusectl 0 0 0 - /sys/fs/fuse/connections
none 0 0 0 - /sys/kernel/debug
none 0 0 0 - /sys/kernel/security
none 2.0G 296K 2.0G 1% /dev
none 0 0 0 - /dev/pts
none 2.0G 2.5M 2.0G 1% /dev/shm
none 2.0G 220K 2.0G 1% /var/run
none 2.0G 0 2.0G 0% /var/lock
binfmt_misc 0 0 0 - /proc/sys/fs/binfmt_misc
gvfs-fuse-daemon 0 0 0 - /home/msj/.gvfs
/dev/sr0 7.5G 7.5G 0 100% /media/STAR_TREK_XI_DOM
/dev/sdg1 932G 765G 167G 83% /media/0FFD-432F
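For a maintenance routine you may want to be warned only about file systems that are getting full. Here is a minimal sketch that filters df output by the Use% column; over_threshold and the limit of 80 percent are arbitrary choices of mine, and mount points containing spaces are not handled.

```shell
#!/bin/sh
# over_threshold (hypothetical helper): read df output on standard input
# and print "mount-point usage" for file systems at or above the limit.
over_threshold() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%$/, "", use)                 # "83%" becomes "83"
        if (use ~ /^[0-9]+$/ && use + 0 >= limit + 0) print $6, $5
    }'
}

# df -P keeps every file system on a single line, which is easier to parse.
if command -v df >/dev/null 2>&1; then
    df -P | over_threshold 80
fi
```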
Another tool displays the disk space used by a directory. The tool is called ‘du’ and it accepts several command line switches. The ones I will show you are ‘h’ for human-readable sizes and ‘s’ to summarise the output into a single total for the directory you are looking at. Let’s say we would like to see how big the /boot directory is. Type in du -sh /boot and hit enter.
du -sh /boot
If you don’t set the ‘s’ option, du prints one line for every subdirectory below /boot, followed by the total for /boot itself.
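To find out what takes up the most space inside a directory, the summaries of its entries can be sorted by size. The sketch below is one possible way to do that; largest_entries is a hypothetical helper name, sizes are printed in kilobytes, and hidden files are skipped because the shell wildcard does not match them.

```shell
#!/bin/sh
# largest_entries (hypothetical helper): list the biggest entries directly
# inside a directory, smallest first, with sizes in kilobytes.
# Usage: largest_entries DIRECTORY [COUNT]
largest_entries() {
    dir=$1
    count=${2:-5}
    du -sk -- "$dir"/* 2>/dev/null | sort -n | tail -n "$count"
}

# Show the three largest entries below /boot.
largest_entries /boot 3
```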
If you do planned maintenance on your system you will probably want to shut down or reboot it at some point, and a good administrator will give the users some advance notice so they can prepare. To pass information on to users logging in to the system there are three files we can make use of. The files /etc/issue and /etc/issue.net display their content before someone logs in, and /etc/motd, which by the way stands for ‘message of the day’, displays its content right after the login, above the prompt. Just use your favorite editor and enter the information you would like to pass on.
! There is a difference between /etc/issue and /etc/issue.net. /etc/issue is being used for local logins and /etc/issue.net is being used for telnet logins. !
Now that you are done with your maintenance you will want to reboot your system either immediately or at a scheduled time. To reboot the system right away, type in ‘reboot’ or ‘shutdown -r now’ and hit enter.
In case you need to replace hardware you need to shut the system down, and you can do that with either ‘halt’ or ‘shutdown -h now’. Sometimes you have to schedule a reboot or a shut-down. How would you do that? You said to use cron? No, there is an easier way. As you noticed, the shutdown command can not only shut down a system but reboot it as well, and it also accepts a time. Let’s say you have to reboot or shut down the system at 11pm on the same day and you would like to notify all logged-in users. Just type in either command below and hit enter.
shutdown -r 23:00 "System will be rebooted at 11pm, please save all data and log off before reboot"
shutdown -h 23:00 "System will shutdown at 11pm for maintenance"
I recommend using the ‘shutdown’ command over ‘reboot’ or ‘halt’ whenever you can.
! Remember that system commands like ‘reboot’, ‘halt’ & ‘shutdown’ require root rights, so use ‘sudo’ to execute these commands. !
Sometimes there are issues with the network configuration or setup, and there are some commands that can help figure out what the problem might be. The first and probably most used tool is ‘ping’. Use ping to see whether a gateway, router, firewall or other server is answering. Like this:
ping 192.168.0.1
PING 192.168.0.1 (192.168.0.1) 56(84) bytes of data.
64 bytes from 192.168.0.1: icmp_req=1 ttl=13 time=0.206 ms
64 bytes from 192.168.0.1: icmp_req=2 ttl=13 time=0.184 ms
64 bytes from 192.168.0.1: icmp_req=3 ttl=13 time=0.203 ms
If you see replies like the ones above, your server is communicating properly with your network. Next I would try to ping a name like www.google.com, and if you receive a reply then everything would appear to be okay. If you get nothing, this could indicate a problem with DNS. The next tool I would use is tracepath, which shows the path to a destination and so verifies that it can be reached. Then I would go ahead and look at the ethernet configuration with ifconfig, and you should see something like this:
eth0 Link encap:Ethernet HWaddr 00:25:64:ec:99:b7
inet addr:192.168.0.100 Bcast:192.168.0.255 Mask:255.255.255.0
inet6 addr: fe80::225:64ff:feec:99b7/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:488790 errors:0 dropped:0 overruns:0 frame:0
TX packets:510876 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:209885973 (209.8 MB) TX bytes:288248282 (288.2 MB)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:314 errors:0 dropped:0 overruns:0 frame:0
TX packets:314 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:13591 (13.5 KB) TX bytes:13591 (13.5 KB)
In this case the output shows that our ethernet adapter has an IP address and that the link is up. Now you can verify that the system is using the proper DNS server by looking at the /etc/resolv.conf file. Just type in ‘cat /etc/resolv.conf’ and you should see something like this:
# Generated by NetworkManager
If that looks fine too, I would have a look at your routes by issuing the command ‘route -n’, and you would see something like this:
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
192.168.0.0 0.0.0.0 255.255.255.0 U 1 0 0 eth0
169.254.0.0 0.0.0.0 255.255.0.0 U 1000 0 0 eth0
0.0.0.0 192.168.0.1 0.0.0.0 UG 0 0 0 eth0
If you see something like this, with a default route (destination 0.0.0.0) pointing to your gateway, you are good here as well. Now let’s say you have some customers who complain that a certain service is not responding on the server you are responsible for. You can check whether it is running with commands you already learned above, like ps or top. Here is one more command you can use to see which services are listening on your system and which TCP/IP port each one is using. Check out the command ‘netstat -l’:
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 *:webmin *:* LISTEN
tcp 0 0 sonic.katakis.at:domain *:* LISTEN
tcp 0 0 localhost:domain *:* LISTEN
tcp 0 0 *:ssh *:* LISTEN
tcp 0 0 *:ipp *:* LISTEN
tcp 0 0 localhost:953 *:* LISTEN
udp 0 0 *:mdns *:*
udp 0 0 *:ipp *:*
udp 0 0 *:10000 *:*
This system seems to be running Webmin, DNS, SSH, IPP etc., and it is listening for connections to them as well. DNS can be recognized by the word domain, and IPP usually belongs to CUPS, the Common Unix Printing System. I hope this helps you gather the information you need to troubleshoot the networking issues you might be confronted with.
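When a customer reports that a service does not respond, a quick scripted check of its port can save time. The sketch below assumes the numeric output of netstat -ltn (TCP listeners only, ports shown as numbers instead of names); port_listening is a hypothetical helper name and port 22 is just the SSH example.

```shell
#!/bin/sh
# port_listening (hypothetical helper): read "netstat -ltn" output on
# standard input and succeed if something listens on the given TCP port.
port_listening() {
    awk -v suffix=":$1" '
        $1 ~ /^tcp/ && $NF == "LISTEN" {
            addr = $4                      # local address, e.g. 0.0.0.0:22
            if (substr(addr, length(addr) - length(suffix) + 1) == suffix)
                found = 1
        }
        END { exit !found }'
}

# Run against the live system only if netstat is installed.
if command -v netstat >/dev/null 2>&1; then
    if netstat -ltn 2>/dev/null | port_listening 22; then
        echo "Something is listening on TCP port 22"
    else
        echo "Nothing is listening on TCP port 22"
    fi
fi
```

Comparing the end of the local address against ":22" keeps port 22 from matching ports such as 122 or 2222.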
If you need more detailed information you can use the ‘man’ command, which loads the manual page of a command with all its options and also suggests related commands that might help you troubleshoot. Let’s say you would like to know more about the netstat command: just issue ‘man netstat’ and it displays a lot of information about this command.
Let me finish up this article with two more Ubuntu-specific commands which might be helpful as well. First there is the command ‘ubuntu-support-status’, which queries your system and displays information like this:
/usr/bin/ubuntu-support-status:95: DeprecationWarning: Deprecated, please use 'is_installed' instead
Support status summary of 'turrican':
You have 1519 packages (76.8%) supported until April 2012 (18m)
You have 16 packages (0.8%) that can not/no-longer be downloaded
You have 443 packages (22.4%) that are unsupported
Run with --show-unsupported, --show-supported or --show-all to see more details
In this case the output indicates that only 76.8% of the packages on this particular system are supported. If you experience problems with a package on this system, make sure that the package is still supported before you place a support call with Ubuntu/Canonical.
The last program I would like to talk about is ‘ubuntu-bug’, which you can use to troubleshoot audio, display and storage problems and to generate a bug report that can be sent to Ubuntu. For example, if you type in ‘ubuntu-bug audio’, a graphical troubleshooter appears and helps you solve possible audio issues, and if none of those steps help it generates a report that can be sent to Ubuntu. In another scenario, let’s say you experience issues with Firefox: just run ‘ubuntu-bug firefox’ and it gathers all information regarding Firefox, including possible crash reports, which can then be sent to Ubuntu.
Thanks for reading this article. I hope it was useful to you. Please visit my website at http://www.ubuntuvideocast.com for more articles and screencasts, and I would appreciate it very much if you could leave a comment or sign up for it.