Linux problem determination tools to troubleshoot issues

The Linux problem determination tools and facilities are free, which raises the question: why not install them? Without these tools, a simple problem can turn into a long and painful ordeal that affects your business or your personal time. Before reading the rest of the article, take some time to make sure the following tools are installed on your systems. They are waiting to make your life easier and your business more productive.

Linux Problem determination tools

  1. strace :  The strace tool traces the system calls, special functions that interact with the operating system. You can use this for many types of problems, especially those that relate to the operating system.
  2. ltrace :  The ltrace tool traces the library functions that a process calls. This is similar to strace, but the library calls often provide more detail.
  3. lsof : The lsof tool lists all of the open files on the operating system (OS), together with their process IDs and file descriptors. When a process opens a file, the OS returns a numeric file descriptor for the process to use.
  4. top : This tool lists the “top” processes that are running on the system. By default it sorts by the amount of current CPU being consumed by a process.
  5. traceroute/tcptraceroute : These tools can be used to trace a network route (or at least one direction of it).
  6. ping: Ping simply checks whether a remote system can respond. Sometimes firewalls block the network packets ping uses, but it is still very useful.
  7. hexdump or equivalent : This is simply a tool that can display the raw contents of a file.
  8. tcpdump and/or ethereal : Used for network problems, these tools can display the packets of network traffic.
  9. GDB : This is a powerful debugger that can be used to investigate some of the more difficult problems.
  10. readelf : This tool can read and display information about various sections of an Executable and Linking Format (ELF) file.

Let’s see how to use them in a problem situation.

strace command

Strace is a debugging tool that will help you troubleshoot issues.

Strace monitors the system calls and signals of a specific program. It is helpful when you do not have the source code and would like to debug the execution of a program. strace provides you the execution sequence of a binary from start to end.

Trace the Execution of an Executable

You can use the strace command to trace the execution of any executable. Running it against the Linux mkdir command, for example, prints each system call that mkdir makes.

Trace Specific System Calls in an Executable Using Option -e

By default, strace displays all system calls for the given executable. To display only a specific system call, use the strace -e option as shown below.

ARKIT~]# strace -e open ls

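This displays only the open system calls made by the ls command. The output of ls itself appears at the end of the strace output. On systems with a recent glibc, ls opens files through the openat system call instead, so trace openat if -e open shows no calls.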

If you want to trace multiple system calls use the “-e trace=” option. The following example displays both open and read system calls.

ARKIT~]# strace -e trace=open,read ls /home

Save the Trace Execution to a File Using Option -o

The following example stores the strace output in the file output.txt.

ARKIT ~]# strace -o output.txt ls

Print Time stamp for Each Trace Output Line Using Option -t

To print the time stamp for each strace output line, use the option -t as shown below.

ARKIT ~]# strace -t -e open ls /home
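
The options above can be combined in one run. The following is a minimal sketch rather than a definitive recipe: it assumes strace is installed and allowed to trace, uses a scratch file from mktemp in place of output.txt, and traces openat rather than open because recent glibc versions open files through openat.

```shell
#!/bin/sh
# Sketch: combine -t, -e and -o in a single strace run.
# trace_log is a hypothetical scratch file; any writable path works.
trace_log=$(mktemp)

if command -v strace >/dev/null 2>&1 &&
   strace -t -e trace=openat -o "$trace_log" ls / >/dev/null 2>&1; then
    # Each traced call is one line in the log, prefixed with a timestamp.
    open_calls=$(grep -c 'openat(' "$trace_log" || true)
    echo "ls made $open_calls openat calls, logged in $trace_log"
    trace_status=traced
else
    echo "strace is not installed or could not trace ls here" >&2
    trace_status=skipped
fi
```

Writing the trace to a file with -o keeps it separate from the output of the traced command, which makes the log easier to search afterwards.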

ltrace Command

Installing the ltrace command

[root@ARKIT ~]# yum install ltrace
Loaded plugins: langpacks, product-id, subscription-manager
This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
Resolving Dependencies
--> Running transaction check
---> Package ltrace.x86_64 0:0.7.91-11.el7 will be installed
--> Finished Dependency Resolution

ltrace runs the given program and traces the calls it makes to library functions.

~]# ltrace /usr/bin/w

In the output you can observe a set of calls to getutxent and its family of library functions. Note that ltrace reports the calls in the order in which the program makes them.

So the ‘w’ command works by calling getutxent and its family of functions to get the logged-in users.

Using ltrace

One does NOT need any root privileges to run ltrace.

Using ltrace to debug an application executable

ARKIT ~]# ltrace ./executable <parameters>

Using ltrace to debug a running process

ARKIT ~]# ltrace -p <PID>

Now we know how to use ltrace for debugging applications. For more intricate functionality such as breakpoints and stepping through code, use GDB. However, most bugs can be resolved with ltrace or strace by analyzing the calls a program makes and whether they succeed or fail.

lsof command

lsof stands for List Open Files.

The lsof command is easy to remember if you think of it as “ls + of”, where ls stands for list and of stands for open files.

It is a command-line utility that lists information about the files opened by various processes. In Unix, everything is a file (pipes, sockets, directories, devices, etc.), so with lsof you can get information about any open file.

Introduction to lsof

Simply typing lsof will provide a list of all open files belonging to all active processes.

ARKIT ~]# lsof

By default, one file is displayed per line. Most of the columns are self-explanatory. We will explain a couple of the more cryptic columns (FD and TYPE).

FD – stands for file descriptor. Some of the values you may see are:

cwd current working directory
rtd root directory
txt program text (code and data)
mem memory-mapped file

In the FD column, an entry such as 1u is the actual file descriptor number followed by a letter that gives its access mode:

r for read access.
w for write access.
u for read and write access.
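
As an illustration of how to read the FD column, the sketch below decodes a single FD value with plain shell pattern matching. The function name describe_fd is hypothetical, and only the values listed above are handled; real lsof output can contain other FD names and an extra lock character after the mode letter.

```shell
#!/bin/sh
# Sketch: explain one value from the FD column of lsof output.
# describe_fd is a hypothetical helper, not part of lsof.
describe_fd() {
    fd=$1
    case $fd in
        cwd) echo "current working directory" ;;
        rtd) echo "root directory" ;;
        txt) echo "program text" ;;
        mem) echo "memory-mapped file" ;;
        [0-9]*)
            num=${fd%%[!0-9]*}     # leading digits: the descriptor number
            mode=${fd#"$num"}      # what follows: the access mode letter
            case $mode in
                r*) access="read" ;;
                w*) access="write" ;;
                u*) access="read and write" ;;
                *)  access="unknown mode" ;;
            esac
            echo "descriptor $num, $access access" ;;
        *) echo "other: $fd" ;;
    esac
}

describe_fd cwd
describe_fd 1u
```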

TYPE – the type of the file. Some of the values are:

DIR – Directory
REG – Regular file
CHR – Character special file.
FIFO – First In First Out

The command below lists all files opened by the user root.

ARKIT ~]# lsof -u root

Find Processes running on Specific Port

To find all the running processes that use a specific port, use the following command with the -i option. The example below lists all running processes that use TCP port 22.

ARKIT ~]# lsof -i TCP:22

List Open Files of TCP Port ranges 1-1024

To list the open files of all running processes on TCP ports 1 to 1024:

ARKIT ~]# lsof -i TCP:1-1024

Exclude User with ‘^’ Character

You can exclude a particular user by prefixing the user name with ‘^’. The command below excludes the root user.

ARKIT ~]# lsof -i -u^root

top command

Display of Top Command

In this example, top shows information such as tasks, memory, CPU and swap. Press ‘q’ to quit.

# top

Press Shift+F to see more options, such as choosing the field to sort by.

You can then see in more detail how much memory each process uses.

traceroute / tcptraceroute

Before beginning with examples, let’s understand the concept on which traceroute works.

The traceroute utility relies on the TTL (time to live) field in the IP header. For users who are new to it, this field sets the maximum number of hops a packet may take while traveling across the network.

So it effectively limits the lifetime of the packet on the network. The initial value is usually set to 32 or 64. Each router that forwards the packet decreases the TTL value by 1. When a router receives a packet with a TTL value of 1, it does not forward the packet but discards it.

After discarding the packet, the router sends an ICMP “Time exceeded” error message back to the source of the packet. The ICMP packet that is sent back contains the IP address of the router.

So traceroute operates by sending packets with a TTL value that starts at 1 and is incremented by one each time. Each router that receives a packet checks the TTL field; if it is 1, the router discards the packet and sends back the ICMP error packet containing its IP address, which is exactly what traceroute needs. In this way traceroute incrementally collects the IP addresses of all the routers between the source and the destination.
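
The probing logic described above can be sketched as a small simulation. It sends no packets; the function name simulate_traceroute and the numbered routers are hypothetical, and the default limit of 30 probes is only an assumption made for this example.

```shell
#!/bin/sh
# Sketch: simulate traceroute's increasing-TTL probes (no network traffic).
# usage: simulate_traceroute <hops_to_destination> [max_ttl]
simulate_traceroute() {
    hops=$1
    max_ttl=${2:-30}
    ttl=1
    while [ "$ttl" -le "$max_ttl" ]; do
        if [ "$ttl" -lt "$hops" ]; then
            # The probe expires at router number $ttl, which reports back.
            echo "ttl=$ttl: router $ttl replies with ICMP time exceeded"
        else
            # The TTL is large enough for the probe to reach the target.
            echo "ttl=$ttl: destination reached"
            return 0
        fi
        ttl=$((ttl + 1))
    done
    echo "gave up after $max_ttl hops"
    return 1
}

# A destination four hops away: three routers answer, then the host itself.
simulate_traceroute 4
```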

It also helps to be familiar with the other fields of the IP header.

Traceroute Examples

1. How to run traceroute?

ARKIT ~]# traceroute 192.168.234.135

traceroute – print the route that packets take to a network host. traceroute tracks the route that packets take across an IP network on their way to a given host. It uses the IP protocol’s time to live (TTL) field and attempts to elicit an ICMP TIME_EXCEEDED response from each gateway along the path to the host.

traceroute6 is equivalent to traceroute -6

The only required parameter is the name or IP address of the destination host. The optional packet_length is the total size of the probing packet (default 60 bytes for IPv4 and 80 for IPv6). The specified size can be ignored in some situations or increased up to a minimal value.

This program attempts to trace the route an IP packet would follow to some internet host by launching probe packets with a small ttl (time to live) then listening for an ICMP “time exceeded” reply from a gateway. We start our probes with a ttl of one and increase by one until we get an ICMP “port unreachable” (or TCP reset), which means we got to the “host”, or hit a max (which defaults to 30 hops). Three probes (by default) are sent at each ttl setting and a line is printed showing the ttl, address of the gateway and round trip time of each probe. The address can be followed by additional information when requested.

ping command

As you already know, the ping command is used to find out whether a peer host or gateway is reachable.

If you think ping is such a simple command that it needs no examples, you should read the rest of this section.

The ping command provides a lot more options than you might already know.

Increase or Decrease the Time Interval Between Packets

#ping -i 5 192.168.234.135

Note: Only the superuser can specify an interval of less than 0.2 seconds. Any other user who tries gets an error message.

Check whether the local network interface is up and running

ARKIT ~]#ping 0

Send N packets and stop

#ping -c 3 192.168.234.135
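
Because ping stops by itself when -c is given, its exit status can be used in scripts. The sketch below assumes the iputils version of ping, which exits with status 0 when at least one reply arrives; the helper name is_reachable is hypothetical, and -w 2 makes ping give up after two seconds.

```shell
#!/bin/sh
# Sketch: report reachability from ping's exit status.
# is_reachable is a hypothetical helper; -c 1 sends one packet,
# -w 2 stops waiting after 2 seconds.
is_reachable() {
    ping -c 1 -w 2 "$1" >/dev/null 2>&1
}

if is_reachable 192.168.234.135; then
    host_state=up
else
    host_state=down
fi
echo "192.168.234.135 is $host_state"
```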

Show Version and Exit

#ping -V

Flood the network

Superusers can send a hundred or more packets per second using the -f option. It prints a ‘.’ when a packet is sent, and a backspace when a packet is received.

With the command below, ping -f can send more than 400,000 packets in a few seconds.

ARKIT ~]# ping -f localhost

Audible ping:  Give beep when the peer is reachable

This option is useful for sysadmins during troubleshooting. There is no need to look at the ping output after each and every change. You can continue working on your changes, and when the remote machine becomes reachable you’ll hear the beep automatically.

ping -a 192.168.234.135

hexdump command

To display file contents in decimal format, use hexdump -d file_path.

ARKIT ~]# hexdump -d sample.txt
  0000000   12849   13363   00010
  0000005

The -d option reads the file as two-byte units and prints each one as an unsigned decimal number. On a little-endian machine the bytes of each unit are taken in reversed order, so the first two characters of the file, “12”, are read as “21”.
The ASCII codes of “2” and “1” are 50 and 49.
In binary format – [00110010][00110001].
The -d option treats this sequence as a single 16-bit number, so “0011001000110001” is printed as its decimal representation, 12849.
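
The same arithmetic can be checked in the shell. This is only a sketch of the calculation, not of how hexdump is implemented; word_value is a hypothetical helper that takes the two characters in file order and treats the second one as the high byte.

```shell
#!/bin/sh
# Sketch: decimal value of a two-byte little-endian unit, as hexdump -d prints it.
# usage: word_value <first_char> <second_char>
word_value() {
    lo=$(printf '%d' "'$1")   # ASCII code of the first byte in the file
    hi=$(printf '%d' "'$2")   # ASCII code of the second byte
    echo $(( hi * 256 + lo ))
}

word_value 1 2   # prints 12849
word_value 3 4   # prints 13363
```

Both results match the first two values in the hexdump output shown above.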

ethereal command

Ethereal, which has since been renamed Wireshark, is available for a wide range of platforms, including Linux, Windows, and several UNIX platforms. Although Ethereal has a command-line interface, building it requires GTK+ to be installed on the system. Ethereal also relies on libpcap. When you install it, make sure that you have Ethereal and both of the software packages it relies on.

tcpdump command

tcpdump is a powerful and widely used command-line packet sniffer, or packet analyzer, that captures or filters the TCP/IP packets received or transferred over a network on a specific interface. It is available on most Linux/Unix based operating systems.

tcpdump also gives us an option to save captured packets in a file for future analysis. It saves the file in pcap format, which can be viewed with the tcpdump command or with Wireshark, an open source GUI-based network protocol analyzer that reads pcap files.
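
As a rough sketch of saving packets for later analysis: the -w option writes raw packets to a pcap file and -r reads them back. The function names, the interface eth0 and the path /tmp/sample.pcap are assumptions for illustration, and capturing normally requires root privileges, so the example calls are left as comments.

```shell
#!/bin/sh
# Sketch: save a capture to a pcap file and read it back later.
# capture_to_file and read_capture are hypothetical helper names.

# usage: capture_to_file <interface> <packet_count> <pcap_file>
capture_to_file() {
    tcpdump -i "$1" -c "$2" -w "$3"
}

# usage: read_capture <pcap_file>
read_capture() {
    # -n skips name resolution so the saved packets print quickly
    tcpdump -n -r "$1"
}

# Example (run as root; eth0 and the path are assumptions):
#   capture_to_file eth0 5 /tmp/sample.pcap
#   read_capture /tmp/sample.pcap
```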

How to Install tcpdump in Linux

Many Linux distributions already ship with the tcpdump tool. If you don’t have it on your system, you can install it using the following yum command.

ARKIT ~]# yum install tcpdump

Once tcpdump is installed, you can continue with the following commands and their examples.

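Capture Packets from Specific Interface

Use the -i option to capture packets from a specific interface, for example eth0.

ARKIT ~]# tcpdump -i eth0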

Capture Only N Number of Packets

When you run the tcpdump command, it captures all the packets on the specified interface until you press Ctrl+C. Using the -c option, you can capture a specified number of packets. The example below captures only 5 packets.

ARKIT ~]# tcpdump -c 5 -i eth0

Print Captured Packets in ASCII

The tcpdump command below uses the -A option to display each packet in ASCII, a character-encoding scheme.

ARKIT ~]# tcpdump -A -i eth0

Display Available Interfaces

To list the available interfaces on the system, run the following command with the -D option.

ARKIT ~]# tcpdump -D

GDB

GDB offers a big list of commands, but only a handful of them are used frequently; they are listed in the “How to use GDB program” section.

The purpose of a debugger such as GDB is to allow you to see what is going on “inside” another program while it executes — or what another program was doing at the moment it crashed.

GDB can do four main kinds of things (plus other things in support of these) to help you catch bugs in the act:

·   Start your program, specifying anything that might affect its behavior.
·   Make your program stop on specified conditions.
·   Examine what has happened, when your program has stopped.
·   Change things in your program, so you can experiment with correcting the effects of one bug and go on to learn about another.

How to use GDB program

You can use GDB to debug programs written in C, C++, Fortran and Modula-2.

  b main - Puts a breakpoint at the beginning of the program
  b - Puts a breakpoint at the current line
  b N - Puts a breakpoint at line N
  b +N - Puts a breakpoint N lines down from the current line
  b fn - Puts a breakpoint at the beginning of function "fn"
  d N - Deletes breakpoint number N
  info break - Lists breakpoints
  r - Runs the program until a breakpoint or error
  c - Continues running the program until the next breakpoint or error
  finish - Runs until the current function is finished
  s - Runs the next line of the program
  s N - Runs the next N lines of the program
  n - Like s, but it does not step into functions
  u N - Runs until line N is reached
  p var - Prints the current value of the variable "var"
  bt - Prints a stack trace
  up - Goes up a level in the stack
  down - Goes down a level in the stack
  q - Quits gdb

readelf command

Examining the ELF header

Let’s take a binary named testELF as our examination target and start with the content of its ELF header:

ARKIT ~]# readelf -h testELF

ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Intel 80386
  Version:                           0x1
  Entry point address:               0x80482c0
  Start of program headers:          52 (bytes into file)
  Start of section headers:          2060 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         7
  Size of section headers:           40 (bytes)
  Number of section headers:         28
  Section header string table index: 25

What does this header tell us?

  • This executable was created for the Intel x86 32-bit architecture (“machine” and “class” fields).
  • When executed, the program will start running from virtual address 0x80482c0 (see entry point address). The “0x” prefix means it is a hexadecimal number. This address doesn’t point to our main() procedure, but to a procedure named _start. Don’t remember creating such a thing? Of course you don’t. The _start procedure is created by the linker, and its purpose is to initialize your program.
  • This program has a total of 28 sections and 7 segments.
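
The “class” value comes from the first bytes of the file, the same bytes readelf prints on the Magic line. As a small sketch, the shell function below (elf_class is a hypothetical name) reads those bytes with od: the first four must be 7f 45 4c 46, and the fifth is 1 for ELF32 or 2 for ELF64. The sample file is built from the Magic bytes shown above rather than from a real executable.

```shell
#!/bin/sh
# Sketch: print the ELF class of a file by reading its first bytes with od.
# usage: elf_class <file>
elf_class() {
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    if [ "$magic" != "7f454c46" ]; then
        echo "not ELF"
        return 1
    fi
    # Byte at offset 4 holds the class: 1 = 32-bit, 2 = 64-bit.
    class=$(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n')
    case $class in
        1) echo ELF32 ;;
        2) echo ELF64 ;;
        *) echo unknown ;;
    esac
}

# Hypothetical sample file holding only the start of an ELF header.
sample=$(mktemp)
printf '\177ELF\001\001\001' > "$sample"
elf_class "$sample"   # prints ELF32
rm -f "$sample"
```

On a real system the function can be pointed at any binary, for example elf_class /bin/ls.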

Conclusion

All of the Linux problem determination tools and commands above can be used to troubleshoot a problem in depth.

Ankam Ravi Kumar

Working as Linux / Storage Administrator L3. Interested in sharing the knowledge.
