    Text Processing Unix Commands

    Text-processing Unix commands operate on text and text files. Examples include cut, paste, cmp, sort, comm, head, tail, wc, diff, and grep. While editing files, you may lose track of which changes you have made to which files.

    In this article, we will cover commands that compare files.

    cmp command in Unix

    • Compares two files byte by byte.
    • Is mostly useful in scripts, because it reports only whether the files differ and where the first difference occurs.
    • Helps to find out whether two files are identical.

    When cmp compares two files and finds a difference, it reports the location of the first mismatch (byte and line number) on the screen. If the files are identical, cmp displays no message and simply returns to the prompt.

    cmp [OPTION] file1 file2

    Options can be:

    -c : Output differing bytes as characters.
    -i N : Ignore differences in the first N bytes of input.
    -l : Write the byte number (decimal) and the differing bytes (octal) for each difference.
    -s : Write nothing for differing files; return exit statuses only.
    -v : Output version info.
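    Because -s makes cmp silent, a script can rely on the exit status alone. The following is a minimal sketch, assuming the usual convention that cmp exits with 0 for identical files, 1 for differing files, and a larger value on error. The function name same_files and the temporary file names are hypothetical and used only for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not a standard command): succeeds when two files are identical.
same_files() {
    # -s suppresses all output; only the exit status is used.
    cmp -s "$1" "$2"
}

# Illustrative temporary files; the names are assumptions.
tmpdir=$(mktemp -d)
printf 'My name is Mohan\n' > "$tmpdir/a.txt"
printf 'My name is Mohit\n' > "$tmpdir/b.txt"

if same_files "$tmpdir/a.txt" "$tmpdir/b.txt"; then
    result="identical"
else
    result="different"
fi
echo "$result"

rm -rf "$tmpdir"
```

    Wrapping the call in a function keeps the if statement readable and lets the same check be reused elsewhere in a script.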

    Example 1: Let's consider two files:

    $ cat file1.txt
    My name is Mohan
    $ cat file2.txt
    My name is Mohit

    1. Compare file1.txt to file2.txt and output the result

    $ cmp file1.txt file2.txt
    file1.txt file2.txt differ: byte 15, line 1

    2. Skip same number of initial bytes from both input files

    $ cmp -i 3 file1.txt file2.txt 
    file1.txt file2.txt differ: byte 12, line 1

    The first 3 bytes of each file were skipped, so the mismatch that was reported at byte 15 is now reported at byte 12.
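    By default cmp stops at the first mismatch. To see every differing byte, the -l option can be used. The sketch below recreates the two files from Example 1 in a temporary directory (the directory and variable names are assumptions) and counts how many bytes differ.

```shell
#!/bin/sh
# Recreate the files from Example 1 in a scratch directory.
tmpdir=$(mktemp -d)
printf 'My name is Mohan\n' > "$tmpdir/file1.txt"
printf 'My name is Mohit\n' > "$tmpdir/file2.txt"

# -l prints one line per differing byte: the byte number in decimal,
# followed by the two differing byte values in octal.
# cmp exits with status 1 when files differ, so "|| true" keeps the
# script from stopping if it is run under "set -e".
diff_bytes=$(cmp -l "$tmpdir/file1.txt" "$tmpdir/file2.txt" || true)
printf '%s\n' "$diff_bytes"

# One output line per mismatch, so counting lines counts differing bytes.
diff_count=$(printf '%s\n' "$diff_bytes" | wc -l | tr -d ' ')
echo "differing bytes: $diff_count"

rm -rf "$tmpdir"
```

    For these two files the mismatches are at bytes 15 and 16, where "an" in the first file corresponds to "it" in the second.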

    Example 2:

    $ cat file1.txt
    My name is Mira
    $ cat file2.txt
    My name is Mira
    $ cmp file1.txt file2.txt
    $ _

    cmp prints nothing, which indicates that the files are identical.

    diff command in Unix

        • Reports the differences between files.
        • Compares files line by line.
        • It is useful when record level differences are to be traced.
        • Tells which lines in one file have to be changed to make the two files identical.
        • If the files match, no output is produced.
        • diff uses a few special symbols and instructions to describe the changes required to make the two files identical.

    Special symbols are:

        • a : add
        • c : change
        • d : delete

    diff [option...] file1 file2

    Commonly used options do the following:

        • Ignore case differences.
        • Compare the differences between two versions of program code.
        • Report only whether the files differ.
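    Two of these behaviours can be tried directly. The sketch below assumes GNU or BSD diff, where -i ignores case differences and -q reports only whether the files differ; neither flag is guaranteed on every Unix. The file names are hypothetical.

```shell
#!/bin/sh
# Two illustrative files that differ only in letter case.
tmpdir=$(mktemp -d)
printf 'Hello World\n' > "$tmpdir/a.txt"
printf 'hello world\n' > "$tmpdir/b.txt"

# -i ignores case, so diff finds no differences and exits with status 0.
if diff -i "$tmpdir/a.txt" "$tmpdir/b.txt" > /dev/null; then
    case_insensitive="same"
else
    case_insensitive="different"
fi
echo "$case_insensitive"

# -q skips the line-by-line report and prints a single summary line
# when the files differ. diff exits with status 1 in that case, so
# "|| true" keeps the script going under "set -e".
brief=$(diff -q "$tmpdir/a.txt" "$tmpdir/b.txt" || true)
echo "$brief"

rm -rf "$tmpdir"
```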


    $ cat test1.txt
    $ cat test2.txt
    $ diff test1.txt test2.txt
    1,2c1
    < sam
    < sam
    ---
    > Tom

    The output 1,2c1 means that lines 1 through 2 of the first file need to be changed to match line 1 of the second file. Lines preceded by '<' are lines from the first file. Lines preceded by '>' are lines from the second file.
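    The a (add) and d (delete) instructions follow the same pattern as c. As a small sketch with hypothetical files old.txt and new.txt, removing one line produces a d instruction, and comparing the files in the opposite order produces the matching a instruction.

```shell
#!/bin/sh
# old.txt has three lines; new.txt is the same with line "b" removed.
tmpdir=$(mktemp -d)
printf 'a\nb\nc\n' > "$tmpdir/old.txt"
printf 'a\nc\n'    > "$tmpdir/new.txt"

# old -> new: line 2 of the first file must be deleted.
# diff exits with status 1 when files differ, hence "|| true".
delete_cmd=$(diff "$tmpdir/old.txt" "$tmpdir/new.txt" || true)
printf '%s\n' "$delete_cmd"

# new -> old: a line must be added after line 1 of the first file.
add_cmd=$(diff "$tmpdir/new.txt" "$tmpdir/old.txt" || true)
printf '%s\n' "$add_cmd"

rm -rf "$tmpdir"
```

    The first comparison prints 2d1 followed by "< b", and the second prints 1a2 followed by "> b". In each instruction, the numbers before the letter refer to the first file and the numbers after it refer to the second file.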

    comm command in Unix

          • Compares two sorted files line by line.
          • Produces three-column output when no options are passed.
          • Column 1 contains lines unique to the first file.
          • Column 2 contains lines unique to the second file.
          • Column 3 contains lines common to both files.
    $ comm file1 file2

    Options can be:

    -1 : Do not print column 1 (lines unique to file 1).
    -2 : Do not print column 2 (lines unique to file 2).
    -3 : Do not print column 3 (lines common to both files).
    --check-order : Verify that the files are in sorted order (GNU comm).
    --nocheck-order : Do not check whether the files are in sorted order (GNU comm).
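    Suppressing columns is the usual way to answer questions such as "which lines do both files share?". The sketch below uses two small, already sorted files with hypothetical names and contents, and combines the -1, -2 and -3 flags to keep a single column each time.

```shell
#!/bin/sh
# Two illustrative files, already in sorted order.
tmpdir=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' > "$tmpdir/f1.txt"
printf 'banana\ncherry\ndate\n'  > "$tmpdir/f2.txt"

# -12 hides columns 1 and 2, leaving only lines common to both files.
common=$(comm -12 "$tmpdir/f1.txt" "$tmpdir/f2.txt")
# -23 hides columns 2 and 3, leaving only lines unique to the first file.
only_first=$(comm -23 "$tmpdir/f1.txt" "$tmpdir/f2.txt")
# -13 hides columns 1 and 3, leaving only lines unique to the second file.
only_second=$(comm -13 "$tmpdir/f1.txt" "$tmpdir/f2.txt")

echo "common: $common"
echo "only in first: $only_first"
echo "only in second: $only_second"

rm -rf "$tmpdir"
```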


    $ cat  test1.txt
    $ cat test2.txt
    $ comm test1.txt test2.txt

    The comm command produces three columns of output:
    1. First contains lines specific to test1.txt
    2. Second contains lines specific to test2.txt
    3. Third contains lines common to both files
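    Since comm expects sorted input, unsorted files are normally passed through sort first. The following sketch uses made-up contents for test1.txt and test2.txt (the original contents are not shown here) and sets LC_ALL=C so that sort and comm agree on the ordering. Columns 2 and 3 are indented with one and two tab characters respectively.

```shell
#!/bin/sh
# Use the same collation order for sort and comm.
LC_ALL=C
export LC_ALL

# Illustrative, deliberately unsorted files; the contents are assumptions.
tmpdir=$(mktemp -d)
printf 'cherry\napple\nbanana\n' > "$tmpdir/test1.txt"
printf 'date\nbanana\ncherry\n'  > "$tmpdir/test2.txt"

# comm requires sorted input, so sort each file first.
sort "$tmpdir/test1.txt" > "$tmpdir/test1.sorted"
sort "$tmpdir/test2.txt" > "$tmpdir/test2.sorted"

# With no options, all three columns are printed.
columns=$(comm "$tmpdir/test1.sorted" "$tmpdir/test2.sorted")
printf '%s\n' "$columns"

rm -rf "$tmpdir"
```

    Here "apple" appears in column 1, "date" in column 2, and "banana" and "cherry" in column 3.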


