Putorius

Uniq - Print or Remove Duplicate Lines on Linux Command Line

The uniq command is a Linux text utility that finds duplicated lines in a file or data stream. It is part of the GNU Core Utilities package and is available on almost all Linux systems.

The main thing to know about uniq is that it only detects duplicate lines that are adjacent, meaning a duplicated line must follow directly after the original to be detected.
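This adjacency rule is why uniq is so often paired with sort in pipelines: sorting first groups identical lines together so uniq can remove them. A minimal sketch, using a scratch file and invented sample lines:

```shell
# Create a small scratch file with NON-adjacent duplicate lines.
printf 'pizza\npho\npizza\n' > food.txt

# uniq alone misses the second "pizza" because the lines are not adjacent.
uniq food.txt
# prints: pizza, pho, pizza

# Sorting first makes the duplicates adjacent, so uniq can remove them.
sort food.txt | uniq
# prints: pho, pizza
```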

In this short tutorial we will show you how to use the uniq command. We will discuss its basic usage and some of its command line options. In addition, we will examine the differences between using sort -u and uniq.

Basic Usage of the Uniq Command

The most basic way to use uniq is to invoke the command and follow it with a filename for input.

uniq inputfile.txt

Let's take a look at an example. We will use a file called test.txt with the following contents.

[mcherisi@putor ~]$ cat test.txt 
Steve thinks Pizza is the greatest food ever.
He also likes Pho. 
Steve thinks Pizza is the greatest food ever.
Steve thinks Pizza is the greatest food ever.
Steve thinks Pizza is the greatest food ever.

We both think Italian food is awesome.


He also likes Pho. 
He also likes Pho.

There are a few things I want you to take notice of in our input file. First, I put the line "He also likes Pho" in between all of the Pizza lines. Second, I included several blank lines after the "Italian Food" line. This will help us demonstrate how uniq works and how it only finds "adjacent" lines.

Let's run our input file through uniq and see what happens.

[mcherisi@putor ~]$ uniq test.txt 
Steve thinks Pizza is the greatest food ever.
He also likes Pho. 
Steve thinks Pizza is the greatest food ever.

We both think Italian food is awesome.

He also likes Pho.

In the above example, the uniq command successfully filtered out the duplicated lines. But it left two copies of the exact same Pizza line, because they are not adjacent. It also collapsed the run of repeated blank lines into one. Since uniq compares whole lines, adjacent blank lines are treated as duplicates of each other.
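Like most coreutils filters, uniq also reads standard input when no filename is given, so it drops straight into pipelines. A quick sketch with made-up input:

```shell
# With no filename argument, uniq reads from standard input.
printf 'a\na\nb\n' | uniq
# prints: a, then b (the adjacent duplicate "a" is removed)
```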

The uniq Command Line Options

The uniq command has a few command line options to allow you to customize the output. Let's take a look at some examples of how these options work.

Count How Many Times a Line is Repeated

The -c option will print the number of times each line occurs. When using this option, uniq prepends each line with its occurrence count.

[mcherisi@putor ~]$ uniq -c test.txt 
      1 Steve thinks Pizza is the greatest food ever.
      1 He also likes Pho. 
      3 Steve thinks Pizza is the greatest food ever.
      1 
      1 We both think Italian food is awesome.
      2 
      2 He also likes Pho. 

Again, it is important to point out that it only counts adjacent lines. The first line is prepended with a 1 because no duplicate followed it directly. The third output line has a 3 because three identical lines appeared in a row (even though their text is exactly the same as the first line's).
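Combined with sort, the -c option becomes a quick frequency counter: the first sort makes all duplicates adjacent so uniq -c can count them, and a second numeric, reversed sort ranks the counts. A small sketch with invented data:

```shell
# Count how often each line occurs anywhere in the input,
# then rank the results from most to least frequent.
printf 'pho\npizza\npho\npho\n' | sort | uniq -c | sort -rn
# the most frequent line ("pho", seen 3 times) is printed first
```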

Print Duplicated Lines Only with Uniq

The -D (capital letter D) option will print only the duplicated lines of the file.

[mcherisi@putor ~]$ uniq -D test.txt 
Steve thinks Pizza is the greatest food ever.
Steve thinks Pizza is the greatest food ever.
Steve thinks Pizza is the greatest food ever.


He also likes Pho. 
He also likes Pho. 

Only Print a Single Instance of Duplicated Lines

The -d (lowercase d) option will print a single instance of each group of duplicated lines.

[mcherisi@putor ~]$ uniq -d test.txt 
Steve thinks Pizza is the greatest food ever.

He also likes Pho. 

Only Print Unique Lines of a File

In contrast to the above, the -u option prints only the unique lines (lines that are not adjacent to a duplicate line).

[mcherisi@putor ~]$ uniq -u test.txt 
Steve thinks Pizza is the greatest food ever.
He also likes Pho. 

We both think Italian food is awesome.

Ignore Case when Filtering Duplicate Lines with Uniq

You can use the -i option to make uniq compare lines case-insensitively.

[mcherisi@putor ~]$ uniq icase.txt 
Meatballs and Braccioli on Sunday.
Meatballs and braccioli on Sunday.

The above lines are not found to be duplicates because the 'b' is lowercase in the second sentence. Now, with the -i option:

[mcherisi@putor ~]$ uniq -i icase.txt 
Meatballs and Braccioli on Sunday.
[mcherisi@putor ~]$ 

When ignoring case, these lines are found to be duplicates.

Don't Compare the First N Fields

You can use the -f option to skip the first n fields when comparing lines. This is very handy when working with log files or other timestamped data.

[mcherisi@putor ~]$ cat error.log 
[Wed Oct 11 14:32:52 2000] [error] [client 127.0.0.1] client denied by server configuration: /export/home/live/ap/htdocs/test
[Wed Oct 11 14:32:53 2000] [error] [client 127.0.0.1] client denied by server configuration: /export/home/live/ap/htdocs/test
[Wed Oct 11 14:32:54 2000] [error] [client 127.0.0.1] client denied by server configuration: /export/home/live/ap/htdocs/test
[Wed Oct 11 14:32:55 2000] [error] [client 127.0.0.1] client denied by server configuration: /export/home/live/ap/htdocs/test
[Wed Oct 11 14:32:56 2000] [error] [client 127.0.0.1] client denied by server configuration: /export/home/live/ap/htdocs/test
[Wed Oct 11 14:32:57 2000] [error] [client 127.0.0.1] client denied by server configuration: /export/home/live/ap/htdocs/test
[Wed Oct 11 14:32:58 2000] [error] [client 192.168.1.15] client denied by server configuration: /export/home/live/ap/htdocs/test
[Wed Oct 11 14:32:59 2000] [error] [client 127.0.0.1] client denied by server configuration: /export/home/live/ap/htdocs/test

If we just used uniq, none of the lines would be considered duplicates because all of the timestamps differ.

Using the -f option, we can tell uniq to skip the first four fields, which cover the changing part of the timestamp, when comparing the lines.

[mcherisi@putor ~]$ uniq -f 4 error.log 
[Wed Oct 11 14:32:52 2000] [error] [client 127.0.0.1] client denied by server configuration: /export/home/live/ap/htdocs/test
[Wed Oct 11 14:32:58 2000] [error] [client 192.168.1.15] client denied by server configuration: /export/home/live/ap/htdocs/test
[Wed Oct 11 14:32:59 2000] [error] [client 127.0.0.1] client denied by server configuration: /export/home/live/ap/htdocs/test
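The same idea in miniature, with invented "log" lines short enough to see the comparison at a glance:

```shell
# Invented log lines: the first field (a timestamp) always differs,
# but the message repeats on the first two lines.
printf '10:01 disk full\n10:02 disk full\n10:03 link up\n' | uniq -f 1
# Skipping the first field, the first two lines compare equal,
# so uniq keeps only the first of them:
# prints: "10:01 disk full", then "10:03 link up"
```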

Don't Compare the First N Characters

Similar to the above, the -s option does the same, but it ignores the first n characters instead of fields.

[mcherisi@putor ~]$ uniq -s 21 error.log 
[Wed Oct 11 14:32:52 2000] [error] [client 127.0.0.1] client denied by server configuration: /export/home/live/ap/htdocs/test
[Wed Oct 11 14:32:58 2000] [error] [client 192.168.1.15] client denied by server configuration: /export/home/live/ap/htdocs/test
[Wed Oct 11 14:32:59 2000] [error] [client 127.0.0.1] client denied by server configuration: /export/home/live/ap/htdocs/test

In the example above we ignored the first 21 characters, which cover the varying portion of the timestamp.

Uniq vs Sort -u Conundrum

The fact that uniq only finds adjacent duplicate lines is not really a deficiency, although some people categorize it as such. It was purposely designed this way.

There is a glaring difference between sort -u and uniq. Sort does just that: it sorts the input in some manner (numerically, alphabetically, etc.) depending on its options. Whereas uniq prints the text in its original order. Whether this is important or not depends on what you are using it for.

If you grep IP addresses out of a log file and need to remove duplicates, then it probably won't matter if they are sorted. In this case you would want to use sort -u.

If you are working with a written document, where the order of the lines is important, then you would want to use uniq.

Even with something as rudimentary as our example, you can see how sort -u changing the order of the text can be problematic.

[mcherisi@putor ~]$ sort -u test.txt 

He also likes Pho. 
Steve thinks Pizza is the greatest food ever.
We both think Italian food is awesome.
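As an aside, if you need to remove duplicates wherever they occur while keeping each line's first occurrence in its original position, neither uniq nor sort -u alone can do it. A common awk one-liner (not part of uniq) covers that case; here it is sketched with invented input:

```shell
# Print each line only the first time it is seen, preserving
# the original order; the repeated "pizza" is dropped.
printf 'pizza\npho\npizza\ngelato\n' | awk '!seen[$0]++'
# prints: pizza, pho, gelato
```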

Conclusion

The uniq command itself is unique. Although sort can also remove duplicate lines, it lacks some of the functionality provided by the extended options of uniq. In my opinion, these tools are often unfairly compared.

In this article we showed you the basic syntax and usage of the uniq command. We also examined some of the most popular options that extend its functionality. We ended with a short rant about the differences between sort -u and uniq and why they shouldn't be compared.

If you have any comments, corrections, or questions, feel free to sound off in the comments below.
