The uniq command is a Linux text utility that detects repeated lines in a file or data stream and, depending on the options used, filters them out or reports them. It is part of the GNU Core Utilities package and is available on almost all Linux systems.
The main thing to know about uniq is that it only detects adjacent duplicate lines, meaning a duplicated line must directly follow its original to be detected.
In this short tutorial we will show you how to use the uniq command. We will discuss its basic usage and some of its command line options. In addition, we will examine the differences between using sort -u and uniq.
Basic Usage of the Uniq Command
The most basic way to use uniq is to invoke the command and follow it with a filename for input.
uniq inputfile.txt
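uniq also reads from standard input when no file name is given, and it accepts an optional second file name to write the results to instead of the screen. The file names below are just placeholders for your own files.
cat inputfile.txt | uniq
uniq inputfile.txt outputfile.txt
The first command pipes data into uniq, and the second writes the filtered lines to outputfile.txt rather than printing them.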
Let's take a look at an example. We will use a file called test.txt with the following contents.
[mcherisi@putor ~]$ cat test.txt
Steve thinks Pizza is the greatest food ever.
He also likes Pho.
Steve thinks Pizza is the greatest food ever.
Steve thinks Pizza is the greatest food ever.
Steve thinks Pizza is the greatest food ever.

We both think Italian food is awesome.


He also likes Pho.
He also likes Pho.
There are a few things I want you to take notice of in our input file. First, I put the line "He also likes Pho" in between the Pizza lines. Second, I included a few blank lines before and after the "Italian food" line. This will help us demonstrate how uniq works and how it only finds "adjacent" lines.
Let's run our input file through uniq and see what happens.
[mcherisi@putor ~]$ uniq test.txt
Steve thinks Pizza is the greatest food ever.
He also likes Pho.
Steve thinks Pizza is the greatest food ever.

We both think Italian food is awesome.

He also likes Pho.
In the above example, the uniq command successfully filtered out the duplicated lines, but it left two identical Pizza lines because they are not adjacent. It also removed one of the repeated blank lines. Since uniq compares whole newline-separated lines, adjacent blank lines are detected as duplicates just like any other line.
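If you need uniq to catch the non-adjacent duplicates as well, a common approach is to sort the input first so that identical lines end up next to each other:
sort test.txt | uniq
Keep in mind that this changes the order of the lines, a trade-off we come back to in the sort -u section below.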
The uniq Command Line Options
The uniq command has a few command line options to allow you to customize the output. Let's take a look at some examples of how these options work.
Count How Many Times a Line is Repeated
The -c option prints a count of how many times each line occurred. When using this option, uniq prepends each output line with the number of consecutive times it appeared in the input.
[mcherisi@putor ~]$ uniq -c test.txt
1 Steve thinks Pizza is the greatest food ever.
1 He also likes Pho.
3 Steve thinks Pizza is the greatest food ever.
1
1 We both think Italian food is awesome.
2
2 He also likes Pho.
Again, it is important to point out that uniq only counts adjacent lines. The first line has a 1 prepended to it because it had no adjacent duplicates. The third line has a 3 because there were three adjacent identical lines (even though the text is exactly the same as the first line).
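A popular pattern that builds on -c is to sort the input first (so every duplicate is adjacent), count the occurrences, and then sort the counts numerically. Something like the following should list each distinct line with its total count, most frequent first:
sort test.txt | uniq -c | sort -nr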
Print Duplicated Lines Only with Uniq
The -D (capital letter D) option will print only the duplicated lines of the file.
[mcherisi@putor ~]$ uniq -D test.txt
Steve thinks Pizza is the greatest food ever.
Steve thinks Pizza is the greatest food ever.
Steve thinks Pizza is the greatest food ever.


He also likes Pho.
He also likes Pho.
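On GNU systems, -D is the short form of --all-repeated. If your version of uniq supports it, the long form can also insert an empty line between each group of duplicates, which makes the output easier to scan on files that do not already contain blank lines:
uniq --all-repeated=separate somefile.txt
Here somefile.txt is just a placeholder for any file you want to inspect.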
Only Print a Single Instance of Duplicated Lines
The -d (lowercase d) option will print a single instance of each group of duplicated lines.
[mcherisi@putor ~]$ uniq -d test.txt
Steve thinks Pizza is the greatest food ever.

He also likes Pho.
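The -d option can be combined with -c if you also want to see how many times each duplicated line appeared:
uniq -cd test.txt
With our test.txt this should show the Pizza line with a count of 3, the repeated blank line with a count of 2, and the Pho line with a count of 2.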
Only Print Unique Lines of a File
In contrast to the above, the -u option only prints the unique lines (lines that are not adjacent to a duplicate line).
[mcherisi@putor ~]$ uniq -u test.txt
Steve thinks Pizza is the greatest food ever.
He also likes Pho.

We both think Italian food is awesome.
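Combined with sort, -u becomes a quick way to find lines that occur exactly once anywhere in a file, not just lines without an adjacent duplicate:
sort test.txt | uniq -u
With our sample file, this should leave only the "Italian food" line, since every other line (including the blank one) appears more than once.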
Ignore Case when Filtering Duplicate Lines with Uniq
You can use the -i option to make uniq ignore case when comparing lines.
[mcherisi@putor ~]$ uniq icase.txt
Meatballs and Braccioli on Sunday.
Meatballs and braccioli on Sunday.
The above lines are not considered duplicates because the 'b' is lowercase in the second sentence. Now, with the -i option:
[mcherisi@putor ~]$ uniq -i icase.txt
Meatballs and Braccioli on Sunday.
[mcherisi@putor ~]$
When case is ignored, these lines are found to be duplicates.
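The -i flag can be combined with the other options covered above. For example, you could count the case-insensitive duplicates with something like:
uniq -ci icase.txt
This should print a single Meatballs line with the number of times it occurred in front of it.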
Don't Compare the First N Fields
You can use the -f option followed by a number to skip that many leading fields when comparing lines. This is very handy when working with log files or other timestamped files.
[mcherisi@putor ~]$ cat error.log
[Wed Oct 11 14:32:52 2000] [error] [client 127.0.0.1] client denied by server configuration: /export/home/live/ap/htdocs/test
[Wed Oct 11 14:32:53 2000] [error] [client 127.0.0.1] client denied by server configuration: /export/home/live/ap/htdocs/test
[Wed Oct 11 14:32:54 2000] [error] [client 127.0.0.1] client denied by server configuration: /export/home/live/ap/htdocs/test
[Wed Oct 11 14:32:55 2000] [error] [client 127.0.0.1] client denied by server configuration: /export/home/live/ap/htdocs/test
[Wed Oct 11 14:32:56 2000] [error] [client 127.0.0.1] client denied by server configuration: /export/home/live/ap/htdocs/test
[Wed Oct 11 14:32:57 2000] [error] [client 127.0.0.1] client denied by server configuration: /export/home/live/ap/htdocs/test
[Wed Oct 11 14:32:58 2000] [error] [client 192.168.1.15] client denied by server configuration: /export/home/live/ap/htdocs/test
[Wed Oct 11 14:32:59 2000] [error] [client 127.0.0.1] client denied by server configuration: /export/home/live/ap/htdocs/test
If we just used uniq, none of the lines would be considered duplicates because all of the timestamps differ.
Using the -f option, we can tell uniq to skip the first four fields, which covers the changing part of the timestamp, when comparing the lines.
[mcherisi@putor ~]$ uniq -f 4 error.log
[Wed Oct 11 14:32:52 2000] [error] [client 127.0.0.1] client denied by server configuration: /export/home/live/ap/htdocs/test
[Wed Oct 11 14:32:58 2000] [error] [client 192.168.1.15] client denied by server configuration: /export/home/live/ap/htdocs/test
[Wed Oct 11 14:32:59 2000] [error] [client 127.0.0.1] client denied by server configuration: /export/home/live/ap/htdocs/test
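The -f option pairs nicely with -c when you want to know how many times each distinct message was logged, ignoring the changing timestamp:
uniq -c -f 4 error.log
With the log above, this should report a count of 6 for the first run of 127.0.0.1 errors, then 1 for the 192.168.1.15 line, and 1 for the final 127.0.0.1 line.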
Don't Compare the First N Characters
Similar to above, the -s option does the same thing, but it skips the first N characters instead of fields.
[mcherisi@putor ~]$ uniq -s 21 error.log
[Wed Oct 11 14:32:52 2000] [error] [client 127.0.0.1] client denied by server configuration: /export/home/live/ap/htdocs/test
[Wed Oct 11 14:32:58 2000] [error] [client 192.168.1.15] client denied by server configuration: /export/home/live/ap/htdocs/test
[Wed Oct 11 14:32:59 2000] [error] [client 127.0.0.1] client denied by server configuration: /export/home/live/ap/htdocs/test
In the example above we skipped the first 21 characters, which covers the part of the timestamp that changes from line to line.
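A related option, not shown above, is -w (--check-chars), which is roughly the opposite of -s: instead of skipping the first N characters, it compares no more than the first N characters of each line. For example, with GNU uniq:
uniq -w 11 error.log
Since the first 11 characters ("[Wed Oct 11") are identical on every line, this should collapse the whole log down to a single line.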
Uniq vs Sort -u Conundrum
The fact that uniq only finds adjacent lines is not really a deficiency, although some people categorize it as such. It was purposely designed this way.
There is a glaring difference between sort -u and uniq. Sort does just that: it sorts the input in some manner (numerically, alphabetically, etc.) depending on its options, whereas uniq prints the text in its original order. Whether this is important or not depends on what you are using it for.
If you grep IP addresses out of a log file and need to remove duplicates, it probably won't matter if they are sorted. In this case you would want to use sort -u.
If you are working with a written document, where the order of the lines is important, then you would want to use uniq.
Even with something as rudimentary as our example, you can see how sort -u changing the order of the text can be problematic.
[mcherisi@putor ~]$ sort -u test.txt
He also likes Pho.
Steve thinks Pizza is the greatest food ever.
We both think Italian food is awesome.
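If you ever need to remove every duplicate, including the non-adjacent ones, while still preserving the original order, uniq alone cannot do it and sort -u will reorder the text. A common workaround is a small awk one-liner that prints each line only the first time it is seen:
awk '!seen[$0]++' test.txt
With our sample file, this would keep the first Pizza line, the first Pho line, the first blank line, and the Italian food line, all in their original order.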
Conclusion
The uniq command itself is unique. Although sort can also remove duplicate lines, it lacks some of the functionality provided by the extended options of uniq. In my opinion, these tools are often unfairly compared.
In this article we showed you the basic syntax and usage of the uniq command. We also examined some of the most popular options that extend its functionality. We ended with a short rant about the differences between sort -u and uniq and why they shouldn't be compared.
If you have any comments, corrections, or questions, feel free to sound off in the comments below.