In Linux, you can use regular expressions with grep to extract IP addresses from a file. The -E option tells grep to interpret the pattern as an extended regular expression.
An IPv4 address is a 32-bit number, written as four groups of 8 bits (called octets) separated by dots. Each octet can range from 0 to 255.
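To make the octet layout concrete, here is a minimal sketch that packs a dotted address into its 32-bit integer value. The function name ip_to_int is made up for this illustration, and the sketch assumes the input is already a well-formed address with no leading zeros; it does no validation.

```shell
# Hypothetical helper: convert a dotted IPv4 address to its 32-bit integer.
# Assumes four decimal octets (0-255) without leading zeros.
ip_to_int() {
    # Split the address on dots into four octets.
    IFS=. read -r o1 o2 o3 o4 <<EOF
$1
EOF
    # Each octet occupies 8 bits; the first octet is the most significant byte.
    echo $(( (o1 << 24) | (o2 << 16) | (o3 << 8) | o4 ))
}

ip_to_int 192.168.5.5   # prints 3232236805
ip_to_int 10.0.0.4      # prints 167772164
```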
To start we will create a text file that contains both valid and invalid IP addresses.
[savona@putor ~]$ cat ips.txt
123.321.234.712
999.999.999.999
192.168.5.5
10.0.0.4
10.000.000.04
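The listing above only displays the file. One way to create it, assuming you are working in a scratch directory where ips.txt does not already exist, is a here-document:

```shell
# Write the five sample addresses to ips.txt, one per line.
cat > ips.txt <<'EOF'
123.321.234.712
999.999.999.999
192.168.5.5
10.0.0.4
10.000.000.04
EOF

# Confirm the contents.
cat ips.txt
```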
Now we can create a simple regular expression to look for 4 blocks of 1-3 digits separated by a dot, like so:
grep -E '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' ips.txt
That command would be fine for most purposes, but if there is an invalid IP address like 265.168.1.2 (remember an octet cannot be higher than 255) it would still find it.
Here is an example:
[savona@putor ~]$ grep -E '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' ips.txt
123.321.234.712
999.999.999.999
192.168.5.5
10.0.0.4
10.000.000.04
In the above example, it found all the entries. As we know, 2 of these are NOT valid IP addresses.
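Note that grep prints the whole matching line by default. When addresses are embedded in other text, the -o option (available in GNU and BSD grep) prints only the part that matched. A short sketch, using log lines invented for this illustration rather than anything from ips.txt:

```shell
# Hypothetical log lines, made up for this example.
log='Accepted connection from 192.168.5.5 port 22
Failed login for admin from 10.0.0.4
No address on this line'

# -o prints only the matched text, one match per output line.
extracted=$(printf '%s\n' "$log" |
    grep -oE '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}')
printf '%s\n' "$extracted"
```

This is still the simple pattern, so it extracts anything shaped like an address, valid or not.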
We have to expand this regular expression to tell it to ignore numbers higher than 255. Here is our second attempt to build a regular expression that would only extract valid IP addresses.
[savona@putor ~]$ grep -E '^((25[0-5]|2[0-4][0-9]|[1]?[1-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[1]?[1-9]?[0-9])$' ips.txt
192.168.5.5
The above does a good job, but it still has issues. It will not match an octet written with leading zeros, nor will it match a 0 on its own in any of the first three octets. It also misses the octets 100 through 109. This regular expression would not match 10.0.0.5, for example.
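One way to see exactly what an octet sub-pattern misses is to feed it every value from 0 to 255 and print the ones it rejects. The sketch below tests the sub-pattern used for the first three octets on its own; the variable names are chosen for this example, and -x is used so that the whole line has to match.

```shell
# Octet sub-pattern used for the first three octets in the second attempt.
octet='(25[0-5]|2[0-4][0-9]|[1]?[1-9][0-9]?)'

# Print 0..255 one per line, then keep only the values the pattern
# does NOT match as a whole line (-v inverts, -x anchors to the full line).
i=0
rejected=$(
    while [ "$i" -le 255 ]; do
        echo "$i"
        i=$((i + 1))
    done | grep -vxE "$octet"
)
printf '%s\n' "$rejected"
```

The rejected values are 0 and 100 through 109, all of which are legal octets.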
In order to find a regular expression that will only extract valid IP addresses, we have to go to great lengths to validate every octet in the pattern.
Here is an example:
[savona@putor ~]$ grep -E "(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)" ips.txt
192.168.5.5
10.0.0.4
10.000.000.04
Now we are able to grep valid IP addresses. But even the above is not without its flaws.
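One flaw worth spelling out is that the pattern is not anchored, so grep accepts a line as soon as any part of it matches. The sketch below uses two invalid test strings chosen for this illustration, 1.1.1.256 and 1234.1.1.1, and stores the same octet pattern in shell variables (octet and pattern are names made up here) to keep the commands readable. Adding -x makes grep require that the entire line matches, which is one way to close the gap when the file holds one address per line.

```shell
# Same octet pattern as in the command above, factored into a variable.
octet='(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'
pattern="$octet\.$octet\.$octet\.$octet"

# Test input: one valid address and two invalid ones (made up for this sketch).
input='10.0.0.4
1.1.1.256
1234.1.1.1'

# Unanchored: -o shows which part of each line matched.
partial=$(printf '%s\n' "$input" | grep -oE "$pattern")
printf 'unanchored matches:\n%s\n' "$partial"

# -x: the entire line must match, so the two invalid lines are dropped.
whole=$(printf '%s\n' "$input" | grep -xE "$pattern")
printf 'whole-line matches:\n%s\n' "$whole"
```

Without -x, the invalid lines match on the fragments 1.1.1.25 and 234.1.1.1. With -x, only 10.0.0.4 is printed.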
It is an industry standard to remove leading zeros from addresses in both IPv4 and IPv6. But just because that is a standard way to represent IPv4 addresses, it doesn't mean everyone will.
For example, you could represent a zero-filled octet with three zeros or a single zero.
10.000.000.5 = 10.0.0.5
Both of these addresses would work fine on a network, but even ping removes the leading zeros.
[savona@putor ~]$ ping 10.000.000.5
PING 10.000.000.5 (10.0.0.5) 56(84) bytes of data.
64 bytes from 10.0.0.5: icmp_seq=1 ttl=64 time=0.252 ms
64 bytes from 10.0.0.5: icmp_seq=2 ttl=64 time=0.312 ms
There is a little tool built by some folks at Red Hat called ipcalc. It was originally built to calculate IP information for a host, but it can also be used for IP address validation. Sure enough, this tool also calls an IP address with leading zeros an invalid IP address.
[savona@putor ~]$ ipcalc -c 10.000.000.5
ipcalc: bad IPv4 address: 10.000.000.5
As you can see it is fairly easy to use grep and regular expressions to extract an IP address from a file. It is not so easy to ensure that the matched pattern is a valid IP address.
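An alternative to an ever-growing pattern is to split the work: let grep pull out anything shaped like an address, then check each octet numerically in the shell. The sketch below is one possible approach, not the method used above. The function name is_valid_ipv4 is invented for this example, and it deliberately rejects leading zeros, as ipcalc does. The sample data is written to a temporary file so that the block stands on its own.

```shell
# Hypothetical helper: succeed only if $1 is four dot-separated octets,
# each in 0..255 and written without leading zeros.
is_valid_ipv4() {
    # Must be exactly four groups of 1-3 digits and nothing else.
    printf '%s\n' "$1" | grep -qxE '[0-9]{1,3}(\.[0-9]{1,3}){3}' || return 1
    IFS=. read -r o1 o2 o3 o4 <<EOF
$1
EOF
    for o in "$o1" "$o2" "$o3" "$o4"; do
        case $o in
            0) ;;              # a lone zero is fine
            0*) return 1 ;;    # leading zero: reject
        esac
        [ "$o" -le 255 ] || return 1
    done
    return 0
}

# Same five addresses as ips.txt, written to a temporary file.
tmp=$(mktemp)
printf '%s\n' 123.321.234.712 999.999.999.999 192.168.5.5 \
    10.0.0.4 10.000.000.04 > "$tmp"

# Loose extraction with grep, then strict validation of each candidate.
valid=$(
    grep -oE '[0-9]{1,3}(\.[0-9]{1,3}){3}' "$tmp" | while read -r ip; do
        if is_valid_ipv4 "$ip"; then echo "$ip"; fi
    done
)
rm -f "$tmp"
printf '%s\n' "$valid"
```

With this input, only 192.168.5.5 and 10.0.0.4 survive. Because the extraction step is unanchored, a string such as 1234.1.1.1 would still yield the candidate 234.1.1.1; using -x instead of -o in the first grep avoids that when the file holds one address per line.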
2 Comments
Maybe a late reply, but the last grep command is still buggy. I escape the . (dots) to be sure each one matches only a dot and not a space or anything else:
grep -oE "([^.]|^)([0-9]{1,2}|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.([0-9]{1,2}|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.([0-9]{1,2}|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.([0-9]{1,2}|1[0-9]{2}|2[0-4][0-9]|25[0-5])([^.]|$)"
Otherwise, my output would also include things like (2;80,443), for example.
It is wrong to claim that "It is an industry standard to remove leading zeros from addresses in both IPv4". Leading zeros will cause the number to be interpreted as octal... so 10.1.1.010 is in fact 10.1.1.8 and 10.1.1.09 is an invalid address...
$ ping 10.1.1.010
PING 10.1.1.010 (10.1.1.8) 56(84) bytes of data.
$ ping 10.1.1.09
ping: 10.1.1.09: Name or service not known
Not very common knowledge, but if you add leading zeros to align some output, someone is going to get mad at you, rightfully...