Grep All Email Addresses from a Text File

Steven Vona, December 21, 2011

Q: I need to use my Linux system to grep email addresses out of a text file. Is there a way I can tell grep to just look for emails?

A: You can use regular expressions with grep. If you construct a good regex, you can pull just about anything out of a text file. Below we use grep with the -E option, which tells grep to interpret the pattern as an extended regular expression (ERE), so operators such as + can be written without a backslash. The -o option tells grep to print only the text that matches the pattern, one match per line, instead of the whole line.

grep -E -o "\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b" filename.txt
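A minimal sketch for trying the pattern without creating a file, assuming bash and GNU grep. The sample text is made up and stands in for filename.txt, and the dot before the last part of the domain is written as \. so it only matches a literal period. Because of -o, both addresses are printed on separate lines even though they share one input line.

```shell
#!/usr/bin/env bash
# Made-up sample text standing in for filename.txt.
sample='Contact joe@example.com or jane.doe@mail.example.org for details.'

# -E: extended regular expression; -o: print only each match, one per line.
matches=$(printf '%s\n' "$sample" |
  grep -E -o '\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b')

printf '%s\n' "$matches"
# prints:
# joe@example.com
# jane.doe@mail.example.org
```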

You can also use egrep, which is equivalent to grep -E. Recent GNU grep releases warn that egrep is obsolescent, so grep -E is the safer choice in scripts.

egrep -o "\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b" filename.txt

That’s it. With the above regular expression you should be able to find the common forms of email address in your file, although it does not accept every address the email standards allow.

$ egrep -o "\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b" test
 [email protected]

Let’s break down the regular expression.

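\b is a word boundary, so we put one on each side. It matches the position between a word character (a letter, digit or underscore) and either a non-word character or the start or end of the line, so the address does not have to be surrounded by spaces: punctuation such as a colon or a comma counts too. Note that \b is a GNU grep extension, not part of POSIX regular expressions.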
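A quick way to see this behaviour, assuming bash and GNU grep (which supports \b); the input line is made up. The address is wrapped in a colon and a comma instead of spaces, and \b still marks where it starts and ends.

```shell
#!/usr/bin/env bash
# Made-up input: no spaces around the address, only punctuation.
line='mailto:joe@example.com,'

# \b matches between "mailto:" and "joe", and between "com" and ","
matches=$(printf '%s\n' "$line" |
  grep -E -o '\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b')

printf '%s\n' "$matches"   # prints: joe@example.com
```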

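[a-zA-Z0-9.-] matches one character that is allowed in the first part of the address (the local part, before the @): a lowercase letter a to z, an uppercase letter A to Z, a digit, a period or a hyphen. Inside the brackets the period is literal, and the hyphen is literal because it is the last character.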

The plus sign means "one or more of the preceding item", so [a-zA-Z0-9.-]+ matches a whole run of those characters rather than just one.
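To see what the plus sign changes, this sketch (bash and grep -E assumed, input made up) runs the same bracket expression with and without it. Without +, -o prints one character per line; with +, the whole run joe42 comes out as a single match.

```shell
#!/usr/bin/env bash
# Without +: each match is exactly one allowed character.
single=$(printf 'joe42\n' | grep -E -o '[a-zA-Z0-9.-]')

# With +: each match is the longest run of allowed characters.
run=$(printf 'joe42\n' | grep -E -o '[a-zA-Z0-9.-]+')

printf 'without +:\n%s\nwith +:\n%s\n' "$single" "$run"
```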

Then we specify the @ symbol, which is matched literally.

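Then we use the same bracket expression twice for the domain, separated by a period. That period should be escaped as \. because an unescaped . matches any single character, so the pattern would also accept text such as joe@examplecom that has no dot in the domain at all. Together these pieces make up the basic structure of an email address.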
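The difference an escaped period makes can be checked directly. In this sketch (bash and GNU grep assumed) joe@examplecom is a made-up string with no dot in the domain: the pattern with an unescaped . accepts it, while the one using \. finds nothing. The || true only stops a non-matching grep from ending a script that runs under set -e.

```shell
#!/usr/bin/env bash
input='joe@examplecom'   # made-up string: no period in the domain

# Unescaped "." matches any character, so this (wrongly) matches.
loose=$(printf '%s\n' "$input" |
  grep -E -o '\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+.[a-zA-Z0-9.-]+\b' || true)

# Escaped "\." requires a literal period, so there is no match.
strict=$(printf '%s\n' "$input" |
  grep -E -o '\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b' || true)

printf 'unescaped: %s\n' "${loose:-<no match>}"
printf 'escaped:   %s\n' "${strict:-<no match>}"
```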

From the grep man page:
-E = Interpret PATTERN as an extended regular expression.
-o = Show only the part of a matching line that matches PATTERN.

Resources:
GREP MAN PAGE: https://ss64.com/bash/grep.html

40 Comments

Philip Rhoades

Thanks! I was almost there, but not quite...

12 years ago
Anonymous

Thank you… very useful

12 years ago
tjallingkikkert

You missed the underscore…

12 years ago
Jason Lee

According to your regexp, some@….com will be a valid email address.

12 years ago
Anonymous

Gentlemen, I appreciate you finding issues with my regex. Please post some solutions!

12 years ago
Anonymous

Thanks!

12 years ago
Gabriel Sousa

Thank you! Very Good!

12 years ago
Anonymous

thnx

12 years ago
3@

Thanks a lot!

12 years ago
Sven

Thanks !!

12 years ago
Anonymous

grep -E -o "\b[a-zA-Z0-9._-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b" filename.txt

12 years ago
Anonymous

I love this command! thank you!

12 years ago
Anonymous

Thanks. It works perfectly!

12 years ago
Anonymous

Just what I was after – thanks!

12 years ago
Géraldine Hemma713

Thanks a lot. You saved my afternoon ^^

12 years ago
DEn

work, thanx

11 years ago
de tu dai phap

Can you help me understand \b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+.[a-zA-Z0-9.-]+\b? I don't understand it. Can you explain? Thanks

11 years ago
de tu dai phap

Thanks a lot. You were a great help!

11 years ago
khanbaba khan

egrep -i "^[a-z0-9.-]+@[a-z0-9.-]+.[a-z0-9.-]+$" filename.txt

11 years ago
Anonymous

This is using regular expressions; here is some reasoning.

\b = Tells grep to match a word boundary
[a-zA-Z0-9] = Tells grep to match any character from a-z, then the same thing capitalized, and also anything from 0-9 (so basically any letter or number)
+ = Tells grep to match the preceding item one or more times, which means altogether any number (at least one) of uppercase letters, lowercase letters or digits.

And so on… Here are some good resources:

https://www.gnu.org/software/findutils/manual/html_node/find_html/egrep-regular-expression-syntax.html

http://www.cs.columbia.edu/~tal/3261/fall07/handout/egrep_mini-tutorial.htm

11 years ago
Anonymous

@Khanbaba khan – That is an imperfect solution. It will find "joe@domain." which is not a valid email address.

11 years ago
Ray Dobie

Cool! But how do I add a comma "," after every email address? Like this: [email protected],[email protected],…

11 years ago
Anonymous

I came up with this in 5 seconds; there might be a cleaner way, though.

for i in `grep -E -o "\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b" help`; do echo -n "$i,"; done

Although you will have a comma at the end of the list.

11 years ago
Anonymous

BTW, that should all be on one line.

11 years ago
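A sketch of one way to avoid that trailing comma, not the commenter's method: let paste join the matches. Here -s merges all input lines into one, -d ',' sets the separator, and - reads standard input. The sample text is made up and stands in for the file named help; bash and GNU grep are assumed.

```shell
#!/usr/bin/env bash
# Made-up sample text standing in for the file "help".
sample='joe@example.com and jane@example.org'

joined=$(printf '%s\n' "$sample" |
  grep -E -o '\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b' |
  paste -s -d ',' -)

printf '%s\n' "$joined"   # prints: joe@example.com,jane@example.org
```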
Anonymous

Thank you. It was very useful!

11 years ago
Anonymous

Oh, you just saved me an hour!

11 years ago
Adam

You're right that you can't end with a period. Though this leads to a related issue: nearly anything@anything is a valid email address according to the full spec (RFC822), including things like "{-a.-b@c=d$*!/?" (not to mention Unicode). If it doesn't matter for your application to reject uncommon addresses, this isn't much of an issue; just force people to get a "real" address that ends in .blah and doesn't contain fancy symbols. But if you want to err on the side of caution, *@* is pretty much the only way to go. A separate RegEx or script can be used later for actual validation. For example, processing the TLD according to a separate whitelist (like only accepting currently valid TLDs like com, net, gov, tv) though even that changes yearly and the list numbers in the thousands.

10 years ago
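The two-step idea in this comment could be sketched in shell roughly as follows, assuming bash and GNU grep. Step one is a deliberately loose "anything@anything" match on runs of non-space characters; step two filters those candidates separately. The sample strings are made up, and the allow-list (com, net, gov, tv) only repeats the comment's examples; it is not a real list of valid TLDs.

```shell
#!/usr/bin/env bash
# Made-up input: a normal address, an unusual one, and one without a TLD.
sample='joe@example.com {weird}@host.tv user@localhost'

# Step 1: loose extraction of any non-space run containing an @.
candidates=$(printf '%s\n' "$sample" | grep -E -o '[^[:space:]]+@[^[:space:]]+')

# Step 2: separate check, keeping only candidates that end in an allowed TLD.
# The list is illustrative only; a real one would be much longer.
accepted=$(printf '%s\n' "$candidates" | grep -E '\.(com|net|gov|tv)$' || true)

printf '%s\n' "$accepted"
```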
jorgo

Simpler with [:alnum:]. "_" and "-" are allowed, and it checks a minimum length of two characters for the domain and the top-level domain:
egrep -o "[[:alnum:]_-]+@[[:alnum:]_-]{2,}\.[[:alnum:]]{2,}"

9 years ago
jorgo

Sorry, correction:
egrep -o "\b[[:alnum:]_-]+@[[:alnum:]_-]{2,}\.[[:alnum:]]{2,}\b"

9 years ago
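A small test of the {2,} interval ("at least two of the preceding item") in this pattern, assuming bash and GNU grep, with made-up input. The one-letter top-level domain in a@bb.c is rejected.

```shell
#!/usr/bin/env bash
# Made-up input: one-letter TLD, a plain address, a multi-level domain.
sample='a@bb.c joe@example.com jane@mail.example.org'

matches=$(printf '%s\n' "$sample" |
  grep -E -o '\b[[:alnum:]_-]+@[[:alnum:]_-]{2,}\.[[:alnum:]]{2,}\b')

printf '%s\n' "$matches"
# prints:
# joe@example.com
# jane@mail.example
```

The last line shows a limitation: [[:alnum:]_-] does not include the period, so an address on a multi-level domain is cut off after its second label.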
Urgen Sherpa

thank you. This was a timesaver

9 years ago
Alice Millour

Thank you 🙂 !!

9 years ago
Vaskir

Thank you for this command 🙂 it is quite useful to extract emails from various files, not only txt but csv and similar…

9 years ago
Anonymous

A plus sign (+) is valid in email addresses on most email systems: [email protected] will get delivered to [email protected], but you will know +which…; as the tag is ignored for delivery, you know which company is spamming you.

8 years ago
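A sketch of a wider local-part class that also accepts the plus sign mentioned here and the underscore raised in an earlier comment, assuming bash and GNU grep. Adding % as well is an assumption, not something from the post; the hyphen stays last in the brackets so it remains literal, and the sample addresses are made up.

```shell
#!/usr/bin/env bash
# Made-up input using plus-addressing and an underscore.
sample='Send to joe+shop@example.com or first_last@example.org'

# Local part: letters, digits, . _ % + and - (hyphen last = literal).
matches=$(printf '%s\n' "$sample" |
  grep -E -o '\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b')

printf '%s\n' "$matches"
# prints:
# joe+shop@example.com
# first_last@example.org
```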
Dan Haworth

Late to the party by a good few years here, but this was very helpful! Thank you!

8 years ago
Unknown

Thanks !

8 years ago
Frank Thilo Röhl

Thanks !

8 years ago
Anonymous

thank you

6 years ago
Rajesh Chaudhary

Thanks!

6 years ago
JM mignot

Shouldn’t the dot after the second + sign be prefixed by \ ?

2 years ago