You can extract all email addresses from a text file with the grep
tool in Linux.
To list all the email addresses in our file index.html, we use grep with a regular expression that matches email addresses:
$ grep -oE "\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b" index.html
-o
tells grep to print only the matched part of each line, not the whole line.
-E
tells grep to interpret the search term as an extended regular expression.
The regexp "\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b"
is not a perfect pattern for every possible email address. For example, it does not allow + or _ in the part before the @, so such addresses are missed or cut short. If you need a pattern that covers more address forms, look for a more sophisticated one that fits your needs. This simple pattern does the job in most cases.
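To see what the simple email pattern does and does not catch, here is a small sketch run against a made-up sample file. The file contents and the names sample, pattern, and emails are assumptions for illustration only, and GNU grep is assumed for \b support.

```shell
# Create a small sample file (hypothetical content, for illustration only).
sample=$(mktemp)
cat > "$sample" <<'EOF'
<p>Contact: alice@example.com or bob.smith@mail.example.org</p>
<p>Sales: alice@example.com, support: first+tag@example.com</p>
EOF

# The simple email pattern, stored in a variable for readability.
pattern='\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b'

# -o prints each match on its own line; -E enables extended regexps.
emails=$(grep -oE "$pattern" "$sample")
printf '%s\n' "$emails"

# Clean up the temporary file.
rm -f "$sample"
```

With this sample, grep prints four lines: alice@example.com twice, bob.smith@mail.example.org once, and tag@example.com. The last one is a truncated form of first+tag@example.com, because + is not in the pattern's character class.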
The output may contain duplicate email addresses. Let's dedup it with sort and uniq:
$ grep -oE "\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b" index.html | sort | uniq
uniq reads its input and drops repeated lines, but it only compares adjacent lines, so we sort the output first to bring duplicates next to each other. sort -u does the same in one step.
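A quick way to see how uniq behaves on unsorted input is to feed it a few made-up addresses. The sample data and the names matches, uniq_only, and deduped are illustrative assumptions, not output from a real file.

```shell
# Hypothetical grep output containing a duplicate that is not adjacent.
matches=$(printf '%s\n' alice@example.com bob@example.com alice@example.com)

# uniq alone compares only neighbouring lines, so all three lines survive.
uniq_only=$(printf '%s\n' "$matches" | uniq)

# Sorting first makes duplicates adjacent; sort -u sorts and dedups in one step.
deduped=$(printf '%s\n' "$matches" | sort -u)

printf 'uniq only:\n%s\n\nsort -u:\n%s\n' "$uniq_only" "$deduped"
```

Here the uniq-only result still has three lines, while sort -u leaves just alice@example.com and bob@example.com. Note that sorting changes the order of the addresses; if the original order matters, a different approach is needed.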