sex.py

Smashing Email eXtractor 1.0

Extract valid email addresses from all kinds of files. With sex.py you can pull a list of email addresses out of a badly formatted text file, or scan recursively through a directory and all of its contents. One scenario: mirror a website to your local hard drive and use sex.py to harvest every email address it contains.

Highlights:
  • Switch the search pattern to match valid email addresses
  • Scan a single file or multiple files from a directory (including subdirectories)
  • Sort the addresses in the output file
  • Exclude duplicates
  • Change verbosity level

Configuration:
To configure Smashing Email eXtractor, edit the following variables in the source file.

verbose = n
0 no output
1 print the email addresses, e.g. if you want to pipe them
2 print the email addresses, the current file and the grand total

sort = n
0 write email addresses to destination file as found
1 sort addresses in alphabetical order

remove_duplicates = n
0 capture all addresses
1 remove duplicate addresses
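
The sort and remove_duplicates settings can be pictured with the short sketch below. It is an illustration only, not the script's actual code: the postprocess helper is hypothetical, and only the three variable names are taken from the description above. Plain sorted() is case-sensitive, so uppercase names come before lowercase ones, which matches the order shown in the sample addresses.txt in Example 1.

```python
# Hypothetical settings block, mirroring the variables described above.
verbose = 2            # 0 = silent, 1 = addresses, 2 = addresses, file names, total
sort = 1               # 0 = keep the order in which addresses were found
remove_duplicates = 1  # 0 = keep every hit, 1 = drop repeats


def postprocess(addresses, sort=sort, remove_duplicates=remove_duplicates):
    """Apply the sort and remove_duplicates settings (hypothetical helper)."""
    result = list(addresses)
    if remove_duplicates:
        # dict.fromkeys keeps the first occurrence and preserves order
        result = list(dict.fromkeys(result))
    if sort:
        # Case-sensitive sort: "Bruno..." comes before "aki..."
        result = sorted(result)
    return result


found = ["forum@welt.de", "Bruno.Kahl@cducsu.de", "forum@welt.de"]
print(postprocess(found))  # ['Bruno.Kahl@cducsu.de', 'forum@welt.de']
```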

Usage:

sex.py <source> <destination>


source: absolute path to a file or directory
destination: path to write the output file
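
How a source that is either a file or a directory might be handled is sketched below. This is a minimal sketch under stated assumptions, not the code shipped in sex.py: the regular expression is a deliberately simple stand-in for the script's configurable pattern, and iter_files, extract_from_file, extract and main are hypothetical names. Files are read as bytes and decoded as Latin-1 so that binary files such as PDFs can be scanned without decoding errors.

```python
import os
import re

# Assumed pattern for illustration; the real script's pattern may differ.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def iter_files(source):
    """Yield the file itself, or every file below a directory."""
    if os.path.isfile(source):
        yield source
        return
    for root, _dirs, names in os.walk(source):
        for name in sorted(names):
            yield os.path.join(root, name)


def extract_from_file(path):
    """Return all pattern matches in one file, in order of appearance."""
    with open(path, "rb") as handle:
        # Latin-1 maps every byte to a character, so decoding never fails
        text = handle.read().decode("latin-1")
    return EMAIL_RE.findall(text)


def extract(source):
    """Collect addresses from a file or, recursively, from a directory."""
    addresses = []
    for path in iter_files(source):
        addresses.extend(extract_from_file(path))
    return addresses


def main(argv):
    """argv = [script, source, destination]; returns the number of hits."""
    source, destination = argv[1], argv[2]
    addresses = extract(source)
    with open(destination, "w") as out:
        out.write("\n".join(addresses) + "\n")
    return len(addresses)
```

Calling main(sys.argv) from a script would reproduce the source/destination interface shown above; sorting, duplicate removal and verbosity are left out here to keep the sketch short.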

Example 1:

$ wget --mirror -p --restrict-file-names=windows --html-extension --convert-links -v http://www.wolfgang-schaeuble.de/
$ python sex.py www.wolfgang-schaeuble.de/ addresses.txt
>> File: www.wolfgang-schaeuble.de/Audioplayer/swfobject.js
...
>> File: www.wolfgang-schaeuble.de/fileadmin/user_upload/PDF/050625nordkurier.pdf
Margareta.Moertl@cducsu.de
...
>> Extraced email addresses: 10
$ cat addresses.txt
Bruno.Kahl@cducsu.de
Margareta.Moertl@cducsu.de
aki-108@gmx.de
forum@welt.de
heike.nieske@cducsu.de
poststelle@bmi.bund.de
sebastian.pieper@cducsu.de
wolfgang.schaeuble.ma02@bundestag.de
wolfgang.schaeuble@bundestag.de
wolfgang.schaeuble@wk.bundestag.de


Example 2:

$ python sex.py shitty_formatted_list.txt shiny_email_list.txt


Download:
sex.py