This project started in June 2005 and is growing quickly. We decided to post the To Do list here. Volunteers are more than welcome. Items in light grey are done; the ones in bold are considered priorities.
Link sample details with malwr.com
Review the URL acquisition bot to correctly sanitize # signs (URL fragments)
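A minimal sketch of the fragment stripping (in Python for illustration; the project's scripts are Perl) using the standard library's urldefrag, which drops anything after the # so the same resource isn't stored twice:

```python
from urllib.parse import urldefrag

def strip_fragment(url):
    """Return the URL with its #fragment removed."""
    clean, _fragment = urldefrag(url)
    return clean
```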
Create a list of IP addresses instead of URLs
Evaluate the possible use of 'Artists Against 419'
Evaluate if we can create a block list for the Suricata IDS/IPS
Evaluate the possible inclusion of domains used in other scams
Evaluate the usage of SSDEEP hashes
Set up a subscription system.
Consider linking MBLs to RNP-CAIS' frauds catalog.
Retrieve web server information during crawling.
Review RTIR and consider integrating it into the alert system
Add the AddThis button on all pages
Use PEFile to detect file packers
Use _file_ to detect compressed files before running PEFile
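A sketch of the pre-check idea (in Python, checking magic bytes directly instead of shelling out to file(1); the magic list is illustrative, not exhaustive):

```python
def looks_compressed(data: bytes) -> bool:
    """Cheap file-type pre-check: match leading magic bytes of common archives."""
    magics = (
        b"PK\x03\x04",   # ZIP
        b"\x1f\x8b",     # gzip
        b"BZh",          # bzip2
        b"Rar!",         # RAR
        b"7z\xbc\xaf",   # 7-Zip
    )
    return data.startswith(magics)

def is_pe(data: bytes) -> bool:
    """PE executables start with the DOS 'MZ' header; only these go to PEFile."""
    return data.startswith(b"MZ")
```

Only samples that pass is_pe (and fail looks_compressed) would be handed to the packer detector.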
Use hashes for URL comparison on DB inserts
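The point of hashing is a fixed-length key that can be indexed and compared cheaply regardless of URL length. A Python sketch (the function name and digest choice are illustrative):

```python
import hashlib

def url_key(url: str) -> str:
    """Fixed-length digest of a URL, usable as a unique index column."""
    return hashlib.sha1(url.encode("utf-8")).hexdigest()
```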
Create a SpamAssassin channel for sa-update
Add IP address to the alert message (ArCERT)
Bug: when m-spider workers die, no message is sent out
Automagically understand Content-Disposition properly
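One way to handle Content-Disposition properly is to delegate the header parsing to a mature library rather than regexing it by hand. A Python sketch using the standard email package (illustrative only, since the project's code is Perl):

```python
from email.message import EmailMessage

def attachment_filename(header_value):
    """Parse a Content-Disposition header value and return its filename parameter."""
    msg = EmailMessage()
    msg["Content-Disposition"] = header_value
    return msg.get_filename()
```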
Keep working on deep crawling (URL levels) and some obscure redirections
Browser impersonation for URL crawling
Add .jar to the list of potentially dangerous extensions
Use AJAX to include more Malware information on the Malware details page
Include Packer links in search.pl
Show multiple infections in archive files
Set up Packer statistics
Include ASN registrar information on Malware details (http://www.iana.org/assignments/as-numbers)
Use ASN registry information to send MBL Alerts to registrars
Evaluate GreenSQL
Create a list of recently sent MBL alerts
Review XSSDB for some obscure URL obfuscation techniques
Set up a FAQ ASAP
Use ASN information on MBL Alerts.
Get some basic automatic Malware analysis working
Set up lists in descending date order so users with modest hardware can grab just the first N entries, which will be the newest ones (user suggestion).
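The ordering itself is simple; a Python sketch of "newest first, keep the top N" (field name "date" is illustrative):

```python
def newest_first(entries, limit):
    """Sort entries by their date stamp, newest first, and keep the top N."""
    return sorted(entries, key=lambda e: e["date"], reverse=True)[:limit]
```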
Review "strip user" function for 28648 wrong parsing (@)
Write better multithreading for crawler scripts.
Review receive_mail script to make sure URLs submitted by CSIRTs are inserted in the database with the right addresses.
Set up lists (regex, domain) to SquidGuard.
Upgrade to Apache 2.2 + mod_perl.
List ASNs that actively host Malware.
Set up aggressive block lists.
Correct a bug when updating domain names (on 404 and bad MIME).
Set up lists to DansGuardian and SmoothWall.
Set up a new internal status for domains like rapiduploads.com. These domains shouldn't be checked regularly.
Rewrite and group all common functions in modules, avoiding code duplication.
Create a new e-mail address for contributors who want to get feedback. Right now firstname.lastname@example.org doesn't send feedback.
Set up an SMTP smarthost and a secondary nameserver in another ISP.
Think about better ways to catch hidden URLs in e-mail messages.
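As a baseline to improve on, a Python sketch of a URL extractor that also catches already-defanged "hxxp" forms (the pattern is illustrative; truly hidden or obfuscated URLs need more than a single regex):

```python
import re

# Matches http/https plus the defanged hxxp/hxxps variants.
URL_RE = re.compile(r"h(?:tt|xx)ps?://[^\s\"'<>]+", re.IGNORECASE)

def extract_urls(text):
    """Return every URL-looking token found in a message body."""
    return URL_RE.findall(text)
```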
Set up some daily graphs.
Review SquidGuard lists to include "domain" lists.
Set up secondary database and web servers. (moved to server07)
Get more spamtraps working. ISPs willing to help are welcome.
Review date format on DB and date updates on regular URL checks (date_last_check).
Set up a search engine with searches by domain, malware, and submitting e-mail address (results should be sent to the searched e-mail address only, not displayed). URLs will never be shown unsanitized.
Code some new crawling features on the receive_email script (external crawling).
Review stats script. It's a bit slow.
MySQL database fine tuning (indexes, query cache size, etc).
Export the MBL in XML format.
Enhance stats with graphics and more information.
Review lists that accept regexes so that URLs on the same domain are reported in a single line.
Set up secondary DNS servers.
Review web site duplicate content that is not in include files.
Find a secure way to make available all archived phishing scams received so far (needs URL sanitizing on e-mail messages).
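One common defanging convention for that sanitizing step is to break the scheme and the dots so the URL stays readable but non-clickable; a Python sketch (the exact convention used by the project is an assumption here):

```python
def defang(url: str) -> str:
    """Render a malicious URL harmless for display: http -> hxxp, . -> [.]"""
    return url.replace("http", "hxxp").replace(".", "[.]")
```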
Write a press release in English and Portuguese.
Expand email@example.com functionality.
Have a look at surbl.org.
Have a look at SpamCopURL.
Create signatures for Snort/Bleeding Edge Snort.
Decide what to do with http_last_modif.
Properly validate the CORRUPTED AV status.
Use MD5 and SHA-1 Perl modules instead of command line.
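In Perl this means Digest::MD5 and Digest::SHA instead of shelling out to md5sum/sha1sum; the same single-pass idea sketched in Python (illustrative only):

```python
import hashlib

def file_digests(path):
    """Compute MD5 and SHA-1 in one pass over the file, no subprocess."""
    md5, sha1 = hashlib.md5(), hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()
```

Reading in chunks keeps memory flat even for large samples.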
Implement abuse_net.pl for new infected URLs report.
Move daily crawling jobs to server06 (multi-threaded).
Implement a batch job script, status and spider flag.
Set up lists for ClamAV signatures.
Set up MS DNS(?) lists.