gnu: Add rspamd.

* gnu/packages/mail.scm (rspamd): New public variable.
author: Tobias Geerinckx-Rice <me@tobias.gr> 2020-09-23 19:53:52 +0200
committer: Tobias Geerinckx-Rice <me@tobias.gr> 2024-09-01 02:00:00 +0200
commit: c518c9b2e8a1f49de8c0457b9247995164a3bdc1 (patch)
tree: ba5af690ac731c947ff80d1d0575c0f347af5fc8
parent: 131cedce904c1886890c840efecb44b8a02ca4d4 (diff)
1 files changed, 147 insertions, 0 deletions
diff --git a/gnu/packages/mail.scm b/gnu/packages/mail.scm
index f79b2eff5a..d7d1bed73c 100644
--- a/gnu/packages/mail.scm
+++ b/gnu/packages/mail.scm
@@ -139,6 +139,7 @@
   #:use-module (gnu packages lua)
   #:use-module (gnu packages m4)
   #:use-module (gnu packages man)
+  #:use-module (gnu packages maths)
   #:use-module (gnu packages mercury)
   #:use-module (gnu packages ncurses)
   #:use-module (gnu packages nettle)
@@ -3110,6 +3111,152 @@ etc; plus other capabilities including support for MIME and
 powerful user customization features.")
     (license license:gpl2+)))
 
+(define-public rspamd
+  (package
+    (name "rspamd")
+    (version "2.5")
+    (source
+     (origin
+       (method git-fetch)
+       (uri (git-reference
+             (url "https://github.com/rspamd/rspamd.git")
+             (commit version)))
+       (sha256
+        (base32 "01fhh07dddc6v7a5kq6h1z221vl0d4af43cchqkf54ycyxxxw06h"))))
+    (build-system cmake-build-system)
+    (arguments
+     `(#:configure-flags
+       (list "-DLOCAL_CONFDIR=/etc/rspamd"
+             "-DDBDIR=/var/lib/rspamd"
+             "-DLOGDIR=/var/log/rspamd"
+             "-DRUNDIR=/run/rspamd"
+
+             "-DENABLE_BLAS=ON"         ; faster neural net processing
+             ;; "-DENABLE_HYPERSCAN=ON"    ; faster regexes (x86_64-only)
+             "-DENABLE_OPTIMIZATION=ON"
+             "-DENABLE_PCRE2=ON"        ; default is the implicit pcre input
+
+             "-DINSTALL_WEBUI=OFF")))
+    (native-inputs
+     `(("pkg-config" ,pkg-config)
+       ("ragel" ,ragel)))
+    (inputs
+     `(("glib" ,glib)
+       ("icu4c" ,icu4c)
+       ("libressl" ,libressl)
+       ("libsodium" ,libsodium)
+       ("lua" ,luajit)
+       ("openblas" ,openblas)
+       ("pcre2" ,pcre2)
+       ("perl" ,perl)
+       ("sqlite" ,sqlite)))
+    (home-page "https://rspamd.com/")
+    (synopsis "Rapid spam filtering system")
+    (description
+     "
+Rspamd is a spam filtering system that allows evaluation of messages by a number of rules including regular expressions, statistical analysis and custom services such as URL black lists.
+Each message is analysed by rspamd and given a \"spam score\".
+According to this spam score and the user settings, rspamd recommends an action for the MTA to apply to the message, for example to pass, reject or add a header.  Rspamd is designed to process hundreds of messages per second simultaneously and has a number of features available.
+
+Rspamd is an advanced spam filtering system supporting a variety of filtering mechanisms including regular expressions, statistical analysis and custom services such as URL black lists. Each message is analysed by rspamd and given a spam score.
+
+According to this spam score and the user’s settings, rspamd recommends an action for the MTA to apply to the message, for example to pass, reject or add a header. Rspamd is designed to process hundreds of messages per second simultaneously.
+
+Spam filtering features implemented in Rspamd include:
+
+    Regular expressions filtering - allows processing of messages, their textual parts, MIME headers and SMTP data received by the MTA against a set of expressions including both normal regular expressions and message processing functions. Rspamd expressions are a powerful tool for filtering messages based on predefined rules. This feature is similar to regular expressions in the SpamAssassin spam filter.
+
+    SPF module validates a message’s origin against the policy defined in the DNS record of the sender’s domain. You can read about SPF policies here. A number of mail systems include SPF support, such as Gmail or Yahoo Mail.
+
+    DKIM module validates a message’s cryptographic signature against a public key placed in the DNS record of the sender’s domain. Like SPF, this technique is widely adopted and validates that a message was sent from a specific domain.
+
+    DNS black lists allow estimating the reputation of the sender’s IP address or network. Rspamd uses a number of DNS lists, including SORBS and Spamhaus. However, rspamd doesn’t trust any specific DNS list and instead uses a combination of estimates to avoid mistakes and false positives. Rspamd also uses positive and grey DNS lists for checking for trusted senders.
+
+    URL black lists are rather similar to DNS black lists but measure the reputation of domains seen in URLs. This technique is very useful for finding malicious domains.
+
+    Statistics - rspamd uses a Bayesian classifier based on five-grams of input. This means that the input is evaluated not based on individual words, but organized into chains. This approach achieves better results than the traditionally used unigrams (that is, single words). It’s described in detail in this paper.
+
+    Fuzzy hashes - for identifying malicious mail patterns rspamd uses so-called fuzzy hashes. Unlike normal hashes, these structures are designed to hide small differences between text patterns, making it possible to find similar messages quickly. Rspamd has internal storage of such hashes and can block mass spam mailings quickly based on users’ feedback. Moreover, this allows for feeding rspamd with data from honeypots without polluting the statistical module.
+
+Rspamd uses a combination of different techniques to make a final decision about a message. This improves the overall quality of filtering and reduces the number of false positives (i.e. when an innocent message is incorrectly classified as spam). I have tried to simplify rspamd usage by adding the following elements:
+
+    Web interface - Rspamd is shipped with a fully functional Ajax-based web interface that allows for observing rspamd statistics; configuring rules, weights and lists; scanning and learning messages and viewing the history of scans. The interface is self-hosted, requires zero configuration and follows recent web application standards. You don’t need a web server or application server to run the web UI - you just need to run rspamd itself.
+
+    Integration with MTA - Rspamd can work with the most popular mail transfer systems, such as Postfix, Exim or Sendmail. For Exim there are several solutions for working with rspamd. If you require MTA integration, please consult the integration guide.
+
+Rspamd was started to handle mail flows that have grown more than tenfold over the last decade. From the very beginning of the project, Rspamd was aimed at highly loaded mail systems, with a development focus on performance and scan speed. Rspamd is written in plain C and uses a number of techniques to run fast, especially on modern hardware. On the other hand, it is possible to run Rspamd even on an embedded device with a very constrained environment.
+
+Rspamd can be treated as a faster replacement for the SpamAssassin mail filter, with the ability to scan ten times more messages using the same rules (by means of the SpamAssassin plugin).
+
+Rspamd is an advanced spam filtering system that allows evaluation of messages by a number of rules including regular expressions, statistical analysis and custom services such as URL black lists. Each message is analysed by Rspamd and given a spam score.
+
+According to this spam score and the user’s settings, Rspamd recommends an action for the MTA to apply to the message: for example, to pass, to reject or to add a header. Rspamd is designed to process hundreds of messages per second simultaneously and has a number of features available.
+
+You can watch the following video from the Linux Chemnitz Days 2019.
+
+You can also check the recent performance analysis article to get a better impression of how fast Rspamd can be.
+Unique features
+
+    Web interface. Rspamd is shipped with a fully functional Ajax-based web interface that lets you monitor and configure Rspamd rules, scores and dynamic lists, scan and learn messages, and view the history of scans. The web interface is self-hosted, requires zero configuration and follows recent web application standards. You don’t need a web server or application server to run the web UI - you just need to run Rspamd itself and a web browser.
+
+    Integration with MTA. Rspamd can work with the most popular mail transfer systems, such as Postfix, Exim, Sendmail or Haraka.
+
+    Extensive Lua API. Rspamd ships with hundreds of Lua functions that are helpful to create your own rules for efficient and targeted spam filtering.
+
+    Dynamic tables - it is possible to specify bulk lists as dynamic maps that are checked at runtime, with data updated only when they change. Rspamd supports file, HTTP and HTTPS maps.
+
+Content scan features
+
+Content scan features are used to find certain patterns in messages, including text parts, headers and raw content. Content scan technologies are intended to filter the most common cases of spam messages and offer the static part of spam filtering. Rspamd supports various types of content scanning checks, such as:
+
+    Regular expression filtering offers basic processing of messages, their textual parts, MIME headers and SMTP data received by the MTA against a set of expressions that includes both normal regular expressions and message processing functions. Rspamd regular expressions are a powerful tool for filtering messages based on pre-defined rules. Rspamd can also use SpamAssassin regular expressions via a plugin.
+
+    Fuzzy hashes are used by Rspamd to find similar messages. Unlike normal hashes, these structures are designed to hide small differences between text patterns, making it possible to find similar messages quickly. Rspamd has internal storage of such hashes and can block mass spam mailings based on user feedback that specifies message reputation. Moreover, fuzzy storage makes it possible to feed Rspamd with data from honeypots without polluting the statistical module. You can read more about it in the following document.
+
+    DCC is quite similar to fuzzy hashes, but it uses the external DCC service to check whether a message is a bulk message (that is, sent to many recipients simultaneously).
+
+    Chartable module helps to find specially crafted messages that are intended to cheat spam filtering systems by switching the language of text and replacing letters with their analogues. Rspamd uses UTF-8 normalization to detect and filter such techniques commonly used by many spammers.
+
+Policy check features
+
+There are many resources that define policies for different objects in email transfer: for the sender’s IP address, for URLs in a message and even for the message itself. For example, a message could be signed by the sender using DKIM. Another example is URL filtering: phishing checks or URL DNS blacklists (SURBL). Rspamd supports various policy checks:
+
+    SPF checks validate a message’s sender using the policy defined in the DNS record of the sender’s domain. You can read about SPF policies here. A number of mail systems support SPF, such as Gmail or Yahoo Mail.
+
+    DKIM policy validates a message’s cryptographic signature against a public key placed in the DNS record of the sender’s domain. This method ensures that a message has been received from the specified domain without being altered in transit. Rspamd also supports DKIM signing.
+
+    DMARC combines DKIM and SPF techniques to define more or less restrictive policies for certain domains. Rspamd can also store data for DMARC reports in a Redis database.
+
+    ARC is a relatively new addition to the DKIM signing mechanism that allows signed messages to be forwarded over a chain of trusted relays.
+
+    Whitelists are used to avoid false positive hits for trusted domains that pass other checks, such as DKIM, SPF or DMARC. For example, we should not filter messages from PayPal if they are correctly signed with the PayPal domain signature. On the other hand, if they are not signed and the DMARC policy defines restrictive rules for DKIM, we should mark them as spam, as they are potentially phishing. The whitelist module provides different modes to perform policy matching and whitelisting or blacklisting of certain combinations of verification results.
+
+    DNS lists allow estimating the reputation of the sender’s IP address or network. Rspamd uses a number of DNS lists, including SORBS and Spamhaus. However, Rspamd doesn’t ultimately trust any specific DNS list and does not reject mail based on this factor alone. Rspamd also uses white and grey DNS lists to avoid false positive spam hits.
+
+    URL lists are rather similar to DNS black lists but use URLs in a message to fight spam and phishing. Rspamd has full built-in support for the most popular SURBL lists, such as URIBL and SURBL from Spamhaus.
+
+    Phishing checks are extremely useful to filter phishing messages and protect users from cyber attacks. Rspamd uses sophisticated algorithms to find phished URLs and supports the popular URL redirectors (for example, http://t.co) to avoid false positive hits. Popular phishing databases, such as OpenPhish and PhishTank, are also supported.
+
+    Rate limits prevent mass mailings from being sent by your own compromised users. This is an extremely useful feature to protect both inbound and outbound mail flows.
+
+    IP reputation plugin adjusts reputation for specific IP addresses, networks, autonomous systems (ASN) and even countries.
+
+    Greylisting is a common method to introduce a delay for suspicious messages, as many spammers do not use fully functional SMTP servers that can queue delayed messages. Rspamd implements greylisting internally and can delay messages that have a score higher than a certain threshold.
+
+    Replies module is intended to whitelist messages that are replies to our own messages, as these messages are likely important for users and false positives are highly undesirable for them.
+
+    Maps module provides a Swiss Army knife-like tool that can filter messages based on different attributes: headers, envelope data, sender’s IP and so on. This module is very useful for building custom rules.
+
+Statistical tools
+
+The statistical approach includes many useful spam recognition techniques that can learn dynamically from messages being scanned. Rspamd provides different tools that can be trained either manually or automatically and adapt to the actual mail flow.
+
+    Bayes classifier is a tool to classify spam and ham messages. Rspamd uses an advanced algorithm for generating statistical tokens that achieves better results than traditionally used ones (e.g. in SpamAssassin) and is described in detail in the following paper.
+
+    Neural network learns from scan results and improves the final score by finding common patterns of rules that are typical for either spam or ham messages. This module is especially useful for large email systems, as it can learn from your own rules and adapt quickly to mass spam mailings.
+")
+    (license license:asl2.0)))
+
 (define-public sendmail
   (package
     (name "sendmail")
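
The "Statistics" entry in the package description says the Bayesian classifier evaluates chains of five words rather than individual words. As a rough illustration only, the sketch below builds plain word five-grams from a message and counts them. The function name `word_ngrams` and the sample message are hypothetical, and this is not Rspamd's actual tokenizer, which is written in C and may differ in how it forms and hashes tokens.

```python
from collections import Counter


def word_ngrams(text, n=5):
    """Return every run of n consecutive words in text as a tuple.

    Hypothetical helper for illustration; not taken from Rspamd.
    """
    words = text.lower().split()
    # A text shorter than n words yields an empty list.
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


# Sample input (made up): the same five-word chain appears twice.
message = "buy cheap pills online now and buy cheap pills online now"
tokens = Counter(word_ngrams(message))
```

A classifier that counts such chains sees "buy cheap pills online now" as one token that occurs twice, instead of five unrelated words, which is the distinction the description draws between chains and individual words.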