Back up your Gmail in a few easy steps!

I spent a few hours searching for a good backup solution for my mailbox before settling on getmail. After reading this HowTo and deploying the setup below, you will have:

  1. A full backup of your e-mail data in the Mbox format (yes, including Gmail’s labels / folders).
  2. A getmail that no longer marks mails as read after delivering them. (This was a pretty bad issue: getmail was marking all my mails as read even when I had not opened my mailbox at all.)
  3. Backups that stay up to date with the current content of your mailbox. (By default getmail only ever adds to the Mbox / Maildir, so mails you have since deleted stay in it. If I deleted a mail two days ago, it would still appear in today’s backup, which is definitely unwanted.)

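I’ll now explain a few details of my new configuration. Before tweaking getmail’s main config file, though, please make the following change in _retrieverbases.py* (an IMAP fetch of RFC822 sets the \Seen flag on the message, whereas BODY.PEEK[] leaves it alone, and this is what stops getmail from marking mails as read):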

return self._getmsgpartbyid(msgid, '(RFC822)')

to

return self._getmsgpartbyid(msgid, '(BODY.PEEK[])')
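
If you would rather not edit the file by hand, something like the following sketch should do the same job. The function name patch_peek is made up for this example, and it assumes the line appears in your getmail version exactly as quoted above.

```shell
#!/bin/sh
# patch_peek FILE: switch getmail's IMAP fetch from RFC822 to BODY.PEEK[].
# (patch_peek is a hypothetical helper name, not part of getmail.)
patch_peek() {
    file=$1
    # Keep an untouched copy so the change can be reverted.
    cp -p "$file" "$file.orig" || return 1
    # In a basic regular expression the parentheses are literal,
    # so the pattern needs no escaping.
    sed "s/'(RFC822)'/'(BODY.PEEK[])'/" "$file.orig" > "$file"
}
```

Run as root, the call would look like `patch_peek /usr/share/getmail4/getmailcore/_retrieverbases.py` (the path from the footnote, which may differ on your system). The original stays next to it with an .orig suffix, and a package update may overwrite the change.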

When done, grab the following getmailrc and adapt it to your needs**:

[retriever]
## For POP3 use SimplePOP3SSLRetriever and pop.gmail.com instead.
type = SimpleIMAPSSLRetriever
server = imap.gmail.com
username = example@gmail.com
password = password

## Gmail's labels must be listed one by one here for getmail to retrieve mail from them.
## The continuation line has to be indented.
mailboxes = ("INBOX", "[Gmail]/Sent mail",
    "ubuntu", "gnome/example", "linux/example")

[destination]
type = Mboxrd
path = ~/.getmail/backup.mbox

[options]
## No Delivered-To: header added automatically.
delivered_to = false
## No Received: header added automatically.
received = false
## getmail will print messages about each of its actions.
verbose = 2

When done, set up getmail’s directory and config file:

mkdir $HOME/.getmail
cp $HOME/getmailrc $HOME/.getmail/
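
Since getmailrc holds your password in plain text, it seems sensible to keep both the directory and the file private. Here is a minimal sketch of the two commands above with that added. install_getmailrc is a name invented for this example, and the tightened permissions are an extra precaution assumed here, not one of the original steps.

```shell
#!/bin/sh
# install_getmailrc SRC [DIR]: copy a getmailrc into DIR (default ~/.getmail)
# and make both the directory and the file accessible to the owner only.
# (install_getmailrc is a hypothetical helper name.)
install_getmailrc() {
    src=$1
    dir=${2:-$HOME/.getmail}
    mkdir -p "$dir" || return 1
    chmod 700 "$dir" || return 1
    cp "$src" "$dir/getmailrc" || return 1
    chmod 600 "$dir/getmailrc"
}
```

With the file saved in your home directory, `install_getmailrc "$HOME/getmailrc"` does the same as the mkdir and cp lines.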

Adapt $HOME/getmailrc to wherever you saved that file. Pretty much all the remaining work is done by a small shell script I wrote (save it as $HOME/.getmail/getmail_run.sh, the path used by the cron job further down, and make it executable):

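#!/bin/sh

WORKDIR=$HOME/.getmail
date=`date "+%d-%m-%Y_%H:%M"`

## getmail fails if the mbox file does not exist yet, so create an empty one.
if [ ! -f "$WORKDIR/backup.mbox" ]
then
    touch "$WORKDIR/backup.mbox"
fi

getmail > "$WORKDIR/getmail.log"
OUT=$?
if [ $OUT -eq 0 ]
then
    ## Move the fresh mbox into backups/ under a timestamped name.
    mkdir -p "$WORKDIR/backups/" && mv "$WORKDIR/backup.mbox" "$WORKDIR/backups/backup_$date.mbox"
else
    ## getmail failed: leave everything as it is and report the error.
    exit 1
fi

## Clean up backups older than 3 days
find "$WORKDIR/backups/" -type f -mtime +3 -exec rm {} \;
## getmail's oldmail-* state file is of no use here; remove it as well.
rm -f "$WORKDIR"/oldmail-*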

This script will:

  1. Run getmail using the getmailrc config file you previously worked on.
  2. If getmail succeeds, create a **backups** directory in **$HOME/.getmail** and move the latest Mbox file there, appending the date and time to its name. (This ensures the next getmail run starts from an empty **backup.mbox** file, so it will contain only the **latest** content of your mailbox.)
  3. Re-create an empty backup.mbox file in $HOME/.getmail at the start of the next run, since getmail fails if the file is missing.
  4. Finally, delete backups older than 3 days to keep the backups folder from getting too crowded. (It removes the oldmail file as well, since it is useless in our case.)
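
A quick way to sanity-check the rotated backups is to count the messages in each of them. The sketch below assumes the Mboxrd format configured earlier, where every message starts with a line beginning with "From " and body lines that look like one are escaped as ">From ". count_mbox_messages is a hypothetical helper name.

```shell
#!/bin/sh
# count_mbox_messages FILE: print the number of messages in an mboxrd file.
# (count_mbox_messages is a hypothetical helper name.)
count_mbox_messages() {
    # Each message starts with a "From " separator line. mboxrd escapes body
    # lines that start with "From " as ">From ", so those are not counted.
    grep -c '^From ' "$1"
}

# Report the message count of every backup kept so far.
for f in "$HOME"/.getmail/backups/backup_*.mbox; do
    [ -f "$f" ] || continue   # the glob matched nothing
    printf '%s: %s messages\n' "$f" "$(count_mbox_messages "$f")"
done
```

A count that suddenly drops to 0, or far below the previous backup, is a hint that something went wrong with that run.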

Finally, set up a cron job that runs the backup script above and generates the backups for you every hour:

0 * * * * $HOME/.getmail/getmail_run.sh > /dev/null
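
If you prefer not to open the crontab editor, the entry can also be appended from the shell. This is only a sketch: add_cron_line is a helper name made up here. It reads the current crontab on standard input and prints the new one, adding the line only if it is not there yet.

```shell
#!/bin/sh
# The cron entry from above. Single quotes keep $HOME unexpanded,
# exactly as it is written in the crontab line.
CRON_LINE='0 * * * * $HOME/.getmail/getmail_run.sh > /dev/null'

# add_cron_line: copy a crontab from stdin to stdout, appending CRON_LINE
# unless an identical line is already present.
# (add_cron_line is a hypothetical helper name.)
add_cron_line() {
    current=$(cat)
    if [ -n "$current" ]; then
        printf '%s\n' "$current"
    fi
    # -F: fixed string, -x: the whole line must match.
    if ! printf '%s\n' "$current" | grep -Fxq "$CRON_LINE"; then
        printf '%s\n' "$CRON_LINE"
    fi
}
```

Assuming your crontab command accepts - to read from standard input, it would be used as `crontab -l 2>/dev/null | add_cron_line | crontab -`. Running it twice leaves a single entry.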

Feel free to let me know if you run into any issues while following this HowTo. Enjoy!

* /usr/share/getmail4/getmailcore/_retrieverbases.py on line 901 (the exact line number may differ between getmail versions).

** More documentation about the getmailrc file and syntax can be found on getmail’s documentation page.