Andrea Veri's Blog

GNOME Infrastructure updates

As you may have noticed from the outage and maintenance notes we sent out last week, the GNOME Infrastructure has been undergoing a major redesign due to the need to move to a different datacenter. It’s probably a good time to update the Foundation membership, contributors and generally anyone consuming the multitude of services we maintain on what we’ve been up to during these past months.

New Data Center

One of the core projects for 2020 was moving services out of the previous DC we were in (located in PHX2, Arizona) over to the Red Hat community cage located in RAL3. This task was made possible right after we received a new set of machines that allowed us to refresh some of the ancient hardware we had (with the average box dating back to 2013). The new layout is composed of a total of five bare-metal machines and two core technologies: Openshift (v. 3.11) and Ceph (v. 4).

The major improvements that are worth being mentioned:

  1. VMs can easily be scheduled across the hypervisor stack without having to copy disks between hypervisors: VM disks and data are now hosted within Ceph.
  2. IPv6 is available (not yet enabled/configured at the OS and Openshift router levels)
  3. Overall better external internet uplink bandwidth
  4. Most of the VMs we had running were turned into pods and are now successfully running within Openshift

RHEL 8 and Ansible

One of the things we had to take into account was running Ceph on top of RHEL 8 to benefit from its containerized setup. This originally presented itself as a challenge, due to the fact that RHEL 8 ships with a much newer Puppet release than the one RHEL 7 provides. At the same time we didn’t want to invest much time in upgrading our Puppet code base, given the number of VMs we were able to migrate to Openshift and the general willingness to slowly move to Ansible (client-side, with no more server-side pieces to maintain). In this regard we:

  1. Landed support for RHEL 8 provisioning
  2. Started experimenting with image-based deployments (much faster than Cobbler provisioning)
  3. Cooked up a set of base Ansible roles to support our RHEL 8 installs, including IDM, chrony, Satellite, Dell OMSA, NRPE, etc.

Openshift

As originally announced, the migration to the Openshift Container Platform (OCP) has progressed and we now count a total of 34 tenants (including the entirety of the GIMP websites). This allowed us to:

  1. Retire running VMs and avoid having to upgrade their OS whenever it gets close to EOL; in general, less maintenance burden
  2. Allow the community to easily provision services on top of the platform with total autonomy, choosing from a wide variety of frameworks, programming languages and database types (currently Galera and PSQL, both managed outside of OCP itself)
  3. Easily scale the platform by adding more nodes/masters/routers whenever additional load makes that necessary
  4. Replicate data and make it redundant across a GlusterFS cluster (next on the list will be introducing Ceph support for pods’ persistent storage)
  5. Easily set up services such as Rocket.Chat and Discourse without having to mess around much with Node.js or Ruby dependencies

Special thanks

I’d like to thank Bartłomiej Piotrowski for all the efforts in helping me out with the migration during the past couple of weeks and Milan Zink from the Red Hat Storage Team who helped out reviewing the Ceph infrastructure design and providing useful information about possible provisioning techniques.

The GNOME Infrastructure is moving to Openshift

During GUADEC 2018 we announced one of the core plans for this year and the next: moving as many GNOME web applications as possible to the GNOME Openshift instance we architected, deployed and configured back in July. Moving to Openshift will allow us to:

  1. Save resources by deprecating and decommissioning VMs that only run a single service
  2. Allow app maintainers to use the most recent release of Python, Ruby or their preferred framework or programming language, without being tied to the release RHEL ships with
  3. Add an additional layer of security: containers
  4. Allow app owners to modify and publish content without requiring external help
  5. Increase apps’ redundancy, scalability and availability
  6. Integrate directly with any VCS that ships webhook support, as we can trigger the Openshift-provided endpoint whenever a commit occurs to generate a new build / deployment (see the sketch after this list)
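
As an illustration of the last point, here is a minimal sketch of how a post-commit webhook could trigger a new build, assuming a generic Openshift webhook endpoint; the hostname, namespace, BuildConfig name and secret are placeholders, and the exact URL path depends on the cluster version.

import requests

# Placeholder values: the real ones come from the BuildConfig definition.
NAMESPACE = "myapp"
BUILDCONFIG = "myapp"
SECRET = "webhooksecret"

# Generic webhook endpoint exposed by Openshift for a BuildConfig:
# a simple POST against it starts a new build (and hence a deployment).
url = (
    "https://openshift.example.org/apis/build.openshift.io/v1"
    "/namespaces/%s/buildconfigs/%s/webhooks/%s/generic"
    % (NAMESPACE, BUILDCONFIG, SECRET)
)

response = requests.post(url, json={"ref": "master"})
response.raise_for_status()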

Architecture

The cluster consists of 3 master nodes (controllers, API, etcd), 4 compute nodes and 2 infrastructure nodes (internal Docker registry, cluster console, haproxy-based routers, SSL edge termination). For persistent storage we’re currently making good use of the Red Hat Gluster Storage (RHGS) product, which Red Hat is kindly sponsoring together with the Openshift subscriptions. For any app that might require a database we have an external (i.e. not managed as part of Openshift) fully redundant, synchronous, multi-master MariaDB cluster based on Galera (2 data nodes, 1 arbiter).
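
Client-side, nothing special is needed to take advantage of the multi-master setup: since every Galera data node accepts writes, an app can simply try the nodes in order. A minimal sketch, assuming hypothetical node hostnames:

import pymysql

# Hypothetical Galera data nodes; the arbiter holds no data and is never contacted.
NODES = ["db1.example.org", "db2.example.org"]

def connect(user, password, database):
    """Return a connection to the first reachable Galera node."""
    last_error = None
    for host in NODES:
        try:
            return pymysql.connect(host=host, user=user,
                                   password=password, database=database)
        except pymysql.MySQLError as err:
            last_error = err
    raise last_error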

The release we’re currently running is the recently released 3.11, which comes with the so-called “Cluster Console”, a web UI that allows you to manage a wide set of the underlying objects that were previously only available through the oc CLI client, and with a set of monitoring and metrics tools (Prometheus, Grafana) that can be accessed as part of the Cluster Console (Grafana dashboards showing how the cluster is behaving) or externally via their own route.

SSL Termination

The SSL termination currently happens on the edge routers via a wildcard certificate for the gnome.org and guadec.org zones. The process of renewing these certificates is automated via Puppet, as we’re using Let’s Encrypt behind the scenes (domain verification for the wildcard certs happens at the DNS level; we built specific hooks to make that happen via the getssl tool, as sketched after the list below). The backend connections follow two different paths:

  1. edge termination with no re-encryption for pods containing static files (no logins, no personal information ever entered by users): in this case the traffic is encrypted between the client and the edge routers, and travels in plain text between the routers and the pods (as they run on the same local broadcast domain)
  2. re-encryption for any service that requires authentication or personal information to be entered for authorization: in this case the traffic is encrypted end to end
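
As for the getssl DNS hooks mentioned above, the idea is simple: the tool invokes an “add” script with the domain and the challenge token, and the script publishes the corresponding _acme-challenge TXT record. A rough sketch of such a hook, assuming a name server that accepts TSIG-keyed dynamic updates (the server name and key path are placeholders):

import subprocess
import sys

# getssl invokes the hook as: dns_add.py <fqdn> <challenge-token>
fqdn, token = sys.argv[1], sys.argv[2]

# ACME DNS-01 validation looks up a TXT record at _acme-challenge.<fqdn>.
update = "\n".join([
    "server ns.example.org",
    'update add _acme-challenge.%s. 60 TXT "%s"' % (fqdn, token),
    "send",
])

# Push the record via a dynamic DNS update (BIND's nsupdate).
subprocess.run(["nsupdate", "-k", "/etc/named/ddns.key"],
               input=update.encode(), check=True)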

App migrations

App migrations have already started: we’ve successfully migrated and deprecated a set of GUADEC-related web applications, specifically:

  1. $year.guadec.org, where $year spans from 2013 to 2019
  2. wordpress.guadec.org has been deprecated

We’re currently working on migrating the GNOME Paste website, making sure we also replace the current unmaintained software with a supported one. Next on the list will be the WordPress-based websites such as www.gnome.org and blogs.gnome.org (a WordPress network). I’d like to thank the GNOME Websites Team, and specifically Tom Tryfonidis, for taking the time to migrate existing assets to the new platform as part of the GNOME websites refresh program.

Back from GUADEC 2018

It’s been a while since GUADEC 2018 ended, but subsequent travels and tasks left little time to write up a quick summary of what happened during this year’s GNOME conference. The topics I’d like to emphasize are:

  • We’re hiring another Infrastructure Team member
  • We’ve successfully finalized the cgit to GitLab migration
  • Future plans including the migration to Openshift

GNOME Foundation hirings

With the recent 1M donation, the Foundation has started recruiting for a variety of new professional roles, including a new Infrastructure Team member. On this side I want to make clear that although the job description mentions the word sysadmin, the figure we’re looking for is a systems engineer with proven experience with cloud computing platforms and tools such as AWS and Openshift, and generally with configuration management software such as Puppet and Ansible. Additionally, this person should have a clear understanding of the network and operating system (mainly RHEL) layers.

We’ve already identified a set of candidates and will be proceeding with interviews in the coming weeks. This doesn’t mean we’ve hired anyone yet, so please keep sending CVs if you’re interested and feel the position would match your skills and expectations.

cgit to GitLab

As announced on several occasions, the GNOME Infrastructure has successfully finalized the cgit to GitLab migration: from a read-only view of .git directories to a fully compliant CI/CD infrastructure. The next step will be deprecating Bugzilla. That has already started, with bugmasters turning products read-only once they were migrated to the new platform, or identifying whether any of the not-yet-migrated products can be archived. The idea is to wait until we see zero activity on Bugzilla, in terms of both new comments on existing bugs and new bugs being submitted (we have redirects in place to make sure that whenever enter_bug.cgi is triggered the request gets sent to the /issues/new endpoint of the corresponding GitLab project, as sketched below), and then turn the entire Bugzilla instance into an HTML archive, both for posterity and to reduce the maintenance burden of keeping an instance up to date with upstream in terms of CVEs.
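
The real redirects live in the web server configuration, but the logic boils down to mapping a Bugzilla product to its GitLab project and issuing a 302. A WSGI sketch with a made-up product mapping, purely for illustration:

from urllib.parse import parse_qs

# Made-up mapping from Bugzilla products to GitLab projects.
PRODUCT_TO_PROJECT = {
    "gnome-shell": "GNOME/gnome-shell",
    "nautilus": "GNOME/nautilus",
}

def application(environ, start_response):
    """Redirect enter_bug.cgi?product=foo to the GitLab /issues/new page."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    product = query.get("product", [""])[0]
    project = PRODUCT_TO_PROJECT.get(product)
    if project:
        location = "https://gitlab.gnome.org/%s/issues/new" % project
    else:
        location = "https://gitlab.gnome.org/"
    start_response("302 Found", [("Location", location)])
    return [b""]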

Thanks to all the parties involved, including Carlos, Javier and GitLab itself, for the prompt and detailed responses they always provided to our queries. Also thanks for sponsoring our AWS activities related to GitLab!

Future plans

After following the service == VM equation for several years, it’s probably time for us to move to a more scalable infrastructure. The next-generation platform we’ve picked is Openshift. Its benefits:

  1. It’s not important where a service runs behind the scenes: it just has to run (VMs vs. pods, whose containers get scheduled on whichever Openshift nodes are available)
  2. Easy to scale in case additional resources are needed for a short period of time
  3. Built-in monitoring starting from Openshift 3.9 (the release we’ll be running), based on Prometheus (plus Grafana for dashboarding)
  4. GNOME services run as containers
  5. Individual application developers can schedule their own builds and see their code deployed to production with one click

The base set of VMs and bare metals has already been configured. Red Hat has been kind enough to provide the GNOME Project with a set of Red Hat Container Platform subscriptions (plus GlusterFS ones for heketi-based brick provisioning). We’ll start moving over to the new infrastructure in the coming weeks. It’s going to take some time, but in the end we’ll be able to free up a lot of resources and retire several VMs and their deprecated configurations.

Misc

Slides from the Foundation AGM Infrastructure team report are available here.

Many thanks to the GNOME Foundation for the sponsorship of my travel!

GUADEC 2018 Badge

Adding reCAPTCHA v2 support to Mailman

As a follow-up to the reCAPTCHA v1 post published back in 2014, here comes an updated version for migrating your Mailman instance off version 1 (being decommissioned on the 31st of March 2018) and onto version 2. The original python-recaptcha library was forked into https://github.com/redhat-infosec/python-recaptcha and made compatible with reCAPTCHA version 2.

The relevant changes against the original library can be summarized as follows:

  1. Added a ‘version=2’ parameter to the displayhtml and load_script functions
  2. Introduced the v2submit function (alongside submit, which is kept for backwards compatibility) to support reCAPTCHA v2 (see the usage sketch below)
  3. The updated library is backwards compatible with version 1, to avoid unexpected code breakage for instances still running version 1
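
As a quick standalone illustration of the forked library’s API (the same calls appear in the Mailman patches below; the keys, token and IP address are placeholders):

from recaptcha.client import captcha

# Widget rendering: both snippets end up in the listinfo page template.
script_tag = captcha.load_script(version=2)
widget = captcha.displayhtml('YOUR_PUBLIC_KEY', use_ssl=True, version=2)

# Server-side verification: v2submit validates the g-recaptcha-response
# token against the reCAPTCHA API using the private key and the client's IP.
result = captcha.v2submit('g-recaptcha-response-token',
                          'YOUR_PRIVATE_KEY', '203.0.113.1')
if not result.is_valid:
    print('Invalid captcha: %s' % result.error_code)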

The required changes to Mailman itself are located in the following files:

/usr/lib/mailman/Mailman/Cgi/listinfo.py

--- listinfo.py	2018-02-26 14:56:48.000000000 +0000
+++ /usr/lib/mailman/Mailman/Cgi/listinfo.py	2018-02-26 14:08:34.000000000 +0000
@@ -31,6 +31,7 @@
 from Mailman import i18n
 from Mailman.htmlformat import *
 from Mailman.Logging.Syslog import syslog
+from recaptcha.client import captcha
 
 # Set up i18n
 _ = i18n._
@@ -244,6 +245,10 @@
     replacements['<mm-lang-form-start>'] = mlist.FormatFormStart('listinfo')
     replacements['<mm-fullname-box>'] = mlist.FormatBox('fullname', size=30)
 
+    # Captcha
+    replacements['<mm-recaptcha-javascript>'] = captcha.displayhtml(mm_cfg.RECAPTCHA_PUBLIC_KEY, use_ssl=True, version=2)
+    replacements['<mm-recaptcha-script>'] = captcha.load_script(version=2)
+
     # Do the expansion.
     doc.AddItem(mlist.ParseTags('listinfo.html', replacements, lang))
     print doc.Format()

/usr/lib/mailman/Mailman/Cgi/subscribe.py

--- subscribe.py	2018-02-26 14:56:38.000000000 +0000
+++ /usr/lib/mailman/Mailman/Cgi/subscribe.py	2018-02-26 14:08:18.000000000 +0000
@@ -32,6 +32,7 @@
 from Mailman.UserDesc import UserDesc
 from Mailman.htmlformat import *
 from Mailman.Logging.Syslog import syslog
+from recaptcha.client import captcha
 
 SLASH = '/'
 ERRORSEP = '\n\n<p>'
@@ -165,6 +166,17 @@
             results.append(
     _('There was no hidden token in your submission or it was corrupted.'))
             results.append(_('You must GET the form before submitting it.'))
+
+    # recaptcha
+    captcha_response = captcha.v2submit(
+        cgidata.getvalue('g-recaptcha-response', ""),
+        mm_cfg.RECAPTCHA_PRIVATE_KEY,
+        remote,
+    )
+
+    if not captcha_response.is_valid:
+        results.append(_('Invalid captcha: %s' % captcha_response.error_code))
+
     # Was an attempt made to subscribe the list to itself?
     if email == mlist.GetListEmail():
         syslog('mischief', 'Attempt to self subscribe %s: %s', email, remote)

/usr/lib/mailman/templates/en/listinfo.html

--- listinfo.html	2018-02-26 15:02:34.000000000 +0000
+++ /usr/lib/mailman/templates/en/listinfo.html	2018-02-26 14:18:52.000000000 +0000
@@ -3,7 +3,7 @@
 <HTML>
   <HEAD>
     <TITLE><MM-List-Name> Info Page</TITLE>
-  
+    <MM-Recaptcha-Script> 
   </HEAD>
   <BODY BGCOLOR="#ffffff">
 
@@ -116,6 +116,11 @@
       </tr>
       <mm-digest-question-end>
       <tr>
+      <tr>
+        <td>Please fill out the following captcha</td>
+        <td><mm-recaptcha-javascript></TD>
+      </tr>
+      <tr>
 	<td colspan="3">
 	  <center><MM-Subscribe-Button></center>
     </td>

The updated RPMs are being rolled out to Fedora, EPEL 6 and EPEL 7. In the meantime you can find them here.

A childhood’s dream

Six months since my latest blog post is definitely a lot, and it reminds me how difficult this year has been for me in many ways. Back in June 2015 I received a job proposal as a Systems and Network Engineer from a company located in Padova, a city in the north-east of Italy around 150 km (around 93 miles) away from my home town. The offer looked very interesting and I went for it. The idea I originally had was to relocate there, but given the extremely high rents (the city is well known in Italy as one of the best places to attend university courses across several faculties) and the fact that I really wanted to find out whether the job description would actually match my expectations, I opted not to relocate at all and to commute to work by train (a total of 4 hours per day spent travelling).

I still recall one of my friends telling me, “I am sure you won’t make it for more than two weeks”, and how satisfying it was to point out to him a few days ago that I made it for roughly one year. Spending around 4 hours every day on a train hasn’t been much fun, but the passion and love for what I’ve been doing helped me overcome the difficulties I encountered. At some point, though, something I was honestly expecting happened: my professional growth at the company came to a complete stop. While I initially told myself the situation surely wouldn’t last much longer, the opposite happened.

For a passionate, enthusiastic and “hungry” person like me, that was one of the worst situations that could ever happen, and it was definitely time to look at new opportunities. That’s where my childhood’s dream became reality: I will be joining the Platform Operations Team at Red Hat as a System Administrator starting in mid-July! Being part of a great family that cares about Open Source and its values makes me proud, and I would really like to thank Red Hat for this incredible opportunity.

On a side note, I will be in Brno between the 14th and the 16th of July; please drop me a note if you want to have a drink together!