Andrea Veri's Blog Me, myself and I

Three years and counting

It’s been a while since my last “what’s been happening behind the scenes” e-mail so I’m here to report on what has been happening within the GNOME Infrastructure, its future plans and my personal sensations about a challenge that started around three (3) years ago when Sriram Ramkrishna and Jeff Schroeder proposed my name as a possible candidate for coordinating the team that runs the systems behind the GNOME Project. All this followed by the official hiring achieved by Karen Sandler back in February 2013.

The GNOME Infrastructure has finally reached stability both in terms of reliability and uptime, we didn’t have any service disruption this and the past year and services have been running smoothly as they were expected to in a project like the one we are managing.

As many of you know service disruptions and a total lack of maintenance were very common before I joined back in 2013, I’m so glad the situation has dramatically changed and developers, users, passionates are now able to reach our websites, code repositories, build machines without experiencing slowness, downtimes or

unreachability. Additionally all these groups of people now have a reference point they can contact in case they need help when coping with the infrastructure they daily use. The ticketing system allows users to get in touch with the members of the Sysadmin Team and receive support right away within a very short period of time (Also thanks to Pagerduty, service the Foundation is kindly sponsoring)

Before moving ahead to the future plans I’d like to provide you a summary of what has been done during these roughly three years so you can get an idea of why I define the changes that happened to the infrastructure a complete revamp:

  1. Recycled several ancient machines migrating services off of them while consolidating them by placing all their configuration on our central configuration management platform ran by Puppet. This includes a grand total of 7 machines that were replaced by new hardware and extended warranties the Foundation kindly sponsored.
  2. We strenghten our websites security by introducing SSL certificates everywhere and recently replacing them with SHA2 certificates.
  3. We introduced several services such as Owncloud, the Commits Bot, the Pastebin, the Etherpad, Jabber, the GNOME Github mirror.
  4. We restructured the way we backup our machines also thanks to the Fedora Project sponsoring the disk space on their backup facility. The way we were used to handle backups drastically changed from early years where a magnetic tape facility was in charge of all the burden of archiving our data to today where a NetApp is used together with rdiff-backup.
  5. We upgraded Bugzilla to the latest release, a huge thanks goes to Krzesimir Nowak who kindly helped us building the migration tools.
  6. We introduced the GNOME Apprentice program open-sourcing our internal Puppet repository and cleansing it (shallow clones FTW!) from any sensitive information which now lives on a different repository with restricted access.
  7. We retired Mango and our OpenLDAP instance in favor of FreeIPA which allows users to modify their account information on their own without waiting for the Accounts Team to process the change.
  8. We documented how our internal tools are customized to play together making it easy for future Sysadmin Team members to learn how the infrastructure works and supersede existing members in case they aren’t able to keep up their position anymore.
  9. We started providing hosting to the GIMP and GTK projects which now completely rely on the GNOME Infrastructure. (DNS, email, websites and other services infrastructure hosting)
  10. We started providing hosting not only to the GIMP and GTK projects but to localized communities as well such as GNOME Hispano and GNOME Greece
  11. We configured proper monitoring for all the hosted services thanks to Nagios and Check-MK
  12. We migrated the IRC network to a newer ircd with proper IRC services (Nickserv, Chanserv) in place.
  13. We made sure each machine had a configured management (mgmt) and KVM interface for direct remote access to the bare metal machine itself, its hardware status and all the operations related to it. (hard reset, reboot, shutdown etc.)
  14. We upgraded MoinMoin to the latest release and made a substantial cleanup of old accounts, pages marked as spam and trashed pages.
  15. We deployed DNSSEC for several domains we manage including gnome.org, guadec.es, gnomehispano.es, guadec.org, gtk.org and gimp.org
  16. We introduced an account de-activation policy which comes into play when a contributor not committing to any of the hosted repositories at git.gnome.org since two years is caught by the script. The account in question is marked as inactive and the gnomecvs (from the old cvs days) and ftpadmin groups are removed.
  17. We planned mass reboots of all the machines roughly every month for properly applying security and kernel updates.
  18. We introduced Mirrorbrain (MB), the mirroring service serving GNOME and related modules tarballs and software all over the world. Before introducing MB GNOME had several mirrors located in all the main continents and at the same time a very low amount of users making good use of them. Many organizations and companies behind these mirrors decided to not host GNOME sources anymore as the statistics of usage were very poor and preferred providing the same service to projects that really had a demand for these resources. MB solved all this allowing a proper redirect to the closest mirror (through mod_geoip) and making sure the sources checksum matched across all the mirrors and against the original tarball uploaded by a GNOME maintainer and hosted at master.gnome.org.

I can keep the list going for dozens of other accomplished tasks but I’m sure many of you are now more interested in what the future plans actually are in terms of where the GNOME Infrastructure should be in the next couple of years.

One of the main topics we’ve been discussing will be migrating our Git infrastructure away from cgit (which is mainly serving as a code browsing tool) to a more complete platform that is surely going to include a code review tool of some sort. (Gerrit, Gitlab, Phabricator)

Another topic would be migrating our mailing lists to Mailman 3 / Hyperkitty. This also means we definitely need a staging infrastructure in place for testing these kind of transitions ideally bound to a separate Puppet / Ansible repository or branch. Having a different repository for testing purposes will also mean helping apprentices to test their changes directly on a live system and not on their personal computer which might be running a different OS / set of tools than the ones we run on the GNOME Infrastructure.

What I also aim would be seeing GNOME Accounts being the only authentication resource in use within the whole GNOME Infrastructure. That means one should be able to login to a specific service with the same username / password in use on the other hosted services. That’s been on my todo list for a while already and it’s probably time to push it forward together with Patrick Uiterwijk, responsible of Ipsilon‘s development at Red Hat and GNOME Sysadmin.

While these are the top priority items we are soon receiving new hardware (plus extended warranty renewals for two out of the three machines that had their warranty renewed a while back) and migrating some of the VMs off from the current set of machines to the new boxes is definitely another task I’d be willing to look at in the next couple of months (one machine (ns-master.gnome.org) is being decommissioned giving me a chance to migrate away from BIND into NSD).

The GNOME Infrastructure is evolving and it’s crucial to have someone maintaining it. On this side I’m bringing to your attention the fact the assigned Sysadmin funds are running out as reported on the Board minutes from the 27th of October. On this side Jeff Fortin started looking for possible sponsors and came up with the idea of making a brochure with a set of accomplished tasks that couldn’t have been possible without the Sysadmin fundraising campaign launched by Stormy Peters back in June 2010 being a success. The Board is well aware of the importance of having someone looking at the infrastructure that runs the GNOME Project and is making sure the brochure will be properly reviewed and published.

And now some stats taken from the Puppet Git Repository:

$ cd /git/GNOME/puppet && git shortlog -ns

3520 Andrea Veri
506 Olav Vitters
338 Owen W. Taylor
239 Patrick Uiterwijk
112 Jeff Schroeder
71 Christer Edwards
4 Daniel Mustieles
4 Matanya Moses
3 Tobias Mueller
2 John Carr
2 Ray Wang
1 Daniel Mustieles García
1 Peter Baumgarten

and from the Request Tracker database (52388 being my assigned ID):

mysql> select count(*) from Tickets where LastUpdatedBy = '52388';
+----------+
| count(*) |
+----------+
| 3613 |
+----------+
1 row in set (0.01 sec)

mysql> select count(*) from Tickets where LastUpdatedBy = '52388' and Status = 'Resolved';
+----------+
| count(*) |
+----------+
| 1596 |
+----------+
1 row in set (0.03 sec)

It’s been a long run which made me proud, for the things I learnt, for the tasks I’ve been able to accomplish, for the great support the GNOME community gave me all the time and most of all for the same fact of being part of the team responsible of the systems hosting the GNOME Project. Thank you GNOME community for your continued and never ending backing, we daily work to improve how the services we host are delivered to you and the support we receive back is fundamental for our passion and enthusiasm to remain high!