Gentoo Infrastructure

Revision as of 13:20, 10 January 2014 by Chrigu (Talk | contribs)


Overview

This article describes how we plan to use Gentoo as the infrastructure backbone for a complete, modern IT architecture.

Glossary

@TODO We need to clean up some terms (for instance the portage vs. puppet profile distinction). A glossary should help us define terms more precisely (and stick to the definitions).

portage profile
A profile in gentoo portage. Defines either a system or application stack for portage.
portage build profile
A profile in Gentoo Portage. Based on a system profile, but used during the build phase of the binary packages that go into the final deployment.
puppet profile
A puppet profile contains the implementation logic of how to install and configure an aspect of a system.
stack
A stack contains a complete and deployable product that may be provisioned and used. Stacks have very simple inheritance, letting the admin create stack trees based on each other. For instance, a Ruby on Rails stack will be based on a Ruby stack, which in turn is based on a Linux stack.
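
The simple inheritance described for stacks could be resolved roughly as follows. This is only a sketch: the stack names and the `STACK_PARENTS` table are hypothetical and merely mirror the Ruby on Rails example above, and it assumes every stack has at most one parent.

```python
# Hypothetical stack tree; each stack names at most one parent (simple inheritance).
STACK_PARENTS = {
    "linux": None,
    "ruby": "linux",
    "ruby-on-rails": "ruby",
}


def stack_chain(name, parents=STACK_PARENTS):
    """Return the stacks to provision, ordered from the root stack down to `name`."""
    chain = []
    seen = set()
    current = name
    while current is not None:
        if current in seen:
            raise ValueError(f"inheritance cycle at stack {current!r}")
        if current not in parents:
            raise KeyError(f"unknown stack {current!r}")
        seen.add(current)
        chain.append(current)
        current = parents[current]
    chain.reverse()  # root first, requested stack last
    return chain


print(stack_chain("ruby-on-rails"))  # ['linux', 'ruby', 'ruby-on-rails']
```

Returning the chain root-first means the base stack can be applied before anything that builds on it.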

Required components

  • Build host(s) for binary packages
  • HTTP server for serving binary packages and distfiles (required by the ebuilds)
  • Git clone of official portage tree
  • Overlay(s)
  • Own portage profile(s)
  • rsync or Git server for serving the Overlay and the portage profiles
  • Stage3 building system
  • Puppet for configuration management and software installation
  • Git version control for everything (overlays, portage profiles, puppet manifests and scripts/code)
  • Install host (PXE boot / TFTP / DHCP)
    • emc/puppetlabs razor can do this but needs some work for gentoo
  • Automatic base installation script
    • also in the scope of razor
  • Separation of development, staging and production environments
    • tagged and managed in git
  • PKI environment (with dedicated sub CAs) for X509 certificates (used for Puppet, server and client certs etc.)
  • git web interface (make dotfiles and frozen clones accessible to power-users)
  • Central authentication service
  • DNS, DHCP and NTP services
  • Monitoring and alarming system
  • Logging
  • versioning for everything (if it is a committable file, use semver on its repo)

Binary package requirements

  • Ability to build and install binary packages with the same version but different USE flags. For example, a MySQL server package (-minimal) and a MySQL client and libs package (minimal).
    • don't go there: this imposes a significant amount of maintenance work and may still break. Rather provide large enough base sets and accept that some packages install too much (you can still disable them at runtime) and build the few deviations from the rule on the servers from source --Tiziano (talk) 14:39, 3 January 2014 (CET)
      • Yes, we need to and can go there :-) I agree with you, that we should do this only if necessary, apache for example can be built once and has the ability to turn features (module loading) on/off via its configuration. Other software does not provide such run-time configuration which results in unwanted server-software and dependencies on the installed hosts (net-analyzer/zabbix for example). I clearly do not want to have a dedicated build environment for each of those packages, I would rather see a build env, called minimal for example, which is used to build all those database packages with only lib and clients enabled (use the same env for PostgreSQL, OpenLDAP, MySQL etc.). As stated before, the whole build process needs to be automated, so I don't see a considerable increase of maintenance work coming up here. The dependency problem is mitigated through the fact that we have a frozen portage tree for all our build envs and therefore use the same versions everywhere. --Chrigu (talk) 12:04, 6 January 2014 (CET)
      • Yes and no on this one. We clearly need to keep the list of packages that require this to a bare minimum. net-analyzer/zabbix, for instance, doesn't warrant this; we just won't start the server on non-server nodes. Easy as pie. The server code and its deps won't do any harm on, say, a desktop or other server box. Even though I can't think of a concrete example, I do believe we will need this possibility when we encounter packages that have to be built using different profiles for different use cases, things like having a php with-curlwrappers vs. one with the curl module sans curlwrappers. The important point I take from this is that creating new profiles with small deviations from our default must be very easy (i.e. not much work). Basically, we need the infra's support for n different build profiles to be fully automated and well documented. Lucas (talk) 19:52, 9 January 2014 (CET)
        • The net-analyzer/zabbix is definitely a good example, I don't want to install and maintain MySQL, Apache, PHP, snmpd (including all the deps) etc. on hosts which just need a Zabbix agent. I would also like to pragmatically avoid unused deps, in order to minimize reverse-updates and security updates (which must be provided nonetheless if the software is in use or not). --Chrigu (talk) 13:20, 10 January 2014 (CET)
  • Providing binary packages for different major (and sometimes minor) versions, for example: dev-db/mysql-5.X.Y and dev-db/mysql-6.X.Y.
  • Provide binary packages for pre-compiled Linux kernels and modules (not just a binary package of sys-kernel/gentoo-sources)
    • This makes it possible to build stage4 images from binary packages.
    • Most likely there will be separate packages for servers and desktops built with different genkernel configs.
  • Handle reverse dependency updates and ABI changes

Build host requirements

  • Build binary package for all required software
  • Support for multiple environments (development, staging and production)
  • Support for multiple architectures (such as x86, amd64 etc.)
  • Support for multiple build profiles
    • system (or base) profile, such as desktop or server (stage3) (all the packages contained within the /etc/portage/make.profile or via emerge @system)
    • application profiles, such as php5-app, django-app etc.
    • simple inheritance is used for things like python-app -> django-app
    • stacks consist of one system profile and multiple application profiles
    • don't do this: Gentoo itself has only a few profiles and even there issues arise when combining them (for example desktop + selinux-hardened) --Tiziano (talk) 14:40, 3 January 2014 (CET)
      • Those are build-profiles (for example chroots or some sort of overlay-fs) not Gentoo (portage) profiles, we definitely need to clarify those terms ;) --Chrigu (talk) 20:01, 5 January 2014 (CET)
  • All build profiles will use a system profile as their base profile
  • Ability to update an existing build profile, without the need to build it from scratch
  • Ability to do fully automated clean builds (ie. for new archs or new stacks)
  • Ability to automatically update all development profiles at a predefined frequency such as daily, weekly or monthly and be notified about build failures
    • Jenkins CI can do this using one Jenkins master and at least one build slave per architecture.
    • Other options would be Travis CI (not ready for in-house use) or CruiseControl
    • RaBe already has a Jenkins instance: [1]. The instance Jenkins-01 is more or less modern and should be easy to reintegrate with Puppet.
  • Each build profile stores the built binary packages under a predefined directory which will be accessible via an HTTP URL such as https://packages.example.com/ENVIRONMENT/gentoo/ARCH/BUILD-PROFILE-NAME.
  • Application build profiles store only the extra packages within the above directory; packages included in a base profile won't be duplicated.
  • Old or no longer supported packages will be removed automatically
  • Build a stage 3 tarball, which can be used for the automatic installation via PXE/TFTP.
    • must be able to build a stage tarball for each of the available environment-arch-system profile combinations
  • Handle reverse dependency updates and ABI changes (aka revdep-rebuild)
  • Handle perl and python (maybe more) dependency updates (aka perl-cleaner & python-updater)
  • Ability to build kernel and modules
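
The per-profile package directories could be addressed with a small helper like the one below. It is a sketch under assumptions: the function names and the example profile names are made up, and only the URL layout (ENVIRONMENT/gentoo/ARCH/BUILD-PROFILE-NAME) comes from the requirement above. Because application build profiles only hold their extra packages, a client of a stack would need the system profile URL plus one URL per application profile.

```python
from urllib.parse import quote

ENVIRONMENTS = ("development", "staging", "production")
BASE_URL = "https://packages.example.com"  # example host from the requirement above


def binhost_url(environment, arch, build_profile, base_url=BASE_URL):
    """Build the package URL for one environment/arch/build-profile combination."""
    if environment not in ENVIRONMENTS:
        raise ValueError(f"unknown environment {environment!r}")
    parts = [environment, "gentoo", arch, build_profile]
    return base_url.rstrip("/") + "/" + "/".join(quote(p, safe="") for p in parts)


def stack_binhost_urls(environment, arch, system_profile, app_profiles):
    """System profile first, then every application profile layered on top of it."""
    profiles = [system_profile, *app_profiles]
    return [binhost_url(environment, arch, p) for p in profiles]


for url in stack_binhost_urls("staging", "amd64", "server", ["python-app", "django-app"]):
    print(url)
```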

Portage tree clone requirements

  • The official portage tree needs to be cloned via Git, which basically enables one to:
    • keep the control over portage tree updates
    • provide an old version of the tree
    • cherry pick updates
      • this should be avoided at all cost since it can lead to various sorts of breakages (ebuild <-> ebuild, ebuild <-> eclass, ebuild <-> profile, eclass <-> profile interaction) --Tiziano (talk) 14:24, 3 January 2014 (CET)
        • Yes, I agree. Nonetheless, we need the possibility to do cherry picking, for example to react to zero-day exploits. --Chrigu (talk) 19:53, 5 January 2014 (CET)
  • Support for a development, staging and production branch
    • Ability to automatically sync from upstream
    • Easy merge support from one branch to the next higher one (staging -> production)
  • Notification support for new GLSAs which affect packages within the cloned trees.
    • Either via automatic update and merge of /usr/portage/metadata/glsa or via external mechanisms such as consulting the RDF feed.
    • Having an inventory built from collected puppet facts allows checking for security updates in a central location --Tiziano (talk) 14:31, 3 January 2014 (CET)

Portage overlay requirements

  • One Git based portage overlay
    • Contains own portage profiles
    • Contains own or modified ebuilds or legacy ones removed from the official tree
  • Support for development, staging and production environment (via Git branches)
  • Layman compatibility
    • Portage now has direct repository support (as does cave/paludis), so layman may be omitted --Tiziano (talk) 14:32, 3 January 2014 (CET)

Portage profile requirements

  • Multiple Portage profiles stored within the overlay.
    • One for base, desktop and server (maybe more in the future, such as streambox)
      • desktop and server both inherit from the base profile which serves as the lowest common denominator.
  • Support for multiple architectures (such as x86 and amd64)
    • Avoid duplicated definitions via parent profile inheritance.
  • All the profiles have an official Gentoo profile as their master
  • Profiles include only packages belonging to a base system, not an application stack (those will be managed via puppet recipes)
  • Profiles can be used to unmask packages required but not belonging to the base system
  • Profiles set all the default values for the client's make.conf, such as USE flags, PORTAGE_BINHOST, GENTOO_MIRRORS, CFLAGS, CHOST etc.
    • Warning: many such variables are not incremental and therefore need duplication of Gentoo base profile variables (requiring that someone tracks changes in those variables) --Tiziano (talk) 14:29, 3 January 2014 (CET)
  • keep the profiles (and the inheritance structure) as simple as possible; rather duplicate than inherit for small deviations to avoid inheritance issues --Tiziano (talk) 14:33, 3 January 2014 (CET)
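
The warning about non-incremental variables can be illustrated with a toy resolver. This is not Portage's real implementation: it assumes that only USE is incremental (Portage treats a few more variables that way), and the profile contents are invented. The point is that a child profile setting CFLAGS replaces the parent's value outright, while USE flags accumulate across the chain.

```python
# Assumption for this sketch: only USE is incremental; real Portage has more such variables.
INCREMENTAL = {"USE"}


def resolve_profile(layers):
    """Merge make.defaults-style settings; `layers` is ordered parent profiles first."""
    result = {}
    for layer in layers:
        for key, value in layer.items():
            if key not in INCREMENTAL:
                result[key] = value  # non-incremental: the child value replaces the parent value
                continue
            flags = result.get(key, [])
            for token in value.split():
                if token == "-*":
                    flags = []  # reset everything inherited so far
                elif token.startswith("-"):
                    flags = [f for f in flags if f != token[1:]]
                elif token not in flags:
                    flags.append(token)
            result[key] = flags
    return {k: " ".join(v) if isinstance(v, list) else v for k, v in result.items()}


base = {"USE": "ipv6 ssl", "CFLAGS": "-O2 -pipe"}    # hypothetical base profile
server = {"USE": "-ipv6 ldap", "CFLAGS": "-march=native"}  # hypothetical child profile
print(resolve_profile([base, server]))
# {'USE': 'ssl ldap', 'CFLAGS': '-march=native'}
```

Note how `-O2 -pipe` is lost in the child: to keep it, the server profile would have to repeat it, which is exactly the duplication (and tracking burden) mentioned above.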

Package host requirements

File mirror host requirements

  • Hosts all the files required to build a package (GENTOO_MIRRORS=mirror.example.com/public/gentoo/distfiles)
    • Acts as a caching mirror for already downloaded packages from an official mirror
    • Serves fetch-restricted files (dev-java/oracle-jdk-bin for example), to authorized clients
  • Files are served via HTTPS
  • Distinguishes between three groups of files
    • public: Files which are available to all clients (theoretically even to the entire internet)
    • site-local: Files which are only available to authenticated clients belonging to the same infrastructure (for example those which would get us into legal trouble if available to the public)
    • stack-local: Files which are only available to authenticated clients belonging to the same infrastructure and the software stack group (private files of a specific customer)
  • Provides an easy way to let an administrator manually upload new files, for example via WebDAV-CGI, SFTP or a similar mechanism.
  • Possibility to authenticate clients either via HTTP basic auth or client certificates.
  • Old or no longer supported files will be removed automatically
  • Can be implemented on the build host
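
The three file groups boil down to a small access decision. The sketch below is one possible reading of the rules above; the `Client` and `MirrorFile` types and their field names are hypothetical, and how a client is authenticated (basic auth or client certificate) is left out.

```python
from dataclasses import dataclass
from typing import Optional

GROUPS = ("public", "site-local", "stack-local")


@dataclass(frozen=True)
class Client:
    authenticated: bool = False
    stack: Optional[str] = None  # software stack group the client belongs to, if any


@dataclass(frozen=True)
class MirrorFile:
    group: str                   # one of GROUPS
    stack: Optional[str] = None  # owning stack, only meaningful for stack-local files


def may_fetch(client, file):
    """Decide whether `client` may download `file` from the mirror."""
    if file.group not in GROUPS:
        raise ValueError(f"unknown file group {file.group!r}")
    if file.group == "public":
        return True
    if not client.authenticated:
        return False
    if file.group == "site-local":
        return True
    # stack-local: authenticated and member of the same stack group
    return client.stack is not None and client.stack == file.stack


print(may_fetch(Client(), MirrorFile("public")))                           # True
print(may_fetch(Client(authenticated=True, stack="customer-a"),
                MirrorFile("stack-local", stack="customer-b")))            # False
```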

Puppet requirements


  • Support for all three environments (development, staging and production)
  • Version controlled via Git
  • ENC and Hiera support with data from LDAP
  • Puppet recipes for
    • installing, updating, removing and (re-)configuring specific software belonging to an application stack (see build host).
    • (re-)configuring software belonging to a system stack
    • Updating the system stack (emerge @system) aka system update.
    • installing, updating and removing of kernel packages (including the handling of the ensuing reboot)
  • use best-of-breed tools like Hiera and Augeas (this might mean targeting Puppet 3.3.x due to module data support in ARM-9)
  • Use a sane preexisting Puppet architecture concept


Install host requirements

  • Ability to install physical and virtual machines
  • Distinguish machines by their Ethernet MAC address
  • Provide a PXE/TFTP boot mechanism
  • Partition and format the (virtual) hard disks
  • Install a stage3 image which was built by the build host
  • Bootstrap puppet, enabling it to take over the individual installation and customization.
  • Group hosts into
    • environments (development, staging and production)
    • architectures (such as x86, amd64 etc.)
    • portage profiles (system profiles such as desktop and server)
    • stacks (comprising a complete product as a service with the underlying infrastructure); this is the task of Puppet --Chaf (talk) 09:42, 19 December 2013 (CET)
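
Distinguishing machines by MAC address and grouping them could look roughly like this. The inventory entry and the attribute names are invented for illustration; the only assumption about input is that MAC addresses may arrive in colon, dash or dotted notation and in either case.

```python
import re

# Hypothetical inventory keyed by normalized MAC address.
HOSTS = {
    "52:54:00:ab:cd:ef": {"environment": "development", "arch": "amd64", "profile": "server"},
}


def normalize_mac(mac):
    """Turn 52-54-00-AB-CD-EF, 5254.00ab.cdef etc. into 52:54:00:ab:cd:ef."""
    digits = re.sub(r"[:.\-]", "", mac.strip()).lower()
    if not re.fullmatch(r"[0-9a-f]{12}", digits):
        raise ValueError(f"not a MAC address: {mac!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def lookup_host(mac, hosts=HOSTS):
    """Return the grouping for a booting machine, or None if it is unknown."""
    return hosts.get(normalize_mac(mac))


print(lookup_host("52-54-00-AB-CD-EF"))
```

An unknown MAC returns `None`, so the install host can decide separately whether to refuse such machines or boot them into a default environment.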

Public key infrastructure requirements

  • Local certificate authority for signing X.509 certificates.
  • Master certificate authority root certificate which is only used to sign Sub-CA certificates
  • Sub certificate authorities used for various cases such as
    • Puppet certificates [2]
    • User certificates
    • Client certificates
    • Host certificates
  • Ability to sign, revoke and extend certificates
  • Publish certificate revocation status via CRL and/or OCSP
    • CRL is not worth the hassle because it does not define how often the CRL must be consulted. Since we are on the same physical network, OCSP should be far superior here (thanks to its live checking support). On the other hand, Puppet does not do OCSP yet (redmine: #110111), so we might need to implement both, or implement OCSP and develop our own automated revocation for Puppet.
  • Choose DNs below dc=rabe,dc=ch
  • register a PEN-OID as issued by IANA if custom schema work is required
    • Use a @rabe email when requesting a PEN at IANA, last time the @purplehaze.ch was a problem!
  • Some of the aforementioned sub-CAs might be implemented as robot CAs with a self service interface (ie for authorized users).
  • Consider using CMP or CMC as an API for signing, revoking etc.
    • Since the underlying RFCs of both these protocols are rather new they are not yet broadly supported.
  • Keep local root CA offline!
    • Maybe use an old netbook as root CA :P
  • Support GPG keys for signing packages

Git hosting requirements

  • Public repositories hosted on GitHub (mainly) under the radiorabe organization (almost anything which doesn't leak sensitive information)
  • Private repositories hosted on the internal infrastructure
    • Accessible via https and a web interface
    • contains some repos with uber-private data that gets compartmentalized even further (i.e. hiera datafiles in different repos)
  • One repository per component
  • Daily backup of all repositories
  • Branches for development, staging and production
    • New features are added to the development branch only and later merged up to staging and production
  • Must support pull-requests so we can implement a review process (when pulling through the envs)
    • Sign-offs might also be required
  • Adhere to Semantic Versioning for version/release tags.
    • Tag releases as vX.Y.Z; those will automatically appear on GitHub as downloadable tarballs, which can be referenced within the corresponding ebuilds.
    • Hit 1.0.0 as soon as code lands on production or earlier
    • Commit .lock files when reaching 1.0.0 where applicable (Gemfile.lock, composer.lock) or earlier if needed
  • Must be able to trigger remote events (ie. update master through mcollective after code was promoted to production in a PR)
  • Support the git-flow branching model
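
Release tags of the form vX.Y.Z can be validated and ordered with a few lines. This sketch deliberately accepts only plain vMAJOR.MINOR.PATCH tags; Semantic Versioning also allows pre-release and build metadata suffixes, which are ignored here for brevity. The function names are made up.

```python
import re

# vMAJOR.MINOR.PATCH without leading zeros; pre-release/build suffixes are out of scope here.
TAG_RE = re.compile(r"^v(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)$")


def parse_tag(tag):
    """Return (major, minor, patch) for a release tag, or None for any other ref name."""
    match = TAG_RE.match(tag)
    return tuple(int(part) for part in match.groups()) if match else None


def latest_release(tags):
    """Pick the highest release tag, comparing numerically rather than as strings."""
    releases = [(parse_tag(tag), tag) for tag in tags]
    releases = [(version, tag) for version, tag in releases if version is not None]
    return max(releases)[1] if releases else None


print(latest_release(["v0.9.0", "v1.10.0", "v1.2.3", "feature-x"]))  # v1.10.0
```

Comparing integer tuples avoids the classic string-sort mistake of ranking v1.2.3 above v1.10.0.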

Messaging requirements

  • I'm talking AMQP, JMS, STOMP, 0MQ and the like
    • not sure if we need something in this space for the infra
    • it could facilitate comms between components
    • stuff like mcollective and RadioDNS need something in this space

Monitoring, logging and alarming system requirements

@TODO

  • centralized logging is used throughout
    • with tools that help find and fix problems and do post mortems
  • all systems are always monitored by a full monitoring suite
  • the monitoring suite must support alarming users through multiple paths
    • alarming should include a fallback strategy and a way to acknowledge alarms
    • it must have an easy way to configure scheduled maintenance, either before or while the maintenance is under way
  • monitoring, logging and alarming are all automatically configured during regular provisioning of machines
  • alerting uses jabber by default with fallbacks to email and sms-through-gsm depending on the site.
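
The fallback strategy for alerting (jabber first, then email, then SMS) amounts to trying channels in order until one succeeds. The sketch below shows just that loop; the sender functions are stand-ins, not real jabber or mail clients, and acknowledging alarms is not covered.

```python
def send_alert(message, channels):
    """Try each (name, send) pair in order; return the name of the first channel that worked."""
    errors = []
    for name, send in channels:
        try:
            send(message)
            return name
        except Exception as exc:  # any delivery failure triggers the next fallback
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all alert channels failed: " + "; ".join(errors))


# Stand-in senders for demonstration only.
def jabber_down(message):
    raise ConnectionError("jabber server unreachable")


sent_mail = []

used = send_alert(
    "disk full on build-01",
    [("jabber", jabber_down), ("email", sent_mail.append)],
)
print(used)  # email
```

Since the channel list is plain data, the per-site choice of fallbacks can be supplied by configuration management during provisioning.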

Implementation proposal

Build host proposal

The build host consists of various chroots used to build binary packages for multiple environments, architectures and build profiles.

Links

  • packer.io can be used to build stage4 (containing a kernel) images and seems to work for gentoo. Packer often gets used to build Vagrant boxes.

Portage tree clone proposal

Portage overlay proposal

Portage profile proposal

Package and file mirror proposal

Puppet proposal

  • Adhere to Craig Dunn's architecture [3]
    • on the system level (i.e. for each bare-metal or virtual machine)
    • on the architecture level (ie. in the cloud-fabric)
      • roles contains the business view (ie. role::cloud-storage, role::product1)
      • profiles contain the implementation (ie profile::storage-cluster, profile::storage-webinterface-farm)
  • Keep profiles, roles (as per craig) and Puppetfile in github.com/radiorabe/puppet
    • This is where we keep feature/*, develop and master (ie staging) branches
    • An internal clone then contains all these + production (what exactly is in production, i.e. our release schedule, is considered sensitive in this implementation)
    • This lets us use the git-flow branching model with almost no changes (the one change being us gating stuff into production on the closed clone)
    • GitHub may use hooks to push content to our internal Git when changes happen
  • All other modules need their own repo and must be published to the puppet module forge
  • Use librarian-puppet (or r10k) for composing the final puppet envs
    • r10k eschews git submodule support we used in puppet-syslogng but has support for multiple envs out of the box
    • librarian-puppet would need to be run once per environment to achieve what r10k does
  • provide develop, master and production branches from private repo as puppet environments on master
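
The branch flow in this proposal can be written down as data. The sketch assumes the promotion order feature/* -> develop -> master -> production and maps master to the staging environment, as hinted above; the environment names and the helper functions are hypothetical, and tools like r10k would normally derive environments from branch names themselves.

```python
PROMOTION_ORDER = ["develop", "master", "production"]

# Assumed mapping of branches to the three environments used throughout this article.
PUPPET_ENVIRONMENTS = {
    "develop": "development",
    "master": "staging",
    "production": "production",
}

# Branches that only exist on the internal clone, not on GitHub.
INTERNAL_ONLY = {"production"}


def next_branch(branch):
    """Return the branch that `branch` gets merged into next, or None at the end of the line."""
    if branch.startswith("feature/"):
        return "develop"
    index = PROMOTION_ORDER.index(branch)  # raises ValueError for unknown branches
    return PROMOTION_ORDER[index + 1] if index + 1 < len(PROMOTION_ORDER) else None


def promotion_is_internal(branch):
    """True if merging `branch` onwards has to happen on the closed internal clone."""
    return next_branch(branch) in INTERNAL_ONLY


print(next_branch("feature/syslog"), next_branch("master"), promotion_is_internal("master"))
```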

Install host proposal

  • use the existing server on tftp-01 on the RaBe infra as a shortcut
    • replace that instance with one native to the infra when it is ready for that
  • iPXE [4]

Links

  • Tools that run puppet on freshly installed machines (and also do some provisioning)
    • puppetlabs razor bare metal/cloud provisioning tool
    • vagrant cloud provisioning aimed at provisioning developer boxes (with virtualbox). Has 3rd party support for various cloud systems. Vagrant might be interesting for creating dev clouds. I've seen this being used on production sites.

Public key infrastructure proposal

  • write a certificate policy (in German!)
  • hold a key ceremony for the root and level 1
    • offline ceremony on an old netbook with centos or similar (not debian, probably not gentoo to make this happen soonish)
    • Sign RaBe root cert and level 1 intermediate cert
    • store root cert key on 2 sdcards and as 1 printout somewhere safely
    • store level 1 intermediate key on sdcards for use by admins
  • use level 1 intermediate key to sign level 2 cas as needed
    • level 2 robot ca key for puppet (managed by puppet ca)
    • level 2 ca for client certs
    • level 2 ca for host certs
    • more level 2 certs
  • use OpenSSL as default software for PKI
    • OpenSSL has the largest user base, which should make it easier on new admins
    • features that openssl does not implement get used as soon as openssl catches up (ie. CMP)

git hosting proposal

  • GitLab seems nice even though it is Ruby on Rails under the hood

Links

Last modified on 10 January 2014, at 13:20