Gentoo Infrastructure

From stoney cloud
Revision as of 12:18, 21 August 2014 by Tiziano (Talk | contribs)


Overview

This article describes how we plan to use Gentoo as the infrastructure backbone for creating a complete and modern IT architecture.

Glossary

@TODO We already need to clean up some terms (for instance the portage vs. puppet profile distinction). A glossary should help us define terms more closely (and stick to the definitions).

portage profile
A profile in gentoo portage. Defines either a system or application stack for portage.
portage build profile
A profile in gentoo portage. Based on a system profile but used during the build phase of the binary packages used in the final deploy.
puppet profile
A puppet profile contains the implementation logic of how to install and configure an aspect of a system.
stack
A stack contains a complete and deployable product that may be provisioned and used. Stacks have very simple inheritance, letting the admin create stack trees based on each other. For instance a Ruby on Rails stack will be based off of a Ruby stack, which in turn is based off of a Linux stack.

Required components

  • Build host(s) for binary packages
  • HTTP server for serving binary packages and distfiles (required by the ebuilds)
  • Git clone of official portage tree
  • Overlay(s)
  • Own portage profile(s)
  • rsync or Git server for serving the Overlay and the portage profiles
  • Stage3 building system
  • Puppet for configuration management and software installation
  • Git version control for everything (overlays, portage profiles, puppet manifests and scripts/code)
  • Install host (PXE boot / TFTP / DHCP)
    • emc/puppetlabs razor can do this but needs some work for gentoo
  • Automatic base installation script
    • also in the scope of razor
  • Separation of development, staging and production environments
    • tagged and managed in git
  • PKI environment (with dedicated sub CAs) for X509 certificates (used for Puppet, server and client certs etc.)
  • git web interface (make dotfiles and frozen clones accessible to power-users)
  • Central authentication service
  • DNS, DHCP and NTP services
  • Monitoring and alarming system
  • Logging
  • versioning for everything (if it is a committable file, use semver on its repo)

Binary package requirements

  • Ability to build and install binary packages with the same version but different USE flags. For example, a MySQL server package (built with -minimal) and a MySQL client & libs package (built with minimal)
    • don't go there: this imposes a significant amount of maintenance work and may still break. Rather provide large enough base sets and accept that some packages install too much (you can still disable them at runtime) and build the few deviations from the rule on the servers from source --Tiziano (talk) 14:39, 3 January 2014 (CET)
      • Yes, we need to and can go there :-) I agree with you, that we should do this only if necessary, apache for example can be built once and has the ability to turn features (module loading) on/off via its configuration. Other software does not provide such run-time configuration which results in unwanted server-software and dependencies on the installed hosts (net-analyzer/zabbix for example). I clearly do not want to have a dedicated build environment for each of those packages, I would rather see a build env, called minimal for example, which is used to build all those database packages with only lib and clients enabled (use the same env for PostgreSQL, OpenLDAP, MySQL etc.). As stated before, the whole build process needs to be automated, so I don't see a considerable increase of maintenance work coming up here. The dependency problem is mitigated through the fact that we have a frozen portage tree for all our build envs and therefore use the same versions everywhere. --Chrigu (talk) 12:04, 6 January 2014 (CET)
      • Yes and no on this one. We clearly need to keep the list of packages that require this at a bare minimum. net-analyzer/zabbix for instance doesn't warrant this, we just won't start the server on non server nodes. Easy as cake. The server code and its deps won't do any harm on say a desktop or other server box. Even though I can't think of an example, I do believe we will be needing this possibility when we encounter packages that need to be built using different profiles for different use cases, things like having a php with-curlwrappers vs one with the curl module sans curlwrappers. The important point I take from this is that creating new profiles with small deviations from our default must be very easy (ie. not much work). Basically we need the infra's support for n different build profiles to be fully automated and well documented. Lucas (talk) 19:52, 9 January 2014 (CET)
        • The net-analyzer/zabbix is definitely a good example, I don't want to install and maintain MySQL, Apache, PHP, snmpd (including all the deps) etc. on hosts which just need a Zabbix agent. I would also like to pragmatically avoid unused deps, in order to minimize reverse-dependency updates and security updates (which must be provided regardless of whether the software is in use or not). --Chrigu (talk) 13:20, 10 January 2014 (CET)
  • Providing binary packages for different major (and sometimes minor) versions, for example: dev-db/mysql-5.X.Y and dev-db/mysql-6.X.Y.
  • Provide binary packages for pre-compiled Linux kernels and modules (not just a binary package of sys-kernel/gentoo-sources)
    • This makes it possible to build stage4 images from binary packages.
    • Most likely there will be separate packages for servers and desktops built with different genkernel configs.
  • Handle reverse dependency updates and ABI changes
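A minimal sketch of how two USE variants of the same package version could be produced. The chroot paths, PKGDIR layout and the "minimal" naming are our own placeholder conventions, not Gentoo defaults:

```sh
# Build dev-db/mysql twice with different USE flags, each run inside its
# own build chroot so the variants end up in separate PKGDIR trees.

# Full server build in the default "server" build chroot:
chroot /build/server env USE="-minimal" \
    PKGDIR=/var/packages/production/gentoo/amd64/server \
    emerge --buildpkgonly dev-db/mysql

# Client/libs-only build in the hypothetical "minimal" build chroot:
chroot /build/minimal env USE="minimal" \
    PKGDIR=/var/packages/production/gentoo/amd64/minimal \
    emerge --buildpkgonly dev-db/mysql
```

Both chroots would share the same frozen portage tree, so package versions stay identical across variants.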

Build host requirements

  • Build binary package for all required software
  • Support for multiple environments (development, staging and production)
  • Support for multiple architectures (such as x86, amd64 etc.)
  • Support for multiple build profiles
    • system (or base) profile, such as desktop or server (stage3) (all the packages contained within /etc/portage/make.profile or installed via emerge @system)
    • application profiles (such as php5-app, django-app etc.)
    • simple inheritance is used for things like python-app -> django-app
    • stacks consist of one system profile and multiple application profiles
    • don't do this: Gentoo itself has only a few profiles and even there issues arise when combining them (for example desktop + selinux-hardened) --Tiziano (talk) 14:40, 3 January 2014 (CET)
      • Those are build-profiles (for example chroots or some sort of overlay-fs) not Gentoo (portage) profiles, we definitely need to clarify those terms ;) --Chrigu (talk) 20:01, 5 January 2014 (CET)
  • All build profiles will use a system profile as their base profile
  • Ability to update an existing build profile, without the need to build it from scratch
  • Ability to do fully automated clean builds (ie. for new archs or new stacks)
  • Ability to automatically update all development profiles on a predefined frequency such as daily, weekly or monthly, and be notified about build failures
    • jenkins ci can do this using one jenkins master and at least one build slave per architecture.
    • Other options would be travis ci (not ready for in-house use) or CruiseControl
    • Rabe already has a jenkins instance: [1]. The instance Jenkins-01 is more or less modern and should be easy to reintegrate with puppet.
  • Each build profile stores the built binary packages under a predefined directory which will be accessible via an HTTP URL such as https://packages.example.com/ENVIRONMENT/gentoo/ARCH/BUILD-PROFILE-NAME.
  • Application build profiles store only the extra packages within the above directory; packages included in a base profile won't be duplicated.
  • Old or no longer supported packages will be removed automatically
  • Build a stage 3 tarball, which can be used for the automatic installation via PXE/TFTP.
    • must be able to build a stage tarball for each of the available environment-arch-system profile combinations
  • Handle reverse dependency updates and ABI changes (aka revdep-rebuild)
  • Handle perl and python (maybe more) dependency updates (aka perl-cleaner & python-updater)
  • Ability to build kernel and modules
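As an illustration only (paths and profile names are placeholders), a build chroot could be configured and refreshed roughly like this:

```sh
# make.conf fragment inside the build chroot -- always emit binary packages
# into the per-profile directory that gets served over HTTP:
#   FEATURES="buildpkg"
#   PKGDIR="/var/packages/production/gentoo/amd64/server"

# Incremental update of an existing build profile (no rebuild from scratch):
emerge --sync
emerge --update --deep --newuse @world
revdep-rebuild        # rebuild reverse deps after ABI changes
perl-cleaner --all    # rebuild Perl modules after a Perl update
python-updater        # rebuild Python modules after a Python update
emerge --depclean     # remove packages dropped from the profile
```

A clean build would instead start from a fresh stage3 unpacked into a new chroot, which keeps the fully automated path exercised.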

Portage tree clone requirements

  • The official portage tree needs to be cloned via Git, which basically enables one to:
    • keep the control over portage tree updates
    • provide an old version of the tree
    • cherry pick updates
      • this should be avoided at all cost since it can lead to various sorts of breakages (ebuild <-> ebuild, ebuild <-> eclass, ebuild <-> profile, eclass <-> profile interaction) --Tiziano (talk) 14:24, 3 January 2014 (CET)
        • Yes, I agree. Nonetheless, we need the possibility to do cherry picking, for example to react on zero-day exploits. --Chrigu (talk) 19:53, 5 January 2014 (CET)
  • Support for a development, staging and production branch
    • Ability to automatically sync from upstream
    • Easy merge support from one branch to the next higher one (staging -> production)
  • Notification support for new GLSAs which affect packages within the cloned trees.
    • Either via automatic update and merge of /usr/portage/metadata/glsa or via external mechanisms such as consulting the RDF feed.
    • Having an inventory by collecting puppet facts allows checking for security updates in a central location --Tiziano (talk) 14:31, 3 January 2014 (CET)
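A sketch of the clone and branch layout described above (the branch workflow is illustrative; the upstream URL is Gentoo's anonymous Git mirror):

```sh
# Clone the official tree once, then maintain per-environment branches.
git clone https://anongit.gentoo.org/git/repo/gentoo.git portage
cd portage
git branch development        # synced from upstream regularly
git branch staging            # promoted from development
git branch production         # promoted from staging

# Promotion: sync upstream into development, then merge upwards.
git checkout development && git merge origin/master
git checkout staging    && git merge development
git checkout production && git merge staging
```

Cherry-picking a single security fix would happen on top of these branches, with the caveats Tiziano notes above about cross-ebuild/eclass/profile breakage.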

Portage overlay requirements

  • One Git based portage overlay
    • Contains own portage profiles
    • Contains own or modified ebuilds or legacy ones removed from the official tree
  • Support for development, staging and production environment (via Git branches)
  • Layman compatibility
    • Portage now has direct repository support (as do cave/paludis) and layman may be omitted --Tiziano (talk) 14:32, 3 January 2014 (CET)
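For illustration, a repos.conf entry (repository name and URL invented) that makes layman unnecessary:

```ini
# /etc/portage/repos.conf/rabe.conf
[rabe]
location = /var/db/repos/rabe
sync-type = git
sync-uri = https://git.example.com/portage/rabe-overlay.git
auto-sync = yes
```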

Portage profile requirements

  • Multiple Portage profiles stored within the overlay.
    • One for base, desktop and server (maybe more in the future, such as streambox)
      • desktop and server both inherit from the base profile which serves as the lowest common denominator.
  • Support for multiple architectures (such as x86 and amd64)
    • Avoid duplicated definitions via parent profile inheritance.
  • All the profiles have an official Gentoo profile as their master
  • Profiles include only packages belonging to a base system, not an application stack (those will be managed via puppet recipes)
  • Profiles can be used to unmask packages required but not belonging to the base system
  • Profiles set all the default values for the client's make.conf, such as USE flags, BINHOSTS, GENTOO_MIRRORS, CFLAGS, CHOST etc.
    • Warning: many such variables are not incremental and therefore need duplication of Gentoo base profile variables (requiring that someone tracks changes in those variables) --Tiziano (talk) 14:29, 3 January 2014 (CET)
  • keep the profiles (and the inheritance structure) as simple as possible; rather duplicate than inherit for small deviations, to avoid inheritance issues --Tiziano (talk) 14:33, 3 January 2014 (CET)
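A sketch of how such a profile tree inside the overlay might be laid out (profile names and the chosen Gentoo parent are illustrative):

```
profiles/
├── base/              # lowest common denominator
│   ├── parent         # points at an official Gentoo profile
│   ├── make.defaults  # USE flags, GENTOO_MIRRORS, CFLAGS, CHOST, ...
│   └── packages       # base system package set
├── server/
│   └── parent         # inherits from ../base
└── desktop/
    └── parent         # inherits from ../base

# profiles/server/parent
gentoo:default/linux/amd64/13.0
../base
```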

Package host requirements

File mirror host requirements

see Mirror Server#Requirements

Puppet requirements


  • Support for all three environments (development, staging and production)
  • Version controlled via Git
  • ENC and hiera support with data from ldap
  • Puppet recipes for
    • installing, updating, removing and (re-)configuring specific software belonging to an application stack (see build host).
    • (re-)configuring software belonging to a system stack
    • Updating the system stack (emerge @system) aka system update.
    • installing, updating and removing of kernel packages (including the handling of the ensuing reboot)
  • use best-of-breed tools like hiera and augeas (this might mean targeting 3.3.x due to module data support in ARM-9)
  • Use a sane pre-existing puppet architecture concept
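To make the "puppet profile" glossary term concrete, a hedged sketch of such a recipe (class names, the module choice and paths are invented for illustration):

```puppet
# Profile: the implementation logic for one aspect of a system,
# here a host that serves binary packages over HTTP.
class profile::binhost {
  package { 'www-servers/nginx':
    ensure => installed,
  }
  file { '/var/packages':
    ensure => directory,
  }
}

# A node (or a role class) then just includes the profiles it needs:
# include profile::binhost
```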


Install host requirements

  • Ability to install physical and virtual machines
  • Distinguish machines by their Ethernet MAC address
  • Provide a PXE/TFTP boot mechanism
  • Partition and format the (virtual) harddisks
  • Install a stage3 image which was built by the build host
  • Bootstrap puppet, enabling it to take over the individual installation and customization.
  • Group hosts into
    • environments (development, staging and production)
    • architectures (such as x86, amd64 etc.)
    • portage profiles (system profiles such as desktop and server)
    • stacks (comprising a complete product as a service with the underlying infrastructure): this is the task of Puppet --Chaf (Diskussion) 09:42, 19. Dez. 2013 (CET)

Public key infrastructure requirements

  • Local certificate authority for signing X.509 certificates.
  • Master certificate authority root certificate which is only used to sign Sub-CA certificates
  • Sub certificate authorities used for various cases such as
    • Puppet certificates [2]
    • User certificates
    • Client certificates
    • Host certificates
  • Ability to sign, revoke and extend certificates
  • Publish certificate revocation status either via CRL and/or OCSP
    • CRL is not worth the hassle since it does not define how often the CRL must be consulted. Since we are on the same physical net, OCSP should be far superior here (thanks to its live checking support). On the other hand puppet does not do OCSP yet (redmine: #110111), so we might need to implement both, or implement OCSP as well as develop our own automated revocation for puppet.
  • Choose DNs below dc=rabe,dc=ch
  • register a PEN-OID as issued by IANA if custom schema work is required
    • Use a @rabe email address when requesting a PEN at IANA, last time the @purplehaze.ch address was a problem!
  • Some of the aforementioned sub-CAs might be implemented as robot CAs with a self service interface (ie for authorized users).
  • Consider using CMP or CMC as an API for signing, revoking et al.
    • Since the underlying RFCs of both these protocols are rather new they are not yet broadly supported.
  • Keep local root CA offline!
    • Maybe use an old netbook as root CA :P
  • Support GPG keys for signing packages
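A minimal sketch of the root/sub-CA split using plain OpenSSL (subject names, key sizes and validity periods are placeholder choices, not policy):

```shell
# Offline root CA: key plus self-signed certificate.
openssl genrsa -out root.key 4096
openssl req -x509 -new -key root.key -sha256 -days 3650 \
  -subj "/DC=ch/DC=rabe/CN=RaBe Root CA" -out root.crt

# Level 1 intermediate CA, signed by the root.
openssl genrsa -out level1.key 4096
openssl req -new -key level1.key \
  -subj "/DC=ch/DC=rabe/CN=RaBe Level 1 CA" -out level1.csr
printf 'basicConstraints=critical,CA:TRUE,pathlen:0\n' > level1.ext
openssl x509 -req -in level1.csr -CA root.crt -CAkey root.key \
  -CAcreateserial -days 1825 -sha256 -extfile level1.ext -out level1.crt

# Verify the chain.
openssl verify -CAfile root.crt level1.crt
```

The level 2 sub-CAs (puppet, client, host) would be signed with level1.key in the same way; in practice the root key never leaves the offline machine.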

Git hosting requirements

  • Public repositories hosted on GitHub (mainly) under the radiorabe organization (almost anything which doesn't leak sensitive information)
  • Private repositories hosted on the internal infrastructure
    • Accessible via https and a web interface
    • contains some repos with uber-private data that gets compartmentalized even further (ie. hiera datafiles in different repos)
  • One repository per component
  • Daily backup of all repositories
  • Branches for development, staging and production
    • New features are added to the development branch only and later merged up to staging and production
  • Must support pull-requests so we can implement a review process (when pulling through the envs)
    • Sign-offs might also be required
  • Adhere to Semantic Versioning for version/release tags.
    • Tag releases as vX.Y.Z; these will automatically appear on GitHub as downloadable tarballs, which can be referenced within the corresponding ebuilds.
    • Hit 1.0.0 as soon as code lands on production or earlier
    • Commit .lock files when reaching 1.0.0 where applicable (Gemfile.lock, composer.lock) or earlier if needed
  • Must be able to trigger remote events (ie. update master through mcollective after code was promoted to production in a PR)
  • Support the git-flow branching model
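The tagging and merge convention above, sketched on a throwaway local repo (branch and commit messages are examples):

```shell
# develop -> master merge with a semver release tag, git-flow style.
git init -b master demo
cd demo
git config user.email "admin@example.com"
git config user.name "Example Admin"
git commit --allow-empty -m "initial commit"

git checkout -b develop
git commit --allow-empty -m "feature work ready for production"

git checkout master
git merge --no-ff develop -m "release 1.0.0"
git tag v1.0.0        # vX.Y.Z tags become downloadable tarballs on GitHub
```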

Messaging requirements

  • I'm talking AMQP, JMS, STOMP, 0MQ and the likes
    • not sure if we need something in this space for the infra
    • it could facilitate comms between components
    • stuff like mcollective and RadioDNS need something in this space

Monitoring, logging and alarming system requirements

@TODO

  • centralized logging is used throughout
    • with tools that help find and fix problems and do post mortems
  • all systems are always monitored by a full monitoring suite
  • the monitoring suite must support alarming users through multiple paths
    • alarming should include a fallback strategy and a way to acknowledge alarms
    • it must have an easy way to configure scheduled maintenance, either before or while the maintenance is underway
  • monitoring, logging and alarming are all automatically configured during regular provisioning of machines
  • alerting uses jabber by default with fallbacks to email and sms-through-gsm depending on the site.

Implementation proposal

Build farm proposal

The build farm consists of multiple VMs that build binary packages for multiple environments, architectures and build profiles.

  • Git webhook on internal gitlab install pushes changes to jenkins master.
  • Jenkins master dishes out jobs to jenkins slave machines for needed architecture and build profile.
  • Jenkins slaves only get used once and wipe/reprovision themselves after master has stored build artefacts.
  • We have build-slave templates available for each architecture/build profile combo.
  • Upon use those get provisioned to the needed environment using puppet.
  • All of this is set up using puppet and fully automated, even building of new build-slave templates and the whole releng on those.
  • The build farm also keeps old templates and stable boxes on hold so it can use them to build differentials.
  • Artefacts the slaves will be producing:
    • "vagrant"-style boot boxes
    • full binpkg repos for a given env/arch/build profile combo
    • stage3 balls for each arch/build profile
    • stage4 balls for each environment
    • build logs
    • /var/db/pkg
    • puppet report data
    • test results and code analysis results
  • When we get to continuous deployment, the jenkins master will also be able to trigger puppet when merges to master happen.
  • This rolls out releases to the sub-system that was signed off by a merge to a master branch (see branching strategy in git proposal).

Links

build orchestration

  • Apache Mesos cluster manager that provides efficient resource isolation and sharing across distributed applications, or frameworks. Can run for instance Jenkins.

package building

  • chromite build utility from chromium os (source repo)
    • as far as I recall chromium os does highly parallel building, making their builds really fast with a slight trade-off in long-term stability (ie. builds might fail due to dependencies being built out of order).
    • the chromium os developer guide might also be of interest; among other things it shows that Google does split the build into a package building part and an image creation part.
  • entropy is Sabayon's portage replacement; it focuses on binaries due to Sabayon being a binary distribution
    • their build system "Matter" might be of interest, it seems to automate large parts of tracking gentoo portage with its tinderbox subsystem
    • sabayon has kernel-switcher for updating kernels
    • kernel ebuilds live here and probably rely on the sabayon-kernel eclass.

"stage4"/box/iso building

  • packer.io can be used to build stage4 (containing a kernel) images and seems to work for gentoo. Packer often gets used to build Vagrant boxes.
    • gentoo script from packer-warehouse used with packer to create a minimal gentoo vagrant box
    • currently packer and packer-warehouse do not seem capable of building gentoo machines out of the box; I tested this with osx/virtualbox using gentoo stage3 and portage snapshots Lucas (talk) 11:19, 11 January 2014 (CET)
  • veewee vagrant box builder (builds stage4 images in a manner similar to packer)
    • has support for a massive amount of guest os types
      • installs puppet/chef using gem due to the oldish versions in gentoo (and probably elsewhere)
    • supports kvm and others as host os
    • while testing with osx/virtualbox I was able to build and export a vagrant box from gentoo stage3 and portage snapshots without any hiccups Lucas (talk) 11:19, 11 January 2014 (CET)
    • is in dire need of DRY: [3] to make it worth forking
  • mkstage4
    • aimed at creating backup stage4 tarballs of gentoo systems
    • written in bash
    • pretty simple, might come in handy as automation tool

kernel

  • at the moment we build tarballs for the kernel+initramfs and the modules using genkernel and have a separate ebuild which installs them
  • ideally we would like to have an ebuild which takes the kernel sources (like the ebuild for sys-kernel/gentoo-sources does), builds them according to a default configuration or a user configuration if available (savedconfig.eclass), and then installs the kernel and the modules as well as some minimal headers and configuration needed to build other packages that require the sources to be present
  • TODO: check whether dracut has some advantages regarding module loading over genkernel-generated initramfs
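The current genkernel-based approach could look roughly like this (config path, tarball names and the genkernel invocation are illustrative, not our exact scripts):

```sh
# Build kernel + initramfs and modules from a saved config, then tar the
# results so a separate ebuild can install them on target hosts.
genkernel --kernel-config=/etc/kernels/config-server all
tar -C /boot        -czf /var/tmp/kernel-server.tar.gz .
tar -C /lib/modules -czf /var/tmp/modules-server.tar.gz .
```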

Portage tree clone proposal

Portage overlay proposal

Portage profile proposal

Package and file mirror proposal

Puppet proposal

  • Adhere to Craig Dunn's architecture [4]
    • on the system level (ie. for each bare-metal or virtual machine)
    • on the architecture level (ie. in the cloud-fabric)
      • roles contain the business view (ie. role::cloud-storage, role::product1)
      • profiles contain the implementation (ie profile::storage-cluster, profile::storage-webinterface-farm)
  • Keep profiles, roles (as per craig) and Puppetfile in github.com/radiorabe/puppet
    • This is where we keep feature/*, develop and master (ie staging) branches
    • An internal clone then contains all of these + production (what exactly is in production, ie. our release schedule, is considered sensitive in this implementation)
    • This lets us use the git-flow branching model with almost no changes (the one change being us gating stuff into production on the closed clone)
    • github can use webhooks to push changes to our internal git as they happen
  • All other modules need their own repo and must be published to the puppet module forge
  • Use librarian-puppet (or r10k) for composing the final puppet envs
    • r10k eschews the git submodule support we used in puppet-syslogng but has support for multiple envs out of the box
    • librarian-puppet would need to be run once per environment to achieve what r10k does
  • provide develop, master and production branches from private repo as puppet environments on master
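An illustrative Puppetfile for librarian-puppet/r10k (the module list and the pinned tag are examples; puppet-syslogng is the module mentioned above):

```ruby
forge "https://forgeapi.puppetlabs.com"

# published module from the forge
mod "puppetlabs/stdlib"

# internal module, pinned to a semver release tag
mod "syslogng",
  :git => "https://github.com/radiorabe/puppet-syslogng.git",
  :ref => "v1.0.0"
```

Running the tool once per environment (librarian-puppet) or once overall (r10k) would then compose the final puppet environments on the master.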

Install host proposal

  • use the existing server on tftp-01 on the RaBe infra as a shortcut
    • replace that instance with one native to the infra when it is ready for that
  • iPXE [5]

Links

  • Tools that run puppet on freshly installed machines (and also do some provisioning)
    • puppetlabs razor bare metal/cloud provisioning tool
    • vagrant cloud provisioning aimed at provisioning developer boxes (with virtualbox). Has 3rd party support for various cloud systems. Vagrant might be interesting for creating dev clouds. I've seen this being used on production sites.

Public key infrastructure proposal

  • write certificate policy (in german!)
  • hold a key ceremony for the root and level 1
    • offline ceremony on an old netbook with centos or similar (not debian, probably not gentoo to make this happen soonish)
    • Sign RaBe root cert and level 1 intermediate cert
    • store root cert key on 2 sdcards and as 1 printout somewhere safely
    • store level 1 intermediate key on sdcards for use by admins
  • use level 1 intermediate key to sign level 2 cas as needed
    • level 2 robot ca key for puppet (managed by puppet ca)
    • level 2 ca for client certs
    • level 2 ca for host certs
    • more level 2 certs
  • use OpenSSL as default software for PKI
    • OpenSSL has the largest userbase, which should make it easier on new admins
    • features that OpenSSL does not yet implement (ie. CMP) get used as soon as OpenSSL catches up

git hosting proposal

  • adhere to git-flow for all the things. Automate said usage as far as possible.
git-flow branching

Branch              | Environment | Merge from          | Description
master              | production  | release/ or hotfix/ | Released code with a git tag for each merge.
release/v0.0.0      | staging     | develop             | Contains final releasing work like updating versioning and changelog. This is where we keep semver concerns in check if they were not taken care of already.
hotfix/v0.0.0       | staging     | master              | Only for critically urgent fixes. In most cases doing a release from develop is preferred.
develop             | development | feature/ or master  | Only feature branches that are ready for production should get merged here. master gets merged here after each merge to it. Merging is done with pull requests and review.
feature/featurename | development | develop             | New features get implemented here until they are considered ready for production and merged to develop.
support/v0.0.0      | LTS         | (n/a)               | Marked experimental in most implementations and unused for now.
  • Install gitlab on a vm and integrate external mirrors from github and ldap users from stoney-ldap.
    • keep repo of public mirrors in hieradata so we can configure them from puppet.
    • each organisation in stoney-ldap automatically gets a private project in gitlab.
  • Configure web hook infrastructure and integrate with the continuous integration system.
  • Make continuous integration show feedback back in gitlab.
    • check for git annotate support or use img badges.

On organization projects in gitlab

  • Each project comes with default repos.
Repo    | Description
puppet  | Set up using a template; contains a Puppetfile, a Puppetfile.lock and a hieradata directory.
role    | Read-only copy of the global role module, for reference.
profile | Read-only copy of the global profile module, for reference.
  • Everything in the latter two modules is configurable through hieradata in the first repo.
  • The default setup automatically updates role and profile when they get new merges.
  • A software agent (ci) regularly clones develop, does a full build and pushes the results back to feature/tinderbox
  • This agent automatically creates pull requests if tinderbox builds did not fail.
  • Org leaders may then merge these PRs and bake them into a local release.
  • Some kind of UI helps them do this without much technical knowledge.
  • More repos may be added by the customer.
  • project organizations are private, per customer.

Links
