Overview
This article describes how we plan to use Gentoo as an infrastructure backbone for creating a complete and modern IT architecture.
Glossary
@TODO We need to clean up some terms already (for instance the portage vs puppet profile thing). A glossary should help us define terms more closely (and stick to the definitions).
- portage profile
- A profile in gentoo portage. Defines either a system or application stack for portage.
- portage build profile
- A profile in gentoo portage. Based on a system profile but used during the build phase of the binary packages used in the final deploy.
- puppet profile
- A puppet profile contains the implementation logic of how to install and configure an aspect of a system.
- stack
- A stack contains a complete and deployable product that may be provisioned and used. Stacks have very simple inheritance, letting the admin create stack trees based on each other. For instance, a Ruby on Rails stack will be based on a Ruby stack, which is based on a Linux stack.
Required components
- Build host(s) for binary packages
- HTTP server for serving binary packages and distfiles (required by the ebuilds)
- Git clone of official portage tree
- Overlay(s)
- Own portage profile(s)
- rsync or Git server for serving the Overlay and the portage profiles
- Stage3 building system
- Puppet for configuration management and software installation
- Git version control for everything (overlays, portage profiles, puppet manifests and scripts/code)
- Install host (PXE boot / TFTP / DHCP)
- emc/puppetlabs razor can do this but needs some work for gentoo
- Automatic base installation script
- also in the scope of razor
- Separation of development, staging and production environments
- tagged and managed in git
- PKI environment (with dedicated sub CAs) for X509 certificates (used for Puppet, server and client certs etc.)
- git web interface (make dotfiles and frozen clones accessible to power-users)
- Central authentication service
- DNS, DHCP and NTP services
- Monitoring and alarming system
- Logging
- versioning for everything (if it is a committable file, use semver on its repo)
Binary package requirements
- Ability to build and install binary packages with the same version but different USE flags. For example, a MySQL server package (-minimal) and a MySQL client & libs package (minimal).
- don't go there: this imposes a significant amount of maintenance work and may still break. Rather provide large enough base sets and accept that some packages install too much (you can still disable them at runtime) and build the few deviations from the rule on the servers from source --Tiziano (talk) 14:39, 3 January 2014 (CET)
- Yes, we need to and can go there :-) I agree with you that we should do this only if necessary; apache, for example, can be built once and has the ability to turn features (module loading) on/off via its configuration. Other software does not provide such run-time configuration, which results in unwanted server software and dependencies on the installed hosts (net-analyzer/zabbix for example). I clearly do not want to have a dedicated build environment for each of those packages; I would rather see a build env, called minimal for example, which is used to build all those database packages with only libs and clients enabled (use the same env for PostgreSQL, OpenLDAP, MySQL etc.). As stated before, the whole build process needs to be automated, so I don't see a considerable increase of maintenance work coming up here. The dependency problem is mitigated by the fact that we have a frozen portage tree for all our build envs and therefore use the same versions everywhere. --Chrigu (talk) 12:04, 6 January 2014 (CET)
- Yes and no on this one. We clearly need to keep the list of packages that require this at a bare minimum. net-analyzer/zabbix for instance doesn't warrant this, we just won't start the server on non-server nodes. Easy as cake. The server code and its deps won't do any harm on, say, a desktop or other server box. Even though I can't think of an example, I do believe we will need this possibility when we encounter packages that need to be built using different profiles for different use cases, things like having a php with curlwrappers vs one with the curl module sans curlwrappers. The important point I take from this is that creating new profiles with small deviations from our default must be very easy (ie. not much work). Basically we need the infra's support for n different build profiles to be fully automated and well documented. Lucas (talk) 19:52, 9 January 2014 (CET)
- The net-analyzer/zabbix is definitely a good example, I don't want to install and maintain MySQL, Apache, PHP, snmpd (including all the deps) etc. on hosts which just need a Zabbix agent. I would also like to pragmatically avoid unused deps, in order to minimize reverse-updates and security updates (which must be provided regardless of whether the software is in use or not). --Chrigu (talk) 13:20, 10 January 2014 (CET)
- Providing binary packages for different major (and sometimes minor) versions, for example: dev-db/mysql-5.X.Y and dev-db/mysql-6.X.Y.
- Provide binary packages for pre-compiled Linux kernels and modules (not just a binary package of sys-kernel/gentoo-sources)
- This makes it possible to build stage4 images from binary packages.
- Most likely there will be separate packages for servers and desktops built with different genkernel configs.
- Handle reverse dependency updates and ABI changes
Build host requirements
- Build binary package for all required software
- Support for multiple environments (development, staging and production)
- Support for multiple architectures (such as x86, amd64 etc.)
- Support for multiple build profiles
- system (or base) profile, such as desktop or server (stage3) (all the packages contained within the /etc/portage/make.profile or via emerge @system)
- application profiles, such as php5-app, django-app etc.
- simple inheritance is used for things like python-app -> django-app
- stacks consist of one system profile and multiple application profiles
- don't do this: Gentoo itself has only a few profiles and even there issues arise when combining them (for example desktop + selinux-hardened) --Tiziano (talk) 14:40, 3 January 2014 (CET)
- Those are build-profiles (for example chroots or some sort of overlay-fs) not Gentoo (portage) profiles, we definitely need to clarify those terms ;) --Chrigu (talk) 20:01, 5 January 2014 (CET)
- All build profiles will use a system profile as their base profile
- Ability to update an existing build profile, without the need to build it from scratch
- Ability to do fully automated clean builds (ie. for new archs or new stacks)
- Ability to automatically update all development profiles on a predefined frequency such as daily, weekly or monthly and be notified about build failures
- jenkins ci can do this using one jenkins master and at least one build slave per architecture.
- Other options would be travis ci (not ready for in-house use) or cruise control
- Rabe already has a jenkins instance: [1]. The instance Jenkins-01 is more or less modern and should be easy to reintegrate with puppet.
- Each build profile stores the built binary packages under a predefined directory which will be accessible via an HTTP URL such as https://packages.example.com/ENVIRONMENT/gentoo/ARCH/BUILD-PROFILE-NAME.
- Clients will have PORTAGE_BINHOST="https://packages.example.com/ENVIRONMENT/gentoo/ARCH/SYSTEM-PROFILE-NAME https://packages.example.com/ENVIRONMENT/gentoo/ARCH/APP-STACK-PROFILE-NAME ..." set in their /etc/portage/make.conf.
- Application build profiles store only the extra packages within the above directory; packages included in a base profile won't be duplicated.
- Old or no longer supported packages will be removed automatically
- Build a stage 3 tarball, which can be used for the automatic installation via PXE/TFTP.
- must be able to build a stage tarball for each of the available environment-arch-system profile combinations
- Handle reverse dependency updates and ABI changes (aka revdep-rebuild)
- Handle perl and python (maybe more) dependency updates (aka perl-cleaner & python-updater)
- Ability to build kernel and modules
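The inheritance and de-duplication rules above (every build profile sits on a system profile, application profiles inherit simply, and only extra packages are stored) can be sketched as follows. This is only an illustration under assumptions: the `PARENTS` and `PACKAGES` tables, the profile names and the package lists are hypothetical and not part of any existing tooling.

```python
# Hypothetical build profiles: each names at most one parent (simple inheritance).
PARENTS = {
    "server": None,            # system (base) profile
    "python-app": "server",    # application profile on top of the system profile
    "django-app": "python-app",
}

# Hypothetical package sets requested by each profile.
PACKAGES = {
    "server": {"sys-apps/openrc", "net-misc/openssh"},
    "python-app": {"dev-lang/python", "net-misc/openssh"},
    "django-app": {"dev-python/django", "dev-lang/python"},
}


def chain(profile):
    """Return the inheritance chain from the system profile down to `profile`."""
    result = []
    while profile is not None:
        result.append(profile)
        profile = PARENTS[profile]
    return list(reversed(result))


def extra_packages(profile):
    """Packages a profile must store itself, i.e. those no ancestor provides."""
    *ancestors, _ = chain(profile)
    inherited = set()
    for name in ancestors:
        inherited |= PACKAGES[name]
    return PACKAGES[profile] - inherited


print(chain("django-app"))
print(sorted(extra_packages("django-app")))
```

With these sample tables, the django-app directory would only need to hold dev-python/django, since everything else is already provided further up the chain.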
Portage tree clone requirements
- The official portage tree needs to be cloned via Git, which basically enables one to:
- keep the control over portage tree updates
- provide an old version of the tree
- cherry pick updates
- this should be avoided at all cost since it can lead to various sorts of breakages (ebuild <-> ebuild, ebuild <-> eclass, ebuild <-> profile, eclass <-> profile interaction) --Tiziano (talk) 14:24, 3 January 2014 (CET)
- Yes, I agree. Nonetheless, we need the possibility to do cherry picking, for example to react on zero-day exploits. --Chrigu (talk) 19:53, 5 January 2014 (CET)
- Support for a development, staging and production branch
- Ability to automatically sync from upstream
- Easy merge support from one branch to the next higher one (staging -> production)
- Notification support for new GLSAs which affect packages within the cloned trees.
- Either via automatic update and merge of /usr/portage/metadata/glsa or via external mechanisms such as consulting the RDF feed.
- Having an inventory by collecting puppet facts allows checking for security updates in a central location --Tiziano (talk) 14:31, 3 January 2014 (CET)
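The central check suggested above could look roughly like this. It is a sketch under assumptions: the inventory format (host name mapped to installed category/name strings, as might be gathered from Puppet facts) and the sample data are hypothetical, and version ranges, which a real GLSA check has to compare, are ignored.

```python
def affected_hosts(inventory, advisory_packages):
    """Map each host to the installed packages named in an advisory.

    `inventory` maps host name -> set of installed "category/name" strings.
    Versions are ignored in this sketch.
    """
    wanted = set(advisory_packages)
    hits = {}
    for host, installed in inventory.items():
        matched = installed & wanted
        if matched:
            hits[host] = sorted(matched)
    return hits


# Hypothetical inventory and advisory contents, for illustration only.
inventory = {
    "web-01": {"www-servers/apache", "dev-libs/openssl"},
    "db-01": {"dev-db/mysql"},
}
print(affected_hosts(inventory, ["dev-libs/openssl"]))
```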
Portage overlay requirements
- One Git based portage overlay
- Contains own portage profiles
- Contains own or modified ebuilds or legacy ones removed from the official tree
- Support for development, staging and production environment (via Git branches)
- Layman compatibility
- Portage now has direct repository support (as has cave/paludis) and layman may be omitted --Tiziano (talk) 14:32, 3 January 2014 (CET)
Portage profile requirements
- Multiple Portage profiles stored within the overlay.
- One for base, desktop and server (maybe more in the future, such as streambox)
- desktop and server both inherit from the base profile which serves as the lowest common denominator.
- Support for multiple architectures (such as x86 and amd64)
- Avoid duplicated definitions via parent profile inheritance.
- All the profiles have an official Gentoo profile as their master
- Profiles include only packages belonging to a base system, not an application stack (those will be managed via puppet recipes)
- Profiles can be used to unmask packages required but not belonging to the base system
- Profiles set all the default values for the client's make.conf, such as USE flags, BINHOSTS, GENTOO_MIRRORS, CFLAGS, CHOST etc.
- Warning: many such variables are not incremental and therefore need duplication of Gentoo base profile variables (requiring that someone tracks changes in those variables) --Tiziano (talk) 14:29, 3 January 2014 (CET)
- keep the profiles (and the inheritance structure) as simple as possible, rather duplicate than inherit for small deviations to avoid inheritance issues --Tiziano (talk) 14:33, 3 January 2014 (CET)
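The warning about non-incremental variables can be made concrete with a small model. This is a simplified sketch, not Portage's actual resolution code: it assumes USE stacks across profiles while CFLAGS is replaced outright, and the two profile dictionaries are hypothetical.

```python
# Assumption for this sketch: USE stacks across profiles, CFLAGS does not.
INCREMENTAL = {"USE"}


def resolve(profile_chain):
    """Fold a list of profile settings (parent first) into final values."""
    final = {}
    for settings in profile_chain:
        for name, value in settings.items():
            if name in INCREMENTAL and name in final:
                final[name] = final[name] + " " + value  # appended to the parent value
            else:
                final[name] = value                       # parent value is lost
    return final


# Hypothetical parent and child profile settings.
gentoo_base = {"USE": "ipv6 ssl", "CFLAGS": "-O2 -pipe"}
own_server = {"USE": "-X ldap", "CFLAGS": "-march=native"}

print(resolve([gentoo_base, own_server]))
```

In this model the child keeps the parent's USE flags but silently drops `-O2 -pipe`, which is why a child profile would have to repeat, and keep tracking, the parent's value for such variables.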
Package host requirements
- Serving files via HTTPS
- Binary packages for all the clients (PORTAGE_BINHOST), which were built by the build host
- Binary packages will be accessible via an HTTP URL such as https://packages.example.com/ENVIRONMENT/gentoo/ARCH/BUILD-PROFILE-NAME.
- Clients will have PORTAGE_BINHOST="https://packages.example.com/ENVIRONMENT/gentoo/ARCH/SYSTEM-PROFILE-NAME https://packages.example.com/ENVIRONMENT/gentoo/ARCH/APP-STACK-PROFILE-NAME ..." set in their /etc/portage/make.conf.
- Support for all three environments (development, staging and production)
- Possibility to authenticate clients either via HTTP basic auth or client certificates.
- Old or no longer supported files will be removed automatically
- Can be implemented on the build host
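The URL scheme described in this section can be generated mechanically, for instance when templating a client's make.conf. The helper names below are hypothetical; only the URL layout and the example host name come from the text above.

```python
BASE_URL = "https://packages.example.com"  # example host name used in this article


def binhost_url(environment, arch, build_profile):
    """Build the package URL for one environment/arch/build profile combination."""
    return f"{BASE_URL}/{environment}/gentoo/{arch}/{build_profile}"


def portage_binhost(environment, arch, system_profile, app_profiles=()):
    """Return the space-separated PORTAGE_BINHOST value for a client."""
    profiles = [system_profile, *app_profiles]
    return " ".join(binhost_url(environment, arch, p) for p in profiles)


# Hypothetical client: production, amd64, server profile plus one app stack.
value = portage_binhost("production", "amd64", "server", ["php5-app"])
print(f'PORTAGE_BINHOST="{value}"')
```

The system profile is listed first and the application stack profiles follow, matching the order shown in the example above.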
File mirror host requirements
- Hosts all the files required to build a package (GENTOO_MIRRORS=mirror.example.com/public/gentoo/distfiles)
- Acts as a caching mirror for already downloaded packages from an official mirror
- Serves fetch-restricted files (dev-java/oracle-jdk-bin for example) to authorized clients
- Files are served via HTTPS
- Distinguishes between three groups of files
- public: Files which are available to all clients (theoretically even to the entire internet)
- site-local: Files which are only available to authenticated clients belonging to the same infrastructure (for example those which would put us into legal trouble if available to the public)
- stack-local: Files which are only available to authenticated clients belonging to the same infrastructure and the software stack group (private files of a specific customer)
- Provides an easy way to let an administrator manually upload new files, for example via WebDAV-CGI, SFTP or a similar mechanism.
- Possibility to authenticate clients either via HTTP basic auth or client certificates.
- Old or no longer supported files will be removed automatically
- Can be implemented on the build host
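The three file groups translate into a simple access decision. The function below is a sketch of that rule only; how a client is authenticated and how its stack membership is determined are assumptions left to the caller, and the parameter names are hypothetical.

```python
def may_fetch(group, client_authenticated=False, client_stack=None, file_stack=None):
    """Decide whether a client may download a file from the given group."""
    if group == "public":
        return True
    if group == "site-local":
        return client_authenticated
    if group == "stack-local":
        # Needs authentication and membership in the stack that owns the file.
        return (
            client_authenticated
            and client_stack is not None
            and client_stack == file_stack
        )
    raise ValueError(f"unknown file group: {group}")


print(may_fetch("public"))
print(may_fetch("stack-local", True, client_stack="customer-a", file_stack="customer-b"))
```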
Puppet requirements
- moved to stoney_orchestra:_Requirements, included below for reference.
- Support for all three environments (development, staging and production)
- Version controlled via Git
- ENC and hiera support with data from ldap
- Puppet recipes for
- installing, updating, removing and (re-)configuring specific software belonging to an application stack (see build host).
- (re-)configuring software belonging to a system stack
- Updating the system stack (emerge @system) aka system update.
- installing, updating and removing of kernel packages (including the handling of the ensuing reboot)
- use best-of-breed tools like hiera and augeas (this might mean targeting 3.3.x due to module data support in ARM-9)
- Use a sane preexisting puppet architecture concept
Install host requirements
- Ability to install physical and virtual machines
- Distinguish machines by their Ethernet MAC address
- Provide a PXE/TFTP boot mechanism
- Partition and format the (virtual) harddisks
- Install a stage3 image which was built by the build host
- Bootstrap puppet, enabling it to take over the individual installation and customization.
- Group hosts into
- environments (development, staging and production)
- architectures (such as x86, amd64 etc.)
- portage profiles (system profiles such as desktop and server)
- stacks (comprising a complete product as a service with the underlying infrastructure): this is the task of Puppet --Chaf (talk) 09:42, 19 December 2013 (CET)
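Distinguishing machines by MAC address and grouping them by environment, architecture and portage profile amounts to a keyed lookup. The sketch below assumes a hypothetical in-memory registry; a real install host would read this from its own data store.

```python
from typing import NamedTuple


class HostGroup(NamedTuple):
    environment: str
    arch: str
    profile: str


def normalise_mac(mac):
    """Lower-case a MAC address and use ':' as the separator."""
    return mac.strip().lower().replace("-", ":")


# Hypothetical registry entry, for illustration only.
HOSTS = {
    "52:54:00:12:34:56": HostGroup("development", "amd64", "server"),
}


def lookup(mac):
    """Return the group for a booting machine, or None if it is unknown."""
    return HOSTS.get(normalise_mac(mac))


print(lookup("52-54-00-12-34-56"))
```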
Public key infrastructure requirements
- Local certificate authority for signing X.509 certificates.
- Master certificate authority root certificate which is only used to sign Sub-CA certificates
- Sub certificate authorities used for various cases such as
- Puppet certificates [2]
- User certificates
- Client certificates
- Host certificates
- Ability to sign, revoke and extend certificates
- Publish certificate revocation status either via CRL and/or OCSP
- CRL is not worth the hassle because it does not define how often the CRL must be consulted. Since we are in the same physical net, OCSP should be far superior here (thanks to its live checking support). On the other hand, puppet does not do OCSP yet (redmine: #110111), so we might need to implement both, or implement OCSP as well as develop our own automated revocation for puppet.
- Choose DNs below dc=rabe,dc=ch
- register a PEN-OID as issued by IANA if custom schema work is required
- Use a @rabe email when requesting a PEN at IANA, last time the @purplehaze.ch was a problem!
- Some of the aforementioned sub-CAs might be implemented as robot CAs with a self service interface (ie for authorized users).
- Consider using CMP or CMC as an API for signing, revoking etc.
- Since the underlying RFCs of both these protocols are rather new they are not yet broadly supported.
- Keep local root CA offline!
- Maybe use an old netbook as root CA :P
- Support GPG keys for signing packages
Git hosting requirements
- Public repositories hosted on GitHub (mainly) under the radiorabe organization (almost anything which doesn't leak sensitive information)
- Private repositories hosted on the internal infrastructure
- Accessible via https and a web interface
- contains some repos with uber-private data that gets compartmentalized even further (ie. hiera datafiles in different repos)
- One repository per component
- Daily backup of all repositories
- Branches for development, staging and production
- New features are added to the development branch only and later merged up to staging and production
- Must support pull-requests so we can implement a review process (when pulling through the envs)
- Sign-offs might also be required
- Adhere to Semantic Versioning for version/release tags.
- Tag releases as vX.Y.Z; those will automatically appear on GitHub as downloadable tarballs, which can be referenced within the corresponding ebuilds.
- Hit 1.0.0 as soon as code lands on production or earlier
- Commit .lock files when reaching 1.0.0 where applicable (Gemfile.lock, composer.lock) or earlier if needed
- Must be able to trigger remote events (ie. update master through mcollective after code was promoted to production in a PR)
- Support the git-flow branching model
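The vX.Y.Z tagging rule lends itself to an automated check, for example in a hook or CI job. The following is a minimal sketch under assumptions: `parse_tag` is a hypothetical helper, and pre-release or build metadata suffixes from the Semantic Versioning specification are deliberately left out.

```python
import re

# vX.Y.Z with numeric components and no leading zeros; pre-release and
# build metadata suffixes are not handled in this sketch.
TAG_RE = re.compile(r"^v(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)$")


def parse_tag(tag):
    """Return (major, minor, patch) for a vX.Y.Z tag, or None if it does not match."""
    match = TAG_RE.match(tag)
    return tuple(int(part) for part in match.groups()) if match else None


print(parse_tag("v1.2.3"))
print(parse_tag("1.2.3"))
```

Because the result is a tuple of integers, tags can be compared directly, for example to test whether a repository has reached 1.0.0.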
Messaging requirements
- I'm talking AMQP, JMS, STOMP, 0MQ and the like
- not sure if we need something in this space for the infra
- it could facilitate comms between components
- stuff like mcollective and RadioDNS need something in this space
Monitoring, logging and alarming system requirements
@TODO
- centralized logging is used throughout
- with tools that help find and fix problems and do post mortems
- all systems are always monitored by a full monitoring suite
- the monitoring suite must support alarming users through multiple paths
- alarming should include a fallback strategy and a way to acknowledge alarms
- it must have an easy way to configure scheduled maintenance either before or while the maintenance is under way
- monitoring, logging and alarming are all automatically configured during regular provisioning of machines
- alerting uses jabber by default with fallbacks to email and sms-through-gsm depending on the site.
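The fallback strategy (jabber first, then email, then SMS) can be expressed as an ordered list of channels that are tried in turn. This is a sketch only: the `alert` function and the stub senders are hypothetical placeholders, and acknowledgement handling is not covered.

```python
def alert(message, channels):
    """Try each (name, send) channel in order; return the name that succeeded.

    `send` is any callable that returns True on delivery and returns False
    or raises on failure.
    """
    for name, send in channels:
        try:
            if send(message):
                return name
        except Exception:
            continue  # fall through to the next channel
    return None


# Stub senders standing in for real jabber, email and SMS integrations.
def jabber(message):
    raise ConnectionError("jabber server unreachable")


def email(message):
    return True


def sms(message):
    return True


print(alert("disk full on web-01", [("jabber", jabber), ("email", email), ("sms", sms)]))
```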
Implementation proposal
Build farm proposal
The build farm consists of a system of multiple vms to build binary packages for multiple environments, architectures and build profiles.
- Git webhook on internal gitlab install pushes changes to jenkins master.
- Jenkins master dishes out jobs to jenkins slave machines for needed architecture and build profile.
- Jenkins slaves only get used once and wipe/reprovision themselves after master has stored build artefacts.
- We have build-slave templates available for each architecture/build profile combo.
- Upon use those get provisioned to the needed environment using puppet.
- All of this is set up using puppet and fully automated, even building of new build-slave templates and the whole releng on those.
- The build farm also keeps old templates and stable boxes on hold so it can use them to build differentials.
- Artefacts slaves will be producing:
- "vagrant"-style boot boxes
- full binpkg repos for a given env/arch/build profile combo
- stage3 balls for each arch/build profile
- stage4 balls for each environment
- build logs
- /var/db/pkg
- puppet report data
- test results and code analysis results
- When we come to continuous deployment the jenkins master will also be able to trigger puppet when merges to master happen.
- This rolls out releases to the sub-system that was signed off by a merge to a master branch (see branching strategy in git proposal).
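Since the farm builds for every environment, architecture and build profile combination, the job list is a Cartesian product of the three. The sketch below uses hypothetical value lists; the real ones would come from configuration.

```python
from itertools import product

# Hypothetical values, for illustration only.
ENVIRONMENTS = ["development", "staging", "production"]
ARCHES = ["x86", "amd64"]
BUILD_PROFILES = ["server", "desktop"]


def build_matrix(environments, arches, profiles):
    """Return one job description per environment/arch/build profile combination."""
    return [
        {"environment": env, "arch": arch, "profile": profile}
        for env, arch, profile in product(environments, arches, profiles)
    ]


jobs = build_matrix(ENVIRONMENTS, ARCHES, BUILD_PROFILES)
print(len(jobs))
```

The job count grows multiplicatively with each added architecture or build profile, which is worth keeping in mind when sizing the build slaves.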
Links
build orchestration
package building
- chromite build utility from chromium os (source repo)
- as far as I recall chromium os does highly parallel building, making their build really fast with a slight trade-off in long-term stability (ie. the build might fail due to dependencies being built out of order)
- the chromium os developer guide might also be of interest, among other things it shows that google do split the build into a package building part and an image creation part.
- entropy is sabayon's portage replacement, it focuses on binaries due to sabayon being a binary distribution
- their build system "Matter" might be of interest, it seems to automate large parts of tracking gentoo portage with its tinderbox subsystem
- sabayon has kernel-switcher for updating kernels
- kernel ebuilds live here and probably rely on the sabayon-kernel eclass.
"stage4"/box/iso building
- packer.io can be used to build stage4 (containing a kernel) images and seems to work for gentoo. Packer often gets used to build Vagrant boxes.
- gentoo script from packer-warehouse used with packer to create a minimal gentoo vagrant box
- currently packer and packer-warehouse do not seem capable of building gentoo machines out of the box, I tested this with osx/virtualbox using gentoo stage3 and portage snapshots Lucas (talk) 11:19, 11 January 2014 (CET)
- veewee vagrant box builder (builds stage4 images in a manner similar to packer)
- has support for a massive amount of guest os types
- installs puppet/chef using gem due to the oldish versions in gentoo (and probably elsewhere)
- supports kvm and others as host os
- while testing with osx/virtualbox I was able to build and export a vagrant box from gentoo stage3 and portage snapshots without any hiccups Lucas (talk) 11:19, 11 January 2014 (CET)
- is in dire need of DRY: [3] to make it worth forking
- mkstage4
- aimed at creating backup stage4 tarballs of gentoo systems
- written in bash
- pretty simple, might come in handy as automation tool
kernel
- at the moment we build tarballs for the kernel+initramfs and the modules using genkernel and have a separate ebuild which installs them
- ideally we would like to have an ebuild which takes the kernel sources (like the ebuild for sys-kernel/gentoo-sources does), builds them according to some default configuration or a user configuration if available (savedconfig.eclass) and then installs the kernel and the modules as well as some minimal headers+configuration to build other packages requiring the sources to be present
- TODO: check whether dracut has some advantages regarding module loading over genkernel-generated initramfs
Portage tree clone proposal
Portage overlay proposal
Portage profile proposal
Package and file mirror proposal
Puppet proposal
- Adhere to Craig Dunn's architecture [4]
- on the system level (ie for each bare-metal or virtual machine)
- roles contain the business view (ie. role::puppet::master)
- profiles contain the implementation (such as profile::puppet::master)
- on the architecture level (ie. in the cloud-fabric)
- roles contain the business view (ie. role::cloud-storage, role::product1)
- profiles contain the implementation (ie profile::storage-cluster, profile::storage-webinterface-farm)
- Keep profiles, roles (as per craig) and Puppetfile in github.com/radiorabe/puppet
- This is where we keep feature/*, develop and master (ie staging) branches
- An internal clone then contains all these + production (what exactly is in production, ie. our release schedule, is considered sensitive in this implementation)
- This lets us use the git-flow branching model with almost no changes (the one change being us gating stuff into production on the closed clone)
- github may use hooks to push content to our internal git when they happen
- All other modules need their own repo and must be published to the puppet module forge
- Use librarian-puppet (or r10k) for composing the final puppet envs
- r10k eschews git submodule support we used in puppet-syslogng but has support for multiple envs out of the box
- librarian-puppet would need to be run once per environment to achieve what r10k does
- provide develop, master and production branches from private repo as puppet environments on master
Install host proposal
- use the existing server on tftp-01 on the RaBe infra as a shortcut
- replace that instance with one native to the infra when it is ready for that
- iPXE [5]
Links
- Tools that run puppet on freshly installed machines (and also do some provisioning)
- puppetlabs razor bare metal/cloud provisioning tool
- vagrant cloud provisioning aimed at provisioning developer boxes (with virtualbox). Has 3rd party support for various cloud systems. Vagrant might be interesting for creating dev clouds. I've seen this being used on production sites.
Public key infrastructure proposal
- write certificate policy (in german!)
- hold a key ceremony for the root and level 1
- offline ceremony on an old netbook with centos or similar (not debian, probably not gentoo to make this happen soonish)
- Sign RaBe root cert and level 1 intermediate cert
- store root cert key on 2 sdcards and as 1 printout somewhere safely
- store level 1 intermediate key on sdcards for use by admins
- use level 1 intermediate key to sign level 2 cas as needed
- level 2 robot ca key for puppet (managed by puppet ca)
- level 2 ca for client certs
- level 2 ca for host certs
- more level 2 certs
- use OpenSSL as default software for PKI
- OpenSSL has the largest userbase which should make it easier on new admins
- features that openssl does not implement get used as soon as openssl catches up (ie. CMP)
git hosting proposal
- adhere to git-flow for all the things. Automate said usage as far as possible.
git-flow branching
Branch | Environment | Merge from | Description |
---|---|---|---|
master | production | release/ or hotfix/ | Released code with a git tag for each merge. |
release/v0.0.0 | staging | develop | Contains final releasing work like updating versioning and changelog. This is where we keep semver concerns in check if they were not taken care of already. |
hotfix/v0.0.0 | staging | master | Only for critically urgent fixes. In most cases doing a release from develop is preferred. |
develop | development | feature/ or master | Only feature branches that are ready for production should get merged here. master gets merged here after each merge to it. Merging is done with pull requests and review. |
feature/featurename | development | develop | New features get implemented here until they are considered ready for production and merged to develop. |
support/v0.0.0 | LTS | | Marked experimental in most implementations and unused for now. |
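The branch-to-environment mapping in the table above is easy to automate, for instance to decide where a CI job should deploy. The sketch below is one possible reading of the table; the function name is hypothetical, and branches not listed in the table map to None.

```python
# Ordered (prefix, environment) pairs mirroring the table above.
BRANCH_ENVIRONMENTS = [
    ("master", "production"),
    ("release/", "staging"),
    ("hotfix/", "staging"),
    ("develop", "development"),
    ("feature/", "development"),
    ("support/", "LTS"),
]


def environment_for(branch):
    """Return the environment a branch belongs to, or None if it is not in the table."""
    for prefix, environment in BRANCH_ENVIRONMENTS:
        if prefix.endswith("/"):
            if branch.startswith(prefix):
                return environment
        elif branch == prefix:
            return environment
    return None


print(environment_for("release/v1.2.0"))
```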
- Install gitlab on a vm and integrate external mirrors from github and ldap users from stoney-ldap.
- keep repo of public mirrors in hieradata so we can configure them from puppet.
- each organisation in stoney-ldap automatically gets a private project in gitlab.
- Configure web hook infrastructure and integrate with continuous integration system.
- Make continuous integration show feedback back in gitlab.
- check for git annotate support or use img badges.
On organization projects in gitlab
- Each project comes with default repos.
Repo | Description |
---|---|
puppet | Set up using a template, contains a Puppetfile and Puppetfile.lock and a hieradata directory. |
role | Read only copy of global role module for reference. |
profile | Read only copy of global profile module for reference. |
- Everything in the latter two modules is configurable through hieradata in the first repo.
- The default setup automatically updates role and profile when they get new merges.
- A software agent (ci) regularly clones develop, does a full build and pushes the results back to feature/tinderbox
- This agent automatically creates pull requests if tinderbox builds did not fail.
- Org leaders may then merge these PRs and bake them into a local release.
- Some kind of UI helps them do this without much technical knowledge.
- More repos may be added by the customer.
- project organizations are private, per customer.
Links
- gitlab seems nice even though it is ruby on rails under the hood
- gitlab-mirrors is a companion app to gitlab for adding readonly mirror repos to gitlab. We might consider hacking it to not use git remote prune.
- git-flow with jenkins and gitlab
- gitlab hook for jenkins