We faced an interesting issue while getting one of our client's applications ready for production. We needed a different Google Analytics tracking code for the production servers, so that the marketing dashboard would not be polluted with staging and development data.
Normally this is a very straightforward task: use an environment variable to represent the tracking code and set the corresponding tracking code for the environment. So theoretically, something like this should work:
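A minimal sketch of that theoretical setup, in shell (the `GA_TRACKING_ID` variable name and values here are hypothetical, not our client's actual configuration): each server exports its own tracking code, and the application simply reads whatever value is present in its environment.

```shell
# Hypothetical: each environment exports its own tracking code.
export GA_TRACKING_ID="UA-1234567-2"   # e.g. the staging value; production would export a different one

# Stand-in for the application reading its configuration from the environment:
echo "Using tracking code: ${GA_TRACKING_ID}"
```

If the application truly read the variable at run time, each deployment would report to its own GA account, and that would be the end of the story.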
Things aren't as rosy in real life though. Checking the GA dashboard showed that events only appeared in the production account; there were none in the staging account. Digging into it more, it looked like the tracking code was always the production one, even when the app was served from the staging server. We initially thought that the environment variables must have been set incorrectly, but when we SSH'd into the production and staging servers we confirmed that the environment variables were properly set up.
After discussing the issue with the deployment infrastructure team, we found that the culprit came from an unlikely (at least at that time) source: the docker build image.
Twelve-Factor and Docker
The Twelve-Factor App is a set of guidelines for web apps that recommends best practices for the development, management, and deployment of software-as-a-service applications. The guideline most relevant to the problem at hand is the one on how configuration should be handled:
Store config in the environment
An app’s config is everything that is likely to vary between deploys (staging, production, developer environments, etc). This includes:
- Resource handles to the database, Memcached, and other backing services
- Credentials to external services such as Amazon S3 or Twitter
- Per-deploy values such as the canonical hostname for the deploy
Apps sometimes store config as constants in the code. This is a violation of twelve-factor, which requires strict separation of config from code. Config varies substantially across deploys, code does not.
Our client's infrastructure manages deployments by building the docker image once, then using that exact same image across different environments, configured via environment variables. Rebuilding the application docker image for every environment can lead to subtle differences between environments (e.g. a library dependency updated upstream mid-deployment, leaving one environment's image built against an older version than another's), not to mention the resources wasted on rebuilding, since there can be more environments than the usual production and staging: per-developer or per-feature environments, for example. Building once and reusing everywhere ensures that there is no variance in the application code between deployments.
One thing to note about environment variables is that an application can only “see” the variables present in the environment it is running in. This works well for Rails or PHP applications, since the code runs on the server and can manipulate the client payload as needed. A precompiled frontend bundle is different: its environment variables are read once, at build time, and baked into the generated assets.
Here's an analogy: suppose the task is to generate an image with a watermark, and the watermark text is the environment name (i.e. staging or production). When the docker build runs, that watermarked image has the text "production" baked into it; no matter whether the application runs on staging or any other environment, that watermarked image still says "production." The "dynamic watermark text depending on the environment" works much like environment variable substitution: the contents depend on the environment the image or text was generated in, not the environment it is served from.
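The analogy can be reproduced in a few lines of shell. A value captured at "build" time is frozen into the artifact, and changing the environment afterwards does nothing to it (the `APP_ENV` variable and `bundle.txt` file are made up for illustration):

```shell
# "Build" step, run once on the build machine with production settings:
export APP_ENV=production
echo "env baked into bundle: ${APP_ENV}" > bundle.txt

# "Run" step, on the staging server: the environment differs,
# but the already-built artifact does not notice.
export APP_ENV=staging
cat bundle.txt   # still prints: env baked into bundle: production
```

The built artifact keeps saying "production" no matter where it is served, exactly like the watermark.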
This is exactly what led to the tracking code problem we were facing.
Now that we had a good idea regarding the mystery of the enduring tracking code, we considered various possible solutions and workarounds.
- Ask our client's infrastructure team to build docker images per environment, instead of building once and reusing the same image across multiple environments.
The team's limited resources would be better spent improving other features rather than on a request that benefits only a very small subset of applications. Moreover, there are numerous environments per application, and building an image for each of them when only the environment variables change would waste server resources. After all, a twelve-factor application's code does not vary substantially across deployments, but its configuration often does.
- Hardcode both tracking codes and check the domain to know which one to use in GA.
However, hardcoding these types of configurable values makes things difficult to manage down the line. Configuration can change per environment, and there is no telling when an explosion of environments might happen. While GA tracking codes can be assumed to have only two configurations (staging and production), that won't be the case for other similar identifiers. We wanted to solve the issue properly rather than take on technical debt and figure it out later; this solution felt more like a “hack” and left a bad taste in our mouths.
- Use a placeholder for the value during static asset generation and replace it with the correct value at application startup.
Docker images often use a script file as the CMD in order to run preprocessing steps before calling the intended executable. One such step could be a text search-and-replace for the tracking code. Since the CMD script runs in the server environment, it can “see” the correct environment variables and substitute them for the placeholder.
The downside is that the intricacies of tools like `envsubst` or `sed` may be unfamiliar to developers, and there is the danger of a mistake replacing not only the placeholder but something else as well, leading to runtime errors that can be difficult to debug. Alternatively, libraries like `envsub` may be more approachable for application developers unused to the classic Unix tools, and have easier-to-understand behavior.
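The `sed` variant of this approach could look something like the following sketch. The `__GA_TRACKING_ID__` placeholder, the `dist/` path, and the values are assumptions for illustration, not our client's actual setup:

```shell
# Build time: bake a placeholder, not a real value, into the asset.
mkdir -p dist
echo 'ga("create", "__GA_TRACKING_ID__", "auto");' > dist/app.js

# Startup time (e.g. inside the CMD script): replace the placeholder
# with the value from the *runtime* environment.
GA_TRACKING_ID="UA-1234567-2"
sed "s/__GA_TRACKING_ID__/${GA_TRACKING_ID}/g" dist/app.js > dist/app.js.tmp \
  && mv dist/app.js.tmp dist/app.js

cat dist/app.js   # ga("create", "UA-1234567-2", "auto");
```

Note the footgun mentioned above: a too-loose `sed` pattern would happily rewrite any other text that happens to match, which is why a distinctive placeholder matters.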
- Rebuild the frontend application as part of the container startup.
As mentioned, we can run preprocessing steps via the Docker CMD, and one such step can be rebuilding just the frontend application during startup. This time, since the build process runs in the environment it is intended to be served from, it is able to “see” and bake in the variables intended for that environment.
There is a big downside to this, however, even more so since the infrastructure runs on Kubernetes. As Kubernetes moves and reschedules pods onto better-suited hosts, a build step introduces a delay into application startup whenever a pod restarts, because the application needs to be recompiled before it can serve traffic. During this delay, the application (with only this particular change) will fail its healthchecks: the container has started, but the application is technically not yet “running.”
While the readiness probes can be adjusted for the increased startup time, that's one more thing to take care of; not to mention that as the application grows, the startup time might increase (and the configuration would need adjusting once more). This might not be such a big deal though, since we can set the environment variables to the production values during the docker build, and set up a flag to rebuild the application only on environments that need different values (e.g. staging or developer-specific environments).
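The flag idea could be sketched as an entrypoint script like the one below. The `REBUILD_ON_BOOT` flag name and the script contents are assumptions for illustration, not the actual deployment script:

```shell
# Write a hypothetical entrypoint script for the image's CMD.
cat > entrypoint.sh <<'EOF'
#!/bin/sh
set -e
# Rebuild only in environments where the baked-in (production) values
# are wrong, e.g. staging or per-developer environments.
if [ "${REBUILD_ON_BOOT:-false}" = "true" ]; then
  yarn build
fi
exec "$@"
EOF
chmod +x entrypoint.sh
```

Production pods start immediately with the image as built, while staging pays the rebuild cost only because its flag opts in.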
What We Did
Considering all of that, we ended up re-running the build step (i.e. `yarn build`) for the frontend application during the docker startup process, via a shell script as the entrypoint. Although text replacement via `envsubst` or `sed` would be many times faster than rebuilding the application, we wanted to get something working in time for a planned production deployment in the coming days.
The project is still quite small and the compilation process is relatively quick; however, as the project grows, the delay caused by recompiling the application might grow with it. We plan on revisiting this issue in the near future.