Monolith to Micro Frontends

Michael Kramer
Course Hero Engineering
Jan 19, 2022 · 10 min read


It’s a Monday, and you’re in a product planning meeting detailing the scope and timeline of a new feature for your web app. You’re getting ready to tell the product manager your estimated dev days, which will unfortunately be much longer than they planned, leading to a reduction in features and a much more stripped-down release.

It isn’t your fault, of course. The codebase is over a decade old; the backend uses nice shiny micro-services in a lot of places, but the frontend is a mish-mash of frameworks and JavaScript libraries that you still have to integrate and play nicely with. For some teams, and even for some pages, you might get away with a fully client-side app, but you, unfortunately, have to play nicely with those pesky search engines. You’ve proposed an upgrade to a more modern stack, and maybe you’ve even started thinking through how you could do it, but the idea of porting more than a decade’s worth of code to React makes your stomach churn and gives you nightmares.

You may even have come up with a plan to do it piecemeal, but it involves having multiple versions of common dependencies, and who wants to have multiple headers, footers, sidebars, and more across multiple codebases? That’s just a mess.

I’ve found myself in this exact scenario plenty of times over the years. At Course Hero, we reached a practical limit on the performance of some of our most important pages. The framework we were using was as optimized as it could be (at least without diminishing returns), and the queries and other operations we were doing were likewise tuned and optimized as much as possible. The only solution was to start doing work in parallel, which the language (PHP) did not support conveniently.

We had also recently upgraded the frontend client-side-only stack to use React, which was wonderful, but every time we needed a page to work for SEO, we’d end up having to write two versions: a static version in PHP/Twig for the server-side response, and a client-side version in React for the rich interactions we wanted to achieve.

We decided to create a solution that would address both of these problems. We wanted to be able to bypass our “legacy” framework, use React for server-side rendering (so we wouldn’t have to write two versions of the page), and not have to maintain common components like the Header and Footer in multiple codebases.

We call our solution Server-Side Includes (SSI).

Server Side Includes (SSI) / Micro Frontends

In 2018, we at Course Hero came up with SSI as a way to power any individual section of any page with any service. We could now build pages with a header and footer from our legacy app, but the body from a React-powered NodeJS app. SSI pages may have only a few small pieces powered by new services, preserving our legacy code in place, or they may have every page component powered by a completely different service.

Ultimately, SSI is a distributed frontend micro-service architecture, consisting of a Proxy, Presentation & Fragment layer, and a Backend Data layer.

Requests are enriched and proxied to different Presentation services which return “skeleton HTML” which may contain Fragment tags. The Proxy then resolves the Fragments concurrently, replacing the tags with rendered HTML.

You may be familiar with Edge-Side Includes (ESI). ESI tags allow a CDN or proxy like Varnish to cache the static parts of the page, and do requests to load in the dynamic pieces. For example, we might have something like:

<h1>This title is static and cached</h1>
<p>
  <esi:include src="https://yoursite.com/user-details/" />
</p>

This instructs the Edge/Proxy to visit the given URL and replace the ESI tag with the content it finds there.

SSI works in a very similar way, but instead of at the Edge, it lives at the Origin. SSI also handles the client-side JavaScript and CSS from each individual fragment.

Overview

[Diagram: general request flow]

The above diagram shows the general flow of a request to Course Hero, though some pieces like the Edge/CDN and load-balancers are removed for brevity.

Every request that comes in first hits a virtual service, identified as coursehero-vs above. The virtual service allows us to redirect the request anywhere based on arbitrary rules, such as pattern matching the URI with regex, or matching on header values. This service is in place temporarily while we slowly roll pages out to SSI: we can enable or disable any single URL (or any URL pattern) to hit our SSI proxy instead of the intended legacy app.

The proxy then makes a request to a presentation service, which returns the skeleton HTML. To accomplish this, we use another virtual service, identified as ssi-routes. It has the exact same pattern/match rules as the coursehero-vs virtual service, but directs requests to the presentation service that serves that URL pattern.

Once the presentation service returns the skeleton HTML to the proxy, the proxy parses it for server-side include tags. It then makes requests to each URL in parallel, replacing the tags with the content it finds. (The SSI tag syntax is detailed later; it includes the ability to provide content to multiple areas of the page from a single request, as well as to define buckets where content can be placed.)

After the proxy is done getting all the HTML from each fragment, it stitches all the content together and returns that to the user as a single page.
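Conceptually, the resolve-and-stitch step amounts to something like the sketch below. This is a simplified illustration in JavaScript (our actual proxy is written in Go, as described later), and the tag shape and helper names here are assumptions for the sake of the example:

```javascript
// Sketch of the proxy's fragment resolution: find include tags in the
// skeleton, fetch every fragment URL concurrently, splice the results in.
// Tag regex and function names are illustrative, not the real implementation.
const TAG_RE = /<ch-block\s+name="([^"]+)"\s+url="([^"]+)"\s*\/>/g;

async function resolveFragments(skeletonHtml, fetchFragment) {
  const tags = [...skeletonHtml.matchAll(TAG_RE)];
  // All fragment requests are issued in parallel.
  const bodies = await Promise.all(tags.map((m) => fetchFragment(m[2])));
  let html = skeletonHtml;
  tags.forEach((m, i) => {
    // Replace each tag with the HTML its fragment request returned.
    html = html.replace(m[0], bodies[i]);
  });
  return html;
}
```

Because the fragment requests run concurrently, the total time for this step is bounded by the slowest fragment, not the sum of all of them.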

Virtual Services

Our solution uses two virtual services, powered by Istio, a service mesh for Kubernetes. We’ll cover how we work with virtual services in more depth in a future blog post; in summary, we created a Custom Resource Definition (CRD) in Kubernetes to define an SSIRoute.

apiVersion: coursehero.com/v1alpha1
kind: SSIRoute
metadata:
  name: ssi-example-service
spec:
  source:
  - regex: '\/view\/([^\/]+?)\/(?:\/)?$'
    target: ssi-example-service.ssi-example-service.svc.cluster.local
    weight: 100

This contains the URL pattern (or prefix) to match against and the target service that serves the skeleton HTML. It also supports weighting, so we can slowly send a percentage of traffic to the new SSI version of a page while other users continue to hit the legacy page. An operator reads these resources and dynamically updates both the coursehero-vs and ssi-routes virtual services to direct initial requests to our proxy, and to direct the proxy to the proper presentation service.
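As a rough illustration of what the weighting decision amounts to (this is not the actual operator code; the function and field names are ours, and `roll` stands in for a random integer in [0, 100)):

```javascript
// Hypothetical sketch of weighted routing for an SSIRoute: only `weight`%
// of requests matching the pattern go to the SSI service; everything else
// continues to hit the legacy target.
function pickTarget(path, route, legacyTarget, roll) {
  const matchesPattern = new RegExp(route.regex).test(path);
  return matchesPattern && roll < route.weight ? route.target : legacyTarget;
}
```

In practice Istio evaluates the weights itself; the sketch just shows the effective behavior a given SSIRoute declares.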

Proxy — aka Kraken

The proxy handles the initial request to get the structure of the page (skeleton HTML), resolves the page fragments the skeleton returns by issuing GET requests to the fragment URLs in parallel, and then stitches the page together.

Our proxy implementation is called Kraken. Kraken is written in Go, and contains a series of middleware that accomplishes the work of authentication, page parsing/stitching, and more.

When a request comes into Kraken, it makes the initial “skeleton” request to the ssi-routes virtual service, which directs it to the proper Presentation service for that URL. If it hits the legacy app, nothing happens, and the HTML the legacy app returns is passed to the user unchanged. If the response contains the header Ch-Proxy-Process (which our SSI services return), Kraken parses the HTML to find the SSI tags.

Kraken spins up several goroutines to handle the work. The skeleton HTML is parsed, and as SSI tags are found, they are pushed to different channels depending on what they accomplish. The basic tag is our <ch-block /> tag: if it contains a URL, Kraken makes a request to that URL and replaces the tag with whatever content it gets back. These are the Fragment requests, and they are all made concurrently with one another. Because a Fragment can also return content for other blocks, any SSI tags found in its response are pushed to their respective channels so that work can proceed in parallel. (While Fragments do support nested <ch-block /> tags, we consider that bad practice, since any blocks a Fragment returns can’t be worked on until that request is complete.)

Fortunately, this parsing and processing step takes ~1ms, so the overall page response time ends up being whatever the slowest Fragment is.

Kraken also handles authentication and session management. Since we are calling multiple services in parallel, we want to avoid having each of those services look up the user by their Session, and instead have the basic user information available from the request. Kraken accomplishes this through JSON Web Tokens (JWTs), which contain information about the user, such as their email, name, subscription level, and any rights they’ve been assigned. Services are passed this JWT and can use it to know who the user is, and what they have rights to, which avoids excessive calls to load in basic user information.
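To make that concrete, here is a minimal sketch of how a downstream service could read user claims from the JWT it is handed. The claim names are illustrative, and a real service must verify the token’s signature; this sketch only shows the payload decoding:

```javascript
// Decode the payload (middle segment) of a JWT without verifying it.
// JWTs are three base64url-encoded segments joined by dots:
// header.payload.signature
function decodeJwtPayload(token) {
  const payloadB64 = token.split('.')[1];
  const json = Buffer.from(payloadB64, 'base64url').toString('utf8');
  return JSON.parse(json);
}
```

Since every service receives the same token from Kraken, none of them needs its own session lookup to learn who the user is.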

There is a lot more to Kraken that we’ll detail in a dedicated blog post.

Herokit

All of this parallel loading is great, but we also need an SSR React framework to support everything on the frontend side. We looked at existing solutions, such as NextJS, but at the time they didn’t quite fit our use case; we plan to explore them again in the near future.

Herokit is our internal framework written on top of Express.js. The goal with this framework is to have as little boilerplate as possible, so that we are mostly just writing straight React. Herokit provides a structure for handling the server-side calls to return either a Presentation or a Fragment, with some minor glue to hook things together.

When writing a service with Herokit, we define a route with one or more URL patterns to match against. We then write the server-side code (such as API calls, A/B test bucketing, etc.), which ultimately returns a React component to be rendered.

The Presentation endpoints return skeleton HTML, which is composed of static HTML and SSI tags to be resolved by Kraken. This is where we would define what fragments to load based on their URL. For example, say we have two services, an ssi-homepage service and an ssi-related-contents service. If we want to render our Homepage, and also have a Related Content bar at the top, our Presentation service might look like:

class HomepagePresentation extends Herokit.Presentation {
  urls = ['/']
  blocks = {
    'homepage': 'ssi-homepage/fragment/homepage',
    'related': 'ssi-related-contents/fragment/related-content-bar'
  }

  deliver(req, res) {
    return <html>
      <Herokit.Head />
      <Herokit.Body>
        <Herokit.Block name="homepage" url={this.blocks['homepage']} />
        <Herokit.Block name="related" url={this.blocks['related']} />
      </Herokit.Body>
    </html>
  }
}

This would result in Kraken receiving HTML that looked like:

<html>
  <head>....</head>
  <body>
    <ch-block name="homepage" url="https://ssi-homepage/fragment/homepage" />
    <ch-block name="related" url="https://ssi-related-contents/fragment/related-content-bar" />
  </body>
</html>

Herokit also handles de-duplication of client-side assets, so that if multiple fragments rendered on the same page use the same libraries, we don’t include a library multiple times.
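The de-duplication itself is conceptually simple. Assuming (hypothetically) that each fragment reports the list of script URLs it needs, the page-level include list is just the first-seen order with repeats dropped:

```javascript
// Merge the asset lists of several fragments, keeping each asset once,
// in first-seen order. The `assets` field is an assumed shape, not
// Herokit's actual internal structure.
function dedupeAssets(fragments) {
  const seen = new Set();
  const ordered = [];
  for (const frag of fragments) {
    for (const asset of frag.assets) {
      if (!seen.has(asset)) {
        seen.add(asset);
        ordered.push(asset);
      }
    }
  }
  return ordered;
}
```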

We house each of our frontend services (as well as common libraries they share, such as Herokit) in a monorepo. For this we initially used Lerna, and recently switched to yarn workspaces with Turborepo.

We need each service to remain independent, but ultimately allow Fragments from multiple services to be rendered together, and create lightweight client bundles without duplicated code. For this, Herokit uses Webpack and some custom code generation.

When we build a service, we generate the client-side “optimized” bundle by creating custom entry points from the Fragments that are being used. Using the above Presentation endpoint as an example, our build pipeline can use code analysis to understand what blocks are being used, the homepage and related blocks. We also know what React component each of those Fragments has as their entry point, and we generate a new file that imports those components.

import SSI_HOMEPAGE from '../../packages/ssi-homepage/src/client/homepage-entry'
import SSI_RELATED_CONTENT from '../../packages/ssi-related-contents/src/client/related-entry'

if (document.getElementById('block_homepage')) {
  SSI_HOMEPAGE(window['_props_homepage'], document.getElementById('block_homepage'))
}
if (document.getElementById('block_related')) {
  SSI_RELATED_CONTENT(window['_props_related'], document.getElementById('block_related'))
}

The client-side bundle is generated using Webpack, so we end up with an optimized, code-split bundle with no duplicate libraries or code. This “automatic” de-duplication is only possible within the monorepo, since we have full access to the raw source of each distinct service. For Fragments powered by our legacy app, we manually shim large libraries so they aren’t included on the monorepo side.
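The code-generation step for those entry files can be sketched as a simple string template. This is an illustrative helper, not our actual build pipeline (which derives the blocks map via code analysis, and uses per-fragment entry names rather than this uppercased-key convention):

```javascript
// Given a blocks map like the Presentation's (block name -> entry path),
// emit an entry file that imports each fragment's client component and
// hydrates it into its block element if that element is on the page.
function generateEntry(blocks) {
  const lines = [];
  for (const [name, entryPath] of Object.entries(blocks)) {
    lines.push(`import ${name.toUpperCase()} from '${entryPath}'`);
  }
  for (const name of Object.keys(blocks)) {
    lines.push(
      `if (document.getElementById('block_${name}')) {`,
      `  ${name.toUpperCase()}(window['_props_${name}'], document.getElementById('block_${name}'))`,
      `}`
    );
  }
  return lines.join('\n');
}
```

Webpack then treats the generated file as an entry point, so code splitting and shared-chunk extraction happen across all the fragments on the page at once.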

GraphQL

The final piece of our puzzle is the backend data layer. For that, we opted to use GraphQL, specifically the Apollo Federated Graph. This allows us to load the data as we need it from within React components, and it fully supports SSR.

Our federated graph is another monorepo, split into specific data services around the domain the data belongs in. Our NodeJS services here do not interact with any database directly; instead, they make REST or gRPC calls to dedicated data services. If we are loading data from our legacy app, it makes REST calls; if it is hitting our micro-services, it makes gRPC calls.

From SSI (both server- and client-side) we just interact with the Graph. This lets us swap out any piece as we make service and tech changes: we can transparently port entities/fields from our legacy app to micro-services without having to update the frontend API code.
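For illustration, the frontend's view of the data layer is just a GraphQL request; which backing service resolves each field is invisible to the caller. The query shape and field names below are hypothetical, not our actual schema:

```javascript
// Build the POST body for a federated graph query. Whether `user` is
// resolved by the legacy app (REST) or a micro-service (gRPC) behind the
// graph, this request stays the same.
function buildGraphRequest(userId) {
  return JSON.stringify({
    query: `
      query UserHeader($id: ID!) {
        user(id: $id) { name subscriptionLevel }
      }
    `,
    variables: { id: userId },
  });
}
```

In our React components we issue queries like this through Apollo's client hooks, which also work during SSR.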

We’ll have a dedicated blog post on how we set up our federated graph.

Wrap-up

It took a lot of pieces coming together for this to work. We’ve been running with this architecture for about two years now, and we’ve ported most of our high-traffic pages to SSI. Pieces of those pages are still powered by our legacy app, but several parts are now powered by NodeJS apps. We have been porting over more and more pieces piecemeal, transparently to the user. This has set us on a path to fully migrating pages out of our legacy app.

We continue to iterate and expand the scope of what we can do with SSI, and we are excited about where we can take this technology in the future.

Also, we’re hiring!
