Trade Me’s Journey Towards a Thinnest Viable Platform (TVP)

November 22, 2024 Team Topologies

Authors: Catherine Matheson (Technical Product Owner @ Trade Me), Amir Mohtasebi (Head of Engineering @ Trade Me)

Review by: Eduardo da Silva (Independent Consultant and Team Topologies Valued Practitioner)

About Trade Me

Trade Me was founded in 1999 and is New Zealand’s largest marketplace and classifieds site. Approximately 700k unique users across the web and mobile apps make Trade Me one of the busiest and most popular online destinations in New Zealand. While most sessions and transactions occur on the mobile-native apps, the website still plays an essential role in the user experience for our customers.

The product-engineering team comprises three different business units: Consumer & Marketplace, Classifieds, and Technology. Approximately 180 people work in engineering-related roles in those units. In our context, a unit is a group of squads (or stream-aligned teams) that own and evolve different applications required to support that unit.

The figure above shows a very simplified version of our structure, introduced in 2019 to align our business units with the value streams the company wanted to invest in. Having squads aligned to specific value streams allows the teams to work independently most of the time. When squads do interact, the majority of those interactions happen through X-as-a-Service, where a squad provides a service to one or more other squads. Occasionally, teams temporarily collaborate when an initiative sits across multiple value streams.

The need for TVP

In time, code changes became slow and complicated

Trade Me’s tech stack traditionally comprised a few large monolithic applications. These monolithic applications were easy to manage and run in production via dedicated infrastructure and database administration teams. Likewise, the tooling on the path to production was custom-built and highly optimized. While this setup still served our engineering family very well, a few paradigm shifts spurred the need to build outside our monolith.

These paradigm shifts include, but are not limited to, the advent of big data, machine learning, and AI; a focus on experimentation and prototyping; and the need to respond to our customers’ needs faster and more efficiently. Making changes to our big monolith, which had naturally evolved since 1999, could be quite costly and difficult. We had a lot of ideas for things we wanted to achieve, but implementing those ideas would take us a long time in the architecture we had. We needed to find a new way to work if we were going to deliver value to our customers faster.

Moving to a hybrid microservices and monolith architecture

We started adopting a hybrid microservices and monolithic architecture in 2022, which allowed our teams to deliver value more quickly and frequently—a major milestone. By moving domains out of our monolith into smaller services, we were able to reduce the complexity of our code changes. However, this had unintended consequences: our organization ended up with various applications and services using different technologies under the hood.

Considering the size and growth of our organization, operability and knowledge management began to increase the cognitive load of our developers, who consequently had to spend a lot of time and effort running those services in production. Equally, each team had to ensure their newly created services met our security and production-readiness requirements. Previously, our infrastructure and database team was responsible for the operations of our monolith. With the newly created microservices, that responsibility moved to the stream-aligned teams in Consumer & Marketplace and Classifieds business units.

The combination of the above conditions highlighted the need for a platform approach in our organization.

Instead of having a large and varied collection of services, we asked ourselves: wouldn’t it be great to have one platform that satisfies the needs of most teams in our organization?

This was an important phase because we needed to find a way to decrease the cognitive load of teams who were now maintaining and running their services. How could we abstract away the common problems they were facing so they could focus on delivering value to our customers efficiently?

While our technical leadership was exploring this problem, we found that the Thinnest Viable Platform (TVP) concept from the book Team Topologies answered that question.

What did we do?

Vision and guiding principles

Our first step toward TVP was to hold a workshop with a small group of technical and product leaders to create a vision of what we were trying to achieve with it. The vision we devised was:

“Building the future of Trade Me faster, safer, and easier, together.”

As we were creating a new platform, we had to clearly lay out the guiding principles so our team could grasp what the TVP is, and what it isn’t. We didn’t want to use TVP the same way we used our monolithic platform. Here are our TVP guiding principles:

Platform

We use TVP principles to define our approach to building future platforms and services. In our definition, platforms are building blocks that abstract away the complexity of infrastructure, which reduces teams' cognitive load, making delivering value to our customers easier and faster.

Product

TVP follows the platform-as-a-product mindset. The main customers of our TVP product are our developers on the stream-aligned teams of customer-facing units. With that in mind, we defined the following main measures of success (MoS):

Time to First “Hello World” (TTFHW)
Reducing developers’ cognitive load (qualitative MoS)

The TTFHW metric measures how long it takes developers to start building a new service, from conception to when they can start writing product development code and delivering value to our customers. The more we automate that process, the less developers have to learn deeply about it.
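As a rough illustration of how a metric like TTFHW could be computed, here is a minimal sketch in Python. The record type, its field names, the service names, and the sample dates are all hypothetical and made up for this example; they are not Trade Me's tooling or data. The sketch assumes two timestamps are recorded per service: when it was conceived and when product code could first be written.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass(frozen=True)
class ServiceBootstrap:
    """Hypothetical record of one new service's bootstrap timeline."""
    name: str
    conceived_at: datetime   # when the team decided to build the service
    first_code_at: datetime  # when product code could first be written


def ttfhw(record: ServiceBootstrap) -> timedelta:
    """Time to First "Hello World" for a single service."""
    return record.first_code_at - record.conceived_at


def median_ttfhw_days(records: list[ServiceBootstrap]) -> float:
    """Median TTFHW in days; the median is less sensitive to one slow outlier."""
    return median(ttfhw(r).total_seconds() / 86400 for r in records)


# Illustrative sample values only, not real measurements.
records = [
    ServiceBootstrap("svc-a", datetime(2024, 1, 1), datetime(2024, 1, 22)),
    ServiceBootstrap("svc-b", datetime(2024, 6, 3), datetime(2024, 6, 4)),
    ServiceBootstrap("svc-c", datetime(2024, 7, 1), datetime(2024, 7, 3)),
]

print(median_ttfhw_days(records))
```

Tracking the median rather than the mean is one possible design choice here: a single service that was delayed for unrelated reasons would not distort the picture of the typical developer experience.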

We also measure cognitive load as an indicator of how much of our platform’s complexity we have managed to abstract away. This is a qualitative metric, so to gauge it we rely on surveys, our Slack help channel, and conversations with developers about how they are finding the process.

Just big enough

TVP intends to ensure the absolute minimum requirements we expect from our production systems are implemented and abstracted away from our developers. This way, we keep the platform as simple as possible to cater to one of its primary purposes: reducing developers’ cognitive load. As the “Thinnest” in the abbreviation implies, it is just big enough. We look at parts of systems that are common across all domains. For example, every system needs monitoring. We provide this out of the box with TVP, so developers don’t need to remember to add monitoring when creating a new service; it’s automatically provided with our chosen monitoring tool.

Open-source contribution model

While we have a dedicated team to build and maintain TVP, it is open to contribution by everyone across the company. If a required feature does not exist in TVP, any team can develop and contribute a new feature in consultation with platform maintainers. It is important to acknowledge that not every team’s needs are consolidated into the TVP, particularly when only a single team in the organization has a given need.

We have an Architecture Design and Review Forum to keep our architectural decisions consistent. This is important for TVP because it helps us avoid bloating the platform. This model also ensures that the platform is built from specific needs that teams have, as opposed to the classic way of building platforms (“build it and they will come”).

Opinionated but sensible

TVP’s technical governance is very strict about the “what” and “how” of TVP implementation. The minimalist attribute of TVP allows it to be opinionated. While we have an open-source contribution model, we only want to add capabilities that a majority of teams truly need.

After creating each project with strict TVP templates as its foundation, a stream-aligned team can tweak the underlying implementation to fit their specific use cases.

These changes:

are not supposed to be applied upstream to the TVP foundation/templates and
should follow the technical governance/architecture sign-off guidelines applied to the rest of our engineering projects.

Golden path to production

TVP is the Trade Me engineering golden path to production-ready applications. As a TVP customer, a developer can be confident that all must-have guardrails are implemented and, ideally, automated. When a team decides not to use TVP, they must complete the production-readiness checklist manually. Although the TVP aims to cover a considerable percentage of our developer customers, having the option not to use the platform is an important characteristic to allow stream-aligned teams to explore new things (which may evolve into future features of the TVP).

The first release of TVP

We created a user story map to determine the right value slice for the first release. We focused on the thinnest part of TVP for this step, as it was crucial to create good conversations about what we really needed and what we could build later. We created personas to help us do this. By focusing on first building the TVP for a senior .NET developer (rather than a mobile developer or a junior, for example), we could make assumptions about what skills this persona would have and thereby infer what they would need, and not need, in a TVP.

Then, we were ready to start developing!

Current state of the TVP

The diagram above is a simplified version of how teams interact with the TVP. We have three stream-aligned teams that work on the platform. Stream-aligned teams from Consumer & Marketplace and Classifieds use TVP to build their new services. The team that owns TVP supports its users and continually improves the TVP user experience by pushing updates based on their feedback. The process and experience resemble those of any third-party software, except that everything is done internally.

How did the Team Topologies book help?

Team Topologies demonstrated effective methods for articulating the different needs and purposes of our teams. Developers in teams that work on platforms have developers of customer-facing stream-aligned teams as users. This concept allowed us to use a lot of the same practices we use when trying to understand the needs and wants of our external customers.

Focusing on decreasing developers' cognitive load was a shift in how our teams thought about what we were trying to achieve. Previously, our platform teams had been thinking about the outputs we could deliver, but Team Topologies helped us adopt a more outcome-focused approach. This kept the problems we were trying to solve in the front of our minds, which meant better decisions for the TVP.

By reducing the cognitive load on developers, a good platform helps dev teams focus on the germane (differentiating) aspects of a problem, increasing personal and team-level flow and enabling the whole team to be more effective.

When doing product discovery for the TVP, focusing on the “thinnest” part of the abbreviation helped us achieve the first value slice quickly so we could experiment with our developers. We considered changing the name of TVP to something customized within Trade Me, but we liked what Thinnest Viable Platform represented.

What do we wish we knew earlier?

Tell a compelling story

It’s important to tell a story about the value provided. This way, everyone from developers to product managers can get on board with the change. People naturally resist change, so getting them excited about it (and the value the change enables) will make all the difference.

Senior leadership’s support is a key element to the success of this movement across our organization. We have noticed that our senior leaders’ acknowledgement of TVP and using it in their day-to-day language and planning endorses its importance and has made it the platform of choice at Trade Me.

Abstracting away complexity

We try to determine how much the developers on stream-aligned teams need to know about what’s going on under the hood of TVP. We want to decrease their cognitive load, but not hide so much of the complexity that they can no longer support the operations of their systems in production. We are constantly monitoring what developers are having trouble with and figuring out whether it’s something we need to abstract away or something we need to provide training and documentation for.

Making it easy to upgrade and get new features

Another thing we missed with our initial architecture was how a team would update their TVP service with any changes made to the TVP templates. We ended up moving as much of the logic as possible out of the templates and into versioned libraries. Teams can now pick up changes the way they would upgrade a third-party dependency, rather than manually copying and pasting in new features or fixes. This also allows us to easily roll updates out to multiple services. The less time developers spend on maintenance tasks, the more time they have to create value for our customers.
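The idea of keeping templates thin and putting shared logic in a versioned library can be sketched as follows. This is a simplified illustration in Python rather than our actual stack, and every name in it (PlatformService, LIBRARY_VERSION, hello-service) is hypothetical. The first half stands in for what a platform library might contain; the second half stands in for the small amount of code a generated template would keep.

```python
from typing import Callable

# --- Imagined contents of a versioned platform library (hypothetical) ---

LIBRARY_VERSION = "2.0.0"  # the platform team bumps this on each release


class PlatformService:
    """Wraps a team's handler with guardrails that every service needs."""

    def __init__(self, name: str, handler: Callable[[str], str]) -> None:
        self.name = name
        self._handler = handler
        self.request_count = 0  # stand-in for real, built-in monitoring

    def handle(self, request: str) -> str:
        # Shared behaviour lives here, so fixing or extending it only
        # requires services to upgrade the library version.
        self.request_count += 1
        return self._handler(request)


# --- Imagined contents of a generated service template (hypothetical) ---

def greet(request: str) -> str:
    """The only code the stream-aligned team writes and owns."""
    return f"Hello, {request}!"


service = PlatformService("hello-service", greet)
print(service.handle("world"))
print(service.request_count)
```

Because the template only wires the team's handler into the library, a new platform feature arrives through a dependency version bump instead of a manual edit to each service's copy of the template.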

What’s next?

At the start of our TVP journey, we focused on time to “Hello World” as one of our measures of success. The changes we have made over the last year have decreased that duration from three weeks to one day. While we continue to measure it to make sure developers are satisfied with how long it takes, it’s not going to be our focus anymore. Our key measurement is now which version of TVP each service is running, and we continue to focus on decreasing cognitive load. We want to make sure critical services are benefiting from the latest changes we add to TVP, while also making sure the upgrade experience is as seamless as possible.
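A version-adoption measure of this kind could be sketched as below. This is a minimal Python illustration under stated assumptions: the service names and version numbers are invented, versions are assumed to be plain dotted integers such as "2.3.0", and the inventory of running versions is assumed to already exist as a dictionary.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version such as "2.3.0" into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def adoption_report(service_versions: dict[str, str], latest: str) -> dict:
    """Summarise how many services run the latest platform version."""
    latest_v = parse_version(latest)
    behind = sorted(
        name
        for name, version in service_versions.items()
        if parse_version(version) < latest_v
    )
    return {
        "total": len(service_versions),
        "on_latest": len(service_versions) - len(behind),
        "behind": behind,
    }


# Hypothetical inventory; names and versions are illustrative only.
inventory = {
    "listings-api": "2.3.0",
    "search-api": "2.1.4",
    "payments-api": "1.10.0",
}

print(adoption_report(inventory, latest="2.3.0"))
```

Comparing versions as integer tuples rather than strings avoids ordering mistakes such as treating "1.10.0" as older than "1.9.0".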

We are continually maintaining and improving TVP at the cloud infrastructure and application layers, and have started moving up the stack. We have started building a front-end TVP template, as well as path-to-production elements (such as fitness functions) to complete the end-to-end development stack in our organization.

Whether we are adding new features or decommissioning parts that we no longer need, TVP will continue to grow as the needs of our customers change.