Software Infrastructure Group DNA
One of the most important enablers for rapid growth is having excellent infrastructure. Infrastructure can come in many forms. It might be a framework which allows you to bootstrap products easily. It might be a set of tools or libraries that make your life easier. It might be a platform which provides you with all your runtime needs, or it might even be a set of services providing common functionality needed by multiple products. Either way, the pattern is clear: Infrastructure is what makes you more productive by allowing you to focus as much as possible on your unique product needs and providing common solutions for the rest.
At Wix, we strongly believe in building all of our products on top of amazing infra, and we invest a lot of effort in building infra groups that can back the development of dozens of products and hundreds of engineers. This requires defining the DNA of infra groups and the most critical values for their success.
This article tries to dive into the details of what makes a great infra group and how to develop, maintain and support infra successfully. In order to do that we will go through a series of topics and try to describe them from an infra group perspective.
Who is my user?
This is probably the point that most distinguishes an infra group from a product group, and it drives the logic behind most of the points in this document. While product groups are almost always customer-facing, infra groups mostly face other groups in the organization. It means that just like products are obsessed with giving a great experience to their customers, infra groups must be obsessed with providing a great experience to other groups. Just like with products, if people don’t enjoy using your infra, find it difficult to use, or feel that you don’t provide good support, docs or quality, or that you don’t answer their needs, it means you have a bad product. Just like with products, you must have a deep understanding of your customers, collect feedback, learn about their needs and prioritize smartly. And just like with products, you have to innovate, identify your early adopters, work closely with them to verify that what you are doing actually helps them, and adjust quickly.
Does it mean that infra never reaches real customers? Absolutely not. For example, on many occasions infra involves an actual UI or service which is embedded into other product flows or even hosts other products. In those cases it can be argued that the most important users are still the products which use your infra, and that what matters most is giving their users the experience those products expect of you.
Backlog & priorities
One of the most challenging parts of being an infra group is translating your customers’ needs into actual tasks and prioritizing them. Just like with products, two users might ask for two distinct things, and it is your job to understand that a single thing can solve both problems, to identify a pattern in some of the requests, or to recognize that a request is too specific to one user’s use case and requires a more generic solution.
The nice thing is that unlike products, your users work with you in the same organization. This means you can easily communicate with all of them, understand their needs, and think together with them about the best solutions. It also means that you must be totally transparent about how you prioritize their requests and when you expect each one to be delivered. Prioritizing is your responsibility and, as always, needs to take into account things such as impact, effort, technical debt, product strategy, etc.
A good infra group should actively collect requirements from its users at least once a month, should have a point of contact who is always open to receiving new ad-hoc requirements, and should present its current plan every 2–3 months after taking all of the requirements into consideration.
Execution
Infra groups sometimes have a bad reputation for not having deadlines or execution plans and for not being transparent about their progress. However, we believe that it should actually be the other way around: as an infra group, the delivery of other groups depends on you. It means that any glitch in your execution might cause a ripple effect which can be devastating for your dependents.
This means that you should plan carefully, not bite off more than you can chew, and be extra transparent about your requirements, progress and delays. There needs to be a shared place where everyone can see the current status of everything the team is working on, as well as its backlog ordered by priority.
Contribution SLA
No matter how we look at it, there will always be cases where the infra group is not able to satisfy all of the requirements in a timely manner. We provide a more scalable model by allowing groups who can’t wait to contribute their changes themselves. In order for this to work we have a few prerequisites:
- Code base of an infra project must always be of high quality and easy to contribute to
- Any contribution must first be discussed and designed together with the infra group
- Infra group must be available to work on design with contributor within 1 week (assuming contributor commits to working on that change soon thereafter)
- Infra group must be available to review the contribution within 1–2 days
- Once review process starts, an infra developer will be allocated to pair and assist in merging the contribution as soon as possible
- Contribution must be aligned with the best practices and standards of the infra group
- Once contribution is approved, infra team takes ownership of the contribution
- Contributing team should be ready to assist with problems in the first 2 months if requested to do so by the infra team
Remember that an SLA means that you are committed to the intervals mentioned above. This means that when planning the team’s growth, you must take into consideration the estimated volume of contributions and how much effort they will require from you.
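To make those commitments visible, the intervals from the list above can be tracked in a few lines of code. The following is only a minimal sketch: the `Contribution` record, its field names and the use of calendar days (rather than working days) are assumptions made for illustration, not a description of any tool we actually use.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, Iterable, List, Optional

# Intervals taken from the prerequisites above. Counting calendar days
# (not working days) is an assumption of this sketch.
DESIGN_RESPONSE = timedelta(days=7)  # design discussion within 1 week
REVIEW_RESPONSE = timedelta(days=2)  # review within 1-2 days (upper bound)


@dataclass
class Contribution:
    """Hypothetical record of one contribution from another group."""
    name: str
    design_requested: date
    review_requested: Optional[date] = None


def sla_deadlines(c: Contribution) -> Dict[str, date]:
    """Return the latest date by which the infra group must respond."""
    deadlines = {"design": c.design_requested + DESIGN_RESPONSE}
    if c.review_requested is not None:
        deadlines["review"] = c.review_requested + REVIEW_RESPONSE
    return deadlines


def overdue(c: Contribution, today: date, done: Iterable[str] = ()) -> List[str]:
    """List the stages that are not done yet and whose deadline has passed."""
    handled = set(done)
    return [
        stage
        for stage, due in sla_deadlines(c).items()
        if stage not in handled and today > due
    ]
```

A list like the one returned by `overdue` could feed the shared status board mentioned in the Execution section, so that contributors can see when the infra group is behind on its own commitments.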
High quality & High availability
Remember that choosing to use some infra is actually an act of trust. It means that the product team agrees to rely on you and to use something which may have a drastic effect on its own quality. This should not be taken lightly. With bad infrastructure, buildings can collapse. This means that when it comes to infra, we should pay extra attention to the quality of what we produce, make sure it is complete and easy to use, and make sure it is super stable and reliable. We should also pay extra attention to production monitoring and make sure we monitor from the perspective of each and every product which uses our infra.
Support & docs
Infra groups must take into account that a big part of their time will need to be allocated to helping people use their infra: answering questions, helping to debug issues, fixing ad-hoc issues, reviewing PRs, etc. This is less common in product groups and is something that developers are less accustomed to doing. Nevertheless, good support is a critical part of an infra group and must be emphasized as one of its main values.
It means that the team should be super responsive (on GitHub and on Slack, for example), answer questions professionally and politely, and follow up with people to make sure their problem is solved.
A big part of this is also having great docs describing how to use the infra and making sure they are constantly maintained and updated with any lessons learned from supporting the users. I can’t emphasize enough the importance of great docs and great support. Those are the kind of things that if done right can help an infra group gain a lot of credit, and if done wrong may result in it losing all of its credit.
Dev advocates
One of the most frightening things we see about infra groups is that sometimes they forget, or don’t know how, to put themselves in their users’ shoes and make sure that the products they provide actually answer their users’ needs. One thing that we highly recommend is making sure that every infra developer gets experience working on a product and vice versa (this can be achieved through a “student exchange” program and through internal mobility). Another super important tool is the dev advocate.
In general, a dev advocate is a developer whose job is to be the missing link between the infra group and its users. A dev advocate will sometimes work inside a product group to help it use the infra group’s solutions, and will sometimes provide support for other groups through the group’s support channels. Dev advocates are expected to be able to give lectures, improve docs, create examples and, most importantly, provide feedback to their own group about gaps in developer experience.
Early adopters
When innovating, it is critical not to develop in a void. When creating something new, an infra group must always go through the process of finding the first 2–3 early adopters of its new solution, collecting requirements from them, coming up with a design together, and working closely with those groups to experiment with the new solution.
During this early stage, the infra group should be even more responsive to any issue that the early adopters might have. Essentially, they are your alpha testers, and if you want to have happy alpha testers, you must give them the best service.
Choosing the right early adopters is critical for the group’s success. Users who find too little value in your new solution will not be responsive enough for you. Users who are “special cases” can lead to an infrastructure that is overfitted to its most complicated case.
Data collection
It is the infra group’s responsibility to collect and measure data about the usage of its products. We want to make sure that we know how people use the infra, whether it does what we expect it to do, whether people have any issue with it, etc.
Also, for infra that is embedded in products or that hosts products, we want to collect any analytics that can be useful for the products that use our infra. This means product teams should have an easy way to distinguish data (and its related application context) collected by the infra inside their product from data collected by the same infra in other products.
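One simple way to make usage data distinguishable per product is to attach the host product to every event the infra emits. The sketch below assumes a hypothetical in-memory `UsageLog`; the class, its field names and the product and component names in the usage example are all illustrative, and a real system would send events to an analytics pipeline instead of keeping a list.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class UsageEvent:
    product: str    # hypothetical identifier of the product hosting the infra
    component: str  # which piece of infra emitted the event
    action: str     # what happened, e.g. "open" or "error"


@dataclass
class UsageLog:
    """Illustrative in-memory store; stands in for a real analytics backend."""
    events: List[UsageEvent] = field(default_factory=list)

    def record(self, product: str, component: str, action: str) -> None:
        self.events.append(UsageEvent(product, component, action))

    def for_product(self, product: str) -> List[UsageEvent]:
        """Everything a single product team needs to see about its own usage."""
        return [e for e in self.events if e.product == product]

    def actions_by_product(self, component: str) -> Dict[str, Counter]:
        """How each product uses one component, for the infra group's own view."""
        summary: Dict[str, Counter] = {}
        for e in self.events:
            if e.component == component:
                summary.setdefault(e.product, Counter())[e.action] += 1
        return summary
```

For example, after `log.record("product-a", "media-picker", "open")` and `log.record("product-b", "media-picker", "open")`, calling `log.for_product("product-a")` returns only the first event, while `log.actions_by_product("media-picker")` gives the infra group a per-product breakdown of the same component.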
Backward compatibility & Deprecation policy
Usually in infra, at some point we’ll need to make a breaking change which will force anyone using the infra to change something in order to keep using it. As a general rule, we avoid such breaking changes as much as possible. When we must make a breaking change, we must roll it out gracefully: notify all users through a deprecation message and break only after enough time has passed.
We recommend maintaining a central board for migration tasks and giving consumers a minimum of 6 weeks’ notice prior to deprecation. When possible, a build warning will show up during the grace period. After the deprecation deadline, the warning will switch to an error and will block further development.
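The warning-then-error behavior can be sketched as a small check that a build step or library entry point might run. This is an illustration under stated assumptions, not our actual build tooling: `check_deprecation`, `DeprecatedApiError` and the decision to treat the deadline day itself as still inside the grace period are all hypothetical choices.

```python
import warnings
from datetime import date, timedelta

GRACE_PERIOD = timedelta(weeks=6)  # minimum notice recommended above


class DeprecatedApiError(RuntimeError):
    """Raised once the grace period is over (hypothetical error type)."""


def check_deprecation(api_name: str, announced: date, today: date,
                      grace: timedelta = GRACE_PERIOD) -> date:
    """Warn during the grace period, fail after it; return the deadline."""
    deadline = announced + grace
    if today > deadline:
        # Past the deadline: block further development.
        raise DeprecatedApiError(
            f"{api_name} stopped being supported on {deadline.isoformat()}; "
            "migrate to keep building."
        )
    # Still within the grace period: surface a warning but let the build pass.
    warnings.warn(
        f"{api_name} is deprecated and will stop working after "
        f"{deadline.isoformat()}.",
        DeprecationWarning,
        stacklevel=2,
    )
    return deadline
```

Keeping the announcement date and grace period next to the deprecated API means the switch from warning to error happens on schedule without anyone having to remember to flip it.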
API design
In a big software ecosystem, product groups rely on multiple infra groups providing many tools, libraries, services and platforms. Therefore, it is critical that the APIs provided by the different infra groups be consistent, easy to grasp, and add minimal cognitive load for the people consuming them. Introducing too many similar ways of doing the same thing causes frustration and confusion, and can seriously hurt velocity.
In order to keep these practices, all of the infra groups should have shared guidelines, conventions and methodologies for defining their APIs. If it doesn’t create a problematic bottleneck, it might also be a good idea to have a dedicated cross-company team which can help with the design and review of new APIs. If it does become a bottleneck, there are other solutions you can consider to allow bigger scale, such as embedding members of this team into infra groups that are big enough.
Conclusion
As mentioned earlier, we believe that the concepts and values described here are necessary for an infra group to be successful. Some of them are unique to infra groups. Others are critical for any group, but need extra emphasis in an infra group. Do we follow all of it to the letter at Wix? We are far from being perfect, but we definitely do our best and keep trying to improve.
Obviously there are many more things that are critical for any team to be successful, but this is a topic for a different article. If you are interested in reading more about general engineering values we believe in, The 10x Engineer might be a good place to start.