How Mercari Scales Vision, Culture, & Reliability

In a recent fireside chat, Mohan Bhatkar, Head of Engineering for the Customer Reliability Platform at Mercari, Inc., sat down with Blameless Co-Founder Ashar Rizqi. They talked about scaling while avoiding silos, exciting day-to-day challenges, instilling a culture of empowerment, and more. Here are their top insights and the lightly edited transcript of their conversation.


    Originally published on Failure is Inevitable.

    Key Highlights

    • How Mercari aligned its organization even as the company grew manyfold: Mercari created a camp team structure that aligns business, product, and engineering on key business goals pertaining to customer satisfaction.
    • Exciting challenges that Mercari is taking on: Mohan is most excited by empowering his teams by coming up with solutions that scale, rather than workarounds.
    • Instilling a culture of empowerment: Mercari employs a unique team structure which it calls camps. Each camp is an autonomous group of experienced software engineers, product managers, and engineering managers who are dedicated to a particular part of the company vision. This structure helps balance reliability and innovation while driving consistency, enabling teams to deliver value and innovate faster.
    • How the architecture team functions to limit silos: Each camp delegates a point person to meet regularly with the architecture team, keeping the camp and the architects in close lockstep.
    • Keeping team members happy and engaged: Mercari focuses on retention by treating employee experience with as much priority as customer experience. Additionally, Mercari believes that all employees deserve equal opportunities.
    • How Blameless fits into Mercari’s vision, process, and culture: Blameless helps Mercari build scalable, repeatable processes around SRE best practices such as incident response, and aggregate usable metrics for continuous improvement.

    Ashar: Mohan, it’s great to connect. You have a fascinating story: you moved from India to Japan, learned Japanese and went into technology, and have built a pretty amazing journey and career. Can you walk us through that?

    Mohan: This year I will be completing 10 years in this field. It's been a long journey. Back in the day, I was studying Electronics Engineering at my university, not Computer Science Engineering. When I started engineering at university, I was always fascinated by the coding part.

    It interested me more than the electronics part. I actually wanted to change my field within the university itself, but once you select a branch or major, you cannot change it easily. I continued with electronics engineering, but at the same time I struggled along with the basics of programming: C++ and databases. In my last year of university, I was only aware of Indian software companies, or US-based software companies established in India. I was not aware of any Japanese software companies.

    But, some Japanese software companies came for recruitment at my university. My university is one of the top universities in the state of Maharashtra. That's how I learned about Rakuten and Japan itself. Before I was completely unaware. During recruitment, they wanted to hire software engineers, and I was interested as Rakuten is one of the major web giants in Japan for all the platforms.

    I was not sure that I could do it because I was from an electronics background and just had knowledge of basic programming. I did the test and then got selected. That was the best thing that happened to me because it was a very unique experience.

    I had to learn Japanese after my selection because most software services are provided in Japanese. I started learning Japanese and working in a software engineering role. Then I developed services for the Rakuten Travel segment. Rakuten Travel is one of the leading online travel services in Japan. They provide hotels, bus transit, car rentals, flight reservations, and more to mainly Japanese customers. At the same time, we were able to launch our inbound site in more than 10 languages so that people outside Japan can come to Japan and explore.

    I worked as a software engineer for four years, developed several reservation platforms, and then moved into the tech lead role in the travel division itself. I later moved into the engineering manager role. It was a pretty long stint in Rakuten Travel, more than eight years. Then I moved to Mercari. 

    The main reason I moved to Mercari was to challenge myself in a completely new way and get out of my comfort zone. It was a growing environment, continuously evolving, whereas Rakuten is a very established company.

    If you look at the history of Mercari, it is one of the first-ever unicorn startups in Japan. It has grown to a very large scale within a short amount of time. I was always curious about it, and wanted to see how I could challenge myself and learn in that environment. One year after joining Mercari as an engineering leader, I have not been disappointed. Every day you are so excited to do something new, something better for yourself, for our customers, for our company.

    I was working as an engineering manager for the Customer Reliability teams at Mercari. The number of people working at Mercari has grown rapidly in the last two years. The company has grown manyfold. The processes and structures which used to work before no longer work at such a scale. From the beginning of this year, the company decided not to go into a siloed structure, but to create more alignment between business, product, and engineering.

    We came up with the camp system of organization. Camp is basically an autonomous group of all the experienced software engineers, product managers, and engineering managers who are dedicated to a particular part of the company vision. I got an opportunity to play the engineering head role for one of the camps.

    I am responsible for the Customer Reliability Engineering camp. Mercari is a customer-to-customer (C2C) marketplace for buying and selling. It is not B2C. Obviously, we do not have control over how customers behave. We basically trust that everyone is fair, and that they won't do anything bad, but at the same time, we have to make sure that the platform itself is safe and secure for everyone. That is how I think about Customer Reliability Engineering. People are selling items and transacting between themselves, but are they violating any terms and conditions? Are they putting any illegal things on our platform? Are they doing it in a secure way or harming other people?

    We have a moderation platform based on rules and AI technology, where we moderate things and support our customer support team so they can rectify or minimize the number of inquiries or grievances to the users.

    Another aspect is how we can deliver user friendly security to the customers. If you want to implement security, we can implement a lot of things like multi-factor authentication. Everywhere they go, we can restrict them to do certain things. But that's not user friendly. We have to balance between those two. I think we are also providing a good experience for customers to make inquiries to our customer support.

    Ashar: Thank you for sharing, Mohan. There's some really awesome stuff in what you said. Personally, I hated programming. The guy who founded C++ was my professor in college. He was pretty bad. I was the opposite of you in university. I wanted to do hardware and electronics. Major kudos to you for following your passion. For me, I was kind of forced into software, but I love it now.

    I'm curious to hear more about culture and values. You touched on this when you were talking about Mercari. There's something very unique in the way that you're describing your work. You said, “We've decided to organize ourselves in camps around values so that we can better service our customers.” Can you talk a little bit more about that? For a company that's scaling so quickly, to organize yourself and stick true to your values, that's a pretty amazing accomplishment.

    Mohan: What was fascinating to me, when I was looking at Mercari, is that they have three key values which really stick with you. One is to Be a Pro. They expect every employee to be professional in their behavior. You strive to attain expertise or level up your skills so that you are a professional.

    Another is Go Bold, which is one of the most interesting values. What they mean by Go Bold is to try something new. Even if you fail, it is okay. If you look into Mercari’s history, they have tried a lot of things in the last six years. We launched the Mercari marketplace for Japan and then the US, which worked well. We also launched for the UK, but it didn't work out so well.

    It was a combination of efforts by product engineering, business, and upper level management, but sometimes things don’t work out. At the same time, it doesn't mean that you need to stop. You have to try better. Last year we also launched our payment platform, which is integrated with our C2C marketplace. It is turning out well. I think what the Go Bold value tells us is that failures are inevitable. You cannot avoid them, but it should not stop you from retrying something, either.

    The third value is All for One. You need to work as a unit, in teams. You alone might be an expert, but you cannot accomplish something big if you work alone. You don't know everything. I think they really focus on all these values.

    This is what inspired the camp system. Previously, we had the business, product, engineering, mobile, web, backend, and platforms. They were siloed and the structure itself was not aligned. They were trying to create their own quarterly OKRs, but they were not aligned. That's when we got together as All for One, and thought about what we should be delivering for the customer. 

    We have more than eight product development camps covering the business functionalities of all the marketplace applications. However, they act as one team.

    Ashar: I really appreciate that. I think that's really unique. I love hearing about values and how teams organize themselves. No matter how big you scale and how you change, it is the values that become the guiding light and the North Star.

    That's the beauty of values. Once you have them and people internalize them, then people feel empowered to make decisions and take risks to move fast yet stay within a certain framework. 

    In your role as the head of engineering for the Customer Reliability Engineering camp, what excites you most on a day to day basis with what you're doing? What are some of the biggest challenges that you're seeing? Mercari’s system is complex. How do you moderate these infinite possibilities of people potentially violating Terms and Conditions? It’s not like Facebook where you can just hire thousands of moderators to go through and moderate posts.

    Mohan: As a head of engineering for the Customer Reliability Engineering camp, my key responsibility is to make sure my teams’ efforts are aligned to the vision and mission of the company. Our camp’s vision is to create a trusted marketplace where everyone feels safe and secure. Every camp has their own vision. We also have the annual roadmap and we try to stick to that and deliver the values for the customers.

    My role is basically to make sure that we are aligned, and are delivering. My approach to work is setting clear goals and directions for all the teams, then giving them as much resourcing as possible so that they can deliver on this value. But, not all the things go as planned. I need to dig into those things, and think about what is going wrong. Then I think deeply about what learnings we could take and do better next time.

    What excites me is thinking about solutions to day-to-day issues and improvements. The most exciting part would be that I cannot simply think about workarounds. I need to think about solutions which can work at scale, but which are also cost effective. At the core, that is my key responsibility as well as what most excites me.

    The key challenges... we all can agree it is not easy. You try to do your best. I need to understand the situation of my teams. In order to deliver something really good, we need to remember that people are the key aspect. People, processes, and technology. All these three go together. As a head of engineering, I need to balance all these three when we are trying to do our daily work.

    Mercari is growing at a very fast pace. The things which used to work last year or before do not work anymore. Again, we need to think of solutions which can work at a scale and in a cost effective manner. One of the biggest challenges which I face is balancing development velocity and reliability. It's a common challenge for all the managers or the leaders. If you try to do one, there is a tradeoff on the other.

    So I think a lot about how we can deliver at a high speed, but also with quality. Since last year, Mercari has tried to scale in all the aspects of the organization, whether it's hiring, onboarding, product development, alignment, or growing their member system. One of the practices I would suggest is agile adoption for all teams.

    Previously, with respect to delivery, it was the deadline-driven approach or waterfall approach. But we have changed because the organization itself is changing. On the reliability side, as far as incident management, SLO management, etc., we're not quite yet there. But we are trying our best. We are introducing these things to certain teams to ensure that we are delivering value for customers, but with quality.

    Ashar: What are some of the cultural principles that you think about or instill in the team? Let's say that there was no tooling in place to help achieve that. You talked about doing deep thinking. What does that look like for you? What are some of the cultural principles and practices that you try to instill as head of engineering to empower teams to be able to make the right decisions? What have you seen that works well, what have you seen that doesn't work well?

    Mohan: From the beginning of 2018, a lot of changes started to happen in the company, not only in the technological aspect, but also from the organizational aspect. One of the changes on the technology side was moving into a microservices architecture, which many companies are doing.

    Before that, the entire C2C marketplace was working on one single monolithic application. Our CTO and some of the engineering members made a decision to move to microservices, so we started shifting. We started taking out components from our monolithic application and building microservices. But, when you try to do that, everyone tries to reinvent the wheel. That's not the most viable or scalable solution.

    Sometime later, we realized that something was not working. We needed to retrospect at every stage. Through last year, we have established the platform team. This platform team enables all the product development teams to use this platform, so you don't have to reinvent something, or build something from scratch.

    The platform team takes care of all the phases of the software development lifecycle. It provides all the tools and kits so you can build your application. We were fortunate to create this platform. Otherwise, we would have ended up making the same mistakes over and over again.

    That was one good decision and it has worked out well for us so far. It has all the underlying infra teams, the cloud based teams, and all the observability platforms. On the top, they have dedicated teams who provide support to build, test, deploy and operate. 

    We also established an architect team inside the engineering division. The architect team’s job is to provide the best technical solutions, or to discuss with the product development teams how they can develop their systems based on the particular needs of the business or the customers.

    At the same time, what kind of reliability goals should they be striving for? This particular service is very critical for customers. These reliability goals ensure that teams implement their systems in a consistent, secure, and reliable way. 

    On top of that, we have the product development teams, who coordinate with them and deliver the product.

    Ashar: This is something that we did recently at Blameless; we established a platform team internally as well. We saw early on the importance of enabling the other development teams to operate at scale quickly and efficiently, as well as with reliability skills. 

    I'm curious about architecture teams. Architecture teams can sometimes become very siloed. Then information just flows one way: the architects sit on top and tell the other product development teams what to do. What do your teams do differently to work together? What is the secret sauce?

    Mohan: There is a phase in every company where that can happen. The architecture team becomes siloed and the other teams don't know what they are trying to do. I think there was a phase in Mercari where that might have been the case. 

    But in Mercari’s culture, we have an automated process where all the systems go through the architecture review. An architect will give suggestions. When we are delivering a new system in production, we are ensuring that teams are not forcing anyone to do this or that. It's a collaborative approach that we need in order to target a certain level of reliability for particular services. These are lists for that particular critical service and how we can do it together.

    On an organizational level, we have the architect team, but we also have a contact point in each camp or in each team who can share those responsibilities with the architect team. We call it an architect champion. This person connects that camp with the architect team, and they have regular meetings. The architect champions convey architecture guidelines or new updates with their teams.

    Ashar: That's really powerful. You said you have a reliability goal. At a systems level, you get the product engineers to provide their view, and then project managers organize and coordinate across. Focusing on the goal of reliability is what eliminates the siloed nature of traditional architecture teams. 

    How do you keep teams engaged at Mercari? Tell us a little bit more about team engagement, retention, and career progression.

    Mohan: It’s about giving equal opportunities to everyone in the company. If your employees are happy, then you'll move forward. Mercari gives equal importance to the employee experience as the customer experience. If the company is multiplying in size, there are going to be problems.

    There is going to be a situation like COVID-19, where we don't know where we are heading. That's when we essentially focus on the fact that our employees are our best asset. We need to provide opportunities equally to all of the employees.

    We believe everyone is a software engineer. What we mean by software engineer is not that we want everyone to be an expert in everything in software. Not everyone can develop iOS, Android, or back end. That is not possible. But what we do mean is that if somebody wants to try something in engineering, we should not tell them that they cannot try it because they don't have the experience or expertise.

    The opportunity is open for everyone to try something. The end goal is to achieve a certain impact on customers and on the company. How you do it, the way you approach it, is a different part. This is a cultural aspect of how the company or engineering division thinks about it.

    When it comes to team engagement, one of the really fascinating things to me was the team-level vision and mission. We have a vision and mission, not only for the company, not only at the division level or camp level, but at the team level. Even if it is a team of, say, five or six people, they have a team vision or mission. They have a roadmap to achieve that.

    For example, we have a team who is developing internal tools only for customer support people. They are not developing any products for the real customers, but for our colleagues who are customer support. Their vision and mission is to improve the efficiency or productivity of those people through automation.

    At the same time, all the teams have a particular mission on a camp level. If you combine everything at the camp level, it is all connected at the company level. We are trying to align all these individual visions to the actual end goal of the company or the customers. 

    However, this is difficult. Not everyone will feel connected. It is not possible that 100% of your employees feel connected. There are challenges. Some people might not be motivated by the work they are doing. As an engineering manager or manager of that person, we need to look into the detailed issues. What is happening in the team? Are there any motivational problems? Are there any technical challenges or the process challenges? Then we try to solve those issues.

    When it comes to giving growth opportunities to everyone, the company is really, really open. If somebody wants to try something, we never say no. We always say, let us figure out what we can do about it. If we say no, then the person can get demotivated. That is how we engage the teams: we try to align them towards a bigger mission rather than thinking about the smaller issues.

    Ashar: Thank you for sharing that. That really resonates. As you grow and scale very quickly, it becomes very easy for the vision and mission to diffuse.  It's critical to take a more compartmentalized approach. Every team should have a mission statement. That really helps create a sense of belonging and alignment. In a leadership role, it really gives you visibility and the opportunity to make decisions.

    We'd love to learn a little bit more about your journey with Blameless as well. How do you see Blameless fit into your mission as a technology leader and how has your team's experience been so far? 

    Mohan: The journey with Blameless has been pretty amazing. The word Blameless itself represents a very key part of our culture. Our CTO and senior management always say that we should not blame people. We should blame the systems and process.

    Exactly one year ago, we were introduced to Blameless. I was trying to gather all the issues related to incident management by talking with different stakeholders inside the company like customer support, our payment platform people, the mobile engineers, backend engineers, and infra. We were trying to summarize what the issues are.

    Our CTO introduced us to Blameless and I requested a demo and connected with Morgan (Blameless Account Executive). In the last quarter of 2019, we did the proof of concept. We had the onboarding session, including our CTO, VP, and myself, and then we got a demo, where we saw that Blameless could work out most of our problems.

    Then we did a POC for the entire quarter with different teams from security, the payment platform, client, and mobile teams. The next quarter, we did a formal trial. The Blameless team actually came to Tokyo and did the training in person for more than 10 teams and over a hundred members.

    Last quarter, we decided that we would adopt Blameless as an incident management tool across all of Mercari. From this quarter onwards, we have started the adoption. Introducing Blameless to more than 600 people takes time, but the journey is going really well so far.

    However, due to COVID-19, we could not do the in-person training for the company-wide rollout. So, we came to the idea that we will prepare video-based training content, which is self-sufficient and very easy to follow. We also provided the sandbox environment for people to try it out. So far it’s been a very good approach to training the teams and establishing our process. 

    The biggest value that Blameless provided us is having the right structure to organize our entire incident management process. Previously, I think we used Google Forms, Google Docs, and JIRA. We were not able to derive any metrics from our tooling because it was difficult.

    Blameless provides us a tool to measure everything about an incident. If we can't measure something, we cannot improve it. With Blameless, we are at a stage where we can start measuring the important metrics related to an incident. We can understand our MTTA, MTTR, how much time we are taking to finish the postmortem and follow-up actions, and how we are reducing the customer impact time. We really look forward to implementing more along these lines.
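
    Editor's sketch (not part of the conversation): as a rough illustration of the metrics Mohan mentions, the Python below computes MTTA (mean time to acknowledge) and MTTR (taken here as mean time to resolve, measured from detection) from incident timestamps. The `Incident` fields, the function names, and the sample data are all hypothetical; this is not how Blameless or Mercari calculates these figures, only a minimal sketch of the arithmetic.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    # Hypothetical schema; a real incident tool exposes its own fields.
    detected_at: datetime      # when the incident was detected
    acknowledged_at: datetime  # when a responder acknowledged it
    resolved_at: datetime      # when the incident was resolved


def mean_time_to_acknowledge(incidents):
    """MTTA: average of (acknowledged_at - detected_at) across incidents."""
    seconds = [(i.acknowledged_at - i.detected_at).total_seconds() for i in incidents]
    return timedelta(seconds=mean(seconds))


def mean_time_to_resolve(incidents):
    """MTTR (assumed definition): average of (resolved_at - detected_at)."""
    seconds = [(i.resolved_at - i.detected_at).total_seconds() for i in incidents]
    return timedelta(seconds=mean(seconds))


# Made-up sample data for illustration only.
incidents = [
    Incident(
        detected_at=datetime(2020, 6, 1, 10, 0),
        acknowledged_at=datetime(2020, 6, 1, 10, 5),
        resolved_at=datetime(2020, 6, 1, 11, 0),
    ),
    Incident(
        detected_at=datetime(2020, 6, 2, 14, 0),
        acknowledged_at=datetime(2020, 6, 2, 14, 15),
        resolved_at=datetime(2020, 6, 2, 14, 40),
    ),
]

mtta = mean_time_to_acknowledge(incidents)
mttr = mean_time_to_resolve(incidents)
print(f"MTTA: {mtta}, MTTR: {mttr}")
```

    Teams define MTTR differently (time to resolve, recover, or repair, and measured from detection or from acknowledgement), so the definition should be agreed on before the numbers are compared across teams or quarters.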
