Site Reliability Engineer - Observability

  • Tokyo
  • Partial Remote
  • Full-time
  • August 1, 2025
Conditions
Apply from Anywhere 👍
Relocation to Japan 👍
(Overseas visa sponsorship supported)
Requirements
Language Requirements
Japanese: Not Required 👍
English: Business Level
Minimum Experience
Mid-level or above

About KOMOJU

KOMOJU (by Degica) is the leading cross-border payment gateway for Japan. We power payments for companies like the video game distribution platform Steam and the popular mobile app TikTok. Today we help thousands of merchants by providing the payment infrastructure they need, from developer-friendly APIs to integrations on popular platforms like Shopify and Wix, and we help them grow in every market they expand into.

 

About the position

As our systems grow in complexity, scale, and traffic, maintaining their reliability and availability becomes both more challenging and more critical. We're looking for a Site Reliability Engineer (SRE) with a focus on observability to help us meet these demands.

In this role, you'll be at the forefront of ensuring that our infrastructure is not just running, but understandable and measurable. Observability is a core pillar of our reliability strategy: it's how we detect issues before they impact our merchants and users, quickly understand the root causes of incidents, and continuously improve our systems' performance and reliability.

You’ll design and evolve our observability platform, including metrics, logging, tracing, and alerting, and partner with development teams to embed observability into every stage of the software lifecycle. Your work will directly impact our ability to scale confidently and respond to incidents swiftly.

This is a key role for someone who wants to build resilient systems, empower teams with actionable insights, and make a real difference in how we operate at scale.

While we are a remote-first company, this position is based in Tokyo, and we expect candidates to be willing to relocate to Japan.

 

Responsibilities

  • Design, implement, and maintain our observability stack (metrics, logging, tracing, dashboards).
  • Define and monitor SLIs/SLOs to ensure service health and reliability.
  • Collaborate with engineering teams to instrument applications for better visibility.
  • Build and maintain dashboards and alerts that provide actionable insights and minimize alert fatigue.
  • Troubleshoot system performance and reliability issues using observability data.
  • Educate and guide engineering teams on best practices in monitoring, alerting, and incident response.
  • Contribute to postmortems and continuously improve system transparency and resiliency.

 

Requirements

  • 3+ years in SRE roles.
  • Hands-on experience with observability tools, preferably Datadog.
  • Proficiency in Terraform.
  • Background in software development.
  • Proficiency in at least one scripting or programming language (Ruby/Rails, Python, Go, Shell Script, etc.).
  • Experience working with AWS.
  • Familiarity with monitoring design principles: RED, USE, SLI/SLO, alert tuning.
  • Ability to analyze logs, metrics, and traces to diagnose issues and identify trends.

 

Nice to have

  • Knowledge of CI/CD pipelines and integrating observability into build and deploy processes.
  • Familiarity with incident response, on-call rotations, and post-incident reviews.
  • Business-level Japanese.

 

Benefits

  • At Degica, we embrace remote work while also offering office space for those who prefer in-person collaboration
  • 10 days of regular vacation, plus an additional 5 days of summer vacation and 5 days of winter vacation
  • Paid birthday holiday
  • Self-learning allowance to help employees keep their skills current
  • Language training for Japanese

Global payments made simple

KOMOJU is the leading cross-border payment gateway for Japan. They power payments for companies like the video game distribution platform Steam and the popular mobile app TikTok. Today, they help thousands of merchants by providing the payment infrastructure they need, from developer-friendly APIs to integrations on popular platforms like Shopify and Wix. They help merchants grow in all the markets they are expanding into.

Developer-centric, inclusive culture

Engineers at KOMOJU work in a developer-centric and inclusive culture where engineers have a say in both product and technical decisions. The culture is largely self-organizing, which means engineers have both a stake and ownership in what they work on. Engineers play to their strengths, but are also able to invest in areas where they want to grow within the team. At KOMOJU, engineers are the main drivers behind their growth and position in the company.

International at core

Around half of the team members come from outside Japan. English is the primary language used within the engineering team, and throughout the company many people are bilingual.

As an international company, KOMOJU understands the importance of bilingualism. They offer all employees a choice between optional English and Japanese lessons to help create a culture of diversity and ensure smooth collaboration across teams.

Passionate about technology

Developers at KOMOJU are passionate about their craft. To foster innovation, they have a monthly open hack day where developers can work on whatever they want. It could be trying a new programming language or tool, fixing a long-standing annoyance, or something fun and experimental, like building a game.