Job Description

About KOMOJU

KOMOJU (by Degica) is the leading cross-border payment gateway for Japan. We power payments for companies like video game distribution platform Steam and the popular mobile app TikTok. Today we help thousands of merchants by providing the payment infrastructure they need, from developer-friendly APIs to integrations on popular platforms like Shopify and Wix, and we help our merchants grow in every market they expand into.

About the position

As our systems grow in complexity, scale, and traffic, maintaining their reliability and availability becomes both increasingly challenging and critical. We're looking for a Site Reliability Engineer (SRE) with a focus on observability to help us meet these demands.

In this role, you'll be at the forefront of ensuring that our infrastructure is not just running, but understandable and measurable. Observability is a core pillar of our reliability strategy: it's how we detect issues before they impact our merchants and users, quickly understand the root causes of incidents, and continuously improve our systems' performance and reliability.

You’ll design and evolve our observability platform, including metrics, logging, tracing, and alerting, and partner with development teams to embed observability into every stage of the software lifecycle. Your work will directly impact our ability to scale confidently and respond to incidents swiftly.

This is a key role for someone who wants to build resilient systems, empower teams with actionable insights, and make a real difference in how we operate at scale.

While we are a remote-first company, this position is based in Tokyo, and we expect candidates to be willing to relocate to Japan.

Responsibilities

Design, implement, and maintain our observability stack (metrics, logging, tracing, dashboards).
Define and monitor SLIs/SLOs to ensure service health and reliability.
Collaborate with engineering teams to instrument applications for better visibility.
Build and maintain dashboards and alerts that provide actionable insights and minimize alert fatigue.
Troubleshoot system performance and reliability issues using observability data.
Educate and guide engineering teams on best practices in monitoring, alerting, and incident response.
Contribute to postmortems and continuously improve system transparency and resiliency.

Requirements

3+ years in SRE roles.
Hands-on experience with observability tools, preferably Datadog.
Proficiency in Terraform.
Background in software development.
Proficiency in at least one scripting or programming language (Ruby/Rails, Python, Go, Shell Script, etc.).
Experience working with AWS.
Familiarity with monitoring design principles: RED, USE, SLI/SLO, alert tuning.
Ability to analyze logs, metrics, and traces to diagnose issues and identify trends.

Nice to have

Knowledge of CI/CD pipelines and integrating observability into build and deploy processes.
Familiarity with incident response, on-call rotations, and post-incident reviews.
Business-level Japanese.

Benefits

At Degica, we embrace remote work while also offering office space for those who prefer in-person collaboration
10 days of regular vacation, plus an additional 5 days of summer vacation and 5 days of winter vacation
Paid birthday holiday
Self-learning allowance to help our employees keep their skills current
Language training for Japanese

Site Reliability Engineer - Observability Job at Degica, Japan