Software Engineer (JAPAN AI - AI Platform)

  • Tokyo
  • Partial Remote
  • Full-time
  • April 15, 2026
Conditions
¥8M ~ ¥14M /yr
Apply from Japan Only
(You must live in Japan to apply)
Requirements
Language Requirements
Japanese: Business Level
English: Business Level
Minimum Experience
Senior or above

About JAPAN AI

JAPAN AI, Inc. was established in April 2023 as a group company of Geniee, Inc. (TSE Growth Market) with the mission of dramatically expanding human potential through AI technology. We drive cutting-edge AI R&D both domestically and internationally.

Our ambition goes far beyond building AI chatbots. We are building "the brain of the enterprise" — a next-generation core system where AI autonomously executes business operations by integrating all of a company's SaaS tools. With JAPAN AI STUDIO at the center, we are implementing a world where — given a database — no separate application is needed; AI performs the work and returns only the results.

Through the transformative power of AI, we aim to create new value and contribute to the advancement of society as a whole. Join us in leading AI innovation and shaping a future where technology empowers people to achieve more.

 

Why We're Hiring

"The brain of the enterprise" must never go down.

In a world where JAPAN AI STUDIO autonomously executes tasks such as approval workflows, resource allocation, and prospect discovery 24/7 for approximately 200 client companies, a platform uptime of 99.9% is the bare minimum. At the same time, optimizing inference and infrastructure costs in an environment where hundreds of workflows run concurrently — and improving developer experience — are critical demands.

If the Agent Harness Engineer is "the person who builds the engine," the Software Engineer (AI Platform) is "the person who builds the environment where the engine runs reliably." Kubernetes cluster design and operations, observability infrastructure, inference cost optimization, CI/CD pipeline development — this is a position that supports the entire infrastructure of "the brain of the enterprise" through the power of backend engineering.

 

Mission

"Support a world where 'the brain of the enterprise' never stops — 24/7, 365 days a year."

Design, build, and operate the shared foundation — backend services, execution environments, observability, and governance — that enables AI agents to operate safely, quickly, and reliably. Maximize the reliability and cost efficiency of the entire platform.

 

Role & Expectations

As a Software Engineer (AI Platform), you will power the reliability, performance, and cost efficiency of the entire AI platform through backend engineering.

  • Design, implement, and operate backend services while also optimizing Kubernetes clusters and cloud infrastructure
  • Design and build observability infrastructure (tracing, logging, metrics) to rapidly detect and resolve failures unique to AI agents
  • Deliver improvements with direct business impact through inference cost and infrastructure cost optimization
  • Maintain 99.9% uptime through SLI/SLO design and operations, on-call, and incident response
  • Improve developer experience for in-house engineers through CI/CD pipeline construction and development environment improvements

 

Why You'll Love This Role

  • At the intersection of Backend × Infrastructure — A new domain where you support the entire platform through the power of backend engineering.
  • Platform engineering for the AI era — Go beyond traditional infrastructure / SRE to tackle AI-specific challenges: inference cost optimization, GPU management, agent tracing, and more.
  • Large-scale cloud infrastructure design — Gain experience designing and operating large-scale distributed systems with Kubernetes, event-driven architectures, and autoscaling.
  • Cost optimization with real impact — Inference and infrastructure cost optimization directly translates to business impact. Improving $/request ripples across all products.
  • Powering every product — Support 99.9% uptime for a production environment used by ~200 companies. Every AI agent runs on the infrastructure you build.
  • Rapid-growth environment — In a startup that has grown to 200+ people and 9 products in just 3 years, you will have significant autonomy in technical decision-making.

 

Job Description

  • Backend Services & Platform Development
    • Design, implement, and operate backend services for the AI platform
    • Design, build, and operate Kubernetes clusters
    • Architect and optimize cloud infrastructure (GCP)
    • Codify and automate infrastructure with IaC (Terraform)
    • Cost/performance optimization (autoscaling, caching, batch processing, GPU management)
  • Observability & Governance
    • Design and build the observability stack (tracing, logging, metrics)
    • Implement AI agent-specific tracing (inference request tracking, tool call visualization)
    • Build data access and permission management infrastructure
    • Address security requirements
  • SRE & Reliability
    • Maintain platform uptime of ≥99.9%
    • Design and operate SLIs / SLOs
    • On-call, incident response, and post-mortems
    • Continuously improve incident MTTR
  • Developer Experience
    • Build and improve CI/CD pipelines
    • Maintain development and staging environments
    • Create and maintain infrastructure documentation for internal engineers

 

Example Scenarios

Scenario 1: Backend service optimization for the inference pipeline
A surge in inference requests degrades backend service latency. You analyze request patterns, redesign the caching strategy, and implement asynchronous processing in the backend service. Result: 40% improvement in P95 latency while reducing inference costs by 20%.

Scenario 2: Building the agent tracing infrastructure
Root-cause analysis for AI agent failures is taking too long. You design and implement an OpenTelemetry-based tracing infrastructure that visualizes the full flow from inference request → tool call → external API integration. Result: 50% reduction in MTTR.

Scenario 3: Cost optimization in a multi-tenant environment
In a multi-tenant environment serving ~200 concurrent customers, you build a dashboard that visualizes per-tenant resource consumption. By optimizing resource allocation based on usage patterns, you achieve a 15% improvement in infrastructure cost ($/request).
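The nested-span tracing described in Scenario 2 can be sketched conceptually. The snippet below is a minimal stand-in tracer, not JAPAN AI's actual code and not the OpenTelemetry SDK itself (which provides this machinery in production); the span names, attributes, and agent flow are illustrative assumptions. It shows the core idea: each step — inference request, tool call, external API call — records a span with its parent and duration, so a failed run can be traced end to end.

```python
# Conceptual sketch of nested-span tracing for an agent request.
# In production the OpenTelemetry SDK provides this; here a tiny
# stand-in tracer shows the idea: every step records a span with
# its parent and timing, reconstructing the full call chain.
import time
from contextlib import contextmanager

spans = []   # collected finished spans (exporter stand-in)
_stack = []  # current span ancestry

@contextmanager
def span(name: str, **attrs):
    record = {
        "name": name,
        "parent": _stack[-1]["name"] if _stack else None,
        "attrs": attrs,
        "start": time.monotonic(),
    }
    _stack.append(record)
    try:
        yield record
    finally:
        record["duration_s"] = time.monotonic() - record["start"]
        _stack.pop()
        spans.append(record)

# Hypothetical agent flow: inference request -> tool call -> external API.
with span("inference_request", tenant="tenant-042"):
    with span("tool_call", tool="crm_lookup"):
        with span("external_api", endpoint="/v1/accounts"):
            pass  # the actual HTTP call would happen here

for s in spans:
    print(s["name"], "<-", s["parent"])
```

Because spans are appended as they finish, the innermost (the external API call) appears first; walking the parent links reconstructs the chain that the scenario's tracing infrastructure would visualize.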

 

Key Results (KR/Metrics)

  • Platform uptime ≥ 99.9%
  • Agent execution latency P95 / P99
  • Infrastructure cost efficiency ($/request)
  • Developer experience score (internal NPS)
  • Incident MTTR ≤ target value
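For context, the 99.9% uptime target above implies a concrete error budget. This is generic SRE arithmetic, not internal tooling:

```python
# Error budget implied by an availability SLO:
# allowed downtime = (1 - SLO) * length of the period.
def error_budget_minutes(slo: float, period_days: int = 30) -> float:
    """Allowed downtime in minutes over the period for a given SLO."""
    return (1.0 - slo) * period_days * 24 * 60

# A 99.9% SLO over 30 days leaves ~43.2 minutes of downtime budget.
budget = error_budget_minutes(0.999, 30)
print(f"{budget:.1f} minutes per 30 days")  # -> 43.2 minutes per 30 days
```

In other words, the platform can afford well under an hour of total downtime per month — which is why the role emphasizes SLI/SLO design, on-call, and MTTR reduction.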

 

Team Structure

Approximately 120 members are part of the development organization.

  • Software Engineers (AI Platform) work across the following groups:
    • Infra — Cloud infrastructure and SRE
    • Data — Data pipelines and analytics infrastructure
    • Agent Harness — Agent execution framework
  • Closely collaborating roles:
    • Agent Harness Engineer — Agent execution infrastructure design and implementation
    • Agentic Product Engineer — Agent feature development
    • AI Quality Scientist — Evaluation pipeline collaboration
    • Product Manager — Product design and non-functional requirements definition

 

You May Be a Good Fit If You

  • Hold a bachelor's degree, or have equivalent practical experience, in Computer Science, Software Engineering, Artificial Intelligence, Machine Learning, Mathematics, Physics, or a related field
  • Have 3+ years of practical experience as a backend engineer
  • Have production product development experience in Python
  • Have design and operations experience on cloud platforms (AWS / GCP / Azure)
  • Understand Kubernetes / container orchestration and have operational experience with it
  • Have design and operations experience with distributed systems
  • Meet at least one of the language requirements:
    • Japanese: Fluent — able to discuss product development without friction
    • English: Business level

 

Strong Candidates May Also Have

  • IaC practical experience (Terraform / Pulumi, etc.)
  • GPU cluster operations and optimization experience
  • ML infrastructure / MLOps construction experience
  • AI workload operations experience (inference servers, model serving)
  • Event-driven architecture experience (Kafka / RabbitMQ, etc.)
  • SRE / DevOps practices (SLI / SLO design, Chaos Engineering, etc.)
  • Security engineering experience
  • Technical communication ability in English

 

Tech Stack

  • Languages: Python (backend); TypeScript / React / Next.js / NX (frontend)
  • Infrastructure: GCP (containers / K8s), Docker, Terraform
  • Messaging: Kafka / Pub/Sub
  • Monitoring: Prometheus, Grafana, OpenTelemetry
  • CI/CD: GitHub Actions
  • Tools: Slack, Confluence, Linear, Google Workspace, GitHub, Notion
  • AI Dev Support: Claude Code MAX Plan, Cursor, ChatGPT, Devin
  • Hardware: Mac (Apple Silicon), dual monitors

 

Learning & Development Support

  • AI Tool Usage Support: Company covers the cost of using AI tools such as JAPAN AI SaaS services, Cursor, ChatGPT, Claude, etc.
  • Development Tool Support: If a desired development tool is paid, the cost is covered (up to ¥30,000 per year)
  • Book Purchase Assistance: Company covers the cost of purchasing books for learning, such as technical books (up to ¥30,000 per half-year)
  • Language Learning / Qualification Support: Company covers the cost of Japanese or English learning programs and qualification acquisition
  • Refresh Allowance: Company covers the cost of services used for personal refreshment (up to ¥5,000 per month)
  • Housing Allowance: Housing allowance provided for those living in designated areas (up to ¥30,000 per month)

 

Hiring Process

  1. Application Review
  2. Coding Assessment
  3. Interviews (4–5 rounds)
  4. Offer

A reference check will be conducted prior to the final interview.

Geniee actively utilizes AI technology in the development of its in-house products.

With “GENIEE SFA/CRM” and “GENIEE CHAT”, users can automatically summarize meeting minutes using ChatGPT. Geniee provides AI-powered functions, such as automatic email drafting, that help customers improve business efficiency and productivity, along with implementation consulting, product provision, and other AI-related services.

To further promote research and development, Geniee established a new subsidiary, JAPAN AI, Inc., in April 2023, with the purpose of passing down Japan's traditions and using AI to increase the potential of businesses. JAPAN AI develops and provides AI products that improve the productivity of Japanese companies and revitalize industry, and conducts research on generative AI and large language models such as those behind ChatGPT.
