Infrastructure Engineer

  • Osaka
  • Partial Remote
  • Full-time
  • August 21, 2024
Conditions
¥6M - ¥7.5M /yr
Apply from Anywhere 👍
Relocation to Japan 👍
(Overseas visa sponsorship supported)
Requirements
Language Requirements
Japanese: Not Required 👍
English: Business Level
Minimum Experience
Mid-level or above

Company description:

Rokken is a software development company that provides customized solutions to meet its clients' needs. Our expertise primarily focuses on designing innovative applications in the fields of machine learning and real-time 3D visualization, combined with experience in the medical sector.

 

Job offer description:

We are seeking a skilled Infrastructure Engineer to design, implement, and manage our growing in-house computational infrastructure. This is an exciting opportunity to shape the future of our technical capabilities from the ground up. The ideal candidate will have extensive experience with distributed computing systems, cloud technologies, and infrastructure management. This role is crucial in enhancing our computational capabilities for machine learning, software development, and other resource-intensive tasks.

 

Current Infrastructure and Growth Plans:

We currently have around 25 PCs in our cluster, with plans for significant expansion. As our first Infrastructure Engineer, you'll have the unique opportunity to design and implement scalable systems that will grow with our company.

 

Language Requirements

  • English: Business Level (required)
  • Japanese: Not required

 

Minimum Experience

  • Mid-level or above

 

In this role, you will:

  • Design, implement, and manage our in-house computational cluster, planning for future growth.
  • Optimize resource allocation and utilization across our growing number of PCs.
  • Set up and maintain distributed computing systems for ML training and other tasks.
  • Implement and manage containerization and orchestration solutions.
  • Establish and maintain a distributed file system for efficient data storage and access.
  • Configure and manage various development and operational tools.
  • Implement monitoring, backup, and security solutions for the cluster.
  • Set up and manage web services and ML backends.
  • Collaborate closely with ML engineers and software developers to understand their needs and design optimal solutions.
  • Document all systems, processes, and best practices.
  • Participate in backend development projects, particularly related to running ML tasks on the cluster.

 

Minimum Qualifications:

  • 3+ years of experience in infrastructure management or similar roles.
  • Strong Linux administration skills.
  • Proficiency with containerization technologies (e.g., Docker).
  • Experience with distributed computing systems and workload managers.
  • Excellent problem-solving and analytical skills.
  • Strong documentation and communication skills.
  • Proficiency with version control systems, especially Git.

 

Preferred Qualifications:

  • Experience with cluster management tools (e.g., Slurm, Kubernetes).
  • Knowledge of distributed file systems (e.g., Ceph).
  • Familiarity with CI/CD pipelines and self-hosted runners.
  • Experience with network management and VPN solutions (e.g., Tailscale).
  • Knowledge of monitoring and logging solutions for distributed systems.
  • Experience with package management systems (e.g., pip, vcpkg).
  • Familiarity with backup strategies for large-scale systems.
  • Experience with hybrid cloud-local setups.
  • Familiarity with cloud technologies and services.
  • Backend development experience, particularly in the context of ML workloads.

 

Relocation Support for Overseas Candidates

  • Assistance with visa sponsorship and application.
  • Fully furnished apartments available on demand.
  • Support for daily life in Japan:
    • Registering at the city office and opening a bank account.

 

Trial period: 

  • Three months

 

Remote work policy:

  • While remote work is allowed, the nature of cluster management tasks means we expect the successful candidate to live close to our office.
  • Initial Phase:
    • On-site work at our office location.
  • Following Initial Assessment:
    • Flexible work arrangements possible, with regular on-site presence required.

If you're passionate about building and managing high-performance computing environments and want to play a crucial role in shaping the technological future of a growing company, we'd love to hear from you!
