Job Description
The Data Platform Team provides the platforms that support data analytics for all services offered on the LINE app.
We are currently looking for software engineers in the following three Parts: Ingestion Part (collects and aggregates data), Platform Dev Part (develops applications related to data analysis), and Hadoop Part (develops proprietary Hadoop clusters).
Our Mission
Ingestion Part
- Develop next-generation data processing systems and pipelines for the LINE messaging platform
- Design and build distributed data processing systems using technologies such as Kafka, Elasticsearch, and Hadoop
- Build a large-scale data pipeline for real-time data analysis
- Design and build an automated and reliable system in order to provide highly available services
Platform Dev Part
- Work closely with other teams (e.g. Ingestion Part and Hadoop Part) to develop tools for distributed systems in order to support the tasks of internal planners and data scientists
- Develop API servers to connect systems
- Revamp and operate the above tools and services
Hadoop Part
- Develop a proprietary Hadoop distribution and its ecosystem in order to stably and efficiently operate vanilla Hadoop clusters consisting of a few thousand servers
- Develop a security system to safely maintain confidential data
- Automate operations and build monitoring systems to further refine the platform, rather than using Hadoop out of the box
About the Team
The Data Platform Team is responsible for collecting data from LINE services, formatting it for analysis, and providing the formatted data as a tool on its distributed system. Members of this team are expected to communicate and work closely with various departments, including service developers, Hadoop engineers, and data scientists. The team also communicates frequently with colleagues in both domestic and overseas offices. It consists of members of different nationalities; communication is mainly in Japanese, with occasional use of English. As with other departments, English is used in documentation and chat conversations. This team develops a wide range of tools for distributed systems (e.g. Spark and Presto) as well as more traditional tools for batch processing, admin screens, etc. Refer to the following for specific applications and peripheral systems developed by the Data Platform Team.
Announced at LINE Developer Day 2018 (November 2018)
Presented at LINE Developer Meetup in Tokyo #27 (January 2018)
Responsibilities
Ingestion Part
- Design, develop, and operate the data pipeline platform
- Design and develop a platform (e.g. Fluentd and Filebeat) to collect a large amount of data from servers and clients on the LINE platform
- Design and build large-scale ETL platforms (e.g. Kafka and Flink) that format, process, and save collected logs to storage such as Hadoop and Elasticsearch
- Maintain the above platforms and provide user support
- Design and develop a system to provide the above log collection platforms and ETL platforms as internal SaaS
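To give a feel for the ETL work described above, here is a minimal sketch of a transform step that turns a raw log line into a structured record and filters out malformed input. The tab-separated layout and field names are illustrative assumptions, not LINE's actual log format or pipeline code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Sketch of one ETL transform step: parse a raw, tab-separated log line
// into a structured record, dropping malformed lines. The field layout
// (timestamp, service, message) is a hypothetical example.
public class LogTransform {
    public static Optional<Map<String, String>> parse(String rawLine) {
        String[] parts = rawLine.split("\t");
        if (parts.length != 3) {
            return Optional.empty(); // malformed line: filtered out of the pipeline
        }
        Map<String, String> record = new LinkedHashMap<>();
        record.put("timestamp", parts[0]);
        record.put("service", parts[1]);
        record.put("message", parts[2]);
        return Optional.of(record);
    }

    public static void main(String[] args) {
        System.out.println(parse("2024-01-01T00:00:00Z\tmessaging\tlogin ok"));
        System.out.println(parse("garbage"));
    }
}
```

In a production pipeline this kind of step would typically run inside a Flink job or Kafka Streams topology rather than as a standalone class.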
Platform Dev Part
- Work with planners to design, develop, and operate tools that fulfill internal requirements
- Provide user support
- Work with other teams to conduct troubleshooting when necessary
Hadoop Part
- Develop tools to operate and monitor clusters
- Develop LINE's proprietary Hadoop ecosystem
- Fix bugs in Hadoop and in other software across the Hadoop ecosystem
Current Product Phase/Exciting Challenges/Opportunities
The analytics platform offered by the Data Platform Team has completed the prototype phase and is now used by internal employees. The current focus is to develop new features, ensure scalability, enhance reliability, and automate operations in order to respond to various internal demands. Developing new features is not limited to simply adding them; when needed, the team also takes on large-scale development to rebuild a feature from scratch. This means that you will have the opportunity to work with our large-scale data system from top to bottom.
Tools/Development Environments
- Streaming systems - Kafka/Flink
- Data collection tools - Fluentd/Filebeat/Logstash/Elasticsearch
- Hadoop ecosystems - HDFS/YARN/Hive/Presto/Spark/HBase
- Operating/monitoring tools - Kubernetes/Ansible/Grafana/Prometheus + Promgen/imon (internal monitoring tools)
- BI - Tableau
- Development environments - IntelliJ/Eclipse/GitHub/Jenkins
- Programming languages - Java/Kotlin/Scala/Python
Qualifications
- BS/MS degree or higher in Computer Science or Informatics (or equivalent work experience)
- Strong fundamental knowledge of computer science, including data structures, algorithm design, and complexity analysis
- At least 3 years of hands-on software development experience with Java
- Experience with concurrent/multi-threaded programming
- Experience with development and system operation in Linux/Unix environments
- Ability to set up machines using provisioning tools, such as Ansible
- Ability to set up monitoring
Preferred Qualifications
- Technical knowledge and competency to analyze, debug, and optimize large-scale data middleware, such as Kafka and Hadoop
- Ability to design, analyze, and solve problems in large-scale or distributed systems
- Experience designing data pipeline platforms
- Understanding of distributed data pipeline delivery semantics (e.g. at-least-once and exactly-once) and the ability to build systems that provide them
- Proficiency with data analytics engines, including Elasticsearch, Hadoop, Spark, and Presto
- Proficiency with data collection tools, including Fluentd, Embulk, and Filebeat
- Experience developing or operating frameworks and platforms
- Experience building a system leveraging container-related technologies, such as Kubernetes
- Knowledge of relational database engines
- Experience with provisioning mission critical, 24/7 systems
- Experience with troubleshooting/tuning JVM GC
- Experience with Maven, Gradle, and Git
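The delivery-semantics qualification above is worth illustrating: under at-least-once delivery, a broker may redeliver a message, so achieving exactly-once *processing* requires the consumer to deduplicate. The sketch below shows the idea in its simplest form; real systems (e.g. Kafka transactions or Flink checkpointing) are far more involved, and the class and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Conceptual sketch: a consumer that turns at-least-once delivery into
// exactly-once processing by tracking already-processed message IDs.
public class DedupConsumer {
    private final Set<String> seen = new HashSet<>();
    private final List<String> processed = new ArrayList<>();

    // Returns true if the message was newly processed, false if it was a duplicate.
    public boolean process(String messageId, String payload) {
        if (!seen.add(messageId)) {
            return false; // redelivery: skip so side effects happen exactly once
        }
        processed.add(payload);
        return true;
    }

    public List<String> results() {
        return processed;
    }
}
```

In practice the seen-ID state would itself need to be persisted atomically with the processing results, which is exactly what makes exactly-once semantics hard in distributed pipelines.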
Compensation
Annual salary system (to be determined based on skills, experience, and abilities after discussions)
- Annual compensation will be divided into 12 months and paid on a monthly basis.
- Separate incentives available (*1)
- Compensation revision: twice a year
- Allowances: commuting allowance, LINE Pay Card Benefit Plan (*2)
(*1) In addition to your annual compensation, you may receive incentives (twice a year) depending on the company's performance and an evaluation of your individual performance. (Incentives are not guaranteed. An incentive is paid only if you remain employed as of the payment date.)
(*2) This is an allowance, separate from salary, for employees to use toward their health, personal development, support for raising the next generation, and more.
About LINE
LINE is the most popular communication app in Japan, Thailand, and Taiwan. The platform continues to grow rapidly throughout Asia, with services expanding around the globe.
Under our corporate mission of "Closing the Distance," we strive to bring people around the world closer to each other, to information, and to services.
Our vision is to become the “life infrastructure” for our users, always ready to fulfill their needs, 24 hours a day, 365 days a year.
With mobile-focused projects in a wide variety of areas including communication, content, and entertainment, LINE is expanding into development, operations, advertising, fintech, AI, blockchain, and more.
GLOBAL SERVICE
LINE is a global development organization with development centers in more than seven countries. Building on our messenger service, we develop and operate a variety of services, including fintech, news, games, and music.
LARGE SCALE
LINE is a service with more than 185 million MAUs, handling real-time traffic from this user base across a variety of services.
WORK WITH BEST TALENTS
LINE brings together top engineers in each field to develop the best services, fostering individual growth within a global engineering culture.