Machine Learning Engineer

Tokyo
Remote OK - Anywhere in Japan
Full-time
June 9, 2023

Treasure Data builds a Programmable Platform to efficiently enable and scale Customer-centric Data Platform applications across a range of verticals, from automotive to CPG and even finance.

The Machine Learning team is relatively new, and its initial goal is to productize an Automated Machine Learning (AutoML) platform. The team works closely with the Backend Workflow team within the Core Services group to deliver ML pipelines as a service for all customers. Additional ML solutions already on our roadmap include Text Analysis, Causal Analysis/Discovery, Explainable AI (XAI), Uplift modeling, and Exploratory Data Analysis (EDA).

Delivering our ML products as a scalable cloud service requires diverse knowledge and experience in both data science and software engineering. Java APIs for our container platform (AWS ECS), AWS cloud API programming, Python ML libraries, SQL query processing on large datasets, Pandas DataFrame processing for feature engineering, and AWS infrastructure management with Terraform are all areas you will work with directly in the ML team at Treasure Data.

This position is ideal for those who have not only Data Science and Machine Learning skills but also conventional cloud engineering skills for developing, deploying, and operating these critical ML products. Candidates are expected to understand machine learning algorithms, have experience in data analysis, and want to grow as software engineers in cloud computing environments.

Success in this role requires a passion for developing and productizing ML products, a strong interest in and knowledge of data science, and solid software engineering experience.

You will collaborate with others in a self-organized team to achieve our shared goals, pursuing autonomy with ownership while building trust and sustainability so that we continuously evolve together. You should be able to communicate ideas, software system designs, implementations, and decisions clearly and concisely.

Required Experience

A BS in Computer Science or a related field, or equivalent work experience.
3+ years of software development experience.
Hands-on experience building and maintaining machine learning products or services.
Strong Python coding skills, plus experience with a statically typed language such as Java or C++ (not only scripting).
Strong data science knowledge, including state-of-the-art ML models, libraries, and techniques.
Experience with SQL query processing and Pandas DataFrame API programming.
Industry experience using public cloud IaaS providers like AWS.
Ability to quickly pick up new technologies and company standards.
Understanding of software development life cycle practices such as mocking, unit testing, and CI (CircleCI, GitHub Actions).

Preferred Experience

Experience with AutoML frameworks such as AutoGluon, H2O AutoML, PyCaret, and FLAML.
Experience in Explainable AI (XAI), such as SHAP and LIME.
Experience working with big data technologies such as Hadoop, Hive, Presto, Spark, BigQuery, and Redshift.
Experience, ideally award-winning, in data science competitions such as Kaggle.
Open-source (OSS) contributions.
Strong EDA skills using Pandas and notebooks, for example for feature engineering.
Experience with container orchestration platforms such as AWS ECS/EKS and Kubernetes.
Familiarity with Infrastructure as Code using Terraform or CloudFormation.
Demonstrated ability to work collaboratively in cross-functional teams and a strong track record of delivery as part of a team.
Familiarity with security best practices, including Security Groups, IAM, and networking.
Experience with distributed teams across different time zones.

Your Duties Will Include

Work with product managers and engineering colleagues to define and deliver new ML products.
Continuously learn new ML algorithms and techniques.
Work with distributed development teams to operate ML as a Service, including participating in on-call rotations.
Proactively and continuously improve existing systems and processes together with team members.

More About Treasure Data and Core Services

We design, build, and operate a distributed and dynamically programmable orchestration system that controls everything from SQL queries against our multi-tenant data lake to customer-specified code (Python and more) in serverless environments. Fronted by Ruby on Rails APIs and backed by priority queues and process supervisors, this layer is responsible for managing all customer data operations.

To power these operations, we self-host and operate distributed SQL engines (Trino, Hive), likewise in a multi-tenant environment, to process both customer- and machine-generated queries. We self-host these engines so that we can deeply integrate data governance features, from basic access control to sophisticated PII and GDPR requirements.

The data lake at the foundation of all of this is built with first-class governance facilities, and it adaptively schedules and performs continuous optimization of all data in its care. It is fed by streaming and micro-batch ingestion layers (100k+ events/sec) that also provide custom in-stream processing, specified in a sandboxed environment. Because the data lake is built on a dynamically typed (schema-on-read) block store, it presents unique indexing and optimization challenges to solve.

Who We Are:

Treasure Data employees are enthusiastic, data-driven and customer-obsessed. Our actions reflect our values of honesty, reliability, openness and humility. Treasure Data moved to remote-based work in March 2020 and is committed to remaining agile to accommodate the shifting preferences of its workforce. While we are not working shoulder-to-shoulder, we still work side-by-side, finding unique ways to connect and create together while also respecting each other’s life priorities outside of work. We offer a competitive salary and benefits and were named one of the 2021 Best Places to Work. Treasure Data is an equal opportunity employer dedicated to building an inclusive and diverse workforce. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.

What We Do:

Treasure Data is the only enterprise Customer Data Platform (CDP) that harmonizes an organization’s data, insights, and engagement technology stacks to drive relevant, real-time customer experiences throughout the entire customer journey. Treasure Data helps brands give millions of customers and prospects the feeling that each is the one and only. With its ability to create true, unified views of each individual, Treasure Data CDP is central for enterprises who want to know who is ready to buy, plus when and how to drive them to convert. Flexible, tech-agnostic and infinitely scalable, Treasure Data provides fast time to value even in the most complex environments.

About Treasure Data

Treasure Data is a best-of-breed enterprise customer data platform (CDP) that powers the entire business to shape customer-centricity in the age of the digital customer. We do this by connecting all data into one smart customer data platform, uniting teams and systems to power purposeful engagements that drive value and protect privacy for every customer, every time. Trusted by leading companies around the world, Treasure Data customers span the Fortune 500 and Global 2000 enterprises.