[email protected] / Blog / GitHub / LinkedIn / hyojun.me


Interests


Work Experience

Software Engineer, Tech Leader @ NAVER Cloud

Jan 2023 – Present

(Moved to NAVER Cloud in January 2023 when my team transferred there.)

Designed and developed a Kubernetes-based in-house ML training platform and led the launch of a public cloud product. Contributed extensively to platform architecture, API design and implementation, Kubernetes, and infrastructure. The ML training platform supports HPC infrastructure for large-scale models, maximizes GPU utilization, and allocates GPU resources efficiently where they are needed. It also makes distributed training easy for researchers to run.

Supported automation work to make the HyperCLOVA (LLM) model training process more efficient, and resolved infrastructure, OS, and software issues. Also designed and implemented the EventBus system to integrate the various systems in the ML pipeline.

Carried out various SRE-related tasks, such as introducing GitOps through Helm packaging and ArgoCD, migrating from Prometheus to a VictoriaMetrics-based distributed monitoring system, and promoting SRE culture.

Strengthened the team's technical capabilities through code reviews, design feedback, mentoring, and multiple technical sharing sessions. Launched a “Kubernetes deep dive” training course for all developers in the company.

Software Engineer @ NAVER

Jun 2021 – Dec 2022 (1 year 7 months)