Personal Information


Summary


AI Engineer and Platform Architect with 10+ years of experience building production systems.

  • Currently focused on LLM-powered agents, voice AI, and developer tooling
  • Kubeflow community member with contributions to auth, security (CVE fixes), and platform reliability
  • Built real-time voice agents in Rust, AI coding agents in Go, and TTS integrations
  • Deep infrastructure background: Kubernetes, MLOps platforms serving 100+ teams
  • Lifelong learner and educator — OS kernel video series on BiliBili

Experience


SAP — AI Platform Engineer (Kubeflow, MLOps) 2019.12 - Present

Keywords: Open Source, MLOps, AI Platform, Kubernetes

  • Architected and maintained a multi-tenant ML training platform serving 100+ internal teams, built on Kubeflow, Kubernetes, and Kueue.
  • Designed the platform’s OAuth2 authentication layer and contributed the solution back to the Kubeflow community (kubeflow/manifests PRs #2884, #2862, #2656).
  • Built Infrastructure as Code pipelines using Helm and ArgoCD for reproducible deployments.
  • Designed and implemented a Kubernetes-native queueing system (Kueue + PyTorchJob) for GPU resource scheduling.
  • Contributed security fixes to Kubeflow Pipelines: SSRF prevention (CVE-2023-6570, PR #13126), SQL injection fix (PR #13127), and gRPC configuration improvements (PR #12438).
  • Integrated kfp-sdk with Jenkins CI/CD, enabling data scientists to deploy ML pipelines via pull request.

Keywords: Cloud Architecture, CI/CD, Docker

  • Led the company-wide containerization initiative — dockerized all production services and redesigned the service architecture for cloud-native deployment.
  • Rebuilt the CI/CD pipeline with Jenkins (modeled after the Moby project workflow), shortening release cycles and improving deployment reliability.
  • Migrated infrastructure provisioning to Terraform, replacing manual AWS operations with reproducible Infrastructure as Code.
  • Rewrote Ansible playbooks for provisioning and deployment, standardizing the team’s automation workflow.

Strikingly — Platform Engineer 2018.03 - 2019.06

Keywords: Public Cloud, Infrastructure as Code, Monitoring

  • Managed the full AWS infrastructure with Terraform (VPC, NAT gateway, auto-scaling groups, security groups) and maintained GitLab CI pipelines for automated deployments.
  • Built an end-to-end monitoring and observability stack using Prometheus, Grafana, and Elasticsearch.
  • Developed serverless data pipelines with AWS Lambda (Python/Go), including blockchain data scraping into Elasticsearch.
  • Maintained a production Kubernetes cluster on Tencent Cloud for a gaming workload.

eHi Car Rental — Systems Engineer 2015.07 - 2018.03

Keywords: Private Cloud, Log Analytics, Monitoring

  • Designed and deployed a centralized Elastic Stack (ELK) for log aggregation across 500+ Windows and Linux instances — replacing manual log analysis.
  • Built Python automation for daily data extraction and real-time alerting with ElastAlert; reduced incident detection and resolution time from 1 hour to 5 minutes.
  • Maintained Zabbix, Cacti, and Piwik for infrastructure and application monitoring.
  • Contributed ElastAlert Chinese documentation to the open source community.

Power Dekor — IT Engineer 2012.02 - 2015.06

  • Designed and built corporate network infrastructure supporting 100+ clients.

Open Source & Projects


AI & Agent Projects

  • Car Agent (Private) — Real-time voice-interaction AI agent for in-car scenarios. Built with Rust (core agent + relay server) and iOS (SwiftUI client); integrated with LLM providers, TTS (Kokoro), and STT (FunASR streaming). Features a WebSocket-based relay, tool orchestration, and a streaming terminal viewer.
  • Cos — AI coding agent implemented in Go with a Bubbletea TUI. Full-featured terminal-based coding assistant.
  • Kokoros — Kokoro TTS model ported to Rust for real-time, high-quality text-to-speech inference with low-latency optimization.
  • OptiTranslate — macOS menu-bar AI translator (Swift). Opt+Space to translate selected text, saves results to Markdown.
  • FingerSaver (Private) — Multi-agent terminal manager. Split-pane TUI for orchestrating multiple coding agents simultaneously.
  • SIN (Internal, SAP) — Agentic CLI for infrastructure operations, built on Backstage as the backend platform. Enables natural-language-driven infrastructure management across Kubeflow and Kubernetes environments.

Kubeflow Community Contributions

CS Education

  • CS Videos — Operating systems and computer science video series on BiliBili. Topics include kernel internals, memory management, API Gateway design, eBPF, HTTPS internals, and universal hashing. Animations built with Manim (Python).
  • Binary Bomb Lab Guide — Walkthrough for CSAPP’s binary bomb exercise, with companion video series.

Technical Skills


  • Programming Languages: Rust, Go, Python, Swift, Groovy, Shell
  • AI/ML: LLM Integration, Agent Orchestration, Prompt Engineering, PyTorch, Kubeflow Pipelines, TTS/STT Systems
  • Platform: Kubernetes, Docker, Helm, ArgoCD, Terraform, Ansible, Jenkins
  • Cloud: AWS, Alibaba Cloud, Tencent Cloud
  • Observability: Prometheus, Grafana, Elasticsearch, Kibana, ElastAlert
  • Spoken Languages: English (fluent), 中文 (native)

Education


  • 2012 — Computer Application and Technology (2009 - 2012)