Education

Master of Science in Robotics

Aug 2020 - May 2022
Carnegie Mellon University

Advisor: Prof. Jeff Schneider; fully funded research assistantship; GPA: 3.8/4.0; research topics: DL & RL.

Bachelor of Science in Computer Sciences

Sep 2017 - May 2020
University of Wisconsin-Madison

Dean’s List; GPA: 3.96/4.0; graduated with distinction.

Experience

Software Engineer, LLM/VLM

Oct 2024 - Present
Waymo, Mountain View, California
  • Working on perception scene understanding.

Engineer, Deep Perception

Sep 2023 - Oct 2024
Bot Auto, Houston, Texas
  • Designed and implemented a single perception foundation model for end-to-end video 2D & 3D detection, LiDAR semantic segmentation, camera-LiDAR fusion, traffic light & vehicle light detection, and tracking & prediction.
  • For efficiency, the model does not use a Bird’s-Eye-View (BEV) feature representation; it is fully sparse and can detect objects within a ±400 m range.
  • Conducted self-supervised pre-training of a unified multi-modal Transformer backbone using MAE for camera images and Voxel-MAE for LiDAR point clouds. Adopted mixed-precision training for numerical stability when handling 12-bit raw images. Used multi-stage fine-tuning to orchestrate multi-task learning.
  • Collaborated with experienced team members to implement custom TensorRT plugins and optimize TensorRT engine performance.

Research Engineer, Deep Perception

Jun 2022 - Sep 2023
TuSimple Inc., San Diego, California
  • Upgraded the monocular 3D object detection module (for vehicles and vulnerable road users) on the autonomous trucks.
  • Devised a novel attention mechanism that improved the yaw prediction accuracy of the existing monocular 3D pedestrian detection module by 20% while halving the model size and maintaining the same inference time.
  • Designed and built a BEV 3D object detection and segmentation model with multi-modal inputs, including raw multi-camera images, perspective-view image segmentation results, LiDAR occlusion and occupancy grids, and LiDAR segmentation results.
  • Worked on the Transformer-based motion prediction project. Devised a novel multi-scale feature representation for map context and LiDAR voxels. Designed a velocity-based local attention mechanism that helps agents better understand their surroundings. On the Waymo Open Dataset leaderboard, our model, MGTR, achieved 1st place in overall prediction mAP and soft mAP.

Machine Learning Engineer Intern, Deep Perception

May 2021 - Aug 2021
TuSimple Inc., San Diego, California
  • Initiated the Emergency Vehicle (EV) Detection project. Built the general codebase and baseline models.
  • Tackled the siren recognition task; designed pipelines for sound event detection and sound source localization.
  • Revealed flaws in previous state-of-the-art models through Grad-CAM visualization and heatmap analysis.
  • With performance-boosting add-on modules, the final model achieved 99% detection accuracy and a 10% gain in localization precision & recall over other state-of-the-art methods.

Computer Vision Research Intern

Sep 2018 - Jun 2019
SenseTime Group Ltd., Shenzhen & Hong Kong
  • Worked on a broad range of research topics, including DL acceleration, data denoising, self-/weakly-supervised learning, AutoML, image restoration, and DNN black-box visualization & interpretation.
  • Improved the company’s PyTorch-based classification & detection frameworks by adding new features and by auditing and refactoring the existing codebase.
  • Explored the limits of large-scale weakly-supervised training on the WebVision dataset (16 million samples), achieving 2nd place in the WebVision 2019 Challenge at CVPR ’19, hosted by the Computer Vision Laboratory, ETH Zurich.
  • Submitted two papers as first author: one to ACM Multimedia on image segmentation & super-resolution, and one to AAAI on NAS, detection & GCNs.

Selected Projects

CARLA Self-driving via Reinforcement Learning
Auton Lab, The Robotics Institute, CMU

A research project sponsored by Argo AI and advised by Prof. Jeff Schneider.

  • Designed asynchronous multi-agent RL algorithms (distributed PPO & SAC) for autonomous driving in CARLA.
  • Boosted RL training speed by 40x while improving agent performance by 10%.
  • Achieved a total driving score of 80.64 on the CARLA Leaderboard evaluation routes, assuming access to a perfect detection model.
Apple E-Waste Recycling
Biorobotics Lab, The Robotics Institute, CMU

A research & engineering project sponsored by Apple Inc. Work in progress.

  • Designed the phone recognition pipeline and generated a photorealistic synthetic dataset from phone CAD models.
  • Improved top-1 phone classification accuracy by 6 percentage points (from ~90% to 96%).
Learning-based Image Synthesis
16-726, The Robotics Institute, CMU

Published project websites for 5 assignments and 1 final project:

  • Colorizing the Prokudin-Gorskii Photo Collection;
  • Gradient Domain Fusion;
  • When Cats meet GANs;
  • Neural Style Transfer;
  • GAN Photo Editing (best project finalist);
  • Image Generation via Independent Semantic Synthesis.
Learning for 3D Vision
16-889, The Robotics Institute, CMU

The course covers explicit, implicit, and neural 3D representations, differentiable rendering, neural rendering, mesh and point cloud processing, radiance fields, multi-plane images, and implicit surfaces. Will publish websites for 5 assignments and 1 final project; 2 are complete so far. The assignments are:

  • Rendering Basics with PyTorch3D;
  • Single View to 3D;
  • Volume Rendering and Neural Radiance Fields;
  • Neural Surfaces;
  • Point Cloud Classification and Segmentation.
Parallelized Password Cracker via CUDA
CS759, College of Engineering, UW-Madison

A CUDA application that cracks both simple passwords and hashed passwords (e.g., MD5).

Human Pose Estimation via Synthetic Data
Department of Computer Science, UW-Madison

A research project advised by Prof. Yin Li.

  • Improved 2D & 3D human pose estimation in real/synthetic video using SMPL models.
  • Modeled real-world noise, variation & occlusion for better 3D joint predictions.
Large-scale YouTube Video Semantic Analysis
Department of Life Sciences Communication, UW-Madison

A research project funded by Prof. Kaiping Chen.

  • Applied large-scale data mining and deep learning to understand scientific videos on YouTube.
  • The work was accepted at a top-tier conference in the communication & media field.
pytorl: PyTorch Toolbox for Reinforcement Learning
Personal Repository on GitHub

Implemented 4 DQN-family algorithms (DQN and its variants) and a distributed DQN learner, similar to Google’s Gorila, using a parameter-server architecture.

Publications

  • Multi-Granular Transformer for Motion Prediction with LiDAR
    Yiqian Gan, Hao Xiao, Yizhe Zhao, Ethan Zhang, Zhe Huang, Xin Ye, Lingting Ge
    In Proceedings of the 2024 IEEE International Conference on Robotics and Automation (ICRA 2024)
  • Addressing Optimism Bias in Sequence Modeling for Reinforcement Learning
    Adam Villaflor, Zhe Huang, Swapnil Pande, John Dolan, Jeff Schneider
    In Proceedings of the 39th International Conference on Machine Learning (ICML 2022)
  • Distributed Reinforcement Learning for Autonomous Driving
    Zhe Huang
    Tech. Report CMU-RI-TR-22-09, The Robotics Institute, CMU (Master’s Thesis)
  • Gradual Network for Single Image De-raining
    Zhe Huang*, Weijiang Yu*, Wayne Zhang, Litong Feng, Nong Xiao
    In Proceedings of the 27th ACM International Conference on Multimedia (MM 2019) [Oral]