Education

Master of Science in Robotics

Aug 2020 - May 2022
Carnegie Mellon University

Advisor: Prof. Jeff Schneider; Fully funded research assistantship; GPA: 3.8/4.0; Research topics: DL & RL.

Bachelor of Science in Computer Sciences

Sep 2017 - May 2020
University of Wisconsin-Madison

On Dean’s List; GPA: 3.96/4.0; Graduated with Distinction.

Experiences

Research Engineer, Deep Perception

June 2022 - Present
TuSimple Inc., San Diego, California
  • Upgraded the monocular 3D object detection module (for vehicles and vulnerable road users) on the autonomous driving truck.
  • Devised a novel attention mechanism that improves the yaw prediction accuracy of the current monocular 3D pedestrian detection module by 20% while reducing the model size by 50% and maintaining the same inference time.
  • Designed and built a BEV 3D object detection and segmentation model with multi-modal input sources, including multi-cam raw images, perspective-view image segmentation results, LiDAR occlusion and occupancy grids, and LiDAR segmentation results.
  • Worked on a Transformer-based motion prediction project. Devised a novel multi-scale feature representation for map context and LiDAR voxels. Designed a velocity-based local attention mechanism that helps agents better understand their surrounding environment.

Machine Learning Engineer Intern, Deep Perception

May 2021 - Aug 2021
TuSimple Inc., San Diego, California
  • Initiated the Emergency Vehicle (EV) Detection project. Built the general codebase and baseline models.
  • Tackled the siren recognition task by designing pipelines for sound event detection and sound source localization.
  • Revealed the flaws of previous state-of-the-art models by Grad-CAM visualization and heatmap analysis.
  • With performance-boosting add-on modules, the finalized model achieved 99% detection accuracy and a 10% gain in localization precision & recall compared with other state-of-the-art methods.

Computer Vision Research Intern

Sep 2018 - June 2019
SenseTime Group Ltd., Shenzhen & Hong Kong
  • Focused on a broad range of research topics, such as DL acceleration, data denoising, self-/weakly-supervised learning, AutoML, image restoration and DNN black-box visualization & interpretation.
  • Improved the company’s PyTorch-based classification & detection frameworks by adding new features as well as inspecting and refactoring the existing codebase.
  • Explored the limits of large-scale weakly-supervised training on the WebVision dataset, which consists of 16 million samples, achieving 2nd place in the WebVision 2019 Challenge at CVPR '19, hosted by the Computer Vision Laboratory, ETH Zurich.
  • Made two paper submissions as the 1st author, one for ACM Multimedia involving image segmentation & super-resolution, and another for AAAI related to NAS, detection & GCN.

Selected Projects

CARLA Self-driving via Reinforcement Learning
Auton Lab, The Robotics Institute, CMU

A research project sponsored by Argo AI and advised by Prof. Jeff Schneider.

  • Designed asynchronous multi-agent RL algorithms (distributed PPO & SAC) for autonomous driving on CARLA.
  • Boosted the RL training speed by 40 times while improving the agent performance by 10%.
  • Achieved a total driving score of 80.64 on the CARLA Leaderboard evaluation routes, assuming access to a perfect detection model.
Apple E-Waste Recycling
Biorobotics Lab, The Robotics Institute, CMU

A research & engineering project sponsored by Apple Inc. Work in progress.

  • Designed the phone recognition pipeline and generated a photorealistic synthetic dataset from phone CAD models.
  • Improved the phone classification accuracy by 6% (top-1 acc. boosted from ~90% to 96%).
Learning-based Image Synthesis
16-726, The Robotics Institute, CMU

Published websites containing 5 assignments and 1 final project, which are

  • Colorizing the Prokudin-Gorskii Photo Collection;
  • Gradient Domain Fusion;
  • When Cats meet GANs;
  • Neural Style Transfer;
  • GAN Photo Editing (best project finalist);
  • Image Generation via Independent Semantic Synthesis.
Learning for 3D Vision
16-889, The Robotics Institute, CMU

This course covers topics including explicit, implicit, and neural 3D representations, differentiable rendering, neural rendering, mesh and point cloud processing, radiance fields, multi-plane images, implicit surfaces, etc. Websites containing 5 assignments and 1 final project will be published; 2 of them are complete so far. The assignments are

  • Rendering Basics with PyTorch3D;
  • Single View to 3D;
  • Volume Rendering and Neural Radiance Fields;
  • Neural Surfaces;
  • Point Cloud Classification and Segmentation.
Parallelized Password Cracker via CUDA
CS759, College of Engineering, UW-Madison

A CUDA application that can crack both simple passwords & hashed passwords (e.g. MD5).

Human Pose Estimation via Synthetic Data
Department of Computer Science, UW-Madison

A research project advised by Prof. Yin Li.

  • Improved 2D & 3D human pose estimation in real/synthetic video using SMPL models.
  • Modeled real-world noise, variation & occlusion for better 3D joint predictions.
Large-scale YouTube Video Semantic Analysis
Department of Life Sciences Communication, UW-Madison

A research project funded by Prof. Kaiping Chen.

  • Performed large-scale data mining & applied deep learning to understand scientific videos on YouTube.
  • This work has been accepted at a top-tier conference in the communication & media area.
pytorl: PyTorch Toolbox for Reinforcement Learning
Personal Repository on GitHub

Implemented 4 DQN-family algorithms (DQN and its variants) & a distributed DQN learning algorithm similar to Google's Gorila, using a parameter server architecture.

Publications

  • Addressing Optimism Bias in Sequence Modeling for Reinforcement Learning
    Adam Villaflor, Zhe Huang, Swapnil Pande, John Dolan, Jeff Schneider
    Proceedings of the 39th International Conference on Machine Learning (ICML 2022)
  • Distributed Reinforcement Learning for Autonomous Driving
    Zhe Huang
    Tech. Report, CMU-RI-TR-22-09, The Robotics Institute, CMU (Master's Thesis)
  • Gradual Network for Single Image De-raining
    Zhe Huang*, Weijiang Yu*, Wayne Zhang, Litong Feng, Nong Xiao
    Proceedings of the 27th ACM International Conference on Multimedia (MM 2019) [Oral]