Education
Advisor: Prof. Jeff Schneider; fully funded RA-ship; GPA: 3.8/4.0; research topics: DL & RL.
On Dean’s List; GPA: 3.96/4.0; Graduated with Distinction.
Experiences
- Perception & scene understanding.
- Designed and implemented a single-model perception foundation model for end-to-end video 2D & 3D detection, LiDAR semantic segmentation, camera-LiDAR fusion, traffic-light & vehicle-light detection, tracking, and prediction.
- For efficiency, the model does not employ a Bird’s-Eye-View (BEV) feature representation; it is fully sparse and can detect objects within ±400 m.
- Conducted self-supervised pre-training of the unified multi-modal Transformer backbone using MAE (images) and Voxel-MAE (LiDAR voxels). Adopted mixed precision training for numerical stability when handling 12-bit raw images (sketched below). Applied multi-stage fine-tuning to orchestrate multi-task learning.
- Worked collaboratively with experienced team members to implement custom TensorRT plugins and optimize the performance of TensorRT engines.
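A minimal sketch of the mixed-precision training loop mentioned above, assuming a standard PyTorch `torch.cuda.amp` setup; the model, dataloader, and loss are placeholders, not the production pipeline.

```python
# Minimal mixed-precision training loop (illustrative only; the backbone,
# dataloader, and loss below are placeholders, not the production code).
import torch

def train_amp(model, loader, optimizer, criterion, device="cuda"):
    scaler = torch.cuda.amp.GradScaler()          # rescales gradients to avoid FP16 underflow
    model.train()
    for images, targets in loader:                # e.g. 12-bit raw images normalized to [0, 1]
        images, targets = images.to(device), targets.to(device)
        optimizer.zero_grad(set_to_none=True)
        with torch.cuda.amp.autocast():           # run the forward pass in mixed precision
            loss = criterion(model(images), targets)
        scaler.scale(loss).backward()             # backprop on the scaled loss
        scaler.step(optimizer)                    # unscale gradients, then optimizer step
        scaler.update()                           # adjust the loss scale for the next step
```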
- Upgraded the monocular 3D object detection module (for vehicles and vulnerable road users) on the autonomous driving truck.
- Devised a novel attention mechanism that improves the yaw prediction accuracy of the monocular 3D pedestrian detection module by 20% while reducing the model size by 50% at the same inference time.
- Designed and built a BEV 3D object detection and segmentation model with multi-modal inputs, including multi-camera raw images, perspective-view image segmentation results, LiDAR occlusion and occupancy grids, and LiDAR segmentation results.
- Worked on the Transformer-based motion prediction project. Devised a novel multi-scale feature representation for map context and LiDAR voxels, and designed a velocity-based local attention that helps agents better attend to their surrounding environment. On the Waymo Open Dataset leaderboard, our model, dubbed MGTR, achieved 1st place in overall prediction mAP and soft mAP.
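A hypothetical illustration of the velocity-based local attention idea (not the MGTR implementation); the radii, horizon, and tensor shapes are made-up assumptions.

```python
# Hypothetical velocity-gated local attention mask (illustrative only;
# base_radius and horizon are placeholder values, not the MGTR design).
import torch

def velocity_local_attention_mask(pos, vel, base_radius=10.0, horizon=3.0):
    """pos: (N, 2) agent positions [m]; vel: (N, 2) agent velocities [m/s].
    Each agent attends only to agents within a radius that grows with its own speed."""
    dist = torch.cdist(pos, pos)                  # (N, N) pairwise distances
    speed = vel.norm(dim=-1, keepdim=True)        # (N, 1) per-agent speed
    radius = base_radius + horizon * speed        # per-agent attention radius
    return dist <= radius                         # (N, N) boolean attention mask
```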
- Initiated the Emergency Vehicle (EV) Detection project. Built the general codebase and baseline models.
- Tackled the siren recognition task; designed pipelines for sound event detection and sound source localization.
- Revealed flaws in previous state-of-the-art models via Grad-CAM visualization and heatmap analysis.
- With performance-boosting add-on modules, the final model achieved 99% detection accuracy and a 10% gain in localization precision & recall compared with other state-of-the-art methods.
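A minimal Grad-CAM sketch of the kind of heatmap analysis mentioned above, assuming a generic CNN classifier; `model`, `target_layer`, and `class_idx` are placeholders, not project code.

```python
# Minimal Grad-CAM sketch (illustrative; model and target_layer are placeholders).
import torch
import torch.nn.functional as F

def grad_cam(model, target_layer, image, class_idx):
    """image: (C, H, W) tensor; returns a (1, 1, H, W) heatmap in [0, 1]."""
    feats, grads = [], []
    h1 = target_layer.register_forward_hook(lambda m, i, o: feats.append(o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))
    score = model(image.unsqueeze(0))[0, class_idx]        # logit of the class of interest
    model.zero_grad()
    score.backward()
    h1.remove(); h2.remove()
    weights = grads[0].mean(dim=(2, 3), keepdim=True)      # GAP over space -> channel weights
    cam = F.relu((weights * feats[0]).sum(dim=1))          # weighted sum of feature maps + ReLU
    cam = F.interpolate(cam.unsqueeze(1), size=image.shape[-2:],
                        mode="bilinear", align_corners=False)
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
```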
- Focused on a broad range of research topics, such as DL acceleration, data denoising, self-/weakly-supervised learning, AutoML, image restoration, and DNN black-box visualization & interpretation.
- Improved the company’s PyTorch-based classification & detection frameworks by adding new features and by inspecting and refactoring the existing codebase.
- Explored the limits of large-scale weakly-supervised training on the WebVision dataset (16 million samples), achieving 2nd place in the WebVision 2019 Challenge at CVPR '19, hosted by the Computer Vision Laboratory, ETH Zurich.
- Submitted two papers as first author: one to ACM Multimedia on image segmentation & super-resolution, and another to AAAI on NAS, detection & GCNs.
Selected Projects
A research project sponsored by Argo AI and advised by Prof. Jeff Schneider.
- Designed asynchronous multi-agent RL algorithms (distributed PPO & SAC) for autonomous driving on CARLA (PPO objective sketched below).
- Boosted the RL training speed by 40 times while improving the agent performance by 10%.
- Achieved a total driving score of 80.64 on the CARLA Leaderboard evaluation routes, assuming access to a perfect detection model.
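A minimal sketch of the PPO clipped surrogate objective at the core of the distributed PPO mentioned above; the asynchronous actor/learner plumbing is omitted, and all names are placeholders.

```python
# PPO clipped surrogate loss (single-learner sketch; the asynchronous
# actor/learner distribution used in the project is not shown).
import torch

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """All inputs are 1-D tensors over a batch of sampled actions."""
    ratio = torch.exp(new_log_probs - old_log_probs)            # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()                # maximize surrogate => minimize negative
```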
A research & engineering project sponsored by Apple Inc. Work in progress.
- Designed the phone recognition pipeline and generated a photorealistic synthetic dataset from phone CAD models.
- Improved the phone classification accuracy by 6% (top-1 acc. boosted from ~90% to 96%).
Published websites for 5 assignments and 1 final project:
- Colorizing the Prokudin-Gorskii Photo Collection;
- Gradient Domain Fusion;
- When Cats meet GANs;
- Neural Style Transfer;
- GAN Photo Editing (best project finalist);
- Image Generation via Independent Semantic Synthesis.
This course covers topics including explicit, implicit, and neural 3D representations, differentiable rendering, neural rendering, mesh and point cloud processing, radiance fields, multi-plane images, implicit surfaces, etc. Websites for 5 assignments and 1 final project will be published (2 completed so far; a volume-rendering sketch follows the list):
- Rendering Basics with PyTorch3D;
- Single View to 3D;
- Volume Rendering and Neural Radiance Fields;
- Neural Surfaces;
- Point Cloud Classification and Segmentation.
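An illustrative sketch of the volume-rendering quadrature behind the "Volume Rendering and Neural Radiance Fields" assignment, assuming per-ray densities and colors already sampled at discrete depths; function and argument names are placeholders.

```python
# NeRF-style volume rendering along a single ray (illustrative; densities and
# colors would come from a trained radiance field, here they are just inputs).
import torch

def render_ray(sigmas, colors, deltas):
    """sigmas: (S,) densities; colors: (S, 3) RGB; deltas: (S,) sample spacings."""
    alphas = 1.0 - torch.exp(-sigmas * deltas)                  # opacity of each segment
    trans = torch.cumprod(
        torch.cat([torch.ones(1), 1.0 - alphas + 1e-10]), dim=0
    )[:-1]                                                      # transmittance T_i up to sample i
    weights = trans * alphas                                    # contribution of each sample
    return (weights.unsqueeze(-1) * colors).sum(dim=0)          # composited RGB for the ray
```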
A CUDA application that can crack both simple passwords and hashed passwords (e.g., MD5).
A research project advised by Prof. Yin Li.
- Improved 2D & 3D human pose estimation in real/synthetic video using SMPL models.
- Modeled real-world noise, variation & occlusion for better 3D joint predictions.
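A minimal sketch of the kind of noise and occlusion modeling mentioned above, assuming 2D keypoint inputs; the noise scale and drop probability are made-up placeholder values.

```python
# Synthetic noise + occlusion augmentation for 2D keypoints (illustrative;
# noise_std and drop_prob are arbitrary placeholder values).
import torch

def corrupt_keypoints(kpts, noise_std=2.0, drop_prob=0.1):
    """kpts: (J, 2) pixel coordinates. Returns noisy keypoints and a visibility mask."""
    noisy = kpts + noise_std * torch.randn_like(kpts)   # jitter to mimic detector noise
    visible = torch.rand(kpts.shape[0]) > drop_prob     # randomly occlude some joints
    noisy[~visible] = 0.0                                # zero out occluded joints
    return noisy, visible
```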
A research project funded by Prof. Kaiping Chen.
- Applied large-scale data mining and deep learning to understand scientific videos on YouTube.
- This work has been accepted by a top-tier conference in the communication & media area.
Implemented 4 DQN-family algorithms (DQN and its variants) and a distributed DQN learning algorithm similar to Google's Gorila, using a parameter-server architecture.
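A minimal single-step DQN update sketch with a target network (not the distributed Gorila setup); the replay batch format and network names are assumptions.

```python
# Single-step DQN update with a target network (illustrative sketch; the
# parameter-server / Gorila-style distribution is not shown).
import torch
import torch.nn.functional as F

def dqn_update(q_net, target_net, optimizer, batch, gamma=0.99):
    # batch: tensors sampled from a replay buffer; actions are int64, dones are 0/1 floats
    states, actions, rewards, next_states, dones = batch
    q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)   # Q(s, a)
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values                # max_a' Q_target(s', a')
        targets = rewards + gamma * (1.0 - dones) * next_q                # Bellman target
    loss = F.smooth_l1_loss(q_values, targets)                            # Huber loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```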