Projects

A selection of my projects

Journey of Water @ Walt Disney World

Interactive water attraction inspired by Moana

Conducted computer vision, deep learning, and controls R&D for this attraction. So happy to see it bringing joy to so many people!

CLVR Jaco Play Dataset

Dataset of diverse robot trajectories from CLVR @ USC

Diverse multimodal dataset of 1085 teleoperated robot episodes. Includes multi-view camera observations, Cartesian position and velocity data, and natural language annotations. Tasks cover pick-and-place with a variety of items (across several categories) and receptacles.

Improving Language Understanding via Privileged Multimodal Training

Final Project for CSCI 535 (Multimodal and Probabilistic Learning for Human Communications)

A novel method that improves performance on downstream unimodal language understanding tasks by leveraging multimodal data during training: the model is optimized for the task-specific objective while being regularized in latent space against a multimodal teacher network.
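As a rough illustration of the idea described above, the sketch below combines a task loss with a latent-space penalty toward a teacher. The exact loss used in the project is not stated here, so the cross-entropy task objective, the squared-distance regularizer, and the names `privileged_loss` and `weight` are all assumptions made for this example.

```python
import numpy as np


def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy over a batch of integer class labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def privileged_loss(logits, labels, z_student, z_teacher, weight=0.1):
    """Task loss plus a latent-space penalty toward the teacher (assumed form).

    logits:    (batch, classes) outputs of the unimodal language model
    z_student: (batch, dim) latent of the unimodal language model
    z_teacher: (batch, dim) latent of the multimodal teacher, treated as fixed
    weight:    hypothetical trade-off between the two terms
    """
    task = softmax_cross_entropy(logits, labels)
    latent = np.mean(np.sum((z_student - z_teacher) ** 2, axis=1))
    return task + weight * latent
```

In a setup like this the teacher only contributes during training, so inference would still need text input alone.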

LCMVAE: Language-conditioned Masked Variational Autoencoder

Final Project for CSCI 566 (Deep Learning Applications)

A simple solution for multimodal representation learning, which takes advantage of modality-specific pre-trained encoders and a flexible, lightweight, latent-mixing network to effectively generate meaningful multimodal representations.

A deep learning approach to the screening of malaria infection: Automated and rapid cell counting, object detection and instance segmentation using Mask R-CNN

Published in Computerized Medical Imaging and Graphics

Application of a deep-learning model, Mask R-CNN, trained to identify healthy and Plasmodium-infected red blood cells. Capable of generating segmentation masks on top of bounding box classifications for immediate visualization and stage-specific identification. Potential to reduce errors common in manual counting through subsequent standardization.

Reward-Induced Representation Learning

CLVR implementation challenge

Pre-trained predictive models on various tasks to facilitate RL training on a novel downstream task from the same distribution. The resulting reward-induced representation proved useful both in speeding up downstream PPO training and in improving final rollout performance.

Lilypod

Undergraduate Mechatronics Capstone Project

AI-powered water monitoring system. Measures parameters of its surrounding environment, collects samples of the pollutants, and performs data analytics in the cloud to produce valuable information.

NaviGAIT

Vision for the blind, through positional sound

Final project for the 30.007 Engineering Design Innovation course at SUTD. An ultrasonic wearable for the blind that uses 3D positional audio feedback to assist in obstacle avoidance and provides IMU-based fall detection with notifications sent over the Twilio API. Won the Best Demonstration Award, awarded by industry guests, at the public project exhibition.

Automatic Watermark Removal

Final Project for CS 484 (Computational Vision)

Given a set of images with the same watermark, our method segments watermark boundaries (employing graph cut) and subsequently estimates the watermark's parameters (color + alpha-blending constant). It then removes the watermark using an inverse watermarking transform.
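The final removal step can be sketched as follows, assuming the standard alpha-blending model `observed = alpha * watermark + (1 - alpha) * image` inside the watermark region. The function names, the boolean `mask` input, and the single flat watermark color are assumptions for this example; the segmentation and parameter estimation steps are taken as already done.

```python
import numpy as np


def apply_watermark(image, color, alpha, mask):
    """Blend a flat-colored watermark into `image` where `mask` is True.

    image: (H, W, 3) float array in [0, 1]
    color: (3,) watermark color in [0, 1]
    alpha: blending constant in [0, 1)
    mask:  (H, W) boolean array marking watermark pixels
    """
    blended = alpha * color + (1.0 - alpha) * image
    return np.where(mask[..., None], blended, image)


def remove_watermark(observed, color, alpha, mask):
    """Invert the blend above, given estimated color and alpha."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1) for the blend to be invertible")
    restored = (observed - alpha * color) / (1.0 - alpha)
    restored = np.clip(restored, 0.0, 1.0)  # guard against estimation error
    return np.where(mask[..., None], restored, observed)
```

As alpha approaches 1 the division amplifies noise and any error in the estimated parameters, so nearly opaque watermarks are the hard case for this kind of inversion.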

Portrait AutoColorizer

Final Project for SYDE 522 (Machine Intelligence)

This one started because I wanted to colorize my grandparents' old portraits. Our AutoColorizer takes a VGG16 network pretrained on ImageNet and fine-tunes it on a dataset of 200,000 celebrity portraits. We formulate colorization as predicting the AB channels of an image from its L channel in the LAB color space.
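To make the formulation concrete, here is a minimal sketch of how a color image could be split into a network input (L) and a regression target (AB). It uses the standard sRGB to CIELAB conversion with a D65 white point; the function names are hypothetical, and the project's actual preprocessing may differ.

```python
import numpy as np

# Linear sRGB -> XYZ matrix and D65 reference white (standard values).
_RGB_TO_XYZ = np.array([
    [0.4124564, 0.3575761, 0.1804375],
    [0.2126729, 0.7151522, 0.0721750],
    [0.0193339, 0.1191920, 0.9503041],
])
_WHITE_D65 = np.array([0.95047, 1.0, 1.08883])


def rgb_to_lab(rgb):
    """Convert sRGB values in [0, 1], shape (..., 3), to CIELAB."""
    rgb = np.asarray(rgb, dtype=np.float64)
    # Undo the sRGB gamma curve.
    linear = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    xyz = linear @ _RGB_TO_XYZ.T / _WHITE_D65
    delta = 6.0 / 29.0
    f = np.where(xyz > delta ** 3, np.cbrt(xyz), xyz / (3 * delta ** 2) + 4.0 / 29.0)
    lightness = 116.0 * f[..., 1] - 16.0
    a = 500.0 * (f[..., 0] - f[..., 1])
    b = 200.0 * (f[..., 1] - f[..., 2])
    return np.stack([lightness, a, b], axis=-1)


def split_l_ab(rgb):
    """Return (L, AB): the grayscale-like input and the color target."""
    lab = rgb_to_lab(rgb)
    return lab[..., :1], lab[..., 1:]
```

Because L carries the lightness on its own, a black-and-white photo can supply the L channel directly at inference time, and the predicted AB channels are recombined with it to produce the colorized result.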