I'm a final-year Ph.D. student in the Computer Science Department at The University of Texas at Austin, where I am fortunate to be advised by Prof. Qixing Huang. My primary research interests lie in large vision-language models, large-scale 3D pre-training, 3D unsupervised/self-supervised learning, and point cloud processing.

Before that, I received my B.S. degree with First Class Honors from the Department of Computer Science at Peking University in 2019.

I'm looking for full-time positions starting in summer 2024. If you are interested, please drop me an email!

News

  • [2024/07] "ViGoR" is accepted at ECCV 2024!
  • [2024/01] Both "MaskFeat3D" and "MVNet" are accepted at ICLR 2024!
  • [2023/10] I started an internship with the NVIDIA Autonomous Vehicle Research Group in Santa Clara, CA.

Work Experience

NVIDIA, 10/2023 - 01/2024.
Research Intern, Autonomous Vehicle Research Group.
AWS AI, 06/2023 - 10/2023.
Applied Scientist Intern.
Improving visual grounding in large vision-language models.
Microsoft Research Asia, 05/2022 - 11/2022.
Research Intern.
3D point cloud unsupervised learning / pre-training.
Wormpex AI Research, 06/2021 - 10/2021.
Research Intern.
3D point cloud unsupervised learning / pre-training.
Kuaishou Technology, 06/2020 - 10/2020.
Research Intern.
3D point cloud understanding.

Publications

ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling
Siming Yan, Min Bai, Weifeng Chen, Xiong Zhou, Qixing Huang, Li Erran Li.

We aim to improve the visual grounding capability of Large Vision Language Models (LVLMs) by using fine-grained reward modeling.

[paper] [data]

Multi-View Representation is What You Need for Point-Cloud Pre-Training
Siming Yan, Chen Song, Youkang Kong, Qixing Huang.

We introduce a new method for pre-training 3D point clouds by leveraging pre-trained large-scale 2D networks. Additionally, a multi-view consistency loss ensures the 2D projections maintain 3D information by capturing pixel-wise correspondences across views.
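As a rough illustration of the consistency idea (not the paper's implementation), the sketch below penalizes feature disagreement at corresponding pixels across two rendered views; the function name and the `(y_a, x_a, y_b, x_b)` correspondence format are illustrative assumptions.

```python
import numpy as np

def multiview_consistency_loss(feat_a, feat_b, corr):
    """Penalize feature disagreement at corresponding pixels in two views.

    feat_a, feat_b: (H, W, C) per-pixel feature maps from two views.
    corr: (N, 4) integer array; each row (y_a, x_a, y_b, x_b) links a pixel
          in view A to its correspondence in view B (e.g. from known 3D
          geometry of the rendered point cloud).
    """
    fa = feat_a[corr[:, 0], corr[:, 1]]   # (N, C) features in view A
    fb = feat_b[corr[:, 2], corr[:, 3]]   # (N, C) features in view B
    return float(np.mean(np.sum((fa - fb) ** 2, axis=-1)))

# Toy check: identical feature maps give zero loss at any correspondences.
H, W, C = 4, 4, 8
rng = np.random.default_rng(0)
f = rng.normal(size=(H, W, C))
corr = np.array([[0, 0, 0, 0], [1, 2, 1, 2]])
loss = multiview_consistency_loss(f, f, corr)  # 0.0
```

In a real pipeline the correspondences come from projecting the same 3D points into both views, so minimizing this term forces the 2D features to agree on the underlying 3D geometry.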

[paper]

3D Feature Prediction for Masked-AutoEncoder-Based Point Cloud Pretraining
Siming Yan, Yuqi Yang, Yuxiao Guo, Hao Pan, Peng-shuai Wang, Xin Tong, Yang Liu, Qixing Huang.

We introduce a novel method for 3D self-supervised pretraining of point clouds using Masked Autoencoders (MAEs). Diverging from traditional 3D MAEs that focus on reconstructing point positions, our proposed approach employs an attention-based decoder, independent of the encoder design, to recover high-order geometric features of the underlying 3D shape.
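To make the change of training target concrete, here is a minimal sketch of the data side of such a setup, assuming surface normals stand in for the "high-order geometric features"; the function names and masking scheme are illustrative, not the paper's.

```python
import numpy as np

def make_mae_targets(points, normals, mask_ratio=0.6, seed=0):
    """Split a point cloud into visible/masked sets for MAE-style pretraining.

    The decoder's regression target is a geometric feature of the masked
    points (here, surface normals) rather than their raw positions.
    """
    rng = np.random.default_rng(seed)
    n = len(points)
    masked = rng.choice(n, size=int(mask_ratio * n), replace=False)
    visible = np.setdiff1d(np.arange(n), masked)
    return points[visible], points[masked], normals[masked]

def normal_loss(pred, target):
    # Cosine distance between predicted and ground-truth unit normals.
    return float(np.mean(1.0 - np.sum(pred * target, axis=-1)))

# Toy example: 10 points on a horizontal plane, all normals pointing up.
pts = np.random.default_rng(1).normal(size=(10, 3))
nrm = np.tile([0.0, 0.0, 1.0], (10, 1))
vis_pts, msk_pts, msk_targets = make_mae_targets(pts, nrm)
```

The encoder would see only `vis_pts`; the decoder queries the masked locations and is trained to predict `msk_targets`, decoupling the supervision signal from the exact point sampling.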

[paper]

Implicit Autoencoder for Point Cloud Self-supervised Representation Learning
Siming Yan, Zhenpei Yang, Haoxiang Li, Li Guan, Hao Kang, Gang Hua, Qixing Huang.

We introduce the Implicit Autoencoder (IAE), a self-supervised method that replaces the commonly used point cloud decoder with an implicit decoder: instead of reconstructing the raw input points, the network reconstructs an implicit representation of the underlying surface, making the learned features robust to sampling variations of the same shape.

[paper] [code]

HPNet: Deep Primitive Segmentation Using Hybrid Representations
Siming Yan, Zhenpei Yang, Chongyang Ma, Haibin Huang, Etienne Vouga, Qixing Huang.

We introduce a new deep-learning model for segmenting 3D shapes represented as point clouds into primitive patches. It stands out by using hybrid feature representations, combining a learned semantic descriptor, two spectral descriptors based on geometric parameters, and an adjacency matrix highlighting sharp edges.
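A rough sketch of the combination step, with illustrative names, shapes, and equal default weights (not the paper's exact formulation): the per-point descriptors and the edge-aware adjacency are fused into a single pairwise affinity that a clustering step can consume.

```python
import numpy as np

def hybrid_affinity(semantic, spectral, adjacency, w=(1.0, 1.0, 1.0)):
    """Fuse hybrid per-point representations into one pairwise affinity.

    semantic:  (N, Ds) learned semantic descriptor per point
    spectral:  (N, Dg) spectral descriptor derived from geometric parameters
    adjacency: (N, N) matrix downweighting pairs separated by sharp edges
    """
    affinity = (w[0] * semantic @ semantic.T
                + w[1] * spectral @ spectral.T
                + w[2] * adjacency)
    return affinity

# Toy example: 5 points with random descriptors and identity adjacency.
N = 5
rng = np.random.default_rng(0)
A = hybrid_affinity(rng.normal(size=(N, 4)),
                    rng.normal(size=(N, 2)),
                    np.eye(N))
```

Each term contributes a symmetric similarity, so the fused affinity stays symmetric and can be fed to spectral or mean-shift-style clustering to produce primitive patches.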

[paper] [code]

Extreme Relative Pose Network under Hybrid Representations
Zhenpei Yang*, Siming Yan*, Qixing Huang. (* indicates equal contribution)

We present a novel approach for estimating relative poses in RGB-D scans, especially effective for small or non-overlapping scans. The method involves scene completion followed by matching completed scans. We use hybrid representations combining 360-degree images, 2D image-based layouts, and planar patches, allowing for adaptable feature representations for relative pose estimation.

[paper] [code] [video]

Calcium Removal from Cardiac CT Images Using Deep Convolutional Neural Network
Siming Yan, Feng Shi, Yuhua Chen, Damini Dey, Sang-Eun Lee, Hyuk-Jae Chang, Debiao Li, Yibin Xie.

We introduce a deep learning-based method featuring a multi-step inpainting process to address the issue of coronary calcium causing beam hardening and blooming artifacts in cardiac computed tomography angiography (CTA) images.

[paper]

Unsupervised neural network models of the ventral visual stream
Chengxu Zhuang, Siming Yan, Aran Nayebi, Martin Schrimpf, Michael Frank, James DiCarlo, Daniel Yamins.

Recent advances in unsupervised learning have significantly narrowed the gap in modeling the development of the primate ventral visual stream with deep neural networks: earlier models relied on extensive supervised training, which is implausible as an account of infant development, whereas unsupervised methods now achieve comparably strong results.

[paper] [code]

A short version was presented at the Conference on Cognitive Computational Neuroscience (CCN), 2019.

Scene Synthesis via Uncertainty-Driven Attribute Synchronization
Haitao Yang, Zaiwei Zhang, Siming Yan, Haibin Huang, Chongyang Ma, Yi Zheng, Chandrajit Bajaj, Qixing Huang.

We present a novel approach for generating 3D scenes using deep neural networks. It utilizes parametric prior distributions learned from training data to regularize neural model outputs and predict an over-complete set of attributes. This allows for the application of consistency constraints to eliminate infeasible predictions.

[paper] [code]

Recurrent Feedback Improves Feedforward Representations in Deep Neural Networks
Siming Yan, Xuyang Fang, Bowen Xiao, Harold Rockwell, Yimeng Zhang, Tai-Sing Lee.
arXiv 2019.

We show that introducing feedback loops and horizontal recurrent connections into a deep convolutional neural network enhances its robustness to noise and occlusion, suggesting that these modifications improve feedforward representations by injecting top-down semantic information.

[paper]

Teaching

TA with Etienne Vouga.

CS 391L Machine Learning (Spring, Fall '21)
TA with Qiang Liu and Adam Klivans.