|
TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation
Wenhao Wang,
Yi Yang
arXiv, 2024
arXiv /
Data /
Code /
bibtex
TIP-I2V is the first dataset comprising over 1.70 million unique user-provided text and image prompts for image-to-video generation. It contributes to the development of better and safer image-to-video models.
|
|
Replication in Visual Diffusion Models: A Survey and Outlook
Wenhao Wang,
Yifan Sun,
Zongxin Yang,
Zhengdong Hu,
Zhentao Tan,
Yi Yang
arXiv, 2024
arXiv /
Project /
bibtex
In this survey, we provide the first comprehensive review of replication in visual diffusion models, marking a novel contribution to the field by systematically categorizing the existing studies into unveiling, understanding, and mitigating this phenomenon.
|
|
AnyPattern: Towards In-context Image Copy Detection
Wenhao Wang,
Yifan Sun,
Zhentao Tan,
Yi Yang
arXiv, 2024
arXiv /
Code (ICD) /
Code (Style) /
Data /
bibtex
This paper explores in-context learning for image copy detection (ICD), i.e., prompting an ICD model to identify replicated images with new tampering patterns without the need for additional training. To accommodate the “seen → unseen” generalization scenario, we construct the first large-scale pattern dataset named AnyPattern, which has the largest number of tamper patterns (90 for training and 10 for testing) among all the existing ones.
|
|
Image Copy Detection for Diffusion Models
Wenhao Wang,
Yifan Sun,
Zhentao Tan,
Yi Yang
NeurIPS, 2024
arXiv /
Code /
Data /
bibtex /
poster
In this paper, we introduce ICDiff, the first Image Copy Detection (ICD) method specialized for diffusion-generated replicas. To this end, we construct a Diffusion-Replication (D-Rep) dataset and correspondingly propose a novel deep embedding method.
|
|
VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models
Wenhao Wang,
Yi Yang
NeurIPS, 2024
arXiv /
Github /
Hugging Face /
Wisemodel /
bibtex /
Zhihu /
poster
✨ Ranked in the top 6 of 121,084 datasets on the Hugging Face Dataset Trending List on Mar. 19th, 2024.
VidProM is the first dataset featuring 1.67 million unique text-to-video prompts and 6.69 million videos generated from 4 different state-of-the-art diffusion models. It inspires many exciting new research areas, such as Text-to-Video Prompt Engineering, Efficient Video Generation, Fake Video Detection, and Video Copy Detection for Diffusion Models.
|
|
Pattern-Expandable Image Copy Detection
Wenhao Wang,
Yifan Sun,
Yi Yang
IJCV, 2024
Code /
Data /
bibtex
This paper proposes a specific open-world visual recognition task, i.e., Pattern-Expandable Image Copy Detection (PE-ICD). To lay the foundation for PE-ICD research, we propose Pattern Stripping (P-Strip), which separates the tamper patterns from a query by decomposing the query feature into a content feature and multiple pattern features.
|
|
TransHP: Image Classification with Hierarchical Prompting
Wenhao Wang,
Yifan Sun,
Wei Li,
Yi Yang
NeurIPS, 2023
arXiv /
Code /
bibtex /
poster
This paper explores a hierarchical prompting mechanism for the hierarchical image classification (HIC) task. Different from prior HIC methods, our hierarchical prompting is the first to explicitly inject ancestor-class information as a tokenized hint that benefits the descendant-class discrimination.
|
|
A Benchmark and Asymmetrical-Similarity Learning for Practical Image Copy Detection
Wenhao Wang,
Yifan Sun,
Yi Yang
AAAI, 2023 (Oral)
arXiv /
Dataset & Code /
bibtex /
poster
We contribute a new ICD dataset, i.e., Negative-Distractor for Edited Copy (NDEC), with emphasis on the seldom-noticed hard negative problem. We propose a novel Asymmetrical-Similarity Learning (ASL) method for ICD.
|
|
Attentive WaveBlock: Complementarity-enhanced Mutual Networks for Unsupervised Domain Adaptation in Person Re-identification and Beyond
Wenhao Wang,
Fang Zhao,
Shengcai Liao,
Ling Shao
TIP, 2022
arXiv /
Code /
bibtex
This paper proposes a novel light-weight module, the Attentive WaveBlock (AWB), which can be integrated into the dual networks of mutual learning to enhance the complementarity.
|
|
Learning Anchored Unsigned Distance Functions with Gradient Direction Alignment for Single-view Garment Reconstruction
Fang Zhao,
Wenhao Wang,
Shengcai Liao,
Ling Shao
ICCV, 2021 (Oral)
arXiv /
Code /
bibtex
We propose a novel learnable Anchored Unsigned Distance Function (AnchorUDF) representation for 3D garment reconstruction from a single image.
|
|
DomainMix: Learning Generalizable Person Re-Identification Without Human Annotations
Wenhao Wang,
Shengcai Liao,
Fang Zhao,
Cuicui Kang,
Ling Shao
BMVC, 2021
arXiv /
Code /
bibtex
We propose a new person re-identification task, i.e., how to use a labeled synthetic dataset and an unlabeled real-world dataset to train a universal model. A DomainMix framework is introduced to provide a basic solution to the task.
|
|
Meta AI Video Similarity Challenge: Descriptor Track
Wenhao Wang,
Yifan Sun,
Yi Yang
CVPR, 2023 (Rank 2)
Introduction /
Solution /
Code /
Presentation
We propose Feature-Compatible Progressive Learning (FCPL), which trains various models that produce mutually-compatible features.
|
|
Meta AI Video Similarity Challenge: Matching Track
Wenhao Wang,
Yifan Sun,
Yi Yang
CVPR, 2023 (Rank 2)
Introduction /
Solution /
Code /
Presentation
We use a Temporal Network (TN) to ensemble the features from the descriptor track directly.
|
|
FGVC9: eBay eProduct Visual Search Challenge
Wenhao Wang,
Yifan Sun,
Zongxin Yang,
Yi Yang
CVPR, 2022 (Rank 1)
Introduction /
Solution /
Code /
Certificate
The paper demonstrates the effectiveness of vision-language models in product retrieval tasks for the first time.
|
|
Facebook AI Image Similarity Challenge: Matching Track
Wenhao Wang,
Yifan Sun,
Weipu Zhang,
Yi Yang
NeurIPS, 2021 (Rank 1)
Introduction /
Solution /
Code /
Presentation
In this paper, a data-driven and local-verification approach is proposed.
|
|
Facebook AI Image Similarity Challenge: Descriptor Track
Wenhao Wang,
Yifan Sun,
Weipu Zhang,
Yi Yang
NeurIPS, 2021 (Rank 3)
Introduction /
Solution /
Code /
Presentation
In this paper, a bag of tricks and a strong baseline are proposed for image copy detection.
|
|
The 3rd Large-scale Video Object Segmentation Challenge: Video Object Segmentation Track
Zongxin Yang,
Jian Zhang,
Wenhao Wang,
et al.
CVPR, 2021 (Rank 1)
Introduction /
Solution /
Code /
Certificate
This paper investigates how to realize better and more efficient embedding learning to tackle semi-supervised video object segmentation in challenging multi-object scenarios.
|
Professional Activities
|
Journal Reviewer of Transactions on Pattern Analysis and Machine Intelligence, International Journal of Computer Vision, Transactions on Image Processing, Transactions on Circuits and Systems for Video Technology, Knowledge-Based Systems, Transactions on Intelligent Transportation Systems, IEEE/CAA Journal of Automatica Sinica, Transactions on Big Data, Transactions on Artificial Intelligence, Journal of Visual Communication and Image Representation, and Neural Networks.
Conference Reviewer of ICLR, ICML, NeurIPS, CVPR, ICCV, ECCV, AAAI, and ACM MM.
|
|