Publications | Speech and Multimodal Intelligent Information Processing (SMIIP) Lab

Google Scholar Page

journal papers: (# denotes the corresponding author, * denotes supervised students, interns or research fellows in Ming’s lab)

Fei Su (*), Cancan Li (*), Juan Liu (#), Ming Li (#), “M2S-AVSR: Modality-aware Multi-View Self-supervised Representations for Audio-Visual Speech Recognition”, submitted to IEEE Transactions on Audio, Speech, and Language Processing.
Wenxing Liu (*), Yueran Pan (*), Dong Zhang, Hongzhu Deng, Xiaobing Zou, Ming Li (#), “Multimodal Large Language Models for ADOS-M1 Behavioral Assessment”, submitted to Neurocomputing.
Hongxi Yi, Yufei Xie, Dong Zhang, Wenxing Liu (*), Ming Li, and Dah-Jye Lee, “VLM-Guided Semantic Augmentation and Uncertainty-Aware Tri-modal Fusion for Group Emotion Recognition”, submitted to IEEE Transactions on Multimedia.
Yi, Hongxi; Zhang, Dong; Xie, Yufei; Liu, Wenxing (*); Li, Ming; Lee, Dah-Jye, “Emotional Description-Guided Vision–Language Semantic Alignment for Group Emotion Recognition”, submitted to IEEE Transactions on Affective Computing.
Xin Tong, Liwen He, Zhaowen Deng (*), Weibo Li (*), Ziheng Tang (*), Yixuan Li, Yutong Ren, Matthew Louis Mauriello, Ming Li (#), “Glitter: Exploring an LLM Virtual Agent for Supporting Practitioners in Behavioral Interventions of Autistic Children“, International Journal of Human–Computer Interaction, 2026.
Wenxing Liu (*), Yueran Pan (*), Dong Zhang, Hongzhu Deng, Xiaobing Zou and Ming Li (#), “Detecting Children with Autism Spectrum Disorder based on Script-Centric Behavior Understanding with Emotional Enhancement“, Neurocomputing, 2026.
Yucong Zhang (*), Xin Zou, Jinshan Yang, Wenjun Chen, Juan Liu, Faya Liang (#) and Ming Li (#), “Multimodal Laryngoscopic Video Analysis for Assisted Diagnosis of Vocal Fold Paralysis“, Computer Speech and Language, 2025.
Ming Cheng(*), Ming Li (#), “Multi-Input Multi-Output Target-Speaker Voice Activity Detection for Unified, Flexible, and Robust Audio-Visual Speaker Diarization“, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2025.
Yikang WANG, Xingming WANG, Chee Siang LEOW, Qishan ZHANG, Ming LI, Hiromitsu NISHIZAKI, “Enhancing the Robustness of Speech Anti-spoofing Countermeasures through Joint Optimization and Transfer Learning“, IEICE TRANSACTIONS on Information and System, 2025.
Yucong Zhang (*), Yuchen Song (*), Juan Liu (#), Ming Li (#), “An Automatic Laryngoscopic Image Segmentation System Based on SAM Prompt Engineering: From Glottis Annotation to Vocal Fold Segmentation”, Frontiers in Molecular Biosciences, 2025.
Ming Cheng (*), Yuke Lin (*), Ming Li (#), “Sequence-to-Sequence Neural Diarization with Automatic Speaker Detection and Representation“, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2025.
Bang Zeng (*), Ming Li (#), “USEF-TSE: Universal Speaker Embedding Free Target Speaker Extraction“, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2025.
Bang Zeng (*), Ming Li (#), “Universal Speaker Embedding Free Target Speaker Extraction and Personal Voice Activity Detection“, Computer Speech and Language, 2025.
Yueran Pan (*), Biyuan Chen, Wenxing Liu (*), Ming Cheng (*), Dong Zhang, Hongzhu Deng, Xiaobing Zou, and Ming Li (#), “Assessing the Expressive Language Levels of Autistic Children in Home Intervention”, IEEE Transactions on Computational Social Systems, 2025.
Danwei Cai (*), Zexin Cai (*), Ze Li (*), Ming Li (#), “Self-supervised Reflective Learning through Self-distillation and Online Clustering for Speaker Representation Learning“, IEEE Transactions on Audio, Speech, and Language Processing, 2025.
Bing Li, Dong Zhang (#), Cheng Huang, Yun Xian, Ming Li and Dah-Jye Lee, “Location-guided Head Pose Estimation for Fisheye Image“, IEEE Transactions on Cognitive and Developmental Systems, 2024.
Weiqing Wang (*), Ming Li (#), “Online Neural Speaker Diarization with Target Speaker Tracking“, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024.
Chengyan Yu (*), Shihuan Wang, Dong Zhang, Yingying Zhang, Chaoqun Cen, Zhixiang You, Xiaobing Zou, Hongzhu Deng (#), Ming Li (#), “HSVRS: A Virtual Reality System of the Hide-and-Seek Game to Enhance Gaze Fixation Ability for Autistic Children“, IEEE Transactions on Learning Technologies, 2024.
Xiaoyi Qin (*), Na Li, Shufei Duan, Ming Li (#), “Investigating Long-Term and Short-Term Time-Varying Speaker Verification“, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024.
Danwei Cai (*), Ming Li (#), “Leveraging ASR Pretrained Conformers for Speaker Verification through Transfer Learning and Knowledge Distillation”, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024.
Chengyan Yu( *), Dong Zhang, Wei Zou, Ming Li (#), “Joint Training on Multiple Datasets with Inconsistent Labeling Criteria for Facial Expression Recognition“, IEEE Transactions on Affective Computing, 2024.
Bang Zeng (*), Hongbin Suo, Yulong Wan, Ming Li (#), “Simultaneous Speech Extraction for Multiple Target Speakers Under Meeting Scenarios”, J. Shanghai Jiaotong Univ. (Sci.) (2024).
Xiaoyi Qin (*), Ze Li (*), Dong Liu (*), Ming Li (#), “Speaker verification in deliberately disguised scenarios”, Computer Engineering and Applications, 2024.
Zexin Cai, Ming Li (#), “Integrating Frame-Level Boundary Detection and Deepfake Detection for Locating Manipulated Regions in Partially Spoofed Audio Forgery Attacks“, Computer Speech and Language, 2023.
Li Li (*), Yi (Esther) Su (#), Wenwen Hou, Muyu Zhou, Yixiang Xie (*), Xiaobing Zou, Ming Li, “Expressive language profiles in a clinically screening sample of Mandarin-speaking preschool children with Autism Spectrum Disorder“, Journal of Speech, Language, and Hearing Research, 2023.
Feng-lei Zhu, Shi-huan Wang, Wen-bo Liu, Hui-lin Zhu, Ming Li, Xiao-bing Zou, “A multimodal machine learning system in early screening for toddlers with autism spectrum disorders based on the response to name“, Frontiers in Psychiatry, 2023.
Cheng, Ming (*); Zhang, Yingying; Xie, Yixiang; Pan, Yueran (*); Li, Xiao; Liu , Wenxing (*); Yu, Chengyan (*); Zhang, Dong; Xing, Yu; Huang, Xiaoqian; Wang, Fang; You, Cong; Zou, Yuanyuan; Liu, Yuchong; Liang, Fengjing; Zhu, Huilin; Tang, Chun; Deng, Hongzhu; Zou, Xiaobing (#); Li, Ming (#), “Computer-aided Autism Spectrum Disorder Diagnosis with Behavior Signal Processing”, IEEE Transactions on Affective Computing, 2023.
Yaogen Yang (*), Haozhe Zhang (*), Zexin Cai (*), Yao Shi (*), Ming Li (#), Dong Zhang, Xiaojun Ding (#), Jianhua Deng and Jie Wang, “Electrolaryngeal Speech Enhancement based on Bottleneck Feature Refinement and Voice Conversion”, Biomedical Signal Processing and Control, 2022.
Zhesi Zhu, Dong Zhang(#), Cailong Chi, Ming Li, and Dah-Jye Lee, “A Complementary Dual-branch Network for Appearance-based Gaze Estimation from Low-resolution Facial Image”, IEEE Transactions on Cognitive and Developmental Systems, 2022.
Xiaoyi Qin (*), Danwei Cai (*), Ming Li (#), “Robust Multi-Channel Far-Field Speaker Verification Under Different In-Domain Data Availability Scenarios”, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2022.
Weiqing Wang (*), Qingjian Lin (*), Danwei Cai (*), Ming Li (#), “Similarity Measurement of Segment-Level Speaker Embeddings in Speaker Diarization”, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2022.
Zexin Cai (*), Yaogen Yang (*), Ming Li (#), “Cross-lingual Multispeaker Speech Synthesis under Limited-Data Scenarios”, Computer Speech and Language, 2022.
Danwei Cai (*), Weiqing Cai (*), Ming Li (#), “Incorporating visual information in audio based self-supervised speaker recognition”, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2022.
Yanze Xu (*), Ming Li (#), Huahua Cui and Mingyang Xu, “Paralinguistic singing attribute recognition using supervised machine learning for describing the singing voice in vocal pedagogy”, EURASIP Journal on Audio, Speech, and Music Processing, 2022.
Xiao Li, Dong Zhang (#), Ming Li, Dah-Jye Lee, “Accurate Head Pose Estimation Using Image Rectification and Lightweight Convolutional Neural Network”, IEEE Transactions on Multimedia, 2022.
Wenbo Liu (*), Ming Li (#), Xiaobing Zou (#), Bhiksha Raj, “Discriminative Dictionary Learning for Autism Spectrum Disorder Identification”, Frontiers in Computational Neuroscience, 2021.
Jianing Teng, Dong Zhang (#), Ming Li, Dah-Jye Lee, “Typical Facial Expression Network Using Facial Feature Decoupler and Spatial-Temporal Learning”, IEEE Transactions on Affective Computing, 2021.
Weiqing Wang (*), Jin Pan (*), Hua Yi, Zhanmei Song, Ming Li (#), “Audio-based Piano Performance Evaluation for Beginners with Convolutional Neural Network and Attention Mechanism”, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 29 (2021): 1119-1133.
Weicong Chen, Dong Zhang (#), Ming Li, Dah-Jye Lee, “STCAM: Spatial-Temporal and Channel Attention Module for Dynamic Facial Expression Recognition”, IEEE Transactions on Affective Computing, 2020.
Weicheng Cai (*), Jinkun Chen (*), Jun Zhang, Ming Li (#), “On the fly Data Loader and Utterance-level Aggregation for Speaker and Language Recognition”, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 28 (2020): 1038-1051.
Ming Li, Dengke Tang (*), Junlin Zeng (*), Tianyan Zhou (*), Xiaobing Zou (#), “An Automated Assessment Framework for Atypical Prosody and Stereotyped Idiosyncratic Phrases related to Autism Spectrum Disorder”, Computer Speech and Language, 56 (2019): 80-94.
Ming Li, Hao Xu (*), Xingchang Huang (*), Zhanmei Song (#), Xiaolin Liu, Xin Li, “Facial Expression Recognition with Identity and Emotion Joint Learning”, IEEE Transaction on Affective Computing, accepted in 2018, published at 12, no. 2 (2021): 544-550.
Zhicheng Li, Bin Hu, Ming Li, Gengnan Luo, “String Stability Analysis for Vehicle Platooning under Unreliable Communication Links with Event-Triggered Strategy”, IEEE Transactions on Vehicular Technology, 68, no. 3 (2019): 2152-2164.
Kong-Yik Chee, Zhe Jin, Danwei Cai (*), Ming Li, Wun-She Yap (#), Yen-Lung Lai, Bok-Min Goi, “Cancellable Speech Template via Random Binary Orthogonal Matrices Projection Hashing,” Pattern Recognition, 76 (2018): 273-287.
Zhicheng Li, Yinliang Xu, Ming Li (#), “Finite-time Stability and Stabilization of Semi-Markovian Jump Systems with Time Delay”, International Journal of Robust and Nonlinear Control, 28, no. 6 (2018): 2064-2081.
Wenbo Zhao (*), Ming Li (#), Joel B. Harley, Yuanwei Jin, Jose Moura, Jimmy Zhu, “Reconstruction of Lamb wave dispersion curves by sparse representation and continuity constraints“, Journal of the Acoustical Society of America, 141, no. 2 (2017): 749-763.
Wenbo Liu (*), Ming Li (#), Li Yi (#), “Identifying Children with Autism Spectrum Disorder Based on Their Face Processing Abnormality: A Machine Learning framework“, Autism research, 9, no. 8 (2016): 888-898.
Ming Li (#), Jangwon Kim, Adam Lammert, Prasanta Kumar Ghosh, Vikram Ramanarayanan, Shrikanth Narayanan, “Speaker verification based on the fusion of speech acoustics and inverted articulatory signals”, Computer Speech & Language, 36 (2016): 196-211.
Ming Li (#), Wenbo Liu (*), “Generalized I-vector Representation with Phonetic Tokenizations and Tandem Features for both Text Independent and Text Dependent Speaker Verification“, Journal of Signal Processing Systems, 82, no. 2 (2016): 207-215.
Yinliang Xu (#), Zaiyue Yang, Wei Gu, Ming Li, Zicong Deng, “Robust Real-Time Distributed Optimal Control Based Energy Management in a Smart Grid“, IEEE Transactions On Smart Grid, 8, no. 4 (2015): 1568-1579.
Donna Spruijt-Metz (#), Cheng K.F. Wen, Gillian O’Reilly, Ming Li, Sangwon Lee, Adar Emken, Urbashi Mitra, Murali Annavaram, Gisele Ragusa, Shrikanth Narayanan. “Innovations in the Use of Interactive Technology to Support Weight Management”, Current Obesity Reports, 4, no. 4 (2015): 510-519.
Jangwon Kim (#), Naveen Kumar, Andreas Tsiartas, Ming Li, Shrikanth Narayanan, “Automatic intelligibility classification of sentence-level pathological speech”, Computer Speech & Language, 29, no. 1 (2015): 132-144.
Ming Li (#), Shrikanth Narayanan, “Simplified supervised i-vector modeling with application to robust and efficient language identification and speaker verification“, Computer Speech & Language, 28, no. 4 (2014): 940-958.
Daniel Bone (#), Ming Li, Matthew Black, Shrikanth Narayanan, “Intoxicated Speech Detection: A Fusion Framework with Speaker-Normalized Hierarchical Functionals and GMM Supervectors“, Computer Speech & Language, 28, no. 2 (2014): 375-391.
Ming Li (#), Kyu J. Han, Shrikanth Narayanan, “Automatic Speaker Age and Gender Recognition using acoustic and prosodic level information fusion”, Computer speech and language, 27, no. 1 (2013): 151-167.
Urbashi Mitra, Adar Emken, Sangwon Lee, Ming Li, Harshvardhan Vathsangam, Daphney-stavroula Zois, Murali Annavaram, Shrikanth Narayanan, “KNOWME: a Case Study in Wireless Body Area Sensor Network Design“, IEEE Communications Magazine 50, no. 5 (2012): 116-125.
Gautam Thatte, Ming Li, Sangwon Lee, Adar Emken, Shri Narayanan, Urbashi Mitra, Donna Spruijt-Metz, Murali Annavaram, “KNOWME: An energy-efficient multimodal body area network for physical activity monitoring”, ACM Transactions in Embedded Computing Systems, 11, no. S2 (2012): 1-24.
Adar Emken, Ming Li, Gautam Thatte, Sangwon Lee, Murali Annavaram, Urbashi Mitra, Shrikanth Narayanan, Donna Spruijt-Metz, “Recognition of Physical Activities in Overweight Hispanic Youth using KNOWME Networks”, Journal of Physical Activity and Health, 9, no. 3 (2012): 432-441.
Gautam Thatte, Ming Li, Sangwon Lee, Adar Emken, Murali Annavaram, Shri Narayanan, Donna Spruijt-Metz, Urbashi Mitra, “Optimal Time-Resource Allocation for Energy-Efficient Physical Activity Detection”, IEEE Transaction on Signal Processing, 59, no. 4 (2011): 1843-1857.
Ming Li, Viktor Rozgic, Gautam Thatte, Sangwon Lee, Adar Emken, Murali Annavaram, Urbashi Mitra, Donna Spruijt-Metz, Shrikanth Narayanan, “Multimodal Physical Activity Recognition by Fusing Temporal and Cepstral Information“, IEEE Transactions on Neural Systems & Rehabilitation Engineering, 18, no. 4 (2010): 369-380.
Hongbin Suo (#), Ming Li, Ping Lu, Yonghong Yan (#), “Automatic language identification with discriminative language characterization based on SVM”, IEICE transaction on Information and Systems, 91, no. 3 (2008): 567-575.
Hongbin Suo, Ming Li, Ping Lu, Yonghong Yan (#), “Using SVM as back-end classifier for language identification“, EURASIP Journal on Audio, Speech, and Music Processing, 2008.

Conference papers: (* denotes supervised students, interns or research fellows in Ming’s lab)

Peijun Yang (*), Zhan Jin (*), Juan Liu, Hui Bu, Ming Li, “AISHELL8-FISHEYE: A Fisheye Audio-Visual Dataset for Target Speaker Extraction with Distortion-Aware Baselines”, submitted to ACM Multimedia 2026.
Li Li, Ming Cheng, Weixin Zhu, Yannan Wang, Juan Liu, Ming Li, “DM-ASR: Diarization-aware Multi-speaker ASR with Large Language Models”, submitted to ACM Multimedia 2026.
Yucong Zhang (*), Zhang Chen (*), Juan Liu, Wei Ju, Hongbin Suo, Ming Li, “Dual-Encoder Fusion with Explicit and Implicit Injection for the Interspeech 2026 Audio Encoder Capability Challenge”, submitted to Interspeech 2026.
Zhang Chen (*), Yucong Zhang (*), Xiaoxiao Miao, Ming Li, “Toward Multimodal Fault Analysis: A Single-Speed Chain Conveyor Dataset with Audio and Vibration Signals”, submitted to Interspeech 2026.
Xueping Zhang (*), Zhenshan Zhang (*), Yechen Wang, Linxi Li, Liwei Jin, Ming Li, “MultiAPI Spoof: A Multi-API Dataset and Local-Attention Network for Speech Anti-spoofing Detection”, submitted to Interspeech 2026.
Fei Su (*), Cancan Li (*), Juan Liu, Wei Ju, Hongbin Suo, Ming Li, “Robust LLM-based Audio-Visual Speech Recognition with Sparse Modality Alignment and Visual Unit-Guided Refinement”, submitted to Interspeech 2026.
Houmin Sun (*), Zi Hu (*), Linxi Li, Yechen Wang, Weijin Li, Ming Li, “Making Separation-First Multi-Stream Audio Watermarking Feasible via Joint Training”, submitted to Interspeech 2026.
Dong Liu (*), Juan Liu, Wei Ju, Yao Tian, Ming Li, “WhisperVC: Decoupled Cross-Domain Alignment and Speech Generation for Low-Resource Whisper-to-Speech Conversion”, submitted to Interspeech 2026.
Peijun Yang (*), Zhan Jin (*), Juan Liu, Ming Li, “Multi-View Based Audio Visual Target Speaker Extraction”, submitted to Interspeech 2026.
Jiarong Du (*), Zhan Jin (*), Peijun Yang (*), Juan Liu, Zhuo Li, Xin Liu, Ming Li, “Audio-Visual Speech Enhancement in Complex Scenarios with Separation and Dereverberation Joint Modeling”, submitted to Interspeech 2026.
Ze Li (*), Xiaoxiao Miao, Juan Liu, Ming Li, “Language-Invariant Multilingual Speaker Verification for the TidyVoice 2026 Challenge”, submitted to Interspeech 2026.
Wang Xiang (*), Xue Zhang, Bang Zeng (*), Cunhang Fan, Juan Liu, Ming Li, “A Dual-Path Efficient EEG Encoder for Brain-Assisted Target Speaker Extraction”, submitted to Interspeech 2026.
Zhan Jin (*), Bang Zeng (*), Peijun Yang (*), Jiarong Du (*), Wei Ju, Yao Tian, Juan Liu, Ming Li, “Robust Audio-Visual Target Speaker Extraction with Multiple Enrollment Fusion”, submitted to Interspeech 2026.
Li Li (*), Ming Cheng (*), Juan Liu, Ming Li, “SPATIALLY-AUGMENTED SEQUENCE-TO-SEQUENCE NEURAL DIARIZATION FOR MEETINGS”, submitted to Interspeech 2026.
Tianyi Peng, Han Zhu, Ming Li, Xiaoxiao Miao, “ZipVoice-VC: Flow-Matching Based Voice Conversion for High-Fidelity Speech Anonymization”, submitted to Interspeech 2026.
Cancan Li (*), Fei Su (*), Juan Liu, Hui Bu, Yulong Wan, Hongbin Suo, Ming Li, “AISHELL6-Whisper: A Chinese Mandarin Audio-Visual Whisper Speech Dataset with Speech Recognition Baselines“, ICASSP 2026.
Ze Li (*), Ming Cheng (*), Ming Li, “Enhancing Speaker Verification with W2v-Bert 2.0 and Knowledge Distillation Guided Pruning”, ICASSP 2026.
Xueping Zhang (*), Liwei Jin, Yechen Wang, Linxi Li, Ming Li, “Compspoof: A Dataset and Joint Learning Framework for Component-Level Audio Anti-Spoofing Countermeasure”, ICASSP 2026.
Yucong Zhang (*), Juan Liu, Ming Li, “ECHO: Frequency-Aware Hierarchical Encoding for Variable-Length Signals”, ICASSP 2026.
ZhenShan Zhang (*), Xueping Zhang (*), Yechen Wang, Liwei Jin, Ming Li, “The Impact of Audio Watermarking on Audio Anti-Spoofing Countermeasures”, ICASSP 2026.
Mingjing Yi (*), Yuxi Wang (*), Ming Li, “Efficient Video to Audio Mapper with Visual Scene Detection“, APSIPA ASC 2025.
Zhang Chen (*), Yucong Zhang (*), Ming Li, “Improving Anomalous Sound Detection with Top-M Pseudo-Labeling“, NCMMSC 2025.
Jiarong Du (*), Zhan Jin (*), Bang Zeng (*), Peijun Yang (*), Ming Li, and Juan Liu, “Improving the Robustness of Audio-Visual Target Speaker Extraction With AV-HuBERT Based Lip Features”, Proc. of NCMMSC 2025.
Beilong Tang (*), Bang Zeng (*), Ming Li, “TSELM: Target Speaker Extraction using Discrete Tokens and Language Models“, NCMMSC 2025.
Zhuojun Wu (*), Dong Liu (*), Juan Liu, Yechen Wang, Linxi Li, Liwei Jin, Hui Bu, Pengyuan zhang, Ming Li, “SMIIP-NV: A Multi-Annotation Non-Verbal Expressive Speech Corpus in Mandarin for LLM-Based Speech Synthesis“, ACM Multimedia 2025.
Yucong Zhang (*), Juan Liu, Ming Li, “Multi-scale Scanning Network for Machine Anomalous Sound Detection“, ICONIP 2025.
Beilong Tang (*), Bang Zeng (*), Ming Li, “LauraTSE: Target Speaker Extraction using Auto-Regressive Decoder-Only Language Models“, ASRU 2025.
Beilong Tang (*), Xiaoxiao Miao, Xin Wang, Ming Li, “SEF-MK: Speaker-Embedding-Free Voice Anonymization through Multi-k-means Quantization“, ASRU 2025.
Yuchen Song (*), Yucong Zhang (*), Ming Li, “Exploring Pre-trained models on Ultrasound Modeling for Mice Autism Detection with Uniform Filter Bank and Attentive Scoring“, Interspeech 2025.
Yuxi Wang (*), Yikang Wang (*), Qishan Zhang, Hiromitsu Nishizaki, Ming Li, “VCapAV: A Video-Caption Based Audio-Visual Deepfake Detection Dataset“, Interspeech 2025.
Ming Cheng (*), Fei Su (*), Cancan Li (*), Juan Liu, Ming Li, “Multi-Channel Sequence-to-Sequence Neural Diarization: Experimental Results for The MISP 2025 Challenge“, Interspeech 2025.
Hongyu Zhang (*), Ming Cheng (*), Jing Feng, Ming Li, “Selective Channel Attention based Target Speaker Voice Activity Detection for Speaker Diarization under AD-HOC Microphone Array Settings“, Interspeech 2025.
Ze Li (*), Yao Shi (*), Yunfei Xu, Ming Li, “Adversarial Attacks and Robust Defenses in Speaker Embedding based Zero-Shot Text-to-Speech System”, ICME 2025.
Ze Li (*), Yuke Lin (*),Yao Tian, Hongbin Suo, Pengyuan zhang, Yanzhen Ren, Zexin Cai, Hiromitsu Nishizaki, Ming Li, “The Database and Benchmark for the Source Speaker Tracing Challenge 2024”, SLT 2024.
Yueqian Lin (*), Dong Liu (*), Yunfei Xu, Hongbin Suo, Ming Li, “Bridging Facial Imagery and Vocal Reality: Stable Diffusion-Enhanced Voice Generation“, ISCSLP 2024.
Zhuojun Wu (*), Dong Liu (*), Ming Li, “Lightweight Language Model for Speech Synthesis: Attempts and Analysis“, ISCSLP 2024.
Yiwei Liang (*), Ming Li, “ Vivid Background Audio Generation based on Large Language Models and AudioLDM“, ISCSLP 2024.
Xiaoyi Qin (*), Ze Li (*), Dong Liu (*), Ming Li, “Speaker verification in deliberately disguised scenarios“, NCMMSC 2024.
Dong Liu (*), Yueqian Lin (*), Hui Bu, Ming Li, “Two-stage and Self-supervised Voice Conversion for Zero-Shot Dysarthric Speech Reconstruction“, IALP 2024.
Dong Liu (*), Yueqian Lin (*), Yunfei Xu, Ming Li, “TMCSpeech: A Chinese Tv and Movie Speech Dataset with Character Descriptions and a Character-Based Voice Generation Model“, ICPR 2024.
Huali Zhou (*), Yuke Lin (*), Dong Liu (*), Ming Li, “KunquDB: An Attempt for Speaker Verification in the Chinese Opera Scenario“, ICPR 2024.
Yuke Lin (*), Ming Cheng (*), Fulin Zhang, Yingying Gao, Shilei Zhang (#), Ming Li (#), “VoxBlink2: A 100K+ Speaker Recognition Corpus and the Open-Set Speaker-Identification Benchmark“, Interspeech 2024.
Haoxu Wang (*), Cancan Li (*), Fei Su (*), Juan Liu, Hongbin Suo, Ming Li, “The WHU Wake Word Lipreading System for the 2024 Chat-scenario Chinese Lipreading Challenge“, ICME challenge paper, 2024.
Rongqi Bei, Yajie Liu, Yihe Wang, Yuxuan Huang, Ming Li, Yuhang Zhao, and Xin Tong, “StarRescue: the Design and Evaluation of A Turn-Taking Collaborative Game for Facilitating Social and Fine Motor Skills of Children with Autism Spectrum Disorder”, CHI 2024.
Ze Li (*), Yuke Lin (*), Ning Jiang, Xiaoyi Qin (*), Guoqing Zhao, Haiying Wu, Ming Li, “Multi-Objective Progressive Clustering for Semi-Supervised Domain Adaptation in Speaker Verification”, ICASSP 2024.
Haoxu Wang (*), Ming Cheng (*), Qiang Fu, Ming Li,” Robust Wake Word Spotting with Frame-Level Cross-Modal Attention based Audio-Visual Conformer”, ICASSP 2024.
HaoxuWang (*), Fan Yu, Xian Shi, Yuezhang Wang, Shiliang Zhang, Ming Li, “Slidespeech: A Large Scale Slide-Enriched Audio-Visual Corpus”, ICASSP 2024.
Weiqing Wang (*), Danwei Cai (*), Ming Cheng (*), Ming Li, “Joint Inference of Speaker Diarization and ASR with Multi-Stage Information Sharing”, ICASSP 2024.
Yucong Zhang (*), Juan Liu, Yao Tian, Haifeng Liu, Ming Li, “A Dual-Path Framework with Frequency-and-Time Excited Network for Machine Anomalous Sound Detection”, ICASSP 2024.
Yuke Lin (*), Xiaoyi Qin (*), Guoqing Zhao, Ming Cheng, Ning Jiang, Haiying Wu, Ming Li1, “Voxblink: A Large Scale Speaker Verification Dataset on Camera”, ICASSP 2024.
Bang Zeng (*), Ming Cheng (*), Yao Tian, Haifeng Liu, Ming Li, “Efficient Personal Voice Activity Detection with Wake Word Reference Speech”, ICASSP 2024.
Zexin Cai (*), Ming Li, “’Invertible Voice Conversion with Parallel Data”, ICASSP 2024.
Bang Zeng (*), Hongbing Suo, Yulong Wan, Ming Li, “Simultaneous Speech Extraction for Multiple Target Speakers under the Meeting Scenarios”, NCMMSC 2023.
Yueqian Lin (*), Ming Li, “EEG-Based Speech Envelope Decoding: Structured State Space and Diffusion Model Integration”, NCMMSC 2023.
Zhixian Zhang (*), Yucong Zhang (*), Ming Li, “Pre-training Deep Learning Models with Finite Element Simulation Data for Enhanced Machine Anomalous Sound Detection”, NCMMSC 2023.
Ming Cheng (*), Weiqing Wang (*), Xiaoyi Qin (*), Yuke Lin (*), Ning Jiang, Guoqing Zhao, Ming Li, “The DKU-MSXF Diarization System for the VoxCeleb Speaker Recognition Challenge 2023”, NCMMSC 2023.
Hao Li, Weiqing Wang (*), Ming Li, “Real-time Automotive Engine Sound Simulation with Deep Neural Network”, NCMMSC 2023.
Huali Zhou (*), Yueqian Lin (*), Yao Shi (*), Peng Sun, Ming Li, “BiSinger: Bilingual Singing Voice Synthesis”, ASRU 2023.
Yuke Lin (*), Xiaoyi Qin (*), Ning Jiang, Guoqing Zhao, Ming Li, “Haha-Pod: An Attempt for Laughter-based Non-Verbal Speaker Verification”, ASRU 2023.
Xiaoyi Qin (*), Xingming Wang (*), Yanli Chen, Qinglin Meng, Ming Li, “From Speaker Verification to Deepfake Algorithm Recognition: Our Learned Lessons from ADD2023 Track 3”, IJCAI 2023 Workshop on Deepfake Audio Detection and Analysis (DADA 2023).
Bang Zeng (*), Hongbin Suo, Yulong Wan, Ming Li, “Low-complexity Multi-Channel Speaker Extraction with Pure Speech Cues”, APSIPA ASC, 2023.
Wenxing Liu (*), Ming Cheng (*), Yueran Pan (*), Lynn Yuan, Suxiu Hu, Ming Li, Songtian Zeng, “Assessing the Social Skills of Children with Autism Spectrum Disorder via Language-Image Pre-training Models”, The 6th Chinese Conference on Pattern Recognition and Computer Vision (PRCV) 2023.
Xingming Wang (*), Bang Zeng (*), Hongbin Suo, Yulong Wan, Ming Li, “Robust audio anti-spoofing countermeasure with joint training of front-end and back-end models”, Interspeech 2023.
Bang Zeng (*), Hongbin Suo, Yulong Wan, Ming Li, “SEF-Net: Speaker Embedding Free Target Spekaer Extraction Network”, Interspeech 2023.
Yucong Zhang (*), Hongbin Suo, Yulong Wan, Ming Li, “Outlier-aware Inlier Modeling and Multi-scale Scoring for Anomalous Sound Detection via Multitask Learning”, Interspeech 2023.
Ming Cheng (*), Haoxu Wang (*), Ziteng Wang, Qiang Fu, Ming Li, “THE WHU-ALIBABA AUDIO-VISUAL SPEAKER DIARIZATION SYSTEM FOR THE MISP CHALLENGE 2022“, ICASSP 2023.
Ming Cheng (*), Weiqing Wang (*), Yucong Zhang (*), Xiaoyi Qin (*), Ming Li, “Target-Speaker Voice Activity Detection Via Sequence-To-Sequence Prediction”, ICASSP 2023.
Danwei Cai (*), Zexin Cai (*), Ming Li, “Identifying Source Speakers For Voice Conversion Based Spoofing Attacks For Speaker Verification”, ICASSP 2023.
Danwei Cai (*), Weiqing Wang (*), Ming Li, Rui Xia, Chuanzeng Huang, “Pretraining Conformer with ASR for Speaker Verification”, ICASSP 2023.
HaoxuWang (*), Ming Cheng (*), Qiang Fu, Ming Li, “The DKU Post-Challenge Audio-Visual Wake Word Spotting System for the 2021 Misp Challenge: Deep Analysis”, ICASSP 2023.
Xingming Wang (*), Hao Wu, Chen Ding, Chuanzeng Huang, Ming Li, “Exploring Universal Singing Speech Language Identification Using Self-Supervised Learning Based Front-End Features”, ICASSP 2023.
Zexin Cai (*), Weiqing Wang (*), Ming Li, “Waveform Boundary Detection For Partially Spoofed Audio”, ICASSP 2023.
Yikang Wang (*), Xingming Wang (*)，Hiromitsu Nishizaki, Ming Li, “Low Pass Filtering and Band Extension for Robust Anti-spoofing Countermeasure against Codec Variabilities”, ISCSLP 2022.
Tinglong Zhu (*), Xingming Wang (*), Xiaoyi Qin (*), Ming Li, “Source Tracing: Detecting Voice Spoofing”, APSIPA ASC 2022.
Weiqing Wang (*), Ming Li and Qingjian Lin, “Online Target Speaker Voice Activity Detection for Speaker Diarization”, Interspeech 2022.
Xingming Wang (*), Xiaoyi Qin (*), Yikang Wang (*), Yunfei Xu and Ming Li, “The DKU-OPPO System for the Spoofing-Aware Speaker Verification challenge 2022”, Interspeech 2022.
Xiaoyi Qin (*), Na Li, Chao Weng, Dan Su, Ming Li, “Cross-Age Speaker Verification: Learning Age-Invariant Speaker Embeddings”, Interspeech 2022.
Yucong Zhang (*), Qinjian Lin, Weiqing Wang (*), Lin Yang, Xuyang Wang, Junjie Wang, Ming Li, “Low-Latency Online Speaker Diarization with Graph-Based Label Generation”, Odyssey 2022.
Jincheng He (*), Yuanyuan Bao (*), Na Xu, Hongfeng Li, Shicong Li, Linzhang Wang, Fei Xiang, Ming Li, “Single-Channel Target Speaker Separation using Joint Training with Target Speaker’s Pitch Information”, Odyssey 2022.
Haoxu Wang (*), Yan Jia (*), Zeqing Zhao, Xuyang Wang, Junjie Wang, Ming Li, “Generating Adversarial Samples For Training Wake-Up Word Detection Systems Against Confusing Words”, Odyssey 2022.
Qingjian Lin, Lin Yang, Xuyang Wang, Xiaoyi Qin (*), Junjie Wang, Ming Li, “Towards Lightweight applications: Asymmetric Enroll-Verify Structure For Speaker Verification”, Proc. of ICASSP 2022.
Weiqing Wang (*), Ming Li, “Incorporating End-To-End Framework Into Target-Speaker Voice Activity Detection”, Prof. of ICASSP 2022.
Haozhe Zhang (*), Zexin Cai (*), Xiaoyi Qin (*), Ming Li, “SIG-VC: A Speaker Information Guided Zero-Shot Voice Conversion System For Both Human Beings And Machines”, Proc. of ICASSP 2022.
Xiaoyi Qin (*), Na Li, Chao Weng, Dan Su, Ming Li, “Simple Attention Module Based Speaker Verification with Iterative Noisy Label Detection”, Proc. of ICASSP 2022.
Weiqing Wang (*), Xiaoyi Qin (*), Ming Li, “Cross-Channel Attention-Based Target Speaker Voice Activity Detection: Experimental Results for M2MET Challenge”, of ICASSP 2022.
Ming Cheng (*), Haoxu Wang (*), Yechen Wang (*), Ming Li, “The DKU Audio-Visual Wake Word Spotting System for the 2021 MISP Challenge”, of ICASSP 2022.
Yueran Pan (*), Jiaxin Wu, Ran Ju (*), Ziang Zhou (*), Jiayue Gu, Songtian Zeng, Lynn Yuang, Ming Li, “A Multimodal Framework for Automated Teaching Quality Assessment of One-to-many Online Instruction Videos”, Proc. of ICPR 2022.
Ziang Zhou (*), Yanze Xu (*), Ming Li, “Detecting Escalation Level from Speech with Transfer Learning and Acoustic-Linguistic Information Fusion”, NCMMSC 2022.
Xiaoyi Qin (*), Yaogen Yang (*), Yao Shi (*), Lin Yang, Xuyang Wang, Junjie Wang, Ming Li, “VC-AUG : Voice Conversion based Data Augmentation for Text-Dependent Speaker Verification”, NCMMSC 2022.
Tinglong Zhu (*), Xiaoyi Qin (*), Ming Li, “Binary Neural Network for Speaker Verification”, Proc. of INTERSPEECH 2021, 86-90.
Weiqing Wang (*), Danwei Cai (*), Jin Wang, Mi Hong, Xuyang Wang, Qingjian Lin, Ming Li, “The DKU-Duke-Lenovo System Description for the Fearless Steps Challenge Phase III”, Proc. of INTERSPEECH 2021， 1044-1048.
Yan Jia (*), Xingming Wang (*), Xiaoyi Qin (*), Yinping Zhang, Xuyang Wang, Junjie Wang, Dong Zhang and Ming Li, “The 2020 Personalized Voice Trigger Challenge: Open Datasets, Evaluation Metrics, Baseline System and Results”, Proc. of INTERSPEECH 2021, 4239-4243.
Yao Shi (*), Hui Bu, Xin Xu, Shaoji Zhang, Ming Li, “AISHELL-3: A Multi-Speaker Mandarin TTS Corpus”, Proc. of INTERSPEECH 2021, 2756-2760.
Danwei Cai (*), Weiqing Wang (*), Ming Li, “An Iterative Framework For Self-Supervised Deep Speaker Representation Learning”, Proc. of ICASSP 2021, 6728-6732.
Xinmeng Chen (*), Xuchen Gong (*), Ming Cheng (*), Qi Deng, Ming Li, “Cross-modal Assisted Training for Abnormal Event Recognition in Elevators,” Proc. of ICMI 2021, 530-538.
Danwei Cai (*), Ming Li, “Embedding Aggregation for Far-Field Speaker Verification with Distributed Microphone Arrays”, Proc. of SLT 2021, 308-315.
Tingle Li (*), Jiawei Chen, Haowen Hou, Ming Li, “Sams-Net: A Sliced Attention-based Neural Network for Music Source Separation”, Proc. of ISCSLP 2021, 1-5.
Murong Ma (*), Haiwei Wu (*), Xuyang Wang, Lin Yang, Junjie Wang, Ming Li, “Acoustic Word Embedding on Code-switching Query by Example Spoken Term Detection”, Proc. of ISCSLP 2021.
Danwei Cai (*), Ming Li, “The DKU-DukeECE System for the Self-Supervision Speaker Verification Task of the 2021 VoxCeleb Speaker Recognition Challenge”, Proc. of VoxSRC 2021 workshop.
Weiqing Wang (*), Danwei Cai (*), Qingjian Lin (*), Lin Yang, Junjie Wang, Jin Wang, Ming Li. “The DKU-DukeECE-Lenovo System for the Diarization Task of the 2021 VoxCeleb Speaker Recognition Challenge“, Proc. of VoxSRC 2021 workshop.
Weicheng Cai (*), Ming Li, “A Unified Deep Speaker Embedding Framework for Mixed-Bandwidth Speech Data”, Proc. of APSIPA ASC 2021.
Jiyang Tang (*), Ming Li, “End-to-End Mandarin Tone Classification with Short Term Context Information”, Proc. of APSIPA ASC 2021.
Huangrui Chu (*), Yechen Wang (*), Ran Ju (*), Yan Jia (*), Haoxu Wang (*), Ming Li, Qi Deng, “Call for help detection in emergent situations using keyword spotting and paralinguistic analysis”, Proc. of ICMI Satellite Workshop ASMMC21.
Ran Ju (*), Huangrui Chu (*), Yechen Wang (*), Qi Deng, Ming Cheng, Ming Li, “A Multimodal Dynamic Neural Network for Call for Help Recognition in Elevators”, Proc. of ICMI Satellite Workshop ASMMC21.
Haiwei Wu (*) and Ming Li, “Mask Detection and Breath Monitoring from Speech: on Data Augmentation, Feature Representation and Modeling”, Proc. of NCMMSC 2021.
Yuanyuan Bao (*), Yanze Xu (*), Na Xu, Wenjing Yang, Hongfeng Li, Shicong Li, Yongtao Jia, Fei Xiang, Jincheng He (*), Ming Li, “Lightweight Dual-channel Target Speaker Separation for Mobile Voice Communication”, Proc. of NCMMSC 2021.
Yechen Wang (*), Yan Jia (*), Murong Ma (*), Zexin Cai (*), Ming Li, “A Two-Stage Query-by-example Spoken Term Detection System for Personalized Keyword Spotting”, Proc. of NCMMSC 2021.
Ming Cheng (*), Kunjing Cai (*), Ming Li, “RWF-2000: An Open Large Scale Video Database for Violence Detection”, Proc. of ICPR 2020, 4183-4190.
Yueran Pan (*), Kunjing Cai (*), Ming Cheng (*), Xiaobing Zou, Ming Li, “Responsive Social Smile: A Machine Learning based Multimodal Behavior Assessment Framework towards Early Stage Autism Screening”, Proc. of ICPR 2020, 2240-2247.
Zexin Cai (*), Chuxiong Zhang (*), Ming Li, “From Speaker Verification to Multispeaker Speech Synthesis, Deep Transfer by Feedback Constraint”, Proc. of INTERSPEECH 2020, 3974-3978.
Xiaoyi Qin (*), Ming Li, Hui Bu, Wei Rao, Rohan Kumar Das, Shrikanth Narayanan, Haizhou Li, “The INTERSPEECH 2020 Far-Field Speaker Verification Challenge”, Proc. of INTERSPEECH 2020, 3456-3460.
Haiwei Wu (*), Yan Jia (*), Yuanfei Nie, Ming Li, “Domain Aware Training for Far-field Small-footprint Keyword Spotting”, Proc. of INTERSPEECH 2020, 2562-2566.
Tingle Li (*), Qingjian Lin (*), Yuanyuan Bao (*), Ming Li, “Atss-Net: Target Speaker Separation via Attention-based Neural Network”, Proc. of INTERSPEECH 2020, 1411-1415.
Qingjian Lin (*), Yu Hou (*), Ming Li, “Self-Attentive Similarity Measurement Strategies in Speaker Diarization”, Proc. of INTERSPEECH 2020, 284-288.
Qingjian Lin (*), Tingle Li (*), Ming Li, “The DKU Speech Activity Detection and Speaker Identification Systems for Fearless Steps Challenge Phase-02”, Proc. of INTERSPEECH 2020, 2607-2611.
Qingjian Lin (*), Weicheng Cai (*), Lin Yang, Junjie Wang, Jun Zhang, Ming Li, “DIHARD II is Still Hard: Experimental Results and Discussions”, Proc. of Odyssey 2020, 102-109.
Qingjian Lin (*), Tingle Li (*), Lin Yang, Junjie Wang, Ming Li, “Optimal Mapping Loss: A Faster Loss for End-to-End Speaker Diarization”, Proc. of Odyssey 2020, 125-131.
Danwei Cai (*), Weicheng Cai (*), Ming Li, “Within-sample variability-invariant loss for robust speaker recognition under noisy environments”, Proc. of ICASSP 2020, 6469-6473.
Xiaoyi Qin (*), Hui Bu, Ming Li, “HI-MIA: a far-field text-dependent speaker verification database and the baselines”, Proc. of ICASSP 2020, 7609-7613.
Weicheng Cai (*), Haiwei Wu (*), Danwei Cai (*), Ming Li, “The DKU Replay Detection System for the ASVspoof 2019 Challenge: On Data Augmentation, Feature Representation, Classification, and Fusion”, Proc. of INTERSPEECH 2019, 1023-1027.
Zexin Cai (*), Yaogen Yang (*), Chuxiong Zhang (*), Xiaoyi Qin (*), Ming Li, “Polyphone Disambiguation for Mandarin Chinese Using Conditional Neural Network with Multi-level Embedding Feature”, Proc. of INTERSPEECH 2019, 2110-2114.
Haiwei Wu (*), Weiqing Wang (*), Ming Li, “The DKU-LENOVO Systems for the INTERSPEECH 2019 Computational Paralinguistic Challenge”, Proc. of INTERSPEECH 2019, 2433-2437.
Qingjian Lin (*), Ruiqing Yin, Ming Li, Hervé Bredin, Claude Barras, “LSTM Based Similarity Measurement with Spectral Clustering for Speaker Diarization”, Proc. of INTERSPEECH 2019, 366-370.
Danwei Cai (*), Xiaoyi Qin (*), Weicheng Cai (*), Ming Li, “The DKU System for the Speaker Recognition Task of the 2019 VOiCES from a Distance Challenge”, Proc. of INTERSPEECH 2019, 2493-2497.
Danwei Cai (*), Weicheng Cai (*), Ming Li, “The DKU-SMIIP System for NIST 2018 Speaker Recognition Evaluation”, Proc. of INTERSPEECH 2019, 4370-4374.
Danwei Cai (*), Xiaoyi Qin (*), Ming Li, “Multi-Channel Training for End-to-End Speaker Recognition under Reverberant and Noisy Environment”, Proc. of INTERSPEECH 2019, 4365-4369.
Xiaoyi Qin (*), Danwei Cai (*), Ming Li, “Far-Field End-to-End Text-Dependent Speaker Veriﬁcation based on Mixed Training Data with Transfer Learning and Enrollment Data Augmentation”, Proc. of INTERSPEECH 2019, 4045-4049.
Weiqing Wang (*), Haiwei Wu (*), Ming Li, “Deep Neural Networks with Batch Speaker Normalization for Intoxicated Speech Detection”, Proc. of APSIPA ASC 2019, 1323-1327.
Haiwei Wu (*), Weicheng Cai (*), Ming Li, Ji Gao, Shanshan Zhang, Zhiqiang Lv, Shen Huang, “DKU-Tencent Submission to Oriental Language Recognition AP18-OLR Challenge”, Proc. of APSIPA ASC 2019, 1646-1651.
Sheng Sun (*), Shuangmei Li, Wenbo Liu (*), Xiaobing Zou, and Ming Li, “Fixation Based Object Recognition in Autism Clinic Setting”, Proc. of ICIRA 2019, 615-628.
Zexin Cai (*), Zhicheng Xu (*), Ming Li, “F0 contour estimation using phonetic feature in electrolaryngeal speech enhancement”, Proc. of ICASSP 2019, 6490-6494.
Weicheng Cai (*), Shen Huang, Ming Li, “Utterance-level End-to-end Language Identification using Attention-based CNN-BLSTM”, Proc. of ICASSP 2019, 5991-5995.
Dengke Tang (*), Junlin Zeng (*), Ming Li, “An End-to-End Deep Learning Framework for Speech Emotion Recognition of Atypical Individuals”, Proc. of INTERSPEECH 2018, 162-166.
Weicheng Cai (*), Jinkun Chen (*), Ming Li, “Analysis of Length Normalization in End-to-End Speaker Verification System”, Proc. of INTERSPEECH 2018, 3618—3622.
Weicheng Cai (*), Jinkun Chen (*), Ming Li, “Exploring the Encoding Layer and Loss function in End-to-End Speaker and Language Recognition System”, Proc. of Odyssey 2018.
Weicheng Cai (*), Wenbo Liu (*), Zexin Cai (*), Ming Li, “A Novel Learnable Dictionary Encoding Layer for End-to-End Language Identification”, Proc. of ICASSP, 2018, 5189-5193.
Weicheng Cai (*), Zexin Cai (*), Xiang Zhang, Ming Li, “’Insights into End-to-End Learning Scheme for Language Identification”, Proc. of ICASSP, 2018, 5209-5213.
Haiwei Wu (*), Ming Li，“Unsupervised Query by Example Spoken Term Detection Using Features Concatenated with Self-Organizing Map Distances”, Proc. of ISCSLP 2018.
Zexin Cai (*), Xiaoyi Qin (*), Danwei Cai (*), Ming Li, Xinzhong Liu, “The DKU-JNU-EMA Electromagnetic Articulography Database on Mandarin and Chinese Dialects with Tandem Feature based Acoustic-to-Articulatory Inversion”, Proc. of ISCSLP 2018.
Jinkun Chen (*), Weicheng Cai (*), Ming Li, “End-to-end Language Identification using NetFV and NetVLAD”, Proc. of ISCSLP 2018.
Danwei Cai (*), Cai Zexin (*), Ming Li, “Deep Speaker Embedding with Convolutional Neural Network on Supervector for Text-Independent Speaker Recognition”, Proc. of APSIPA ASC 2018, 1478-1482.
Ming Li, Luting Wang (*), Zhicheng Xu (*), Danwei Cai (*), “Mandarin Electrolaryngeal Voice Conversion with Combination of Gaussian Mixture Model and Non-negative Matrix Factorization”, Proc. of APSIPA ASC 2017, 1360-1363.
Tianyan Zhou (*), Yixiang Xie (*), Xiaobing Zou, Ming Li, “An Automated Assessment Framework for Speech Abnormalities related to Autism Spectrum Disorder”, Proc. of INTERSPEECH 2017 Satellite Workshop ASMMC 2017.
Jing Pan (*), Ming Li, Zhanmei Song, Xin Li, Xiaolin Liu, Hua Yi, Manman Zhu, “An audio based piano performance evaluation method using deep neural network based acoustic modeling“, Proc. of INTERSPEECH 2017, 3088-3092.
Weicheng Cai (*), Danwei Cai (*), Wenbo Liu (*), Gang Li, Ming Li, “Countermeasures for Automatic Speaker Verification Replay Spoofing Attack: On Data Augmentation, Feature Representation, Classification and Fusion“, Proc. of INTERSPEECH 2017, 17-21.
Danwei Cai (*), Zhidong Ni (*), Wenbo Liu (*), Weicheng Cai (*), Gang Li, Ming Li, “End-to-End Deep Learning Framework for Speech Paralinguistics Detection Based on Perception Aware Spectrum“, Proc. of INTERSPEECH 2017, 3452-3456.
Jinkun Chen (*), Ming Li, “Automatic Emotional Spoken Language Text Corpus Construction from Written Dialogs in Fictions“, Proc. of ACII 2017, 319-324.
Wenbo Liu (*), Xiaobin Zou, Ming Li, “Response to Name: A Dataset and A Multimodal Machine Learning Framework towards Autism Study“, Proc. of ACII 2017, 178-183.
Weiyang Liu, Yandong Wen (*), Zhiding Yu, Ming Li, Bhiksha Raj, Le Song, “SphereFace: Deep Hypersphere Embedding for Face Recognition“, Proc. of CVPR 2017, 6738-6746.
Yandong Wen, Weiyang Liu, Meng Yang , and Ming Li, “Efficient Misalignment-Robust Face Recognition Via Locality-Constrained Representation”, Proc. of ICIP, 2016, page 3021-3025
Wei Fang (*), Jianwen Zhang, Dilin Wang, Zheng Chen, and Ming Li. “Entity Disambiguation by Knowledge and Text Jointly Embedding”, Proc. of CoNLL, 2016, pages 260–269.
Tianyan Zhou (*), Weicheng Cai (*), Xiaoyan Chen, Xiaobing Zou, Shilei Zhang, Ming Li, “Speaker Diarization System for Autism Children’s Real-Life Audio Data“, Proc. of ISCSLP 2016.
Danwei Cai (*), Weicheng Cai (*), Ming Li, “Locality Sensitive Discriminant Analysis for Speaker Recognition”, Proc. of APSIPA ASC 2016.
Gaoyuan He (*), Jinkun Chen (*), Xuebo Liu (*), Ming Li, “The SYSU System for CCPR 2016 Multimodal Emotion Recognition Challenge”, Proc. of CCPR 2016, 707-720.
Huadi Zheng (*), Weicheng Cai(*) , Tianyan Zhou (*), Shilei Zhang, Ming Li, “Text-Independent Voice Conversion Using Deep Neural Network Based Phonetic Level Features“, Proc. of ICPR 2016, 2872-2877.
Zhun Chen (*), Wenbo Zhao (*), Yuanwei Jin, Ming Li, Jimmy Zhu, “A Fast Tracking Algorithm for Estimating Ultrasonic Signal Time of Flight in Drilled Shafts Using Active Shape Models“, Proc. of IUS 2016.
Zhiding Yu, Weiyang Liu, Wenbo Liu (*), Yingzhen Yang, Ming Li and Vijayakumar Bhagavatula, “On Order-Constrained Transitive Distance Clustering“, Proc. of AAAI 2016, 2293-2299.
Shushan Chen (*), Yiming Zhou (*), Ming Li, “Automatic assessment of non-native accent degrees using phonetic level posterior and duration features from multiple languages“, Proc. of APSIPA ASC 2015, 156-159.
Shitao Weng (*), Shushan Chen (*), Lei Yu (*), Xuewei Wu (*), Weicheng Cai (*), Zhi Liu (*), Yiming Zhou (*), Ming Li, “The SYSU system for the INTERSPEECH 2015 automatic speaker verification spoofing and countermeasures challenge“, Proc. of APSIPA ASC 2015, 152-155.
Yingxue Wang, Shenghui Zhao, Wenbo Liu (*), Ming Li, Jingming Kuang, “Speech bandwidth expansion based on deep neural networks”, Proc. of Interspeech 2015, pp. 2593–2597
Wenbo Liu (*), Zhiding Yu, Li Yi, Bhiksha Raj, Ming Li, “Efficient Autism Spectrum Disorder Diagnosis with Eye Movement: A Machine Learning Framework“, Proc. of ACII 2015, 649-655.
Wenbo Liu (*), Zhiding Yu, Bhiksha Raj, Ming Li, “Locality Constrained Transitive Distance Clustering on Speech Data“, Proc. of INTERSPEECH 2015, 2917-2921.
Qingyang Hong, Lin Li, Ming Li, Ling Huang, Jun Zhang, “Modified-prior PLDA and Score Calibration for Duration Mismatch Compensation in Speaker Recognition System“, Proc. of INTERSPEECH 2015, 1037-1041.
Weicheng Cai (*), Ming Li, Lin Li, Qingyang Hong, “Duration Dependent Covariance Regularization in PLDA Modeling for Speaker Verification“, Proc. of INTERSPEECH 2015, 1027-2031.
Ming Li, “speaker verification with the mixture of Gaussian factor analysis based representation”, Proc. of ICASSP 2015, 4679-4683.
Wenbo Liu (*), Zhiding Yu, Ming Li, “An Iterative Framework for Unsupervised Learning in the PLDA basedSpeaker Verification”, Proc. of ISCSLP 2014, 78-82.
Ming Li, Wenbo Liu (*), “Speaker verification and spoken language identification using a generalized i-vector framework with phonetic tokenizations and tandem features“, Proc. of INTERSPEECH 2014, 1120-1124.
Ming Li, “Automatic recognition of speaker physical load using posterior probability based features from acoustic and phonetic tokens“, Proc. of INTERSPEECH 2014, 437-441.
Prashanth Gurunath Shivakumar, Ming Li, Vedant Dhandhania, Shrikanth S.Narayanan, “Simplified and supervised i-vector modeling for speaker age regression”, Proc. of ICASSP 2014, 4833-4837.
Ming Li, Xin Li, “Verification based ECG biometrics with cardiac irregular conditions using heartbeat level and segment level information fusion”, Proc. of ICASSP 2014, 3769-3773.
Ming Li, Andreas Tsiartas, Maarten Van Segbroeck, Shrikanth S. Narayanan, “Speaker verification using simplified and supervised i-vector modeling”, Proc. of ICASSP 2013, 7199-7203.
Ming Li, Adam Lammert, Jangwon Kim, Prasanta Ghosh, Shrikanth Narayanan, “Automatic Classification of Palatal and Pharyngeal Wall Morphology Patterns from Speech Acoustics and Inverted Articulatory Signals”, INTERSPEECH Satellite Workshop on Speech Production in Automatic Speech Recognition, 2013.
Ming Li, Jangwon Kim, Prasanta Kumar Ghosh, Vikram Ramanarayanan and Shrikanth Narayanan, “Speaker verification based on fusion of acoustic and articulatory information”, Proc. of INTERSPEECH 2013, 1614-1618.
Andreas Tsiartas, Theodora Chaspari, Nassos Katsamanis, Prasanta Ghosh, Ming Li, Maarten Van Segbroeck, Alexandros Potamianos, Shrikanth Narayanan, “Multi-band long-term signal variability features for robust voice activity detection”, Proc. of INTERSPEECH 2013, 718-722.
Kyu Jeong Han, Sriram Ganapathy, Ming Li, Mohamed K. Omar, Shrikanth Narayanan, “TRAP Language Identification System for RATS Phase II Evaluation“, Proc. of INTERSPEECH 2013, 1502-1506.
Daniel Bone, Theodora Chaspari, Kartik Audhkhasi, James Gibson, Andreas Tsiartas, Maarten Van Segbroeck, Ming Li, Sungbok Lee, Shrikanth Narayanan, “Classifying Language-Related Developmental Disorders from Speech Cues: the Promise and the Potential Confounds“, Proc. of INTERSPEECH 2013, 182-186.
Ming Li, Charley Lu, Anne Wang, Shrikanth Narayanan, “Speaker Verification using Lasso based Sparse Total Variability Supervector and Probabilistic Linear Discriminant Analysis”, Proc. of APSIPA ASC 2012
Ming Li, Angeliki Metallinou, Daniel Bone, Shrikanth Narayanan, “Speaker states recognition using latent factor analysis based Eigenchannel factor vector modeling“, Proc. of ICASSP 2012, 1937-1940.
Jangwon Kim, Naveen Kumar, Andreas Tsiartas, Ming Li, Shrikanth Narayanan, “Intelligibility classification of pathological speech using fusion of multiple high level descriptors“, Proc. of INTERSPEECH 2012, 534-537.
Kartik Audhkhasi, Angeliki Metallinou, Ming Li, Shrikanth Narayanan, “Speaker Personality Classification Using Systems Based on Acoustic-Lexical Cues and an Optimal Tree-Structured Bayesian Network“, Proc. of INTERSPEECH 2012, 262-265.
Ming Li, Xiang Zhang, Yonghong Yan, Shrikanth Narayanan, “Speaker Verification using Sparse Representations on Total Variability I-Vectors”, Proc. of INTERSPEECH 2011, 2729-2732.
Ming Li, Shrikanth Narayanan, “Robust talking face video verification using joint factor analysis and sparse representation on GMM mean shifted supervectors”, Proc. of ICASSP 2011, 1481-1484.
Samuel Kim, Ming Li, Sangwon Lee, Urbashi Mitra, Adar Emken, Donna Spruijt-Metz, Murali Annavaram, Shrikanth Narayanan, “Modeling high-level descriptions of real-life physical activities using latent topic modeling of multimodal sensor signals“, Proc. of EMBC 2011, 6033-6036.
Daniel Bone, Matthew P. Black, Ming Li, Angeliki Metallinou, Sungbok Lee, Shrikanth Narayanan, “Intoxicated Speech Detection by Fusion of Speaker Normalized Hierarchical Features and GMM Supervectors“, Proc. of INTERSPEECH 2011, 3217-3220.
Ming Li, Shrikanth Narayanan, “Robust ECG biometrics by fusing temporal and cepstral information”, Proc. of ICPR 2010, 1326-1329.
Ming Li, Chi-Sang Jung, Kyu Jeong Han, “Combining Five Acoustic Level methods for Automatic Speaker Age and Gender Recognition”, Proc. of INTERSPEECH 2010, 2826-2829.
Gautam Thatte, Viktor Rozgic, Ming Li, Sabyasachi Ghosh, Urbashi Mitra, Shri Narayanan, Murali Annavaram, Donna Spruijt-Metz, “Optimal Allocation of Time-Resources for Multihypothesis Activity-Level Detection,” Proc. of DCOSS 2009, 273-286.
Gautam Thatte, Ming Li, Adar Emken, Urbashi Mitra, Shri Narayanan, Murali Annavaram, Donna Spruijt-Metz, “Energy-Efficient Multihypothesis Activity-Detection for Health-Monitoring Applications“, Proc. of EMBC 2009, 4678-4681.
Ming Li, Chuan Cao, Di Wang, Ping Lu, Qiang Fu, Yonghong Yan, “Cochannel speech separation using multi-pitch estimation and model based voiced sequential grouping”, Proc. of INTERSPEECH 2008, 151-154.
Ming Li, Hongbin Suo, Xiao Wu, Ping Lu, Yonghong Yan, “Spoken Language Identification Using Score Vector Modeling and Support Vector Machine”, Proc. of INTERSPEECH 2007, 350-353.
Ming Li, Yun Lei, Xiang Zhang, Jian Liu, Yonghong Yan, “Authentication and quality monitoring based audio watermark for analog AM shortwave broadcasting”, Proc. of IIH-MSP 2007, 263-266.
Ming Li, Yun Lei, Jian Liu, Yonghong Yan, “A Novel Audio Watermarking in Wavelet Domain“, Proc. of IIH-MSP 2006, 27-32.