Papers
DQR-TTS: Semi-supervised Text-to-speech Synthesis with Dynamic Quantized
Representation
Paper
• 2311.07965
• Published
CP-EB: Talking Face Generation with Controllable Pose and Eye Blinking
Embedding
Paper
• 2311.08673
• Published
CLN-VC: Text-Free Voice Conversion Based on Fine-Grained Style Control
and Contrastive Learning with Negative Samples Augmentation
Paper
• 2311.08670
• Published
Stock Volatility Prediction Based on Transformer Model Using
Mixed-Frequency Data
Paper
• 2309.16196
• Published
Sparks of Large Audio Models: A Survey and Outlook
Paper
• 2308.12792
• Published
Research on the Impact of Executive Shareholding on New Investment in
Enterprises Based on Multivariable Linear Regression Model
Paper
• 2309.10986
• Published
A Hierarchy-based Analysis Approach for Blended Learning: A Case Study
with Chinese Students
Paper
• 2309.10218
• Published
An Empirical Study of Attention Networks for Semantic Segmentation
Paper
• 2309.10217
• Published
Contrastive Latent Space Reconstruction Learning for Audio-Text
Retrieval
Paper
• 2309.08839
• Published
AOSR-Net: All-in-One Sandstorm Removal Network
Paper
• 2309.08838
• Published
FastGraphTTS: An Ultrafast Syntax-Aware Speech Synthesis Framework
Paper
• 2309.08837
• Published
DiffTalker: Co-driven audio-image diffusion for talking faces via
intermediate landmarks
Paper
• 2309.07509
• Published
Machine Unlearning Methodology base on Stochastic Teacher Network
Paper
• 2308.14322
• Published
Voice Conversion with Denoising Diffusion Probabilistic GAN Models
Paper
• 2308.14319
• Published
Symbolic & Acoustic: Multi-domain Music Emotion Modeling for
Instrumental Music
Paper
• 2308.14317
• Published
Improving Music Genre Classification from Multi-Modal Properties of
Music and Genre Correlations Perspective
Paper
• 2303.07667
• Published
EmoMix: Emotion Mixing via Diffusion Models for Emotional Speech
Synthesis
Paper
• 2306.00648
• Published
SAR: Self-Supervised Anti-Distortion Representation for End-To-End
Speech Model
Paper
• 2304.11547
• Published
Dynamic Alignment Mask CTC: Improved Mask-CTC with Aligned Cross Entropy
Paper
• 2303.07687
• Published
QI-TTS: Questioning Intonation Control for Emotional Speech Synthesis
Paper
• 2303.07682
• Published
Improving EEG-based Emotion Recognition by Fusing Time-frequency And
Spatial Representations
Paper
• 2303.11421
• Published
Linguistic-Enhanced Transformer with CTC Embedding for Speech
Recognition
Paper
• 2210.14725
• Published
Improving Imbalanced Text Classification with Dynamic Curriculum
Learning
Paper
• 2210.14724
• Published
Semi-Supervised Learning Based on Reference Model for Low-resource TTS
Paper
• 2210.14723
• Published
MetaSpeech: Speech Effects Switch Along with Environment for Metaverse
Paper
• 2210.13811
• Published
Improving Speech Representation Learning via Speech-level and
Phoneme-level Masking Approach
Paper
• 2210.13805
• Published
Adapitch: Adaption Multi-Speaker Text-to-Speech Conditioned on Pitch
Disentangling with Untranscribed Data
Paper
• 2210.13803
• Published
Pre-Avatar: An Automatic Presentation Generation Framework Leveraging
Talking Avatar
Paper
• 2210.06877
• Published
Boosting Star-GANs for Voice Conversion with Contrastive Discriminator
Paper
• 2209.10088
• Published
Tiny-Sepformer: A Tiny Time-Domain Transformer Network for Speech
Separation
Paper
• 2206.13689
• Published
SUSing: SU-net for Singing Voice Synthesis
Paper
• 2205.11841
• Published
TDASS: Target Domain Adaptation Speech Synthesis Framework for
Multi-speaker Low-Resource TTS
Paper
• 2205.11824
• Published
MetaSID: Singer Identification with Domain Adaptation for Metaverse
Paper
• 2205.11821
• Published
Singer Identification for Metaverse with Timbral and Middle-Level
Perceptual Features
Paper
• 2205.11817
• Published
MDCNN-SID: Multi-scale Dilated Convolution Network for Singer
Identification
Paper
• 2004.04371
• Published
Investigation of Singing Voice Separation for Singing Voice Detection in
Polyphonic Music
Paper
• 2004.04040
• Published
DRVC: A Framework of Any-to-Any Voice Conversion with Self-Supervised
Learning
Paper
• 2202.10976
• Published
nnSpeech: Speaker-Guided Conditional Variational Autoencoder for
Zero-shot Multi-speaker Text-to-Speech
Paper
• 2202.10712
• Published
AVQVC: One-shot Voice Conversion by Vector Quantization with applying
contrastive learning
Paper
• 2202.10020
• Published
Singer Identification Using Deep Timbre Feature Learning with KNN-Net
Paper
• 2102.10236
• Published
TGAVC: Improving Autoencoder Voice Conversion with Text-Guided and
Adversarial Training
Paper
• 2208.04035
• Published
PMVC: Data Augmentation-Based Prosody Modeling for Expressive Voice
Conversion
Paper
• 2308.11084
• Published
Medical Speech Symptoms Classification via Disentangled Representation
Paper
• 2403.05000
• Published