W. Nakata, T. Saeki, Y. Saito, S. Takamichi, and H. Saruwatari, “NecoBERT: Self-Supervised Learning Model Trained by Masked Language Modeling on Rich Acoustic Features Derived from Neural Audio Codec,” in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Dec. 2024. (accepted)
J. X. Teh, N. Takamune, H. Saruwatari, B. Yen, M. Kingan, and Y. Hioka, “Beamforming informed independent low-rank matrix analysis for sound source enhancement in unmanned aerial vehicles,” in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Dec. 2024. (accepted)
T. Kojima, N. Takamune, D. Kitamura, and H. Saruwatari, “Design of Spectrogram-Consistency Regularization Term Dependent on Observation in Independent Low-Rank Matrix Analysis for Blind Source Separation,” in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Dec. 2024. (accepted)
S. Hirata, N. Takamune, K. Yamaoka, D. Kitamura, H. Saruwatari, Y. Takahashi, and K. Kondo, “Auxiliary-Function-Based Steering Vector Estimation Method for Spatially Regularized Independent Low-Rank Matrix Analysis,” in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Dec. 2024. (accepted)
Y. Ishikawa, O. Take, T. Nakamura, N. Takamune, Y. Saito, S. Takamichi, and H. Saruwatari, “Real-Time Noise Estimation for Lombard-Effect Speech Synthesis in Human–Avatar Dialogue Systems,” in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Dec. 2024. (accepted)
K. Baba, W. Nakata, Y. Saito, and H. Saruwatari, “The T05 System for The VoiceMOS Challenge 2024: Transfer Learning from Deep Image Classifier to Naturalness MOS Prediction of High-Quality Synthetic Speech,” in Proceedings of IEEE Spoken Language Technology Workshop (SLT), Dec. 2024. (accepted)
H. Hyodo, S. Takamichi, T. Nakamura, J. Koguchi, and H. Saruwatari, “DNN-based ensemble singing voice synthesis with interactions between singers,” in Proceedings of IEEE Spoken Language Technology Workshop (SLT), Dec. 2024. (accepted)
K. Yamauchi, Y. Saito, and H. Saruwatari, “Cross-Dialect Text-To-Speech in Pitch-Accent Language Incorporating Multi-Dialect Phoneme-Level BERT,” in Proceedings of IEEE Spoken Language Technology Workshop (SLT), Dec. 2024. (accepted)
Y. Saito, T. Igarashi, K. Seki, S. Takamichi, R. Yamamoto, K. Tachibana, and H. Saruwatari, “SRC4VC: Smartphone-Recorded Corpus for Voice Conversion Benchmark,” in Proceedings of Interspeech, Sep. 2024.
T. Saeki, S. Maiti, S. Takamichi, S. Watanabe, and H. Saruwatari, “SpeechBERTScore: Reference-Aware Automatic Evaluation of Speech Generation Leveraging NLP Evaluation Metrics,” in Proceedings of Interspeech, Sep. 2024.
K. Seki, S. Takamichi, N. Takamune, Y. Saito, K. Imamura, and H. Saruwatari, “Spatial Voice Conversion: Voice Conversion Preserving Spatial Information and Non-target Signals,” in Proceedings of Interspeech, Sep. 2024.
T. Igarashi, Y. Saito, K. Seki, S. Takamichi, R. Yamamoto, K. Tachibana, and H. Saruwatari, “Noise-Robust Voice Conversion by Conditional Denoising Training Using Latent Variables of Recording Quality and Environment,” in Proceedings of Interspeech, Sep. 2024.
M. Tailleur, J. Lee, M. Lagrange, K. Choi, L. Heller, K. Imoto, and Y. Okamoto, “Correlation of Fréchet Audio Distance With Human Perception of Environmental Audio Is Embedding Dependent,” in Proceedings of European Signal Processing Conference (EUSIPCO), Sep. 2024.
S. Kando, Y. Miyao, J. Naradowsky, and S. Takamichi, “Textless Dependency Parsing by Labeled Sequence Prediction,” in Proceedings of Interspeech, Sep. 2024.
D. Yang, T. Koriyama, and Y. Saito, “Frame-Wise Breath Detection with Self-Training: An Exploration of Enhancing Breath Naturalness in Text-to-Speech,” in Proceedings of Interspeech, Sep. 2024.
Y. Ishikawa, T. Nakamura, N. Takamune, and H. Saruwatari, “Real-Time Framework for Speech Extraction Based on Independent Low-Rank Matrix Analysis with Spatial Regularization and Rank-Constrained Spatial Covariance Matrix Estimation,” in Workshop on Spoken Dialogue Systems for Cybernetic Avatars, Sep. 2024.
O. Take, K. Watanabe, T. Nakatsuka, T. Cheng, T. Nakano, M. Goto, S. Takamichi, and H. Saruwatari, “Audio Effect Chain Estimation and Dry Signal Recovery from Multi-Effect-Processed Musical Signals,” in Proceedings of the 27th International Conference on Digital Audio Effects (DAFx24), Sep. 2024.
O. Take, S. Takamichi, K. Seki, Y. Bando, and H. Saruwatari, “SaSLaW: Dialogue Speech Corpus with Audio-visual Egocentric Information Toward Environment-adaptive Dialogue Speech Synthesis,” in Proceedings of Interspeech, Sep. 2024.
H. Suda, A. Watanabe, and S. Takamichi, “Who Finds This Voice Attractive? A Large-Scale Experiment Using In-the-Wild Data,” in Proceedings of Interspeech, Sep. 2024.
Y. Ishikawa, K. Konaka, T. Nakamura, N. Takamune, and H. Saruwatari, “Real-time Speech Extraction Using Spatially Regularized Independent Low-rank Matrix Analysis and Rank-constrained Spatial Covariance Matrix Estimation,” in Proceedings of Hands-free Speech Communication and Microphone Arrays (HSCMA), Apr. 2024.
K. Seki, S. Takamichi, T. Saeki, and H. Saruwatari, “Diversity-Based Core-Set Selection for Text-to-Speech with Linguistic and Acoustic Features,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Apr. 2024, pp. 12351–12355.
K. Yamauchi, Y. Ijima, and Y. Saito, “STYLECAP: Automatic Speaking-Style Captioning from Speech Based on Speech and Language Self-Supervised Learning Models,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Apr. 2024, pp. 11261–11265.
Y. Okamoto, K. Imoto, S. Takamichi, R. Nagase, T. Fukumori, and Y. Yamashita, “Environmental Sound Synthesis from Vocal Imitations and Sound Event Labels,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Apr. 2024, pp. 411–415.
S. Takamichi, H. Maeda, J. Park, D. Saito, and H. Saruwatari, “Do Learned Speech Symbols Follow Zipf’s Law?,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Apr. 2024, pp. 12526–12530.
2023
A. Watanabe, S. Takamichi, Y. Saito, W. Nakata, D. Xin, and H. Saruwatari, “Coco-Nut: Corpus of Japanese Utterance and Voice Characteristics Description for Prompt-based Control,” in Proceedings of IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Dec. 2023, pp. 1–8.
X. Li, S. Takamichi, T. Saeki, W. Chen, S. Shiota, and S. Watanabe, “YODAS: Youtube-Oriented Dataset for Audio and Speech,” in Proceedings of IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Dec. 2023, pp. 1–8.
Y. Ishikawa, S. Takamichi, Y. Matsunaga, Y. Yoshikawa, and S. Fujiwara, “Speaking Practice Using Text-to-speech Technology: Japanese EFL Learners’ Perceptions,” in Proceedings of WorldCALL, Nov. 2023.
S. Misawa, N. Takamune, K. Yatabe, D. Kitamura, and H. Saruwatari, “Blind Source Separation Using Independent Low-Rank Matrix Analysis with Spectrogram-Consistency Regularization,” in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Nov. 2023, pp. 1050–1057.
K. Imamura, T. Nakamura, N. Takamune, K. Yatabe, and H. Saruwatari, “Algorithms of Sampling-Frequency-Independent Layers for Non-integer Strides,” in Proceedings of European Signal Processing Conference (EUSIPCO), Sep. 2023, pp. 326–330.
K. Nishida, N. Takamune, R. Ikeshita, D. Kitamura, H. Saruwatari, and T. Nakatani, “NoisyILRMA: Diffuse-Noise-Aware Independent Low-Rank Matrix Analysis for Fast Blind Source Extraction,” in Proceedings of European Signal Processing Conference (EUSIPCO), Sep. 2023, pp. 925–929.
Y. Matsunaga, T. Saeki, S. Takamichi, and H. Saruwatari, “Improving robustness of spontaneous speech synthesis with linguistic speech regularization and pseudo-filled-pause insertion,” in Proceedings of The 12th ISCA SSW, Grenoble, Aug. 2023, pp. 62–68.
R. Hirai, Y. Saito, and H. Saruwatari, “Federated learning for human-in-the-loop many-to-many voice conversion,” in Proceedings of The 12th ISCA SSW, Grenoble, Aug. 2023, pp. 94–99.
D. Xin, S. Takamichi, A. Morimatsu, and H. Saruwatari, “Laughter Synthesis using Pseudo Phonetic Tokens with a Large-scale In-the-wild Laughter Corpus,” in Proceedings of Interspeech, Aug. 2023, pp. 17–21.
Y. Ueda, S. Takamichi, Y. Saito, N. Takamune, and H. Saruwatari, “HumanDiffusion: diffusion model using perceptual gradients,” in Proceedings of Interspeech, Aug. 2023, pp. 4264–4268.
Y. Saito, E. Iimori, S. Takamichi, K. Tachibana, and H. Saruwatari, “CALLS: Japanese Empathetic Dialogue Speech Corpus of Complaint Handling and Attentive Listening in Customer Center,” in Proceedings of Interspeech, Aug. 2023, pp. 5561–5565.
Y. Saito, S. Takamichi, E. Iimori, K. Tachibana, and H. Saruwatari, “ChatGPT-EDSS: Empathetic Dialogue Speech Synthesis Trained from ChatGPT-derived Context Word Embeddings,” in Proceedings of Interspeech, Aug. 2023, pp. 3048–3052.
J. Park, S. Takamichi, T. Nakamura, K. Seki, D. Xin, and H. Saruwatari, “How Generative Spoken Language Model Encodes Noisy Speech: Investigation from Phonetics to Syntactics,” in Proceedings of Interspeech, Aug. 2023, pp. 1085–1089.
T. Umemoto, S. Takamichi, Y. Matsunaga, Y. Yoshikawa, K. Yui, K. Sakamoto, S. Fujiwara, and Y. Ishikawa, “Effects of text-to-speech synthesized speech on learners’ presentation anxiety and self-efficacy: A comparison of two models,” in Proceedings of EUROCALL, Aug. 2023.
T. Saeki, S. Maiti, X. Li, S. Watanabe, S. Takamichi, and H. Saruwatari, “Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining,” in Proceedings of International Joint Conference on Artificial Intelligence (IJCAI) Main Track, Aug. 2023, pp. 5179–5187.
K. Seki, S. Takamichi, T. Saeki, and H. Saruwatari, “Text-to-speech synthesis from dark data with evaluation-in-the-loop data selection,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Jun. 2023.
S. Maiti, Y. Peng, T. Saeki, and S. Watanabe, “SpeechLMScore: Evaluating speech generation using speech language model,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Jun. 2023, pp. 1–5.
T. Saeki, H. Zen, Z. Chen, N. Morioka, G. Wang, Y. Zhang, A. Bapna, A. Rosenberg, and B. Ramabhadran, “Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-To-Speech,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Jun. 2023, pp. 1–5.
D. Xin, S. Adavanne, F. Ang, A. Kulkarni, S. Takamichi, and H. Saruwatari, “Improving Speech Prosody of Audiobook Text-to-Speech Synthesis with Acoustic and Textual Contexts,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Jun. 2023, pp. 1–5.
K. Arikawa, S. Koyama, and H. Saruwatari, “Spatial active noise control method based on sound field interpolation from reference microphone signals,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Jun. 2023, pp. 1–5.
D. Yang, T. Koriyama, Y. Saito, T. Saeki, D. Xin, and H. Saruwatari, “Duration-aware pause insertion using pre-trained language model for multi-speaker text-to-speech,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Jun. 2023, pp. 1–5.
M. Kawamura, Y. Shirahata, R. Yamamoto, and K. Tachibana, “Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Jun. 2023, pp. 1–5.
T. Nakamura, S. Kawano, A. Yuguchi, Y. Kawanishi, and K. Yoshino, “What Should the System Do Next?: Operative Action Captioning for Estimating System Actions,” in Proceedings of IEEE International Conference on Robotics and Automation (ICRA), Jun. 2023, pp. 6124–6130.
T. Nakamura, S. Takamichi, N. Tanji, S. Fukayama, and H. Saruwatari, “jaCappella corpus: A Japanese a cappella vocal ensemble corpus,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Jun. 2023, pp. 1–5.
A. Watanabe, S. Takamichi, Y. Saito, D. Xin, and H. Saruwatari, “Mid-attribute Speaker Generation using Optimal-Transport-based Interpolation of Gaussian Mixture Models,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Jun. 2023, pp. 1–5.
H. Ohnaka, S. Takamichi, K. Imoto, Y. Okamoto, K. Fujii, and H. Saruwatari, “Visual onoma-to-wave: environmental sound synthesis from visual onomatopoeias and sound-source images,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Jun. 2023.
K. Choi, J. Im, L. Heller, B. McFee, K. Imoto, Y. Okamoto, M. Lagrange, and S. Takamichi, “Foley Sound Synthesis at the DCASE 2023 Challenge,” in Proceedings of DCASE Challenge 2023, May 2023.
K. Arai, Y. Hirao, T. Narumi, T. Nakamura, S. Takamichi, and S. Yoshida, “TimToShape: Supporting Practice of Musical Instruments by Visualizing Timbre with 2D Shapes based on Crossmodal Correspondences,” in Proceedings of ACM Conference on Intelligent User Interfaces (IUI), Mar. 2023, pp. 850–865.
Y. Nakano, T. Saeki, S. Takamichi, K. Sudoh, and H. Saruwatari, “vTTS: visual-text to speech,” in Proceedings of IEEE Spoken Language Technology Workshop (SLT), Jan. 2023, pp. 936–942.
2022
F. Nakashima, T. Nakamura, N. Takamune, S. Fukayama, and H. Saruwatari, “Hyperbolic Timbre Embedding for Musical Instrument Sound Synthesis Based on Variational Autoencoders,” in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Nov. 2022, pp. 735–742.
Y. Okamoto, K. Imoto, S. Takamichi, T. Fukumori, and Y. Yamashita, “How Should We Evaluate Synthesized Environmental Sounds?,” in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Nov. 2022, pp. 307–312.
K. Fujii, Y. Saito, and H. Saruwatari, “Adaptive End-To-End Text-To-Speech Synthesis Based on Error Correction Feedback From Humans,” in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Nov. 2022, pp. 1702–1707.
Y. Matsunaga, T. Saeki, S. Takamichi, and H. Saruwatari, “Empirical Study Incorporating Linguistic Knowledge on Filled Pauses for Personalized Spontaneous Speech Synthesis,” in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Nov. 2022, pp. 1898–1903.
Y. Nakai, K. Udagawa, Y. Saito, and H. Saruwatari, “Multi-Task Adversarial Training Algorithm for Multi-Speaker Neural Text-To-Speech,” in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Nov. 2022, pp. 743–748.
Y. Saito, Y. Nishimura, S. Takamichi, K. Tachibana, and H. Saruwatari, “STUDIES: Corpus of Japanese Empathetic Dialogue Speech Towards Friendly Voice Agent,” in Proceedings of Interspeech, Sep. 2022, pp. 5155–5159.
K. Shigemi, S. Koyama, T. Nakamura, and H. Saruwatari, “Physics-informed convolutional neural network with bicubic spline interpolation for sound field estimation,” in Proceedings of International Workshop on Acoustic Signal Enhancement (IWAENC), Bamberg, Sep. 2022, pp. 1–5.
Y. Nishimura, Y. Saito, S. Takamichi, K. Tachibana, and H. Saruwatari, “Acoustic Modeling for End-to-End Empathetic Dialogue Speech Synthesis Using Linguistic and Prosodic Contexts of Dialogue History,” in Proceedings of Interspeech, Sep. 2022, pp. 3373–3377.
S. Takamichi, W. Nakata, N. Tanji, and H. Saruwatari, “J-MAC: Japanese multi-speaker audiobook corpus for speech synthesis,” in Proceedings of Interspeech, Sep. 2022, pp. 2358–2362.
W. Nakata, T. Koriyama, S. Takamichi, Y. Saito, Y. Ijima, R. Masumura, and H. Saruwatari, “Predicting VQVAE-based Character Acting Style from Quotation-Annotated Text for Audiobook Speech Synthesis,” in Proceedings of Interspeech, Sep. 2022, pp. 4551–4555.
Y. Ito, T. Nakamura, S. Koyama, and H. Saruwatari, “Head-Related Transfer Function Interpolation from Spatially Sparse Measurements Using Autoencoder with Source Position Conditioning,” in Proceedings of International Workshop on Acoustic Signal Enhancement (IWAENC), Bamberg, Sep. 2022, pp. 1–5.
T. Saeki*, D. Xin*, W. Nakata*, T. Koriyama, S. Takamichi, and H. Saruwatari (*equal contribution), “UTMOS: UTokyo-SaruLab System for VoiceMOS Challenge 2022,” in Proceedings of Interspeech, Sep. 2022, pp. 4521–4525.
T. Saeki, S. Takamichi, T. Nakamura, N. Tanji, and H. Saruwatari, “SelfRemaster: Self-Supervised Speech Restoration with Analysis-by-Synthesis Approach Using Channel Modeling,” in Proceedings of Interspeech, Sep. 2022, pp. 4406–4410.
K. Udagawa, Y. Saito, and H. Saruwatari, “Human-in-the-loop Speaker Adaptation for DNN-based Multi-speaker TTS,” in Proceedings of Interspeech, Sep. 2022, pp. 2968–2972.
D. Xin, S. Takamichi, and H. Saruwatari, “Exploring the Effectiveness of Self-supervised Learning and Classifier Chains in Emotion Recognition of Nonverbal Vocalizations,” in Proceedings of ICML Expressive Vocalizations (ExVo) Workshop and Competition 2022, Jul. 2022.
Y. Matsunaga, T. Saeki, S. Takamichi, and H. Saruwatari, “Personalized filled-pause generation with group-wise prediction models,” in Proceedings of Language Resources and Evaluation Conference (LREC), Jun. 2022, pp. 385–392.
H. Kai, S. Takamichi, S. Shiota, and H. Kiya, “Robustness of signal processing-based pseudonymization method against decryption attack,” in Proceedings of Odyssey, Jun. 2022, pp. 287–293.
N. Kimura, Z. Su, T. Saeki, and J. Rekimoto, “SSR7000: A Synchronized Corpus of Ultrasound Tongue Imaging for End-to-End Silent Speech Recognition,” in Proceedings of Language Resources and Evaluation Conference (LREC), Jun. 2022, pp. 6866–6873.
J. G. C. Ribeiro, S. Koyama, and H. Saruwatari, “Region-to-region kernel interpolation of acoustic transfer function with directional weighting,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2022, pp. 576–580.
K. Arikawa, S. Koyama, and H. Saruwatari, “Spatial active noise control based on individual kernel interpolation of primary and secondary sound fields,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2022, pp. 1056–1060.
M. Kawamura, T. Nakamura, D. Kitamura, H. Saruwatari, Y. Takahashi, and K. Kondo, “Differentiable digital signal processing mixture model for synthesis parameter extraction from mixture of harmonic sounds,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2022, pp. 941–945.
2021
S. Koyama, T. Nishida, K. Kimura, T. Abe, N. Ueno, and J. Brunnström, “MeshRIR: A Dataset of Room Impulse Responses on Meshed Grid Points for Evaluating Sound Field Analysis and Synthesis Methods,” in Proceedings of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Oct. 2021, pp. 1–5.
K. Kimura, S. Koyama, N. Ueno, and H. Saruwatari, “Mean-Square-Error-Based Secondary Source Placement in Sound Field Synthesis With Prior Information on Desired Field,” in Proceedings of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Oct. 2021, pp. 281–285.
R. Ominato, N. Wakui, S. Takamichi, and S. Yano, “Discriminating between left and right ears using linear and nonlinear dimensionality reduction,” in SmaSys2021, Oct. 2021.
R. Arakawa, Z. Kashino, S. Takamichi, A. A. Verhulst, and M. Inami, “Digital Speech Makeup: Voice Conversion Based Altered Auditory Feedback for Transforming Self-Representation,” in ACM ICMI, Oct. 2021, pp. 159–167.
R. Horiuchi, S. Koyama, J. G. C. Ribeiro, N. Ueno, and H. Saruwatari, “Kernel learning for sound field estimation with L1 and L2 regularizations,” in Proceedings of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Oct. 2021, pp. 261–265.
N. Narisawa, R. Ikeshita, N. Takamune, D. Kitamura, T. Nakamura, H. Saruwatari, and T. Nakatani, “Independent Deeply Learned Tensor Analysis for Determined Audio Source Separation,” in Proceedings of European Signal Processing Conference (EUSIPCO), Aug. 2021, pp. 326–330.
T. Hasumi, T. Nakamura, N. Takamune, H. Saruwatari, D. Kitamura, Y. Takahashi, and K. Kondo, “Empirical Bayesian Independent Deeply Learned Matrix Analysis For Multichannel Audio Source Separation,” in Proceedings of European Signal Processing Conference (EUSIPCO), Aug. 2021, pp. 331–335.
K. Saito, T. Nakamura, K. Yatabe, Y. Koizumi, and H. Saruwatari, “Sampling-Frequency-Independent Audio Source Separation Using Convolution Layer Based on Impulse Invariant Method,” in Proceedings of European Signal Processing Conference (EUSIPCO), Aug. 2021, pp. 321–325.
K. Yufune, T. Koriyama, S. Takamichi, and H. Saruwatari, “Accent Modeling of Low-Resourced Dialect in Pitch Accent Language Using Variational Autoencoder,” in Proceedings of The 11th ISCA SSW, Aug. 2021, pp. 189–194.
T. Nakamura, T. Koriyama, and H. Saruwatari, “Sequence-to-Sequence Learning for Deep Gaussian Process Based Speech Synthesis Using Self-Attention GP Layer,” in Proceedings of Interspeech, Aug. 2021, pp. 121–125.
D. Xin, Y. Saito, S. Takamichi, T. Koriyama, and H. Saruwatari, “Cross-lingual Speaker Adaptation using Domain Adaptation and Speaker Consistency Loss for Text-To-Speech Synthesis,” in Proceedings of Interspeech, Aug. 2021, pp. 1614–1618.
W. Nakata, T. Koriyama, S. Takamichi, N. Tanji, Y. Ijima, R. Masumura, and H. Saruwatari, “Audiobook Speech Synthesis Conditioned by Cross-Sentence Context-Aware Word Embeddings,” in Proceedings of The 11th ISCA SSW, Aug. 2021, pp. 211–215.
K. Mizuta, T. Koriyama, and H. Saruwatari, “Harmonic WaveGAN: GAN-Based Speech Waveform Generation Model with Harmonic Structure Discriminator,” in Proceedings of Interspeech, Aug. 2021, pp. 2192–2196.
Y. Ueda, K. Fujii, Y. Saito, S. Takamichi, Y. Baba, and H. Saruwatari, “HumanACGAN: conditional generative adversarial network with human-based auxiliary classifier and its evaluation in phoneme perception,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Jun. 2021, pp. 6468–6472.
D. Xin, T. Komatsu, S. Takamichi, and H. Saruwatari, “Disentangled Speaker and Language Representations using Mutual Information Minimization and Domain Adaptation for Cross-Lingual TTS,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Jun. 2021, pp. 6608–6612.
Y. Ishikawa, S. Takamichi, T. Umemoto, M. Aikawa, K. Sakamoto, K. Yui, S. Fujiwara, A. Suto, and K. Nishiyama, “Japanese EFL learners’ speaking practice utilizing text-to-speech technology within a team-based flipped learning framework,” in Proceedings of International Conference on Human-Computer Interaction (HCII), Jun. 2021, pp. 283–291.
Y. Kondo, Y. Kubo, N. Takamune, D. Kitamura, and H. Saruwatari, “Deficient basis estimation of noise spatial covariance matrix for rank-constrained spatial covariance matrix estimation method in blind speech extraction,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Jun. 2021, pp. 806–810.
H. Kai, S. Takamichi, S. Shiota, and H. Kiya, “Lightweight voice anonymization based on data-driven optimization of cascaded voice modification modules,” in Proceedings of IEEE Spoken Language Technology Workshop (SLT), Jan. 2021, pp. 560–566.
T. Nishida, N. Ueno, S. Koyama, and H. Saruwatari, “Sensor Placement in Arbitrarily Restricted Region for Field Estimation Based on Gaussian Process,” in Proceedings of European Signal Processing Conference (EUSIPCO), Jan. 2021, pp. 2289–2293.
J. Brunnström and S. Koyama, “Kernel-Interpolation-Based Filtered-X Least Mean Square for Spatial Active Noise Control in Time Domain,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Jun. 2021, pp. 161–165.
N. Ueno, S. Koyama, and H. Saruwatari, “Convex and Differentiable Formulation for Inverse Problems in Hilbert Spaces with Nonlinear Clipping Effects,” in IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, 2021.
S. Koyama, K. Kimura, and N. Ueno, “Sound Field Reproduction With Weighted Mode Matching and Infinite-Dimensional Harmonic Analysis: An Experimental Evaluation,” in Proceedings of International Conference on Immersive and 3D Audio (I3DA), 2021. [Invited]
S. Koyama, T. Amakasu, N. Ueno, and H. Saruwatari, “Amplitude Matching: Majorization-Minimization Algorithm for Sound Field Control Only With Amplitude Constraint,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021, pp. 411–415.
S. Koyama, J. Brunnström, H. Ito, N. Ueno, and H. Saruwatari, “Spatial Active Noise Control Based on Kernel Interpolation of Sound Field,” in IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2021, pp. 3052–3063.
T. Hasumi, T. Nakamura, N. Takamune, H. Saruwatari, D. Kitamura, Y. Takahashi, and K. Kondo, “Multichannel Audio Source Separation with Independent Deeply Learned Matrix Analysis Using Product of Source Models,” in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2021, pp. 1226–1233.
X. Luo, S. Takamichi, T. Koriyama, Y. Saito, and H. Saruwatari, “Emotion-Controllable Speech Synthesis Using Emotion Soft Labels and Fine-Grained Prosody Factors,” in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2021, pp. 794–799.
S. Misawa, N. Takamune, T. Nakamura, D. Kitamura, H. Saruwatari, M. Une, and S. Makino, “Speech enhancement by noise self-supervised rank-constrained spatial covariance matrix estimation via independent deeply learned matrix analysis,” in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2021.
T. Saeki, S. Takamichi, and H. Saruwatari, “Low-Latency Incremental Text-to-Speech Synthesis with Distilled Context Prediction Network,” in Proceedings of IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2021, pp. 749–756.
2020
K. Kamo, Y. Kubo, N. Takamune, D. Kitamura, H. Saruwatari, Y. Takahashi, and K. Kondo, “Joint-Diagonalizability-Constrained Multichannel Nonnegative Matrix Factorization Based on Multivariate Complex Student’s t-distribution,” in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Dec. 2020, pp. 869–874.
J. Koguchi, S. Takamichi, and M. Morise, “PJS: phoneme-balanced Japanese singing-voice corpus,” in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Dec. 2020, pp. 487–491.
T. Saeki, Y. Saito, S. Takamichi, and H. Saruwatari, “Real-time, full-band, online DNN-based voice conversion system using a single CPU,” in Proceedings of Interspeech, Oct. 2020, pp. 1021–1022.
Y. Okamoto, K. Imoto, S. Takamichi, R. Yamanishi, T. Fukumori, and Y. Yamashita, “RWCP-SSD-Onomatopoeia: Onomatopoeic Word Dataset for Environmental Sound Synthesis,” in Proceedings of Detection and Classification of Acoustic Scenes and Events (DCASE), Oct. 2020, pp. 125–129.
M. Aso, S. Takamichi, and H. Saruwatari, “End-to-end text-to-speech synthesis with unaligned multiple language units based on attention,” in Proceedings of Interspeech, Oct. 2020, pp. 4009–4013.
D. Xin, Y. Saito, S. Takamichi, T. Koriyama, and H. Saruwatari, “Cross-lingual Text-To-Speech Synthesis via Domain Adaptation and Perceptual Similarity Regression in Speaker Space,” in Proceedings of Interspeech, Oct. 2020, pp. 2947–2951.
N. Kimura, Z. Su, and T. Saeki, “End-to-End Deep Learning Speech Recognition Model for Silent Speech Challenge,” in Proceedings of Interspeech, Oct. 2020, pp. 1025–1026.
Y. Yamashita, T. Koriyama, Y. Saito, S. Takamichi, Y. Ijima, R. Masumura, and H. Saruwatari, “Investigating Effective Additional Contextual Factors in DNN-based Spontaneous Speech Synthesis,” in Proceedings of Interspeech, Oct. 2020, pp. 3201–3205.
S. Goto, K. Ohnishi, Y. Saito, K. Tachibana, and K. Mori, “Face2Speech: towards multi-speaker text-to-speech synthesis using an embedding vector predicted from a face image,” in Proceedings of Interspeech, Oct. 2020, pp. 1321–1325.
K. Mitsui, T. Koriyama, and H. Saruwatari, “Multi-speaker Text-to-speech Synthesis Using Deep Gaussian Processes,” in Proceedings of Interspeech, Oct. 2020, pp. 2032–2036.
H. Takeuchi, K. Kashino, Y. Ohishi, and H. Saruwatari, “Harmonic Lowering for Accelerating Harmonic Convolution for Audio Signals,” in Proceedings of Interspeech, Oct. 2020, pp. 185–189.
N. Iijima, S. Koyama, and H. Saruwatari, “Binaural Rendering From Distributed Microphone Signals Considering Loudspeaker Distance in Measurements,” in Proceedings of IEEE International Workshop on Multimedia Signal Processing (MMSP), Sep. 2020, pp. 1–6.
S. Kozuka, T. Nakamura, and H. Saruwatari, “Investigation on Wavelet Basis Function of DNN-based Time Domain Audio Source Separation Inspired by Multiresolution Analysis,” in Proceedings of Internoise, Aug. 2020.
R. Okamoto, S. Yano, N. Wakui, and S. Takamichi, “Visualization of differences in ear acoustic characteristics using t-SNE,” in Proceedings of AES Convention, May 2020.
T. Koriyama and H. Saruwatari, “Utterance-level Sequential Modeling For Deep Gaussian Process Based Speech Synthesis Using Simple Recurrent Unit,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2020, pp. 7249–7253.
T. Saeki, Y. Saito, S. Takamichi, and H. Saruwatari, “Lifter training and sub-band modeling for computationally efficient and high-quality voice conversion using spectral differentials,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2020, pp. 7784–7788.
T. Nakamura and H. Saruwatari, “Time-domain Audio Source Separation based on Wave-U-Net Combined with Discrete Wavelet Transform,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2020, pp. 386–390.
K. Kamo, Y. Kubo, N. Takamune, D. Kitamura, H. Saruwatari, Y. Takahashi, and K. Kondo, “Regularized Fast Multichannel Nonnegative Matrix Factorization with ILRMA-based Prior Distribution of Joint-Diagonalization Process,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2020, pp. 606–610.
T. Kondo, K. Fukushige, N. Takamune, D. Kitamura, H. Saruwatari, R. Ikeshita, and T. Nakatani, “Convergence-Guaranteed Independent Positive Semidefinite Tensor Analysis Based on Student’s T Distribution,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2020, pp. 681–685.
Y. Saito, S. Takamichi, and H. Saruwatari, “SMASH corpus: a spontaneous speech corpus recording third-person audio commentaries on gameplay,” in Proceedings of Language Resources and Evaluation Conference (LREC), May 2020, pp. 6571–6577.
K. Ariga, T. Nishida, S. Koyama, N. Ueno, and H. Saruwatari, “Mutual-Information-Based Sensor Placement for Spatial Sound Field Recording,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2020, pp. 166–170.
Y. Yamashita, T. Koriyama, Y. Saito, S. Takamichi, Y. Ijima, R. Masumura, and H. Saruwatari, “DNN-based Speech Synthesis Using Abundant Tags of Spontaneous Speech Corpus,” in Proceedings of Language Resources and Evaluation Conference (LREC), May 2020, pp. 6438–6443.
H. Ito, S. Koyama, N. Ueno, and H. Saruwatari, “Spatial Active Noise Control Based on Kernel Interpolation with Directional Weighting,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2020, pp. 8404–8408. [Invited]
S. Koyama, G. Chardon, and L. Daudet, “Optimizing Source and Sensor Placement for Sound Field Control: An Overview,” in IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2020. (overview)
G. Chardon, S. Koyama, and L. Daudet, “Numerical Evaluation of Source and Sensor Placement Methods For Sound Field Control,” in Forum Acusticum, 2020.
2019
N. Makishima, N. Takamune, D. Kitamura, H. Saruwatari, Y. Takahashi, and K. Kondo, “Robust demixing filter update algorithm based on microphone-wise coordinate descent for independent deeply learned matrix analysis,” in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Lanzhou, Nov. 2019, pp. 1868–1873.
Y. Kubo, N. Takamune, D. Kitamura, and H. Saruwatari, “Acceleration of rank-constrained spatial covariance matrix estimation for blind speech extraction,” in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Lanzhou, Nov. 2019, pp. 332–338.
M. Une, Y. Kubo, N. Takamune, D. Kitamura, H. Saruwatari, and S. Makino, “Evaluation of multichannel hearing aid system using rank-constrained spatial covariance matrix estimation,” in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Lanzhou, Nov. 2019, pp. 1874–1879.
M. Nakanishi, N. Ueno, S. Koyama, and H. Saruwatari, “Two-dimensional sound field recording with multiple circular microphone arrays considering multiple scattering,” in Proceedings of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, Oct. 2019.
R. Arakawa, S. Takamichi, and H. Saruwatari, “TransVoice: Real-Time Voice Conversion for Augmenting Near-Field Speech Communication,” in Proceedings of UIST, New Orleans, Oct. 2019.
Y. Kubo, N. Takamune, D. Kitamura, and H. Saruwatari, “Efficient full-rank spatial covariance estimation using independent low-rank matrix analysis for blind source separation,” in Proceedings of European Signal Processing Conference (EUSIPCO), A Coruña, Sep. 2019.
N. Makishima, N. Takamune, D. Kitamura, H. Saruwatari, Y. Takahashi, and K. Kondo, “Column-wise update algorithm for independent deeply learned matrix analysis,” in Proceedings of International Congress on Acoustics (ICA), Aachen, Sep. 2019, pp. 2805–2812. [Young Scientist Conference Attendance Grant]
I. H. Parmonangan, H. Tanaka, S. Sakti, S. Takamichi, and S. Nakamura, “Speech Quality Evaluation of Synthesized Japanese Speech using EEG,” in Proceedings of Interspeech, Graz, Sep. 2019, pp. 1228–1232.
T. Nakamura, Y. Saito, S. Takamichi, Y. Ijima, and H. Saruwatari, “V2S attack: building DNN-based voice conversion from automatic speaker verification,” in Proceedings of The 10th ISCA SSW, Vienna, Sep. 2019.
H. Ito, S. Koyama, N. Ueno, and H. Saruwatari, “Three-dimensional spatial active noise control based on kernel-induced sound field interpolation,” in Proceedings of International Congress on Acoustics (ICA), Aachen, Sep. 2019.
M. Aso, S. Takamichi, N. Takamune, and H. Saruwatari, “Subword tokenization based on DNN-based acoustic model for end-to-end prosody generation,” in Proceedings of The 10th ISCA SSW, Vienna, Sep. 2019.
Y. Saito, S. Takamichi, and H. Saruwatari, “DNN-based Speaker Embedding Using Subjective Inter-speaker Similarity for Multi-speaker Modeling in Speech Synthesis,” in Proceedings of The 10th ISCA SSW, Vienna, Sep. 2019.
R. Arakawa, S. Takamichi, and H. Saruwatari, “Implementation of DNN-based real-time voice conversion and its improvements by audio data augmentation and mask-shaped device,” in Proceedings of The 10th ISCA SSW, Vienna, Sep. 2019.
T. Koriyama, S. Takamichi, and T. Kobayashi, “Sparse Approximation of Gram Matrices for GMMN-based Speech Synthesis,” in Proceedings of The 10th ISCA SSW, Vienna, Sep. 2019.
Y. Takida, S. Koyama, N. Ueno, and H. Saruwatari, “Comparison of Interpolation Methods for Gridless Sound Field Decomposition Based on Reciprocity Gap Functional,” in Proceedings of International Congress on Sound and Vibration (ICSV), Montreal, Jul. 2019. [Invited]
I. H. Parmonangan, H. Tanaka, S. Sakti, S. Takamichi, and S. Nakamura, “EEG Analysis towards Evaluating Synthesized Speech Quality,” in Proceedings of IEEE EMBC, Berlin, Jul. 2019.
N. Makishima, N. Takamune, D. Kitamura, H. Saruwatari, Y. Takahashi, K. Kondo, and H. Nakajima, “Generalized-Gaussian-distribution-based independent deeply learned matrix analysis for multichannel audio source separation,” in Proceedings of International Congress and Exhibition on Noise Control Engineering (INTERNOISE), Madrid, Jun. 2019.
H. Tamaru, Y. Saito, S. Takamichi, T. Koriyama, and H. Saruwatari, “Generative moment matching network-based random modulation post-filter for DNN-based singing voice synthesis and neural double-tracking,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Brighton, May 2019.
K. Naruse, S. Yoshida, S. Takamichi, T. Narumi, T. Tanikawa, and M. Hirose, “Estimating Confidence in Voices using Crowdsourcing for Alleviating Tension with Altered Auditory Feedback,” in Proceedings of Asian CHI Symposium: Emerging HCI Research Collection in ACM Conference on Human Factors in Computing Systems (CHI), Glasgow, May 2019.
T. Koriyama and T. Kobayashi, “A Training Method Using DNN-guided Layerwise Pretraining for Deep Gaussian Processes,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Brighton, May 2019.
H. Ito, S. Koyama, N. Ueno, and H. Saruwatari, “Feedforward Spatial Active Noise Control Based on Kernel Interpolation of Sound Field,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Brighton, May 2019.
Y. Takida, S. Koyama, N. Ueno, and H. Saruwatari, “Robust Gridless Sound Field Decomposition Based on Structured Reciprocity Gap Functional in Spherical Harmonic Domain,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Brighton, May 2019.
K. Yoshino, Y. Murase, N. Lubis, K. Sugiyama, H. Tanaka, S. Sakti, S. Takamichi, and S. Nakamura, “Spoken Dialogue Robot for Watching Daily Life of Elderly People,” in Proceedings of IWSDS, Sicily, Apr. 2019.
2018
M. Une, Y. Saito, S. Takamichi, D. Kitamura, R. Miyazaki, and H. Saruwatari, “Generative approach using the noise generation models for DNN-based speech synthesis trained from noisy speech,” in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Hawaii, Nov. 2018, pp. 99–103.
T. Akiyama, S. Takamichi, and H. Saruwatari, “Prosody-aware subword embedding considering Japanese intonation systems and its application to DNN-based multi-dialect speech synthesis,” in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Hawaii, Nov. 2018.
S. Mogami, N. Takamune, D. Kitamura, H. Saruwatari, Y. Takahashi, K. Kondo, H. Nakajima, and N. Ono, “Independent low-rank matrix analysis based on time-variant sub-Gaussian source model,” in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Hawaii, Nov. 2018. [APSIPA ASC 2018 Best Paper Award]
H. Suda, G. Kotani, S. Takamichi, and D. Saito, “A revisit to feature handling for high-quality voice conversion,” in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Hawaii, Nov. 2018.
S. Shiota, S. Takamichi, and T. Matsui, “Data augmentation with moment-matching networks for i-vector based speaker verification,” in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Hawaii, Nov. 2018.
S. Koyama, “Sparsity-based sound field reconstruction,” in Tohoku Universal Acoustical Communication Month, Seminar on the spatial aspects of hearing and their applications, keynote lecture, Sendai, Oct. 2018. [Invited]
N. Ueno, S. Koyama, and H. Saruwatari, “Kernel Ridge Regression With Constraint of Helmholtz Equation for Sound Field Interpolation,” in Proceedings of International Workshop on Acoustic Signal Enhancement (IWAENC), Tokyo, Sep. 2018, pp. 436–440.
S. Takamichi, Y. Saito, N. Takamune, D. Kitamura, and H. Saruwatari, “Phase reconstruction from amplitude spectrograms based on von-Mises-distribution deep neural network,” in Proceedings of International Workshop on Acoustic Signal Enhancement (IWAENC), Tokyo, Sep. 2018.
Y. Takida, S. Koyama, and H. Saruwatari, “Exterior and Interior Sound Field Separation Using Convex Optimization: Comparison of Signal Models,” in Proceedings of European Signal Processing Conference (EUSIPCO), Rome, Sep. 2018, pp. 2567–2571.
S. Mogami, H. Sumino, D. Kitamura, N. Takamune, S. Takamichi, H. Saruwatari, and N. Ono, “Independent Deeply Learned Matrix Analysis for Multichannel Audio Source Separation,” in Proceedings of European Signal Processing Conference (EUSIPCO), Rome, Sep. 2018.
Y. Takida, S. Koyama, N. Ueno, and H. Saruwatari, “Gridless Sound Field Decomposition Based on Reciprocity Gap Functional in Spherical Harmonic Domain,” in Proceedings of IEEE sensor array and multichannel signal processing workshop (SAM), Sheffield, Jul. 2018, pp. 627–631. [Best Student Paper Award, ONRG sponsored student travel grants]
S. Takamichi and H. Saruwatari, “CPJD Corpus: Crowdsourced Parallel Speech Corpus of Japanese Dialects,” in Proceedings of Language Resources and Evaluation Conference (LREC), Miyazaki, May 2018, pp. 434–437.
S. Koyama, G. Chardon, and L. Daudet, “Joint Source and Sensor Placement for Sound Field Control Based on Empirical Interpolation Method,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Calgary, Apr. 2018, pp. 501–505.
N. Ueno, S. Koyama, and H. Saruwatari, “Sound Field Reproduction with Exterior Radiation Cancellation Using Analytical Weighting of Harmonic Coefficients,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Calgary, Apr. 2018, pp. 466–470. [IEEE SPS Japan Student Conference Paper Award]
Y. Saito, S. Takamichi, and H. Saruwatari, “Text-to-speech synthesis using STFT spectra based on low-/multi-resolution generative adversarial networks,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Calgary, Apr. 2018, pp. 5299–5303.
Y. Saito, Y. Ijima, K. Nishida, and S. Takamichi, “Non-parallel voice conversion using variational autoencoders conditioned by phonetic posteriorgrams and d-vectors,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Calgary, Apr. 2018, pp. 5274–5278.
2017
N. Mae, Y. Mitsui, S. Makino, D. Kitamura, N. Ono, T. Yamada, and H. Saruwatari, “Sound source localization using binaural difference for hose-shaped rescue robot,” in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Kuala Lumpur, Dec. 2017.
Y. Mitsui, D. Kitamura, N. Takamune, H. Saruwatari, Y. Takahashi, and K. Kondo, “Independent low-rank matrix analysis based on parametric majorization-equalization algorithm,” in Proceedings of IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), Curaçao, Dec. 2017.
S. Koyama and L. Daudet, “Comparison of Reverberation Models for Sparse Sound Field Decomposition,” in Proceedings of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, Oct. 2017, pp. 214–218.
S. Takamichi, D. Saito, H. Saruwatari, and N. Minematsu, “The UTokyo speech synthesis system for Blizzard Challenge 2017,” in Proceedings of Blizzard Challenge Workshop, Stockholm, Aug. 2017.
S. Takamichi, “Modulation spectrum-based speech parameter trajectory smoothing for DNN-based speech synthesis using FFT spectra,” in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Kuala Lumpur, Dec. 2017. [Invited]
D. Kitamura, N. Ono, and H. Saruwatari, “Experimental analysis of optimal window length for independent low-rank matrix analysis,” in Proceedings of European Signal Processing Conference (EUSIPCO), Kos, Aug. 2017. [Invited]
S. Takamichi, T. Koriyama, and H. Saruwatari, “Sampling-based speech parameter generation using moment-matching network,” in Proceedings of Interspeech, Stockholm, Aug. 2017.
H. Miyoshi, Y. Saito, S. Takamichi, and H. Saruwatari, “Voice Conversion Using Sequence-to-Sequence Learning of Context Posterior Probabilities,” in Proceedings of Interspeech, Stockholm, Aug. 2017.
S. Koyama, N. Murata, and H. Saruwatari, “Effect of Multipole Dictionary in Sparse Sound Field Decomposition For Super-resolution in Recording and Reproduction,” in Proceedings of International Congress on Sound and Vibration (ICSV), London, Jul. 2017. [Invited]
Y. Mitsui, D. Kitamura, S. Takamichi, N. Ono, and H. Saruwatari, “Blind source separation based on independent low-rank matrix analysis with sparse regularization for time-series activity,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), New Orleans, Mar. 2017, pp. 21–25. [Student Paper Contest Finalist]
N. Ueno, S. Koyama, and H. Saruwatari, “Listening-area-informed Sound Field Reproduction Based On Circular Harmonic Expansion,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), New Orleans, Mar. 2017, pp. 111–115.
N. Murata, S. Koyama, N. Takamune, and H. Saruwatari, “Spatio-temporal Sparse Sound Field Decomposition Considering Acoustic Source Signal Characteristics,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), New Orleans, Mar. 2017, pp. 441–445.
N. Ueno, S. Koyama, and H. Saruwatari, “Listening-area-informed Sound Field Reproduction With Gaussian Prior Based On Circular Harmonic Expansion,” in Proceedings of Hands-free Speech Communication and Microphone Arrays (HSCMA), San Francisco, Mar. 2017, pp. 196–200.
R. Sato, H. Kameoka, and K. Kashino, “Fast algorithm for statistical phrase/accent command estimation based on generative model incorporating spectral features,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), New Orleans, Mar. 2017, pp. 5595–5599.
Y. Saito, S. Takamichi, and H. Saruwatari, “Training algorithm to deceive anti-spoofing verification for DNN-based speech synthesis,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), New Orleans, Mar. 2017, pp. 4900–4904. [Spoken Language Processing Student Grant]
N. Mae, M. Ishimura, D. Kitamura, N. Ono, T. Yamada, S. Makino, and H. Saruwatari, “Ego noise reduction for hose-shaped rescue robot combining independent low-rank matrix analysis and multichannel noise cancellation,” in Proceedings of International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA), Grenoble, Feb. 2017, pp. 141–151.
2016
H. Nakajima, D. Kitamura, N. Takamune, S. Koyama, H. Saruwatari, Y. Takahashi, and K. Kondo, “Audio Signal Separation using Supervised NMF with Time-variant All-Pole-Model-Based Basis Deformation,” in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Jeju, Dec. 2016. [Invited]
S. Koyama, N. Murata, and H. Saruwatari, “Super-resolution in sound field recording and reproduction based on sparse representation,” in Proceedings of 5th Joint Meeting of the Acoustical Society of America and Acoustical Society of Japan, Honolulu, Nov. 2016. [Invited]
M. Ishimura, S. Makino, T. Yamada, N. Ono, and H. Saruwatari, “Noise reduction using independent vector analysis and noise cancellation for a hose-shaped rescue robot,” in Proceedings of International Workshop on Acoustic Signal Enhancement (IWAENC), Xi’an, Sep. 2016, no. PS-III-04.
D. Kitamura, N. Ono, H. Saruwatari, Y. Takahashi, and K. Kondo, “Discriminative and reconstructive basis training for audio source separation with semi-supervised nonnegative matrix factorization,” in Proceedings of International Workshop on Acoustic Signal Enhancement (IWAENC), Xi’an, Sep. 2016, no. PS-III-02.
M. Takakusaki, D. Kitamura, N. Ono, T. Yamada, S. Makino, and H. Saruwatari, “Ego-noise reduction for a hose-shaped rescue robot using determined rank-1 multichannel nonnegative matrix factorization,” in Proceedings of International Workshop on Acoustic Signal Enhancement (IWAENC), Xi’an, Sep. 2016, no. PS-II-02.
K. Kobayashi, S. Takamichi, S. Nakamura, and T. Toda, “The NU-NAIST voice conversion system for the Voice Conversion Challenge 2016,” in Proceedings of Interspeech, San Francisco, Sep. 2016, pp. 1667–1671.
L. Li, H. Kameoka, T. Higuchi, and H. Saruwatari, “Semi-supervised joint enhancement of spectral and cepstral sequences of noisy speech,” in Proceedings of Interspeech, San Francisco, Sep. 2016, pp. 3753–3757.
H. Nakajima, D. Kitamura, N. Takamune, S. Koyama, H. Saruwatari, N. Ono, Y. Takahashi, and K. Kondo, “Music signal separation using supervised NMF with all-pole-model-based discriminative basis deformation,” in Proceedings of European Signal Processing Conference (EUSIPCO), Budapest, Aug. 2016, pp. 1143–1147.
N. Murata, H. Kameoka, K. Kinoshita, S. Araki, T. Nakatani, S. Koyama, and H. Saruwatari, “Reverberation-robust underdetermined source separation with non-negative tensor double deconvolution,” in Proceedings of European Signal Processing Conference (EUSIPCO), Budapest, Aug. 2016, pp. 1648–1652.
S. Koyama, “Source-Location-Informed Sound Field Recording and Reproduction: A Generalization to Arrays of Arbitrary Geometry,” in Proceedings of 2016 AES International Conference on Sound Field Control, Guildford, Jul. 2016. [Online]. Available: http://www.aes.org/e-lib/browse.cfm?elib=18303 [Invited]
Y. Mitsufuji, S. Koyama, and H. Saruwatari, “Multichannel blind source separation based on non-negative tensor factorization in wavenumber domain,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Shanghai, Mar. 2016, pp. 56–60.
N. Murata, S. Koyama, H. Kameoka, N. Takamune, and H. Saruwatari, “Sparse sound field decomposition with multichannel extension of complex NMF,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Shanghai, Mar. 2016, pp. 345–349.
S. Koyama and H. Saruwatari, “Sound field decomposition in reverberant environment using sparse and low-rank signal models,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Shanghai, Mar. 2016, pp. 395–399.
2015
S. Koyama, A. Matsubayashi, N. Murata, and H. Saruwatari, “Sparse Sound Field Decomposition Using Group Sparse Bayesian Learning,” in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Dec. 2015, pp. 850–855. [Invited]
N. Murata, S. Koyama, N. Takamune, and H. Saruwatari, “Sparse Sound Field Decomposition with Parametric Dictionary Learning for Super-Resolution Recording and Reproduction,” in Proceedings of IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), Dec. 2015.
S. Koyama, K. Ito, and H. Saruwatari, “Source-location-informed sound field recording and reproduction with spherical arrays,” in Proceedings of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, Oct. 2015.
D. Kitamura, N. Ono, H. Sawada, H. Kameoka, and H. Saruwatari, “Efficient multichannel nonnegative matrix factorization exploiting rank-1 spatial model,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Brisbane, Apr. 2015, pp. 276–280.
S. Koyama, N. Murata, and H. Saruwatari, “Structured sparse signal models and decomposition algorithm for super-resolution in sound field recording and reproduction,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Brisbane, Apr. 2015, pp. 619–623.
Y. Murota, D. Kitamura, S. Koyama, H. Saruwatari, and S. Nakamura, “Statistical modeling of binaural signal and its application to binaural source separation,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Brisbane, Apr. 2015, pp. 494–498.
D. Kitamura, N. Ono, H. Sawada, H. Kameoka, and H. Saruwatari, “Relaxation of rank-1 spatial constraint in overdetermined blind source separation,” in Proceedings of European Signal Processing Conference (EUSIPCO), Nice, 2015, pp. 1261–1265. [Invited]
H. Saruwatari, “Statistical-model-based speech enhancement with musical-noise-free properties,” in Proceedings of 2015 IEEE International Conference on Digital Signal Processing (DSP2015), Singapore, 2015. [Invited]
2014
D. Kitamura, H. Saruwatari, S. Nakamura, Y. Takahashi, K. Kondo, and H. Kameoka, “Hybrid multichannel signal separation using supervised nonnegative matrix factorization with spectrogram restoration,” in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Siem Reap, Dec. 2014. [Invited]
S. Koyama, P. Srivastava, K. Furuya, S. Shimauchi, and H. Ohmuro, “STSP: Space-Time Stretched Pulse for Measuring Spatio-Temporal Impulse Response,” in Proceedings of International Workshop on Acoustic Signal Enhancement (IWAENC), Sep. 2014, pp. 309–313.
D. Kitamura, H. Saruwatari, S. Nakamura, Y. Takahashi, K. Kondo, and H. Kameoka, “Divergence optimization in nonnegative matrix factorization with spectrogram restoration for multichannel signal separation,” in Proceedings of Hands-free Speech Communication and Microphone Arrays (HSCMA), Nancy, May 2014, no. 1569905839.
F. Aprilyanti, H. Saruwatari, K. Shikano, S. Nakamura, and T. Takatani, “Optimized joint noise suppression and dereverberation based on blind signal extraction for hands-free speech recognition system,” in Proceedings of Hands-free Speech Communication and Microphone Arrays (HSCMA), Nancy, May 2014, no. 1569905697.
S. Nakai, H. Saruwatari, R. Miyazaki, S. Nakamura, and K. Kondo, “Theoretical analysis of biased MMSE short-time spectral amplitude estimator and its extension to musical-noise-free speech enhancement,” in Proceedings of Hands-free Speech Communication and Microphone Arrays (HSCMA), Nancy, May 2014, no. 1569905751.
Y. Murota, D. Kitamura, S. Nakai, H. Saruwatari, S. Nakamura, Y. Takahashi, and K. Kondo, “Music signal separation based on Bayesian spectral amplitude estimator with automatic target prior adaptation,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Florence, May 2014, pp. 7540–7544.
S. Koyama, S. Shimauchi, and H. Ohmuro, “Sparse Sound Field Representation in Recording and Reproduction for Reducing Spatial Aliasing Artifacts,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Florence, May 2014, pp. 4476–4480.
Y. Haneda, K. Furuya, S. Koyama, and K. Niwa, “Close-talking spherical microphone array using sound pressure interpolation based on spherical harmonic expansion,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Florence, May 2014, pp. 604–608.