# Journal Papers

## 2022

- H. Kai, S. Takamichi, S. Shiota, and H. Kiya, “Lightweight and irreversible speech pseudonymization based on data-driven optimization of cascaded voice modification modules,” *Computer Speech & Language*, vol. 72, p. 101315, 2022.
- M. Takeuchi, J. Ahn, K. Lee, K. Takaki, T. Ifukube, K. Yabu, S. Takamichi, R. Ueha, and M. Sekino, “Hands-Free Wearable Electrolarynx using LPC Residual Waves and Listening Evaluation,” *Advanced Biomedical Engineering*, vol. 11, pp. 68–75, 2022.
- Y. Okamoto, K. Imoto, S. Takamichi, R. Yamanishi, T. Fukumori, and Y. Yamashita, “Onoma-to-wave: Environmental sound synthesis from onomatopoeic words,” *APSIPA Transactions on Signal and Information Processing*, 2022.
- K. Saito, T. Nakamura, K. Yatabe, and H. Saruwatari, “Sampling-Frequency-Independent Convolutional Layer and Its Application to Audio Source Separation,” *IEEE/ACM Transactions on Audio, Speech, and Language Processing*, 2022.
- Y. Kondo, Y. Kubo, N. Takamune, D. Kitamura, and H. Saruwatari, “Deficient-basis-complementary rank-constrained spatial covariance matrix estimation based on multivariate generalized Gaussian distribution for blind speech extraction,” *EURASIP Journal on Advances in Signal Processing*, 2022. (accepted)

## 2021

- K. Kamo, Y. Mitsui, Y. Kubo, N. Takamune, D. Kitamura, H. Saruwatari, Y. Takahashi, and K. Kondo, “Joint-diagonalizability-constrained multichannel nonnegative matrix factorization based on time-variant multivariate complex sub-Gaussian distribution,” *Elsevier Signal Processing*, vol. 188, p. 108183, Jun. 2021.
- T. Nakamura, S. Kozuka, and H. Saruwatari, “Time-Domain Audio Source Separation with Neural Networks Based on Multiresolution Analysis,” *IEEE/ACM Transactions on Audio, Speech, and Language Processing*, vol. 29, pp. 1687–1701, 2021.
- T. Nakamura and H. Kameoka, “Harmonic-Temporal Factor Decomposition for Unsupervised Monaural Separation of Harmonic Sounds,” *IEEE/ACM Transactions on Audio, Speech, and Language Processing*, vol. 29, pp. 68–82, 2021.
- Y. Saito, T. Nakamura, Y. Ijima, K. Nishida, and S. Takamichi, “Non-parallel and many-to-many voice conversion using variational autoencoders integrating speech recognition and speaker verification,” *Acoustical Science and Technology*, vol. 42, no. 1, pp. 1–11, 2021.
- Y. Saito, S. Takamichi, and H. Saruwatari, “Perceptual-similarity-aware deep speaker representation learning for multi-speaker generative modeling,” *IEEE/ACM Transactions on Audio, Speech, and Language Processing*, vol. 29, pp. 1033–1048, 2021.
- T. Saeki, Y. Saito, S. Takamichi, and H. Saruwatari, “Real-time full-band voice conversion with sub-band modeling and data-driven phase estimation of spectral differentials,” *IEICE Transactions on Information and Systems*, vol. E104.D, no. 7, pp. 1002–1016, 2021.
- A. Aiba, M. Yoshida, D. Kitamura, S. Takamichi, and H. Saruwatari, “Noise Robust Acoustic Anomaly Detection System with Nonnegative Matrix Factorization Based on Generalized Gaussian Distribution,” *IEICE Transactions on Information and Systems*, vol. E104.D, no. 3, pp. 441–449, 2021.
- T. Saeki, S. Takamichi, and H. Saruwatari, “Incremental Text-to-Speech Synthesis Using Pseudo Lookahead with Large Pretrained Language Model,” *IEEE Signal Processing Letters*, vol. 28, pp. 857–861, 2021.
- N. Ueno, S. Koyama, and H. Saruwatari, “Directionally weighted wave field estimation exploiting prior information on source direction,” *IEEE Transactions on Signal Processing*, vol. 69, pp. 2383–2395, 2021.
- Y. Mitsufuji, N. Takamune, S. Koyama, and H. Saruwatari, “Multichannel blind source separation based on evanescent-region-aware non-negative tensor factorization in spherical harmonic domain,” *IEEE/ACM Transactions on Audio, Speech, and Language Processing*, vol. 29, pp. 607–617, 2021.
- K. Mitsui, T. Koriyama, and H. Saruwatari, “Deep Gaussian Process Based Multi-speaker Speech Synthesis with Latent Speaker Representation,” *Elsevier Speech Communication*, vol. 132, pp. 132–145, 2021.
- S. Mizoguchi, Y. Saito, S. Takamichi, and H. Saruwatari, “DNN-based low-musical-noise single-channel speech enhancement based on higher-order-moments matching,” *IEICE Transactions on Information and Systems*, vol. E104.D, no. 11, pp. 1971–1980, 2021.

## 2020

- N. Makishima, Y. Mitsui, N. Takamune, D. Kitamura, H. Saruwatari, Y. Takahashi, and K. Kondo, “Independent deeply learned matrix analysis with automatic selection of stable microphone-wise update and fast sourcewise update of demixing matrix,” *Signal Processing (Elsevier)*, vol. 178, p. 107753, Sep. 2020.
- M. Aso, S. Takamichi, N. Takamune, and H. Saruwatari, “Acoustic model-based subword tokenization and prosodic-context extraction without language knowledge for text-to-speech synthesis,” *Elsevier Speech Communication*, vol. 125, pp. 53–60, Sep. 2020.
- Y. Kubo, N. Takamune, D. Kitamura, and H. Saruwatari, “Blind Speech Extraction Based on Rank-Constrained Spatial Covariance Matrix Estimation With Multivariate Generalized Gaussian Distribution,” *IEEE/ACM Transactions on Audio, Speech, and Language Processing*, vol. 28, pp. 1948–1963, Jun. 2020.
- H. Tamaru, Y. Saito, S. Takamichi, T. Koriyama, and H. Saruwatari, “Generative moment matching network-based neural double-tracking for synthesized and natural singing voices,” *IEICE Transactions on Information and Systems*, vol. E103-D, no. 3, pp. 639–647, May 2020.
- J. Koguchi, S. Takamichi, M. Morise, H. Saruwatari, and S. Sagayama, “DNN-based full-band speech synthesis using GMM approximation of spectral envelope,” *IEICE Transactions on Information and Systems*, vol. E103.D, no. 12, pp. 2673–2681, 2020.
- Y. Saito, K. Akuzawa, and K. Tachibana, “Joint adversarial training of speech recognition and synthesis models for many-to-one voice conversion using phonetic posteriorgrams,” *IEICE Transactions on Information and Systems*, vol. E103.D, no. 9, pp. 1978–1987, 2020.
- H. Tamaru, S. Takamichi, and H. Saruwatari, “Perception analysis of inter-singer similarity in Japanese song,” *Acoustical Science and Technology*, vol. 41, no. 5, pp. 804–807, 2020.
- S. Takamichi, Y. Saito, N. Takamune, D. Kitamura, and H. Saruwatari, “Phase Reconstruction from Amplitude Spectrograms Based on Directional-Statistics Deep Neural Networks,” *Elsevier Signal Processing*, vol. 169, 2020.
- S. Takamichi, R. Sonobe, K. Mitsui, Y. Saito, T. Koriyama, N. Tanji, and H. Saruwatari, “JSUT and JVS: free Japanese voice corpora for accelerating speech synthesis research,” *Acoustical Science and Technology*, vol. 41, no. 5, pp. 761–768, 2020.
- Y. Takida, S. Koyama, N. Ueno, and H. Saruwatari, “Reciprocity gap functional in spherical harmonic domain for gridless sound field decomposition,” *Elsevier Signal Processing*, vol. 169, 2020.

## 2019

- D. Sekizawa, S. Takamichi, and H. Saruwatari, “Prosody correction preserving speaker individuality for Chinese-accented Japanese HMM-based text-to-speech synthesis,” *IEICE Transactions on Information and Systems*, vol. E102.D, no. 6, pp. 1218–1221, Jun. 2019.
- S. Takamichi and D. Morikawa, “Perceived azimuth-based creditability and self-reported confidence for sound localization experiments using crowdsourcing,” *Acoustical Science and Technology*, vol. 40, no. 2, pp. 142–143, Mar. 2019.
- H. Nakajima, D. Kitamura, N. Takamune, H. Saruwatari, and N. Ono, “Bilevel optimization using stationary point of lower-level objective function for discriminative basis learning in nonnegative matrix factorization,” *IEEE Signal Processing Letters*, vol. 26, no. 6, pp. 818–822, 2019.
- S. Mogami, N. Takamune, D. Kitamura, H. Saruwatari, Y. Takahashi, K. Kondo, and N. Ono, “Independent low-rank matrix analysis based on time-variant sub-Gaussian source model for determined blind source separation,” *IEEE/ACM Transactions on Audio, Speech, and Language Processing*, vol. 28, pp. 503–518, 2019.
- N. Makishima, S. Mogami, N. Takamune, D. Kitamura, H. Sumino, S. Takamichi, H. Saruwatari, and N. Ono, “Independent Deeply Learned Matrix Analysis for Determined Audio Source Separation,” *IEEE/ACM Transactions on Audio, Speech, and Language Processing*, vol. 27, no. 10, pp. 1601–1615, 2019.
- Y. Saito, S. Takamichi, and H. Saruwatari, “Vocoder-free text-to-speech synthesis incorporating generative adversarial networks using low-/multi-frequency STFT amplitude spectra,” *Computer Speech & Language*, vol. 58, pp. 347–363, 2019.
- Y. Mitsufuji, S. Uhlich, N. Takamune, D. Kitamura, S. Koyama, and H. Saruwatari, “Multichannel non-negative matrix factorization using banded spatial covariance matrices in wavenumber domain,” *IEEE/ACM Transactions on Audio, Speech, and Language Processing*, vol. 28, pp. 49–60, 2019.
- H. Sawada, N. Ono, H. Kameoka, D. Kitamura, and H. Saruwatari, “A review of blind source separation methods: two converging routes to ILRMA originating from ICA and NMF,” *APSIPA Transactions on Signal and Information Processing*, vol. 8, no. E12, 2019.
- S. Koyama and L. Daudet, “Sparse Representation of a Spatial Sound Field in a Reverberant Environment,” *IEEE Journal of Selected Topics in Signal Processing*, vol. 13, no. 1, pp. 172–184, 2019.
- T. Koriyama and T. Kobayashi, “Statistical parametric speech synthesis using deep Gaussian processes,” *IEEE/ACM Transactions on Audio, Speech, and Language Processing*, vol. 27, no. 5, pp. 948–959, 2019.
- N. Maikusa, R. Sonobe, S. Kinoshita, N. Kawada, S. Yagishi, T. Masuoka, T. Kinoshita, S. Takamichi, and A. Homma, “Automatic detection of Alzheimer’s dementia using speech features of the revised Hasegawa’s Dementia Scale,” *Geriatric Medicine*, vol. 57, no. 2, pp. 1117–1125, 2019.
- N. Ueno, S. Koyama, and H. Saruwatari, “Three-Dimensional Sound Field Reproduction Based on Weighted Mode-Matching Method,” *IEEE/ACM Transactions on Audio, Speech, and Language Processing*, vol. 27, no. 12, pp. 1852–1867, 2019.

## 2018

- T. Kano, S. Takamichi, S. Sakti, G. Neubig, T. Toda, and S. Nakamura, “An End-to-end Model for Cross-Lingual Transformation of Paralinguistic Information,” *Machine Translation*, pp. 1–16, Apr. 2018.
- Y. Saito, S. Takamichi, and H. Saruwatari, “Statistical Parametric Speech Synthesis Incorporating Generative Adversarial Networks,” *IEEE/ACM Transactions on Audio, Speech, and Language Processing*, vol. 26, no. 1, pp. 84–96, Jan. 2018. [34th Telecommunications Advancement Foundation Telecom System Technology Student Award]
- D. Kitamura, S. Mogami, Y. Mitsui, N. Takamune, H. Saruwatari, N. Ono, and Y. Takahashi, “Generalized independent low-rank matrix analysis using heavy-tailed distributions for blind source separation,” *EURASIP Journal on Advances in Signal Processing*, 2018. (accepted)
- N. Murata, S. Koyama, N. Takamune, and H. Saruwatari, “Sparse Representation Using Multidimensional Mixed-Norm Penalty With Application to Sound Field Decomposition,” *IEEE Transactions on Signal Processing*, vol. 66, no. 12, pp. 3327–3338, 2018.
- S. Koyama, N. Murata, and H. Saruwatari, “Sparse Sound Field Decomposition for Super-resolution in Recording and Reproduction,” *Journal of the Acoustical Society of America*, vol. 143, no. 6, pp. 3780–3895, 2018.
- N. Ueno, S. Koyama, and H. Saruwatari, “Sound Field Recording Using Distributed Microphones Based on Harmonic Analysis of Infinite Order,” *IEEE Signal Processing Letters*, vol. 25, no. 1, pp. 135–139, 2018.

## 2017

- Y. Saito, S. Takamichi, and H. Saruwatari, “Voice Conversion Using Input-to-Output Highway Networks,” *IEICE Transactions on Information and Systems*, 2017.
- Y. Bando, H. Saruwatari, N. Ono, S. Makino, K. Itoyama, D. Kitamura, M. Ishimura, M. Takakusaki, N. Mae, K. Yamaoka, Y. Matsui, Y. Ambe, M. Konyo, S. Tadokoro, K. Yoshii, and H. G. Okuno, “Low-latency and high-quality two-stage human-voice-enhancement system for a hose-shaped rescue robot,” *Journal of Robotics and Mechatronics*, vol. 29, no. 1, 2017.

## 2016

- S. Takamichi, T. Toda, G. Neubig, S. Sakti, and S. Nakamura, “A statistical sample-based approach to GMM-based voice conversion using tied-covariance acoustic models,” *IEICE Transactions on Information and Systems*, vol. E99-D, no. 10, pp. 2490–2498, Oct. 2016.
- D. Kitamura, N. Ono, H. Sawada, H. Kameoka, and H. Saruwatari, “Determined blind source separation unifying independent vector analysis and nonnegative matrix factorization,” *IEEE/ACM Transactions on Audio, Speech, and Language Processing*, vol. 24, no. 9, pp. 1626–1641, Sep. 2016.
- S. Takamichi, T. Toda, A. W. Black, G. Neubig, S. Sakti, and S. Nakamura, “Post-filters to Modify the Modulation Spectrum for Statistical Parametric Speech Synthesis,” *IEEE/ACM Transactions on Audio, Speech, and Language Processing*, vol. 24, no. 4, pp. 755–767, Apr. 2016. [Awarded paper, Acoustical Society of Japan Itakura Prize Innovative Young Researcher Award]
- S. Koyama, K. Furuya, K. Wakayama, S. Shimauchi, and H. Saruwatari, “Analytical approach to transforming filter design for sound field recording and reproduction using circular arrays with a spherical baffle,” *Journal of the Acoustical Society of America*, vol. 139, no. 3, pp. 1024–1036, Mar. 2016.
- Y. Oshima, S. Takamichi, T. Toda, G. Neubig, S. Sakti, and S. Nakamura, “Non-Native Text-To-Speech Preserving Speaker Individuality Based on Partial Correction of Prosodic and Phonetic Characteristics,” *IEICE Transactions on Information and Systems*, vol. E99-D, no. 12, 2016.

## 2015

- S. Koyama, K. Furuya, Y. Haneda, and H. Saruwatari, “Source-location-informed sound field recording and reproduction,” *IEEE Journal of Selected Topics in Signal Processing*, vol. 9, no. 5, pp. 881–894, Aug. 2015.
- D. Kitamura, H. Saruwatari, H. Kameoka, Y. Takahashi, K. Kondo, and S. Nakamura, “Multichannel signal separation combining directional clustering and nonnegative matrix factorization with spectrogram restoration,” *IEEE/ACM Transactions on Audio, Speech, and Language Processing*, vol. 23, no. 4, pp. 654–669, Apr. 2015.
- F. D. Aprilyanti, J. Even, H. Saruwatari, K. Shikano, S. Nakamura, and T. Takatani, “Suppression of noise and late reverberation based on blind signal extraction and Wiener filtering,” *Acoustical Science and Technology*, vol. 36, no. 6, pp. 302–313, Jan. 2015.

## 2014

- S. Koyama, K. Furuya, Y. Hiwasaki, Y. Haneda, and Y. Suzuki, “Wave Field Reconstruction Filtering in Cylindrical Harmonic Domain for With-Height Recording and Reproduction,” *IEEE/ACM Transactions on Audio, Speech, and Language Processing*, vol. 22, no. 10, pp. 1546–1557, Oct. 2014.
- R. Miyazaki, H. Saruwatari, S. Nakamura, K. Shikano, K. Kondo, J. Blanchette, and M. Bouchard, “Musical-noise-free blind speech extraction integrating microphone array and iterative spectral subtraction,” *Signal Processing (Elsevier)*, vol. 102, pp. 226–239, Sep. 2014.
- S. Koyama, K. Furuya, H. Uematsu, Y. Hiwasaki, and Y. Haneda, “Real-time Sound Field Transmission System by Using Wave Field Reconstruction Filter and Its Evaluation,” *IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences*, vol. E97-A, no. 9, pp. 1840–1848, Sep. 2014.
- T. Aketo, H. Saruwatari, and S. Nakamura, “Robust sound field reproduction against listener’s movement utilizing image sensor,” *Journal of Signal Processing*, vol. 18, no. 4, pp. 213–216, Jul. 2014.
- T. Miyauchi, D. Kitamura, H. Saruwatari, and S. Nakamura, “Depth estimation of sound images using directional clustering and activation-shared nonnegative matrix factorization,” *Journal of Signal Processing*, vol. 18, no. 4, pp. 217–220, Jul. 2014.
- D. Kitamura, H. Saruwatari, K. Yagi, K. Shikano, Y. Takahashi, and K. Kondo, “Music signal separation based on supervised nonnegative matrix factorization with orthogonality and maximum-divergence penalties,” *IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences*, vol. E97-A, no. 5, pp. 1113–1118, May 2014.

# Books

## 2021

- Y. Ishikawa, S. Takamichi, T. Umemoto, Y. Tsubota, M. Aikawa, K. Sakamoto, K. Yui, S. Fujiwara, A. Suto, and K. Nishiyama, “Team-based flipped learning framework: Achieving high student engagement in learning,” in *Blended Language Learning: Evidence-based Trends and Applications (book chapter)*, Aug. 2021. (to appear)

## 2018

- H. Saruwatari and R. Miyazaki, “Musical-noise-free blind speech extraction based on higher-order statistics analysis,” in *Audio Source Separation*, S. Makino, Ed. Springer, 2018, pp. 333–364.
- D. Kitamura, N. Ono, H. Sawada, H. Kameoka, and H. Saruwatari, “Determined blind source separation with independent low-rank matrix analysis,” in *Audio Source Separation*, S. Makino, Ed. Springer, 2018, pp. 125–155.

## 2014

- H. Saruwatari and R. Miyazaki, “Statistical analysis and evaluation of blind speech extraction algorithms,” in *Advances in Modern Blind Source Separation Techniques: Theory and Applications*, G. Naik and W. Wang, Eds. Springer, May 2014, pp. 291–322.

# Invited Talks

## 2021

- H. Saruwatari, “Multichannel audio source separation based on unsupervised and semi-supervised learning,” in *Proceedings of Chinese Computer Federation*, Jan. 2021.

## 2020

- H. Saruwatari, “Multichannel audio source separation based on unsupervised and semi-supervised learning,” in *Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)*, Dec. 2020.

## 2019

- Y. Takida, S. Koyama, N. Ueno, and H. Saruwatari, “Comparison of Interpolation Methods for Gridless Sound Field Decomposition Based on Reciprocity Gap Functional,” in *Proceedings of International Congress on Sound and Vibration (ICSV)*, Montreal, Jul. 2019. (to appear)
- S. Takamichi, “Group-delay modelling based on deep neural network with sine-skewed generalized cardioid distribution,” in *Proceedings of International Conference on Soft Computing & Machine Learning (SCML)*, Wuhan, China, Apr. 2019. (invited)

## 2018

- M. Une, Y. Saito, S. Takamichi, D. Kitamura, R. Miyazaki, and H. Saruwatari, “Generative approach using the noise generation models for DNN-based speech synthesis trained from noisy speech,” in *Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)*, Hawaii, Nov. 2018.
- S. Koyama, “Sparsity-based sound field reconstruction,” in *Tohoku Universal Acoustical Communication Month, Seminar on the spatial aspects of hearing and their applications, keynote lecture*, Sendai, Oct. 2018.
- S. Takamichi, “What can GAN and GMMN do for augmented speech communication?,” in *GMI workshop*, Hiroshima, Aug. 2018.

## 2017

- S. Takamichi, “Modulation spectrum-based speech parameter trajectory smoothing for DNN-based speech synthesis using FFT spectra,” in *Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)*, Kuala Lumpur, Dec. 2017.
- D. Kitamura, N. Ono, and H. Saruwatari, “Experimental analysis of optimal window length for independent low-rank matrix analysis,” in *Proceedings of 25th European Signal Processing Conference (EUSIPCO)*, Kos, Greece, Aug. 2017.
- S. Koyama, N. Murata, and H. Saruwatari, “Effect of multipole dictionary in sparse sound field decomposition for super-resolution in recording and reproduction,” in *Proceedings of International Congress on Sound and Vibration (ICSV)*, London, Jul. 2017.

## 2016

- H. Nakajima, D. Kitamura, N. Takamune, S. Koyama, H. Saruwatari, Y. Takahashi, and K. Kondo, “Audio signal separation using supervised NMF with time-variant all-pole-model-based basis deformation,” in
- S. Takamichi, “Speech synthesis that deceives anti-spoofing verification,” in *NII Talk*, Dec. 2016.
- S. Koyama, N. Murata, and H. Saruwatari, “Super-resolution in sound field recording and reproduction based on sparse representation,” in *Proceedings of 5th Joint Meeting of the Acoustical Society of America and Acoustical Society of Japan*, Honolulu, Nov. 2016.
- H. Saruwatari, K. Takata, N. Ono, and S. Makino, “Flexible microphone array based on multichannel nonnegative matrix factorization and statistical signal estimation,” in *The 22nd International Congress on Acoustics (ICA2016)*, Sep. 2016, no. ICA2016-312.
- S. Koyama, “Source-Location-Informed Sound Field Recording and Reproduction: A Generalization to Arrays of Arbitrary Geometry,” in *Proceedings of 2016 AES International Conference on Sound Field Control*, Guildford, Jul. 2016 [Online]. Available at: http://www.aes.org/e-lib/browse.cfm?elib=18303

## 2015

- S. Koyama, A. Matsubayashi, N. Murata, and H. Saruwatari, “Sparse Sound Field Decomposition Using Group Sparse Bayesian Learning,” in
- D. Kitamura, N. Ono, H. Sawada, H. Kameoka, and H. Saruwatari, “Relaxation of rank-1 spatial constraint in overdetermined blind source separation,” in *Proceedings of The 2015 European Signal Processing Conference (EUSIPCO2015)*, Nice, Sep. 2015, pp. 1271–1275.
- H. Saruwatari, “Statistical-model-based speech enhancement with musical-noise-free properties,” in *Proceedings of 2015 IEEE International Conference on Digital Signal Processing (DSP2015)*, Singapore, 2015.

## 2014

- D. Kitamura, H. Saruwatari, S. Nakamura, Y. Takahashi, K. Kondo, and H. Kameoka, “Hybrid multichannel signal separation using supervised nonnegative matrix factorization with spectrogram restoration,” in

# International Conferences

## 2022

- F. Nakashima, T. Nakamura, N. Takamune, S. Fukayama, and H. Saruwatari, “Hyperbolic Timbre Embedding for Musical Instrument Sound Synthesis Based on Variational Autoencoders,” in
- Y. Okamoto, K. Imoto, S. Takamichi, T. Fukumori, and Y. Yamashita, “How Should We Evaluate Synthesized Environmental Sounds,” in
- K. Fujii, Y. Saito, and H. Saruwatari, “Adaptive End-To-End Text-To-Speech Synthesis Based on Error Correction Feedback From Humans,” in
- Y. Matsunaga, T. Saeki, S. Takamichi, and H. Saruwatari, “Empirical Study Incorporating Linguistic Knowledge on Filled Pauses for Personalized Spontaneous Speech Synthesis,” in
- Y. Nakai, K. Udagawa, Y. Saito, and H. Saruwatari, “Multi-Task Adversarial Training Algorithm for Multi-Speaker Neural Text-To-Speech,” in
- Y. Saito, Y. Nishimura, S. Takamichi, K. Tachibana, and H. Saruwatari, “STUDIES: Corpus of Japanese Empathetic Dialogue Speech Towards Friendly Voice Agent,” in *Proceedings of Interspeech*, Sep. 2022. (accepted)
- K. Shigemi, S. Koyama, T. Nakamura, and H. Saruwatari, “Physics-informed convolutional neural network with bicubic spline interpolation for sound field estimation,” in *Proceedings of International Workshop on Acoustic Signal Enhancement (IWAENC)*, Sep. 2022. (accepted)
- Y. Nishimura, Y. Saito, S. Takamichi, K. Tachibana, and H. Saruwatari, “Acoustic Modeling for End-to-End Empathetic Dialogue Speech Synthesis Using Linguistic and Prosodic Contexts of Dialogue History,” in *Proceedings of Interspeech*, Sep. 2022. (accepted)
- S. Takamichi, W. Nakata, N. Tanji, and H. Saruwatari, “J-MAC: Japanese multi-speaker audiobook corpus for speech synthesis,” in *Proceedings of Interspeech*, Sep. 2022. (accepted)
- W. Nakata, T. Koriyama, S. Takamichi, Y. Saito, Y. Ijima, R. Masumura, and H. Saruwatari, “Predicting VQVAE-based Character Acting Style from Quotation-Annotated Text for Audiobook Speech Synthesis,” in *Proceedings of Interspeech*, Sep. 2022. (accepted)
- Y. Ito, T. Nakamura, S. Koyama, and H. Saruwatari, “Head-Related Transfer Function Interpolation from Spatially Sparse Measurements Using Autoencoder with Source Position Conditioning,” in *Proceedings of International Workshop on Acoustic Signal Enhancement (IWAENC)*, Sep. 2022. (accepted)
- T. Saeki*, D. Xin*, W. Nakata*, T. Koriyama, S. Takamichi, and H. Saruwatari (*Equal contribution), “UTMOS: UTokyo-SaruLab System for VoiceMOS Challenge 2022,” in *Proceedings of Interspeech*, Sep. 2022. (accepted)
- T. Saeki, S. Takamichi, T. Nakamura, N. Tanji, and H. Saruwatari, “SelfRemaster: Self-Supervised Speech Restoration with Analysis-by-Synthesis Approach Using Channel Modeling,” in *Proceedings of Interspeech*, Sep. 2022. (accepted)
- K. Udagawa, Y. Saito, and H. Saruwatari, “Human-in-the-loop Speaker Adaptation for DNN-based Multi-speaker TTS,” in *Proceedings of Interspeech*, Sep. 2022. (accepted)
- D. Xin, S. Takamichi, and H. Saruwatari, “Exploring the Effectiveness of Self-supervised Learning and Classifier Chains in Emotion Recognition of Nonverbal Vocalizations,” in *Proceedings of ICML Expressive Vocalizations (ExVo) Workshop and Competition 2022*, Jul. 2022. (accepted)
- Y. Matsunaga, T. Saeki, S. Takamichi, and H. Saruwatari, “Personalized filled-pause generation with group-wise prediction models,” in *Proceedings of Language Resources and Evaluation Conference (LREC)*, Jun. 2022. (accepted)
- H. Kai, S. Takamichi, S. Shiota, and H. Kiya, “Robustness of signal processing-based pseudonymization method against decryption attack,” in *Proceedings of Odyssey*, Jun. 2022. (accepted)
- N. Kimura, Z. Su, T. Saeki, and J. Rekimoto, “SSR7000: A Synchronized Corpus of Ultrasound Tongue Imaging for End-to-End Silent Speech Recognition,” in *Proceedings of Language Resources and Evaluation Conference (LREC)*, Jun. 2022. (accepted)
- J. G. C. Ribeiro, S. Koyama, and H. Saruwatari, “Region-to-region kernel interpolation of acoustic transfer function with directional weighting,” in *Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)*, May 2022. (accepted)
- K. Arikawa, S. Koyama, and H. Saruwatari, “Spatial active noise control based on individual kernel interpolation of primary and secondary sound fields,” in *Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)*, May 2022. (accepted)
- M. Kawamura, T. Nakamura, D. Kitamura, H. Saruwatari, Y. Takahashi, and K. Kondo, “Differentiable digital signal processing mixture model for synthesis parameter extraction from mixture of harmonic sounds,” in *Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)*, May 2022. (accepted)

## 2021

- S. Koyama, T. Nishida, K. Kimura, T. Abe, N. Ueno, and J. Brunnström, “MeshRIR: A Dataset of Room Impulse Responses on Meshed Grid Points for Evaluating Sound Field Analysis and Synthesis Methods,” in
*Proceedings of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)*, 1–5, Oct. 2021. - K. Kimura, S. Koyama, N. Ueno, and H. Saruwatari, “Mean-Square-Error-Based Secondary Source Placement in Sound Field Synthesis With Prior Information on Desired Field,” in
*Proceedings of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)*, 281–285, Oct. 2021, pp. 281–285. - R. Ominato, N. Wakui, S. Takamichi, and S. Yano, “Discriminating between left and right ears using linear and nonlinear dimensionality reduction,” in
*SmaSys2021*, Oct. 2021. - R. Arakawa, Z. Kashino, S. Takamichi, A. A. Verhulst, and M. Inami, “Digital Speech Makeup: Voice Conversion Based Altered Auditory Feedback for Transforming Self-Representation,” in
*ACM ICMI*, Oct. 2021, pp. 159–167. - R. Horiuchi, S. Koyama, J. G. C. Ribeiro, N. Ueno, and and Hiroshi Saruwatari, “Kernel learning for sound field estimation with L1 and L2 regularizations,” in
*Proceedings of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)*, Oct. 2021, pp. 261–265. - N. Narisawa, R. Ikeshita, N. Takamune, D. Kitamura, T. Nakamura, H. Saruwatari, and T. Nakatani, “Independent Deeply Learned Tensor Analysis for Determined Audio Source Separation,” in
*Proceedings of European Signal Processing Conference (EUSIPCO)*, Aug. 2021, pp. 326–330. - T. Hasumi, T. Nakamura, N. Takamune, H. Saruwatari, D. Kitamura, Y. Takahashi, and K. Kondo, “Empirical Bayesian Independent Deeply Learned Matrix Analysis For Multichannel Audio Source Separation,” in
*Proceedings of European Signal Processing Conference (EUSIPCO)*, Aug. 2021, pp. 331–335. - K. Saito, T. Nakamura, K. Yatabe, Y. Koizum, and H. Saruwatari, “Sampling-Frequency-Independent Audio Source Separation Using Convolution Layer Based on Impulse Invariant Method,” in
*Proceedings of European Signal Processing Conference (EUSIPCO)*, Aug. 2021, pp. 321–325. - K. Yufune, T. Koriyama, S. Takamichi, and H. Saruwatari, “Accent Modeling of Low-Resourced Dialect in Pitch Accent Language Using Variational Autoencoder,” in
*Proceedings of The 11th ISCA SSW*, Aug. 2021, pp. 189–194. - T. Nakamura, T. Koriyama, and H. Saruwatari, “Sequence-to-Sequence Learning for Deep Gaussian Process Based Speech Synthesis Using Self-Attention GP Layer,” in
*Proceedings of Interspeech*, Aug. 2021, pp. 121–125. - D. Xin, Y. Saito, S. Takamichi, T. Koriyama, and H. Saruwatari, “Cross-lingual Speaker Adaptation using Domain Adaptation and Speaker Consistency Loss for Text-To-Speech Synthesis,” in
*Proceedings of Interspeech*, Aug. 2021, pp. 1614–1618. - W. Nakata, T. Koriyama, S. Takamichi, N. Tanji, Y. Ijima, R. Masumura, and H. Saruwatari, “Audiobook Speech Synthesis Conditioned by Cross-Sentence Context-Aware Word Embeddings,” in
*Proceedings of The 11th ISCA SSW*, Aug. 2021, pp. 211–215. - K. Mizuta, T. Koriyama, and H. Saruwatari, “Harmonic WaveGAN: GAN-Based Speech Waveform Generation Model with Harmonic Structure Discriminator,” in
*Proceedings of Interspeech*, Aug. 2021, pp. 2192–2196. - Y. Ueda, K. Fujii, Y. Saito, S. Takamichi, Y. Baba, and H. Saruwatari, “HumanACGAN: conditional generative adversarial network with human-based auxiliary classifier and its evaluation in phoneme perception,” in
*Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)*, Jun. 2021, pp. 6468–6472. - D. Xin, T. Komatsu, S. Takamichi, and H. Saruwatari, “Disentangled Speaker and Language Representations using Mutual Information Minimization and Domain Adaptation for Cross-Lingual TTS,” in
*Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)*, Jun. 2021, pp. 6608–6612. - Y. Ishikawa, S. Takamichi, T. Umemoto, M. Aikawa, K. Sakamoto, K. Yui, S. Fujiwara, A. Suto, and K. Nishiyama, “Japanese EFL learners’ speaking practice utilizing text-to-speech technology within a team-based flipped learning framework,” in
*Proceedings of International Conference on Human-Computer Interaction (HCII)*, Jun. 2021, pp. 283–291. - Y. Kondo, Y. Kubo, N. Takamune, D. Kitamura, and H. Saruwatari, “Deficient basis estimation of noise spatial covariance matrix for rank-constrained spatial covariance matrix estimation method in blind speech extraction,” in
*Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)*, Jun. 2021, pp. 806–810. - H. Kai, S. Takamichi, S. Shiota, and H. Kiya, “Lightweight voice anonymization based on data-driven optimization of cascaded voice modification modules,” in
*Proceedings of IEEE Spoken Language Technology Workshop (SLT)*, Jan. 2021, pp. 560–566. - T. Nishida, N. Ueno, S. Koyama, and H. Saruwatari, “Sensor Placement in Arbitrarily Restricted Region for Field Estimation Based on Gaussian Process,” in
*Proceedings of European Signal Processing Conference (EUSIPCO)*, Jan. 2021, pp. 2289–2293. - J. Brunnström and S. Koyama, “Kernel-Interpolation-Based Filtered-X Least Mean Square for Spatial Active Noise Control in Time Domain,” in
*Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)*, 161–165, 2021. - N. Ueno, S. Koyama, and H. Saruwatari, “Convex and Differentiable Formulation for Inverse Problems in Hilbert Spaces with Nonlinear Clipping Effects,” in
*IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences*, 2021. - S. Koyama, K. Kimura, and N. Ueno, “Sound Field Reproduction With Weighted Mode Matching and Infinite-Dimensional Harmonic Analysis: An Experimental Evaluation,” in
*International Conference on Immersive and 3D Audio (I3DA)*, 2021. (invited) - S. Koyama, T. Amakasu, N. Ueno, and H. Saruwatari, “Amplitude Matching: Majorization-Minimization Algorithm for Sound Field Control Only With Amplitude Constraint,” in
*Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)*, 2021, pp. 411–415. - S. Koyama, J. Brunnström, H. Ito, N. Ueno, and H. Saruwatari, “Spatial Active Noise Control Based on Kernel Interpolation of Sound Field,” in
*IEEE/ACM Transactions on Audio, Speech, and Language Processing*, 2021, pp. 3052–3063. - T. Hasumi, T. Nakamura, N. Takamune, H. Saruwatari, D. Kitamura, Y. Takahashi, and K. Kondo, “Multichannel Audio Source Separation with Independent Deeply Learned Matrix Analysis Using Product of Source Models,” in
- X. Luo, S. Takamichi, T. Koriyama, Y. Saito, and H. Saruwatari, “Emotion-Controllable Speech Synthesis Using Emotion Soft Labels and Fine-Grained Prosody Factors,” in
- S. Misawa, N. Takamune, T. Nakamura, D. Kitamura, H. Saruwatari, M. Une, and S. Makino, “Speech enhancement by noise self-supervised rank-constrained spatial covariance matrix estimation via independent deeply learned matrix analysis,” in
- T. Saeki, S. Takamichi, and H. Saruwatari, “Low-Latency Incremental Text-to-Speech Synthesis with Distilled Context Prediction Network,” in
*Proceedings of IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)*, 2021, pp. 749–756.

## 2020

- K. Kamo, Y. Kubo, N. Takamune, D. Kitamura, H. Saruwatari, Y. Takahashi, and K. Kondo, “Joint-Diagonalizability-Constrained Multichannel Nonnegative Matrix Factorization Based on Multivariate Complex Student’s t-distribution,” in
- J. Koguchi, S. Takamichi, and M. Morise, “PJS: phoneme-balanced Japanese singing-voice corpus,” in
- T. Saeki, Y. Saito, S. Takamichi, and H. Saruwatari, “Real-time, full-band, online DNN-based voice conversion system using a single CPU,” in
*Proceedings of Interspeech*, Oct. 2020, pp. 1021–1022. - Y. Okamoto, K. Imoto, S. Takamichi, R. Yamanishi, T. Fukumori, and Y. Yamashita, “RWCP-SSD-Onomatopoeia: Onomatopoeic Word Dataset for Environmental Sound Synthesis,” in
*Proceedings of Detection and Classification of Acoustic Scenes and Events (DCASE)*, Oct. 2020, pp. 125–129. - M. Aso, S. Takamichi, and H. Saruwatari, “End-to-end text-to-speech synthesis with unaligned multiple language units based on attention,” in
*Proceedings of Interspeech*, Oct. 2020, pp. 4009–4013. - D. Xin, Y. Saito, S. Takamichi, T. Koriyama, and H. Saruwatari, “Cross-lingual Text-To-Speech Synthesis via Domain Adaptation and Perceptual Similarity Regression in Speaker Space,” in
*Proceedings of Interspeech*, Oct. 2020, pp. 2947–2951. - N. Kimura, Z. Su, and T. Saeki, “End-to-End Deep Learning Speech Recognition Model for Silent Speech Challenge,” in
*Proceedings of Interspeech*, Oct. 2020, pp. 1025–1026. - Y. Yamashita, T. Koriyama, Y. Saito, S. Takamichi, Y. Ijima, R. Masumura, and H. Saruwatari, “Investigating Effective Additional Contextual Factors in DNN-based Spontaneous Speech Synthesis,” in
*Proceedings of Interspeech*, Oct. 2020, pp. 3201–3205. - S. Goto, K. Ohnishi, Y. Saito, K. Tachibana, and K. Mori, “Face2Speech: towards multi-speaker text-to-speech synthesis using an embedding vector predicted from a face image,” in
*Proceedings of Interspeech*, Oct. 2020, pp. 1321–1325. - K. Mitsui, T. Koriyama, and H. Saruwatari, “Multi-speaker Text-to-speech Synthesis Using Deep Gaussian Processes,” in
*Proceedings of Interspeech*, Oct. 2020, pp. 2032–2036. - H. Takeuchi, K. Kashino, Y. Ohishi, and H. Saruwatari, “Harmonic Lowering for Accelerating Harmonic Convolution for Audio Signals,” in
*Proceedings of Interspeech*, Oct. 2020, pp. 185–189. - N. Iijima, S. Koyama, and H. Saruwatari, “Binaural Rendering From Distributed Microphone Signals Considering Loudspeaker Distance in Measurements,” in
*IEEE International Workshop on Multimedia Signal Processing (MMSP)*, Sep. 2020, pp. 1–6. - S. Kozuka, T. Nakamura, and H. Saruwatari, “Investigation on Wavelet Basis Function of DNN-based Time Domain Audio Source Separation Inspired by Multiresolution Analysis,” in
*Proceedings of Internoise*, Aug. 2020. - R. Okamoto, S. Yano, N. Wakui, and S. Takamichi, “Visualization of differences in ear acoustic characteristics using t-SNE,” in
*Proceedings of AES convention*, May 2020. - T. Koriyama and H. Saruwatari, “Utterance-level Sequential Modeling For Deep Gaussian Process Based Speech Synthesis Using Simple Recurrent Unit,” in
*Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)*, May 2020, pp. 7249–7253. - T. Saeki, Y. Saito, S. Takamichi, and H. Saruwatari, “Lifter training and sub-band modeling for computationally efficient and high-quality voice conversion using spectral differentials,” in
*Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)*, May 2020, pp. 7784–7788. - T. Nakamura and H. Saruwatari, “Time-domain Audio Source Separation based on Wave-U-Net Combined with Discrete Wavelet Transform,” in
*Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)*, May 2020, pp. 386–390. - K. Kamo, Y. Kubo, N. Takamune, D. Kitamura, H. Saruwatari, Y. Takahashi, and K. Kondo, “Regularized Fast Multichannel Nonnegative Matrix Factorization with ILRMA-based Prior Distribution of Joint-Diagonalization Process,” in
*Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)*, May 2020, pp. 606–610. - T. Kondo, K. Fukushige, N. Takamune, D. Kitamura, H. Saruwatari, R. Ikeshita, and T. Nakatani, “Convergence-Guaranteed Independent Positive Semidefinite Tensor Analysis Based on Student’s T Distribution,” in
*Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)*, May 2020, pp. 681–685. - Y. Saito, S. Takamichi, and H. Saruwatari, “SMASH corpus: a spontaneous speech corpus recording third-person audio commentaries on gameplay,” in
*Proceedings of Language Resources and Evaluation Conference (LREC)*, May 2020, pp. 6571–6577. - K. Ariga, T. Nishida, S. Koyama, N. Ueno, and H. Saruwatari, “Mutual-Information-Based Sensor Placement for Spatial Sound Field Recording,” in
*Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)*, May 2020, pp. 166–170. - Y. Yamashita, T. Koriyama, Y. Saito, S. Takamichi, Y. Ijima, R. Masumura, and H. Saruwatari, “DNN-based Speech Synthesis Using Abundant Tags of Spontaneous Speech Corpus,” in
*Proceedings of Language Resources and Evaluation Conference (LREC)*, May 2020, pp. 6438–6443. - H. Ito, S. Koyama, N. Ueno, and H. Saruwatari, “Spatial Active Noise Control Based on Kernel Interpolation with Directional Weighting,” in
*Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)*, May 2020, pp. 8404–8408. (invited) - S. Koyama, G. Chardon, and L. Daudet, “Optimizing Source and Sensor Placement for Sound Field Control: An Overview,” in
*IEEE/ACM Transactions on Audio, Speech, and Language Processing*, 2020. (overview) - G. Chardon, S. Koyama, and L. Daudet, “Numerical Evaluation of Source and Sensor Placement Methods For Sound Field Control,” in
*Forum Acusticum*, 2020.

## 2019

- N. Makishima, N. Takamune, D. Kitamura, H. Saruwatari, Y. Takahashi, and K. Kondo, “Robust demixing filter update algorithm based on microphone-wise coordinate descent for independent deeply learned matrix analysis,” in
- Y. Kubo, N. Takamune, D. Kitamura, and H. Saruwatari, “Acceleration of rank-constrained spatial covariance matrix estimation for blind speech extraction,” in
- M. Une, Y. Kubo, N. Takamune, D. Kitamura, H. Saruwatari, and S. Makino, “Evaluation of multichannel hearing aid system using rank-constrained spatial covariance matrix estimation,” in
- M. Nakanishi, N. Ueno, S. Koyama, and H. Saruwatari, “Two-dimensional sound field recording with multiple circular microphone arrays considering multiple scattering,” in
*Proceedings of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)*, New Paltz, Oct. 2019. - R. Arakawa, S. Takamichi, and H. Saruwatari, “TransVoice: Real-Time Voice Conversion for Augmenting Near-Field Speech Communication,” in
*Proceedings of UIST*, New Orleans, Oct. 2019. - Y. Kubo, N. Takamune, D. Kitamura, and H. Saruwatari, “Efficient full-rank spatial covariance estimation using independent low-rank matrix analysis for blind source separation,” in
*Proceedings of European Signal Processing Conference (EUSIPCO)*, A Coruña, Sep. 2019. - N. Makishima, N. Takamune, D. Kitamura, H. Saruwatari, Y. Takahashi, and K. Kondo, “Column-wise update algorithm for independent deeply learned matrix analysis,” in
*Proceedings of International Congress on Acoustics (ICA)*, Aachen, Sep. 2019, pp. 2805–2812. [Young Scientist Conference Attendance Grant] - I. H. Parmonangan, H. Tanaka, S. Sakti, S. Takamichi, and S. Nakamura, “Speech Quality Evaluation of Synthesized Japanese Speech using EEG,” in
*Proceedings of Interspeech*, Graz, Sep. 2019, pp. 1228–1232. - T. Nakamura, Y. Saito, S. Takamichi, Y. Ijima, and H. Saruwatari, “V2S attack: building DNN-based voice conversion from automatic speaker verification,” in
*Proceedings of The 10th ISCA SSW*, Vienna, Sep. 2019. - H. Ito, S. Koyama, N. Ueno, and H. Saruwatari, “Three-dimensional spatial active noise control based on kernel-induced sound field interpolation,” in
*Proceedings of International Congress on Acoustics (ICA)*, Aachen, Sep. 2019. - M. Aso, S. Takamichi, N. Takamune, and H. Saruwatari, “Subword tokenization based on DNN-based acoustic model for end-to-end prosody generation,” in
*Proceedings of The 10th ISCA SSW*, Vienna, Sep. 2019. - Y. Saito, S. Takamichi, and H. Saruwatari, “DNN-based Speaker Embedding Using Subjective Inter-speaker Similarity for Multi-speaker Modeling in Speech Synthesis,” in
*Proceedings of The 10th ISCA SSW*, Vienna, Sep. 2019. - R. Arakawa, S. Takamichi, and H. Saruwatari, “Implementation of DNN-based real-time voice conversion and its improvements by audio data augmentation and mask-shaped device,” in
*Proceedings of The 10th ISCA SSW*, Vienna, Sep. 2019. - T. Koriyama, S. Takamichi, and T. Kobayashi, “Sparse Approximation of Gram Matrices for GMMN-based Speech Synthesis,” in
*Proceedings of The 10th ISCA SSW*, Vienna, Sep. 2019. - Y. Takida, S. Koyama, N. Ueno, and H. Saruwatari, “Comparison of Interpolation Methods for Gridless Sound Field Decomposition Based on Reciprocity Gap Functional,” in
*Proceedings of International Congress on Sound and Vibration (ICSV)*, Montreal, Jul. 2019. [Invited] - I. H. Parmonangan, H. Tanaka, S. Sakti, S. Takamichi, and S. Nakamura, “EEG Analysis towards Evaluating Synthesized Speech Quality,” in
*Proceedings of IEEE EMBC*, Berlin, Jul. 2019. - N. Makishima, N. Takamune, D. Kitamura, H. Saruwatari, Y. Takahashi, K. Kondo, and H. Nakajima, “Generalized-Gaussian-distribution-based independent deeply learned matrix analysis for multichannel audio source separation,” in
*Proceedings of International Congress and Exhibition on Noise Control Engineering (INTERNOISE)*, Madrid, Jun. 2019. - H. Tamaru, Y. Saito, S. Takamichi, T. Koriyama, and H. Saruwatari, “Generative moment matching network-based random modulation post-filter for DNN-based singing voice synthesis and neural double-tracking,” in
*Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)*, Brighton, May 2019. - K. Naruse, S. Yoshida, S. Takamichi, T. Narumi, T. Tanikawa, and M. Hirose, “Estimating Confidence in Voices using Crowdsourcing for Alleviating Tension with Altered Auditory Feedback,” in
*Proceedings of Asian CHI Symposium: Emerging HCI Research Collection in ACM Conference on Human Factors in Computing Systems (CHI)*, Glasgow, May 2019. - T. Koriyama and T. Kobayashi, “A Training Method Using DNN-guided Layerwise Pretraining for Deep Gaussian Processes,” in
*Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)*, Brighton, May 2019. - H. Ito, S. Koyama, N. Ueno, and H. Saruwatari, “Feedforward Spatial Active Noise Control Based on Kernel Interpolation of Sound Field,” in
*Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)*, Brighton, May 2019. - Y. Takida, S. Koyama, N. Ueno, and H. Saruwatari, “Robust Gridless Sound Field Decomposition Based on Structured Reciprocity Gap Functional in Spherical Harmonic Domain,” in
*Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)*, Brighton, May 2019. - K. Yoshino, Y. Murase, N. Lubis, K. Sugiyama, H. Tanaka, S. Sakti, S. Takamichi, and S. Nakamura, “Spoken Dialogue Robot for Watching Daily Life of Elderly People,” in
*Proceedings of IWSDS*, Sicily, Apr. 2019.

## 2018

- M. Une, Y. Saito, S. Takamichi, D. Kitamura, R. Miyazaki, and H. Saruwatari, “Generative approach using the noise generation models for DNN-based speech synthesis trained from noisy speech,” in
- T. Akiyama, S. Takamichi, and H. Saruwatari, “Prosody-aware subword embedding considering Japanese intonation systems and its application to DNN-based multi-dialect speech synthesis,” in
- S. Mogami, N. Takamune, D. Kitamura, H. Saruwatari, Y. Takahashi, K. Kondo, H. Nakajima, and N. Ono, “Independent low-rank matrix analysis based on time-variant sub-Gaussian source model,” in
- H. Suda, G. Kotani, S. Takamichi, and D. Saito, “A revisit to feature handling for high-quality voice conversion,” in
- S. Shiota, S. Takamichi, and T. Matsui, “Data augmentation with moment-matching networks for i-vector based speaker verification,” in
- S. Koyama, “Sparsity-based sound field reconstruction,” in
*Tohoku Universal Acoustical Communication Month, Seminar on the spatial aspects of hearing and their applications, keynote lecture*, Sendai, Oct. 2018. [Invited] - N. Ueno, S. Koyama, and H. Saruwatari, “Kernel Ridge Regression With Constraint of Helmholtz Equation for Sound Field Interpolation,” in
*Proceedings of International Workshop on Acoustic Signal Enhancement (IWAENC)*, Tokyo, Sep. 2018, pp. 436–440. - S. Takamichi, Y. Saito, N. Takamune, D. Kitamura, and H. Saruwatari, “Phase reconstruction from amplitude spectrograms based on von-Mises-distribution deep neural network,” in
*Proceedings of International Workshop on Acoustic Signal Enhancement (IWAENC)*, Tokyo, Sep. 2018. - Y. Takida, S. Koyama, and H. Saruwatari, “Exterior and Interior Sound Field Separation Using Convex Optimization: Comparison of Signal Models,” in
*Proceedings of European Signal Processing Conference (EUSIPCO)*, Rome, Sep. 2018, pp. 2567–2571. - S. Mogami, H. Sumino, D. Kitamura, N. Takamune, S. Takamichi, H. Saruwatari, and N. Ono, “Independent Deeply Learned Matrix Analysis for Multichannel Audio Source Separation,” in
*Proceedings of European Signal Processing Conference (EUSIPCO)*, Rome, Sep. 2018. - Y. Takida, S. Koyama, N. Ueno, and H. Saruwatari, “Gridless Sound Field Decomposition Based on Reciprocity Gap Functional in Spherical Harmonic Domain,” in
*Proceedings of IEEE Sensor Array and Multichannel Signal Processing Workshop (SAM)*, Sheffield, Jul. 2018, pp. 627–631. [Best Student Paper Award, ONRG sponsored student travel grants] - S. Takamichi and H. Saruwatari, “CPJD Corpus: Crowdsourced Parallel Speech Corpus of Japanese Dialects,” in
*Proceedings of Language Resources and Evaluation Conference (LREC)*, Miyazaki, May 2018, pp. 434–437. - S. Koyama, G. Chardon, and L. Daudet, “Joint Source and Sensor Placement for Sound Field Control Based on Empirical Interpolation Method,” in
*Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)*, Calgary, Apr. 2018, pp. 501–505. - N. Ueno, S. Koyama, and H. Saruwatari, “Sound Field Reproduction with Exterior Radiation Cancellation Using Analytical Weighting of Harmonic Coefficients,” in
*Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)*, Calgary, Apr. 2018, pp. 466–470. [IEEE SPS Japan Student Conference Paper Award] - Y. Saito, S. Takamichi, and H. Saruwatari, “Text-to-speech synthesis using STFT spectra based on low-/multi-resolution generative adversarial networks,” in
*Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)*, Calgary, Apr. 2018, pp. 5299–5303. - Y. Saito, Y. Ijima, K. Nishida, and S. Takamichi, “Non-parallel voice conversion using variational autoencoders conditioned by phonetic posteriorgrams and d-vectors,” in
*Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)*, Calgary, Apr. 2018, pp. 5274–5278.

## 2017

- N. Mae, Y. Mitsui, S. Makino, D. Kitamura, N. Ono, T. Yamada, and H. Saruwatari, “Sound source localization using binaural difference for hose-shaped rescue robot,” in
- Y. Mitsui, D. Kitamura, N. Takamune, H. Saruwatari, Y. Takahashi, and K. Kondo, “Independent low-rank matrix analysis based on parametric majorization-equalization algorithm,” in
*Proceedings of IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP)*, Curaçao, Dec. 2017. - S. Koyama and L. Daudet, “Comparison of Reverberation Models for Sparse Sound Field Decomposition,” in
*Proceedings of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)*, New Paltz, Oct. 2017, pp. 214–218. - S. Takamichi, D. Saito, H. Saruwatari, and N. Minematsu, “The UTokyo speech synthesis system for Blizzard Challenge 2017,” in
*Proceedings of Blizzard Challenge Workshop*, Stockholm, Aug. 2017. - S. Takamichi, “Modulation spectrum-based speech parameter trajectory smoothing for DNN-based speech synthesis using FFT spectra,” in
- D. Kitamura, N. Ono, and H. Saruwatari, “Experimental analysis of optimal window length for independent low-rank matrix analysis,” in
*Proceedings of European Signal Processing Conference (EUSIPCO)*, Kos, Aug. 2017. [Invited] - S. Takamichi, T. Koriyama, and H. Saruwatari, “Sampling-based speech parameter generation using moment-matching network,” in
*Proceedings of Interspeech*, Stockholm, Aug. 2017. - H. Miyoshi, Y. Saito, S. Takamichi, and H. Saruwatari, “Voice Conversion Using Sequence-to-Sequence Learning of Context Posterior Probabilities,” in
*Proceedings of Interspeech*, Stockholm, Aug. 2017. - S. Koyama, N. Murata, and H. Saruwatari, “Effect of Multipole Dictionary in Sparse Sound Field Decomposition For Super-resolution in Recording and Reproduction,” in
*Proceedings of International Congress on Sound and Vibration (ICSV)*, London, Jul. 2017. [Invited] - Y. Mitsui, D. Kitamura, S. Takamichi, N. Ono, and H. Saruwatari, “Blind source separation based on independent low-rank matrix analysis with sparse regularization for time-series activity,” in
*Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)*, New Orleans, Mar. 2017, pp. 21–25. [Student Paper Contest Finalist] - N. Ueno, S. Koyama, and H. Saruwatari, “Listening-area-informed Sound Field Reproduction Based On Circular Harmonic Expansion,” in
*Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)*, New Orleans, Mar. 2017, pp. 111–115. - N. Murata, S. Koyama, N. Takamune, and H. Saruwatari, “Spatio-temporal Sparse Sound Field Decomposition Considering Acoustic Source Signal Characteristics,” in
*Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)*, New Orleans, Mar. 2017, pp. 441–445. - N. Ueno, S. Koyama, and H. Saruwatari, “Listening-area-informed Sound Field Reproduction With Gaussian Prior Based On Circular Harmonic Expansion,” in
*Proceedings of Hands-free Speech Communication and Microphone Arrays (HSCMA)*, San Francisco, Mar. 2017, pp. 196–200. - R. Sato, H. Kameoka, and K. Kashino, “Fast algorithm for statistical phrase/accent command estimation based on generative model incorporating spectral features,” in
*Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)*, New Orleans, Mar. 2017, pp. 5595–5599. - Y. Saito, S. Takamichi, and H. Saruwatari, “Training algorithm to deceive anti-spoofing verification for dnn-based speech synthesis,” in
*Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)*, New Orleans, Mar. 2017, pp. 4900–4904. [Spoken Language Processing Student Grant] - N. Mae, M. Ishimura, D. Kitamura, N. Ono, T. Yamada, S. Makino, and H. Saruwatari, “Ego noise reduction for hose-shaped rescue robot combining independent low-rank matrix analysis and multichannel noise cancellation,” in
*Proceedings of International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA)*, Grenoble, Feb. 2017, pp. 141–151.

## 2016

- H. Nakajima, D. Kitamura, N. Takamune, S. Koyama, H. Saruwatari, Y. Takahashi, and K. Kondo, “Audio Signal Separation using Supervised NMF with Time-variant All-Pole-Model-Based Basis Deformation,” in
- S. Koyama, N. Murata, and H. Saruwatari, “Super-resolution in sound field recording and reproduction based on sparse representation,” in
*Proceedings of 5th Joint Meeting of the Acoustical Society of America and Acoustical Society of Japan*, Honolulu, Nov. 2016. [Invited] - M. Ishimura, S. Makino, T. Yamada, N. Ono, and H. Saruwatari, “Noise reduction using independent vector analysis and noise cancellation for a hose-shaped rescue robot,” in
*Proceedings of International Workshop on Acoustic Signal Enhancement (IWAENC)*, Xi'an, Sep. 2016, no. PS-III-04. - D. Kitamura, N. Ono, H. Saruwatari, Y. Takahashi, and K. Kondo, “Discriminative and reconstructive basis training for audio source separation with semi-supervised nonnegative matrix factorization,” in
*Proceedings of International Workshop on Acoustic Signal Enhancement (IWAENC)*, Xi'an, Sep. 2016, no. PS-III-02. - M. Takakusaki, D. Kitamura, N. Ono, T. Yamada, S. Makino, and H. Saruwatari, “Ego-noise reduction for a hose-shaped rescue robot using determined rank-1 multichannel nonnegative matrix factorization,” in
*Proceedings of International Workshop on Acoustic Signal Enhancement (IWAENC)*, Xi'an, Sep. 2016, no. PS-II-02. - K. Kobayashi, S. Takamichi, S. Nakamura, and T. Toda, “The NU-NAIST voice conversion system for the Voice Conversion Challenge 2016,” in
*Proceedings of Interspeech*, San Francisco, Sep. 2016, pp. 1667–1671. - L. Li, H. Kameoka, T. Higuchi, and H. Saruwatari, “Semi-supervised joint enhancement of spectral and cepstral sequences of noisy speech,” in
*Proceedings of Interspeech*, San Francisco, Sep. 2016, pp. 3753–3757. - H. Nakajima, D. Kitamura, N. Takamune, S. Koyama, H. Saruwatari, N. Ono, Y. Takahashi, and K. Kondo, “Music signal separation using supervised NMF with all-pole-model-based discriminative basis deformation,” in
*Proceedings of The 2016 European Signal Processing Conference (EUSIPCO)*, Budapest, Aug. 2016, pp. 1143–1147. - N. Murata, H. Kameoka, K. Kinoshita, S. Araki, T. Nakatani, S. Koyama, and H. Saruwatari, “Reverberation-robust underdetermined source separation with non-negative tensor double deconvolution,” in
*Proceedings of The 2016 European Signal Processing Conference (EUSIPCO)*, Budapest, Aug. 2016, pp. 1648–1652. - S. Koyama, “Source-Location-Informed Sound Field Recording and Reproduction: A Generalization to Arrays of Arbitrary Geometry,” in
*Proceedings of 2016 AES International Conference on Sound Field Control*, Guildford, Jul. 2016 [Online]. Available at: http://www.aes.org/e-lib/browse.cfm?elib=18303 [Invited] - Y. Mitsufuji, S. Koyama, and H. Saruwatari, “Multichannel blind source separation based on non-negative tensor factorization in wavenumber domain,” in
*Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)*, Shanghai, Mar. 2016, pp. 56–60. - N. Murata, S. Koyama, H. Kameoka, N. Takamune, and H. Saruwatari, “Sparse sound field decomposition with multichannel extension of complex NMF,” in
*Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)*, Shanghai, Mar. 2016, pp. 345–349. - S. Koyama and H. Saruwatari, “Sound field decomposition in reverberant environment using sparse and low-rank signal models,” in
*Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)*, Shanghai, Mar. 2016, pp. 395–399.

## 2015

- S. Koyama, A. Matsubayashi, N. Murata, and H. Saruwatari, “Sparse Sound Field Decomposition Using Group Sparse Bayesian Learning,” in
- N. Murata, S. Koyama, N. Takamune, and H. Saruwatari, “Sparse Sound Field Decomposition with Parametric Dictionary Learning for Super-Resolution Recording and Reproduction,” in
*Proceedings of IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP)*, Dec. 2015. - S. Koyama, K. Ito, and H. Saruwatari, “Source-location-informed sound field recording and reproduction with spherical arrays,” in
*Proceedings of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)*, New Paltz, Oct. 2015. - D. Kitamura, N. Ono, H. Sawada, H. Kameoka, and H. Saruwatari, “Efficient multichannel nonnegative matrix factorization exploiting rank-1 spatial model,” in
*Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)*, Brisbane, Apr. 2015, pp. 276–280. - S. Koyama, N. Murata, and H. Saruwatari, “Structured sparse signal models and decomposition algorithm for super-resolution in sound field recording and reproduction,” in
*Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)*, Brisbane, Apr. 2015, pp. 619–623. - Y. Murota, D. Kitamura, S. Koyama, H. Saruwatari, and S. Nakamura, “Statistical modeling of binaural signal and its application to binaural source separation,” in
*Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)*, Brisbane, Apr. 2015, pp. 494–498. - D. Kitamura, N. Ono, H. Sawada, H. Kameoka, and H. Saruwatari, “Relaxation of rank-1 spatial constraint in overdetermined blind source separation,” in
*Proceedings of European Signal Processing Conference (EUSIPCO)*, Nice, 2015, pp. 1261–1265. [Invited] - H. Saruwatari, “Statistical-model-based speech enhancement with musical-noise-free properties,” in
*Proceedings of 2015 IEEE International Conference on Digital Signal Processing (DSP2015)*, Singapore, 2015. [Invited]

## 2014

- D. Kitamura, H. Saruwatari, S. Nakamura, Y. Takahashi, K. Kondo, and H. Kameoka, “Hybrid multichannel signal separation using supervised nonnegative matrix factorization with spectrogram restoration,” in
- S. Koyama, P. Srivastava, K. Furuya, S. Shimauchi, and H. Ohmuro, “STSP: Space-Time Stretched Pulse for Measuring Spatio-Temporal Impulse Response,” in
*Proceedings of International Workshop on Acoustic Signal Enhancement (IWAENC)*, Sep. 2014, pp. 309–313. - D. Kitamura, H. Saruwatari, S. Nakamura, Y. Takahashi, K. Kondo, and H. Kameoka, “Divergence optimization in nonnegative matrix factorization with spectrogram restoration for multichannel signal separation,” in
*Proceedings of Hands-free Speech Communication and Microphone Arrays (HSCMA)*, Nancy, May 2014, no. 1569905839. - F. Aprilyanti, H. Saruwatari, K. Shikano, S. Nakamura, and T. Takatani, “Optimized joint noise suppression and dereverberation based on blind signal extraction for hands-free speech recognition system,” in
*Proceedings of Hands-free Speech Communication and Microphone Arrays (HSCMA)*, Nancy, May 2014, no. 1569905697. - S. Nakai, H. Saruwatari, R. Miyazaki, S. Nakamura, and K. Kondo, “Theoretical analysis of biased MMSE short-time spectral amplitude estimator and its extension to musical-noise-free speech enhancement,” in
*Proceedings of Hands-free Speech Communication and Microphone Arrays (HSCMA)*, Nancy, May 2014, no. 1569905751. - Y. Murota, D. Kitamura, S. Nakai, H. Saruwatari, S. Nakamura, Y. Takahashi, and K. Kondo, “Music signal separation based on Bayesian spectral amplitude estimator with automatic target prior adaptation,” in
*Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)*, Florence, May 2014, pp. 7540–7544. - S. Koyama, S. Shimauchi, and H. Ohmuro, “Sparse Sound Field Representation in Recording and Reproduction for Reducing Spatial Aliasing Artifacts,” in
*Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)*, Florence, May 2014, pp. 4476–4480. - Y. Haneda, K. Furuya, S. Koyama, and K. Niwa, “Close-talking spherical microphone array using sound pressure interpolation based on spherical harmonic expansion,” in
*Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)*, Florence, May 2014, pp. 604–608.

# Preprints

## 2020

- S. Takamichi, M. Komachi, N. Tanji, and H. Saruwatari, “JSSS: free Japanese speech corpus for summarization and simplification,” in
*arXiv*, Oct. 2020.