Journal Papers

2022

  • H. Kai, S. Takamichi, S. Shiota, and H. Kiya, “Lightweight and irreversible speech pseudonymization based on data-driven optimization of cascaded voice modification modules,” Computer Speech & Language, vol. 72, p. 101315, 2022.
  • M. Takeuchi, J. Ahn, K. Lee, K. Takaki, T. Ifukube, K. Yabu, S. Takamichi, R. Ueha, and M. Sekino, “Hands-Free Wearable Electrolarynx using LPC Residual Waves and Listening Evaluation,” Advanced Biomedical Engineering, vol. 11, pp. 68–75, 2022.
  • Y. Okamoto, K. Imoto, S. Takamichi, R. Yamanishi, T. Fukumori, and Y. Yamashita, “Onoma-to-wave: Environmental sound synthesis from onomatopoeic words,” APSIPA Transactions on Signal and Information Processing, 2022.
  • K. Saito, T. Nakamura, K. Yatabe, and H. Saruwatari, “Sampling-Frequency-Independent Convolutional Layer and Its Application to Audio Source Separation,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2022.
  • Y. Kondo, Y. Kubo, N. Takamune, D. Kitamura, and H. Saruwatari, “Deficient-basis-complementary rank-constrained spatial covariance matrix estimation based on multivariate generalized Gaussian distribution for blind speech extraction,” EURASIP Journal on Advances in Signal Processing, 2022. (accepted)

2021

  • K. Kamo, Y. Mitsui, Y. Kubo, N. Takamune, D. Kitamura, H. Saruwatari, Y. Takahashi, and K. Kondo, “Joint-diagonalizability-constrained multichannel nonnegative matrix factorization based on time-variant multivariate complex sub-Gaussian distribution,” Elsevier Signal Processing, vol. 188, p. 108183, Jun. 2021.
  • T. Nakamura, S. Kozuka, and H. Saruwatari, “Time-Domain Audio Source Separation with Neural Networks Based on Multiresolution Analysis,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 1687–1701, 2021.
  • T. Nakamura and H. Kameoka, “Harmonic-Temporal Factor Decomposition for Unsupervised Monaural Separation of Harmonic Sounds,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 68–82, 2021.
  • Y. Saito, T. Nakamura, Y. Ijima, K. Nishida, and S. Takamichi, “Non-parallel and many-to-many voice conversion using variational autoencoders integrating speech recognition and speaker verification,” Acoustical Science and Technology, vol. 42, no. 1, pp. 1–11, 2021.
  • Y. Saito, S. Takamichi, and H. Saruwatari, “Perceptual-similarity-aware deep speaker representation learning for multi-speaker generative modeling,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 1033–1048, 2021.
  • T. Saeki, Y. Saito, S. Takamichi, and H. Saruwatari, “Real-time full-band voice conversion with sub-band modeling and data-driven phase estimation of spectral differentials,” IEICE Transactions on Information and Systems, vol. E104.D, no. 7, pp. 1002–1016, 2021.
  • A. Aiba, M. Yoshida, D. Kitamura, S. Takamichi, and H. Saruwatari, “Noise Robust Acoustic Anomaly Detection System with Nonnegative Matrix Factorization Based on Generalized Gaussian Distribution,” IEICE Transactions on Information and Systems, vol. E104.D, no. 3, pp. 441–449, 2021.
  • T. Saeki, S. Takamichi, and H. Saruwatari, “Incremental Text-to-Speech Synthesis Using Pseudo Lookahead with Large Pretrained Language Model,” IEEE Signal Processing Letters, vol. 28, pp. 857–861, 2021.
  • N. Ueno, S. Koyama, and H. Saruwatari, “Directionally weighted wave field estimation exploiting prior information on source direction,” IEEE Transactions on Signal Processing, vol. 69, pp. 2383–2395, 2021.
  • Y. Mitsufuji, N. Takamune, S. Koyama, and H. Saruwatari, “Multichannel blind source separation based on evanescent-region-aware non-negative tensor factorization in spherical harmonic domain,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 607–617, 2021.
  • K. Mitsui, T. Koriyama, and H. Saruwatari, “Deep Gaussian Process Based Multi-speaker Speech Synthesis with Latent Speaker Representation,” Elsevier Speech Communication, vol. 132, pp. 132–145, 2021.
  • S. Mizoguchi, Y. Saito, S. Takamichi, and H. Saruwatari, “DNN-based low-musical-noise single-channel speech enhancement based on higher-order-moments matching,” IEICE Transactions on Information and Systems, vol. E104.D, no. 11, pp. 1971–1980, 2021.

2020

  • N. Makishima, Y. Mitsui, N. Takamune, D. Kitamura, H. Saruwatari, Y. Takahashi, and K. Kondo, “Independent deeply learned matrix analysis with automatic selection of stable microphone-wise update and fast sourcewise update of demixing matrix,” Signal Processing (Elsevier), vol. 178, p. 107753, Sep. 2020.
  • M. Aso, S. Takamichi, N. Takamune, and H. Saruwatari, “Acoustic model-based subword tokenization and prosodic-context extraction without language knowledge for text-to-speech synthesis,” Elsevier Speech Communication, vol. 125, pp. 53–60, Sep. 2020.
  • Y. Kubo, N. Takamune, D. Kitamura, and H. Saruwatari, “Blind Speech Extraction Based on Rank-Constrained Spatial Covariance Matrix Estimation With Multivariate Generalized Gaussian Distribution,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 28, pp. 1948–1963, Jun. 2020.
  • H. Tamaru, Y. Saito, S. Takamichi, T. Koriyama, and H. Saruwatari, “Generative moment matching network-based neural double-tracking for synthesized and natural singing voices,” IEICE Transactions on Information and Systems, vol. E103-D, no. 3, pp. 639–647, Mar. 2020.
  • J. Koguchi, S. Takamichi, M. Morise, H. Saruwatari, and S. Sagayama, “DNN-based full-band speech synthesis using GMM approximation of spectral envelope,” IEICE Transactions on Information and Systems, vol. E103.D, no. 12, pp. 2673–2681, 2020.
  • Y. Saito, K. Akuzawa, and K. Tachibana, “Joint adversarial training of speech recognition and synthesis models for many-to-one voice conversion using phonetic posteriorgrams,” IEICE Transactions on Information and Systems, vol. E103.D, no. 9, pp. 1978–1987, 2020.
  • H. Tamaru, S. Takamichi, and H. Saruwatari, “Perception analysis of inter-singer similarity in Japanese song,” Acoustical Science and Technology, vol. 41, no. 5, pp. 804–807, 2020.
  • S. Takamichi, Y. Saito, N. Takamune, D. Kitamura, and H. Saruwatari, “Phase Reconstruction from Amplitude Spectrograms Based on Directional-Statistics Deep Neural Networks,” Elsevier Signal Processing, vol. 169, 2020.
  • S. Takamichi, R. Sonobe, K. Mitsui, Y. Saito, T. Koriyama, N. Tanji, and H. Saruwatari, “JSUT and JVS: free Japanese voice corpora for accelerating speech synthesis research,” Acoustical Science and Technology, vol. 41, no. 5, pp. 761–768, 2020.
  • Y. Takida, S. Koyama, N. Ueno, and H. Saruwatari, “Reciprocity gap functional in spherical harmonic domain for gridless sound field decomposition,” Elsevier Signal Processing, vol. 169, 2020.

2019

  • D. Sekizawa, S. Takamichi, and H. Saruwatari, “Prosody correction preserving speaker individuality for Chinese-accented Japanese HMM-based text-to-speech synthesis,” IEICE Transactions on Information and Systems, vol. E102.D, no. 6, pp. 1218–1221, Jun. 2019.
  • S. Takamichi and D. Morikawa, “Perceived azimuth-based creditability and self-reported confidence for sound localization experiments using crowdsourcing,” Acoustical Science and Technology, vol. 40, no. 2, pp. 142–143, Mar. 2019.
  • H. Nakajima, D. Kitamura, N. Takamune, H. Saruwatari, and N. Ono, “Bilevel optimization using stationary point of lower-level objective function for discriminative basis learning in nonnegative matrix factorization,” IEEE Signal Processing Letters, vol. 26, no. 6, pp. 818–822, 2019.
  • S. Mogami, N. Takamune, D. Kitamura, H. Saruwatari, Y. Takahashi, K. Kondo, and N. Ono, “Independent low-rank matrix analysis based on time-variant sub-Gaussian source model for determined blind source separation,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 28, pp. 503–518, 2019.
  • N. Makishima, S. Mogami, N. Takamune, D. Kitamura, H. Sumino, S. Takamichi, H. Saruwatari, and N. Ono, “Independent Deeply Learned Matrix Analysis for Determined Audio Source Separation,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 27, no. 10, pp. 1601–1615, 2019.
  • Y. Saito, S. Takamichi, and H. Saruwatari, “Vocoder-free text-to-speech synthesis incorporating generative adversarial networks using low-/multi-frequency STFT amplitude spectra,” Computer Speech & Language, vol. 58, pp. 347–363, 2019.
  • Y. Mitsufuji, S. Uhlich, N. Takamune, D. Kitamura, S. Koyama, and H. Saruwatari, “Multichannel non-negative matrix factorization using banded spatial covariance matrices in wavenumber domain,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 28, pp. 49–60, 2019.
  • H. Sawada, N. Ono, H. Kameoka, D. Kitamura, and H. Saruwatari, “A review of blind source separation methods: two converging routes to ILRMA originating from ICA and NMF,” APSIPA Transactions on Signal and Information Processing, vol. 8, no. E12, 2019.
  • S. Koyama and L. Daudet, “Sparse Representation of a Spatial Sound Field in a Reverberant Environment,” IEEE Journal of Selected Topics in Signal Processing, vol. 13, no. 1, pp. 172–184, 2019.
  • T. Koriyama and T. Kobayashi, “Statistical parametric speech synthesis using deep Gaussian processes,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 27, no. 5, pp. 948–959, 2019.
  • N. Maikusa, R. Sonobe, S. Kinoshita, N. Kawada, S. Yagishi, T. Masuoka, T. Kinoshita, S. Takamichi, and A. Homma, “Automatic detection of Alzheimer’s dementia using speech features of the revised Hasegawa’s Dementia Scale,” Geriatric Medicine, vol. 57, no. 2, pp. 1117–1125, 2019.
  • N. Ueno, S. Koyama, and H. Saruwatari, “Three-Dimensional Sound Field Reproduction Based on Weighted Mode-Matching Method,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 27, no. 12, pp. 1852–1867, 2019.

2018

  • T. Kano, S. Takamichi, S. Sakti, G. Neubig, T. Toda, and S. Nakamura, “An End-to-end Model for Cross-Lingual Transformation of Paralinguistic Information,” Machine Translation, pp. 1–16, Apr. 2018.
  • Y. Saito, S. Takamichi, and H. Saruwatari, “Statistical Parametric Speech Synthesis Incorporating Generative Adversarial Networks,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 26, no. 1, pp. 84–96, Jan. 2018. [34th Telecommunications Advancement Foundation (TAF) Telecom System Technology Student Award]
  • D. Kitamura, S. Mogami, Y. Mitsui, N. Takamune, H. Saruwatari, N. Ono, and Y. Takahashi, “Generalized independent low-rank matrix analysis using heavy-tailed distributions for blind source separation,” EURASIP Journal on Advances in Signal Processing, 2018. (accepted)
  • N. Murata, S. Koyama, N. Takamune, and H. Saruwatari, “Sparse Representation Using Multidimensional Mixed-Norm Penalty With Application to Sound Field Decomposition,” IEEE Transactions on Signal Processing, vol. 66, no. 12, pp. 3327–3338, 2018.
  • S. Koyama, N. Murata, and H. Saruwatari, “Sparse Sound Field Decomposition for Super-resolution in Recording and Reproduction,” Journal of the Acoustical Society of America, vol. 143, no. 6, pp. 3780–3895, 2018.
  • N. Ueno, S. Koyama, and H. Saruwatari, “Sound Field Recording Using Distributed Microphones Based on Harmonic Analysis of Infinite Order,” IEEE Signal Processing Letters, vol. 25, no. 1, pp. 135–139, 2018.

2017

  • Y. Saito, S. Takamichi, and H. Saruwatari, “Voice Conversion Using Input-to-Output Highway Networks,” IEICE Transactions on Information and Systems, 2017.
  • Y. Bando, H. Saruwatari, N. Ono, S. Makino, K. Itoyama, D. Kitamura, M. Ishimura, M. Takakusaki, N. Mae, K. Yamaoka, Y. Matsui, Y. Ambe, M. Konyo, S. Tadokoro, K. Yoshii, and H. G. Okuno, “Low-latency and high-quality two-stage human-voice-enhancement system for a hose-shaped rescue robot,” Journal of Robotics and Mechatronics, vol. 29, no. 1, 2017.

2016

  • S. Takamichi, T. Toda, G. Neubig, S. Sakti, and S. Nakamura, “A statistical sample-based approach to GMM-based voice conversion using tied-covariance acoustic models,” IEICE Transactions on Information and Systems, vol. E99-D, no. 10, pp. 2490–2498, Oct. 2016.
  • D. Kitamura, N. Ono, H. Sawada, H. Kameoka, and H. Saruwatari, “Determined blind source separation unifying independent vector analysis and nonnegative matrix factorization,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, no. 9, pp. 1626–1641, Sep. 2016.
  • S. Takamichi, T. Toda, A. W. Black, G. Neubig, S. Sakti, and S. Nakamura, “Post-filters to Modify the Modulation Spectrum for Statistical Parametric Speech Synthesis,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, no. 4, pp. 755–767, Apr. 2016. [Acoustical Society of Japan Itakura Prize Innovative Research Award paper]
  • S. Koyama, K. Furuya, K. Wakayama, S. Shimauchi, and H. Saruwatari, “Analytical approach to transforming filter design for sound field recording and reproduction using circular arrays with a spherical baffle,” Journal of the Acoustical Society of America, vol. 139, no. 3, pp. 1024–1036, Mar. 2016.
  • Y. Oshima, S. Takamichi, T. Toda, G. Neubig, S. Sakti, and S. Nakamura, “Non-Native Text-To-Speech Preserving Speaker Individuality Based on Partial Correction of Prosodic and Phonetic Characteristics,” IEICE Transactions on Information and Systems, vol. E99-D, no. 12, 2016.

2015

  • S. Koyama, K. Furuya, Y. Haneda, and H. Saruwatari, “Source-location-informed sound field recording and reproduction,” IEEE Journal of Selected Topics in Signal Processing, vol. 9, no. 5, pp. 881–894, Aug. 2015.
  • D. Kitamura, H. Saruwatari, H. Kameoka, Y. Takahashi, K. Kondo, and S. Nakamura, “Multichannel signal separation combining directional clustering and nonnegative matrix factorization with spectrogram restoration,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 23, no. 4, pp. 654–669, Apr. 2015.
  • F. D. Aprilyanti, J. Even, H. Saruwatari, K. Shikano, S. Nakamura, and T. Takatani, “Suppression of noise and late reverberation based on blind signal extraction and Wiener filtering,” Acoustical Science and Technology, vol. 36, no. 6, pp. 302–313, Jan. 2015.

2014

  • S. Koyama, K. Furuya, Y. Hiwasaki, Y. Haneda, and Y. Suzuki, “Wave Field Reconstruction Filtering in Cylindrical Harmonic Domain for With-Height Recording and Reproduction,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22, no. 10, pp. 1546–1557, Oct. 2014.
  • R. Miyazaki, H. Saruwatari, S. Nakamura, K. Shikano, K. Kondo, J. Blanchette, and M. Bouchard, “Musical-noise-free blind speech extraction integrating microphone array and iterative spectral subtraction,” Signal Processing (Elsevier), vol. 102, pp. 226–239, Sep. 2014.
  • S. Koyama, K. Furuya, H. Uematsu, Y. Hiwasaki, and Y. Haneda, “Real-time Sound Field Transmission System by Using Wave Field Reconstruction Filter and Its Evaluation,” IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, vol. E97-A, no. 9, pp. 1840–1848, Sep. 2014.
  • T. Aketo, H. Saruwatari, and S. Nakamura, “Robust sound field reproduction against listener’s movement utilizing image sensor,” Journal of Signal Processing, vol. 18, no. 4, pp. 213–216, Jul. 2014.
  • T. Miyauchi, D. Kitamura, H. Saruwatari, and S. Nakamura, “Depth estimation of sound images using directional clustering and activation-shared nonnegative matrix factorization,” Journal of Signal Processing, vol. 18, no. 4, pp. 217–220, Jul. 2014.
  • D. Kitamura, H. Saruwatari, K. Yagi, K. Shikano, Y. Takahashi, and K. Kondo, “Music signal separation based on supervised nonnegative matrix factorization with orthogonality and maximum-divergence penalties,” IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, vol. E97-A, no. 5, pp. 1113–1118, May 2014.


Books

2021

  • Y. Ishikawa, S. Takamichi, T. Umemoto, Y. Tsubota, M. Aikawa, K. Sakamoto, K. Yui, S. Fujiwara, A. Suto, and K. Nishiyama, “Team-based flipped learning framework: Achieving high student engagement in learning,” in Blended Language Learning: Evidence-based Trends and Applications (book chapter), Aug. 2021. (to appear)

2018

  • H. Saruwatari and R. Miyazaki, “Musical-noise-free blind speech extraction based on higher-order statistics analysis,” in Audio Source Separation, S. Makino, Ed. Springer, 2018, pp. 333–364.
  • D. Kitamura, N. Ono, H. Sawada, H. Kameoka, and H. Saruwatari, “Determined blind source separation with independent low-rank matrix analysis,” in Audio Source Separation, S. Makino, Ed. Springer, 2018, pp. 125–155.

2014

  • H. Saruwatari and R. Miyazaki, “Statistical analysis and evaluation of blind speech extraction algorithms,” in Advances in Modern Blind Source Separation Techniques: Theory and Applications, G. Naik and W. Wang, Eds. Springer, May 2014, pp. 291–322.


Invited Talks

2021

  • H. Saruwatari, “Multichannel audio source separation based on unsupervised and semi-supervised learning,” in Proceedings of China Computer Federation, Jan. 2021.

2020

  • H. Saruwatari, “Multichannel audio source separation based on unsupervised and semi-supervised learning,” in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Dec. 2020.

2019

  • Y. Takida, S. Koyama, N. Ueno, and H. Saruwatari, “Comparison of Interpolation Methods for Gridless Sound Field Decomposition Based on Reciprocity Gap Functional,” in Proceedings of International Congress on Sound and Vibration (ICSV), Montreal, Jul. 2019. (to appear)
  • S. Takamichi, “Group-delay modelling based on deep neural network with sine-skewed generalized cardioid distribution,” in Proceedings of International Conference on Soft Computing & Machine Learning (SCML), Wuhan, China, Apr. 2019. (invited)

2018

  • M. Une, Y. Saito, S. Takamichi, D. Kitamura, R. Miyazaki, and H. Saruwatari, “Generative approach using the noise generation models for DNN-based speech synthesis trained from noisy speech,” in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Hawaii, Nov. 2018.
  • S. Koyama, “Sparsity-based sound field reconstruction,” in Tohoku Universal Acoustical Communication Month, Seminar on the spatial aspects of hearing and their applications, keynote lecture, Sendai, Oct. 2018.
  • S. Takamichi, “What can GAN and GMMN do for augmented speech communication?,” in GMI workshop, Hiroshima, Aug. 2018.

2017

  • S. Takamichi, “Modulation spectrum-based speech parameter trajectory smoothing for DNN-based speech synthesis using FFT spectra,” in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Kuala Lumpur, Dec. 2017.
  • D. Kitamura, N. Ono, and H. Saruwatari, “Experimental analysis of optimal window length for independent low-rank matrix analysis,” in Proceedings of the 25th European Signal Processing Conference (EUSIPCO), Kos, Greece, Aug. 2017.
  • S. Koyama, N. Murata, and H. Saruwatari, “Effect of multipole dictionary in sparse sound field decomposition for super-resolution in recording and reproduction,” in Proceedings of International Congress on Sound and Vibration (ICSV), London, Jul. 2017.

2016

  • H. Nakajima, D. Kitamura, N. Takamune, S. Koyama, H. Saruwatari, Y. Takahashi, and K. Kondo, “Audio signal separation using supervised NMF with time-variant all-pole-model-based basis deformation,” in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Jeju, Dec. 2016.
  • S. Takamichi, “Speech synthesis that deceives anti-spoofing verification,” in NII Talk, Dec. 2016.
  • S. Koyama, N. Murata, and H. Saruwatari, “Super-resolution in sound field recording and reproduction based on sparse representation,” in Proceedings of 5th Joint Meeting of the Acoustical Society of America and Acoustical Society of Japan, Honolulu, Nov. 2016.
  • H. Saruwatari, K. Takata, N. Ono, and S. Makino, “Flexible microphone array based on multichannel nonnegative matrix factorization and statistical signal estimation,” in The 22nd International Congress on Acoustics (ICA2016), Sep. 2016, no. ICA2016-312.
  • S. Koyama, “Source-Location-Informed Sound Field Recording and Reproduction: A Generalization to Arrays of Arbitrary Geometry,” in Proceedings of 2016 AES International Conference on Sound Field Control, Guildford, Jul. 2016 [Online]. Available at: http://www.aes.org/e-lib/browse.cfm?elib=18303

2015

  • S. Koyama, A. Matsubayashi, N. Murata, and H. Saruwatari, “Sparse Sound Field Decomposition Using Group Sparse Bayesian Learning,” in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Dec. 2015, pp. 850–855.
  • D. Kitamura, N. Ono, H. Sawada, H. Kameoka, and H. Saruwatari, “Relaxation of rank-1 spatial constraint in overdetermined blind source separation,” in Proceedings of The 2015 European Signal Processing Conference (EUSIPCO2015), Nice, Sep. 2015, pp. 1271–1275.
  • H. Saruwatari, “Statistical-model-based speech enhancement with musical-noise-free properties,” in Proceedings of 2015 IEEE International Conference on Digital Signal Processing (DSP2015), Singapore, 2015.

2014

  • D. Kitamura, H. Saruwatari, S. Nakamura, Y. Takahashi, K. Kondo, and H. Kameoka, “Hybrid multichannel signal separation using supervised nonnegative matrix factorization with spectrogram restoration,” in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Siem Reap, Dec. 2014.


International Conferences

2022

  • F. Nakashima, T. Nakamura, N. Takamune, S. Fukayama, and H. Saruwatari, “Hyperbolic Timbre Embedding for Musical Instrument Sound Synthesis Based on Variational Autoencoders,” in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Nov. 2022. (accepted)
  • Y. Okamoto, K. Imoto, S. Takamichi, T. Fukumori, and Y. Yamashita, “How Should We Evaluate Synthesized Environmental Sounds,” in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Nov. 2022. (accepted)
  • K. Fujii, Y. Saito, and H. Saruwatari, “Adaptive End-To-End Text-To-Speech Synthesis Based on Error Correction Feedback From Humans,” in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Nov. 2022. (accepted)
  • Y. Matsunaga, T. Saeki, S. Takamichi, and H. Saruwatari, “Empirical Study Incorporating Linguistic Knowledge on Filled Pauses for Personalized Spontaneous Speech Synthesis,” in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Nov. 2022. (accepted)
  • Y. Nakai, K. Udagawa, Y. Saito, and H. Saruwatari, “Multi-Task Adversarial Training Algorithm for Multi-Speaker Neural Text-To-Speech,” in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Nov. 2022. (accepted)
  • Y. Saito, Y. Nishimura, S. Takamichi, K. Tachibana, and H. Saruwatari, “STUDIES: Corpus of Japanese Empathetic Dialogue Speech Towards Friendly Voice Agent,” in Proceedings of Interspeech, Sep. 2022. (accepted)
  • K. Shigemi, S. Koyama, T. Nakamura, and H. Saruwatari, “Physics-informed convolutional neural network with bicubic spline interpolation for sound field estimation,” in Proceedings of International Workshop on Acoustic Signal Enhancement (IWAENC), Sep. 2022. (accepted)
  • Y. Nishimura, Y. Saito, S. Takamichi, K. Tachibana, and H. Saruwatari, “Acoustic Modeling for End-to-End Empathetic Dialogue Speech Synthesis Using Linguistic and Prosodic Contexts of Dialogue History,” in Proceedings of Interspeech, Sep. 2022. (accepted)
  • S. Takamichi, W. Nakata, N. Tanji, and H. Saruwatari, “J-MAC: Japanese multi-speaker audiobook corpus for speech synthesis,” in Proceedings of Interspeech, Sep. 2022. (accepted)
  • W. Nakata, T. Koriyama, S. Takamichi, Y. Saito, Y. Ijima, R. Masumura, and H. Saruwatari, “Predicting VQVAE-based Character Acting Style from Quotation-Annotated Text for Audiobook Speech Synthesis,” in Proceedings of Interspeech, Sep. 2022. (accepted)
  • Y. Ito, T. Nakamura, S. Koyama, and H. Saruwatari, “Head-Related Transfer Function Interpolation from Spatially Sparse Measurements Using Autoencoder with Source Position Conditioning,” in Proceedings of International Workshop on Acoustic Signal Enhancement (IWAENC), Sep. 2022. (accepted)
  • T. Saeki*, D. Xin*, W. Nakata*, T. Koriyama, S. Takamichi, and H. Saruwatari (*Equal contribution), “UTMOS: UTokyo-SaruLab System for VoiceMOS Challenge 2022,” in Proceedings of Interspeech, Sep. 2022. (accepted)
  • T. Saeki, S. Takamichi, T. Nakamura, N. Tanji, and H. Saruwatari, “SelfRemaster: Self-Supervised Speech Restoration with Analysis-by-Synthesis Approach Using Channel Modeling,” in Proceedings of Interspeech, Sep. 2022. (accepted)
  • K. Udagawa, Y. Saito, and H. Saruwatari, “Human-in-the-loop Speaker Adaptation for DNN-based Multi-speaker TTS,” in Proceedings of Interspeech, Sep. 2022. (accepted)
  • D. Xin, S. Takamichi, and H. Saruwatari, “Exploring the Effectiveness of Self-supervised Learning and Classifier Chains in Emotion Recognition of Nonverbal Vocalizations,” in Proceedings of ICML Expressive Vocalizations (ExVo) Workshop and Competition 2022, Jul. 2022. (accepted)
  • Y. Matsunaga, T. Saeki, S. Takamichi, and H. Saruwatari, “Personalized filled-pause generation with group-wise prediction models,” in Proceedings of Language Resources and Evaluation Conference (LREC), Jun. 2022. (accepted)
  • H. Kai, S. Takamichi, S. Shiota, and H. Kiya, “Robustness of signal processing-based pseudonymization method against decryption attack,” in Proceedings of Odyssey, Jun. 2022. (accepted)
  • N. Kimura, Z. Su, T. Saeki, and J. Rekimoto, “SSR7000: A Synchronized Corpus of Ultrasound Tongue Imaging for End-to-End Silent Speech Recognition,” in Proceedings of Language Resources and Evaluation Conference (LREC), Jun. 2022. (accepted)
  • J. G. C. Ribeiro, S. Koyama, and H. Saruwatari, “Region-to-region kernel interpolation of acoustic transfer function with directional weighting,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2022. (accepted)
  • K. Arikawa, S. Koyama, and H. Saruwatari, “Spatial active noise control based on individual kernel interpolation of primary and secondary sound fields,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2022. (accepted)
  • M. Kawamura, T. Nakamura, D. Kitamura, H. Saruwatari, Y. Takahashi, and K. Kondo, “Differentiable digital signal processing mixture model for synthesis parameter extraction from mixture of harmonic sounds,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2022. (accepted)

2021

  • S. Koyama, T. Nishida, K. Kimura, T. Abe, N. Ueno, and J. Brunnström, “MeshRIR: A Dataset of Room Impulse Responses on Meshed Grid Points for Evaluating Sound Field Analysis and Synthesis Methods,” in Proceedings of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Oct. 2021, pp. 1–5.
  • K. Kimura, S. Koyama, N. Ueno, and H. Saruwatari, “Mean-Square-Error-Based Secondary Source Placement in Sound Field Synthesis With Prior Information on Desired Field,” in Proceedings of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Oct. 2021, pp. 281–285.
  • R. Ominato, N. Wakui, S. Takamichi, and S. Yano, “Discriminating between left and right ears using linear and nonlinear dimensionality reduction,” in SmaSys2021, Oct. 2021.
  • R. Arakawa, Z. Kashino, S. Takamichi, A. A. Verhulst, and M. Inami, “Digital Speech Makeup: Voice Conversion Based Altered Auditory Feedback for Transforming Self-Representation,” in ACM ICMI, Oct. 2021, pp. 159–167.
  • R. Horiuchi, S. Koyama, J. G. C. Ribeiro, N. Ueno, and H. Saruwatari, “Kernel learning for sound field estimation with L1 and L2 regularizations,” in Proceedings of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Oct. 2021, pp. 261–265.
  • N. Narisawa, R. Ikeshita, N. Takamune, D. Kitamura, T. Nakamura, H. Saruwatari, and T. Nakatani, “Independent Deeply Learned Tensor Analysis for Determined Audio Source Separation,” in Proceedings of European Signal Processing Conference (EUSIPCO), Aug. 2021, pp. 326–330.
  • T. Hasumi, T. Nakamura, N. Takamune, H. Saruwatari, D. Kitamura, Y. Takahashi, and K. Kondo, “Empirical Bayesian Independent Deeply Learned Matrix Analysis For Multichannel Audio Source Separation,” in Proceedings of European Signal Processing Conference (EUSIPCO), Aug. 2021, pp. 331–335.
  • K. Saito, T. Nakamura, K. Yatabe, Y. Koizumi, and H. Saruwatari, “Sampling-Frequency-Independent Audio Source Separation Using Convolution Layer Based on Impulse Invariant Method,” in Proceedings of European Signal Processing Conference (EUSIPCO), Aug. 2021, pp. 321–325.
  • K. Yufune, T. Koriyama, S. Takamichi, and H. Saruwatari, “Accent Modeling of Low-Resourced Dialect in Pitch Accent Language Using Variational Autoencoder,” in Proceedings of The 11th ISCA SSW, Aug. 2021, pp. 189–194.
  • T. Nakamura, T. Koriyama, and H. Saruwatari, “Sequence-to-Sequence Learning for Deep Gaussian Process Based Speech Synthesis Using Self-Attention GP Layer,” in Proceedings of Interspeech, Aug. 2021, pp. 121–125.
  • D. Xin, Y. Saito, S. Takamichi, T. Koriyama, and H. Saruwatari, “Cross-lingual Speaker Adaptation using Domain Adaptation and Speaker Consistency Loss for Text-To-Speech Synthesis,” in Proceedings of Interspeech, Aug. 2021, pp. 1614–1618.
  • W. Nakata, T. Koriyama, S. Takamichi, N. Tanji, Y. Ijima, R. Masumura, and H. Saruwatari, “Audiobook Speech Synthesis Conditioned by Cross-Sentence Context-Aware Word Embeddings,” in Proceedings of The 11th ISCA SSW, Aug. 2021, pp. 211–215.
  • K. Mizuta, T. Koriyama, and H. Saruwatari, “Harmonic WaveGAN: GAN-Based Speech Waveform Generation Model with Harmonic Structure Discriminator,” in Proceedings of Interspeech, Aug. 2021, pp. 2192–2196.
  • Y. Ueda, K. Fujii, Y. Saito, S. Takamichi, Y. Baba, and H. Saruwatari, “HumanACGAN: conditional generative adversarial network with human-based auxiliary classifier and its evaluation in phoneme perception,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Jun. 2021, pp. 6468–6472.
  • D. Xin, T. Komatsu, S. Takamichi, and H. Saruwatari, “Disentangled Speaker and Language Representations using Mutual Information Minimization and Domain Adaptation for Cross-Lingual TTS,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Jun. 2021, pp. 6608–6612.
  • Y. Ishikawa, S. Takamichi, T. Umemoto, M. Aikawa, K. Sakamoto, K. Yui, S. Fujiwara, A. Suto, and K. Nishiyama, “Japanese EFL learners’ speaking practice utilizing text-to-speech technology within a team-based flipped learning framework,” in Proceedings of International Conference on Human-Computer Interaction (HCII), Jun. 2021, pp. 283–291.
  • Y. Kondo, Y. Kubo, N. Takamune, D. Kitamura, and H. Saruwatari, “Deficient basis estimation of noise spatial covariance matrix for rank-constrained spatial covariance matrix estimation method in blind speech extraction,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Jun. 2021, pp. 806–810.
  • H. Kai, S. Takamichi, S. Shiota, and H. Kiya, “Lightweight voice anonymization based on data-driven optimization of cascaded voice modification modules,” in Proceedings of IEEE Spoken Language Technology Workshop (SLT), Jan. 2021, pp. 560–566.
  • T. Nishida, N. Ueno, S. Koyama, and H. Saruwatari, “Sensor Placement in Arbitrarily Restricted Region for Field Estimation Based on Gaussian Process,” in Proceedings of European Signal Processing Conference (EUSIPCO), Jan. 2021, pp. 2289–2293.
  • J. Brunnström and S. Koyama, “Kernel-Interpolation-Based Filtered-X Least Mean Square for Spatial Active Noise Control in Time Domain,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021, pp. 161–165.
  • N. Ueno, S. Koyama, and H. Saruwatari, “Convex and Differentiable Formulation for Inverse Problems in Hilbert Spaces with Nonlinear Clipping Effects,” IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, 2021.
  • S. Koyama, K. Kimura, and N. Ueno, “Sound Field Reproduction With Weighted Mode Matching and Infinite-Dimensional Harmonic Analysis: An Experimental Evaluation,” in International Conference on Immersive and 3D Audio (I3DA), 2021. (invited)
  • S. Koyama, T. Amakasu, N. Ueno, and H. Saruwatari, “Amplitude Matching: Majorization-Minimization Algorithm for Sound Field Control Only With Amplitude Constraint,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021, pp. 411–415.
  • S. Koyama, J. Brunnström, H. Ito, N. Ueno, and H. Saruwatari, “Spatial Active Noise Control Based on Kernel Interpolation of Sound Field,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 3052–3063, 2021.
  • T. Hasumi, T. Nakamura, N. Takamune, H. Saruwatari, D. Kitamura, Y. Takahashi, and K. Kondo, “Multichannel Audio Source Separation with Independent Deeply Learned Matrix Analysis Using Product of Source Models,” in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2021, pp. 1226–1233.
  • X. Luo, S. Takamichi, T. Koriyama, Y. Saito, and H. Saruwatari, “Emotion-Controllable Speech Synthesis Using Emotion Soft Labels and Fine-Grained Prosody Factors,” in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2021, pp. 794–799.
  • S. Misawa, N. Takamune, T. Nakamura, D. Kitamura, H. Saruwatari, M. Une, and S. Makino, “Speech enhancement by noise self-supervised rank-constrained spatial covariance matrix estimation via independent deeply learned matrix analysis,” in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2021.
  • T. Saeki, S. Takamichi, and H. Saruwatari, “Low-Latency Incremental Text-to-Speech Synthesis with Distilled Context Prediction Network,” in Proceedings of IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2021, pp. 749–756.

2020

  • K. Kamo, Y. Kubo, N. Takamune, D. Kitamura, H. Saruwatari, Y. Takahashi, and K. Kondo, “Joint-Diagonalizability-Constrained Multichannel Nonnegative Matrix Factorization Based on Multivariate Complex Student’s t-distribution,” in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Dec. 2020, pp. 869–874.
  • J. Koguchi, S. Takamichi, and M. Morise, “PJS: phoneme-balanced Japanese singing-voice corpus,” in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Dec. 2020, pp. 487–491.
  • T. Saeki, Y. Saito, S. Takamichi, and H. Saruwatari, “Real-time, full-band, online DNN-based voice conversion system using a single CPU,” in Proceedings of Interspeech, Oct. 2020, pp. 1021–1022.
  • Y. Okamoto, K. Imoto, S. Takamichi, R. Yamanishi, T. Fukumori, and Y. Yamashita, “RWCP-SSD-Onomatopoeia: Onomatopoeic Word Dataset for Environmental Sound Synthesis,” in Proceedings of Detection and Classification of Acoustic Scenes and Events (DCASE), Oct. 2020, pp. 125–129.
  • M. Aso, S. Takamichi, and H. Saruwatari, “End-to-end text-to-speech synthesis with unaligned multiple language units based on attention,” in Proceedings of Interspeech, Oct. 2020, pp. 4009–4013.
  • D. Xin, Y. Saito, S. Takamichi, T. Koriyama, and H. Saruwatari, “Cross-lingual Text-To-Speech Synthesis via Domain Adaptation and Perceptual Similarity Regression in Speaker Space,” in Proceedings of Interspeech, Oct. 2020, pp. 2947–2951.
  • N. Kimura, Z. Su, and T. Saeki, “End-to-End Deep Learning Speech Recognition Model for Silent Speech Challenge,” in Proceedings of Interspeech, Oct. 2020, pp. 1025–1026.
  • Y. Yamashita, T. Koriyama, Y. Saito, S. Takamichi, Y. Ijima, R. Masumura, and H. Saruwatari, “Investigating Effective Additional Contextual Factors in DNN-based Spontaneous Speech Synthesis,” in Proceedings of Interspeech, Oct. 2020, pp. 3201–3205.
  • S. Goto, K. Ohnishi, Y. Saito, K. Tachibana, and K. Mori, “Face2Speech: towards multi-speaker text-to-speech synthesis using an embedding vector predicted from a face image,” in Proceedings of Interspeech, Oct. 2020, pp. 1321–1325.
  • K. Mitsui, T. Koriyama, and H. Saruwatari, “Multi-speaker Text-to-speech Synthesis Using Deep Gaussian Processes,” in Proceedings of Interspeech, Oct. 2020, pp. 2032–2036.
  • H. Takeuchi, K. Kashino, Y. Ohishi, and H. Saruwatari, “Harmonic Lowering for Accelerating Harmonic Convolution for Audio Signals,” in Proceedings of Interspeech, Oct. 2020, pp. 185–189.
  • N. Iijima, S. Koyama, and H. Saruwatari, “Binaural Rendering From Distributed Microphone Signals Considering Loudspeaker Distance in Measurements,” in IEEE International Workshop on Multimedia Signal Processing (MMSP), Sep. 2020, pp. 1–6.
  • S. Kozuka, T. Nakamura, and H. Saruwatari, “Investigation on Wavelet Basis Function of DNN-based Time Domain Audio Source Separation Inspired by Multiresolution Analysis,” in Proceedings of Internoise, Aug. 2020.
  • R. Okamoto, S. Yano, N. Wakui, and S. Takamichi, “Visualization of differences in ear acoustic characteristics using t-SNE,” in Proceedings of AES convention, May 2020.
  • T. Koriyama and H. Saruwatari, “Utterance-level Sequential Modeling For Deep Gaussian Process Based Speech Synthesis Using Simple Recurrent Unit,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2020, pp. 7249–7253.
  • T. Saeki, Y. Saito, S. Takamichi, and H. Saruwatari, “Lifter training and sub-band modeling for computationally efficient and high-quality voice conversion using spectral differentials,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2020, pp. 7784–7788.
  • T. Nakamura and H. Saruwatari, “Time-domain Audio Source Separation based on Wave-U-Net Combined with Discrete Wavelet Transform,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2020, pp. 386–390.
  • K. Kamo, Y. Kubo, N. Takamune, D. Kitamura, H. Saruwatari, Y. Takahashi, and K. Kondo, “Regularized Fast Multichannel Nonnegative Matrix Factorization with ILRMA-based Prior Distribution of Joint-Diagonalization Process,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2020, pp. 606–610.
  • T. Kondo, K. Fukushige, N. Takamune, D. Kitamura, H. Saruwatari, R. Ikeshita, and T. Nakatani, “Convergence-Guaranteed Independent Positive Semidefinite Tensor Analysis Based on Student’s T Distribution,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2020, pp. 681–685.
  • Y. Saito, S. Takamichi, and H. Saruwatari, “SMASH corpus: a spontaneous speech corpus recording third-person audio commentaries on gameplay,” in Proceedings of Language Resources and Evaluation Conference (LREC), May 2020, pp. 6571–6577.
  • K. Ariga, T. Nishida, S. Koyama, N. Ueno, and H. Saruwatari, “Mutual-Information-Based Sensor Placement for Spatial Sound Field Recording,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2020, pp. 166–170.
  • Y. Yamashita, T. Koriyama, Y. Saito, S. Takamichi, Y. Ijima, R. Masumura, and H. Saruwatari, “DNN-based Speech Synthesis Using Abundant Tags of Spontaneous Speech Corpus,” in Proceedings of Language Resources and Evaluation Conference (LREC), May 2020, pp. 6438–6443.
  • H. Ito, S. Koyama, N. Ueno, and H. Saruwatari, “Spatial Active Noise Control Based on Kernel Interpolation with Directional Weighting,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2020, pp. 8404–8408. (invited)
  • S. Koyama, G. Chardon, and L. Daudet, “Optimizing Source and Sensor Placement for Sound Field Control: An Overview,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2020. (overview)
  • G. Chardon, S. Koyama, and L. Daudet, “Numerical Evaluation of Source and Sensor Placement Methods For Sound Field Control,” in Forum Acusticum, 2020.

2019

  • N. Makishima, N. Takamune, D. Kitamura, H. Saruwatari, Y. Takahashi, and K. Kondo, “Robust demixing filter update algorithm based on microphone-wise coordinate descent for independent deeply learned matrix analysis,” in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Lanzhou, Nov. 2019, pp. 1868–1873.
  • Y. Kubo, N. Takamune, D. Kitamura, and H. Saruwatari, “Acceleration of rank-constrained spatial covariance matrix estimation for blind speech extraction,” in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Lanzhou, Nov. 2019, pp. 332–338.
  • M. Une, Y. Kubo, N. Takamune, D. Kitamura, H. Saruwatari, and S. Makino, “Evaluation of multichannel hearing aid system using rank-constrained spatial covariance matrix estimation,” in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Lanzhou, Nov. 2019, pp. 1874–1879.
  • M. Nakanishi, N. Ueno, S. Koyama, and H. Saruwatari, “Two-dimensional sound field recording with multiple circular microphone arrays considering multiple scattering,” in Proceedings of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, Oct. 2019.
  • R. Arakawa, S. Takamichi, and H. Saruwatari, “TransVoice: Real-Time Voice Conversion for Augmenting Near-Field Speech Communication,” in Proceedings of UIST, New Orleans, Oct. 2019.
  • Y. Kubo, N. Takamune, D. Kitamura, and H. Saruwatari, “Efficient full-rank spatial covariance estimation using independent low-rank matrix analysis for blind source separation,” in Proceedings of European Signal Processing Conference (EUSIPCO), A Coruña, Sep. 2019.
  • N. Makishima, N. Takamune, D. Kitamura, H. Saruwatari, Y. Takahashi, and K. Kondo, “Column-wise update algorithm for independent deeply learned matrix analysis,” in Proceedings of International Congress on Acoustics (ICA), Aachen, Sep. 2019, pp. 2805–2812. [Young Scientist Conference Attendance Grant]
  • I. H. Parmonangan, H. Tanaka, S. Sakti, S. Takamichi, and S. Nakamura, “Speech Quality Evaluation of Synthesized Japanese Speech using EEG,” in Proceedings of Interspeech, Graz, Sep. 2019, pp. 1228–1232.
  • T. Nakamura, Y. Saito, S. Takamichi, Y. Ijima, and H. Saruwatari, “V2S attack: building DNN-based voice conversion from automatic speaker verification,” in Proceedings of The 10th ISCA SSW, Vienna, Sep. 2019.
  • H. Ito, S. Koyama, N. Ueno, and H. Saruwatari, “Three-dimensional spatial active noise control based on kernel-induced sound field interpolation,” in Proceedings of International Congress on Acoustics (ICA), Aachen, Sep. 2019.
  • M. Aso, S. Takamichi, N. Takamune, and H. Saruwatari, “Subword tokenization based on DNN-based acoustic model for end-to-end prosody generation,” in Proceedings of The 10th ISCA SSW, Vienna, Sep. 2019.
  • Y. Saito, S. Takamichi, and H. Saruwatari, “DNN-based Speaker Embedding Using Subjective Inter-speaker Similarity for Multi-speaker Modeling in Speech Synthesis,” in Proceedings of The 10th ISCA SSW, Vienna, Sep. 2019.
  • R. Arakawa, S. Takamichi, and H. Saruwatari, “Implementation of DNN-based real-time voice conversion and its improvements by audio data augmentation and mask-shaped device,” in Proceedings of The 10th ISCA SSW, Vienna, Sep. 2019.
  • T. Koriyama, S. Takamichi, and T. Kobayashi, “Sparse Approximation of Gram Matrices for GMMN-based Speech Synthesis,” in Proceedings of The 10th ISCA SSW, Vienna, Sep. 2019.
  • Y. Takida, S. Koyama, N. Ueno, and H. Saruwatari, “Comparison of Interpolation Methods for Gridless Sound Field Decomposition Based on Reciprocity Gap Functional,” in Proceedings of International Congress on Sound and Vibration (ICSV), Montreal, Jul. 2019. (to appear) [Invited]
  • I. H. Parmonangan, H. Tanaka, S. Sakti, S. Takamichi, and S. Nakamura, “EEG Analysis towards Evaluating Synthesized Speech Quality,” in Proceedings of IEEE EMBC, Berlin, Jul. 2019.
  • N. Makishima, N. Takamune, D. Kitamura, H. Saruwatari, Y. Takahashi, K. Kondo, and H. Nakajima, “Generalized-Gaussian-distribution-based independent deeply learned matrix analysis for multichannel audio source separation,” in Proceedings of International Congress and Exhibition on Noise Control Engineering (INTERNOISE), Madrid, Jun. 2019.
  • H. Tamaru, Y. Saito, S. Takamichi, T. Koriyama, and H. Saruwatari, “Generative moment matching network-based random modulation post-filter for DNN-based singing voice synthesis and neural double-tracking,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Brighton, May 2019.
  • K. Naruse, S. Yoshida, S. Takamichi, T. Narumi, T. Tanikawa, and M. Hirose, “Estimating Confidence in Voices using Crowdsourcing for Alleviating Tension with Altered Auditory Feedback,” in Proceedings of Asian CHI Symposium: Emerging HCI Research Collection in ACM Conference on Human Factors in Computing Systems (CHI), Glasgow, May 2019.
  • T. Koriyama and T. Kobayashi, “A Training Method Using DNN-guided Layerwise Pretraining for Deep Gaussian Processes,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Brighton, May 2019.
  • H. Ito, S. Koyama, N. Ueno, and H. Saruwatari, “Feedforward Spatial Active Noise Control Based on Kernel Interpolation of Sound Field,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Brighton, May 2019. (to appear)
  • Y. Takida, S. Koyama, N. Ueno, and H. Saruwatari, “Robust Gridless Sound Field Decomposition Based on Structured Reciprocity Gap Functional in Spherical Harmonic Domain,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Brighton, May 2019. (to appear)
  • K. Yoshino, Y. Murase, N. Lubis, K. Sugiyama, H. Tanaka, S. Sakti, S. Takamichi, and S. Nakamura, “Spoken Dialogue Robot for Watching Daily Life of Elderly People,” in Proceedings of IWSDS, Sicily, Apr. 2019.

2018

  • M. Une, Y. Saito, S. Takamichi, D. Kitamura, R. Miyazaki, and H. Saruwatari, “Generative approach using the noise generation models for DNN-based speech synthesis trained from noisy speech,” in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Hawaii, Nov. 2018, pp. 99–103.
  • T. Akiyama, S. Takamichi, and H. Saruwatari, “Prosody-aware subword embedding considering Japanese intonation systems and its application to DNN-based multi-dialect speech synthesis,” in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Hawaii, Nov. 2018.
  • S. Mogami, N. Takamune, D. Kitamura, H. Saruwatari, Y. Takahashi, K. Kondo, H. Nakajima, and N. Ono, “Independent low-rank matrix analysis based on time-variant sub-Gaussian source model,” in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Hawaii, Nov. 2018. [APSIPA ASC 2018 Best Paper Award]
  • H. Suda, G. Kotani, S. Takamichi, and D. Saito, “A revisit to feature handling for high-quality voice conversion,” in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Hawaii, Nov. 2018.
  • S. Shiota, S. Takamichi, and T. Matsui, “Data augmentation with moment-matching networks for i-vector based speaker verification,” in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Hawaii, Nov. 2018.
  • S. Koyama, “Sparsity-based sound field reconstruction,” in Tohoku Universal Acoustical Communication Month, Seminar on the spatial aspects of hearing and their applications, keynote lecture, Sendai, Oct. 2018. [Invited]
  • N. Ueno, S. Koyama, and H. Saruwatari, “Kernel Ridge Regression With Constraint of Helmholtz Equation for Sound Field Interpolation,” in Proceedings of International Workshop on Acoustic Signal Enhancement (IWAENC), Tokyo, Sep. 2018, pp. 436–440.
  • S. Takamichi, Y. Saito, N. Takamune, D. Kitamura, and H. Saruwatari, “Phase reconstruction from amplitude spectrograms based on von-Mises-distribution deep neural network,” in Proceedings of International Workshop on Acoustic Signal Enhancement (IWAENC), Tokyo, Sep. 2018.
  • Y. Takida, S. Koyama, and H. Saruwatari, “Exterior and Interior Sound Field Separation Using Convex Optimization: Comparison of Signal Models,” in Proceedings of European Signal Processing Conference (EUSIPCO), Rome, Sep. 2018, pp. 2567–2571.
  • S. Mogami, H. Sumino, D. Kitamura, N. Takamune, S. Takamichi, H. Saruwatari, and N. Ono, “Independent Deeply Learned Matrix Analysis for Multichannel Audio Source Separation,” in Proceedings of European Signal Processing Conference (EUSIPCO), Rome, Sep. 2018.
  • Y. Takida, S. Koyama, N. Ueno, and H. Saruwatari, “Gridless Sound Field Decomposition Based on Reciprocity Gap Functional in Spherical Harmonic Domain,” in Proceedings of IEEE sensor array and multichannel signal processing workshop (SAM), Sheffield, Jul. 2018, pp. 627–631. [Best Student Paper Award, ONRG sponsored student travel grants]
  • S. Takamichi and H. Saruwatari, “CPJD Corpus: Crowdsourced Parallel Speech Corpus of Japanese Dialects,” in Proceedings of Language Resources and Evaluation Conference (LREC), Miyazaki, May 2018, pp. 434–437.
  • S. Koyama, G. Chardon, and L. Daudet, “Joint Source and Sensor Placement for Sound Field Control Based on Empirical Interpolation Method,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Calgary, Apr. 2018, pp. 501–505.
  • N. Ueno, S. Koyama, and H. Saruwatari, “Sound Field Reproduction with Exterior Radiation Cancellation Using Analytical Weighting of Harmonic Coefficients,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Calgary, Apr. 2018, pp. 466–470. [IEEE SPS Japan Student Conference Paper Award]
  • Y. Saito, S. Takamichi, and H. Saruwatari, “Text-to-speech synthesis using STFT spectra based on low-/multi-resolution generative adversarial networks,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Calgary, Apr. 2018, pp. 5299–5303.
  • Y. Saito, Y. Ijima, K. Nishida, and S. Takamichi, “Non-parallel voice conversion using variational autoencoders conditioned by phonetic posteriorgrams and d-vectors,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Calgary, Apr. 2018, pp. 5274–5278.

2017

  • N. Mae, Y. Mitsui, S. Makino, D. Kitamura, N. Ono, T. Yamada, and H. Saruwatari, “Sound source localization using binaural difference for hose-shaped rescue robot,” in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Kuala Lumpur, Dec. 2017.
  • Y. Mitsui, D. Kitamura, N. Takamune, H. Saruwatari, Y. Takahashi, and K. Kondo, “Independent low-rank matrix analysis based on parametric majorization-equalization algorithm,” in Proceedings of IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), Curaçao, Dec. 2017.
  • S. Koyama and L. Daudet, “Comparison of Reverberation Models for Sparse Sound Field Decomposition,” in Proceedings of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, Oct. 2017, pp. 214–218.
  • S. Takamichi, D. Saito, H. Saruwatari, and N. Minematsu, “The UTokyo speech synthesis system for Blizzard Challenge 2017,” in Proceedings of Blizzard Challenge Workshop, Stockholm, Aug. 2017.
  • S. Takamichi, “Modulation spectrum-based speech parameter trajectory smoothing for DNN-based speech synthesis using FFT spectra,” in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Kuala Lumpur, Dec. 2017. [Invited]
  • D. Kitamura, N. Ono, and H. Saruwatari, “Experimental analysis of optimal window length for independent low-rank matrix analysis,” in Proceedings of the 25th European Signal Processing Conference (EUSIPCO), Kos, Greece, Aug. 2017. [Invited]
  • S. Takamichi, T. Koriyama, and H. Saruwatari, “Sampling-based speech parameter generation using moment-matching network,” in Proceedings of Interspeech, Stockholm, Aug. 2017.
  • H. Miyoshi, Y. Saito, S. Takamichi, and H. Saruwatari, “Voice Conversion Using Sequence-to-Sequence Learning of Context Posterior Probabilities,” in Proceedings of Interspeech, Stockholm, Aug. 2017.
  • S. Koyama, N. Murata, and H. Saruwatari, “Effect of Multipole Dictionary in Sparse Sound Field Decomposition For Super-resolution in Recording and Reproduction,” in Proceedings of International Congress on Sound and Vibration (ICSV), London, Jul. 2017. [Invited]
  • Y. Mitsui, D. Kitamura, S. Takamichi, N. Ono, and H. Saruwatari, “Blind source separation based on independent low-rank matrix analysis with sparse regularization for time-series activity,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), New Orleans, Mar. 2017, pp. 21–25. [Student Paper Contest Finalist]
  • N. Ueno, S. Koyama, and H. Saruwatari, “Listening-area-informed Sound Field Reproduction Based On Circular Harmonic Expansion,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), New Orleans, Mar. 2017, pp. 111–115.
  • N. Murata, S. Koyama, N. Takamune, and H. Saruwatari, “Spatio-temporal Sparse Sound Field Decomposition Considering Acoustic Source Signal Characteristics,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), New Orleans, Mar. 2017, pp. 441–445.
  • N. Ueno, S. Koyama, and H. Saruwatari, “Listening-area-informed Sound Field Reproduction With Gaussian Prior Based On Circular Harmonic Expansion,” in Proceedings of Hands-free Speech Communication and Microphone Arrays (HSCMA), San Francisco, Mar. 2017, pp. 196–200.
  • R. Sato, H. Kameoka, and K. Kashino, “Fast algorithm for statistical phrase/accent command estimation based on generative model incorporating spectral features,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), New Orleans, Mar. 2017, pp. 5595–5599.
  • Y. Saito, S. Takamichi, and H. Saruwatari, “Training algorithm to deceive anti-spoofing verification for dnn-based speech synthesis,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), New Orleans, Mar. 2017, pp. 4900–4904. [Spoken Language Processing Student Grant]
  • N. Mae, M. Ishimura, D. Kitamura, N. Ono, T. Yamada, S. Makino, and H. Saruwatari, “Ego noise reduction for hose-shaped rescue robot combining independent low-rank matrix analysis and multichannel noise cancellation,” in Proceedings of International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA), Grenoble, Feb. 2017, pp. 141–151.

2016

  • H. Nakajima, D. Kitamura, N. Takamune, S. Koyama, H. Saruwatari, Y. Takahashi, and K. Kondo, “Audio Signal Separation using Supervised NMF with Time-variant All-Pole-Model-Based Basis Deformation,” in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Jeju, Dec. 2016. [Invited]
  • S. Koyama, N. Murata, and H. Saruwatari, “Super-resolution in sound field recording and reproduction based on sparse representation,” in Proceedings of 5th Joint Meeting of the Acoustical Society of America and Acoustical Society of Japan, Honolulu, Nov. 2016. [Invited]
  • M. Ishimura, S. Makino, T. Yamada, N. Ono, and H. Saruwatari, “Noise reduction using independent vector analysis and noise cancellation for a hose-shaped rescue robot,” in Proceedings of International Workshop on Acoustic Signal Enhancement (IWAENC), Xi'an, Sep. 2016, no. PS-III-04.
  • D. Kitamura, N. Ono, H. Saruwatari, Y. Takahashi, and K. Kondo, “Discriminative and reconstructive basis training for audio source separation with semi-supervised nonnegative matrix factorization,” in Proceedings of International Workshop on Acoustic Signal Enhancement (IWAENC), Xi'an, Sep. 2016, no. PS-III-02.
  • M. Takakusaki, D. Kitamura, N. Ono, T. Yamada, S. Makino, and H. Saruwatari, “Ego-noise reduction for a hose-shaped rescue robot using determined rank-1 multichannel nonnegative matrix factorization,” in Proceedings of International Workshop on Acoustic Signal Enhancement (IWAENC), Xi'an, Sep. 2016, no. PS-II-02.
  • K. Kobayashi, S. Takamichi, S. Nakamura, and T. Toda, “The NU-NAIST voice conversion system for the Voice Conversion Challenge 2016,” in Proceedings of Interspeech, San Francisco, Sep. 2016, pp. 1667–1671.
  • L. Li, H. Kameoka, T. Higuchi, and H. Saruwatari, “Semi-supervised joint enhancement of spectral and cepstral sequences of noisy speech,” in Proceedings of Interspeech, San Francisco, Sep. 2016, pp. 3753–3757.
  • H. Nakajima, D. Kitamura, N. Takamune, S. Koyama, H. Saruwatari, N. Ono, Y. Takahashi, and K. Kondo, “Music signal separation using supervised NMF with all-pole-model-based discriminative basis deformation,” in Proceedings of The 2016 European Signal Processing Conference (EUSIPCO), Budapest, Aug. 2016, pp. 1143–1147.
  • N. Murata, H. Kameoka, K. Kinoshita, S. Araki, T. Nakatani, S. Koyama, and H. Saruwatari, “Reverberation-robust underdetermined source separation with non-negative tensor double deconvolution,” in Proceedings of The 2016 European Signal Processing Conference (EUSIPCO), Budapest, Aug. 2016, pp. 1648–1652.
  • S. Koyama, “Source-Location-Informed Sound Field Recording and Reproduction: A Generalization to Arrays of Arbitrary Geometry,” in Proceedings of 2016 AES International Conference on Sound Field Control, Guildford, Jul. 2016 [Online]. Available at: http://www.aes.org/e-lib/browse.cfm?elib=18303 [Invited]
  • Y. Mitsufuji, S. Koyama, and H. Saruwatari, “Multichannel blind source separation based on non-negative tensor factorization in wavenumber domain,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Shanghai, Mar. 2016, pp. 56–60.
  • N. Murata, S. Koyama, H. Kameoka, N. Takamune, and H. Saruwatari, “Sparse sound field decomposition with multichannel extension of complex NMF,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Shanghai, Mar. 2016, pp. 345–349.
  • S. Koyama and H. Saruwatari, “Sound field decomposition in reverberant environment using sparse and low-rank signal models,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Shanghai, Mar. 2016, pp. 395–399.

2015

  • S. Koyama, A. Matsubayashi, N. Murata, and H. Saruwatari, “Sparse Sound Field Decomposition Using Group Sparse Bayesian Learning,” in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Dec. 2015, pp. 850–855. [Invited]
  • N. Murata, S. Koyama, N. Takamune, and H. Saruwatari, “Sparse Sound Field Decomposition with Parametric Dictionary Learning for Super-Resolution Recording and Reproduction,” in Proceedings of IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), Dec. 2015.
  • S. Koyama, K. Ito, and H. Saruwatari, “Source-location-informed sound field recording and reproduction with spherical arrays,” in Proceedings of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, Oct. 2015.
  • D. Kitamura, N. Ono, H. Sawada, H. Kameoka, and H. Saruwatari, “Efficient multichannel nonnegative matrix factorization exploiting rank-1 spatial model,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Brisbane, Apr. 2015, pp. 276–280.
  • S. Koyama, N. Murata, and H. Saruwatari, “Structured sparse signal models and decomposition algorithm for super-resolution in sound field recording and reproduction,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Brisbane, Apr. 2015, pp. 619–623.
  • Y. Murota, D. Kitamura, S. Koyama, H. Saruwatari, and S. Nakamura, “Statistical modeling of binaural signal and its application to binaural source separation,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Brisbane, Apr. 2015, pp. 494–498.
  • D. Kitamura, N. Ono, H. Sawada, H. Kameoka, and H. Saruwatari, “Relaxation of rank-1 spatial constraint in overdetermined blind source separation,” in Proceedings of European Signal Processing Conference (EUSIPCO), Nice, 2015, pp. 1261–1265. [Invited]
  • H. Saruwatari, “Statistical-model-based speech enhancement with musical-noise-free properties,” in Proceedings of 2015 IEEE International Conference on Digital Signal Processing (DSP2015), Singapore, 2015. [Invited]

2014

  • D. Kitamura, H. Saruwatari, S. Nakamura, Y. Takahashi, K. Kondo, and H. Kameoka, “Hybrid multichannel signal separation using supervised nonnegative matrix factorization with spectrogram restoration,” in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Siem Reap, Dec. 2014. [Invited]
  • S. Koyama, P. Srivastava, K. Furuya, S. Shimauchi, and H. Ohmuro, “STSP: Space-Time Stretched Pulse for Measuring Spatio-Temporal Impulse Response,” in Proceedings of International Workshop on Acoustic Signal Enhancement (IWAENC), Sep. 2014, pp. 309–313.
  • D. Kitamura, H. Saruwatari, S. Nakamura, Y. Takahashi, K. Kondo, and H. Kameoka, “Divergence optimization in nonnegative matrix factorization with spectrogram restoration for multichannel signal separation,” in Proceedings of Hands-free Speech Communication and Microphone Arrays (HSCMA), Nancy, May 2014, no. 1569905839.
  • F. Aprilyanti, H. Saruwatari, K. Shikano, S. Nakamura, and T. Takatani, “Optimized joint noise suppression and dereverberation based on blind signal extraction for hands-free speech recognition system,” in Proceedings of Hands-free Speech Communication and Microphone Arrays (HSCMA), Nancy, May 2014, no. 1569905697.
  • S. Nakai, H. Saruwatari, R. Miyazaki, S. Nakamura, and K. Kondo, “Theoretical analysis of biased MMSE short-time spectral amplitude estimator and its extension to musical-noise-free speech enhancement,” in Proceedings of Hands-free Speech Communication and Microphone Arrays (HSCMA), Nancy, May 2014, no. 1569905751.
  • Y. Murota, D. Kitamura, S. Nakai, H. Saruwatari, S. Nakamura, Y. Takahashi, and K. Kondo, “Music signal separation based on Bayesian spectral amplitude estimator with automatic target prior adaptation,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Florence, May 2014, pp. 7540–7544.
  • S. Koyama, S. Shimauchi, and H. Ohmuro, “Sparse Sound Field Representation in Recording and Reproduction for Reducing Spatial Aliasing Artifacts,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Florence, May 2014, pp. 4476–4480.
  • Y. Haneda, K. Furuya, S. Koyama, and K. Niwa, “Close-talking spherical microphone array using sound pressure interpolation based on spherical harmonic expansion,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Florence, May 2014, pp. 604–608.

Preprints

2020

  • S. Takamichi, M. Komachi, N. Tanji, and H. Saruwatari, “JSSS: free Japanese speech corpus for summarization and simplification,” arXiv preprint, Oct. 2020.