Design And Implementation Of Adversarial Neural Network For Voice Data Processing
DOI:
https://doi.org/10.37676/jki.v2i1.562
Keywords:
Implementation, Adversarial Neural Network, Voice Data Processing
Abstract
In today's digital age, voice data processing has become an important area of information and communication technology. Generative adversarial networks (GANs) are a recent method that shows great potential for improving the quality and efficiency of voice data processing. This article discusses the design and implementation of GANs for voice data processing, focusing on model architecture, optimization techniques, and performance evaluation. The results show that GANs can produce better speech representations and improve processing quality compared with traditional methods. The article also explores the challenges faced in implementing GANs and provides recommendations for future development.
License
Copyright (c) 2023 Heskyel Pranata Tarigan

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.





