IMPLEMENTATION OF THE RANDOM FOREST METHOD IN TEXT MINING FOR SMS SPAM CLASSIFICATION USING PYTHON
Abstract
The Random Forest method has become a focus of text mining research aimed at combating Short Message Service (SMS) spam. This study applies a Random Forest approach to SMS spam classification using the Python programming language. The dataset is labelled by SMS type (0 = normal SMS, 1 = fraudulent SMS (spam), 2 = promo SMS) and is used as both training and test data to measure the performance of the Random Forest method. The first step in identifying SMS spam is to collect SMS data from different sources to build a representative dataset. The Random Forest model was implemented using the scikit-learn library in Python. Model training involves dividing the data into a training set and a validation set to measure the performance of the method. Performance is evaluated with accuracy, precision, recall and F1 score. The results show that the Random Forest method distinguishes well between spam, normal and promo SMS, with an accuracy of 89%, a precision of 0.97, a recall of 0.83 and an F1 score of 0.89. The implementation achieves a satisfactory level of accuracy and identifies most spam SMS with a low error rate.
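The workflow described in the abstract (labelled SMS, a train/validation split, a scikit-learn Random Forest, then accuracy, precision, recall and F1) can be sketched roughly as follows. This is a minimal illustration, not the authors' code. The toy messages, the TF-IDF features, the 75/25 split, `n_estimators=100` and macro averaging are all assumptions, since the abstract does not specify them.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical toy messages; the paper's dataset is not reproduced here.
# Labels follow the abstract: 0 = normal, 1 = fraudulent (spam), 2 = promo.
messages = [
    "Are we still meeting for lunch tomorrow?",
    "Please call me when you get home",
    "I will be late, the traffic is bad",
    "Happy birthday, see you tonight",
    "You won a prize, send your bank PIN to claim it",
    "Your account is blocked, transfer money to unlock it",
    "Congratulations, you won a car, pay the tax to this number",
    "Urgent: your relative is in hospital, send credit now",
    "Get 50 percent off data packages this weekend only",
    "Buy one get one free at our store today",
    "Special promo: double quota for every top up",
    "Enjoy cashback on your next purchase with this voucher",
]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]

# Hold out a stratified validation set (split ratio is an assumption).
X_train, X_val, y_train, y_val = train_test_split(
    messages, labels, test_size=0.25, stratify=labels, random_state=42
)

# TF-IDF turns each message into a numeric vector (assumed feature step);
# the Random Forest is then trained on those vectors.
model = make_pipeline(
    TfidfVectorizer(lowercase=True),
    RandomForestClassifier(n_estimators=100, random_state=42),
)
model.fit(X_train, y_train)
y_pred = model.predict(X_val)

# Macro averaging across the three classes is an assumption.
accuracy = accuracy_score(y_val, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_val, y_pred, average="macro", zero_division=0
)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")

# Consistency check on the abstract's figures, read as fractions:
# F1 = 2PR / (P + R) with P = 0.97 and R = 0.83.
reported_f1 = round(2 * 0.97 * 0.83 / (0.97 + 0.83), 2)
print(reported_f1)
```

On a toy set this small the scores are not meaningful; the sketch only shows how the pieces fit together. The last lines recompute F1 from the reported precision and recall, which gives a value matching the reported F1 score of 0.89.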
DOI: http://dx.doi.org/10.36723/juri.v17i1.742

This work is distributed under a Creative Commons Attribution-NonCommercial 4.0 International License.