![Scientific and Technical Journal of Information Technologies, Mechanics and Optics](/images/mag-ntv.png)
Natural language based malicious domain detection using machine learning and deep learning
Abstract
Cyberattacks remain a challenge because their number grows daily. Cybercriminals employ a variety of strategies to manipulate their targets and exploit their vulnerabilities. Malicious URLs are one such strategy, used to target large groups of users on social media platforms. To attract internet users, these web addresses are disguised as safe. Deliberate or inadvertent use of such URLs exposes the user or the organization in cyberspace and opens the way for further attacks. Systems that detect malicious URLs with rule-based or machine learning algorithms usually rely on feature engineering, which requires domain expertise and experience. Even after features have been extracted, they may not fully exploit the information in the dataset. The proposed method uses Natural Language Processing (NLP) techniques to vectorize the words in URLs and applies machine learning and deep learning models for classification. Vectorization reduces the feature engineering effort and makes fuller use of the dataset. Three different vectorization methods are used to vectorize the URL text. The method is evaluated on two separate datasets (D1 and D2) that are regularly used in this research domain. On the D1 dataset, the highest accuracy of 92.4 % is achieved by the Decision Tree (DT) with a count vectorizer and by the Random Forest (RF) with a Term Frequency-Inverse Document Frequency (TF-IDF) vectorizer. On the D2 dataset, DT with the TF-IDF vectorizer obtains a higher accuracy of 99.5 %. The Artificial Neural Network (ANN) model achieves 89.6 % accuracy on the D1 dataset and 99.2 % accuracy on the D2 dataset.
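The vectorization step described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. The tokenization rule (splitting on non-alphanumeric characters), the smoothed TF-IDF formula, the function names, and the example URLs are all assumptions made for illustration. The abstract does not specify these details, and the URLs are not drawn from D1 or D2.

```python
import math
import re
from collections import Counter


def tokenize_url(url):
    """Split a URL into lowercase word-like tokens (an assumed tokenization rule)."""
    return [tok for tok in re.split(r"[^a-z0-9]+", url.lower()) if tok]


def build_vocabulary(token_lists):
    """Map every distinct token to a column index, in sorted order."""
    vocab = sorted({tok for tokens in token_lists for tok in tokens})
    return {tok: i for i, tok in enumerate(vocab)}


def count_vectors(token_lists, vocab):
    """Count vectorization: one row per URL, one column per vocabulary token."""
    rows = []
    for tokens in token_lists:
        row = [0] * len(vocab)
        for tok, n in Counter(tokens).items():
            row[vocab[tok]] = n
        rows.append(row)
    return rows


def tfidf_vectors(counts):
    """TF-IDF weighting (assumed smoothed variant), followed by L2 row normalization.

    Weight of a term = tf * (ln((1 + n_docs) / (1 + df)) + 1).
    """
    n_docs = len(counts)
    n_terms = len(counts[0]) if counts else 0
    df = [sum(1 for row in counts if row[j] > 0) for j in range(n_terms)]
    idf = [math.log((1 + n_docs) / (1 + d)) + 1.0 for d in df]
    rows = []
    for row in counts:
        weighted = [tf * w for tf, w in zip(row, idf)]
        norm = math.sqrt(sum(x * x for x in weighted)) or 1.0
        rows.append([x / norm for x in weighted])
    return rows


# Hypothetical example URLs, not taken from the D1 or D2 datasets.
urls = [
    "http://example.com/login",
    "http://secure-login.example.net/verify/account",
    "https://example.org/docs/index.html",
]
tokens = [tokenize_url(u) for u in urls]
vocab = build_vocabulary(tokens)
counts = count_vectors(tokens, vocab)
tfidf = tfidf_vectors(counts)
```

The rows of `counts` or `tfidf` could then serve as input features for a classifier such as a decision tree or a random forest. Tokens that occur in fewer URLs receive a larger TF-IDF weight than tokens shared by many URLs.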
Articles in current issue
- Polymer composition with phenanthrenequinone for recording relief holographic gratings
- Modern approaches to the application of mathematical modeling methods in biomedical research
- Analysis of the phase images obtained during the collection of a holographic registration system based on the geometric phase effect and a polarization camera
- Color triangle color separation system for colorimetric research in microscopy
- The concept of aerial photography using a two-element active optoelectronic complex
- Variational problem of adaptive optimal control. Theoretical and applied computer analysis
- Brief review of the development of theories of robustness, roughness and bifurcations of dynamic systems
- Multiple context-free path querying by matrix multiplication
- Predicting the results of the 16-factor R. Cattell test based on the analysis of text posts of social network users
- Methodology for the control of electric power distribution system components to ensure the quality of consumed electricity
- Voice based answer evaluation system for physically disabled students using natural language processing and machine learning
- Hybrid JAYA algorithm for workflow scheduling in cloud
- Information model of the essential goods purchase duration
- Analysis and control of user engagement in personalized mobile assisting software for chronic disease patients
- Role discovery in node-attributed public transportation networks: the model description
- A survey of network intrusion detection systems based deep learning approaches
- Monitoring the health status of the population by age groups
- An intelligent shell game optimization based energy consumption analytics model for smart metering data
- Active voltage damping method with negative DC link current feedback in electric and hybrid electric transmissions
- Comparative analysis of switched reluctance motor control algorithms
- Gas dynamics of stationary supersonic gas jets with inert particles exhausting into a medium with low pressure
- Mixed forms of free oscillations of a rectangular CFCF-plate
- Modeling of heat-hydrodynamic processes in evaporators of low-temperature systems with intrachannel boiling of refrigerants
- High performance modeling of the stress-strain state of thin-walled shell structures with the use of deep learning
- Validation of state machine specifications