PHENOMENOLOGICAL DESCRIPTION OF INTERNET DOCUMENTS COLLECTING AND PROCESSING
Abstract
The state of the Internet as a repository of information resources is analyzed from the point of view of a bot, a program that collects data for resource monitoring, populating a search engine, or other commercial or research purposes. The problem is described through a set of phenomena that arise when documents are collected on the Internet and that must be taken into account when developing monitoring systems or search engines. A number of features encountered in web scraping, harvesting, and other uses of bots for data collection are presented. The problems described include the use of subdomains and recursive subdomains, dynamically loaded content, and search engine optimization of text content, among others. It is shown that collecting data from Internet resources is not only a technological task but, to a greater extent, a knowledge-intensive one, and that, because research is still in an active phase, no “out-of-the-box” solution exists. The article will be useful to researchers studying the development of the Internet, search engine developers, specialists in data retrieval and Internet technologies, and specialists in the creation and support of Internet resources and in Internet marketing.
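As an illustration of one of the phenomena named in the abstract, the sketch below shows how a bot might flag hosts that look like recursive subdomains. It is a minimal heuristic and not the method of the article. It assumes that "recursive subdomains" means hosts with unusually deep or repeating label chains (for example, those produced by wildcard DNS). The function names, the `max_depth` threshold, and the example hosts are all hypothetical.

```python
from typing import List
from urllib.parse import urlsplit


def subdomain_labels(url: str, registered_domain: str) -> List[str]:
    """Return the host labels to the left of registered_domain, or [] if none."""
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    domain = registered_domain.lower()
    # The host must be a proper subdomain of the given domain.
    if host == domain or not host.endswith("." + domain):
        return []
    return host[: -len(domain) - 1].split(".")


def looks_recursive(url: str, registered_domain: str, max_depth: int = 3) -> bool:
    """Heuristic (assumption): too many labels, or any label repeated."""
    labels = subdomain_labels(url, registered_domain)
    has_repeats = len(set(labels)) < len(labels)
    return len(labels) > max_depth or has_repeats


if __name__ == "__main__":
    for u in ("http://www.example.com/",
              "http://shop.shop.shop.example.com/page"):
        print(u, looks_recursive(u, "example.com"))
```

A crawler could apply such a check before queuing a URL, so that a single site cannot generate an unbounded number of distinct hosts. In practice the registered domain would come from a public-suffix lookup rather than being passed in by hand.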
Keywords
Permanent URL
Articles in current issue
- APPLICATION OF BIG DATA METHODS FOR COMPARING DATA OF GEOMAGNETIC OBSERVATORIES IN THE INTERMAGNET NETWORK
- PHENOMENOLOGICAL DESCRIPTION OF INTERNET DOCUMENTS COLLECTING AND PROCESSING
- DIGITAL IMPLEMENTATION OF VARIABLE DELAY IN MODELING AND CONTROL SYSTEMS
- PRACTICAL APPLICATION OF WORKPLACE MODELS FOR VERIFICATION OF MEASURING INSTRUMENTS AS NON-STATIONARY SERVICE SYSTEMS
- ANALYTICAL APPROACH TO SELECTIVE SEARCH FOR STATE PROBABILITY FUNCTIONS IN MARKOV CHAINS
- PARAMETERIZATION ALGORITHM FOR NON-STATIONARY SYSTEMS USING A DYNAMIC CONTROLLER
- EVALUATION OF MATERIAL HARDNESS USING WEAR TESTING BY CHORD METHOD
- THE USE OF SWARM ALGORITHMS IN TECHNOLOGICAL PREPARATION OF PRODUCTION
- INDEX OF ARTICLES PUBLISHED IN 2023