The basic procedure executed by the web crawling algorithm takes a list of seed URLs as its input and repeatedly executes the following steps:

1. Remove a URL from the URL list.
2. Check that the page exists.
3. Download the corresponding page.
4. Check the relevancy of the page.
5. Extract any links contained in it.
6. Check the cache to see whether the links are already in it.
7. Add the unique links back to the URL list.

After all URLs are processed, return the most relevant page. (A minimal sketch of this loop appears at the end of this post.)

Features

- Onion Crawler (.onion)
- Returns page title and address with a short description of the site
- Save links to database
- Get data from site
- Save crawl info to JSON file
- Crawl custom domains
- Check if the link is live
- Built-in updater
- Build a visual tree of link relationships that can be quickly viewed or saved to an image file

Roadmap

- Visualization module revamp
- Implement BFS search for the webcrawler
- Use Golang service for concurrent webcrawling
- Improve stability (handle errors gracefully, expand test coverage, etc.)
- Improve performance (done with gotor)

Changelog v3.1.1

- Update release by @KingAkeem in #288
- Update README.md by @KingAkeem in #289
- Improving release process by @KingAkeem in #290
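The procedure above is a breadth-first traversal with a seen-set acting as the cache. Below is a minimal Python sketch of that loop for illustration only; it is not TorBot's actual implementation. The requests/BeautifulSoup calls and the relevancy() helper are assumptions for the example (TorBot delegates crawling to its gotor Go service), and crawling .onion addresses would additionally require routing requests through a Tor SOCKS proxy.

```python
from collections import deque
from urllib.parse import urljoin

import requests                      # assumed HTTP client for this sketch
from bs4 import BeautifulSoup        # assumed HTML parser for this sketch


def relevancy(html: str) -> float:
    """Hypothetical scorer; a real crawler would match against keywords."""
    return float(len(html))


def crawl(seed_urls, max_pages=100):
    queue = deque(seed_urls)         # the URL list
    cache = set(seed_urls)           # links that have already been queued
    best_url, best_score = None, float("-inf")

    while queue and max_pages > 0:
        url = queue.popleft()                        # remove a URL from the list
        try:
            resp = requests.get(url, timeout=10)     # check existence + download
        except requests.RequestException:
            continue                                 # dead link: skip it
        if resp.status_code != 200:
            continue
        max_pages -= 1

        score = relevancy(resp.text)                 # check page relevancy
        if score > best_score:
            best_url, best_score = url, score

        soup = BeautifulSoup(resp.text, "html.parser")
        for tag in soup.find_all("a", href=True):    # extract contained links
            link = urljoin(url, tag["href"])
            if link not in cache:                    # cache check: unique only
                cache.add(link)
                queue.append(link)

    return best_url                  # the most relevant page seen


if __name__ == "__main__":
    print(crawl(["https://example.com"]))
```

Using a deque gives the BFS ordering mentioned in the roadmap; swapping it for a stack would turn the same loop into a depth-first crawl.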