
ImageCrawler/WebCrawler for a Website

The ImageCrawler/WebCrawler application crawls any website to find missing content or images. It is built around a search algorithm and comes in two versions:
one uses Selenium and takes screenshots of the error-page URLs; the other runs in the background via a shell script and captures all the page URLs. The end results are emailed to a list of recipients.
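As a rough illustration of the kind of check such a crawler performs (this is a hedged sketch, not the actual project code), the snippet below extracts image URLs from a page's HTML with a regular expression; a real crawler would then request each URL and flag the ones that fail to load. The class and method names here are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ImageLinkExtractor {

    // Matches src="..." or src='...' inside <img> tags.
    // A simplification for the sketch -- real pages are better handled by an HTML parser.
    private static final Pattern IMG_SRC =
            Pattern.compile("<img[^>]*\\ssrc=[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    // Returns every image URL found in the given HTML, in document order.
    public static List<String> extractImageUrls(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            urls.add(m.group(1));
        }
        return urls;
    }

    public static void main(String[] args) {
        String html = "<html><body>"
                + "<img src=\"/logo.png\">"
                + "<img alt=\"banner\" src='https://example.com/banner.jpg'>"
                + "</body></html>";
        // prints /logo.png then https://example.com/banner.jpg
        for (String url : extractImageUrls(html)) {
            System.out.println(url);
        }
    }
}
```

From here, the two versions described above would diverge: the Selenium variant would open each extracted URL in a browser and screenshot failures, while the shell-script variant would simply record the URLs for a background check.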

Feel free to use it; the code is available for download on GitHub. Let me know your feedback.

ImageCrawler on GitHub


Popular posts from this blog

HashMap in Java

1) Implement HashMap in Java, with the put and get operations

HashMap can be implemented in Java using arrays. Use the same logic that the out-of-the-box HashMap follows for resizing and load factor: whenever the map's size reaches the capacity times the load factor, a new array is created and the previous array's contents are copied over (rehashed) into the new array.

HashMap is not synchronized by default. We can synchronize the whole map by using Collections.synchronizedMap(map), which synchronizes all operations on the map. Alternatively, we can use ConcurrentHashMap, which does not lock read operations but instead locks only the segments that are being written.

2) HashMap vs LinkedHashMap vs IdentityHashMap

3) HashMap vs ConcurrentHashMap

4) Implement a Cache using LinkedHashMap
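The description above (an array of buckets, resizing when the load factor is reached) can be sketched as follows. This is a minimal illustration with separate chaining, not a drop-in replacement for java.util.HashMap; the class name and the fixed 0.75 load factor are choices made for the example.

```java
import java.util.LinkedList;

// A simplified generic hash map supporting put and get, with separate
// chaining and a resize when size exceeds capacity * load factor.
public class SimpleHashMap<K, V> {
    private static final float LOAD_FACTOR = 0.75f;

    private static class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private LinkedList<Entry<K, V>>[] buckets;
    private int size;

    @SuppressWarnings("unchecked")
    public SimpleHashMap() {
        buckets = new LinkedList[16];
    }

    // Maps a key's hash code to a bucket index; masking keeps it non-negative.
    private int indexFor(K key, int capacity) {
        return (key == null ? 0 : key.hashCode() & 0x7fffffff) % capacity;
    }

    public V get(K key) {
        LinkedList<Entry<K, V>> bucket = buckets[indexFor(key, buckets.length)];
        if (bucket != null) {
            for (Entry<K, V> e : bucket) {
                if (e.key == null ? key == null : e.key.equals(key)) {
                    return e.value;
                }
            }
        }
        return null;
    }

    public void put(K key, V value) {
        if (size + 1 > buckets.length * LOAD_FACTOR) {
            resize();
        }
        int idx = indexFor(key, buckets.length);
        if (buckets[idx] == null) {
            buckets[idx] = new LinkedList<>();
        }
        for (Entry<K, V> e : buckets[idx]) {
            if (e.key == null ? key == null : e.key.equals(key)) {
                e.value = value; // existing key: overwrite in place
                return;
            }
        }
        buckets[idx].add(new Entry<>(key, value));
        size++;
    }

    // Doubles the bucket array and rehashes every entry into the new array,
    // mirroring what the built-in HashMap does when the load factor is hit.
    @SuppressWarnings("unchecked")
    private void resize() {
        LinkedList<Entry<K, V>>[] old = buckets;
        buckets = new LinkedList[old.length * 2];
        for (LinkedList<Entry<K, V>> bucket : old) {
            if (bucket == null) continue;
            for (Entry<K, V> e : bucket) {
                int idx = indexFor(e.key, buckets.length);
                if (buckets[idx] == null) {
                    buckets[idx] = new LinkedList<>();
                }
                buckets[idx].add(e);
            }
        }
    }

    public int size() { return size; }
}
```

Like java.util.HashMap, this sketch is not thread-safe; wrapping it the way Collections.synchronizedMap does, or moving to ConcurrentHashMap, would be the next step for concurrent use.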
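For the cache item, LinkedHashMap already provides the needed machinery: constructed in access order, it keeps the least recently used entry at the head, and overriding removeEldestEntry turns it into an LRU cache. A small sketch (class name and capacity chosen for the example):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// An LRU cache built on LinkedHashMap's access-order mode: once the map
// grows past the given capacity, the least recently used entry is evicted.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCache(int capacity) {
        super(16, 0.75f, true); // true = iterate in access order, not insertion order
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict the eldest (least recently used) entry
    }

    public static void main(String[] args) {
        LruCache<String, Integer> cache = new LruCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");    // touch "a" so it becomes most recently used
        cache.put("c", 3); // evicts "b", the least recently used entry
        System.out.println(cache.keySet()); // prints [a, c]
    }
}
```

Note that LinkedHashMap, like HashMap, is unsynchronized, so a shared cache would still need external synchronization or a concurrent alternative.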