Showing posts with label jdk 1.7. Show all posts

Tuesday, October 8

Website Crawler with fork and Join Framework

Here are the classes involved in writing code for this exercise. The code can be copied and run directly on Java 7 or later, because the Fork/Join library is available in Java only from version 1.7 onwards.

Along with these classes you need the HTMLParser jar file, which is used to retrieve the links found on the page behind a given URL.

Download htmlparser-1.6.jar and include it in the classpath to run the code below.


The WebsiteCrawler class initiates the logic. It creates a ForkJoinPool, which holds the worker threads that pick up and execute the tasks. The pool uses work stealing: work is divided among the threads, an idle thread takes queued tasks from a busy one, and the tasks run in parallel. Overall processing therefore typically finishes faster, and multi-processor/multi-core hardware is used effectively.

import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.concurrent.ForkJoinPool;

/**
 * @author Manoj
 */
public class WebsiteCrawler implements LinkTracker {

    // Thread-safe set of links that have already been crawled
    private final Collection<String> linksCrawled = Collections.synchronizedSet(new HashSet<String>());
    private String inputUrl;
    private ForkJoinPool pool;

    public WebsiteCrawler(String inputUrl, int maxThreadCount) {
        this.inputUrl = inputUrl;
        pool = new ForkJoinPool(maxThreadCount);
    }

    private void init() {
        // Blocks until the root task and all tasks it forks have finished
        pool.invoke(new LinkSearcher(this.inputUrl, this));
    }

    public void addVisited(String s) {
        linksCrawled.add(s);
    }

    public boolean visited(String s) {
        return linksCrawled.contains(s);
    }

    public static void main(String[] args) throws Exception {
        // Pass the start URL as the first argument
        new WebsiteCrawler("", 50).init();
    }
}


The LinkTracker interface provides the basic methods required by the link search logic.

/**
 * @author Manoj
 */
public interface LinkTracker {

    boolean visited(String link);

    void addVisited(String link);
}

import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.RecursiveAction;

import org.htmlparser.Parser;
import org.htmlparser.filters.NodeClassFilter;
import org.htmlparser.tags.LinkTag;
import org.htmlparser.util.NodeList;


This is the class where the core recursive logic runs. To divide, assign, and execute the work recursively, the class extends RecursiveAction and overrides the compute() method. compute() runs once for every link: it parses the page, creates a new LinkSearcher (itself a RecursiveAction) for each child URL that has not been visited yet, and adds it to a list. After the visit, the current link is added to the visited set and the list of child actions is handed to the pool, which calls compute() on each of them.

To understand the code further, run it in debug mode and walk through the flow.

public class LinkSearcher extends RecursiveAction {

    private String url;
    private LinkTracker tracker;

    public LinkSearcher(String url, LinkTracker tracker) {
        this.url = url;
        this.tracker = tracker;
    }

    @Override
    public void compute() {
        if (!tracker.visited(url)) {
            try {
                List<LinkSearcher> actions = new ArrayList<LinkSearcher>();
                URL uriLink = new URL(url);
                Parser parser = new Parser(uriLink.openConnection());
                NodeList list = parser.extractAllNodesThatMatch(new NodeClassFilter(LinkTag.class));

                for (int i = 0; i < list.size(); i++) {
                    LinkTag extracted = (LinkTag) list.elementAt(i);

                    // Queue only non-empty links that have not been visited yet
                    if (!extracted.extractLink().isEmpty() && !tracker.visited(extracted.extractLink())) {
                        actions.add(new LinkSearcher(extracted.extractLink(), tracker));
                    }
                }

                // Mark this link as visited, then run all child actions
                tracker.addVisited(url);
                invokeAll(actions);
            } catch (Exception e) {
                // Skip links that cannot be opened or parsed
            }
        }
    }
}

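To experiment with the same fork/join pattern without a network connection or the HTMLParser jar, the sketch below crawls a small in-memory link graph instead of real pages. The class names InMemoryCrawlDemo and PageTask, the LINKS map, and the page names are all hypothetical and are not part of the crawler above. It also swaps in one different technique: instead of a separate visited() check followed by addVisited(), it relies on the return value of Set.add() on a concurrent set, so that only one task can claim a given page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

public class InMemoryCrawlDemo {

    // Hypothetical link graph standing in for real pages: page -> links found on it
    static final Map<String, List<String>> LINKS = new HashMap<String, List<String>>();
    static {
        LINKS.put("/home", Arrays.asList("/about", "/blog"));
        LINKS.put("/about", Arrays.asList("/home"));
        LINKS.put("/blog", Arrays.asList("/blog/post-1", "/about"));
        LINKS.put("/blog/post-1", Collections.<String>emptyList());
    }

    static class PageTask extends RecursiveAction {
        private final String page;
        private final Set<String> visited;

        PageTask(String page, Set<String> visited) {
            this.page = page;
            this.visited = visited;
        }

        @Override
        protected void compute() {
            // add() returns false when another task has already claimed this page
            if (!visited.add(page)) {
                return;
            }
            List<PageTask> children = new ArrayList<PageTask>();
            List<String> links = LINKS.get(page);
            if (links != null) {
                for (String link : links) {
                    children.add(new PageTask(link, visited));
                }
            }
            // Run all child tasks in the pool and wait for them to finish
            invokeAll(children);
        }
    }

    // Crawls the in-memory graph from the start page and returns every page reached
    static Set<String> crawl(String start, int parallelism) {
        Set<String> visited = Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            pool.invoke(new PageTask(start, visited));
        } finally {
            pool.shutdown();
        }
        return visited;
    }

    public static void main(String[] args) {
        // Prints the four reachable pages; the order is not guaranteed
        System.out.println(crawl("/home", 4));
    }
}
```

Because the claim happens in a single add() call, two tasks that discover the same page at the same moment cannot both process it. With a separate check and add, as in the crawler above, a page may occasionally be fetched more than once.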
Sunday, October 6

'javac' is not recognized as an internal or external command, operable program or batch file.

This error can occur when the JDK's bin directory is not included in the Path system variable under Environment Variables.


To fix this, find out where Java is installed on your system.

Let's say you have Java 7 installed on your system; copy the path of its bin directory, as below:


Now open the Path variable under Environment Variables from:

Control Panel\System and Security\System

Set Java_Home as

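Then set the Path variable value to: %Java_Home%;(current value in path)
This works when Java_Home holds the bin directory itself; if Java_Home points at the JDK root folder instead, use %Java_Home%\bin;(current value in path).

Alternatively, set the path directly to: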
\jdk1.7.0\bin;(current value in path)

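If it still does not work and throws the same error, open a command prompt and check the path with the path command.

Now see whether \jdk1.7.0\bin appears in the output.

If it is not there, set the path from the command line using the command below. There must be no space before the = sign, otherwise a separate variable named "path " (with a trailing space) is created. The change applies only to the current command prompt window.

set path=%path%;\jdk1.7.0\bin

Now check the value of the path variable again; \jdk1.7.0\bin should be there.

Try running the program again; it should work now.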

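The manual check above (reading the output of the path command and looking for the JDK bin folder) can also be sketched in Java. This is only an illustration: the class name JavacPathCheck and the method dirsWithJavac are hypothetical, and since javac is the thing that is missing, the program would have to be run from an IDE or compiled on a machine where the compiler is already reachable.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class JavacPathCheck {

    // Returns every directory in a PATH-style string that contains javac or javac.exe
    static List<String> dirsWithJavac(String pathValue) {
        List<String> hits = new ArrayList<String>();
        if (pathValue == null) {
            return hits;
        }
        // File.pathSeparator is ";" on Windows and ":" on Unix-like systems
        for (String dir : pathValue.split(Pattern.quote(File.pathSeparator))) {
            if (dir.isEmpty()) {
                continue;
            }
            if (new File(dir, "javac.exe").isFile() || new File(dir, "javac").isFile()) {
                hits.add(dir);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> hits = dirsWithJavac(System.getenv("PATH"));
        if (hits.isEmpty()) {
            System.out.println("No directory on the path contains javac.");
        } else {
            System.out.println("javac found in: " + hits);
        }
    }
}
```

If the program reports that no directory contains javac, the JDK bin folder still has to be added to the path as described above.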
Friday, September 13

Java Today - NewsLetter

Java SE 8 JDK Preview Available Now iProgrammer
Oracle has made available a preview of the development kit for Java SE (Standard ... environment by adding closures and related features to the Java language.
Java SE 7 Update 40 Released About - News & Issues
Java 7 Update 40 has been released and the JDK is available to download from Oracle's Java download page. The release notes show there are a number of ...

Gradle: do we need another build tool? | Java Code Geeks Keyhole Software
About Keyhole Software. Keyhole is a midwest-based consulting firm with a tight-knit technical team. We work primarily with Java, JavaScript and .
Java Code Geeks 
Java 7 Update Includes Security Features | gHale
Oracle released the Java standard edition version 7 update 40 (7u40), which includes bug fixes and some new features. The most notable security patch ... 

Adding the Power of Search to Your Hibernate App the Easy Way ... timkitch
I'm currently working on a software project with a data layer that is built using Hibernate – an Object-Relational Mapping (ORM) framework that takes a lot...
TDIing out loud: JSON and XML tutorial - Part 1 Eddie Hartman
If you're working with cloud services, or probably any kind of services, you're most likely working with JSON, XML (e.g. SOAP web services) or both. Although ...
TDIing out loud

Blogspot XML generator - HTML, CSS and Javascript - Codecall
Blogspot XML generator - posted in HTML, CSS and Javascript: Hi, I know that its possible to edit a blog in blogspot using the export and import feature in XML.
Codecall RSS

JVM Blog: Thinking about faith, technology and the future The Westby Times (blog)
Wednesday morning I found myself talking about how important my relationship with God is with a complete stranger. What I came to understand is that I don't ...

Spring Framework Reference Documentation
Code equivalents for Spring's XML namespaces · 3.1.6. Support for Hibernate 4.x · 3.1.7. TestContext framework support for @Configuration classes and bean ...