Thursday 15 May 2014

Stemming English words in Java with NetBeans

Hi,
Here is Java code for stemming English words. Stemming reduces inflected (or sometimes derived) words to their stem, base, or root form, e.g. information -> inform. All you need to do is add the attached .jar file to your NetBeans project library and run the code.
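
To give a feel for what "reducing a word to its stem" means, here is a minimal sketch of plain suffix stripping. It is only an illustration: the class name ToySuffixStripper and its four rules are made up for this example and are far cruder than the Snowball stemmer in the attached JAR, which applies a much larger, ordered rule set.

```java
import java.util.Locale;

public class ToySuffixStripper {

    // Hypothetical suffix list, checked in order. NOT the Snowball rule set.
    private static final String[] SUFFIXES = {"ation", "ing", "ed", "s"};

    // Removes the first matching suffix, keeping at least three characters.
    public static String strip(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        for (String suffix : SUFFIXES) {
            int stemLength = w.length() - suffix.length();
            if (w.endsWith(suffix) && stemLength >= 3) {
                return w.substring(0, stemLength);
            }
        }
        return w; // no rule applied, word is returned unchanged
    }

    public static void main(String[] args) {
        String[] words = {"information", "walking", "jumped", "cats"};
        for (String word : words) {
            System.out.println(word + " -> " + strip(word));
        }
    }
}
```

A real stemmer also handles cases this toy version gets wrong (for example, it has no rule for "running" -> "run"), which is why the library is used in the procedure below.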

Procedure:

1. Create a new Java Application project in NetBeans.

2. Add the attached JAR file "org.tartarus.snowball" (download link at the bottom) to the NetBeans project library.

3. Use the following code, changing it wherever needed (e.g. input file name, output file name, paths):

import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.util.Scanner;
import org.tartarus.snowball.*;

public class EnglishStemmer_porterstemmer {

    public static void main(String[] args) throws Exception {
        // try-with-resources closes the scanner and the writer when done.
        // Closing the writer flushes its buffer; without that, the output file can stay empty.
        try (Scanner filesc = new Scanner(new FileReader("C:/sample.txt")); // input file; change the path as needed
             BufferedWriter out = new BufferedWriter(new FileWriter("C:/sample_stem_out.txt"))) { // output file, created on run
            while (filesc.hasNextLine()) {
                String line = filesc.nextLine();
                try (Scanner linesc = new Scanner(line)) { // scanner for the tokens of one line
                    while (linesc.hasNext()) {
                        String token = linesc.next();
                        // method to access the Porter stemmer for English
                        String a = EnglishSnowballStemmerFactory.getInstance().process(token);
                        out.write(a);
                        out.write(" ");
                    }
                }
                out.newLine(); // keep the line structure of the input
            }
        }
        // To stem a single word instead:
        // String a = EnglishSnowballStemmerFactory.getInstance().process("information");
        // System.out.println(a);
    }
}


4. Run the example program provided to understand the usage.
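
If the output file comes out empty for your own input, one likely cause is a BufferedWriter that is never flushed or closed: short outputs then sit in the buffer and never reach the disk. The sketch below shows the same line-by-line loop with try-with-resources, which closes both streams automatically. The class name StemPipelineSketch, the method stemAll, and the lower-casing placeholder stemmer are assumptions made for this example so that it runs without the JAR.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.function.UnaryOperator;

public class StemPipelineSketch {

    // Applies `stemmer` to every whitespace-separated token, line by line.
    public static void stemAll(Reader in, Writer out, UnaryOperator<String> stemmer)
            throws IOException {
        // try-with-resources closes (and therefore flushes) both streams,
        // even if an exception is thrown part-way through.
        try (BufferedReader reader = new BufferedReader(in);
             BufferedWriter writer = new BufferedWriter(out)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    for (String token : trimmed.split("\\s+")) {
                        writer.write(stemmer.apply(token));
                        writer.write(" ");
                    }
                }
                writer.newLine(); // keep the line structure of the input
            }
        }
    }

    public static void main(String[] args) throws IOException {
        StringWriter result = new StringWriter();
        // Placeholder stemmer (lower-casing only), standing in for the real one.
        stemAll(new StringReader("Hello World\nSecond Line"), result, s -> s.toLowerCase());
        System.out.print(result);
    }
}
```

With real files you would pass a FileReader and a FileWriter instead of the string streams, and a lambda that calls EnglishSnowballStemmerFactory.getInstance().process(token) as in the listing above. If that call declares a checked exception, it would need to be wrapped inside the lambda.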



JAR file link:
https://drive.google.com/file/d/0B6sz85c3IPh9RWdlVTNndU1CcWc/edit?usp=sharing

Sample input and output:
https://drive.google.com/file/d/0B6sz85c3IPh9TEtqMVdQZ1VpSG8/edit?usp=sharing
https://drive.google.com/file/d/0B6sz85c3IPh9Y2JsUm14aTE4QWs/edit?usp=sharing

11 comments:

  1. I tried it, and it worked after I made some changes to the code where you ask for the name of the output file. I guess it should be "drive:/sample_stem_out.txt"; only then does the text file get created. When I executed the code as it is, the output text file was never created. Please verify whether I am right.

  2. Yes, Manasa, but you don't need to specify the drive name. You can see the output file in the location from which the program reads the actual input. The output file extension is ".txt" by default; even if you don't specify it, the file is created as filename.txt. Do check once again, and if it doesn't work, I will cross-verify on other machines and update the code. Thanks.

  3. working properly..nice work...thank you..!

  4. This comment has been removed by the author.

  5. When I run my program with the input you have provided, it gives the correct result.
    But when I create my own input file, it gives an empty output file.

  6. This comment has been removed by the author.

  7. I have the same problem: the file is empty. Does someone have an idea? Thanks.

  8. Can I find a version for French?

  9. I have the same problem: the file is empty.
