Hi,
Here is Java code for stemming English words. It reduces inflected (or sometimes derived) words to their stem, base, or root form, e.g. information -> inform. All you need to do is add the attached .jar file to your NetBeans project library and run the code.
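To make the idea of stemming concrete, here is a minimal toy sketch of suffix stripping. It is not the Porter or Snowball algorithm, whose rules are far more extensive; the class name ToySuffixStripper, the four suffixes, and the three-character minimum are all assumptions made up for illustration.

```java
import java.util.Locale;

public class ToySuffixStripper {
    // Hypothetical, deliberately tiny rule list, checked in this order.
    private static final String[] SUFFIXES = {"ation", "ing", "ed", "s"};

    // Lower-cases the word and removes the first matching suffix,
    // provided at least three characters remain afterwards.
    public static String strip(String word) {
        String lower = word.toLowerCase(Locale.ROOT);
        for (String suffix : SUFFIXES) {
            int remaining = lower.length() - suffix.length();
            if (lower.endsWith(suffix) && remaining >= 3) {
                return lower.substring(0, remaining);
            }
        }
        return lower; // no rule applied
    }

    public static void main(String[] args) {
        System.out.println(strip("information")); // inform
        System.out.println(strip("walking"));     // walk
        System.out.println(strip("cats"));        // cat
    }
}
```

A real stemmer handles many more cases (for example, doubled consonants and irregular endings), which is why the procedure below relies on the library rather than hand-written rules.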
Procedure:
1. Create a new Java Application project in NetBeans.
2. Add the attached JAR file "org.tartarus.snowball" (download link at the bottom) to the NetBeans project library.
3. Use the following code and change it wherever needed (e.g. input file name, output file name, paths):
import java.io.*;
import java.util.Scanner;
import org.tartarus.snowball.*;

public class EnglishStemmer_porterstemmer {
    public static void main(String[] args) throws Exception {
        String stem, line, token;
        // Read the input file (change the path as needed).
        FileReader file_to_read = new FileReader("C:/sample.txt");
        Scanner filesc = new Scanner(file_to_read); // scanner for the file
        // After the run, the output file appears at this location.
        FileWriter fstream = new FileWriter("C:/sample_stem_out.txt");
        BufferedWriter out = new BufferedWriter(fstream);
        while (filesc.hasNextLine()) {
            line = filesc.nextLine();
            Scanner linesc = new Scanner(line); // scanner for the line
            while (linesc.hasNext()) {
                token = linesc.next();
                // Stem the token with the English Snowball (Porter) stemmer.
                stem = EnglishSnowballStemmerFactory.getInstance().process(token);
                out.write(stem);
                out.write(" ");
            }
            out.newLine();
            linesc.close();
        }
        filesc.close();
        out.close(); // flush and close the writer; otherwise buffered output may never reach the file
        // To stem a single word instead:
        // stem = EnglishSnowballStemmerFactory.getInstance().process("information");
        // System.out.println(stem);
    }
}
4. Run the example program provided to understand the usage.
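The line-by-line loop in step 3 can also be sketched with try-with-resources, so that the reader and writer are closed (and the output flushed) automatically. This is only a sketch under assumptions: the class name StemLines, the method stemLines, and the idea of passing the stemmer in as a function are hypothetical, and the demo uses a lower-casing placeholder instead of the real stemmer so that it runs without the JAR.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.function.UnaryOperator;

public class StemLines {
    // Reads text line by line, applies `stemmer` to every whitespace-separated
    // token, and writes each result followed by a space, one output line per input line.
    public static void stemLines(Reader in, Writer out, UnaryOperator<String> stemmer)
            throws IOException {
        // try-with-resources closes both streams, even if an exception is thrown.
        try (BufferedReader reader = new BufferedReader(in);
             BufferedWriter writer = new BufferedWriter(out)) {
            String line;
            while ((line = reader.readLine()) != null) {
                for (String token : line.trim().split("\\s+")) {
                    if (!token.isEmpty()) { // a blank line yields one empty token
                        writer.write(stemmer.apply(token));
                        writer.write(" ");
                    }
                }
                writer.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        StringWriter result = new StringWriter();
        // Placeholder stemmer (lower-casing only), standing in for the library call.
        stemLines(new StringReader("Hello World\nFoo"), result, String::toLowerCase);
        System.out.print(result);
    }
}
```

With the JAR on the classpath, the third argument could be a lambda that wraps EnglishSnowballStemmerFactory.getInstance().process(token), and the first two could be a FileReader and FileWriter for your own paths. If that call declares a checked exception, the lambda would need to catch and rethrow it.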
JAR file link:
https://drive.google.com/file/d/0B6sz85c3IPh9RWdlVTNndU1CcWc/edit?usp=sharing
Sample input and output:
https://drive.google.com/file/d/0B6sz85c3IPh9TEtqMVdQZ1VpSG8/edit?usp=sharing
https://drive.google.com/file/d/0B6sz85c3IPh9Y2JsUm14aTE4QWs/edit?usp=sharing