Java WordNet Similarity
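WordNet Research, Part 7: Computing Semantic Similarity with JWS (Java WordNet Similarity)
JWS (Java WordNet Similarity) is an open-source project for computing semantic similarity, built on Java and WordNet and developed by David Hope and colleagues at the University of Sussex. It implements many classic semantic similarity algorithms and is an open-source tool well worth studying.
JWS is a Java implementation of WordNet::Similarity (a Perl package for WordNet-based similarity comparison), which is good news for anyone who wants to compare word similarity with WordNet from Java! The steps, in brief: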
1. Download WordNet (Windows, version 2.1): http://wordnet.princeton.edu/wordnet/download/;
2. Download WordNet-InfoContent (version 2.1): http://wn-similarity.sourceforge.net/ or http://www.d.umn.edu/~tpederse/Data/;
3. Download JWS (current version: beta.11.01): http://www.cogs.susx.ac.uk/users/drh21/;
4. Install WordNet;
5. Unzip WordNet-InfoContent-2.1 and copy the folder into the WordNet directory D:/Program Files/WordNet/2.1;
6. Copy the two jars shipped with JWS, edu.mit.jwi_2.1.4.jar and edu.sussex.nlp.jws.beta.11.jar, into Java's lib directory and set the environment variables;
7. Run the JWS example program TestExamples in Eclipse.
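Note: because the downloaded WordNet is version 2.1, a few places in the program need to be changed:
String dir = "C:/Program Files/WordNet"; // the WordNet installation path; change it to match where you actually installed it
JWS ws = new JWS(dir, "3.0"); // change 3.0 to 2.1
Example program: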
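Before running TestExamples, it can help to confirm that the files from the setup steps are where the program will look for them. The sketch below uses only the Java standard library and assumes the layout described in the comments of TestExamples: a dict folder and a WordNet-InfoContent-<version> folder (containing the default ic-semcor.dat) under <dir>/<version>. The class name WordNetLayoutCheck and the method missingPaths are hypothetical names chosen for illustration; they are not part of JWS.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of JWS): checks the assumed directory layout.
class WordNetLayoutCheck {

    // Returns the expected paths that do not exist under root/version.
    static List<Path> missingPaths(Path root, String version) {
        Path versionDir = root.resolve(version);
        List<Path> expected = Arrays.asList(
                // WordNet database files, e.g. <root>/2.1/dict
                versionDir.resolve("dict"),
                // Default information content file named in the TestExamples comments
                versionDir.resolve("WordNet-InfoContent-" + version).resolve("ic-semcor.dat"));

        List<Path> missing = new ArrayList<>();
        for (Path p : expected) {
            if (!Files.exists(p)) {
                missing.add(p);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Defaults mirror the values used in this post; override them on the command line.
        Path root = Paths.get(args.length > 0 ? args[0] : "C:/Program Files/WordNet");
        String version = args.length > 1 ? args[1] : "2.1";

        List<Path> missing = missingPaths(root, version);
        if (missing.isEmpty()) {
            System.out.println("Layout looks complete for WordNet " + version);
        } else {
            for (Path p : missing) {
                System.out.println("Missing: " + p);
            }
        }
    }
}
```

If anything is reported as missing, the version string passed to the JWS constructor and the folder names on disk probably disagree (for example "3.0" in the code but a 2.1 installation).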
import java.util.TreeMap;
import java.text.*;
import edu.sussex.nlp.jws.*;


// 'TestExamples': how to use Java WordNet::Similarity
// David Hope, 2008
public class TestExamples
{
    public static void main(String[] args)
    {

        // 1. SET UP:
        // Let's make it easy for the user. So, rather than set pointers in 'Environment Variables' etc. let's allow the user to define exactly where they have put WordNet(s)
        String dir = "E:/Commonly Application/WordNet/";
        // That is, you may have version 3.0 sitting in the above directory e.g. C:/Program Files/WordNet/3.0/dict
        // The corresponding IC files folder should be in this same directory e.g. C:/Program Files/WordNet/3.0/WordNet-InfoContent-3.0

        // Option 1 (Perl default): specify the version of WordNet you want to use (assuming that you have a copy of it) and use the default IC file [ic-semcor.dat]
        JWS ws = new JWS(dir, "2.1");
        // Option 2 : specify the version of WordNet you want to use and the particular IC file that you wish to apply
        // JWS ws = new JWS(dir, "3.0", "ic-bnc-resnik-add1.dat");


        // 2. EXAMPLES OF USE:

        // 2.1 [JIANG & CONRATH MEASURE]
        JiangAndConrath jcn = ws.getJiangAndConrath();
        // System.out.println("Jiang & Conrath\n");
        // all senses
        TreeMap<String, Double> scores1 = jcn.jcn("apple", "banana", "n"); // all senses
        // TreeMap<String, Double> scores1 = jcn.jcn("apple", 1, "banana", "n"); // fixed;all
        // TreeMap<String, Double> scores1 = jcn.jcn("apple", "banana", 2, "n"); // all;fixed
        for (String s : scores1.keySet())
            System.out.println(s + "\t" + scores1.get(s));
        // specific senses
        // System.out.println("\nspecific pair\t=\t" + jcn.jcn("apple", 1, "banana", 1, "n") + "\n");
        // max.
        // System.out.println("\nhighest score\t=\t" + jcn.max("java", "best", "n") + "\n\n\n");

        // 2.2 [LIN MEASURE]
        Lin lin = ws.getLin();
        // System.out.println("Lin\n");
        // all senses
        TreeMap<String, Double> scores2 = lin.lin("like", "love", "n"); // all senses
        // TreeMap<String, Double> scores2 = lin.lin("kid", "child", "n"); // fixed;all
        // TreeMap<String, Double> scores2 = lin.lin("apple", "banana", 2, "n"); // all;fixed
        // for (String s : scores2.keySet())
        //     System.out.println(s + "\t" + scores2.get(s));
        // specific senses
        System.out.println("\nspecific pair\t=\t" + lin.lin("like", 1, "love", 1, "n") + "\n");
        // max.
        System.out.println("\nhighest score\t=\t" + lin.max("From", "date", "n") + "\n\n\n");

        // ... and so on for any other measure
    }
} // eof
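Both measures used in the example are based on information content (IC), which is why the WordNet-InfoContent files are needed. As a rough illustration of what the scores mean, the sketch below computes the two measures from given IC values using the standard published definitions: Lin as 2·IC(lcs) / (IC(c1) + IC(c2)), and Jiang & Conrath as the inverse of the distance IC(c1) + IC(c2) − 2·IC(lcs), where lcs is the most specific common subsumer of the two concepts. This is not JWS source code; the class IcMeasures, the handling of zero denominators, and the IC values in main are assumptions made up for the example.

```java
// Hypothetical illustration (not JWS code): IC-based similarity from given IC values.
class IcMeasures {

    // Lin similarity: 2 * IC(lcs) / (IC(c1) + IC(c2)).
    // Returns 0 when both concepts have zero information content (an assumed convention).
    static double lin(double ic1, double ic2, double icLcs) {
        double denominator = ic1 + ic2;
        return denominator == 0.0 ? 0.0 : (2.0 * icLcs) / denominator;
    }

    // Jiang & Conrath similarity: 1 / (IC(c1) + IC(c2) - 2 * IC(lcs)).
    // A zero distance is reported as positive infinity here; real packages
    // may treat that edge case differently.
    static double jcn(double ic1, double ic2, double icLcs) {
        double distance = ic1 + ic2 - 2.0 * icLcs;
        return distance > 0.0 ? 1.0 / distance : Double.POSITIVE_INFINITY;
    }

    public static void main(String[] args) {
        // Made-up IC values for two concepts and their common subsumer.
        double ic1 = 8.0;
        double ic2 = 6.0;
        double icLcs = 5.0;

        System.out.println("Lin\t=\t" + lin(ic1, ic2, icLcs));
        System.out.println("Jiang & Conrath\t=\t" + jcn(ic1, ic2, icLcs));
    }
}
```

Lin scores fall between 0 and 1, while Jiang & Conrath scores are unbounded, so values from the two measures should not be compared with each other directly.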
A simple semantic similarity program built on JWS, for example:
import edu.sussex.nlp.jws.JWS;
import edu.sussex.nlp.jws.Lin;


// Averages the best Lin score over every word pair drawn from two phrases.
public class Similar {

    private String str1;
    private String str2;
    // WordNet installation directory; change it to match your own setup.
    private String dir = "E:/Commonly Application/WordNet/";
    private JWS ws = new JWS(dir, "2.1");

    public Similar(String str1, String str2) {
        this.str1 = str1;
        this.str2 = str2;
    }

    public double getSimilarity() {
        String[] strs1 = splitString(str1);
        String[] strs2 = splitString(str2);
        double sum = 0.0;
        for (String s1 : strs1) {
            for (String s2 : strs2) {
                double sc = maxScoreOfLin(s1, s2);
                sum += sc;
                System.out.println("Current pair: " + s1 + " VS " + s2 + ", similarity: " + sc);
            }
        }
        // Mean score over all word pairs.
        return sum / (strs1.length * strs2.length);
    }

    // Splits a phrase into words on single spaces.
    private String[] splitString(String str) {
        return str.split(" ");
    }

    // Highest Lin score with both words treated as nouns;
    // falls back to verbs when the noun score is 0.
    private double maxScoreOfLin(String str1, String str2) {
        Lin lin = ws.getLin();
        double sc = lin.max(str1, str2, "n");
        if (sc == 0) {
            sc = lin.max(str1, str2, "v");
        }
        return sc;
    }

    public static void main(String args[]) {
        String s1 = "departure";
        String s2 = "leaving from";
        Similar sm = new Similar(s1, s2);
        System.out.println(sm.getSimilarity());
    }
}
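The averaging step in the program above does not depend on WordNet, so it can be tried out without an installation. The sketch below separates it from the scoring: every word of the first phrase is paired with every word of the second, and the scores are averaged. The class PairwiseAverage and the exact-match scorer are stand-ins invented for this example; they only show the shape of the computation and say nothing about real semantic similarity.

```java
import java.util.function.ToDoubleBiFunction;

// Hypothetical illustration: the all-pairs averaging with a pluggable word scorer.
class PairwiseAverage {

    // Mean of scorer(x, y) over every word x in phraseA and every word y in phraseB.
    static double average(String phraseA, String phraseB,
                          ToDoubleBiFunction<String, String> scorer) {
        String[] wordsA = phraseA.trim().split("\\s+");
        String[] wordsB = phraseB.trim().split("\\s+");
        double sum = 0.0;
        for (String a : wordsA) {
            for (String b : wordsB) {
                sum += scorer.applyAsDouble(a, b);
            }
        }
        return sum / (wordsA.length * wordsB.length);
    }

    public static void main(String[] args) {
        // Stand-in scorer: 1.0 for identical words, 0.0 otherwise.
        ToDoubleBiFunction<String, String> exactMatch =
                (a, b) -> a.equalsIgnoreCase(b) ? 1.0 : 0.0;

        System.out.println(average("departure", "leaving from", exactMatch));
        System.out.println(average("leaving from", "leaving from", exactMatch));
    }
}
```

With JWS available, the scorer could wrap the same call the program above uses, for example (a, b) -> lin.max(a, b, "n"). Note that because every pair counts equally, function words such as "from" pull the average down, which is worth keeping in mind when interpreting the result for longer phrases.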
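I came across JWS when I wanted to do semantic analysis based on Protégé and WordNet, but I did not have much time to study it in depth, which is a real pity. If you have looked into it further, please share a blog URL for everyone's reference!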
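Category: WordNet
Tags: WordNet
Author: Leo_wl
Source: http://www.cnblogs.com/Leo_wl/
Copyright of this article is shared by the author and cnblogs (博客园). Reposting is welcome, but unless the author agrees otherwise, this notice must be retained and a link to the original article must be placed prominently on the page; otherwise the author reserves the right to pursue legal liability.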