Java WordNet Similarity
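WordNet Study 7: Computing Semantic Similarity with JWS (Java WordNet Similarity)
JWS (Java WordNet Similarity) is an open-source project for computing semantic similarity, built on Java and WordNet and developed by David Hope and colleagues at the University of Sussex. It implements many of the classic semantic similarity algorithms and is a tool well worth studying.
JWS is the Java implementation of WordNet::Similarity (a Perl package for WordNet-based similarity comparison), which is good news for anyone who wants to compare word similarity with WordNet from Java. The steps for using it are, briefly: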
1. Download WordNet (Windows, version 2.1): http://wordnet.princeton.edu/wordnet/download/
2. Download WordNet-InfoContent (version 2.1): http://wn-similarity.sourceforge.net/ or http://www.d.umn.edu/~tpederse/Data/
3. Download JWS (current version: beta.11.01): http://www.cogs.susx.ac.uk/users/drh21/
4. Install WordNet.
5. Unpack WordNet-InfoContent-2.1 and copy the folder into the WordNet directory D:/Program Files/WordNet/2.1.
6. Copy the two jars shipped with JWS, edu.mit.jwi_2.1.4.jar and edu.sussex.nlp.jws.beta.11.jar, into Java's lib directory and set the environment variables so that both jars are on the classpath.
7. Run the example program that comes with JWS, TestExamples, in Eclipse.
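Note: because the WordNet downloaded here is version 2.1, two places in the program need to be changed:
String dir = "C:/Program Files/WordNet"; // the WordNet installation path; change it to match your actual installation
JWS ws = new JWS(dir, "3.0"); // change "3.0" to "2.1"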
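Before running the example, it can help to confirm that the folders from steps 4 and 5 are where the program expects them. The following is a minimal sketch, assuming the layout named in the comments of the JWS example program (`<dir>/<version>/dict` and `<dir>/<version>/WordNet-InfoContent-<version>`) and the default IC file `ic-semcor.dat`. `WordNetLayoutCheck` and `missingPaths` are hypothetical names and not part of JWS; only standard Java file APIs are used.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of JWS): checks the folder layout assumed above.
class WordNetLayoutCheck {

    // Returns the expected paths that are missing; an empty list means the layout looks complete.
    static List<String> missingPaths(String dir, String version, String icFile) {
        Path root = Paths.get(dir, version);
        Path dict = root.resolve("dict");
        Path ic = root.resolve("WordNet-InfoContent-" + version).resolve(icFile);

        List<String> missing = new ArrayList<>();
        if (!Files.isDirectory(dict)) {
            missing.add(dict.toString());
        }
        if (!Files.isRegularFile(ic)) {
            missing.add(ic.toString());
        }
        return missing;
    }

    public static void main(String[] args) {
        // Pass your own installation path as the first argument; the default is only an example.
        String dir = args.length > 0 ? args[0] : "C:/Program Files/WordNet";
        List<String> missing = missingPaths(dir, "2.1", "ic-semcor.dat");
        if (missing.isEmpty()) {
            System.out.println("WordNet 2.1 layout looks complete under " + dir);
        } else {
            for (String path : missing) {
                System.out.println("Missing: " + path);
            }
        }
    }
}
```

If the check reports a missing path, revisiting steps 4 and 5 is probably quicker than debugging an exception thrown from inside the library.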
Example program:
import java.util.TreeMap;
import edu.sussex.nlp.jws.*;

// 'TestExamples': how to use Java WordNet::Similarity
// David Hope, 2008
public class TestExamples
{
    public static void main(String[] args)
    {
        // 1. SET UP:
        // Rather than setting pointers in 'Environment Variables' etc., the user states exactly where WordNet is installed.
        String dir = "E:/Commonly Application/WordNet/";
        // That is, you may have version 3.0 sitting in the above directory, e.g. C:/Program Files/WordNet/3.0/dict
        // The corresponding IC files folder should be in this same directory, e.g. C:/Program Files/WordNet/3.0/WordNet-InfoContent-3.0

        // Option 1 (Perl default): specify the version of WordNet you want to use (assuming that you have a copy of it) and use the default IC file [ic-semcor.dat]
        JWS ws = new JWS(dir, "2.1");
        // Option 2: specify the version of WordNet you want to use and the particular IC file that you wish to apply
        // JWS ws = new JWS(dir, "3.0", "ic-bnc-resnik-add1.dat");

        // 2. EXAMPLES OF USE:

        // 2.1 [JIANG & CONRATH MEASURE]
        JiangAndConrath jcn = ws.getJiangAndConrath();
        // System.out.println("Jiang & Conrath\n");
        // all senses
        TreeMap<String, Double> scores1 = jcn.jcn("apple", "banana", "n"); // all senses
        // TreeMap<String, Double> scores1 = jcn.jcn("apple", 1, "banana", "n"); // fixed;all
        // TreeMap<String, Double> scores1 = jcn.jcn("apple", "banana", 2, "n"); // all;fixed
        for (String s : scores1.keySet())
            System.out.println(s + "\t" + scores1.get(s));
        // specific senses
        // System.out.println("\nspecific pair\t=\t" + jcn.jcn("apple", 1, "banana", 1, "n") + "\n");
        // max.
        // System.out.println("\nhighest score\t=\t" + jcn.max("java", "best", "n") + "\n\n\n");

        // 2.2 [LIN MEASURE]
        Lin lin = ws.getLin();
        // System.out.println("Lin\n");
        // all senses
        TreeMap<String, Double> scores2 = lin.lin("like", "love", "n"); // all senses
        // TreeMap<String, Double> scores2 = lin.lin("kid", "child", "n"); // fixed;all
        // TreeMap<String, Double> scores2 = lin.lin("apple", "banana", 2, "n"); // all;fixed
        // for (String s : scores2.keySet())
        //     System.out.println(s + "\t" + scores2.get(s));
        // specific senses
        System.out.println("\nspecific pair\t=\t" + lin.lin("like", 1, "love", 1, "n") + "\n");
        // max.
        System.out.println("\nhighest score\t=\t" + lin.max("From", "date", "n") + "\n\n\n");

        // ... and so on for any other measure
    }
} // eof
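For readers who want to see what the two measures in the example compute, here is a minimal sketch of the standard textbook definitions, written against plain information-content (IC) numbers rather than WordNet. Lin is 2 * IC(lcs) / (IC(c1) + IC(c2)), and Jiang & Conrath is commonly reported as the reciprocal of the distance IC(c1) + IC(c2) - 2 * IC(lcs), where lcs is the least common subsumer of the two concepts. `IcMeasures` is a hypothetical class and not part of JWS, the IC values are made up for illustration, and the zero-denominator handling is an assumption that may differ from what JWS does.

```java
// Hypothetical illustration (not part of JWS): the Lin and Jiang & Conrath
// formulas applied to already-known information-content (IC) values.
class IcMeasures {

    // Lin similarity: 2 * IC(lcs) / (IC(c1) + IC(c2)).
    // Assumption: return 0 when both concepts have zero IC.
    static double lin(double ic1, double ic2, double icLcs) {
        double denominator = ic1 + ic2;
        return denominator == 0.0 ? 0.0 : 2.0 * icLcs / denominator;
    }

    // Jiang & Conrath: distance = IC(c1) + IC(c2) - 2 * IC(lcs), similarity = 1 / distance.
    // Assumption: a zero distance maps to positive infinity here.
    static double jcn(double ic1, double ic2, double icLcs) {
        double distance = ic1 + ic2 - 2.0 * icLcs;
        return distance == 0.0 ? Double.POSITIVE_INFINITY : 1.0 / distance;
    }

    public static void main(String[] args) {
        // Made-up IC values, chosen only to keep the arithmetic easy to follow.
        double ic1 = 8.0;
        double ic2 = 6.0;
        double icLcs = 4.0;
        System.out.println("lin = " + lin(ic1, ic2, icLcs)); // 2 * 4 / (8 + 6) = 8/14
        System.out.println("jcn = " + jcn(ic1, ic2, icLcs)); // 1 / (8 + 6 - 2 * 4) = 1/6
    }
}
```

Both scores grow as the shared ancestor becomes more specific (higher IC), which is why the choice of IC file in the JWS constructor affects the results.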
A simple semantic similarity program built on JWS looks like this, for example:
import edu.sussex.nlp.jws.JWS;
import edu.sussex.nlp.jws.Lin;

public class Similar {

    private String str1;
    private String str2;
    // WordNet installation path; change it to match your own setup
    private String dir = "E:/Commonly Application/WordNet/";
    private JWS ws = new JWS(dir, "2.1");

    public Similar(String str1, String str2) {
        this.str1 = str1;
        this.str2 = str2;
    }

    // Averages the best Lin score over every pair of words from the two strings
    public double getSimilarity() {
        String[] strs1 = splitString(str1);
        String[] strs2 = splitString(str2);
        double sum = 0.0;
        for (String s1 : strs1) {
            for (String s2 : strs2) {
                double sc = maxScoreOfLin(s1, s2);
                sum += sc;
                System.out.println("Current pair: " + s1 + " VS " + s2 + ", similarity: " + sc);
            }
        }
        return sum / (strs1.length * strs2.length);
    }

    // Splits a phrase into words on single spaces
    private String[] splitString(String str) {
        return str.split(" ");
    }

    // Highest Lin score as nouns; falls back to verbs when the noun score is 0
    private double maxScoreOfLin(String str1, String str2) {
        Lin lin = ws.getLin();
        double sc = lin.max(str1, str2, "n");
        if (sc == 0) {
            sc = lin.max(str1, str2, "v");
        }
        return sc;
    }

    public static void main(String[] args) {
        String s1 = "departure";
        String s2 = "leaving from";
        Similar sm = new Similar(s1, s2);
        System.out.println(sm.getSimilarity());
    }
}
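The aggregation step in `getSimilarity` (score every word pair, then divide by the number of pairs) can be tried without a WordNet installation. Below is a minimal sketch under that assumption: the word-level scorer is passed in as a function, and `EXACT_MATCH` is a hypothetical stand-in that scores 1.0 for identical words and 0.0 otherwise. `PairwiseAverage` is not part of JWS; in the program above the scorer would be the `lin.max` lookup.

```java
import java.util.function.ToDoubleBiFunction;

// Hypothetical illustration (not part of JWS): averaging word-pair scores
// with a pluggable word-level scorer.
class PairwiseAverage {

    // Stand-in scorer for demonstration only: 1.0 for identical words, 0.0 otherwise.
    static final ToDoubleBiFunction<String, String> EXACT_MATCH =
            (a, b) -> a.equals(b) ? 1.0 : 0.0;

    // Scores every pair of words from the two phrases and returns the mean.
    static double average(String phrase1, String phrase2,
                          ToDoubleBiFunction<String, String> scorer) {
        // Split on any run of whitespace so repeated spaces do not create empty words.
        String[] words1 = phrase1.trim().split("\\s+");
        String[] words2 = phrase2.trim().split("\\s+");
        double sum = 0.0;
        for (String w1 : words1) {
            for (String w2 : words2) {
                sum += scorer.applyAsDouble(w1, w2);
            }
        }
        return sum / (words1.length * words2.length);
    }

    public static void main(String[] args) {
        // Four pairs, one exact match: (a,a), (a,c), (b,a), (b,c).
        System.out.println(average("a b", "a c", EXACT_MATCH));
    }
}
```

One consequence of this design is that every unrelated word pair pulls the mean down, so a function word such as "from" in "leaving from" lowers the phrase score even when the other pair matches well.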
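I came across JWS when I wanted to handle semantic analysis with Protégé plus WordNet, but I did not have much time to study it in depth, which is a real pity. If you have looked into it further, please post the URL of your blog so that everyone can refer to it!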
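Author: Leo_wl
Source: http://www.cnblogs.com/Leo_wl/
Copyright of this article is shared by the author and cnblogs (博客园). Reposting is welcome, but unless the author agrees otherwise, this notice must be kept and a link to the original article must be placed prominently on the page; otherwise the author reserves the right to pursue legal liability.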