
Java WordNet Similarity

WordNet Study Notes 7: Computing Semantic Similarity with JWS (Java WordNet Similarity)

 

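JWS (Java WordNet Similarity) is an open-source project for computing semantic similarity, built on Java and WordNet and developed by David Hope and others at the University of Sussex. It implements many of the classic semantic similarity measures and is a tool well worth studying.

JWS is a Java implementation of WordNet::Similarity (a Perl package for comparing similarity over WordNet), so it is good news for anyone who wants to compare word similarity with WordNet from Java! Briefly, the steps are: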

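1. Download WordNet (Windows, version 2.1): http://wordnet.princeton.edu/wordnet/download/;

2. Download WordNet-InfoContent (version 2.1): http://wn-similarity.sourceforge.net/ or http://www.d.umn.edu/~tpederse/Data/;

3. Download JWS (current version: beta.11.01): http://www.cogs.susx.ac.uk/users/drh21/;

4. Install WordNet;

5. Unpack WordNet-InfoContent-2.1 and copy the folder into the WordNet directory D:/Program Files/WordNet/2.1;

6. Copy the two jar files from JWS, edu.mit.jwi_2.1.4.jar and edu.sussex.nlp.jws.beta.11.jar, into Java's lib directory, and set the environment variables so that both jars are on the classpath;

7. Run the example program that ships with JWS, TestExamples, in Eclipse.

     Note: because the WordNet downloaded above is version 2.1, a few places in the program need to be changed:

     String dir = "C:/Program Files/WordNet";    // the WordNet installation path; change it to match where you actually installed WordNet

     JWS ws = new JWS(dir, "3.0");                   // change "3.0" to "2.1"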
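Before constructing `JWS`, it can help to confirm that the folders from the steps above are where the program expects them. The following is a minimal sketch in plain Java, not part of JWS: the class `WordNetLayoutCheck` and the method `missingPaths` are hypothetical names. It assumes the layout described in the example program's comments (a `dict` folder and a `WordNet-InfoContent-<version>` folder, containing the default `ic-semcor.dat`, under `<dir>/<version>`).

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of JWS): checks the assumed WordNet folder layout.
final class WordNetLayoutCheck {

    // Returns the expected paths that do not exist under <dir>/<version>.
    static List<Path> missingPaths(String dir, String version) {
        Path root = Paths.get(dir, version);
        List<Path> expected = new ArrayList<>();
        // WordNet dictionary files, e.g. C:/Program Files/WordNet/2.1/dict
        expected.add(root.resolve("dict"));
        // Default information-content file inside the IC folder
        expected.add(root.resolve("WordNet-InfoContent-" + version).resolve("ic-semcor.dat"));

        List<Path> missing = new ArrayList<>();
        for (Path p : expected) {
            if (!Files.exists(p)) {
                missing.add(p);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Defaults are placeholders; pass your own directory and version as arguments.
        String dir = args.length > 0 ? args[0] : "C:/Program Files/WordNet";
        String version = args.length > 1 ? args[1] : "2.1";
        List<Path> missing = missingPaths(dir, version);
        if (missing.isEmpty()) {
            System.out.println("Layout looks complete for WordNet " + version);
        } else {
            for (Path p : missing) {
                System.out.println("Missing: " + p);
            }
        }
    }
}
```

If anything is reported missing, the likely fix is to revisit step 5 (where the IC folder was copied) or the version string passed to the `JWS` constructor.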

Example program:

    import java.util.TreeMap;
    import edu.sussex.nlp.jws.*;

    // 'TestExamples': how to use Java WordNet::Similarity
    // David Hope, 2008
    public class TestExamples
    {
        public static void main(String[] args)
        {
            // 1. SET UP:
            // Let's make it easy for the user. So, rather than set pointers in 'Environment Variables' etc.,
            // let's allow the user to define exactly where they have put WordNet(s).
            String dir = "E:/Commonly Application/WordNet/";
            // That is, you may have version 3.0 sitting in the above directory, e.g. C:/Program Files/WordNet/3.0/dict
            // The corresponding IC files folder should be in this same directory, e.g. C:/Program Files/WordNet/3.0/WordNet-InfoContent-3.0

            // Option 1 (Perl default): specify the version of WordNet you want to use (assuming that you have a copy of it)
            // and use the default IC file [ic-semcor.dat]
            JWS ws = new JWS(dir, "2.1");
            // Option 2: specify the version of WordNet you want to use and the particular IC file that you wish to apply
            // JWS ws = new JWS(dir, "3.0", "ic-bnc-resnik-add1.dat");

            // 2. EXAMPLES OF USE:

            // 2.1 [JIANG & CONRATH MEASURE]
            JiangAndConrath jcn = ws.getJiangAndConrath();
            // System.out.println("Jiang & Conrath\n");
            // all senses
            TreeMap<String, Double> scores1 = jcn.jcn("apple", "banana", "n");          // all senses
            // TreeMap<String, Double> scores1 = jcn.jcn("apple", 1, "banana", "n");    // fixed;all
            // TreeMap<String, Double> scores1 = jcn.jcn("apple", "banana", 2, "n");    // all;fixed
            for (String s : scores1.keySet())
                System.out.println(s + "\t" + scores1.get(s));
            // specific senses
            // System.out.println("\nspecific pair\t=\t" + jcn.jcn("apple", 1, "banana", 1, "n") + "\n");
            // max.
            // System.out.println("\nhighest score\t=\t" + jcn.max("java", "best", "n") + "\n\n\n");

            // 2.2 [LIN MEASURE]
            Lin lin = ws.getLin();
            // System.out.println("Lin\n");
            // all senses
            TreeMap<String, Double> scores2 = lin.lin("like", "love", "n");             // all senses
            // TreeMap<String, Double> scores2 = lin.lin("kid", "child", "n");          // all senses, another word pair
            // TreeMap<String, Double> scores2 = lin.lin("apple", "banana", 2, "n");    // all;fixed
            // for (String s : scores2.keySet())
            //     System.out.println(s + "\t" + scores2.get(s));
            // specific senses
            System.out.println("\nspecific pair\t=\t" + lin.lin("like", 1, "love", 1, "n") + "\n");
            // max.
            System.out.println("\nhighest score\t=\t" + lin.max("From", "date", "n") + "\n\n\n");

            // ... and so on for any other measure
        }
    } // eof
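Both measures used in the example are built on information content (IC), which is why step 2 downloads the WordNet-InfoContent files. As a rough illustration of what the scores mean, the sketch below writes out the textbook formulas for the Lin similarity and the Jiang & Conrath distance. It is not JWS's internal code: the class `IcMeasuresSketch` and its methods are hypothetical, and the IC values in `main` are made up rather than read from an IC file. Here `lcs` stands for the least common subsumer, the most specific concept that subsumes both senses.

```java
// Illustrative only: textbook formulas, with made-up IC values (not read from a WordNet IC file).
final class IcMeasuresSketch {

    // Information content of a concept that occurs with the given probability: IC(c) = -ln p(c).
    static double ic(double probability) {
        return -Math.log(probability);
    }

    // Lin similarity: 2 * IC(lcs) / (IC(c1) + IC(c2)).
    // Returns 0 when the denominator is 0 to avoid dividing by zero.
    static double lin(double ic1, double ic2, double icLcs) {
        double denominator = ic1 + ic2;
        return denominator == 0.0 ? 0.0 : 2.0 * icLcs / denominator;
    }

    // Jiang & Conrath distance: IC(c1) + IC(c2) - 2 * IC(lcs).
    // A similarity score is commonly obtained by taking the reciprocal of this distance.
    static double jcnDistance(double ic1, double ic2, double icLcs) {
        return ic1 + ic2 - 2.0 * icLcs;
    }

    public static void main(String[] args) {
        double ic1 = 8.0;    // hypothetical IC of the first sense
        double ic2 = 6.0;    // hypothetical IC of the second sense
        double icLcs = 5.0;  // hypothetical IC of their least common subsumer
        System.out.println("Lin similarity: " + lin(ic1, ic2, icLcs));
        System.out.println("J&C distance:   " + jcnDistance(ic1, ic2, icLcs));
    }
}
```

Under these formulas, two senses that share a very specific common ancestor get a Lin score near 1, while senses whose only common ancestor is very general get a score near 0.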

A simple semantic similarity program built on JWS, for example:

    import edu.sussex.nlp.jws.JWS;
    import edu.sussex.nlp.jws.Lin;

    public class Similar {

        private final String str1;
        private final String str2;
        // WordNet installation directory; change it to match your own setup
        private final String dir = "E:/Commonly Application/WordNet/";
        private final JWS ws = new JWS(dir, "2.1");

        public Similar(String str1, String str2) {
            this.str1 = str1;
            this.str2 = str2;
        }

        // Averages the best Lin score over every word pair drawn from the two strings
        public double getSimilarity() {
            String[] strs1 = splitString(str1);
            String[] strs2 = splitString(str2);
            double sum = 0.0;
            for (String s1 : strs1) {
                for (String s2 : strs2) {
                    double sc = maxScoreOfLin(s1, s2);
                    sum += sc;
                    System.out.println("Current pair: " + s1 + " VS " + s2 + ", similarity: " + sc);
                }
            }
            return sum / (strs1.length * strs2.length);
        }

        // Splits on runs of whitespace so repeated spaces do not produce empty words
        private String[] splitString(String str) {
            return str.trim().split("\\s+");
        }

        // Best Lin score treating both words as nouns; falls back to verbs if the noun score is 0
        private double maxScoreOfLin(String str1, String str2) {
            Lin lin = ws.getLin();
            double sc = lin.max(str1, str2, "n");
            if (sc == 0) {
                sc = lin.max(str1, str2, "v");
            }
            return sc;
        }

        public static void main(String[] args) {
            String s1 = "departure";
            String s2 = "leaving from";
            Similar sm = new Similar(s1, s2);
            System.out.println(sm.getSimilarity());
        }
    }
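The averaging step in `getSimilarity()` does not itself depend on WordNet, so it can be separated from the scoring and tried out without a WordNet installation. The sketch below is one possible way to do that, under assumptions: the class `PairwiseAverageSketch` and the method `averageSimilarity` are hypothetical names, and the exact-match scorer in `main` is only a stand-in. In real use the scorer would wrap a call such as `lin.max(s1, s2, "n")` from the program above.

```java
import java.util.function.ToDoubleBiFunction;

// Hypothetical sketch: the pairwise-average logic with a pluggable word scorer.
final class PairwiseAverageSketch {

    // Averages scorer(wordA, wordB) over every word pair from the two phrases.
    // Returns 0 if either phrase contains no words.
    static double averageSimilarity(String a, String b, ToDoubleBiFunction<String, String> scorer) {
        String trimmedA = a.trim();
        String trimmedB = b.trim();
        if (trimmedA.isEmpty() || trimmedB.isEmpty()) {
            return 0.0;
        }
        String[] wordsA = trimmedA.split("\\s+");
        String[] wordsB = trimmedB.split("\\s+");
        double sum = 0.0;
        for (String wa : wordsA) {
            for (String wb : wordsB) {
                sum += scorer.applyAsDouble(wa, wb);
            }
        }
        return sum / (wordsA.length * wordsB.length);
    }

    public static void main(String[] args) {
        // Stand-in scorer for illustration: 1.0 for identical words, 0.0 otherwise.
        ToDoubleBiFunction<String, String> exactMatch = (x, y) -> x.equalsIgnoreCase(y) ? 1.0 : 0.0;
        System.out.println(averageSimilarity("departure", "leaving from", exactMatch));
    }
}
```

One consequence of averaging over all pairs is that function words such as "from" pull the score down, since they rarely score well against content words; filtering stop words before scoring would be a natural refinement.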

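I came across JWS when I wanted to handle semantic analysis with Protégé + WordNet, but I did not have much time to study it in depth, which is a real pity. If anyone has looked into it further, please post a blog URL so that everyone can refer to it!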

 

 

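Category: WordNet

Tags: WordNet

Author: Leo_wl

Source: http://www.cnblogs.com/Leo_wl/

The copyright of this article is shared by the author and cnblogs (博客园). Reposting is welcome, but unless the author agrees otherwise, this notice must be retained and a clearly visible link to the original article must be given on the page; otherwise the right to pursue legal liability is reserved.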
