java Lucene 中自定义排序的实现
网络编程 2021-07-05 11:23www.168986.cn编程入门
使用Lucene来搜索内容,搜索结果的显示顺序是比较重要的.Lucene中Build-in的几个排序定义在大多数情况下是不适合我们使用的.要适合自己的应用程序的场景,就只能自定义排序功能,本节我们就来看看在Lucene中如何实现自定义排序功能.
Lucene中的自定义排序功能和Java集合中的自定义排序的实现方法差不多,都要实现一下比较接口. 在Java中只要实现Comparable接口就可以了.在Lucene中要实现SortComparatorSource接口和ScoreDocComparator接口.在了解具体实现方法之前先来看看这两个接口的定义吧.
SortComparatorSource接口的功能是返回一个用来排序ScoreDocs的parator(Expert: returns a parator for sorting ScoreDocs).该接口只定义了一个方法.如下:
Java代码
/
Creates a parator for the field in the given index.
@param reader - Index to create parator for.
@param fieldname - Field to create parator for.
@return Comparator of ScoreDoc objects.
@throws IOException - If an error ours reading the index.
/
public ScoreDocComparator newComparator(IndexReader reader,String fieldname) throws IOException
view plaincopy to clipboardprint?
/
Creates a parator for the field in the given index.
@param reader - Index to create parator for.
@param fieldname - Field to create parator for.
@return Comparator of ScoreDoc objects.
@throws IOException - If an error ours reading the index.
/
public ScoreDocComparator newComparator(IndexReader reader,String fieldname) throws IOException
/
Creates a parator for the field in the given index.
@param reader - Index to create parator for.
@param fieldname - Field to create parator for.
@return Comparator of ScoreDoc objects.
@throws IOException - If an error ours reading the index.
/
public ScoreDocComparator newComparator(IndexReader reader,String fieldname) throws IOException
该方法只是创造一个ScoreDocComparator 实例用来实现排序.所以我们还要实现ScoreDocComparator 接口.来看看ScoreDocComparator 接口.功能是比较来两个ScoreDoc 对象来排序(Compares two ScoreDoc objects for sorting) 里面定义了两个Lucene实现的静态实例.如下:
Java代码
//Special parator for sorting hits aording to puted relevance (document score).
public static final ScoreDocComparator RELEVANCE;
//Special parator for sorting hits aording to index order (document number).
public static final ScoreDocComparator INDEXORDER;
view plaincopy to clipboardprint?
//Special parator for sorting hits aording to puted relevance (document score).
public static final ScoreDocComparator RELEVANCE;
//Special parator for sorting hits aording to index order (document number).
public static final ScoreDocComparator INDEXORDER;
//Special parator for sorting hits aording to puted relevance (document score).
public static final ScoreDocComparator RELEVANCE;
//Special parator for sorting hits aording to index order (document number).
public static final ScoreDocComparator INDEXORDER;
有3个方法与排序相关,需要我们实现 分别如下:
Java代码
/
Compares two ScoreDoc objects and returns a result indicating their sort order.
@param i First ScoreDoc
@param j Second ScoreDoc
@return -1 if i should e before j;
1 if i should e after j;
0 if they are equal
/
public int pare(ScoreDoc i,ScoreDoc j);
/
Returns the value used to sort the given document. The object returned must implement the java.io.Serializable interface. This is used by multisearchers to determine how to collate results from their searchers.
@param i Document
@return Serializable object
/
public Comparable sortValue(ScoreDoc i);
/
Returns the type of sort. Should return SortField.SCORE, SortField.DOC, SortField.STRING, SortField.INTEGER, SortField.FLOAT or SortField.CUSTOM. It is not valid to return SortField.AUTO. This is used by multisearchers to determine how to collate results from their searchers.
@return One of the constants in SortField.
/
public int sortType();
view plaincopy to clipboardprint?
/
Compares two ScoreDoc objects and returns a result indicating their sort order.
@param i First ScoreDoc
@param j Second ScoreDoc
@return -1 if i should e before j;
1 if i should e after j;
0 if they are equal
/
public int pare(ScoreDoc i,ScoreDoc j);
/
Returns the value used to sort the given document. The object returned must implement the java.io.Serializable interface. This is used by multisearchers to determine how to collate results from their searchers.
@param i Document
@return Serializable object
/
public Comparable sortValue(ScoreDoc i);
/
Returns the type of sort. Should return SortField.SCORE, SortField.DOC, SortField.STRING, SortField.INTEGER, SortField.FLOAT or SortField.CUSTOM. It is not valid to return SortField.AUTO. This is used by multisearchers to determine how to collate results from their searchers.
@return One of the constants in SortField.
/
public int sortType();
/
Compares two ScoreDoc objects and returns a result indicating their sort order.
@param i First ScoreDoc
@param j Second ScoreDoc
@return -1 if i should e before j;
1 if i should e after j;
0 if they are equal
/
public int pare(ScoreDoc i,ScoreDoc j);
/
Returns the value used to sort the given document. The object returned must implement the java.io.Serializable interface. This is used by multisearchers to determine how to collate results from their searchers.
@param i Document
@return Serializable object
/
public Comparable sortValue(ScoreDoc i);
/
Returns the type of sort. Should return SortField.SCORE, SortField.DOC, SortField.STRING, SortField.INTEGER, SortField.FLOAT or SortField.CUSTOM. It is not valid to return SortField.AUTO. This is used by multisearchers to determine how to collate results from their searchers.
@return One of the constants in SortField.
/
public int sortType();
看个例子吧!
该例子为Lucene in Action中的一个实现,用来搜索距你最近的餐馆的名字. 餐馆坐标用字符串"x,y"来存储.
Java代码
package .nikee.lucene;
import java.io.IOException;
import .apache.lucene.index.IndexReader;
import .apache.lucene.index.Term;
import .apache.lucene.index.TermDocs;
import .apache.lucene.index.TermEnum;
import .apache.lucene.search.ScoreDoc;
import .apache.lucene.search.ScoreDocComparator;
import .apache.lucene.search.SortComparatorSource;
import .apache.lucene.search.SortField;
//实现了搜索距你最近的餐馆的名字. 餐馆坐标用字符串"x,y"来存储
//DistanceComparatorSource 实现了SortComparatorSource接口
public class DistanceComparatorSource implements SortComparatorSource {
private static final long serialVersionUID = 1L;
// x y 用来保存 坐标位置
private int x;
private int y;
public DistanceComparatorSource(int x, int y) {
this.x = x;
this.y = y;
}
// 返回ScoreDocComparator 用来实现排序功能
public ScoreDocComparator newComparator(IndexReader reader, String fieldname) throws IOException {
return new DistanceScoreDocLookupComparator(reader, fieldname, x, y);
}
//DistanceScoreDocLookupComparator 实现了ScoreDocComparator 用来排序
private static class DistanceScoreDocLookupComparator implements ScoreDocComparator {
private float[] distances; // 保存每个餐馆到指定点的距离
// 构造函数 , 构造函数在这里几乎完成所有的准备工作.
public DistanceScoreDocLookupComparator(IndexReader reader, String fieldname, int x, int y) throws IOException {
System.out.println("fieldName2="+fieldname);
final TermEnum enumerator = reader.terms(new Term(fieldname, ""));
System.out.println("maxDoc="+reader.maxDoc());
distances = new float[reader.maxDoc()]; // 初始化distances
if (distances.length > 0) {
TermDocs termDocs = reader.termDocs();
try {
if (enumerator.term() == null) {
throw new RuntimeException("no terms in field " + fieldname);
}
int i = 0,j = 0;
do {
System.out.println("in do-while :" + i ++);
Term term = enumerator.term(); // 取出每一个Term
if (term.field() != fieldname) // 与给定的域不符合则比较下一个
break;
//Sets this to the data for the current term in a TermEnum.
//This may be optimized in some implementations.
termDocs.seek(enumerator); //参考TermDocs Doc
while (termDocs.next()) {
System.out.println(" in while :" + j ++);
System.out.println(" in while ,Term :" + term.toString());
String[] xy = term.text().split(","); // 去处x y
int deltax = Integer.parseInt(xy[0]) - x;
int deltay = Integer.parseInt(xy[1]) - y;
// 计算距离
distances[termDocs.doc()] = (float) Math.sqrt(deltax deltax + deltay deltay);
}
}
while (enumerator.next());
} finally {
termDocs.close();
}
}
}
//有上面的构造函数的准备 这里就比较简单了
public int pare(ScoreDoc i, ScoreDoc j) {
if (distances[i.doc] < distances[j.doc])
return -1;
if (distances[i.doc] > distances[j.doc])
return 1;
return 0;
}
// 返回距离
public Comparable sortValue(ScoreDoc i) {
return new Float(distances[i.doc]);
}
//指定SortType
public int sortType() {
return SortField.FLOAT;
}
}
public String toString() {
return "Distance from (" + x + "," + y + ")";
}
}
view plaincopy to clipboardprint?
package .nikee.lucene;
import java.io.IOException;
import .apache.lucene.index.IndexReader;
import .apache.lucene.index.Term;
import .apache.lucene.index.TermDocs;
import .apache.lucene.index.TermEnum;
import .apache.lucene.search.ScoreDoc;
import .apache.lucene.search.ScoreDocComparator;
import .apache.lucene.search.SortComparatorSource;
import .apache.lucene.search.SortField;
//实现了搜索距你最近的餐馆的名字. 餐馆坐标用字符串"x,y"来存储
//DistanceComparatorSource 实现了SortComparatorSource接口
public class DistanceComparatorSource implements SortComparatorSource {
private static final long serialVersionUID = 1L;
// x y 用来保存 坐标位置
private int x;
private int y;
public DistanceComparatorSource(int x, int y) {
this.x = x;
this.y = y;
}
// 返回ScoreDocComparator 用来实现排序功能
public ScoreDocComparator newComparator(IndexReader reader, String fieldname) throws IOException {
return new DistanceScoreDocLookupComparator(reader, fieldname, x, y);
}
//DistanceScoreDocLookupComparator 实现了ScoreDocComparator 用来排序
private static class DistanceScoreDocLookupComparator implements ScoreDocComparator {
private float[] distances; // 保存每个餐馆到指定点的距离
// 构造函数 , 构造函数在这里几乎完成所有的准备工作.
public DistanceScoreDocLookupComparator(IndexReader reader, String fieldname, int x, int y) throws IOException {
System.out.println("fieldName2="+fieldname);
final TermEnum enumerator = reader.terms(new Term(fieldname, ""));
System.out.println("maxDoc="+reader.maxDoc());
distances = new float[reader.maxDoc()]; // 初始化distances
if (distances.length > 0) {
TermDocs termDocs = reader.termDocs();
try {
if (enumerator.term() == null) {
throw new RuntimeException("no terms in field " + fieldname);
}
int i = 0,j = 0;
do {
System.out.println("in do-while :" + i ++);
Term term = enumerator.term(); // 取出每一个Term
if (term.field() != fieldname) // 与给定的域不符合则比较下一个
break;
//Sets this to the data for the current term in a TermEnum.
//This may be optimized in some implementations.
termDocs.seek(enumerator); //参考TermDocs Doc
while (termDocs.next()) {
System.out.println(" in while :" + j ++);
System.out.println(" in while ,Term :" + term.toString());
String[] xy = term.text().split(","); // 去处x y
int deltax = Integer.parseInt(xy[0]) - x;
int deltay = Integer.parseInt(xy[1]) - y;
// 计算距离
distances[termDocs.doc()] = (float) Math.sqrt(deltax deltax + deltay deltay);
}
}
while (enumerator.next());
} finally {
termDocs.close();
}
}
}
//有上面的构造函数的准备 这里就比较简单了
public int pare(ScoreDoc i, ScoreDoc j) {
if (distances[i.doc] < distances[j.doc])
return -1;
if (distances[i.doc] > distances[j.doc])
return 1;
return 0;
}
// 返回距离
public Comparable sortValue(ScoreDoc i) {
return new Float(distances[i.doc]);
}
//指定SortType
public int sortType() {
return SortField.FLOAT;
}
}
public String toString() {
return "Distance from (" + x + "," + y + ")";
}
}
package .nikee.lucene;
import java.io.IOException;
import .apache.lucene.index.IndexReader;
import .apache.lucene.index.Term;
import .apache.lucene.index.TermDocs;
import .apache.lucene.index.TermEnum;
import .apache.lucene.search.ScoreDoc;
import .apache.lucene.search.ScoreDocComparator;
import .apache.lucene.search.SortComparatorSource;
import .apache.lucene.search.SortField;
//实现了搜索距你最近的餐馆的名字. 餐馆坐标用字符串"x,y"来存储
//DistanceComparatorSource 实现了SortComparatorSource接口
public class DistanceComparatorSource implements SortComparatorSource {
private static final long serialVersionUID = 1L;
// x y 用来保存 坐标位置
private int x;
private int y;
public DistanceComparatorSource(int x, int y) {
this.x = x;
this.y = y;
}
// 返回ScoreDocComparator 用来实现排序功能
public ScoreDocComparator newComparator(IndexReader reader, String fieldname) throws IOException {
return new DistanceScoreDocLookupComparator(reader, fieldname, x, y);
}
//DistanceScoreDocLookupComparator 实现了ScoreDocComparator 用来排序
private static class DistanceScoreDocLookupComparator implements ScoreDocComparator {
private float[] distances; // 保存每个餐馆到指定点的距离
// 构造函数 , 构造函数在这里几乎完成所有的准备工作.
public DistanceScoreDocLookupComparator(IndexReader reader, String fieldname, int x, int y) throws IOException {
System.out.println("fieldName2="+fieldname);
final TermEnum enumerator = reader.terms(new Term(fieldname, ""));
System.out.println("maxDoc="+reader.maxDoc());
distances = new float[reader.maxDoc()]; // 初始化distances
if (distances.length > 0) {
TermDocs termDocs = reader.termDocs();
try {
if (enumerator.term() == null) {
throw new RuntimeException("no terms in field " + fieldname);
}
int i = 0,j = 0;
do {
System.out.println("in do-while :" + i ++);
Term term = enumerator.term(); // 取出每一个Term
if (term.field() != fieldname) // 与给定的域不符合则比较下一个
break;
//Sets this to the data for the current term in a TermEnum.
//This may be optimized in some implementations.
termDocs.seek(enumerator); //参考TermDocs Doc
while (termDocs.next()) {
System.out.println(" in while :" + j ++);
System.out.println(" in while ,Term :" + term.toString());
String[] xy = term.text().split(","); // 去处x y
int deltax = Integer.parseInt(xy[0]) - x;
int deltay = Integer.parseInt(xy[1]) - y;
// 计算距离
distances[termDocs.doc()] = (float) Math.sqrt(deltax deltax + deltay deltay);
}
}
while (enumerator.next());
} finally {
termDocs.close();
}
}
}
//有上面的构造函数的准备 这里就比较简单了
public int pare(ScoreDoc i, ScoreDoc j) {
if (distances[i.doc] < distances[j.doc])
return -1;
if (distances[i.doc] > distances[j.doc])
return 1;
return 0;
}
// 返回距离
public Comparable sortValue(ScoreDoc i) {
return new Float(distances[i.doc]);
}
//指定SortType
public int sortType() {
return SortField.FLOAT;
}
}
public String toString() {
return "Distance from (" + x + "," + y + ")";
}
}
这是一个实现了上面两个接口的两个类, 里面带有详细注释, 可以看出 自定义排序并不是很难的. 该实现能否正确实现,我们来看看测试代码能否通过吧.
Java代码
package .nikee.lucene.test;
import java.io.IOException;
import junit.framework.TestCase;
import .apache.lucene.analysis.WhitespaceAnalyzer;
import .apache.lucene.document.Document;
import .apache.lucene.document.Field;
import .apache.lucene.index.IndexWriter;
import .apache.lucene.index.Term;
import .apache.lucene.search.FieldDoc;
import .apache.lucene.search.Hits;
import .apache.lucene.search.IndexSearcher;
import .apache.lucene.search.Query;
import .apache.lucene.search.ScoreDoc;
import .apache.lucene.search.Sort;
import .apache.lucene.search.SortField;
import .apache.lucene.search.TermQuery;
import .apache.lucene.search.TopFieldDocs;
import .apache.lucene.store.RAMDirectory;
import .nikee.lucene.DistanceComparatorSource;
public class DistanceComparatorSourceTest extends TestCase {
private RAMDirectory directory;
private IndexSearcher searcher;
private Query query;
//建立测试环境
protected void setUp() throws Exception {
directory = new RAMDirectory();
IndexWriter writer = new IndexWriter(directory, new WhitespaceAnalyzer(), true);
addPoint(writer, "El Charro", "restaurant", 1, 2);
addPoint(writer, "Cafe Poca Cosa", "restaurant", 5, 9);
addPoint(writer, "Los Betos", "restaurant", 9, 6);
addPoint(writer, "Nico's Taco Shop", "restaurant", 3, 8);
writer.close();
searcher = new IndexSearcher(directory);
query = new TermQuery(new Term("type", "restaurant"));
}
private void addPoint(IndexWriter writer, String name, String type, int x, int y) throws IOException {
Document doc = new Document();
doc.add(new Field("name", name, Field.Store.YES, Field.Index.TOKENIZED));
doc.add(new Field("type", type, Field.Store.YES, Field.Index.TOKENIZED));
doc.add(new Field("location", x + "," + y, Field.Store.YES, Field.Index.UN_TOKENIZED));
writer.addDocument(doc);
}
public void testNearestRestaurantToHome() throws Exception {
//使用DistanceComparatorSource来构造一个SortField
Sort sort = new Sort(new SortField("location", new DistanceComparatorSource(0, 0)));
Hits hits = searcher.search(query, sort); // 搜索
//测试
assertEquals("closest", "El Charro", hits.doc(0).get("name"));
assertEquals("furthest", "Los Betos", hits.doc(3).get("name"));
}
public void testNeareastRestaurantToWork() throws Exception {
Sort sort = new Sort(new SortField("location", new DistanceComparatorSource(10, 10))); // 工作的坐标 10,10
//上面的测试实现了自定义排序,并不能访问自定义排序的更详细信息,利用
//TopFieldDocs 可以进一步访问相关信息
TopFieldDocs docs = searcher.search(query, null, 3, sort);
assertEquals(4, docs.totalHits);
assertEquals(3, docs.scoreDocs.length);
//取得FieldDoc 利用FieldDoc可以取得关于排序的更详细信息 请查看FieldDoc Doc
FieldDoc fieldDoc = (FieldDoc) docs.scoreDocs[0];
assertEquals("(10,10) -> (9,6) = sqrt(17)", new Float(Math.sqrt(17)), fieldDoc.fields[0]);
Document document = searcher.doc(fieldDoc.doc);
assertEquals("Los Betos", document.get("name"));
dumpDocs(sort, docs); // 显示相关信息
}
// 显示有关排序的信息
private void dumpDocs(Sort sort, TopFieldDocs docs) throws IOException {
System.out.println("Sorted by: " + sort);
ScoreDoc[] scoreDocs = docs.scoreDocs;
for (int i = 0; i < scoreDocs.length; i++) {
FieldDoc fieldDoc = (FieldDoc) scoreDocs[i];
Float distance = (Float) fieldDoc.fields[0];
Document doc = searcher.doc(fieldDoc.doc);
System.out.println(" " + doc.get("name") + " @ (" + doc.get("location") + ") -> " + distance);
}
}
}
SortComparatorSource接口的功能是返回一个用来排序ScoreDocs的parator(Expert: returns a parator for sorting ScoreDocs).该接口只定义了一个方法.如下:
Java代码
/
Creates a parator for the field in the given index.
@param reader - Index to create parator for.
@param fieldname - Field to create parator for.
@return Comparator of ScoreDoc objects.
@throws IOException - If an error ours reading the index.
/
public ScoreDocComparator newComparator(IndexReader reader,String fieldname) throws IOException
view plaincopy to clipboardprint?
/
Creates a parator for the field in the given index.
@param reader - Index to create parator for.
@param fieldname - Field to create parator for.
@return Comparator of ScoreDoc objects.
@throws IOException - If an error ours reading the index.
/
public ScoreDocComparator newComparator(IndexReader reader,String fieldname) throws IOException
/
Creates a parator for the field in the given index.
@param reader - Index to create parator for.
@param fieldname - Field to create parator for.
@return Comparator of ScoreDoc objects.
@throws IOException - If an error ours reading the index.
/
public ScoreDocComparator newComparator(IndexReader reader,String fieldname) throws IOException
该方法只是创造一个ScoreDocComparator 实例用来实现排序.所以我们还要实现ScoreDocComparator 接口.来看看ScoreDocComparator 接口.功能是比较来两个ScoreDoc 对象来排序(Compares two ScoreDoc objects for sorting) 里面定义了两个Lucene实现的静态实例.如下:
Java代码
//Special parator for sorting hits aording to puted relevance (document score).
public static final ScoreDocComparator RELEVANCE;
//Special parator for sorting hits aording to index order (document number).
public static final ScoreDocComparator INDEXORDER;
view plaincopy to clipboardprint?
//Special parator for sorting hits aording to puted relevance (document score).
public static final ScoreDocComparator RELEVANCE;
//Special parator for sorting hits aording to index order (document number).
public static final ScoreDocComparator INDEXORDER;
//Special parator for sorting hits aording to puted relevance (document score).
public static final ScoreDocComparator RELEVANCE;
//Special parator for sorting hits aording to index order (document number).
public static final ScoreDocComparator INDEXORDER;
有3个方法与排序相关,需要我们实现 分别如下:
Java代码
/
Compares two ScoreDoc objects and returns a result indicating their sort order.
@param i First ScoreDoc
@param j Second ScoreDoc
@return -1 if i should e before j;
1 if i should e after j;
0 if they are equal
/
public int pare(ScoreDoc i,ScoreDoc j);
/
Returns the value used to sort the given document. The object returned must implement the java.io.Serializable interface. This is used by multisearchers to determine how to collate results from their searchers.
@param i Document
@return Serializable object
/
public Comparable sortValue(ScoreDoc i);
/
Returns the type of sort. Should return SortField.SCORE, SortField.DOC, SortField.STRING, SortField.INTEGER, SortField.FLOAT or SortField.CUSTOM. It is not valid to return SortField.AUTO. This is used by multisearchers to determine how to collate results from their searchers.
@return One of the constants in SortField.
/
public int sortType();
view plaincopy to clipboardprint?
/
Compares two ScoreDoc objects and returns a result indicating their sort order.
@param i First ScoreDoc
@param j Second ScoreDoc
@return -1 if i should e before j;
1 if i should e after j;
0 if they are equal
/
public int pare(ScoreDoc i,ScoreDoc j);
/
Returns the value used to sort the given document. The object returned must implement the java.io.Serializable interface. This is used by multisearchers to determine how to collate results from their searchers.
@param i Document
@return Serializable object
/
public Comparable sortValue(ScoreDoc i);
/
Returns the type of sort. Should return SortField.SCORE, SortField.DOC, SortField.STRING, SortField.INTEGER, SortField.FLOAT or SortField.CUSTOM. It is not valid to return SortField.AUTO. This is used by multisearchers to determine how to collate results from their searchers.
@return One of the constants in SortField.
/
public int sortType();
/
Compares two ScoreDoc objects and returns a result indicating their sort order.
@param i First ScoreDoc
@param j Second ScoreDoc
@return -1 if i should e before j;
1 if i should e after j;
0 if they are equal
/
public int pare(ScoreDoc i,ScoreDoc j);
/
Returns the value used to sort the given document. The object returned must implement the java.io.Serializable interface. This is used by multisearchers to determine how to collate results from their searchers.
@param i Document
@return Serializable object
/
public Comparable sortValue(ScoreDoc i);
/
Returns the type of sort. Should return SortField.SCORE, SortField.DOC, SortField.STRING, SortField.INTEGER, SortField.FLOAT or SortField.CUSTOM. It is not valid to return SortField.AUTO. This is used by multisearchers to determine how to collate results from their searchers.
@return One of the constants in SortField.
/
public int sortType();
看个例子吧!
该例子为Lucene in Action中的一个实现,用来搜索距你最近的餐馆的名字. 餐馆坐标用字符串"x,y"来存储.
Java代码
package .nikee.lucene;
import java.io.IOException;
import .apache.lucene.index.IndexReader;
import .apache.lucene.index.Term;
import .apache.lucene.index.TermDocs;
import .apache.lucene.index.TermEnum;
import .apache.lucene.search.ScoreDoc;
import .apache.lucene.search.ScoreDocComparator;
import .apache.lucene.search.SortComparatorSource;
import .apache.lucene.search.SortField;
//实现了搜索距你最近的餐馆的名字. 餐馆坐标用字符串"x,y"来存储
//DistanceComparatorSource 实现了SortComparatorSource接口
public class DistanceComparatorSource implements SortComparatorSource {
private static final long serialVersionUID = 1L;
// x y 用来保存 坐标位置
private int x;
private int y;
public DistanceComparatorSource(int x, int y) {
this.x = x;
this.y = y;
}
// 返回ScoreDocComparator 用来实现排序功能
public ScoreDocComparator newComparator(IndexReader reader, String fieldname) throws IOException {
return new DistanceScoreDocLookupComparator(reader, fieldname, x, y);
}
//DistanceScoreDocLookupComparator 实现了ScoreDocComparator 用来排序
private static class DistanceScoreDocLookupComparator implements ScoreDocComparator {
private float[] distances; // 保存每个餐馆到指定点的距离
// 构造函数 , 构造函数在这里几乎完成所有的准备工作.
public DistanceScoreDocLookupComparator(IndexReader reader, String fieldname, int x, int y) throws IOException {
System.out.println("fieldName2="+fieldname);
final TermEnum enumerator = reader.terms(new Term(fieldname, ""));
System.out.println("maxDoc="+reader.maxDoc());
distances = new float[reader.maxDoc()]; // 初始化distances
if (distances.length > 0) {
TermDocs termDocs = reader.termDocs();
try {
if (enumerator.term() == null) {
throw new RuntimeException("no terms in field " + fieldname);
}
int i = 0,j = 0;
do {
System.out.println("in do-while :" + i ++);
Term term = enumerator.term(); // 取出每一个Term
if (term.field() != fieldname) // 与给定的域不符合则比较下一个
break;
//Sets this to the data for the current term in a TermEnum.
//This may be optimized in some implementations.
termDocs.seek(enumerator); //参考TermDocs Doc
while (termDocs.next()) {
System.out.println(" in while :" + j ++);
System.out.println(" in while ,Term :" + term.toString());
String[] xy = term.text().split(","); // 去处x y
int deltax = Integer.parseInt(xy[0]) - x;
int deltay = Integer.parseInt(xy[1]) - y;
// 计算距离
distances[termDocs.doc()] = (float) Math.sqrt(deltax deltax + deltay deltay);
}
}
while (enumerator.next());
} finally {
termDocs.close();
}
}
}
//有上面的构造函数的准备 这里就比较简单了
public int pare(ScoreDoc i, ScoreDoc j) {
if (distances[i.doc] < distances[j.doc])
return -1;
if (distances[i.doc] > distances[j.doc])
return 1;
return 0;
}
// 返回距离
public Comparable sortValue(ScoreDoc i) {
return new Float(distances[i.doc]);
}
//指定SortType
public int sortType() {
return SortField.FLOAT;
}
}
public String toString() {
return "Distance from (" + x + "," + y + ")";
}
}
view plaincopy to clipboardprint?
package .nikee.lucene;
import java.io.IOException;
import .apache.lucene.index.IndexReader;
import .apache.lucene.index.Term;
import .apache.lucene.index.TermDocs;
import .apache.lucene.index.TermEnum;
import .apache.lucene.search.ScoreDoc;
import .apache.lucene.search.ScoreDocComparator;
import .apache.lucene.search.SortComparatorSource;
import .apache.lucene.search.SortField;
//实现了搜索距你最近的餐馆的名字. 餐馆坐标用字符串"x,y"来存储
//DistanceComparatorSource 实现了SortComparatorSource接口
public class DistanceComparatorSource implements SortComparatorSource {
private static final long serialVersionUID = 1L;
// x y 用来保存 坐标位置
private int x;
private int y;
public DistanceComparatorSource(int x, int y) {
this.x = x;
this.y = y;
}
// 返回ScoreDocComparator 用来实现排序功能
public ScoreDocComparator newComparator(IndexReader reader, String fieldname) throws IOException {
return new DistanceScoreDocLookupComparator(reader, fieldname, x, y);
}
//DistanceScoreDocLookupComparator 实现了ScoreDocComparator 用来排序
private static class DistanceScoreDocLookupComparator implements ScoreDocComparator {
private float[] distances; // 保存每个餐馆到指定点的距离
// 构造函数 , 构造函数在这里几乎完成所有的准备工作.
public DistanceScoreDocLookupComparator(IndexReader reader, String fieldname, int x, int y) throws IOException {
System.out.println("fieldName2="+fieldname);
final TermEnum enumerator = reader.terms(new Term(fieldname, ""));
System.out.println("maxDoc="+reader.maxDoc());
distances = new float[reader.maxDoc()]; // 初始化distances
if (distances.length > 0) {
TermDocs termDocs = reader.termDocs();
try {
if (enumerator.term() == null) {
throw new RuntimeException("no terms in field " + fieldname);
}
int i = 0,j = 0;
do {
System.out.println("in do-while :" + i ++);
Term term = enumerator.term(); // 取出每一个Term
if (term.field() != fieldname) // 与给定的域不符合则比较下一个
break;
//Sets this to the data for the current term in a TermEnum.
//This may be optimized in some implementations.
termDocs.seek(enumerator); //参考TermDocs Doc
while (termDocs.next()) {
System.out.println(" in while :" + j ++);
System.out.println(" in while ,Term :" + term.toString());
String[] xy = term.text().split(","); // 去处x y
int deltax = Integer.parseInt(xy[0]) - x;
int deltay = Integer.parseInt(xy[1]) - y;
// 计算距离
distances[termDocs.doc()] = (float) Math.sqrt(deltax deltax + deltay deltay);
}
}
while (enumerator.next());
} finally {
termDocs.close();
}
}
}
//有上面的构造函数的准备 这里就比较简单了
public int pare(ScoreDoc i, ScoreDoc j) {
if (distances[i.doc] < distances[j.doc])
return -1;
if (distances[i.doc] > distances[j.doc])
return 1;
return 0;
}
// 返回距离
public Comparable sortValue(ScoreDoc i) {
return new Float(distances[i.doc]);
}
//指定SortType
public int sortType() {
return SortField.FLOAT;
}
}
public String toString() {
return "Distance from (" + x + "," + y + ")";
}
}
package .nikee.lucene;
import java.io.IOException;
import .apache.lucene.index.IndexReader;
import .apache.lucene.index.Term;
import .apache.lucene.index.TermDocs;
import .apache.lucene.index.TermEnum;
import .apache.lucene.search.ScoreDoc;
import .apache.lucene.search.ScoreDocComparator;
import .apache.lucene.search.SortComparatorSource;
import .apache.lucene.search.SortField;
//实现了搜索距你最近的餐馆的名字. 餐馆坐标用字符串"x,y"来存储
//DistanceComparatorSource 实现了SortComparatorSource接口
public class DistanceComparatorSource implements SortComparatorSource {
private static final long serialVersionUID = 1L;
// x y 用来保存 坐标位置
private int x;
private int y;
public DistanceComparatorSource(int x, int y) {
this.x = x;
this.y = y;
}
// 返回ScoreDocComparator 用来实现排序功能
public ScoreDocComparator newComparator(IndexReader reader, String fieldname) throws IOException {
return new DistanceScoreDocLookupComparator(reader, fieldname, x, y);
}
//DistanceScoreDocLookupComparator 实现了ScoreDocComparator 用来排序
private static class DistanceScoreDocLookupComparator implements ScoreDocComparator {
private float[] distances; // 保存每个餐馆到指定点的距离
// 构造函数 , 构造函数在这里几乎完成所有的准备工作.
public DistanceScoreDocLookupComparator(IndexReader reader, String fieldname, int x, int y) throws IOException {
System.out.println("fieldName2="+fieldname);
final TermEnum enumerator = reader.terms(new Term(fieldname, ""));
System.out.println("maxDoc="+reader.maxDoc());
distances = new float[reader.maxDoc()]; // 初始化distances
if (distances.length > 0) {
TermDocs termDocs = reader.termDocs();
try {
if (enumerator.term() == null) {
throw new RuntimeException("no terms in field " + fieldname);
}
int i = 0,j = 0;
do {
System.out.println("in do-while :" + i ++);
Term term = enumerator.term(); // 取出每一个Term
if (term.field() != fieldname) // 与给定的域不符合则比较下一个
break;
//Sets this to the data for the current term in a TermEnum.
//This may be optimized in some implementations.
termDocs.seek(enumerator); //参考TermDocs Doc
while (termDocs.next()) {
System.out.println(" in while :" + j ++);
System.out.println(" in while ,Term :" + term.toString());
String[] xy = term.text().split(","); // 去处x y
int deltax = Integer.parseInt(xy[0]) - x;
int deltay = Integer.parseInt(xy[1]) - y;
// 计算距离
distances[termDocs.doc()] = (float) Math.sqrt(deltax deltax + deltay deltay);
}
}
while (enumerator.next());
} finally {
termDocs.close();
}
}
}
//有上面的构造函数的准备 这里就比较简单了
public int pare(ScoreDoc i, ScoreDoc j) {
if (distances[i.doc] < distances[j.doc])
return -1;
if (distances[i.doc] > distances[j.doc])
return 1;
return 0;
}
// 返回距离
public Comparable sortValue(ScoreDoc i) {
return new Float(distances[i.doc]);
}
//指定SortType
public int sortType() {
return SortField.FLOAT;
}
}
public String toString() {
return "Distance from (" + x + "," + y + ")";
}
}
这是一个实现了上面两个接口的两个类, 里面带有详细注释, 可以看出 自定义排序并不是很难的. 该实现能否正确实现,我们来看看测试代码能否通过吧.
Java代码
package .nikee.lucene.test;
import java.io.IOException;
import junit.framework.TestCase;
import .apache.lucene.analysis.WhitespaceAnalyzer;
import .apache.lucene.document.Document;
import .apache.lucene.document.Field;
import .apache.lucene.index.IndexWriter;
import .apache.lucene.index.Term;
import .apache.lucene.search.FieldDoc;
import .apache.lucene.search.Hits;
import .apache.lucene.search.IndexSearcher;
import .apache.lucene.search.Query;
import .apache.lucene.search.ScoreDoc;
import .apache.lucene.search.Sort;
import .apache.lucene.search.SortField;
import .apache.lucene.search.TermQuery;
import .apache.lucene.search.TopFieldDocs;
import .apache.lucene.store.RAMDirectory;
import .nikee.lucene.DistanceComparatorSource;
public class DistanceComparatorSourceTest extends TestCase {
private RAMDirectory directory;
private IndexSearcher searcher;
private Query query;
//建立测试环境
protected void setUp() throws Exception {
directory = new RAMDirectory();
IndexWriter writer = new IndexWriter(directory, new WhitespaceAnalyzer(), true);
addPoint(writer, "El Charro", "restaurant", 1, 2);
addPoint(writer, "Cafe Poca Cosa", "restaurant", 5, 9);
addPoint(writer, "Los Betos", "restaurant", 9, 6);
addPoint(writer, "Nico's Taco Shop", "restaurant", 3, 8);
writer.close();
searcher = new IndexSearcher(directory);
query = new TermQuery(new Term("type", "restaurant"));
}
private void addPoint(IndexWriter writer, String name, String type, int x, int y) throws IOException {
Document doc = new Document();
doc.add(new Field("name", name, Field.Store.YES, Field.Index.TOKENIZED));
doc.add(new Field("type", type, Field.Store.YES, Field.Index.TOKENIZED));
doc.add(new Field("location", x + "," + y, Field.Store.YES, Field.Index.UN_TOKENIZED));
writer.addDocument(doc);
}
public void testNearestRestaurantToHome() throws Exception {
//使用DistanceComparatorSource来构造一个SortField
Sort sort = new Sort(new SortField("location", new DistanceComparatorSource(0, 0)));
Hits hits = searcher.search(query, sort); // 搜索
//测试
assertEquals("closest", "El Charro", hits.doc(0).get("name"));
assertEquals("furthest", "Los Betos", hits.doc(3).get("name"));
}
public void testNeareastRestaurantToWork() throws Exception {
Sort sort = new Sort(new SortField("location", new DistanceComparatorSource(10, 10))); // 工作的坐标 10,10
//上面的测试实现了自定义排序,并不能访问自定义排序的更详细信息,利用
//TopFieldDocs 可以进一步访问相关信息
TopFieldDocs docs = searcher.search(query, null, 3, sort);
assertEquals(4, docs.totalHits);
assertEquals(3, docs.scoreDocs.length);
//取得FieldDoc 利用FieldDoc可以取得关于排序的更详细信息 请查看FieldDoc Doc
FieldDoc fieldDoc = (FieldDoc) docs.scoreDocs[0];
assertEquals("(10,10) -> (9,6) = sqrt(17)", new Float(Math.sqrt(17)), fieldDoc.fields[0]);
Document document = searcher.doc(fieldDoc.doc);
assertEquals("Los Betos", document.get("name"));
dumpDocs(sort, docs); // 显示相关信息
}
// 显示有关排序的信息
private void dumpDocs(Sort sort, TopFieldDocs docs) throws IOException {
System.out.println("Sorted by: " + sort);
ScoreDoc[] scoreDocs = docs.scoreDocs;
for (int i = 0; i < scoreDocs.length; i++) {
FieldDoc fieldDoc = (FieldDoc) scoreDocs[i];
Float distance = (Float) fieldDoc.fields[0];
Document doc = searcher.doc(fieldDoc.doc);
System.out.println(" " + doc.get("name") + " @ (" + doc.get("location") + ") -> " + distance);
}
}
}
编程语言
- 如何快速学会编程 如何快速学会ug编程
- 免费学编程的app 推荐12个免费学编程的好网站
- 电脑怎么编程:电脑怎么编程网咯游戏菜单图标
- 如何写代码新手教学 如何写代码新手教学手机
- 基础编程入门教程视频 基础编程入门教程视频华
- 编程演示:编程演示浦丰投针过程
- 乐高编程加盟 乐高积木编程加盟
- 跟我学plc编程 plc编程自学入门视频教程
- ug编程成航林总 ug编程实战视频
- 孩子学编程的好处和坏处
- 初学者学编程该从哪里开始 新手学编程从哪里入
- 慢走丝编程 慢走丝编程难学吗
- 国内十强少儿编程机构 中国少儿编程机构十强有
- 成人计算机速成培训班 成人计算机速成培训班办
- 孩子学编程网上课程哪家好 儿童学编程比较好的
- 代码编程教学入门软件 代码编程教程