Google App Engine: Full Text Search support in the pre-release version

The lack of full text search in Google App Engine is what kept me from adopting App Engine in the first place, and others may see it as a limitation too. Then something interesting happened: two Googlers presented an upcoming App Engine feature, Full Text Search, at Google I/O. They presented exactly what I needed most. As one user put it in a reported issue, they are Google, so they have to provide search capabilities. Since then I haven't heard a word from them about Full Text Search; maybe they are planning a big launch for this big feature.

However, two or three days ago they released a pre-release version of their open-source App Engine SDK, without mentioning anywhere in the changelog or release notes that it contains search capabilities. Yesterday I explored the Python SDK and found new search libraries and APIs in the ext folder (google/appengine/ext/search). I then generated documentation for the Python SDK with epydoc and found this documentation for the search package (google.appengine.ext.search):
Full text indexing and search, implemented in pure python.
Defines a SearchableModel subclass of db.Model that supports full text indexing and search, based on the datastore's existing indexes.

Don't expect too much. First, there's no ranking, which is a killer drawback. There's also no exact phrase match, substring match, boolean operators, stemming, or other common full text search features. Finally, support for stop words (common words that are not indexed) is currently limited to English.
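To make these limitations concrete, here is a minimal pure-Python sketch of the kind of keyword extraction such an index relies on. It is not the SDK's code: the tiny stop-word list, the 3-character minimum word length, and the name `extract_keywords` are assumptions for illustration.

```python
import re

# Hypothetical, tiny English stop-word list (not the SDK's actual list).
STOP_WORDS = frozenset(["a", "an", "and", "the", "of", "to", "in", "is", "for"])

# Assumed minimum keyword length; shorter tokens are not indexed.
MIN_LENGTH = 3


def extract_keywords(text):
    """Return the set of indexable keywords in `text`.

    Lowercases, replaces punctuation with spaces, then drops stop words and
    very short tokens. There is no stemming: 'running' and 'run' stay distinct.
    """
    words = re.sub(r"[^\w\s]", " ", text.lower()).split()
    return {w for w in words if w not in STOP_WORDS and len(w) >= MIN_LENGTH}


print(sorted(extract_keywords("The Invincible is a novel by Stanislaw Lem.")))
# ['invincible', 'lem', 'novel', 'stanislaw']
```

Because matching is on exact lowercase keywords, a query for "novels" would not match this text, which is the missing stemming the docstring warns about.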

To be indexed, entities must be created and saved as SearchableModel instances, e.g.:


    from google.appengine.ext import db
    from google.appengine.ext import search

    class Article(search.SearchableModel):
        text = db.TextProperty()
        ...

    article = Article(text=...)
    article.save()

To search the full text index, use the SearchableModel.all() method to get an instance of SearchableModel.Query, which subclasses db.Query. Use its search() method to provide a search query, in addition to any other filters or sort orders, e.g.:


    query = Article.all().search('a search query').filter(...).order(...)
    for result in query:
        ...
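As a rough mental model only (an assumption, not the SDK's implementation), each search keyword can be treated as one more equality condition on the list-valued index property, so an entity matches only if its index contains every keyword, and any other filters and sort orders apply on top. The `run_query` helper below is hypothetical and works on plain dicts instead of datastore entities.

```python
def run_query(entities, search_words, filters=None, order_by=None, descending=False):
    """Simulate Article.all().search(...).filter(...).order(...) over dicts.

    search_words: lowercase keywords; all must appear in the entity's index.
    filters: dict of property -> required value (equality filters only).
    order_by: property to sort on. Results are NOT ranked by relevance.
    """
    index_prop = "__searchable_text_index"
    filters = filters or {}
    matches = [
        e for e in entities
        if set(search_words) <= set(e[index_prop])
        and all(e.get(k) == v for k, v in filters.items())
    ]
    if order_by is not None:
        matches.sort(key=lambda e: e[order_by], reverse=descending)
    return matches


articles = [
    {"title": "Solaris", "year": 1961, "__searchable_text_index": ["solaris", "lem", "ocean"]},
    {"title": "Fiasco", "year": 1986, "__searchable_text_index": ["fiasco", "lem", "quinta"]},
    {"title": "Dune", "year": 1965, "__searchable_text_index": ["dune", "herbert", "arrakis"]},
]

for a in run_query(articles, ["lem"], order_by="year", descending=True):
    print(a["title"])  # Fiasco, then Solaris
```

The result order comes only from the explicit sort key, which mirrors the "no ranking" caveat: without order(), nothing puts the best match first.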

The full text index is stored in a property named __searchable_text_index.

Specifying multiple indexes and properties to index
---------------------------------------------------

By default, one index is created with all string properties. You can define multiple indexes and specify which properties should be indexed for each by overriding the SearchableProperties() class method of your search.SearchableModel subclass, for example:


    class Article(search.SearchableModel):
        @classmethod
        def SearchableProperties(cls):
            return [['book', 'author'], ['book']]

In this example, two indexes will be maintained: one that includes the 'book' and 'author' properties, and another for the 'book' property only. They will be stored in properties named __searchable_text_index_book_author and __searchable_text_index_book respectively. Note that the index that includes all properties will not be created unless added explicitly like this:


    @classmethod
    def SearchableProperties(cls):
        return [['book', 'author'], ['book'], search.ALL_PROPERTIES]

The default return value of SearchableProperties() is [search.ALL_PROPERTIES] (one index, all properties).
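The naming scheme above can be sketched in plain Python. Everything here is hypothetical: `ALL` stands in for search.ALL_PROPERTIES (whose actual value is not shown in the docs), words are split on whitespace only for brevity, and the entity is a dict of properties rather than a model instance.

```python
ALL = None  # hypothetical stand-in for search.ALL_PROPERTIES
BASE = "__searchable_text_index"


def index_property_name(props):
    """Name of the property that stores the index for the `props` definition."""
    if props is ALL:
        return BASE
    return BASE + "_" + "_".join(props)


def build_indexes(entity, searchable_properties=(ALL,)):
    """Return {index property name: sorted keywords} for each index definition."""
    result = {}
    for props in searchable_properties:
        # ALL means "every string property"; otherwise only the listed ones.
        names = [k for k, v in entity.items() if isinstance(v, str)] if props is ALL else props
        words = set()
        for name in names:
            words.update(entity.get(name, "").lower().split())
        result[index_property_name(props)] = sorted(words)
    return result


article = {"book": "Solaris", "author": "Stanislaw Lem", "year": 1961}
for name, words in build_indexes(article, [["book", "author"], ["book"]]).items():
    print(name, words)
```

Calling `build_indexes(article)` with no definitions falls back to the single all-properties index, matching the default described above; the non-string `year` property is skipped.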

To search using a custom-defined index, pass its definition in the 'properties' parameter of search():

    Article.all().search('Lem', properties=['book', 'author'])

Note that the order of properties in the list matters: it must match the order used in SearchableProperties().
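One way to see why order matters, assuming the property list is turned into a stored property name by joining the names in order: a reversed list points at a property that was never written. The `resolve_index` helper is hypothetical, ignores ALL_PROPERTIES, and raises an error only to make the mismatch visible; that is a choice of this sketch, not necessarily what the SDK does.

```python
BASE = "__searchable_text_index"


def resolve_index(defined, requested):
    """Return the stored property name for `requested`, or raise if undefined.

    defined: index definitions as returned by SearchableProperties().
    requested: the list passed as search(..., properties=requested).
    """
    name = BASE + "_" + "_".join(requested)
    known = {BASE + "_" + "_".join(props) for props in defined}
    if name not in known:
        raise ValueError("no index stored in %s" % name)
    return name


defined = [["book", "author"], ["book"]]
print(resolve_index(defined, ["book", "author"]))  # __searchable_text_index_book_author
try:
    resolve_index(defined, ["author", "book"])
except ValueError as err:
    print(err)  # no index stored in __searchable_text_index_author_book
```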

Adding indexes to index.yaml
-----------------------------

In general, if you just want to provide full text search, you *don't* need to add any extra indexes to your index.yaml. However, if you want to use search() in a query *in addition to* an ancestor, filter, or sort order, you'll need to create an index in index.yaml with the __searchable_text_index property. For example:

    - kind: Article
      properties:
      - name: __searchable_text_index
      - name: date
        direction: desc
      ...

Similarly, if you created a custom index (see above), use the name of the property it's stored in, e.g. __searchable_text_index_book_author.
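If you have several kinds or custom indexes, a small script can print index.yaml entries in the layout shown above. The `index_yaml_entry` helper is hypothetical: it only formats text and does not validate anything against the SDK.

```python
def index_yaml_entry(kind, index_property, sort_property=None, descending=False):
    """Format an index.yaml entry pairing a searchable index with a sort order."""
    lines = [
        "- kind: %s" % kind,
        "  properties:",
        "  - name: %s" % index_property,
    ]
    if sort_property is not None:
        lines.append("  - name: %s" % sort_property)
        if descending:
            lines.append("    direction: desc")
    return "\n".join(lines)


# Entry for a custom index combined with a descending sort on 'date'.
print(index_yaml_entry("Article", "__searchable_text_index_book_author", "date", descending=True))
```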

Note that using SearchableModel will noticeably increase the latency of save() operations, since it writes an index row for each indexable word. This also means that the latency of save() will increase roughly with the size of the properties in a given entity. Caveat hacker!
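To get a feel for that relationship, this sketch counts one index row per distinct indexable word, assuming each word is stored once per entity. It measures nothing about real datastore latency; the whitespace tokenizer, the small stop-word list, and the name `index_rows` are assumptions.

```python
def index_rows(text, stop_words=frozenset(["the", "a", "an", "of", "and"])):
    """Count distinct indexable words, i.e. index rows written on save()."""
    return len({w for w in text.lower().split() if w not in stop_words})


short = "Solaris by Stanislaw Lem"
longer = short + " " + " ".join("word%d" % i for i in range(100))
print(index_rows(short), index_rows(longer))  # 4 104
```

Adding 100 new distinct words adds 100 index rows, so the write cost grows with the amount of distinct text rather than with the number of properties.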

I hope this helps others, and encourages them to consider adopting Google App Engine as a system capable of handling real-world problems.