Elasticsearch implementation in Django using Haystack

Haystack for the Django framework provides a powerful yet easy way to integrate common search backends such as Elasticsearch, Solr, and Whoosh. Its API closely mirrors Django’s own APIs, making it easy for developers to get started. Haystack can be set up with multiple search backends in the same app, enabling it to call any backend for indexing or querying.

Configuration

Install the django-haystack package using pip install django-haystack and add ‘haystack’ to INSTALLED_APPS in the settings file. At this point, we are ready to plug in a search backend of our choice. Assuming Elasticsearch is already installed, we will add it to the required HAYSTACK_CONNECTIONS setting.

    HAYSTACK_CONNECTIONS = {
        'default': {
            'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
            'URL': 'http://127.0.0.1:9200/',
            'INDEX_NAME': 'myindex',
        },
    }

Let’s dissect the above settings: ‘default’ is the name given to the backend. If multiple backends are used, one of them must be named default so that Haystack knows which backend to use when none is explicitly specified. You can use the ‘using’ clause to explicitly specify a backend:

sqs = SearchQuerySet().all().using('solr')

‘ENGINE’ specifies the dotted path to the engine class for the backend. In the above example, we are pointing to the default Elasticsearch engine provided in the Haystack package, which contains out-of-the-box Elasticsearch-specific settings for the most common uses. The default backend can be extended to build a custom one.
‘URL’, as the name suggests, is the server URL of the search backend. The URL can be parameterized for different environments.
‘INDEX_NAME’ is the name of the Elasticsearch index where all the indexed data is stored on disk.
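To illustrate the multi-backend setup mentioned earlier, here is a hypothetical two-connection configuration (the ‘solr’ name, URL, and core are illustrative assumptions, not part of the original example); the ‘using’ clause shown above would then select between them:

```python
# Hypothetical two-backend configuration: 'default' points at Elasticsearch,
# while a second connection named 'solr' (illustrative) points at a Solr core.
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        'URL': 'http://127.0.0.1:9200/',
        'INDEX_NAME': 'myindex',
    },
    'solr': {
        'ENGINE': 'haystack.backends.solr_backend.SolrEngine',
        'URL': 'http://127.0.0.1:8983/solr/mycore',  # example Solr core URL
    },
}
```

With this in place, SearchQuerySet().using('solr') would route queries to the second backend while everything else defaults to Elasticsearch.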

Handling Data

Haystack determines what data should be placed in the search index. To specify the models that need to be indexed we need to subclass indexes.SearchIndex and indexes.Indexable. Within the class we can specify the individual fields to be indexed.

        class UserIndex(indexes.SearchIndex, indexes.Indexable):
            text = indexes.CharField(document=True, use_template=True)
            name = indexes.CharField(model_attr='name')
            created_date = indexes.DateTimeField(model_attr='created_date')

            def get_model(self):
                return User

Every SearchIndex requires that exactly one field be marked with document=True. The field is named ‘text’ to follow Haystack’s naming convention; the name can be changed, but that is not recommended. document=True tells Haystack to use the ‘text’ field as the primary field to search within. Specifying ‘use_template=True’ enables us to use a data template, rather than a single field, to build the document that the search index will index.

To create a data template, we need to create a new template inside the templates directory (by Haystack convention, under search/indexes/<app_name>/user_text.txt) and place the following lines inside:

    {{ object.description }}
    {{ object.notes }}

description and notes are the fields that will be used as the primary fields to search within. The name and created_date fields specified in the search index can be used for filtering data while searching. The data can then be indexed using the ‘rebuild_index’ management command. The above example shows a simple, quick setup to get started; the search backend is highly customizable and can be adapted to many use cases.
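As a sketch of the indexing step (assuming a standard Django project layout with manage.py and the configuration above):

```
python manage.py rebuild_index   # clears the index and reindexes everything
python manage.py update_index    # incrementally refreshes changed records
```

rebuild_index is the simplest way to get started; for ongoing use, update_index (typically run on a schedule) avoids dropping the whole index each time.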

I mentioned above that the Elasticsearch backend can be extended into a custom one. That enables us to specify custom analyzers, tokenizers, and filters for indexing or querying. In my next blog post I will talk in more detail about extending the backend to use custom settings.

Jul 2014 Hadoop Workshop

We recently conducted a hands-on workshop on Hadoop for the local developer community in Phoenix. While we have conducted many such workshops on various topics as part of our internal training program at Clairvoyant and at previous jobs, this was a really fun and unique experience for all of us. We would like to thank all the attendees for their enthusiasm towards learning, and also the volunteers for their help and support in making the workshop a great success!

Several people asked us why we are doing something like this, investing time and effort, and that too for free! We figured we would share some of our thoughts and the motivation behind conducting this workshop (or any similar event).

  • What was the workshop about? We set up a full-day workshop, with the main focus on Hadoop essentials and code examples (HDFS, MapReduce). “Less slides, more code” was our approach. After covering some of the basics and the key concepts, the rest of the day was driven by coding exercises that the attendees could write and run on their own machines, with lots of interaction: questions and answers, and sharing some of our experience and knowledge in this space.
  • Why did we organize this workshop? Well, to start with, as a team we believe in sharing our experiences and in learning from the larger developer community and their experiences. As a company we are exposed to big-data-related technologies quite a bit, and several of our key employees are extremely passionate about the Hadoop ecosystem in general. Over the last 18 months we observed a huge demand for well-qualified engineers with good hands-on experience in this field. We noticed that many engineers are really interested in exploring Hadoop, but are either completely overwhelmed by the amount of information available or struggling to find time in their busy lives to focus. This gave us a great opportunity to put together a structured workshop to help people who are looking for a way to get started on learning Hadoop. While we put a lot of our experience into the material, it was mainly focused on the basics and on getting started working with Hadoop.
  • What does Clairvoyant get out of this? Hopefully lots of good karma :). We are huge supporters and believers in the open source movement, and this is one way of giving back. We really do not see this as a business opportunity. Internally at Clairvoyant we invest a lot in employee professional development, and we encourage all of our employees to constantly improve their skill sets. Out of several such internal efforts, the idea of extending the same to the local developer community emerged. Luckily we have a strong senior leadership team who believe in participating in the local community and learning from each other, and several of them volunteered their time to help out during the workshop. We participate in different local events and forums to share our experiences, and this is another instance of that, albeit on a larger scale. Sharing our knowledge, learning from the community, and contributing back in our own way are what drive us.
  • What’s next? If you attended the workshop, you already know who to contact if you have questions or would like to express interest in additional workshops; we are very happy to help you with next steps. If you have not attended this workshop but want to get started, again, we are happy to help: reach out to us at workshop@clairvoyantsoft.com.

We intend to do this as a regular activity, time permitting. If you are interested in specific topics (or even a hackathon) or would like to follow up on additional sessions, feel free to reach out to us at hello@clairvoyantsoft.com; we would love to hear from you! We have several bright people at Clairvoyant, and we will do our best to help.