How to index source code with ElasticSearch

Posted by: admin November 27, 2021

Questions:

I need to provide full-text search over JavaScript source files, with highlighting of the results.

Which combination of existing Elasticsearch tokenizers and analyzers would work best for this?

Answers:

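Interesting question, but I'm not aware of an out-of-the-box solution. You can use the word delimiter token filter (word_delimiter). It splits identifiers on characters such as the underscore and, if camel-case splitting is enabled, on case changes, so a function like hello_world (or helloWorld) becomes searchable via hello or world. You can also override how individual characters are classified, e.g. have the underscore handled as a digit.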
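To make the word-delimiter idea concrete, here is a minimal sketch of index settings, written as a Python dict you could send as the body of an index-creation request. The names code_delimiter and code_analyzer are made up for this example, and the choice of the whitespace tokenizer is an assumption rather than a recommendation from the answer. The split_identifier helper is only a rough local approximation of the splitting, useful for getting a feel for the tokens; the real filter has more rules and options.

```python
import re

# Hypothetical settings for a custom analyzer. "code_delimiter" and
# "code_analyzer" are illustrative names, not built-in ones.
INDEX_SETTINGS = {
    "settings": {
        "analysis": {
            "filter": {
                "code_delimiter": {
                    "type": "word_delimiter",
                    "split_on_case_change": True,  # helloWorld -> hello, World
                    "preserve_original": True,     # also keep the full identifier
                }
            },
            "analyzer": {
                "code_analyzer": {
                    "type": "custom",
                    "tokenizer": "whitespace",  # assumption: split on whitespace first
                    "filter": ["code_delimiter", "lowercase"],
                }
            },
        }
    }
}


def split_identifier(identifier):
    """Approximate the sub-word splitting locally (NOT the real filter)."""
    parts = []
    # Break on anything that is not a letter or digit, e.g. the underscore.
    for chunk in re.split(r"[^A-Za-z0-9]+", identifier):
        # Then break each chunk on case changes and letter/digit boundaries.
        parts.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", chunk))
    return [part.lower() for part in parts]


print(split_identifier("hello_world"))  # ['hello', 'world']
print(split_identifier("helloWorld"))   # ['hello', 'world']
```

With preserve_original enabled, the full identifier should stay searchable alongside its parts, which tends to matter for exact lookups of function names.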

But I doubt that the results will be sufficient. You will probably have to implement a source code analyzer yourself, or use code that extracts the syntax tree so that method names and bodies can be indexed into different fields.
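The "different fields" idea could look roughly like the sketch below. Note that it does not build a syntax tree: it is a crude stand-in that uses a regular expression plus brace counting to find plain function declarations, and it will be fooled by braces inside strings or comments. A real setup would swap in a proper JavaScript parser. The field names function_name and function_body are assumptions made up for this example.

```python
import re

# Matches declarations of the form "function name(args) {" only.
FUNC_RE = re.compile(r"function\s+([A-Za-z_$][\w$]*)\s*\([^)]*\)\s*\{")


def extract_functions(source):
    """Return one document per function declaration found in `source`."""
    docs = []
    for match in FUNC_RE.finditer(source):
        depth = 1          # we are just past the opening brace
        pos = match.end()
        # Walk forward until the matching closing brace is found.
        while pos < len(source) and depth > 0:
            if source[pos] == "{":
                depth += 1
            elif source[pos] == "}":
                depth -= 1
            pos += 1
        if depth != 0:
            continue       # unbalanced braces: skip rather than guess
        docs.append({
            "function_name": match.group(1),                      # hypothetical field
            "function_body": source[match.end():pos - 1].strip(),  # hypothetical field
        })
    return docs


sample = """
function hello_world(name) {
  if (name) { return 'hi ' + name; }
  return 'hi';
}
function add(a, b) { return a + b; }
"""

for doc in extract_functions(sample):
    print(doc["function_name"])
```

Each returned dict can then be indexed as its own document, so a query can target names and bodies separately or weight a match on the name higher.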

###

You can use the attachment type plugin to load the files into Elasticsearch and let it index them. It handles the files' metadata as well as their content.

The plugin's GitHub page includes information on how to highlight matches in the search results.
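For reference, highlighting is requested in the search body itself. The following is a minimal sketch of such a body as a Python dict. The field name content is an assumption (the actual field name depends on your mapping and on how the plugin exposes the file content), and build_highlight_query is a helper invented for this example.

```python
import json


def build_highlight_query(text, field="content"):
    """Build a search body that matches `text` and highlights the hits.

    `field` is a placeholder; use whatever field holds the file content.
    """
    return {
        "query": {"match": {field: text}},
        "highlight": {
            "pre_tags": ["<em>"],    # markup inserted before each hit
            "post_tags": ["</em>"],  # markup inserted after each hit
            "fields": {field: {}},   # highlight with default settings
        },
    }


print(json.dumps(build_highlight_query("helloWorld"), indent=2))
```

The response should then carry a highlight section per hit with the matching fragments wrapped in the given tags, provided the field's text is available to the highlighter (stored or present in the source).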

###

Unless you want to expose this as a service to somebody, I would recommend installing the InstaSearch plugin in Eclipse; it creates a Lucene index and gives you instantaneous results.