Is there any speed- and cache-efficient implementations of trie in C/C++?

I know what a trie is, but I don’t want reinvent the wheel, implementing it myself.

if you are looking for an ANSI C implementation you can “steal” it from FreeBSD. The file you are looking for is called radix.c. It’s used for managing routing data in kernel.

I realize the question was about ready implementations, but for reference…

Before you jump on Judy you should have read “A Performance Comparison of Judy to Hash Tables“. Then googling the title will likely give you a lifetime of discussion and rebutals to read.

The one explicitly cache-conscious trie I know of is the HAT-trie.

The HAT-trie, when implemented correctly, is very cool. However, for prefix search you need a sorting step on the hash buckets, which somewhat clashes with the idea of a prefix structure.

A somewhat simpler trie is the burst-trie which essentially gives you an interpolation between a standard tree of some sort (like a BST) and a trie. I like it conceptually and it’s much easier to implement.

I’ve had good luck with libTrie. It may not be specifically cache optimized but the performance has always been decent for my applications.

GCC ships with a handful of data structures as part of its “Policy-based data structures”. This includes a few trie implementations.

http://gcc.gnu.org/onlinedocs/libstdc++/ext/pb_ds/trie_based_containers.html

References,

- A Double-Array Trie implementation article (includes a
`C`

implementation) - TRASH – A dynamic LC-trie and hash data structure — (a 2006 PDF reference describing a dynamic LC-trie used in the Linux kernel to implement address lookup in the IP routing table

Judy arrays: Very fast and memory efficient ordered sparse dynamic arrays for bits, integers and strings. Judy arrays are faster and more memory efficient than any binary-search-tree (incl. avl & red-black-trees).

You can also try TommyDS at http://tommyds.sourceforge.net/

See the benchmarks page on the site for a speed comparison with nedtries and judy.

Cache optimizations are something you’ll probably are going to have to do, because you’ll have to fit the data into a single cacheline which generally is something like 64 bytes (which will probably work if you start combining data, such as pointers). But it’s tricky 🙂

Burst Trie’s seem to be a bit more space efficient. I’m not sure how much cache-efficiency you can get out of any index since CPU-caches are so tiny. However, this kind of trie is plenty compact enough to keep large data sets in RAM (where a regular Trie would not).

I wrote a Scala implementation of a burst trie that also incorporates some space-saving techniques that I found in GWT’s type-ahead implementation.

I’ve had very good results (very good balance between performance and size) with marisa-trie compared to several TRIE implementations mentioned with my data set.

Sharing my “fast” implementation for Trie, tested just in basic scenario:

```
#define ENG_LET 26
class Trie {
class TrieNode {
public:
TrieNode* sons[ENG_LET];
int strsInBranch;
bool isEndOfStr;
void print() {
cout << "----------------" << endl;
cout << "sons..";
for(int i=0; i<ENG_LET; ++i) {
if(sons[i] != NULL)
cout << " " << (char)('a'+i);
}
cout << endl;
cout << "strsInBranch = " << strsInBranch << endl;
cout << "isEndOfStr = " << isEndOfStr << endl;
for(int i=0; i<ENG_LET; ++i) {
if(sons[i] != NULL)
sons[i]->print();
}
}
TrieNode(bool _isEndOfStr = false):isEndOfStr(_isEndOfStr), strsInBranch(0) {
for(int i=0; i<ENG_LET; ++i) {
sons[i] = NULL;
}
}
~TrieNode() {
for(int i=0; i<ENG_LET; ++i) {
delete sons[i];
sons[i] = NULL;
}
}
};
TrieNode* head;
public:
Trie() { head = new TrieNode();}
~Trie() { delete head; }
void print() {
cout << "Preorder Print : " << endl;
head->print();
}
bool isExists(const string s) {
TrieNode* curr = head;
for(int i=0; i<s.size(); ++i) {
int letIdx = s[i]-'a';
if(curr->sons[letIdx] == NULL) {
return false;
}
curr = curr->sons[letIdx];
}
return curr->isEndOfStr;
}
void addString(const string& s) {
if(isExists(s))
return;
TrieNode* curr = head;
for(int i=0; i<s.size(); ++i) {
int letIdx = s[i]-'a';
if(curr->sons[letIdx] == NULL) {
curr->sons[letIdx] = new TrieNode();
}
++curr->strsInBranch;
curr = curr->sons[letIdx];
}
++curr->strsInBranch;
curr->isEndOfStr = true;
}
void removeString(const string& s) {
if(!isExists(s))
return;
TrieNode* curr = head;
for(int i=0; i<s.size(); ++i) {
int letIdx = s[i]-'a';
if(curr->sons[letIdx] == NULL) {
assert(false);
return; //string not exists, will not reach here
}
if(curr->strsInBranch==1) { //just 1 str that we want remove, remove the whole branch
delete curr;
return;
}
//more than 1 son
--curr->strsInBranch;
curr = curr->sons[letIdx];
}
curr->isEndOfStr = false;
}
void clear() {
for(int i=0; i<ENG_LET; ++i) {
delete head->sons[i];
head->sons[i] = NULL;
}
}
};
```

Tags: sed