Miva, Miva Script, Miva Empresa, Miva Mia amd Miva Merchant are registered trademarks of the Miva Corporation
 
Ivo Truxa - truXoft control systems: advanced programming and custom IT solutions home / about / webdesign / Miva / automation / contact

http://mivo.truxoft.com
MIVO!
miva beyond limits

 

MIVA®  RESOURCES:  Miva Index Secrets

by Ivo Truxa, 10/16/2000

  1. What is it all about?
  2. What is a Miva database index?
  3. Why are the indexes so big?
  4. How to index keys longer than 100 characters?
  5. How to make my indexes smaller?
  6. Conclusion
  7. Links
  8. User Comments

What is it all about?

To tell the truth I am not writing this article to try to explain in details what databases and indexes are and how do they work. There are many other programmers who know it much better and who already wrote many books about it. Follow the links below or use a search engine to find more explanation on terms you do not understand. However, I suppose that the majority of people who come to read this article already know well enough how the databases used in Miva look like. I just want to put more light on some hidden features of Miva and to share with you results of my research on the structure of Miva indexes and the results of Miva performance testing.


top

What is a Miva database index?

Shortly and precisely told: it is a balanced binary tree with eight keys per node. The maximal length of a key is 108 characters (in contrary to 100 characters as mentioned in the Miva Script Reference Manual). You can find a very detailed explanation on binary trees in the links below. The best explanation is a picture:

Miva Index: balanced binary tree

Is it clear? It must be - I've been always told to be an excellent artist! Well, for those who have little visual imagination I try to explain closer: the search starts at the root (first node) and if the searched key is not found there it continues to one of the two branches as long as not found.

In contrary to some other tree architectures where the searched expressions are only at the end of each branch (so called leafs), this binary tree stores expressions (keys) together with their respective database record numbers (pointers) in each node. Miva index has eight keys (better told up to eight) in each node and they are stored sorted within the node. As I told, the search begins in the root - Miva compares the searched key with all 8 keys in the node. If it is found, Miva returns its pointer. If it is smaller then the smallest node key Miva jumps to the next left node. If it is bigger then the biggest one, it continues to the next right node. And if it is in between, but not equal to any of the eight keys, then Miva returns 'not found'.

In contrary to some other binary tree structures, there are no branches between the keys - there are only two exits (branches) - extreme left (smaller) and extreme right (bigger).

Physically is a Miva index file built in the following way:

  • Miva ID: HTSI (HTML Script Index) - 4 bytes
  • Major version number (3) 1 byte
  • Total number of database records - 4 bytes
  • Amount of nodes in the tree - 4 bytes
  • Number of the root node - 4 bytes
  • Amount of keys per node (equal 0x0008) - 2 bytes
  • Size of a node in bytes (or node offset - distance between two neighbour nodes) - equal 0x0400 (1kB) - 2 bytes
  • Size of a key (or key offset - distance between two neighbour keys) - equal 0x006C (108 dec) - 2 bytes

The first node begins at offset 0x400h, next at 0x800 etc. The root node is somewhere at the middle of the file - that's why it is 'balanced' binary tree. Miva, when adding a new record or re-indexing, always tries to keep the tree balanced for the best efficiency. I t is made in the following way: at a new record, Miva searches for the closest key, inserts a re-sorts the node (8 keys). If there was some empty space before, there is no problem, but if it was full it splits it in two daughter nodes and a mother with branches (pointers) leading to them.

Why are the indexes so big?

This is an often seen question on the Miva user list. The answer is easy - Miva likes keeping things simple. It can be a big advantage, but it has some dark sides too. Instead of customizing the key size, Miva keeps it constant at 108 characters. And because the binary tree contains certain redundancy (new and unbalanced nodes) it may be much bigger then the original database. You can easily calculate the size of an index: the total number of records (totrec) multiple node size (1kB) divided number of keys per node (8) and the result multiple approximately by 1.5-2 (redundancy, mostly closer to 2) to get the final size. Briefly written:

Index size = totrec * 256

The size will be approximately the same for all indexes of a database. The size may differ slightly due to different level of balance of each index file.


top

How to index keys longer than 100 characters?

It may be handy to know that there is a way with Miva to index keys that are longer than 100 (better told 108) characters. I have for example a user profile containing a lot of attributes stored in a field that is 250 characters long and until I analysed the index structure I had to make slow workaround in my script. The solution is rather easy: create a new empty index (MvCREATE) and edit it with a hex-editor. For indexing on the full length of such a field you would need to change the 0x6C (key length of 108 bytes) at offset 0x0015 to 0xFF (255). Of course, you also need to change the node size of 0x400 to 0x900 - change the 0x04 at offset 0x0014 to 0x09. Another way would be reducing the number of keys per node respectively (at offset 0x0011).

When changing the index parameters, keep on mind that a node (0x400 bytes by default) contains not only 8 keys (108 bytes each), but also 8 pointers to their record numbers and pointers to the next branches. When you calculate new size, you can use this formula:

NodeSize = (1+KeysPerNode) * (12 + KeySize)

After (or before) editing the parameters you have to change the size of the index file to be equal to the node size. You can do it in most hex-editors or you can do it with copying some files together, editing in NotePad or similar.

It looks that Miva programmers, when creating the program, already thought about customized indexes or future format changes and that's why they put the parameters to the index header. Miva works fine with hacked indexes. Unfortunately, the implementation of the feature was not consequent - logically taken, only MvCREATE should change the header, but it is also MvREINDEX that resets the header parameters to defaults. In contrary, MvADD behaves correctly - it writes new records in the given format.

It means, you cannot use MvREINDEX on these manipulated indexes. If you nead to re-index it, you need to open an empty database under the original alias, prepare a new empty index file and add all records in a loop with MvADD from the original database (with a modified alias). You can use a single function with MvREVEALSTRUCTURE for re-indexing any database. I will post the function here later.

Another way to get around is to hack the Miva binary. It is as easy as editing the index file: there are only two instances of the 'HTSI' string in the Miva binary. The first is used for reading index headers (e.g. in MvFIND, MvADD), the second for writing them (MvREINDEX, MvMAKEINDEX). All the header parameters are just few bytes after the string. Exact offset will differ according to the version number and platform, but it is easy to localize.

While reducing the key and node size may be very useful (see the next paragraph), increasing the index parameters in your main Miva binary is not a good choice. Much better is to make a separate copy (say 'miva-idx') and use it only to create or re-index indexes. You can make it transparent in the way that you put all such functions in files with other extension (e.g. .mvi or why not .mvx) and add a new type handler in .htaccess letting these files parse by miva-idx instead of miva. Of course, you can have several of such binaries (and extensions) for different index sizes - from tiny (e.g. for Booleans or short integer key values) till quite large (e.g. for Memo fields).


top

How to make my indexes smaller?

You may wish to reduce the size of your indexes. It is not only because it does not look well and that they take too much disk space. The main reason for doing it is performance. I was playing long time with measuring performance of a mid-sized database (30,000 records), changing the number of keys per node from 2 to 100, but what I saw is that the size of the index has bigger influence on the performance than the tree structure! Partially it is logical and partially it is caused by a wrong implementation of the database access algorithms in Miva. I am going to write another article about the performance aspects of Miva databases in few days.

The way to reduce the size of an index is exactly the same as written above. The only problem is that Miva does not accept any node sizes below 0x200 (512 bytes). Changing this limit would request little bit more hacking with the Miva binary. Use the following to get the mvx header parameters:

NodeSize = 0x200
KeySize (default 0x6C) = maxKeySize + 12

It means 0x0D (13) for Booleans and e.g. 0x11 (17) for 5 digit ZIP codes

KeysPerNode = (NodeSize / KeySize) - 1

Using shorter keys with higher number of keys per node may be also of some advantage at very big databases (hundreds thousands or millions of records). It is just my speculation; I did not test it yet. Using more keys per node could be of much more interest especially if the algorithm of MvSKIP was better implemented. Unfortunatelly it looks that Miva programmers did not take the advantage of horizontal passing through the tree nodes at all.


top


Conclusion

Miva indexes use balanced binary tree architecture. It is one of the oldest, well proven but far not the best, fastest or simplest index architectures. In plus, after detailed performance analyse of some database functions (MvGO, MvFIND, MvSKIP, MvFILTER, MvREINDEX) I saw quite surprising facts about most of them, that clearly speak about their mistaken or inconsequent design. The positive on all these facts is that there is a lot of space for performance improvement. The only problem is that I am not sure if somebody at the Miva Co. cares at all.


top

Some Useful Links

Miva Script Manual: Using Databases
What is X-Base
What is X-Base (newer)
Balanced Binary Tree
List of several hex-editors
Database Performance Notes


top

   

Miva and some other terms used on this page are registerd trademarks of the Miva Corporation
copyright  truXoft  © 1997-2010