Database

[[Right now this is not true!]]

[[I have made considerable changes to the database and I am not quite sure that I'm going to keep them at this point.]] 

The tagname database is made up of two major parts. The first part is the realtime data map. This datamap is where the actual values are stored. The goal is to make this datamap as tight as possible, so no tagname or data type information is stored here, only the actual value. The datamap is initially allocated as a fixed number of 32 bit unsigned integers and is allowed to grow as the number of tags increases. A couple of compile time definitions in the code control how big the database begins life and the increment by which it grows. These may become configurable parameters, but for now they are set using a couple of #define's.
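As a rough sketch of the allocate-then-grow behavior described above: the names DATAMAP_INIT_WORDS, DATAMAP_GROW_WORDS, datamap_t, datamap_init() and datamap_grow(), and the sizes chosen, are assumptions made for illustration and are not taken from the OpenDAX source.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical names and values; the real #define's may differ. */
#define DATAMAP_INIT_WORDS 1024  /* starting size, in 32 bit words */
#define DATAMAP_GROW_WORDS 256   /* growth increment, in 32 bit words */

typedef struct {
    uint32_t *words;  /* raw values only: no tagnames, no type info */
    size_t    size;   /* current capacity, in 32 bit words */
} datamap_t;

/* Allocate the initial zero-filled datamap. Returns 0 on success. */
int datamap_init(datamap_t *dm)
{
    assert(dm != NULL);
    dm->words = calloc(DATAMAP_INIT_WORDS, sizeof(uint32_t));
    if (dm->words == NULL)
        return -1;
    dm->size = DATAMAP_INIT_WORDS;
    return 0;
}

/* Grow in whole increments until at least `needed` words fit. */
int datamap_grow(datamap_t *dm, size_t needed)
{
    assert(dm != NULL && dm->words != NULL);
    size_t newsize = dm->size;
    while (newsize < needed)
        newsize += DATAMAP_GROW_WORDS;
    if (newsize == dm->size)
        return 0;  /* already big enough */
    uint32_t *tmp = realloc(dm->words, newsize * sizeof(uint32_t));
    if (tmp == NULL)
        return -1;  /* the old block is still valid */
    for (size_t i = dm->size; i < newsize; i++)
        tmp[i] = 0;  /* new words start out cleared */
    dm->words = tmp;
    dm->size = newsize;
    return 0;
}
```

Note that realloc() may move the block, so any pointer into the old datamap has to be refreshed after a grow.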

The second part of the database is a large array that records the static information about each tag. The tagname and the data type are both stored in this array, and the handle into the datamap is also stored here. The handle is a 32 bit number and the lower 5 bits are special. They select the bit inside the 32 bit word (five bits are enough to address bits 0 through 31), while the remaining upper bits select the word. This gives the handle bit-level resolution. So a BOOL datatype (one bit) might be at handle 0x0004, and the next BOOL will be placed at 0x0005. These two pieces of data are within the same word in the datamap but are at different offsets within that word. The handle takes care of this. The handle is therefore merely a pointer to a BIT location within the datamap, starting with 0x00000000 and continuing to (DATA_SIZE * 32) - 1.

The datatype is also stored in this array. The size of the datatype is encoded into the bottom four bits of the type. To convert from the datatype to the number of bits the datatype occupies, simply raise 2 to the number gained from masking off the lower 4 bits. (A macro, TYPESIZE(), will do this automatically.) A single BOOL would have 0000 as the lower four bits, a BYTE would be 0011, and a 16 bit number would be 0100. Obviously with this kind of setup there is no way to have a 14 bit datatype, but it sure makes it easy for the data handling routines to figure out how large a data type is. The upper bits of the datatype are arbitrary but must be unique. The database handling code in OpenDAX doesn't care about the actual type of the data, only its length.
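The size encoding can be sketched as follows. TYPESIZE() is the macro named above, but this body is only a guess at how it might be written, and the TYPE_* codes and tag_bits() are hypothetical values made up to show that only the low four bits matter for the size.

```c
#include <assert.h>
#include <stdint.h>

/* Size in bits = 2 raised to the low four bits of the type code.
 * This definition is an assumption, not copied from OpenDAX. */
#define TYPESIZE(type) (1u << ((type) & 0x0Fu))

/* Hypothetical type codes: unique upper bits, size in the low nibble. */
#define TYPE_BOOL  0x0010u  /* 2^0 =  1 bit  */
#define TYPE_BYTE  0x0013u  /* 2^3 =  8 bits */
#define TYPE_SINT  0x0023u  /* also 8 bits, but a different type */
#define TYPE_WORD  0x0034u  /* 2^4 = 16 bits */
#define TYPE_DWORD 0x0045u  /* 2^5 = 32 bits */
#define TYPE_LWORD 0x0056u  /* 2^6 = 64 bits */

/* Total number of datamap bits used by a tag holding `count` elements. */
uint32_t tag_bits(uint32_t type, uint32_t count)
{
    assert(count > 0);  /* every tag is an array of at least one element */
    return TYPESIZE(type) * count;
}
```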

Also contained in the array is the count. Every tag is assumed to be an array, even if it holds only one element. Each tag is guaranteed to be contiguous. If you ask for 8 BYTES you will get 8 BYTES with sequential handles. You can copy 8 bytes from the datamap to somewhere else in memory and your data will remain intact.

The database is an array that is sorted by the handle. This allows us to use a simple binary search to get the information we want about a tag.
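A sketch of what one entry in the static array and the handle lookup might look like, using the standard bsearch() function. The struct layout, the field names, TAGNAME_MAX and find_by_handle() are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TAGNAME_MAX 32  /* hypothetical maximum tagname length */

/* One entry of static information per tag (hypothetical layout). */
typedef struct {
    char     name[TAGNAME_MAX];
    uint32_t handle;  /* bit address of the first element in the datamap */
    uint32_t type;    /* low four bits encode the element size */
    uint32_t count;   /* number of elements, at least 1 */
} tag_entry_t;

/* Comparison for bsearch(): key is a handle, elem is a tag entry. */
static int cmp_handle(const void *key, const void *elem)
{
    uint32_t h = *(const uint32_t *)key;
    const tag_entry_t *t = elem;
    if (h < t->handle) return -1;
    if (h > t->handle) return 1;
    return 0;
}

/* Look up a tag by its handle. `tags` must be sorted by handle.
 * Returns NULL if no tag starts at that handle. */
const tag_entry_t *find_by_handle(const tag_entry_t *tags, size_t ntags,
                                  uint32_t handle)
{
    assert(tags != NULL || ntags == 0);
    return bsearch(&handle, tags, ntags, sizeof(tag_entry_t), cmp_handle);
}
```

This only matches the first handle of a tag. Finding the tag that contains an arbitrary handle inside an array would need a comparison that also looks at the type size and the count.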

I have every intention of adding the capability to handle user defined data types. I find this very powerful and think it should be a part of this system. I haven't yet figured out how to implement this, but I suspect it will be handled at a slightly higher level and the tagname database will simply see an array of the proper length to hold the data.

The algorithm that decides where the data goes inside the datamap tries to fit the data as tightly as possible. It will fill in gaps in the datamap if the tags will fit. Multi-bit data types are placed at offsets within the datamap that are aligned to their own size. For instance, a BYTE (8 bit) data type will always have three zeros at the end of its handle. So if three bits are added to the database at 0x0000, 0x0001 and 0x0002 and then a BYTE is added, it will be located at 0x0008. Then if another bit is added it will be placed at 0x0003, and this will continue until the gap is filled. If a 16 bit datatype is added after this BYTE, its handle will place it right after the BYTE at 0x0010, as this guarantees four zeros at the end of the handle. If it were a 32 bit number added after that first BYTE, then a 16 bit gap would have to be left and the next word, at 0x0020, would be used. (It helps to draw this out.) 32 bit tags guarantee that the 5 lowest bits are zero, and 64 bit tags guarantee 6 zeros. Obviously this is not the most efficient way to store the data, but it makes the data far easier to deal with. Simple casts can be used to deal with the data; the trade off is that there will be some gaps in the datamap. Since BOOL datatypes are so very common in most control applications, it's likely that in the real world most of these gaps will be filled.
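The placement rule can be sketched as a first-fit search over size-aligned handles. This is not the OpenDAX implementation: the used[] table (one byte per datamap bit), MAP_BITS and place_tag() are assumptions made to keep the example short, and a real version would more likely work out the gaps from the sorted tag array.

```c
#include <assert.h>
#include <stdint.h>

#define MAP_BITS 128  /* small hypothetical datamap: four 32 bit words */

/* 1 = this datamap bit already belongs to a tag (illustration only). */
static uint8_t used[MAP_BITS];

/* Find the lowest free handle, aligned to `size`, with room for `count`
 * contiguous elements of `size` bits each, and mark it as taken.
 * `size` must be a power of two. Returns -1 if nothing fits. */
long place_tag(uint32_t size, uint32_t count)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    uint32_t need = size * count;
    /* Stepping by `size` keeps the low log2(size) handle bits zero. */
    for (uint32_t h = 0; h + need <= MAP_BITS; h += size) {
        uint32_t i = 0;
        while (i < need && !used[h + i])
            i++;
        if (i == need) {  /* the whole run is free */
            for (i = 0; i < need; i++)
                used[h + i] = 1;
            return (long)h;
        }
    }
    return -1;
}
```

Starting from an empty table, three single-bit requests get 0x0000, 0x0001 and 0x0002, a BYTE then gets 0x0008, the next bit fills the gap at 0x0003, a 16 bit request lands at 0x0010, and a 32 bit request after that lands at 0x0020.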

With this architecture the live data and the static data are separated. Since the static data (datatype, tag name, etc.) won't change much it won't need to be communicated to the modules or remote nodes except when created or updated. The datamap is relatively compact so it can be sent to modules and remote nodes quickly which is important in applications that require near-real-time data updates. The drawback is a small bit of complexity in having to manage both data areas (the datamap and the tagname database). Another potential drawback is that more complex data handling may add some overhead to the system. For instance if we were to enforce permissions or track data changes it might curtail our ability to directly update the data table in shared memory. This brings up my next concern.

I am still not 100% confident that my method for updating the database is the right one. I still think that I have the right idea on the data itself. Right now I am leaning toward having two methods for updating the database. One is a simple shared memory system, where the tagname database and the datamap are both shared memory segments. This is quite simple and very efficient. It does bring with it some concurrency issues. Semaphores will have to be used for the modules to access the data, and if we are going to implement any sort of permission or on-change hooks in the database then we'd be losing most of the benefit of having the database in shared memory, since we'd have to develop a centralized facility for dealing with it anyway. I may decide that the central facility is the libdax library itself.

The other method would be through a kernel message queue. Each module would send messages to the core process to read and/or write data. The message queue is discussed elsewhere in this documentation. This reduces the need for semaphores somewhat, and could be used as a central pathway so that data change hooks or logging facilities could be implemented. All of this adds complexity and slows down the updating of tags. I am still not convinced that permissions should be enforced at this low level, but there is still the problem of modules that will want to manipulate tags that have been deleted or modified.

Let's assume that a logic module is using data from an I/O module. It uses tags x and y. At some point another module deletes tag x. The logic module is still reading the handle for tag x and doing calculations on it. This in and of itself may not be a big problem, but when another module adds tag z and it gets placed where tag x used to be then we have a serious problem. Now we have a logic module that is being manipulated by a piece of data that the designer never intended, and if this is any kind of industrial control system it could hurt somebody. There will have to be some kind of database integrity check built into this system. Other systems of this type will simply not allow the tag to be deleted. Perhaps some kind of lock counter could be implemented in the tagname database.

Even if we implement a lock counter, we have the issue of allowing modules to use data without first incrementing the counter, and if we decide to force modules to lock data before they are allowed to read or write it, then how do we keep up with what data the modules have locked? Should the decision be made at the module level, or should we figure out a way to enforce it at the database level? The module level would allow more freedom and would be much simpler to implement. It would, however, create some problems with the core program's ability to keep track of wayward modules. If a module locked a bunch of data and then died, that data could never again be deleted. If the integrity was enforced at the database level, then the core could keep track of which modules have which pieces of data locked and unlock them if a module were to fail.
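To make the database-level option concrete, here is one way the core could remember which module holds which lock, so that a dead module's locks can be released. Nothing like this exists yet. The table, its size, and the function names tag_lock(), tag_can_delete() and release_module() are all hypothetical, and the sketch ignores the concurrency control a shared table would need.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_LOCKS 64  /* hypothetical table size */

typedef struct {
    int      in_use;     /* nonzero when this slot holds a lock */
    int      module_id;  /* which module took the lock */
    uint32_t handle;     /* handle of the locked tag */
} lock_rec_t;

static lock_rec_t locks[MAX_LOCKS];

/* Record that a module depends on a tag.
 * Returns 0 on success, -1 if the table is full. */
int tag_lock(int module_id, uint32_t handle)
{
    assert(module_id >= 0);  /* assumed: module ids are non-negative */
    for (int i = 0; i < MAX_LOCKS; i++) {
        if (!locks[i].in_use) {
            locks[i].in_use = 1;
            locks[i].module_id = module_id;
            locks[i].handle = handle;
            return 0;
        }
    }
    return -1;
}

/* A tag may be deleted only when no module holds a lock on it. */
int tag_can_delete(uint32_t handle)
{
    for (int i = 0; i < MAX_LOCKS; i++)
        if (locks[i].in_use && locks[i].handle == handle)
            return 0;
    return 1;
}

/* Called by the core when a module exits or dies.
 * Returns the number of locks that were released. */
int release_module(int module_id)
{
    int released = 0;
    for (int i = 0; i < MAX_LOCKS; i++) {
        if (locks[i].in_use && locks[i].module_id == module_id) {
            locks[i].in_use = 0;
            released++;
        }
    }
    return released;
}
```

Because each lock carries its owner, a plain counter in the tag entry is not needed: the count for a tag is simply the number of records that name its handle.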

These are the issues that I am struggling with. Right now I am leaning toward allowing modules to update the datatable at will (with proper semaphores) and indicating a corruption of the database in a commonly recognized way, probably in a status tag located at handle 0x00. Once signaled that data has been corrupted, the modules could re-read the tagname database entries for the data that they are responsible for and clear the flag. Once the flag was clear, the core (or library) would allow the module to continue using that data. Most of this would take place inside the libdax library, and the actual module wouldn't have to deal with it at all. All that is required then is that the library function that retrieves the data look at the status flag and make sure that it knows about any corruption of the data before it returns data to the calling module. This would have to happen anyway if we use shared memory segments, because the library would have to detach and reattach if the core needs to increase the size of the database.