iBase Digital Asset Management Blog

Unicode and the importance of data integrity within digital asset management software

23rd November 2015

When storing your data, you may take it for granted that all software will store it without any corruption or alteration, but unfortunately this isn’t always the case. At iBase we’re committed to ensuring your data is unaltered when it’s stored using our DAM software. This may seem to be stating the obvious, but the rabbit hole of data, character sets and code points is a rather deep one.


For those more technically minded, it’s worth noting that UTF-8 is the most common encoding for data in databases connected to the web. UTF-8 is one of the Unicode Transformation Formats (UTFs), alongside UTF-16 and UTF-32, and all of them fall under the umbrella of the Unicode Standard, which at the time of writing is accompanied by 11 standard annexes, 8 technical standards and 10 technical reports.

Each UTF is reversible, so every UTF supports lossless round-tripping: mapping any Unicode coded character sequence S to a sequence of bytes and back will produce S again. To ensure round-tripping, a UTF must map every code point (except the surrogate code points, which are reserved for use by UTF-16) to a unique byte sequence. This includes reserved (unassigned) code points and the 66 noncharacters (including U+FFFE and U+FFFF).
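To make the round-tripping idea concrete, here is a minimal sketch in Python. It is an illustration only, not part of iBase's software: the helper names `round_trips` and `encodes_in_utf8` and the sample string are assumptions made for this example. It encodes a string in each UTF, decodes it again, and checks that nothing changed.

```python
def round_trips(text: str, encoding: str) -> bool:
    """Return True if encoding and then decoding gives back the same string."""
    return text.encode(encoding).decode(encoding) == text


def encodes_in_utf8(text: str) -> bool:
    """Return True if Python's strict UTF-8 encoder accepts the string."""
    try:
        text.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical sample: ASCII, an accented letter, a code point that is
# unassigned at the time of writing (U+0378), a noncharacter (U+FFFF)
# and a character outside the Basic Multilingual Plane (U+1F600).
sample = "DAM caf\u00e9 \u0378 \uffff \U0001F600"

results = {enc: round_trips(sample, enc) for enc in ("utf-8", "utf-16", "utf-32")}
print(results)

# Surrogate code points are the exception: a lone surrogate is rejected.
print(encodes_in_utf8("\ud800"))
```

Every entry in `results` should be true, while the lone surrogate is refused by the encoder, which matches the exception described above.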

A Unicode code point can occupy up to 21 bits, and UTF-8 encodes each one in one to four bytes depending on its value. A single displayed character can also be built from a base character with as many combining marks stacked on it as a user requires, which means the data can often be a trifle difficult to contain. Rest assured that our DAM software won’t allow malformed Unicode to slip into the data, and neither will you be left with corrupt strings lying around the place from bad conversions: our software will detect those before they enter the system and alert you to their presence.
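The points above can be sketched in a few lines of Python. This is a hedged illustration rather than a description of how iBase's software works internally: the function names `code_point_report`, `count_combining_marks` and `is_well_formed_utf8` are hypothetical, and the check simply relies on Python's built-in strict UTF-8 decoder.

```python
import unicodedata


def code_point_report(text: str) -> list:
    """Pair each code point with the number of bytes UTF-8 uses for it."""
    return [(f"U+{ord(ch):04X}", len(ch.encode("utf-8"))) for ch in text]


def count_combining_marks(text: str) -> int:
    """Count code points with a non-zero canonical combining class."""
    return sum(1 for ch in text if unicodedata.combining(ch) != 0)


def is_well_formed_utf8(data: bytes) -> bool:
    """Return True only if the bytes decode cleanly as strict UTF-8."""
    try:
        data.decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False


# The largest code point, U+10FFFF, needs 21 bits.
max_bits = (0x10FFFF).bit_length()

# One to four bytes per code point: A, e-acute, euro sign, an emoji.
report = code_point_report("A\u00e9\u20ac\U0001F600")

# One visible letter made of three code points: e + acute + dot below.
stacked = "e\u0301\u0323"

# Bytes from a bad conversion: Latin-1 "café" is not valid UTF-8.
bad_conversion = "caf\u00e9".encode("latin-1")

print(max_bits, report, count_combining_marks(stacked))
print(is_well_formed_utf8("caf\u00e9".encode("utf-8")), is_well_formed_utf8(bad_conversion))
```

A check along the lines of `is_well_formed_utf8` is the kind of gate that can run before data is stored, so that bytes from a mismatched conversion are flagged rather than silently saved.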

For more information on Unicode, or any other aspect of DAM software, contact us at support@ibase.com or call us on +44 (0) 1943 603 636.
