Author: Arham Hasan
January 15, 2025
Ever heard the phrase “like finding a needle in a haystack”? The problem we recently diagnosed for a client is a sort of analogue to this, save one key difference: the needle is invisible.
The issue our client was experiencing involved a .xls file containing dozens of logged items. For some peculiar reason, this file would not open. Fortunately, we caught on pretty quickly that an illegal character was the likely cause. To test this, we temporarily removed the section of the .xls file most likely to contain an odd character - using the data manipulation abilities of C/AL in Microsoft Dynamics NAV - and sure enough, the file opened fine.
So: we’ve found our haystack. Now we go invisible needle-hunting! We tried a variety of methods before getting to something that really worked out. First, we scanned the section’s strings for some particularly likely culprits - the “>”, “<”, “&”, single-quote and double-quote characters - but came up empty. Then we thought to compare the characters in the section’s strings against a premade array of “normal characters” to see which ones were foreign - but eventually gave up on this in favour of our third method.
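For illustration, the “normal characters” comparison could look something like the sketch below. It is written in Python rather than C/AL and is not the code we ran for the client; the allow-list `NORMAL_CHARS` and the helper `find_foreign_chars` are hypothetical names, and the choice of printable ASCII plus common whitespace as “normal” is an assumption.

```python
import string

# Assumed allow-list: ASCII letters, digits, punctuation and common whitespace.
NORMAL_CHARS = set(
    string.ascii_letters + string.digits + string.punctuation + " \t\r\n"
)

def find_foreign_chars(text):
    """Return (index, code point) pairs for characters outside NORMAL_CHARS."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if ch not in NORMAL_CHARS
    ]

# Hypothetical sample string with an invisible character after "Widget".
sample = "Widget\x1fA <ok>"
print(find_foreign_chars(sample))  # [(6, 'U+001F')]
```

An allow-list like this would also flag legitimate accented or non-Latin text, so it likely needs tuning for real data.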
Here’s what worked - we performed a sort of binary search. First, we removed the first half of each string (none were longer than 50 or so characters). If the file still didn’t open, then we’d know our evil character lay in the other half, so we’d remove half of that half (e.g. if a string were 99 characters long, the evil character would have to lie between positions 50-99, and we would then test 50-74 and 75-99). Repeating this narrows the search down to the location of the imposter symbol - for 50 characters it takes a maximum of 6 tries. After this, we performed the same binary search, but this time by entry. Take the first half of the entries away; if the file opens, then the evil string is in that first half. Slowly, bit by bit, having to regenerate the .xls and download it around 12 times, we ended up finding the exact location where our mysterious invisible needle lay. Pasting it into Notepad - like pulling off an invisibility cloak - unmasked the perpetrator:
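The halving procedure can be sketched roughly as follows. This is a Python illustration, not the C/AL we actually used; `locate_bad_item` is a hypothetical helper, and `opens_ok` is a stand-in for the manual step of regenerating the file and trying to open it. It assumes exactly one offending character.

```python
def locate_bad_item(items, opens_ok):
    """Find the index of the single item that makes opens_ok fail.

    items    -- a string (search by character) or a list (search by entry)
    opens_ok -- callable returning True if the file opens with the given items
    Assumes exactly one offender and that opens_ok(items) is False.
    Returns (index, number_of_tries).
    """
    lo, hi = 0, len(items)  # the offender is somewhere in items[lo:hi]
    tries = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tries += 1
        # Remove the first half of the suspect range and test what is left.
        if opens_ok(items[:lo] + items[mid:]):
            hi = mid  # removal fixed it: offender was in items[lo:mid]
        else:
            lo = mid  # still broken: offender is in items[mid:hi]
    return lo, tries

# Stand-in check: treat any character below U+0020 as "breaks the file".
def opens_ok(text):
    return all(ch >= " " for ch in text)

text = "x" * 37 + "\x1f" + "y" * 12  # 50 characters, offender at index 37
index, tries = locate_bad_item(text, opens_ok)
print(index, tries)
```

Because the function only slices and concatenates, the same sketch works on a list of entries, provided `opens_ok` accepts a list. For a 50-character string the loop runs at most 6 times.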
ASCII 31, or U+001F, the Unit Separator. A C0 control character, it isn’t meant to be displayed, but rather to carry an instruction for the computer to interpret. For example, U+0000, the null character, is used in some languages, such as C, to indicate the end of a string - you don’t see it, but the computer interprets it. How our needle made its way here, we’ve got no clue. But with the power of C/AL and a lot of sifting - our equivalent of a magnet, in this case - the old metaphor may be put to rest.
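If Notepad isn’t handy, a few lines of Python can unmask an invisible character just as well. This is only a sketch; `describe_char` is a hypothetical helper name, built on the standard `unicodedata` module, where the category `Cc` marks a control character.

```python
import unicodedata

def describe_char(ch):
    """Summarise a single character: code point, Unicode category, escaped form."""
    code = f"U+{ord(ch):04X}"
    category = unicodedata.category(ch)  # "Cc" means control character
    return f"{code} category={category} repr={ch!r}"

print(describe_char("\x1f"))  # U+001F category=Cc repr='\x1f'
print(describe_char("A"))     # U+0041 category=Lu repr='A'
```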