Defeating Invisicharacters with Pie
Happy Santa Lucia Day to those who celebrate!
Source: Table wreath from an eBay blog post, here.
Desserts Can Save the Day
Although Santa Lucia Day is typically celebrated with lussekatter, i.e. rolls of sweet deliciousness, what is going to help us today is actually pie.
That's right, pie.
I recently ran into multiple issues where I had invisible characters in a CSV file. Notably, issues with carriage returns and a feff at the beginning of each line, which is a zero width no-break space (better known as the byte order mark). You may recall from that post I mentioned that there was always more than one error - and here we are.
In a subsequent CSV file, I noticed even. more. invisicharacters. Visually, it looked something like this:
12345 ,"some text"
And I thought: oh, some white spaces. Instead of going right for the kill, my recent invisiperience taught me caution. I moved the cursor over the character and hit x.
Source: X Marks the Spot map, here.
Except it doesn't.
This one was a little harder to troubleshoot. I use my friend's vimrc configuration, which means that I could use ctrl-H to view the file as hexadecimal. Turns out, they weren't white spaces at all:
00000000: 31 32 33 34 35 c2 a0 2c 22 73 6f 6d 65 20 74 65 12345..,"some te$
The byte pair c2 a0 is the UTF-8 encoding of the code point 00A0, which is a non-breaking space.
!@#$ invisicharacters.
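If you don't have a hex view mapped in your editor, something like the following should show the same bytes from the command line. This is a minimal sketch, assuming a POSIX-ish shell with od and head available; the sample line and the variable names are made up to mirror the example above.

```shell
# Recreate the sample line; \302\240 is octal for the bytes c2 a0.
sample=$(mktemp)
printf '12345\302\240,"some text"\n' > "$sample"

# Dump the first 8 bytes as hex, then squeeze od's padding into single spaces.
hex=$(head -c 8 "$sample" | od -An -tx1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//')
echo "$hex"
```

The c2 a0 pair sitting between the 35 (the digit 5) and the 2c (the comma) is the invisicharacter.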
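Of course, what the hex view shows is the raw bytes c2 a0 rather than the code point 00A0, which means that using perl -CSD in my replacement doesn't work like it did for feff, which I had matched as a code point. With -CSD, Perl decodes the input into characters before the substitution runs, so a pattern written as the two bytes c2 a0 never matches.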
Why doesn't it work?
perl -CSD is shorthand for -CIOEio, which breaks down as:
C Command Switch
I stdin is UTF-8
O stdout is UTF-8
E stderr is UTF-8
i Perl input stream is UTF-8
o Perl output stream is UTF-8
You can read more about this on the Perldoc.
So, if -CSD can't help in this case, what will? -pie:
p Loops through the input line by line and prints each line, similar to sed
i Edit in place
e Perl command line expression
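To fix this specifically (with the switches typed as -pi -e, because -i treats any letters attached to it as a backup-file suffix, so a literal -pie would leave Perl without its -e):
perl -pi -e 's/\x{c2}\x{a0}//' "$FILE"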
This tells perl to replace the first (and only, in this case) instance of c2 a0 on each line with nothing, thus removing it from the line, for the file $FILE.
Header source: Pie vector from VectEezy and ASCII conversion from picascii.