I have noticed for years that certain emails and documents have strange characters where punctuation and other characters should be. An example is this word: yesterday’s, where the characters ’ should clearly be an apostrophe. Why is this happening and what can I do to eliminate this occurring? I suspect that it happens more often when the originating computer system is a Mac.
It's all about character encoding.
And that simple sentence represents a bit of complexity.
Let me cover a few concepts, and throw out a few tips on how it can sometimes be avoided.
Encoding
As I've discussed before, typically in the context of email, there are several ways to "encode" the characters – the letters and numbers and symbols – you see on the screen.
The key concept is that all characters are actually stored as numbers. The capital letter "A", for instance, is the number 65. "B" is 66, and so on.
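If you'd like to see those numbers for yourself, here's a minimal Python sketch. The helper name code_points is mine, purely for illustration; ord() and chr() are standard Python built-ins.

```python
def code_points(text):
    """Return the number stored for each character in text."""
    return [ord(ch) for ch in text]


print(code_points("AB"))  # [65, 66]
print(chr(65), chr(66))   # A B
```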
The "ASCII" character set or encoding uses a single byte – values from 0 to 255 – to represent up to 256 different characters. (Technically ASCII actually only uses 7 bits of that byte, or values from 0-127. The most common true 8-bit encoding used on the internet today is "ISO-8859-1".)
The problem, of course, is that there are way more than 256 possible characters. While we might spend most of our time with common characters like A-Z, a-z, 0-9 and a handful of punctuation, in reality there are thousands of other possible characters – particularly if you think globally.
At the other end of the spectrum is the "Unicode" encoding, which uses two (or more) bytes, giving many more possible different characters. "A" is still 65, but if we look at it in hexadecimal the single-byte ASCII "A" is 41, while the two-byte Unicode "A" is 0041.
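Here's a small Python sketch of that comparison. I'm assuming Python's "utf-16-be" codec as a stand-in for "two-byte Unicode", and hex_bytes is just an illustrative helper name.

```python
def hex_bytes(text, encoding):
    """Encode text and return the resulting bytes as a hex string."""
    return text.encode(encoding).hex()


print(hex_bytes("A", "ascii"))      # 41
print(hex_bytes("A", "utf-16-be"))  # 0041
```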
Enter "UTF-8", for "8 bit Unicode Transformation Format".
In UTF-8 the entire Unicode character set is broken down by an algorithm into byte sequences that are either 1, 2, 3 or 4 bytes long. The reason is simple: the vast majority of characters in common usage in Western languages fall into the 1 byte range. Messages remain smaller, but should one of those "other" characters be needed it can be incorporated by using its "longer" representation.
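To make those 1, 2, 3 and 4 byte sequences concrete, here's a quick Python sketch. The four sample characters are my own picks (a letter, an accented letter, the euro sign and an emoji), written as escapes so the code itself stays plain ASCII.

```python
def utf8_length(ch):
    """Number of bytes UTF-8 needs for a single character."""
    return len(ch.encode("utf-8"))


# "A", e-acute, euro sign, grinning-face emoji
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {encoded.hex()} ({len(encoded)} bytes)")
```

Plain "A" still costs a single byte, which is why mostly-English text doesn't grow when it's stored as UTF-8.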
All that is a lot of back story to the problem.
Mis-Interpretation
When you see funny characters it's likely because data encoded using UTF-8 is being interpreted as ISO-8859-1.
Let's use an example: that apostrophe.
First, let's be clear as mud: there are apostrophes, and apostrophes. In reality the characters we often refer to as apostrophes could be:
- the apostrophe: (')
- the acute accent: (´)
- the grave accent: (`)
- the right single quote (’)
- the left single quote (‘)
(Those might look alike, different, or not appear at all depending on the fonts and character sets available on your computer. I told you this was complex.)
Each, of course, has a different encoding. Let's take the right single quote (for reasons I'll explain below):
- ASCII: doesn't exist
- ISO-8859-1: doesn't exist either (0xB4 is the acute accent; Windows-1252, a common superset, uses 0x92)
- Unicode: 0x2019 in hexadecimal
- UTF-8: 0xE28099
I don't expect you to care about the actual numbers there, but just notice how dramatically different they are.
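If you do care about the numbers, here's a hedged Python sketch that asks each encoding for the right single quote. The helper try_encode is my own name, and I've added Windows-1252 ("cp1252" in Python) as an assumption, since that's what a lot of software actually uses when it claims ISO-8859-1. Python's iso-8859-1 codec has no slot for this character at all; 0xB4 in that encoding is the acute accent.

```python
RIGHT_SINGLE_QUOTE = "\u2019"  # the right single quote
ACUTE_ACCENT = "\u00b4"        # the acute accent, a different character


def try_encode(ch, encoding):
    """Return the encoded bytes as hex, or None if the encoding lacks ch."""
    try:
        return ch.encode(encoding).hex()
    except UnicodeEncodeError:
        return None


for enc in ("ascii", "iso-8859-1", "cp1252", "utf-8"):
    print(enc, try_encode(RIGHT_SINGLE_QUOTE, enc))

print(f"Unicode code point: 0x{ord(RIGHT_SINGLE_QUOTE):04X}")
```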
Now, what happens when the UTF-8 series of numbers is interpreted as if it were ISO-8859-1?
’
Look familiar?
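0xE28099 breaks down as 0xE2 (â), 0x80 (€) and 0x99 (™). What was one character in UTF-8 (’) gets mistakenly displayed as three (’) when misinterpreted as ISO-8859-1. (Strictly speaking, the € and ™ come from Windows-1252, the superset that most software applies when a document claims to be ISO-8859-1.)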
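You can reproduce the garbling, and often undo it, in a few lines of Python. This is only a sketch: misread_utf8 and repair are names I made up, and I'm assuming the wrong guess was Windows-1252 ("cp1252"). A handful of byte values have no Windows-1252 meaning, so either function can raise an error on other text.

```python
def misread_utf8(text, wrong_encoding="cp1252"):
    """Encode text as UTF-8, then decode the bytes with the wrong encoding."""
    return text.encode("utf-8").decode(wrong_encoding)


def repair(garbled, wrong_encoding="cp1252"):
    """Reverse the mistake: recover the original bytes, decode them as UTF-8."""
    return garbled.encode(wrong_encoding).decode("utf-8")


original = "yesterday\u2019s"      # "yesterday" + right single quote + "s"
garbled = misread_utf8(original)

print(len(original), len(garbled))  # 11 13
print(repair(garbled) == original)  # True
```

One character became three, which is exactly the two extra characters the length check reports.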
The Culprits
There are typically two.
Email programs: email messages can include, as part of the header information you don't see, the type of encoding used to represent the contents of the message. The problem is that some get it wrong, or, as you compose mail you enter characters that cannot actually be represented by the current encoding scheme. In the latter case the email program has to do "something", and that may include sending the character anyway, in one encoding scheme, even though the message is flagged as being in another.
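Here's a rough Python sketch of that hidden header at work, using the standard library's email package. The helper read_body and its assumed_charset parameter are hypothetical; they simulate a mail program that ignores, or is misled about, the declared encoding.

```python
from email.message import EmailMessage

msg = EmailMessage()
# set_content records the charset in the Content-Type header for us.
msg.set_content("yesterday\u2019s news", charset="utf-8")


def read_body(message, assumed_charset=None):
    """Decode the body with the declared charset, or with a forced guess."""
    raw = message.get_payload(decode=True)  # the body as raw bytes
    charset = assumed_charset or message.get_content_charset() or "ascii"
    return raw.decode(charset)


print(msg["Content-Type"])             # the header you normally don't see
print(ascii(read_body(msg)))           # decoded as declared: one quote character
print(ascii(read_body(msg, "cp1252"))) # decoded with the wrong guess: three characters
```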
I can hear you saying "but I didn't type in any special characters!".
Use Word to edit your email or your web page? Then you probably did. Microsoft Word is culprit number 2.
In particular, the "Smart Quotes" option in Word will frequently replace a plain apostrophe (') with an acute accent (´) or – as we saw above – right single quote (’). When that gets sent or displayed using ISO-8859-1 encoding, you get the results above.
The solution? Ideally, watch what you're typing. I know that "Smart Quotes", while nice in printed documents, causes me enough grief elsewhere that it's one of the first options I turn off when configuring Microsoft Word.
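If the smart quotes are already in your text, one option is to swap them back to plain ones before sending. A minimal Python sketch, assuming you only care about the four common curly quotes (the table and the function name plain_quotes are mine):

```python
# Map the common "smart" quotes back to their plain ASCII equivalents.
SMART_TO_PLAIN = str.maketrans({
    "\u2018": "'",  # left single quote
    "\u2019": "'",  # right single quote
    "\u201c": '"',  # left double quote
    "\u201d": '"',  # right double quote
})


def plain_quotes(text):
    """Replace curly quotes with plain ones; everything else is untouched."""
    return text.translate(SMART_TO_PLAIN)


cleaned = plain_quotes("\u201cyesterday\u2019s\u201d")
print(cleaned)  # "yesterday's"
```

The result is pure ASCII, so it survives no matter which encoding the other end guesses.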
If you can, configure your email program to send in UTF-8 encoding (many, if not most, don't make this easily configurable).
But regardless of how you got here, at least now you'll know why.
Source: https://askleo.com/why_do_i_get_odd_characters_instead_of_quotes_in_my_documents/