mirror of
https://github.com/jbranchaud/til
synced 2026-01-03 15:18:01 +00:00
Add Decompose Unicode Character With Diacritic Mark as a Ruby TIL
@@ -10,7 +10,7 @@ pairing with smart people at Hashrocket.

 For a steady stream of TILs, [sign up for my newsletter](https://crafty-builder-6996.ck.page/e169c61186).

-_1652 TILs and counting..._
+_1653 TILs and counting..._

 See some of the other learning resources I work on:
 - [Get Started with Vimium](https://egghead.io/courses/get-started-with-vimium~3t5f7)
@@ -1314,6 +1314,7 @@ If you've learned something here, support my efforts writing daily TILs by
 - [Create Listing Of All Middleman Pages](ruby/create-listing-of-all-middleman-pages.md)
 - [Create Named Structs With Struct.new](ruby/create-named-structs-with-struct-new.md)
 - [Create Thumbnail Image For A PDF](ruby/create-thumbnail-image-for-a-pdf.md)
+- [Decompose Unicode Character With Diacritic Mark](ruby/decompose-unicode-character-with-diacritic-mark.md)
 - [Defaulting To Frozen String Literals](ruby/defaulting-to-frozen-string-literals.md)
 - [Define A Custom RSpec Matcher](ruby/define-a-custom-rspec-matcher.md)
 - [Define A Method On A Struct](ruby/define-a-method-on-a-struct.md)
55
ruby/decompose-unicode-character-with-diacritic-mark.md
Normal file
@@ -0,0 +1,55 @@
# Decompose Unicode Character With Diacritic Mark

A character like `ñ` is typically represented by the single Unicode codepoint
`U+00F1`. However, it can also be represented with two codepoints: the `n`
(`U+006E`) followed by the combining tilde (`U+0303`).

We can see that by comparing a typed `ñ` with one that has been split apart into
its separate codepoints. We can do that with
[`#unicode_normalize`](https://apidock.com/ruby/v2_5_5/String/unicode_normalize)
and the `:nfd` argument, which stands for _Normalization Form D_ (canonical
decomposition).

```ruby
> "ñ" == "ñ".unicode_normalize(:nfd)
=> false
> "ñ".unicode_normalize(:nfd).length
=> 2
> "ñ".length
=> 1
```
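
Because the two forms are not equal as raw strings, a comparison only behaves as expected once both sides are in the same normalization form. Here is a minimal sketch of that, assuming Ruby 2.2 or later; the variable names are just for illustration, and the strings are written with `\u` escapes so the codepoints are unambiguous.

```ruby
composed = "\u00F1"    # ñ as a single codepoint
decomposed = "n\u0303" # n followed by the combining tilde

# A raw comparison goes codepoint by codepoint, so these differ
raw_equal = composed == decomposed

# Normalizing both sides to the same form (NFC here) makes them comparable
normalized_equal =
  composed.unicode_normalize(:nfc) == decomposed.unicode_normalize(:nfc)

# unicode_normalized? checks against NFC when no form is given
composed_is_nfc = composed.unicode_normalized?
decomposed_is_nfc = decomposed.unicode_normalized?

puts raw_equal         # false
puts normalized_equal  # true
puts composed_is_nfc   # true
puts decomposed_is_nfc # false
```

Normalizing to `:nfd` on both sides would work just as well. What matters is that both strings use the same form.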

We can inspect the exact codepoints by iterating over each character and
printing out the codepoint value.

```ruby
"ñ".each_char.with_index do |char, i|
  puts "#{i}: '#{char}' -> U+#{char.ord.to_s(16).upcase.rjust(4, '0')}"
end
# 0: 'ñ' -> U+00F1
# => "ñ"

"ñ".unicode_normalize(:nfd).each_char.with_index do |char, i|
  puts "#{i}: '#{char}' -> U+#{char.ord.to_s(16).upcase.rjust(4, '0')}"
end
# 0: 'n' -> U+006E
# 1: '̃' -> U+0303
# => "ñ"
```

Notice that after decomposition the diacritic is a separate codepoint from its
base character.

The same decomposition works for other characters that contain diacritics.
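
As a rough illustration, here are a few other precomposed characters run through the same decomposition. The sample characters and the `jalapeño` string are my own picks, not anything special. The last step, dropping the combining marks with the `\p{Mn}` (nonspacing mark) character property, is one common use of decomposition rather than part of the normalization itself.

```ruby
# Precomposed characters, written with escapes so the codepoints are unambiguous
samples = ["\u00E9", "\u00FC", "\u00E5"] # é, ü, å

samples.each do |char|
  codepoints = char.unicode_normalize(:nfd).codepoints
  formatted = codepoints.map { |cp| format("U+%04X", cp) }
  puts "#{char} -> #{formatted.join(' ')}"
end
# é -> U+0065 U+0301
# ü -> U+0075 U+0308
# å -> U+0061 U+030A

# Removing the combining marks leaves only the base letters
stripped = "jalape\u00F1o".unicode_normalize(:nfd).gsub(/\p{Mn}/, "")
puts stripped
# jalapeno
```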

We can also go the other direction and build each form from its codepoints with
[`#pack`](https://ruby-doc.org/core-3.0.1/Array.html#method-i-pack).

```ruby
> [0x006E, 0x0303].pack("U*")
=> "ñ"
> [0x00F1].pack("U*")
=> "ñ"
> [0x006E, 0x0303].pack("U*") == [0x00F1].pack("U*")
=> false
```
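
To round out the picture, the decomposed string can be recomposed with the `:nfc` form, and `#unpack` with the same `"U*"` directive reads the codepoints back out. This is a small sketch with illustrative variable names, not the only way to do it.

```ruby
decomposed = [0x006E, 0x0303].pack("U*") # n + combining tilde
composed = [0x00F1].pack("U*")           # precomposed ñ

# NFC composes the base letter and the combining mark into one codepoint
recomposed = decomposed.unicode_normalize(:nfc)

puts recomposed == composed         # true
puts recomposed.unpack("U*").inspect # [241]
puts decomposed.unpack("U*").inspect # [110, 771]
```

The integers are the decimal values of the same codepoints: 241 is `U+00F1`, 110 is `U+006E`, and 771 is `U+0303`.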