Class: Unisec::Surrogates

Inherits:
Object
Defined in:
lib/unisec/surrogates.rb

Overview

UTF-16 surrogates conversion.

Constructor Details

#initialize(*args) ⇒ Surrogates

Initialize the surrogate pair.

Examples:

surr = Unisec::Surrogates.new(128169)
# => #<Unisec::Surrogates:0x00007f96920a7ca8 @cp=128169, @hs=55357, @ls=56489>
surr.cp # => 128169
surr.hs # => 55357
surr.ls # => 56489
Unisec::Surrogates.new(55357, 56489)
# => #<Unisec::Surrogates:0x00007f96920689b8 @cp=128169, @hs=55357, @ls=56489>

Parameters:

  • args (Integer)

    If one argument is provided, it is treated as the code point and the two surrogates are calculated automatically. If two arguments are provided, they are treated as a surrogate pair (high then low) and the code point is calculated from them. Any other number of arguments raises ArgumentError.



# File 'lib/unisec/surrogates.rb', line 34

def initialize(*args)
  if args.size == 1
    @cp = args[0]
    @hs = high_surrogate
    @ls = low_surrogate
  elsif args.size == 2
    @hs = args[0]
    @ls = args[1]
    @cp = code_point
  else
    raise ArgumentError
  end
end
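
The constructor above accepts any integers and does not check their ranges itself. As a minimal sketch, assuming a caller wants to validate input first, the standalone helper below tests the ranges defined for UTF-16: supplementary code points run from U+10000 to U+10FFFF, high surrogates from U+D800 to U+DBFF, and low surrogates from U+DC00 to U+DFFF. The constant and method names are hypothetical and not part of Unisec.

```ruby
# Hypothetical helpers, not part of Unisec: range checks a caller could
# run before handing values to Unisec::Surrogates.new.
SUPPLEMENTARY  = (0x10000..0x10FFFF) # code points that need a surrogate pair
HIGH_SURROGATE = (0xD800..0xDBFF)    # valid lead (high) surrogates
LOW_SURROGATE  = (0xDC00..0xDFFF)    # valid trail (low) surrogates

# One argument: a code point. Two arguments: high then low surrogate.
def valid_surrogate_args?(*args)
  case args.size
  when 1 then SUPPLEMENTARY.cover?(args[0])
  when 2 then HIGH_SURROGATE.cover?(args[0]) && LOW_SURROGATE.cover?(args[1])
  else false
  end
end

p valid_surrogate_args?(128169)        # true
p valid_surrogate_args?(55357, 56489)  # true
p valid_surrogate_args?(0x41)          # false: BMP code point, no surrogates
p valid_surrogate_args?(56489, 55357)  # false: low and high are swapped
```

Without such a check, a code point below U+10000 would still go through the arithmetic but produce values outside the surrogate ranges.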

Instance Attribute Details

#cp ⇒ Integer (readonly)

Unicode code point

Returns:

  • (Integer)

    decimal code point



# File 'lib/unisec/surrogates.rb', line 11

def cp
  @cp
end

#hs ⇒ Integer (readonly)

High surrogate (1st code unit of a surrogate pair). Also called lead surrogate.

Returns:

  • (Integer)

    decimal high surrogate



# File 'lib/unisec/surrogates.rb', line 15

def hs
  @hs
end

#ls ⇒ Integer (readonly)

Low surrogate (2nd code unit of a surrogate pair). Also called trail surrogate.

Returns:

  • (Integer)

    decimal low surrogate



# File 'lib/unisec/surrogates.rb', line 19

def ls
  @ls
end

Class Method Details

.code_point(hs, ls) ⇒ Integer

Calculate the Unicode code point based on the surrogates.

Examples:

Unisec::Surrogates.code_point(55357, 56489) # => 128169

Parameters:

  • hs (Integer)

    decimal high surrogate

  • ls (Integer)

    decimal low surrogate

Returns:

  • (Integer)

    decimal code point



# File 'lib/unisec/surrogates.rb', line 72

def self.code_point(hs, ls)
  (((hs - 0xd800) * 0x400) + ls - 0xdc00 + 0x10000)
end
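
The formula above can be cross-checked against Ruby's own UTF-16 decoder. The sketch below copies the arithmetic into a standalone method (the name `code_point_from` is hypothetical, so the example runs without Unisec) and compares it with the result of decoding the same two code units as a UTF-16BE string.

```ruby
# Standalone copy of the formula shown above, so the sketch runs without Unisec.
def code_point_from(hs, ls)
  ((hs - 0xD800) * 0x400) + (ls - 0xDC00) + 0x10000
end

hs = 0xD83D # 55357
ls = 0xDCA9 # 56489

# Pack both code units as big-endian 16-bit integers, label the bytes as
# UTF-16BE, and let Ruby decode the pair into a single character.
decoded = [hs, ls].pack('n*').force_encoding(Encoding::UTF_16BE).encode(Encoding::UTF_8)

p code_point_from(hs, ls) # 128169
p decoded.ord             # 128169
```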

.high_surrogate(codepoint) ⇒ Integer

Calculate the high surrogate based on the Unicode code point.

Examples:

Unisec::Surrogates.high_surrogate(128169) # => 55357

Parameters:

  • codepoint (Integer)

    decimal codepoint

Returns:

  • (Integer)

    decimal high surrogate



# File 'lib/unisec/surrogates.rb', line 53

def self.high_surrogate(codepoint)
  (((codepoint - 0x10000) / 0x400).floor + 0xd800)
end
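
Since 0x400 is 2^10, the division in the method above selects the top 10 bits of the 20-bit offset (code point minus 0x10000). Ruby's Integer division already rounds down, so the same value can be obtained with a right shift. The following is a minimal sketch of that equivalence; `high_surrogate_shift` is a hypothetical name, not a Unisec method.

```ruby
# Hypothetical standalone helper, not part of Unisec: the same calculation
# written with a 10-bit right shift instead of a division by 0x400.
def high_surrogate_shift(codepoint)
  ((codepoint - 0x10000) >> 10) + 0xD800
end

cp = 0x1F4A9 # 128169
offset = cp - 0x10000

p offset / 0x400           # 61
p offset >> 10             # 61
p high_surrogate_shift(cp) # 55357
```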

.low_surrogate(codepoint) ⇒ Integer

Calculate the low surrogate based on the Unicode code point.

Examples:

Unisec::Surrogates.low_surrogate(128169) # => 56489

Parameters:

  • codepoint (Integer)

    decimal codepoint

Returns:

  • (Integer)

    decimal low surrogate



# File 'lib/unisec/surrogates.rb', line 62

def self.low_surrogate(codepoint)
  (((codepoint - 0x10000) % 0x400) + 0xdc00)
end
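
The modulo in the method above keeps the bottom 10 bits of the offset, which is the same as masking with 0x3FF. As a hedged sketch (the helper name `low_surrogate_mask` is hypothetical), the mask form is shown below next to the code units Ruby produces when it encodes the same character as UTF-16BE.

```ruby
# Hypothetical standalone helper, not part of Unisec: the same calculation
# written with a bit mask instead of a modulo.
def low_surrogate_mask(codepoint)
  ((codepoint - 0x10000) & 0x3FF) + 0xDC00
end

cp = 0x1F4A9 # 128169

# Build the character, encode it as UTF-16BE and read back the 16-bit units.
units = [cp].pack('U').encode(Encoding::UTF_16BE).unpack('n*')

p low_surrogate_mask(cp) # 56489
p units                  # [55357, 56489]
```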

Instance Method Details

#code_point ⇒ Integer

Calculate the code point from the high and low surrogates and store it in #cp. Returns the same value as accessing #cp.

Examples:

surr = Unisec::Surrogates.new(55357, 56489)
surr.code_point # => 128169

Returns:

  • (Integer)

    decimal code point



# File 'lib/unisec/surrogates.rb', line 98

def code_point
  @cp = Surrogates.code_point(@hs, @ls)
end

#display ⇒ String

Return CLI-friendly output summarizing everything about the surrogates: the corresponding character, code point, high and low surrogates (each displayed as hexadecimal, decimal and binary).

Examples:

surr = Unisec::Surrogates.new(128169)
puts surr.display # =>
# Char: 💩
# Code Point: 0x1F4A9, 0d128169, 0b11111010010101001
# High Surrogate: 0xD83D, 0d55357, 0b1101100000111101
# Low Surrogate: 0xDCA9, 0d56489, 0b1101110010101001

Returns:

  • (String)

    CLI-ready output



# File 'lib/unisec/surrogates.rb', line 113

def display
  "Char: #{[@cp].pack('U*')}\n" \
    "Code Point: 0x#{@cp.to_hex}, 0d#{@cp}, 0b#{@cp.to_bin}\n" \
    "High Surrogate: 0x#{@hs.to_hex}, 0d#{@hs}, 0b#{@hs.to_bin}\n" \
    "Low Surrogate: 0x#{@ls.to_hex}, 0d#{@ls}, 0b#{@ls.to_bin}"
end
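
The `to_hex` and `to_bin` calls above are not core Ruby Integer methods, so they must be supplied elsewhere in the library or its dependencies. As a rough sketch of the same line format using only core Ruby, the hypothetical helper below relies on `Integer#to_s` with a radix.

```ruby
# Hypothetical helper, not part of Unisec: format an integer the way the
# output above does, using only core Ruby.
def radix_summary(label, value)
  "#{label}: 0x#{value.to_s(16).upcase}, 0d#{value}, 0b#{value.to_s(2)}"
end

puts radix_summary('Code Point', 128169)
# Code Point: 0x1F4A9, 0d128169, 0b11111010010101001
puts radix_summary('High Surrogate', 55357)
# High Surrogate: 0xD83D, 0d55357, 0b1101100000111101
puts radix_summary('Low Surrogate', 56489)
# Low Surrogate: 0xDCA9, 0d56489, 0b1101110010101001
```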

#high_surrogate ⇒ Integer

Calculate the high surrogate from the code point and store it in #hs. Returns the same value as accessing #hs.

Examples:

surr = Unisec::Surrogates.new(128169)
surr.high_surrogate # => 55357

Returns:

  • (Integer)

    decimal high surrogate



# File 'lib/unisec/surrogates.rb', line 81

def high_surrogate
  @hs = Surrogates.high_surrogate(@cp)
end

#low_surrogate ⇒ Integer

Calculate the low surrogate from the code point and store it in #ls. Returns the same value as accessing #ls.

Examples:

surr = Unisec::Surrogates.new(128169)
surr.low_surrogate # => 56489

Returns:

  • (Integer)

    decimal low surrogate



# File 'lib/unisec/surrogates.rb', line 90

def low_surrogate
  @ls = Surrogates.low_surrogate(@cp)
end