mmmdpp, architecture doc finished

Yessiest 2025-03-02 18:00:47 +04:00
parent f3d049feb2
commit e418796cfe
4 changed files with 226 additions and 507 deletions


@ -21,6 +21,13 @@ This parser processes text in what can be boiled down to three phases.
- Overlay phase
- Inline phase
It should be noted that all phases have their own related parser
classes and a shared behaviour system: each parser takes control
at some point, and may win ambiguous cases by having a higher priority
(see the priority parameter of the `#define_child` and `#define_overlay`
methods).
### Block/Line phase ###
The first phase breaks down blocks, line by line, into block structures.
Blocks (preferably inherited from the Block class) can contain other blocks.
(i.e. QuoteBlock, ULBlock, OLBlock). Other blocks (known as leaf blocks)
@ -89,4 +96,183 @@ While parsing text, a block may use additional info:
in lazy continuation mode (likely only ever matters for Paragraph); and
`parent` - the parent block containing this block.
Block interpretations are tried in decreasing order of their priority
value, as applied using the `#define_child` method.
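The priority ordering described above can be sketched as follows. This is a minimal illustration, not PointBlank's actual implementation: every `Sketch*` class, the `begin?` predicate and the exact `define_child` signature shown here are assumptions made for the example.

```ruby
# Hypothetical stand-in for a block class registry (not the real API).
class SketchBlock
  # Each class keeps its own list of [priority, child_class] pairs.
  def self.children
    @children ||= []
  end

  # Register a child class with a priority value (assumed signature).
  def self.define_child(child, priority = 0)
    children << [priority, child]
  end

  # Candidate child classes, highest priority value first.
  def self.candidates
    children.sort_by { |priority, _| -priority }.map(&:last)
  end

  # By default a block class does not recognize any line.
  def self.begin?(_line)
    false
  end
end

class SketchQuote < SketchBlock
  def self.begin?(line)
    line.start_with?('>')
  end
end

class SketchPara < SketchBlock
  def self.begin?(line)
    !line.strip.empty?
  end
end

class SketchDocument < SketchBlock
  define_child SketchPara, 1
  define_child SketchQuote, 10

  # Return the first candidate, in decreasing priority, that accepts the line.
  def self.interpret(line)
    candidates.find { |klass| klass.begin?(line) }
  end
end

p SketchDocument.interpret('> quoted') # SketchQuote
p SketchDocument.interpret('plain')    # SketchPara
```

Both classes accept the line `> quoted`, so the higher priority value is what decides the ambiguous case.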
For blocks to be properly indexed, they need to be a valid child or
a valid descendant (meaning reachable through child chain) of the
Document class.
### Overlay phase ###
The overlay phase doesn't start at some specific point in time. Rather,
it happens for every block individually, when that block closes.
The overlay mechanism can be applied to any DOMObject type, so long as its
close method is called at some point (this may not be of interest to
people that do not implement custom syntax, as it generally translates
to "only block level elements get their overlays processed").
The overlay mechanism provides the ability to perform some action on the
block right after it gets closed and right before it gets interpreted by
the inline phase. Overlays may do the following:
- Change the block's class
(by returning a class from the `#process` method)
- Change the block's content (by directly editing it)
- Change the block's properties (by modifying its `properties` hash)
Overlay interpretations are tried in decreasing order of their priority
value, as defined using the `#define_overlay` method.
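A rough sketch of what an overlay might look like, under stated assumptions: the `Overlay*` classes, the `close_block` helper and the rule that the first returned class wins are all hypothetical and only illustrate the three kinds of changes listed above.

```ruby
# Hypothetical block types, for illustration only.
class OverlayPara
  attr_accessor :content, :properties

  def initialize(content)
    @content = content
    @properties = {}
  end
end

class OverlayHeading < OverlayPara; end

# An overlay that retags a paragraph as a heading when its last line is
# a run of "=" characters (loosely modelled on setext headings).
class OverlayHeadingRule
  # Returns a class to switch the block to, or nil to leave it alone.
  def self.process(block)
    lines = block.content.lines.map(&:chomp)
    return nil unless lines.length > 1 && lines.last.match?(/\A=+\z/)

    block.content = lines[0..-2].join("\n") # change the content directly
    block.properties['level'] = 1           # change the properties hash
    OverlayHeading                          # request a class change
  end
end

# Try overlays in decreasing priority; the first returned class is applied
# (stopping at the first match is an assumption of this sketch).
def close_block(block, overlays)
  overlays.sort_by { |priority, _| -priority }.each do |_, overlay|
    new_class = overlay.process(block)
    next unless new_class

    replaced = new_class.new(block.content)
    replaced.properties = block.properties
    return replaced
  end
  block
end

closed = close_block(OverlayPara.new("Title\n====="), [[0, OverlayHeadingRule]])
p closed.class   # OverlayHeading
p closed.content # "Title"
```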
### Inline phase ###
Once all blocks have been processed, and all overlays have been applied
to their respective block types, the hook in the Document class's
`#parser` method executes the inline parsing phase of all leaf blocks
(descendants of the `Leaf` class) and paragraphs.
The outer class encompassing all inline children of a block is
`InlineRoot`. As such, if an inline element is to ever appear within the
text, it needs to be reachable as a child or a descendant of InlineRoot.
Inline parsing works in three parts:
- First, the contents are tokenized (every parser marks its own tokens)
- Second, the forward walk procedure is called
- Third, the reverse walk procedure is called
This process is repeated for every group of parsers with equal priority.
Each step runs only the parsers that share one priority value. The
process then moves on to the group with the next higher priority value.
As counter-intuitive as this is, it means that later steps run the
parsers of _lower_ priority.
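The grouping of inline parsers into steps could look roughly like this. The parser names and the `[name, priority]` pair layout are made up for the example; only the ordering rule (groups of equal priority, visited in increasing priority value) comes from the text above.

```ruby
# Hypothetical parsers as [name, priority value] pairs.
parsers = [[:code, 0], [:emphasis, 1], [:link, 0], [:strike, 1]]

# Group parsers by priority value, then visit the groups in increasing
# order of that value; each inner array is one step of the process.
steps = parsers.group_by { |_, priority| priority }
               .sort_by { |priority, _| priority }
               .map { |_, group| group.map(&:first) }

p steps # [[:code, :link], [:emphasis, :strike]]
```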
At the very end of the process, the remaining strings are concatenated
within the mixed array of inlines and strings, and turned into Text
nodes, after which the contents of the array are appended as children to
the root node.
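The final concatenation step might be sketched as below. `SketchText` and `SketchInline` are placeholder structs, not the real DOM classes, and the sketch assumes that only strings and already built inline objects remain in the array at this point.

```ruby
# Placeholder node types, for illustration only.
SketchText = Struct.new(:content)
SketchInline = Struct.new(:name)

# Merge every run of adjacent strings into a single text node and keep
# the already built inline objects where they are.
def finalize(parts)
  runs = parts.chunk_while { |a, b| a.is_a?(String) && b.is_a?(String) }
  runs.flat_map do |run|
    run.first.is_a?(String) ? [SketchText.new(run.join)] : run
  end
end

children = finalize(['a', 'b', SketchInline.new(:code), 'c'])
p children.length # 3
```

The resulting array is what would then be appended to the root node as its children.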
This process is recursively applied to all elements which may have child
elements. This is ensured when an inline parser calls the "build"
utility method.
The inline parser is a class that implements static methods `tokenize`
and either `forward_walk` or `reverse_walk`. Both may be implemented at
the same time, but this isn't advisable.
The tokenization process is characterized by calling every parser in the
current group with every string in tokens array using the `tokenize`
method. It is expected that the parser breaks the string down into an
array of other strings and tokens. A token is an array where the first
element is the literal text representation of the token, the second
value is the class of the parser, and the _last_ value (_not third_) is
the `:close` or `:open` symbol (though functionally it may hold any
symbol value). Any additional information the parser may need in later
stages may be stored between the second element and the last element.
Example:
Input:
"_this _is a string of_ tokens_"
Output:
[["_", ::PointBlank::Parsing::EmphInline, :open],
"this ",
["_", ::PointBlank::Parsing::EmphInline, :open],
"is a string of",
["_", ::PointBlank::Parsing::EmphInline, :close],
" tokens",
["_", ::PointBlank::Parsing::EmphInline, :close]]
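A minimal tokenizer that reproduces the example above could look like this. `SketchEmphParser` and `tokenize_all` are hypothetical names, and the open/close rule (a marker directly followed by a non-space character opens, anything else closes) is a deliberate simplification chosen for this sketch rather than the rule PointBlank uses.

```ruby
class SketchEmphParser
  # Split a string into plain strings and ["_", parser, :open/:close] tokens.
  def self.tokenize(string)
    parts = []
    rest = string
    until rest.empty?
      before, marker, rest = rest.partition('_')
      parts << before unless before.empty?
      break if marker.empty? # no more markers in this string

      # Simplified rule (assumption): a marker directly followed by a
      # non-space character opens, anything else closes.
      kind = rest.match?(/\A\S/) ? :open : :close
      parts << [marker, self, kind]
    end
    parts
  end
end

# Run one parser over every string in the array, leaving existing tokens
# untouched, as the main tokenization loop is described to do.
def tokenize_all(parts, parser)
  parts.flat_map { |part| part.is_a?(String) ? parser.tokenize(part) : [part] }
end

pp tokenize_all(['_this _is a string of_ tokens_'], SketchEmphParser)
```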
The forward walk is characterized by calling parsers which implement the
`#forward_walk` method. When the main class encounters an opening token
in `forward_walk`, it will call the `#forward_walk` method of the class
that represents this token. It is expected that the parser class will
then attempt to build the first available occurrence of the inline
element it represents, after which it will return the array of all
tokens and strings that it was passed where the first element will be
the newly constructed inline element. If it is unable to close the
block, it should simply return the original contents, unmodified.
Example:
Original text:
this is outside the inline `this is inside the inline` and this
is right after the inline `and this is the next inline`
Input:
[["`", ::PointBlank::Parsing::CodeInline, :open],
"this is inside the inline",
["`", ::PointBlank::Parsing::CodeInline, :close],
" and this is right after the inline ",
["`", ::PointBlank::Parsing::CodeInline, :open],
"and this is the next inline",
["`", ::PointBlank::Parsing::CodeInline, :close]]
Output:
[<::PointBlank::DOM::InlineCode
@content = "this is inside the inline">,
" and this is right after the inline ",
["`", ::PointBlank::Parsing::CodeInline, :open],
"and this is the next inline",
["`", ::PointBlank::Parsing::CodeInline, :close]]
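The forward walk contract can be sketched as follows, assuming the parser is handed an array whose first element is its own opening token. `SketchCodeParser` and `SketchCodeNode` are illustrative names, not the real `CodeInline` and `InlineCode` classes.

```ruby
# Placeholder DOM node, for illustration only.
SketchCodeNode = Struct.new(:content)

class SketchCodeParser
  # parts[0] is assumed to be this parser's opening token.
  def self.forward_walk(parts)
    closer = (1...parts.length).find do |i|
      part = parts[i]
      part.is_a?(Array) && part[1] == self && part.last == :close
    end
    return parts unless closer # nothing to close: hand back the input as is

    # Everything between the two tokens becomes literal content; foreign
    # tokens contribute their literal text (element 0).
    content = parts[1...closer].map { |part| part.is_a?(Array) ? part.first : part.to_s }.join
    [SketchCodeNode.new(content), *parts[(closer + 1)..]]
  end
end

input = [
  ['`', SketchCodeParser, :open],
  'this is inside the inline',
  ['`', SketchCodeParser, :close],
  ' and this is right after the inline ',
  ['`', SketchCodeParser, :open],
  'and this is the next inline',
  ['`', SketchCodeParser, :close]
]
output = SketchCodeParser.forward_walk(input)
p output.first.content # "this is inside the inline"
p output.length        # 5
```

Only the first occurrence is built; the second pair of tokens stays in the array for a later call.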
The reverse walk is characterized by calling parsers which implement the
`#reverse_walk` method when the main class encounters a closing token
for this class (the one that contains the `:close` symbol in the last
position of the token information array). After that the main class will
call the parser's `#reverse_walk` method with the current list of
tokens, inlines and strings. It is expected that the parser will then
collect all the blocks, strings and inlines that fit within the block
closed by the last element in the list, and once it encounters the
appropriate opening token for the closing token in the last position of
the array, it will then replace the elements fitting within that inline
with a class containing all the collected elements. If it is unable to
find a matching opening token for the closing token in the last
position, it should simply return the original contents, unmodified.
Example:
Original text:
blah blah something something lots of text before the emphasis
_this is emphasized `and this is an inline` but it's still
emphasized_
Input:
["blah blah something something lots of text before the emphasis",
["_", ::PointBlank::Parsing::EmphInline, :open],
"this is emphasized",
<::PointBlank::DOM::InlineCode,
@content = "and this is an inline">,
" but it's still emphasized",
["_", ::PointBlank::Parsing::EmphInline, :close]]
Output:
["blah blah something something lots of text before the emphasis",
<::PointBlank::DOM::InlineEmphasis,
children = [...,
<::PointBlank::DOM::InlineCode ...>
...]>]
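A comparable sketch for the reverse walk, assuming the parser receives an array whose last element is its own closing token. `SketchEmParser`, `SketchEmNode` and `SketchLeafNode` are hypothetical names used only in this example.

```ruby
# Placeholder DOM nodes, for illustration only.
SketchEmNode = Struct.new(:children)
SketchLeafNode = Struct.new(:content)

class SketchEmParser
  # parts.last is assumed to be this parser's closing token.
  def self.reverse_walk(parts)
    # Search backwards for the nearest opening token of this parser.
    opener = (parts.length - 2).downto(0).find do |i|
      part = parts[i]
      part.is_a?(Array) && part[1] == self && part.last == :open
    end
    return parts unless opener # no match: hand back the input as is

    # Everything between the opener and the closer becomes the children.
    children = parts[(opener + 1)...-1]
    [*parts[0...opener], SketchEmNode.new(children)]
  end
end

input = [
  'text before the emphasis ',
  ['_', SketchEmParser, :open],
  'this is emphasized ',
  SketchLeafNode.new('and this is an inline'),
  " but it's still emphasized",
  ['_', SketchEmParser, :close]
]
output = SketchEmParser.reverse_walk(input)
p output.length              # 2
p output.last.children.length # 3
```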
Neither `#forward_walk` nor `#reverse_walk` is restricted to making
just the changes discussed above; both can arbitrarily modify the token
arrays. That, however, should be done with great care, so as to not
accidentally break compatibility with other parsers.
To ensure that the collected tokens in the `#reverse_walk` and
`#forward_walk` are processed correctly, the collected arrays of
tokens, blocks and inlines should be built into an object that
represents this parser using the `build` method (it will automatically
attempt to find the correct class to construct using the
`#define_parser` directive in the DOMObject subclass definition).

479
bin/mdpp

@ -1,479 +0,0 @@
#!/usr/bin/ruby
# frozen_string_literal: true
require 'optparse'
require 'rbmark'
require 'io/console'
require 'io/console/size'
module MDPP
# Module for managing terminal output
module TextManager
# ANSI SGR escape code for bg color
# @param text [String]
# @param properties [Hash]
# @return [String]
def bg(text, properties)
color = properties['bg']
if color.is_a? Integer
"\e[48;5;#{color}m#{text}\e[49m"
elsif color.is_a? String and color.match?(/\A#[A-Fa-f0-9]{6}\Z/)
vector = color.scan(/[A-Fa-f0-9]{2}/).map { |x| x.to_i(16) }
"\e[48;2;#{vector[0]};#{vector[1]};#{vector[2]}m#{text}\e[49m"
else
Kernel.warn "WARNING: Invalid color - #{color}"
text
end
end
# ANSI SGR escape code for fg color
# @param text [String]
# @param properties [Hash]
# @return [String]
def fg(text, properties)
color = properties['fg']
if color.is_a? Integer
"\e[38;5;#{color}m#{text}\e[39m"
elsif color.is_a? String and color.match?(/\A#[A-Fa-f0-9]{6}\Z/)
vector = color.scan(/[A-Fa-f0-9]{2}/).map { |x| x.to_i(16) }
"\e[38;2;#{vector[0]};#{vector[1]};#{vector[2]}m#{text}\e[39m"
else
Kernel.warn "WARNING: Invalid color - #{color}"
text
end
end
# ANSI SGR escape code for bold text
# @param text [String]
# @return [String]
def bold(text)
"\e[1m#{text}\e[22m"
end
# ANSI SGR escape code for italics text
# @param text [String]
# @return [String]
def italics(text)
"\e[3m#{text}\e[23m"
end
# ANSI SGR escape code for underline text
# @param text [String]
# @return [String]
def underline(text)
"\e[4m#{text}\e[24m"
end
# ANSI SGR escape code for strikethrough text
# @param text [String]
# @return [String]
def strikethrough(text)
"\e[9m#{text}\e[29m"
end
# Word wrapping algorithm
# @param text [String]
# @param width [Integer]
# @return [String]
def wordwrap(text, width)
words = text.split(/ +/)
output = []
line = ""
until words.empty?
word = words.shift
if word.length > width
words.prepend(word[width..])
word = word[..width - 1]
end
if line.length + word.length + 1 > width
output.append(line.lstrip)
line = word
next
end
line = [line, word].join(line.end_with?("\n") ? '' : ' ')
end
output.append(line.lstrip)
output.join("\n")
end
# Draw a screen-width box around text
# @param text [String]
# @param center_margins [Integer]
# @return [String]
def box(text)
size = IO.console.winsize[1] - 2
text = wordwrap(text, (size * 0.8).floor).lines.filter_map do |line|
"│#{line.strip.ljust(size)}│" unless line.empty?
end.join("\n")
<<~TEXT
╭#{'─' * size}╮
#{text}
╰#{'─' * size}╯
TEXT
end
# Draw text right-justified
def rjust(text)
size = IO.console.winsize[1]
wordwrap(text, (size * 0.8).floor).lines.filter_map do |line|
line.strip.rjust(size) unless line.empty?
end.join("\n")
end
# Draw text centered
def center(text)
size = IO.console.winsize[1]
wordwrap(text, (size * 0.8).floor).lines.filter_map do |line|
line.strip.center(size) unless line.empty?
end.join("\n")
end
# Underline the last line of the text piece
def underline_block(text)
textlines = text.lines
last = "".match(/()()()/)
textlines.each do |x|
current = x.match(/\A(\s*)(.+?)(\s*)\Z/)
last = current if current[2].length > last[2].length
end
ltxt = last[1]
ctxt = textlines.last.slice(last.offset(2)[0]..last.offset(2)[1] - 1)
rtxt = last[3]
textlines[-1] = [ltxt, underline(ctxt), rtxt].join('')
textlines.join("")
end
# Add extra newlines around the text
def extra_newlines(text)
size = IO.console.winsize[1]
textlines = text.lines
textlines.prepend("#{' ' * size}\n")
textlines.append("\n#{' ' * size}\n")
textlines.join("")
end
# Underline last line edge to edge
def underline_full_block(text)
textlines = text.lines
textlines[-1] = underline(textlines.last)
textlines.join("")
end
# Indent all lines
def indent(text, properties)
_indent(text, level: properties['level'])
end
# Indent all lines (inner)
def _indent(text, **_useless)
text.lines.map do |line|
" #{line}"
end.join("")
end
# Bulletpoints
def bullet(text, _number, properties)
level = properties['level']
"-#{_indent(text, level: level)[1..]}"
end
# Numbers
def numbered(text, number, properties)
level = properties['level']
"#{number}.#{_indent(text, level: level)[number.to_s.length + 1..]}"
end
# Sideline for quotes
def sideline(text)
text.lines.map do |line|
"│ #{line}"
end.join("")
end
# Long bracket for code blocks
def longbracket(text, properties)
textlines = text.lines
textlines = textlines.map do |line|
"│ #{line}"
end
textlines.prepend("┌ (#{properties['element'][:language]})\n")
textlines.append("\n└\n")
textlines.join("")
end
# Add text to bibliography
def bibliography(text, properties)
return "#{text}[#{properties['element'][:link]}]" if @options['nb']
@bibliography.append([text, properties['element'][:link]])
"#{text}[#{@bibliography.length}]"
end
end
DEFAULT_STYLE = {
"RBMark::DOM::Paragraph" => {
"inline" => true,
"indent" => true
},
"RBMark::DOM::Text" => {
"inline" => true
},
"RBMark::DOM::Heading1" => {
"inline" => true,
"center" => true,
"bold" => true,
"extra_newlines" => true,
"underline_full_block" => true
},
"RBMark::DOM::Heading2" => {
"inline" => true,
"center" => true,
"underline_block" => true
},
"RBMark::DOM::Heading3" => {
"inline" => true,
"underline" => true,
"bold" => true,
"indent" => true
},
"RBMark::DOM::Heading4" => {
"inline" => true,
"underline" => true,
"indent" => true
},
"RBMark::DOM::InlineImage" => {
"bibliography" => true,
"inline" => true
},
"RBMark::DOM::InlineLink" => {
"bibliography" => true,
"inline" => true
},
"RBMark::DOM::InlinePre" => {
"inline" => true
},
"RBMark::DOM::InlineStrike" => {
"inline" => true,
"strikethrough" => true
},
"RBMark::DOM::InlineUnder" => {
"inline" => true,
"underline" => true
},
"RBMark::DOM::InlineItalics" => {
"inline" => true,
"italics" => true
},
"RBMark::DOM::InlineBold" => {
"inline" => true,
"bold" => true
},
"RBMark::DOM::QuoteBlock" => {
"sideline" => true
},
"RBMark::DOM::CodeBlock" => {
"longbracket" => true
},
"RBMark::DOM::ULBlock" => {
"bullet" => true
},
"RBMark::DOM::OLBlock" => {
"numbered" => true
},
"RBMark::DOM::HorizontalRule" => {
"extra_newlines" => true
},
"RBMark::DOM::IndentBlock" => {
"indent" => true
}
}.freeze
STYLE_PRIO0 = [
["numbered", true],
["bullet", true]
].freeze
STYLE_PRIO1 = [
["center", false],
["rjust", false],
["box", false],
["indent", true],
["underline", false],
["bold", false],
["italics", false],
["strikethrough", false],
["bg", true],
["fg", true],
["bibliography", true],
["extra_newlines", false],
["sideline", false],
["longbracket", true],
["underline_block", false],
["underline_full_block", false]
].freeze
# Primary document renderer
class Renderer
include ::MDPP::TextManager
# @param input [String]
# @param options [Hash]
def initialize(input, options)
@doc = RBMark::DOM::Document.parse(input)
@style = ::MDPP::DEFAULT_STYLE.dup
@bibliography = []
@options = options
return unless options['style']
@style = @style.map do |k, v|
v = v.merge(**options['style'][k]) if options['style'][k]
[k, v]
end.to_h
end
# Return rendered text
# @return [String]
def render
text = _render(@doc.children, @doc.properties)
text += _render_bibliography unless @bibliography.empty? or
@options['nb']
text
end
private
def _render_bibliography
size = IO.console.winsize[1]
text = "\n#{'─' * size}\n"
text += @bibliography.map.with_index do |element, index|
"- [#{index + 1}] #{wordwrap(element.join(': '), size - 15)}"
end.join("\n")
text
end
def _render(children, props)
blocks = children.map do |child|
case child
when ::RBMark::DOM::Text then child.content
when ::RBMark::DOM::InlineBreak then "\n"
when ::RBMark::DOM::HorizontalRule
size = IO.console.winsize[1]
"─" * size
else
child_props = get_props(child, props)
calc_wordwrap(
_render(child.children,
child_props),
props, child_props
)
end
end
apply_props(blocks, props)
end
def calc_wordwrap(obj, props, obj_props)
size = IO.console.winsize[1]
return obj if obj_props['center'] or
obj_props['rjust']
if !props['inline'] and obj_props['inline']
wordwrap(obj, size - 2 * (props['level'].to_i + 1))
else
obj
end
end
def get_props(obj, props)
new_props = @style[obj.class.to_s].dup || {}
if props["level"]
new_props["level"] = props["level"]
new_props["level"] += 1 unless new_props["inline"]
else
new_props["level"] = 2
end
new_props["element"] = obj.properties
new_props
end
def apply_props(blockarray, properties)
blockarray = prio0(blockarray, properties)
text = blockarray.join(properties['inline'] ? "" : "\n\n")
.gsub(/\n{2,}/, "\n\n")
prio1(text, properties)
end
def prio0(blocks, props)
::MDPP::STYLE_PRIO0.filter { |x| props.include? x[0] }.each do |style|
blocks = blocks.map.with_index do |block, index|
if style[1]
method(style[0].to_s).call(block, index + 1, props)
else
method(style[0].to_s).call(block, index + 1)
end
end
end
blocks
end
def prio1(block, props)
::MDPP::STYLE_PRIO1.filter { |x| props.include? x[0] }.each do |style|
block = if style[1]
method(style[0].to_s).call(block, props)
else
method(style[0].to_s).call(block)
end
end
block
end
end
end
options = {}
OptionParser.new do |opts|
opts.banner = <<~TEXT
MDPP - Markdown PrettyPrint based on RBMark parser
Usage: mdpp [options] <file | ->
TEXT
opts.on("-h", "--help", "Prints this help message") do
puts opts
exit 0
end
opts.on("-e", "--extension EXTENSION",
"require EXTENSION before parsing") do |libname|
require libname
end
opts.on(
"-c",
"--config CONFIG",
"try to load CONFIG (~/.config/mdpp.rb is loaded by default)"
) do |config|
# rubocop:disable Security/Eval
options.merge!(eval(File.read(config))) if File.exist?(config)
# rubocop:enable Security/Eval
end
opts.on(
"-b",
"--no-bibliography",
"Do not print bibliography (links, references, etc.) at the bottom"
) do
options["nb"] = true
end
end.parse!
# rubocop:disable Security/Eval
if File.exist?("#{ENV['HOME']}/.config/mdpp.rb")
options.merge!(eval(File.read("#{ENV['HOME']}/.config/mdpp.rb")))
end
# rubocop:enable Security/Eval
text = if ARGV[0].nil? or ARGV[0] == "-"
$stdin.read
else
File.read(ARGV[0])
end
renderer = MDPP::Renderer.new(text, options)
puts renderer.render


@ -1,7 +1,7 @@
 #!/usr/bin/ruby
 # frozen_string_literal: true
-require_relative 'document'
+require_relative 'lib/blankshell'
 require 'io/console'
 require 'io/console/size'
@ -185,63 +185,76 @@ module MDPP
 end
 DEFAULT_STYLE = {
-"RBMark::DOM::Paragraph" => {
+"PointBlank::DOM::Paragraph" => {
 "inline" => true,
 "indent" => true
 },
-"RBMark::DOM::Text" => {
+"PointBlank::DOM::Text" => {
 "inline" => true
 },
-"RBMark::DOM::Heading1" => {
+"PointBlank::DOM::SetextHeading1" => {
 "inline" => true,
 "center" => true,
 "bold" => true,
 "extra_newlines" => true,
 "underline_full_block" => true
 },
-"RBMark::DOM::Heading2" => {
+"PointBlank::DOM::SetextHeading2" => {
 "inline" => true,
 "center" => true,
 "underline_block" => true
 },
-"RBMark::DOM::Heading3" => {
+"PointBlank::DOM::ATXHeading1" => {
+"inline" => true,
+"center" => true,
+"bold" => true,
+"extra_newlines" => true,
+"underline_full_block" => true
+},
+"PointBlank::DOM::ATXHeading2" => {
+"inline" => true,
+"center" => true,
+"underline_block" => true
+},
+"PointBlank::DOM::ATXHeading3" => {
 "inline" => true,
 "underline" => true,
 "bold" => true
 },
-"RBMark::DOM::Heading4" => {
+"PointBlank::DOM::ATXHeading4" => {
+"inline" => true,
+"bold" => true,
+"underline" => true
+},
+"PointBlank::DOM::ATXHeading5" => {
+"inline" => true,
+"underline" => true,
+},
+"PointBlank::DOM::ATXHeading6" => {
 "inline" => true,
 "underline" => true
 },
-"RBMark::DOM::InlineImage" => {
+"PointBlank::DOM::InlineImage" => {
 "inline" => true
 },
-"RBMark::DOM::InlineLink" => {
+"PointBlank::DOM::InlineLink" => {
 "inline" => true
 },
-"RBMark::DOM::InlinePre" => {
+"PointBlank::DOM::InlinePre" => {
 "inline" => true
 },
-"RBMark::DOM::InlineStrike" => {
-"inline" => true,
-"strikethrough" => true
-},
-"RBMark::DOM::InlineUnder" => {
-"inline" => true,
-"underline" => true
-},
-"RBMark::DOM::InlineItalics" => {
+"PointBlank::DOM::InlineEmphasis" => {
 "inline" => true,
 "italics" => true
 },
-"RBMark::DOM::InlineBold" => {
+"PointBlank::DOM::InlineStrong" => {
 "inline" => true,
 "bold" => true
 },
-"RBMark::DOM::ULBlock" => {
+"PointBlank::DOM::ULBlock" => {
 "bullet" => true
 },
-"RBMark::DOM::OLBlock" => {
+"PointBlank::DOM::OLBlock" => {
 "numbered" => true
 }
 }.freeze
@ -274,8 +287,7 @@ module MDPP
 # @param input [String]
 # @param options [Hash]
 def initialize(input, options)
-@doc = RBMark::DOM::Document.parse(input)
-pp @doc
+@doc = PointBlank::DOM::Document.parse(input)
 @color_mode = options.fetch("color", true)
 @ansi_mode = options.fetch("ansi", true)
 @style = ::MDPP::DEFAULT_STYLE.dup
@ -297,10 +309,10 @@ module MDPP
 def _render(children, props)
 blocks = children.map do |child|
-if child.is_a? ::RBMark::DOM::Text or
-child.is_a? ::RBMark::DOM::CodeBlock
+if child.is_a? ::PointBlank::DOM::Text or
+child.is_a? ::PointBlank::DOM::CodeBlock
 child.content
-elsif child.is_a? ::RBMark::DOM::InlineBreak
+elsif child.is_a? ::PointBlank::DOM::InlineBreak
 "\n"
 else
 child_props = get_props(child, props)