Compare commits

25 Commits

Author SHA1 Message Date
Yessiest 4c484f64fc better emphasis behaviour compatibility 2012-01-01 04:03:02 +04:00
Yessiest 4d4d8cfc4c more compatibility fixes 2025-03-13 10:57:52 +04:00
Yessiest 0c7be01e11 paragraph underlining algorithm correction 2025-03-13 10:23:21 +04:00
Yessiest 0863d4cf4a compatibility fixes for emphasis 2025-03-13 10:19:44 +04:00
Yessiest f03f8dfa29 oops forgot the garbage 2025-03-10 03:04:00 +04:00
Yessiest 01d88135dc fixed parsing of title-less links 2025-03-10 03:00:23 +04:00
Yessiest 74edbf603e fix for horizontal rule rendering in plainterm 2025-03-10 02:38:10 +04:00
Yessiest 5a302976aa properly handling uri emplacement 2025-03-07 23:58:14 +00:00
Yessiest c31365115b proper autolinks in html 2025-03-07 23:49:11 +00:00
Yessiest acf03f6b36 finally this gem builds 2025-03-07 23:20:26 +00:00
Yessiest 4a3c55f00f slimmy 2025-03-07 23:19:18 +00:00
Yessiest 9ac12573a3 BIG mode 2025-03-07 23:11:28 +00:00
Yessiest 41680e45e1 extra flags for inline image and inline link 2025-03-07 21:32:51 +00:00
Yessiest 3fd7e48907 security considerations document 2025-03-07 21:30:23 +00:00
Yessiest 1fb5f15ead HTML renderer fixes, additional compliance 2025-03-07 21:29:24 +00:00
Yessiest 06e861ffcd HTML entity encoding implemented, HTML renderer implemented 2025-03-07 18:35:56 +00:00
Yessiest 65471b5a1b better plaintext renderer 2025-03-05 01:04:21 +04:00
Yessiest e418796cfe mmmdpp, architecture doc finished 2025-03-02 18:00:47 +04:00
Yessiest f3d049feb2 fixes for list parsing 2025-03-02 13:38:25 +04:00
Yessiest 7b8590b9c6 probably proper compatibility 2025-03-01 23:13:30 +00:00
Yessiest af93de6f4d extra minute details regarding proper parsing 2025-03-01 21:51:08 +00:00
Yessiest 1a9dd30112 it's all downhill from here 2025-03-01 19:54:20 +00:00
Yessiest 8b63d77006 dam 2025-01-24 22:07:26 +00:00
Yessiest 468bd043ca damn 2025-01-24 22:07:17 +00:00
Yessiest 40e9144010 no 2024-12-29 22:41:55 +04:00
25 changed files with 5580 additions and 1998 deletions

4
.gitignore vendored Normal file

@ -0,0 +1,4 @@
/lib/example.md
/*.gem
/.yardoc
test*


@ -1,3 +1,3 @@
# rubymark
Minimalistic modular markdown parser in Ruby
Modular, compliant Markdown parser in Ruby

278
architecture.md Normal file

@ -0,0 +1,278 @@
Architecture of madness
=======================
Prelude
-------
It needs to be stressed that making the parser modular while keeping it
relatively simple was a laborious undertaking. There has not been a standard
more hostile towards the people who dare attempt to implement it than
CommonMark. It should also be noted that, despite being titled a
"Standard" in this document, it is less widely adopted than the GitHub
Flavored Markdown syntax. GitHub Flavored Markdown, however, is merely
a subset of this parser's model, albeit requiring a few extensions.
Current state (as of March 02, 2025)
------------------------------------
This parser processes text in what can be boiled down to three phases.
- Block/Line phase
- Overlay phase
- Inline phase
It should be noted that all phases have their own related parser
classes, and a shared behaviour system, where each parser takes control
at some point, and may win ambiguous cases by having higher priority
(see the `#define_child` and `#define_overlay` methods for the priority
parameter).
### Block/Line phase ###
The first phase breaks down blocks, line by line, into block structures.
Blocks (preferably inherited from the Block class) can contain other blocks
(e.g. QuoteBlock, ULBlock, OLBlock). Other blocks (known as leaf blocks)
may not contain anything else (except inline content, more on that later).
Blocks are designed to be parsed independently. This means that it *should*
be possible to tear out any standard block and make it not get parsed.
This, however, isn't thoroughly tested for.
Blocks as proper, real classes have a certain lifecycle to follow when
being constructed:
1. Open condition
- A block needs to find its first marker on the current line to open
(see `#begin?` method)
- Once it's open, it's immediately initialized and fed the line it just
read (but now as an object, not as a class) (see `#consume` method)
2. Marker/Line consumption
- While it should be kept open, the block parser instance will
keep reading input through the `#consume` method, returning a pair
of modified line (after consuming its tokens from it) and
a boolean value indicating permission of lazy continuation
(if it's a block like a QuoteBlock or ULBlock that can be lazily
overflowed).
Every line the parser needs to record needs to be pushed
through the `#push` method.
3. Closure
- If the current line no longer belongs to the current block
(if the block should have been closed on the previous line),
it simply needs to `return` a pair of `nil`, and a boolean value for
permission of lazy continuation
- If a block should be closed on the current line, it should capture it,
keep track of the "closed" state, then `return` `nil` on the next call
of `#consume`
- Once a block is closed, it:
1. Receives its content from the parser
2. Parser receives the "close" method call
3. (optional) Parser may have a callable method `#applyprops`. If
it exists, it gets called with the current constructed block.
4. (optional) All overlays assigned to this block's class are
processed on the contents of this block (more on that in
Overlay phase)
5. (optional) Parser may return a different class, which
the current block should be cast into (Overlays may change
the class as well)
6. (optional) If a block can respond to `#parse_inner` method, it
will get called, allowing the block to parse its own contents.
- After this point, the block is no longer touched until the document
fully gets processed.
4. Inline processing
- (Applies only to Paragraph and any child of LeafBlock)
When the document gets fully processed, the contents of the current
block are taken, assigned to an InlineRoot instance, and then parsed
in Inline mode
5. Completion
- The resulting document is then returned.
While there is a lot of functionality available in designing blocks, it is
not necessary for the simplest of the block kinds available. The simplest
example of a block parser is likely the ThematicBreakParser class, which
implements the only 2 methods needed for a block parser to function.
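The lifecycle above can be sketched with a deliberately tiny parser. This is only an illustration under assumptions: `SketchThematicBreak` and `run_block` are hypothetical names, the regular expression is a simplification, and the real ThematicBreakParser in `blankshell.rb` may look quite different. It only shows the two-method contract (`begin?` to open, `consume` to read lines and signal closure with `nil`).

```ruby
# Hypothetical sketch; NOT the actual ThematicBreakParser from blankshell.rb.
class SketchThematicBreak
  # Open condition: three or more identical -, * or _ markers, optionally
  # separated by spaces, with at most three leading spaces.
  def self.begin?(line)
    line.match?(/\A {0,3}([-*_])(?: *\1){2,} *\Z/)
  end

  def initialize
    @closed = false
  end

  # Returns a pair: [line_or_nil, lazy_continuation_allowed].
  def consume(line, _parent = nil, lazy: false)
    return [nil, false] if @closed || lazy

    @closed = true # a thematic break always spans exactly one line
    [line, false]
  end
end

# Hypothetical driver: collect the lines that the block claims for itself.
def run_block(parser_class, lines)
  return [] unless parser_class.begin?(lines.first)

  parser = parser_class.new
  lines.take_while { |line| parser.consume(line).first }
end

p run_block(SketchThematicBreak, ["---", "text"])
```

The block captures its single line, remembers that it is closed, and answers `nil` on the next call, which is the "close on the current line" variant of the Closure step.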
While parsing text, a block may use additional info:
- In consume method: `lazy` hasharg, if the current line is being processed
in lazy continuation mode (likely only ever matters for Paragraph); and
`parent` - the parent block containing this block.
Block interpretations are tried in decreasing order of their priority
value, as applied using the `#define_child` method.
For blocks to be properly indexed, they need to be a valid child or
a valid descendant (meaning reachable through child chain) of the
Document class.
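As a rough illustration of priority-ordered selection, the sketch below keeps candidates in a registry and picks the first one that opens on a line. `SketchRegistry`, `RuleSketch` and `ParagraphSketch` are hypothetical, and the real `#define_child` signature is an assumption here; only the "decreasing priority value" ordering is taken from the text.

```ruby
# Hypothetical registry; the real #define_child signature may differ.
class SketchRegistry
  def initialize
    @children = []
  end

  # Register a candidate parser together with its priority value.
  def define_child(parser, priority)
    @children << [priority, parser]
    self
  end

  # Try candidates in decreasing priority order; the first to open wins.
  def pick(line)
    @children.sort_by { |priority, _| -priority }
             .map { |_, parser| parser }
             .find { |parser| parser.begin?(line) }
  end
end

# Two toy parsers that both accept a line of dashes.
module RuleSketch
  def self.begin?(line)
    line.match?(/\A-{3,}\z/)
  end
end

module ParagraphSketch
  def self.begin?(line)
    !line.strip.empty?
  end
end

registry = SketchRegistry.new
                         .define_child(ParagraphSketch, 1)
                         .define_child(RuleSketch, 10)
p registry.pick("---")
p registry.pick("text")
```

Both toy parsers would accept `---`, so the higher priority value is what resolves the ambiguity in favour of the rule.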
### Overlay phase ###
Overlay phase doesn't start at some specific point in time. Rather,
Overlay phase happens for every block individually - when that block
closes.
Overlay mechanism can be applied to any DOMObject type, so long as its
close method is called at some point (this may not be of interest to
people that do not implement custom syntax, as it generally translates
to "only block level elements get their overlays processed")
Overlay mechanism provides the ability to perform some action on the block
right after it gets closed and right before it gets interpreted by the
inline phase. Overlays may do the following:
- Change the block's class
(by returning a class from the `#process` method)
- Change the block's content (by directly editing it)
- Change the block's properties (by modifying its `properties` hash)
Overlay interpretations are tried in decreasing order of their priority
value, as defined using the `#define_overlay` method.
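A minimal sketch of the three overlay capabilities listed above, under loud assumptions: `SketchBlock`, `SetextOverlaySketch` and `apply_overlays` are hypothetical, and the block "class" is modelled as a plain symbol to keep the example self-contained. The only thing borrowed from the text is that `process` may return a replacement class and may edit content and properties directly.

```ruby
# Hypothetical block stand-in; :type plays the role of the block's class.
SketchBlock = Struct.new(:type, :content, :properties)

# Hypothetical overlay: turns a paragraph whose last line is a run of "="
# into a heading.
module SetextOverlaySketch
  def self.process(block)
    lines = block.content.lines(chomp: true)
    return nil unless lines.length > 1 && lines.last.match?(/\A=+\s*\z/)

    block.content = lines[0..-2].join("\n") # change the block's content
    block.properties[:level] = 1            # change the block's properties
    :heading                                # request a class change
  end
end

# Overlays are tried in decreasing order of their priority value.
def apply_overlays(block, overlays)
  overlays.sort_by { |priority, _| -priority }.each do |_, overlay|
    new_type = overlay.process(block)
    block.type = new_type if new_type
  end
  block
end

block = SketchBlock.new(:paragraph, "Title\n=====", {})
p apply_overlays(block, [[1, SetextOverlaySketch]])
```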
### Inline phase ###
Once all blocks have been processed, and all overlays have been applied
to their respective block types, the hook in the Document class's
`#parser` method executes inline parsing phase of all leaf blocks
(descendants of the `LeafBlock` class) and paragraphs.
The outer class encompassing all inline children of a block is
`InlineRoot`. As such, if an inline element is to ever appear within the
text, it needs to be reachable as a child or a descendant of InlineRoot.
Inline parsing works in three parts:
- First, the contents are tokenized (every parser marks its own tokens)
- Second, the forward walk procedure is called
- Third, the reverse walk procedure is called
This process is repeated for every group of parsers with equal priority.
Only parsers of equal priority run in the same step. The process then
moves on to the next step, with the parsers of the next higher priority
value. As counter-intuitive as this is, this means that it goes to the
parsers of _lower_ priority.
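The grouping described above can be sketched as follows. `inline_steps` and the parser names are hypothetical placeholders; the sketch only assumes that each inline parser carries a numeric priority value and that steps run in ascending order of that value.

```ruby
# Hypothetical helper: turn [priority, parser] pairs into ordered steps.
# Parsers sharing a priority value end up in the same step.
def inline_steps(parsers)
  parsers.group_by(&:first)
         .sort_by(&:first)
         .map { |_, group| group.map(&:last) }
end

steps = inline_steps([[1, :code], [2, :link], [1, :autolink], [3, :emphasis]])
steps.each_with_index do |group, index|
  puts "step #{index + 1}: #{group.join(', ')}"
end
```

Each step would then run tokenization, the forward walk and the reverse walk for its own group before the next group gets a turn.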
At the very end of the process, the remaining strings are concatenated
within the mixed array of inlines and strings, and turned into Text
nodes, after which the contents of the array are appended as children to
the root node.
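The final concatenation step might look roughly like this. `TextSketch` and `merge_strings` are hypothetical stand-ins (the real Text node class lives under `::PointBlank::DOM`); the sketch assumes the mixed array contains only strings and already built inline objects at this point.

```ruby
# Hypothetical stand-in for the DOM Text node.
TextSketch = Struct.new(:content)

# Merge each run of adjacent strings into a single text node and leave
# everything else (already built inline elements) untouched.
def merge_strings(parts)
  parts.chunk_while { |a, b| a.is_a?(String) && b.is_a?(String) }
       .map { |run| run.first.is_a?(String) ? TextSketch.new(run.join) : run.first }
end

inline = Object.new # placeholder for an inline element
p merge_strings(["a", "b", inline, "c"])
```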
This process is recursively applied to all elements which may have child
elements. This is ensured when an inline parser calls the "build"
utility method.
The inline parser is a class that implements static methods `tokenize`
and either `forward_walk` or `reverse_walk`. Both may be implemented at
the same time, but this isn't advisable.
The tokenization process is characterized by calling every parser in the
current group with every string in tokens array using the `tokenize`
method. It is expected that the parser breaks the string down into an
array of other strings and tokens. A token is an array where the first
element is the literal text representation of the token, the second
value is the class of the parser, and the _last_ value (_not third_) is
the `:close` or `:open` symbol (though functionally it may hold any
symbol value). Any additional information the parser may need in later
stages may be stored between the last element and the second element.
Example:
Input:
"_this _is a string of_ tokens_"
Output:
[["_", ::PointBlank::Parsing::EmphInline, :open],
"this ",
["_", ::PointBlank::Parsing::EmphInline, :open],
"is a string of",
["_", ::PointBlank::Parsing::EmphInline, :close],
" tokens",
["_", ::PointBlank::Parsing::EmphInline, :close]]
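A toy tokenizer that reproduces the shape of the output above is sketched here. `EmphSketch` is a hypothetical stand-in for `::PointBlank::Parsing::EmphInline`, and the open/close decision (an underscore opens when it is directly followed by a non-space character) is a simplification, not CommonMark's full flanking rules.

```ruby
# Hypothetical stand-in for ::PointBlank::Parsing::EmphInline.
module EmphSketch
  # Break one string into plain strings and tokens of the form
  # [literal text, parser class, :open or :close].
  def self.tokenize(string)
    parts = string.split(/(_)/).reject(&:empty?)
    parts.each_with_index.map do |part, index|
      next part unless part == "_"

      following = parts[index + 1]
      # Simplified rule: a delimiter opens when text starts right after it.
      opens = following && following.match?(/\A\S/)
      ["_", self, opens ? :open : :close]
    end
  end
end

pp EmphSketch.tokenize("_this _is a string of_ tokens_")
```

Extra per-token data would go between the parser class and the trailing symbol, which is why later stages read the symbol with `token.last` rather than `token[2]`.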
The forward walk is characterized by calling parsers which implement the
`#forward_walk` method. When the main class encounters an opening token
in `forward_walk`, it will call the `#forward_walk` method of the class
that represents this token. It is expected that the parser class will
then attempt to build the first available occurrence of the inline
element it represents, after which it will return the array of all
tokens and strings that it was passed where the first element will be
the newly constructed inline element. If it is unable to close the
block, it should simply return the original contents, unmodified.
Example:
Original text:
this is outside the inline `this is inside the inline` and this
is right after the inline `and this is the next inline`
Input:
[["`", ::PointBlank::Parsing::CodeInline, :open],
"this is inside the inline",
["`", ::PointBlank::Parsing::CodeInline, :close],
" and this is right after the inline ",
["`", ::PointBlank::Parsing::CodeInline, :open],
"and this is the next inline",
["`", ::PointBlank::Parsing::CodeInline, :close]]
Output:
[<::PointBlank::DOM::InlineCode
@content = "this is inside the inline">,
" and this is right after the inline ",
["`", ::PointBlank::Parsing::CodeInline, :open],
"and this is the next inline",
["`", ::PointBlank::Parsing::CodeInline, :close]]
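A forward walk with this contract could be sketched as below. `CodeInlineSketch` and `InlineCodeSketch` are hypothetical stand-ins for the real parser and DOM classes, and the real implementation presumably goes through the `build` utility instead of constructing the node by hand.

```ruby
# Hypothetical stand-in for ::PointBlank::DOM::InlineCode.
InlineCodeSketch = Struct.new(:content)

# Hypothetical stand-in for ::PointBlank::Parsing::CodeInline.
module CodeInlineSketch
  # parts[0] is expected to be this parser's opening token.
  def self.forward_walk(parts)
    closing = parts.index do |part|
      part.is_a?(Array) && part[1] == self && part.last == :close
    end
    return parts unless closing # unable to close: return input unmodified

    # Everything between the tokens becomes literal content; foreign
    # tokens are flattened back into their literal text.
    inner = parts[1...closing].map { |part| part.is_a?(Array) ? part.first : part }
    [InlineCodeSketch.new(inner.join)] + parts[(closing + 1)..]
  end
end

tick = ->(kind) { ["`", CodeInlineSketch, kind] }
pp CodeInlineSketch.forward_walk([tick.(:open), "inside", tick.(:close), " after"])
```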
The reverse walk is characterized by calling parsers which implement the
`#reverse_walk` method when the main class encounters a closing token
for this class (the one that contains the `:close` symbol in the last
position of the token information array). After that the main class will
call the parser's `#reverse_walk` method with the current list of
tokens, inlines and strings. It is expected that the parser will then
collect all the blocks, strings and inlines that fit within the block
closed by the last element in the list, and once it encounters the
appropriate opening token for the closing token in the last position of
the array, it will then replace the elements fitting within that inline
with a class containing all the collected elements. If it is unable to
find a matching opening token for the closing token in the last
position, it should simply return the original contents, unmodified.
Example:
Original text:
blah blah something something lots of text before the emphasis
_this is emphasized `and this is an inline` but it's still
emphasized_
Input:
["blah blah something something lots of text before the emphasis",
["_", ::PointBlank::Parsing::EmphInline, :open],
"this is emphasized",
<::PointBlank::DOM::InlineCode,
@content = "and this is an inline">,
" but it's still emphasized",
["_", ::PointBlank::Parsing::EmphInline, :close]]
Output:
["blah blah something something lots of text before the emphasis",
<::PointBlank::DOM::InlineEmphasis,
children = [...,
<::PointBlank::DOM::InlineCode ...>
...]>]
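The reverse walk can be sketched in the same spirit. `EmphWalkSketch` and `InlineEmphasisSketch` are hypothetical names, and the sketch assumes the closing token is always the last element of the array it receives, as described above.

```ruby
# Hypothetical stand-in for ::PointBlank::DOM::InlineEmphasis.
InlineEmphasisSketch = Struct.new(:children)

# Hypothetical stand-in for ::PointBlank::Parsing::EmphInline.
module EmphWalkSketch
  # parts.last is expected to be this parser's closing token.
  def self.reverse_walk(parts)
    opening = parts.rindex do |part|
      part.is_a?(Array) && part[1] == self && part.last == :open
    end
    return parts unless opening # no matching opener: return input unmodified

    # Wrap everything between the opener and the closer into one element.
    children = parts[(opening + 1)...-1]
    parts[0...opening] + [InlineEmphasisSketch.new(children)]
  end
end

emph = ->(kind) { ["_", EmphWalkSketch, kind] }
pp EmphWalkSketch.reverse_walk(["before ", emph.(:open), "inner", emph.(:close)])
```

Searching from the end with `rindex` pairs the closing token with the nearest opener, so nested emphasis is wrapped from the inside out.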
Both `#forward_walk` and `#reverse_walk` are not restricted to making
just the changes discussed above, and can arbitrarily modify the token
arrays. That, however, should be done with great care, so as to not
accidentally break compatibility with other parsers.
To ensure that the collected tokens in the `#reverse_walk` and
`#forward_walk` are processed correctly, the collected arrays of
tokens, blocks and inlines should be built into an object that
represents this parser using the `build` method (it will automatically
attempt to find the correct class to construct using the
`#define_parser` directive in the DOMObject subclass definition)

479
bin/mdpp

@ -1,479 +0,0 @@
#!/usr/bin/ruby
# frozen_string_literal: true
require 'optparse'
require 'rbmark'
require 'io/console'
require 'io/console/size'
module MDPP
# Module for managing terminal output
module TextManager
# ANSI SGR escape code for bg color
# @param text [String]
# @param properties [Hash]
# @return [String]
def bg(text, properties)
color = properties['bg']
if color.is_a? Integer
"\e[48;5;#{color}m#{text}\e[49m"
elsif color.is_a? String and color.match?(/\A#[A-Fa-f0-9]{6}\Z/)
vector = color.scan(/[A-Fa-f0-9]{2}/).map { |x| x.to_i(16) }
"\e[48;2;#{vector[0]};#{vector[1]};#{vector[2]}m#{text}\e[49m"
else
Kernel.warn "WARNING: Invalid color - #{color}"
text
end
end
# ANSI SGR escape code for fg color
# @param text [String]
# @param properties [Hash]
# @return [String]
def fg(text, properties)
color = properties['fg']
if color.is_a? Integer
"\e[38;5;#{color}m#{text}\e[39m"
elsif color.is_a? String and color.match?(/\A#[A-Fa-f0-9]{6}\Z/)
vector = color.scan(/[A-Fa-f0-9]{2}/).map { |x| x.to_i(16) }
"\e[38;2;#{vector[0]};#{vector[1]};#{vector[2]}m#{text}\e[39m"
else
Kernel.warn "WARNING: Invalid color - #{color}"
text
end
end
# ANSI SGR escape code for bold text
# @param text [String]
# @return [String]
def bold(text)
"\e[1m#{text}\e[22m"
end
# ANSI SGR escape code for italics text
# @param text [String]
# @return [String]
def italics(text)
"\e[3m#{text}\e[23m"
end
# ANSI SGR escape code for underline text
# @param text [String]
# @return [String]
def underline(text)
"\e[4m#{text}\e[24m"
end
# ANSI SGR escape code for strikethrough text
# @param text [String]
# @return [String]
def strikethrough(text)
"\e[9m#{text}\e[29m"
end
# Word wrapping algorithm
# @param text [String]
# @param width [Integer]
# @return [String]
def wordwrap(text, width)
words = text.split(/ +/)
output = []
line = ""
until words.empty?
word = words.shift
if word.length > width
words.prepend(word[width..])
word = word[..width - 1]
end
if line.length + word.length + 1 > width
output.append(line.lstrip)
line = word
next
end
line = [line, word].join(line.end_with?("\n") ? '' : ' ')
end
output.append(line.lstrip)
output.join("\n")
end
# Draw a screen-width box around text
# @param text [String]
# @return [String]
def box(text)
size = IO.console.winsize[1] - 2
text = wordwrap(text, (size * 0.8).floor).lines.filter_map do |line|
"│#{line.strip.ljust(size)}│" unless line.empty?
end.join("\n")
<<~TEXT
╭#{'─' * size}╮
#{text}
╰#{'─' * size}╯
TEXT
end
# Draw text right-justified
def rjust(text)
size = IO.console.winsize[1]
wordwrap(text, (size * 0.8).floor).lines.filter_map do |line|
line.strip.rjust(size) unless line.empty?
end.join("\n")
end
# Draw text centered
def center(text)
size = IO.console.winsize[1]
wordwrap(text, (size * 0.8).floor).lines.filter_map do |line|
line.strip.center(size) unless line.empty?
end.join("\n")
end
# Underline the last line of the text piece
def underline_block(text)
textlines = text.lines
last = "".match(/()()()/)
textlines.each do |x|
current = x.match(/\A(\s*)(.+?)(\s*)\Z/)
last = current if current[2].length > last[2].length
end
ltxt = last[1]
ctxt = textlines.last.slice(last.offset(2)[0]..last.offset(2)[1] - 1)
rtxt = last[3]
textlines[-1] = [ltxt, underline(ctxt), rtxt].join('')
textlines.join("")
end
# Add extra newlines around the text
def extra_newlines(text)
size = IO.console.winsize[1]
textlines = text.lines
textlines.prepend("#{' ' * size}\n")
textlines.append("\n#{' ' * size}\n")
textlines.join("")
end
# Underline last line edge to edge
def underline_full_block(text)
textlines = text.lines
textlines[-1] = underline(textlines.last)
textlines.join("")
end
# Indent all lines
def indent(text, properties)
_indent(text, level: properties['level'])
end
# Indent all lines (inner)
def _indent(text, **_useless)
text.lines.map do |line|
" #{line}"
end.join("")
end
# Bulletpoints
def bullet(text, _number, properties)
level = properties['level']
"-#{_indent(text, level: level)[1..]}"
end
# Numbers
def numbered(text, number, properties)
level = properties['level']
"#{number}.#{_indent(text, level: level)[number.to_s.length + 1..]}"
end
# Sideline for quotes
def sideline(text)
text.lines.map do |line|
"│ #{line}"
end.join("")
end
# Long bracket for code blocks
def longbracket(text, properties)
textlines = text.lines
textlines = textlines.map do |line|
"│ #{line}"
end
textlines.prepend("┌ (#{properties['element'][:language]})\n")
textlines.append("\n└\n")
textlines.join("")
end
# Add text to bibliography
def bibliography(text, properties)
return "#{text}[#{properties['element'][:link]}]" if @options['nb']
@bibliography.append([text, properties['element'][:link]])
# The entry was just appended, so its 1-based number equals the length
"#{text}[#{@bibliography.length}]"
end
end
DEFAULT_STYLE = {
"RBMark::DOM::Paragraph" => {
"inline" => true,
"indent" => true
},
"RBMark::DOM::Text" => {
"inline" => true
},
"RBMark::DOM::Heading1" => {
"inline" => true,
"center" => true,
"bold" => true,
"extra_newlines" => true,
"underline_full_block" => true
},
"RBMark::DOM::Heading2" => {
"inline" => true,
"center" => true,
"underline_block" => true
},
"RBMark::DOM::Heading3" => {
"inline" => true,
"underline" => true,
"bold" => true,
"indent" => true
},
"RBMark::DOM::Heading4" => {
"inline" => true,
"underline" => true,
"indent" => true
},
"RBMark::DOM::InlineImage" => {
"bibliography" => true,
"inline" => true
},
"RBMark::DOM::InlineLink" => {
"bibliography" => true,
"inline" => true
},
"RBMark::DOM::InlinePre" => {
"inline" => true
},
"RBMark::DOM::InlineStrike" => {
"inline" => true,
"strikethrough" => true
},
"RBMark::DOM::InlineUnder" => {
"inline" => true,
"underline" => true
},
"RBMark::DOM::InlineItalics" => {
"inline" => true,
"italics" => true
},
"RBMark::DOM::InlineBold" => {
"inline" => true,
"bold" => true
},
"RBMark::DOM::QuoteBlock" => {
"sideline" => true
},
"RBMark::DOM::CodeBlock" => {
"longbracket" => true
},
"RBMark::DOM::ULBlock" => {
"bullet" => true
},
"RBMark::DOM::OLBlock" => {
"numbered" => true
},
"RBMark::DOM::HorizontalRule" => {
"extra_newlines" => true
},
"RBMark::DOM::IndentBlock" => {
"indent" => true
}
}.freeze
STYLE_PRIO0 = [
["numbered", true],
["bullet", true]
].freeze
STYLE_PRIO1 = [
["center", false],
["rjust", false],
["box", false],
["indent", true],
["underline", false],
["bold", false],
["italics", false],
["strikethrough", false],
["bg", true],
["fg", true],
["bibliography", true],
["extra_newlines", false],
["sideline", false],
["longbracket", true],
["underline_block", false],
["underline_full_block", false]
].freeze
# Primary document renderer
class Renderer
include ::MDPP::TextManager
# @param input [String]
# @param options [Hash]
def initialize(input, options)
@doc = RBMark::DOM::Document.parse(input)
@style = ::MDPP::DEFAULT_STYLE.dup
@bibliography = []
@options = options
return unless options['style']
@style = @style.map do |k, v|
v = v.merge(**options['style'][k]) if options['style'][k]
[k, v]
end.to_h
end
# Return rendered text
# @return [String]
def render
text = _render(@doc.children, @doc.properties)
text += _render_bibliography unless @bibliography.empty? or
@options['nb']
text
end
private
def _render_bibliography
size = IO.console.winsize[1]
text = "\n#{'─' * size}\n"
text += @bibliography.map.with_index do |element, index|
"- [#{index + 1}] #{wordwrap(element.join(': '), size - 15)}"
end.join("\n")
text
end
def _render(children, props)
blocks = children.map do |child|
case child
when ::RBMark::DOM::Text then child.content
when ::RBMark::DOM::InlineBreak then "\n"
when ::RBMark::DOM::HorizontalRule
size = IO.console.winsize[1]
"─" * size
else
child_props = get_props(child, props)
calc_wordwrap(
_render(child.children,
child_props),
props, child_props
)
end
end
apply_props(blocks, props)
end
def calc_wordwrap(obj, props, obj_props)
size = IO.console.winsize[1]
return obj if obj_props['center'] or
obj_props['rjust']
if !props['inline'] and obj_props['inline']
wordwrap(obj, size - 2 * (props['level'].to_i + 1))
else
obj
end
end
def get_props(obj, props)
new_props = @style[obj.class.to_s].dup || {}
if props["level"]
new_props["level"] = props["level"]
new_props["level"] += 1 unless new_props["inline"]
else
new_props["level"] = 2
end
new_props["element"] = obj.properties
new_props
end
def apply_props(blockarray, properties)
blockarray = prio0(blockarray, properties)
text = blockarray.join(properties['inline'] ? "" : "\n\n")
.gsub(/\n{2,}/, "\n\n")
prio1(text, properties)
end
def prio0(blocks, props)
::MDPP::STYLE_PRIO0.filter { |x| props.include? x[0] }.each do |style|
blocks = blocks.map.with_index do |block, index|
if style[1]
method(style[0].to_s).call(block, index + 1, props)
else
method(style[0].to_s).call(block, index + 1)
end
end
end
blocks
end
def prio1(block, props)
::MDPP::STYLE_PRIO1.filter { |x| props.include? x[0] }.each do |style|
block = if style[1]
method(style[0].to_s).call(block, props)
else
method(style[0].to_s).call(block)
end
end
block
end
end
end
options = {}
OptionParser.new do |opts|
opts.banner = <<~TEXT
MDPP - Markdown PrettyPrint based on RBMark parser
Usage: mdpp [options] <file | ->
TEXT
opts.on("-h", "--help", "Prints this help message") do
puts opts
exit 0
end
opts.on("-e", "--extension EXTENSION",
"require EXTENSION before parsing") do |libname|
require libname
end
opts.on(
"-c",
"--config CONFIG",
"try to load CONFIG (~/.config/mdpp.rb is loaded by default)"
) do |config|
# rubocop:disable Security/Eval
options.merge!(eval(File.read(config))) if File.exist?(config)
# rubocop:enable Security/Eval
end
opts.on(
"-b",
"--no-bibliography",
"Do not print bibliography (links, references, etc.) at the bottom"
) do
options["nb"] = true
end
end.parse!
# rubocop:disable Security/Eval
if File.exist?("#{ENV['HOME']}/.config/mdpp.rb")
options.merge!(eval(File.read("#{ENV['HOME']}/.config/mdpp.rb")))
end
# rubocop:enable Security/Eval
text = if ARGV[0].nil? or ARGV[0] == "-"
$stdin.read
else
File.read(ARGV[0])
end
renderer = MDPP::Renderer.new(text, options)
puts renderer.render

168
bin/mmmdpp Executable file

@ -0,0 +1,168 @@
#!/usr/bin/env ruby
# frozen_string_literal: true
require 'io/console/size'
require 'optionparser'
require 'json'
require 'mmmd'
class ParserError < StandardError
end
class OptionNavigator
def initialize
@options = {}
end
# Read a definition
# @param define [String]
def read_definition(define)
define.split(";").each do |part|
locstring, _, value = part.partition(":")
locstring = deconstruct(locstring.strip)
assign(locstring, JSON.parse(value))
end
end
attr_reader :options
private
def check_unescaped(str, index)
return true if index.zero?
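reverse_index = index - 1
count = 0
# Count the backslashes directly before the character, including
# one that sits at the very start of the string
while reverse_index >= 0 && str[reverse_index] == "\\"
count += 1
reverse_index -= 1
end
count.even?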
end
def find_unescaped(str, pattern, index)
found = str.index(pattern, index)
return nil unless found
until check_unescaped(str, found)
index = found + 1
found = str.index(pattern, index)
return nil unless found
end
found
end
def deconstruct(locstring)
parts = []
buffer = ""
part = nil
until locstring.empty?
case locstring[0]
when '"'
raise ParserError, 'separator missing' unless buffer.empty?
closepart = find_unescaped(locstring, '"', 1)
raise ParserError, 'unclosed string' unless closepart
buffer = locstring[0..closepart]
part = buffer[1..-2]
locstring = locstring[closepart + 1..]
when '.'
parts.append(part)
buffer = ""
part = nil
locstring = locstring[1..]
when '['
raise ParserError, 'separator missing' unless buffer.empty?
closepart = find_unescaped(locstring, ']', 1)
raise ParserError, 'unclosed index' unless closepart
buffer = locstring[0..closepart]
part = buffer[1..-2].to_i
locstring = locstring.delete_prefix(buffer)
else
raise ParserError, 'separator missing' unless buffer.empty?
buffer = locstring.match(/^[\w_]+/)[0]
part = buffer.to_sym
locstring = locstring.delete_prefix(buffer)
end
end
parts.append(part) if part
parts
end
def assign(keys, value)
current = @options
while keys.length > 1
current_key = keys.shift
unless current[current_key]
next_key = keys.first
case next_key
when Integer
current[current_key] = []
when String
current[current_key] = {}
when Symbol
current[current_key] = {}
end
end
current = current[current_key]
end
current[keys.shift] = value
end
end
options = {
include: [],
nav: OptionNavigator.new
}
parser = OptionParser.new do |opts|
opts.banner = "Usage: mmmdpp [OPTIONS] (input|-) (output|-)"
opts.on("-r", "--renderer [STRING]", String,
"Specify renderer to use for this document") do |renderer|
options[:renderer] = renderer
end
opts.on("-i", "--include [STRING]", String,
"Script to execute before rendering.\
May be specified multiple times.") do |inc|
options[:include].append(inc)
end
opts.on("-o", "--option [STRING]", String,
"Add option string. Can be repeated. Format: <key>: <JSON value>\n"\
"<key>: (<\"string\">|<symbol>|<[integer]>)"\
"[.(<\"string\">|<symbol>|<[integer]>)[...]]\n"\
"Example: \"style\".\"CodeBlock\".literal.[0]: 50") do |value|
options[:nav].read_definition(value) if value
end
end
parser.parse!
unless ARGV[1]
warn parser.help
exit 1
end
Renderers = {
"HTML" => -> { ::MMMD::Renderers::HTML },
"Plainterm" => -> { ::MMMD::Renderers::Plainterm }
}.freeze
options[:include].each { |name| Kernel.load(name) }
renderer_opts = options[:nav].options
renderer_opts["hsize"] ||= IO.console_size[1]
input = ARGV[0] == "-" ? $stdin.read : File.read(ARGV[0])
output = ARGV[1] == "-" ? $stdout : File.open(ARGV[1], "w")
doc = MMMD.parse(input)
rclass = Renderers[options[:renderer] || "Plainterm"]
raise StandardError, "unknown renderer: #{options[:renderer]}" unless rclass
renderer = rclass.call.new(doc, renderer_opts)
output.puts(renderer.render)
output.close

14
lib/mmmd.rb Normal file

@ -0,0 +1,14 @@
# frozen_string_literal: true
require_relative 'mmmd/blankshell'
require_relative 'mmmd/renderers'
# Extensible, multi-format markdown processor
module MMMD
# Parse a Markdown document into a DOM form
# @param doc [String]
# @return [::PointBlank::DOM::Document]
def self.parse(doc)
::PointBlank::DOM::Document.parse(doc)
end
end

1932
lib/mmmd/blankshell.rb Normal file

File diff suppressed because it is too large

2233
lib/mmmd/entities.json Normal file

File diff suppressed because it is too large

11
lib/mmmd/renderers.rb Normal file

@ -0,0 +1,11 @@
# frozen_string_literal: true
$LOAD_PATH.append(__dir__)
module MMMD
# Renderers from Markdown to expected output format
module Renderers
autoload :HTML, 'renderers/html'
autoload :Plainterm, 'renderers/plainterm'
end
end

356
lib/mmmd/renderers/html.rb Normal file

@ -0,0 +1,356 @@
# frozen_string_literal: true
require_relative "../util"
module MMMD
module Renderers
module HTMLConstants
ELEMENT_MAP = {
"PointBlank::DOM::InlinePre" => {
tag: "code",
style: "white-space: pre;"
},
"PointBlank::DOM::InlineBreak" => {
tag: "br"
},
"PointBlank::DOM::InlineStrong" => {
tag: "strong"
},
"PointBlank::DOM::InlineEmphasis" => {
tag: "em"
},
"PointBlank::DOM::InlineUnder" => {
tag: "span",
style: "text-decoration: underline;"
},
"PointBlank::DOM::InlineStrike" => {
tag: "s"
},
"PointBlank::DOM::InlineLink" => {
tag: "a",
href: true,
title: true
},
"PointBlank::DOM::InlineImage" => {
tag: "img",
src: true,
inline: true,
alt: true,
title: true
},
"PointBlank::DOM::ULBlock" => {
tag: "ul"
},
"PointBlank::DOM::OLBlock" => {
tag: "ol"
},
"PointBlank::DOM::IndentBlock" => {
tag: "pre"
},
"PointBlank::DOM::ULListElement" => {
tag: "li"
},
"PointBlank::DOM::OLListElement" => {
tag: "li"
},
"PointBlank::DOM::Paragraph" => {
tag: "p"
},
"PointBlank::DOM::SetextHeading1" => {
tag: "h1"
},
"PointBlank::DOM::SetextHeading2" => {
tag: "h2"
},
"PointBlank::DOM::ATXHeading1" => {
tag: "h1"
},
"PointBlank::DOM::ATXHeading2" => {
tag: "h2"
},
"PointBlank::DOM::ATXHeading3" => {
tag: "h3"
},
"PointBlank::DOM::ATXHeading4" => {
tag: "h4"
},
"PointBlank::DOM::ATXHeading5" => {
tag: "h5"
},
"PointBlank::DOM::ATXHeading6" => {
tag: "h6"
},
"PointBlank::DOM::Document" => {
tag: "main"
},
"PointBlank::DOM::CodeBlock" => {
tag: "pre",
outer: {
tag: "code"
}
},
"PointBlank::DOM::QuoteBlock" => {
tag: "blockquote"
},
"PointBlank::DOM::HorizontalRule" => {
tag: "hr",
inline: true
},
"PointBlank::DOM::Text" => {
sanitize: true
},
"PointBlank::DOM::InlineAutolink" => {
tag: "a",
href: true
}
}.freeze
# Class for managing styles and style overrides
class MapManager
class << self
# Define a default mapping for specified class
# @param key [String] class name
# @param mapping [Hash] mapping
# @return [void]
def define_mapping(key, mapping)
@mapping ||= ELEMENT_MAP.dup
@mapping[key] = mapping
end
# Get computed mapping
# @return [Hash]
def mapping
@mapping ||= ELEMENT_MAP.dup
end
end
def initialize(overrides)
@mapping = self.class.mapping
@mapping = @mapping.merge(overrides["mapping"]) if overrides["mapping"]
end
attr_reader :mapping
end
end
# HTML Renderer
class HTML
def initialize(dom, options)
@document = dom
@options = options
@options["linewrap"] ||= 80
@options["init_level"] ||= 2
@options["indent"] ||= 2
mapmanager = HTMLConstants::MapManager.new(options)
@mapping = mapmanager.mapping
return unless @options["nowrap"]
@options["init_level"] = 0
@mapping.delete("PointBlank::DOM::Document")
end
# Render document to HTML
def render
text = _render(@document, @options, level: @options["init_level"])
@options["init_level"].times { text = indent(text) }
if @options["nowrap"]
text
else
[
preambule,
remove_pre_spaces(text),
postambule
].join("\n")
end
end
private
# Find and remove extra leading spaces inside preformatted text
# @param string [String]
# @return [String]
def remove_pre_spaces(string)
output = []
buffer = []
open = nil
string.lines.each do |line|
opentoken = line.match?(/<pre>/)
closetoken = line.match?(/<\/pre>/)
if closetoken
open = false
buffer = strip_leading_spaces_in_buffer(buffer)
output.append(*buffer)
buffer = []
end
(open ? buffer : output).append(line)
open = true if opentoken && !closetoken
end
output.append(*buffer) unless buffer.empty?
output.join('')
end
# Strip leading spaces in the buffer
# @param buffer [Array<String>]
# @return [Array<String>]
def strip_leading_spaces_in_buffer(buffer)
minprefix = buffer.map { |x| x.match(/^ */)[0] }
.min_by(&:length)
buffer.map do |line|
line.delete_prefix(minprefix)
end
end
# Word wrapping algorithm
# @param text [String]
# @param width [Integer]
# @return [String]
def wordwrap(text, width)
words = text.split(/( +|<[^>]+>)/)
output = []
line = ""
length = 0
until words.empty?
word = words.shift
wordlength = word.length
if length + wordlength + 1 > width
output.append(line.lstrip)
line = word
length = wordlength
next
end
length += wordlength
line += word
end
output.append(line.lstrip)
output.join("\n")
end
def _render(element, options, inline: false, level: 0, literaltext: false)
modeswitch = element.is_a?(::PointBlank::DOM::LeafBlock) ||
element.is_a?(::PointBlank::DOM::Paragraph)
inline ||= modeswitch
level += 1 unless inline
text = if element.children.empty?
element.content
else
literal = @mapping[element.class.name]
&.fetch(:inline, false) ||
literaltext
element.children.map do |child|
_render(child, options, inline: inline,
level: level,
literaltext: literal)
end.join(inline ? '' : "\n")
end
run_filters(text, element, level: level,
inline: inline,
modeswitch: modeswitch,
literaltext: literaltext)
end
def run_filters(text, element, level:, inline:, modeswitch:,
literaltext:)
element_style = @mapping[element.class.name]
return text unless element_style
return text if literaltext
hsize = @options["linewrap"] - (level * @options["indent"])
text = wordwrap(text, hsize) if modeswitch
if element_style[:sanitize]
text = MMMD::EntityUtils.encode_entities(text)
end
if element_style[:inline]
innerclose(element, element_style, text)
else
openclose(text, element, element_style, inline)
end
end
def openclose(text, element, element_style, inline)
opentag, closetag = construct_tags(element_style, element)
if inline
opentag + text + closetag
else
[opentag,
indent(text),
closetag].join("\n")
end
end
def innerclose(element, style, text)
props = element.properties
tag = "<#{style[:tag]}"
tag += " style=#{style[:style].inspect}" if style[:style]
tag += " href=#{read_link(element)}" if style[:href]
tag += " alt=#{text.inspect}" if style[:alt]
tag += " src=#{read_link(element)}" if style[:src]
tag += " title=#{read_title(element)}" if style[:title] && props[:title]
tag += ">"
if style[:outer]
outeropen, outerclose = construct_tags(style[:outer], element)
tag = outeropen + tag + outerclose
end
tag
end
def construct_tags(style, element)
return ["", ""] unless style && style[:tag]
props = element.properties
opentag = "<#{style[:tag]}"
closetag = "</#{style[:tag]}>"
opentag += " style=#{style[:style].inspect}" if style[:style]
opentag += " href=#{read_link(element)}" if style[:href]
opentag += " src=#{read_link(element)}" if style[:src]
opentag += " title=#{read_title(element)}" if style[:title] &&
props[:title]
opentag += ">"
if style[:outer]
outeropen, outerclose = construct_tags(style[:outer], element)
opentag = outeropen + opentag
closetag += outerclose
end
[opentag, closetag]
end
def read_title(element)
title = element.properties[:title]
title = ::MMMD::EntityUtils.encode_entities(title)
title.inspect
end
def read_link(element)
link = element.properties[:uri]
link.inspect
end
def indent(text)
text.lines.map do |line|
"#{' ' * @options["indent"]}#{line}"
end.join('')
end
def preambule
head = @options['head']
headinfo = "#{indent(<<~HEAD.rstrip)}\n " if head
<head>
#{head.is_a?(Array) ? head.join("\n") : head}
</head>
HEAD
headinfo ||= " "
@options['preambule'] or <<~TEXT.rstrip
<!DOCTYPE HTML>
<html>
#{headinfo}<body>
TEXT
end
def postambule
@options['postambule'] or <<~TEXT
</body>
</html>
TEXT
end
end
end
end

@ -0,0 +1,459 @@
# frozen_string_literal: true
# Attempt to source a provider for the wide char width calculator
# (TODO)
module MMMD
# Module for managing terminal output
module TextManager
# ANSI SGR escape code for bg color
# @param text [String]
# @param options [Hash]
# @return [String]
def bg(text, options)
color = options['bg']
if color.is_a? Integer
"\e[48;5;#{color}m#{text}\e[49m"
elsif color.is_a? String and color.match?(/\A#[A-Fa-f0-9]{6}\Z/)
vector = color.scan(/[A-Fa-f0-9]{2}/).map { |x| x.to_i(16) }
"\e[48;2;#{vector[0]};#{vector[1]};#{vector[2]}m#{text}\e[49m"
else
Kernel.warn "WARNING: Invalid color - #{color}"
text
end
end
# ANSI SGR escape code for fg color
# @param text [String]
# @param options [Hash]
# @return [String]
def fg(text, options)
color = options['fg']
if color.is_a? Integer
"\e[38;5;#{color}m#{text}\e[39m"
elsif color.is_a? String and color.match?(/\A#[A-Fa-f0-9]{6}\Z/)
vector = color.scan(/[A-Fa-f0-9]{2}/).map { |x| x.to_i(16) }
"\e[38;2;#{vector[0]};#{vector[1]};#{vector[2]}m#{text}\e[39m"
else
Kernel.warn "WARNING: Invalid color - #{color}"
text
end
end
# ANSI SGR escape code for bold text
# @param text [String]
# @param options [Hash]
# @return [String]
def bold(text, _options)
"\e[1m#{text}\e[22m"
end
# ANSI SGR escape code for italics text
# @param text [String]
# @param options [Hash]
# @return [String]
def italics(text, _options)
"\e[3m#{text}\e[23m"
end
# ANSI SGR escape code for underline text
# @param text [String]
# @param options [Hash]
# @return [String]
def underline(text, _options)
"\e[4m#{text}\e[24m"
end
# ANSI SGR escape code for strikethrough text
# @param text [String]
# @param options [Hash]
# @return [String]
def strikethrough(text, _options)
"\e[9m#{text}\e[29m"
end
# Word wrapping algorithm
# @param text [String]
# @param width [Integer]
# @return [String]
def wordwrap(text, width)
words = text.split(/( +)/)
output = []
line = ""
length = 0
until words.empty?
word = words.shift
wordlength = smort_length(word)
if wordlength > width
words.prepend(word[width..])
word = word[..width - 1]
end
if length + wordlength + 1 > width
output.append(line.lstrip)
line = word
length = wordlength
next
end
length += wordlength
line += word
end
output.append(line.lstrip)
output.join("\n")
end
# (TODO: smorter stronger better faster)
# SmЯt™ word length
# @param text [String]
# @return [Integer]
def smort_length(text)
text.gsub(/\e\[[^m]+m/, '').length
end
# Left-justify a line while ignoring terminal control codes
# @param text [String]
# @param size [Integer]
# @return [String]
def ljust_cc(text, size)
text.lines.map do |line|
textlength = smort_length(line)
textlength < size ? line + " " * (size - textlength) : line
end.join("\n")
end
# Right-justify a line while ignoring terminal control codes
# @param text [String]
# @param size [Integer]
# @return [String]
def rjust_cc(text, size)
text.lines.map do |line|
textlength = smort_length(line)
textlength < size ? " " * (size - textlength) + line : line
end.join("\n")
end
# Center-justify a line while ignoring terminal control codes
# @param text [String]
# @param size [Integer]
# @return [String]
def center_cc(text, size)
text.lines.map do |line|
textlength = smort_length(line)
if textlength < size
freelength = size - textlength
rightlength = freelength / 2
leftlength = freelength - rightlength
" " * leftlength + line + " " * rightlength
else
line
end
end.join("\n")
end
# Draw a screen-width box around text
# @param text [String]
# @param options [Hash]
# @return [String]
def box(text, options)
size = options[:hsize] - 2
text = wordwrap(text, (size * 0.8).floor).lines.filter_map do |line|
"#{ljust_cc(line, size)}" unless line.empty?
end.join("\n")
<<~TEXT
#{'─' * size}╮
#{text}
#{'─' * size}╯
TEXT
end
# Draw a horizontal rule
def hrule(_text, options)
size = options[:hsize]
" #{'─' * (size - 2)} "
end
# Draw text right-justified
def rjust(text, options)
size = options[:hsize]
wordwrap(text, (size * 0.8).floor).lines.filter_map do |line|
rjust_cc(line, size) unless line.empty?
end.join("\n")
end
# Draw text centered
def center(text, options)
size = options[:hsize]
wordwrap(text, (size * 0.8).floor).lines.filter_map do |line|
center_cc(line, size) unless line.empty?
end.join("\n")
end
# Underline the last line of the text piece
def underline_block(text, options)
textlines = text.lines
last = "".match(/()()()/)
textlines.each do |x|
current = x.match(/\A(\s*)(.+?)(\s*)\Z/)
last = current if smort_length(current[2]) > smort_length(last[2])
end
ltxt = last[1]
ctxt = textlines.last.slice(last.offset(2)[0]..last.offset(2)[1] - 1)
rtxt = last[3]
textlines[-1] = [ltxt, underline(ctxt, options), rtxt].join('')
textlines.join("")
end
# Add extra newlines around the text
def extra_newlines(text, options)
size = options[:hsize]
textlines = text.lines
textlines.prepend("#{' ' * size}\n")
textlines.append("\n#{' ' * size}\n")
textlines.join("")
end
# Underline last line edge to edge
def underline_full_block(text, options)
textlines = text.lines
last_line = textlines.last.match(/^.*$/)[0]
textlines[-1] = "#{underline(last_line, options)}\n"
textlines.join("")
end
# Indent all lines
def indent(text, _options)
_indent(text)
end
# Indent all lines (inner)
def _indent(text)
text.lines.map do |line|
" #{line}"
end.join("")
end
# Left overline all lines
def leftline(text, _options)
text.lines.map do |line|
"#{line}"
end.join("")
end
# Bulletpoints
def bullet(text, _options)
"-#{_indent(text)[1..]}"
end
# Numbers
def numbered(text, options)
number = options[:number]
length = number.to_s.length + 1
(length / 4 + 1).times { text = _indent(text) }
"#{number}.#{text[length..]}"
end
end
module Renderers
module PlaintermConstants
DEFAULT_STYLE = {
"PointBlank::DOM::Paragraph" => {
indent: true,
increase_level: true
},
"PointBlank::DOM::Text" => {},
"PointBlank::DOM::SetextHeading1" => {
center: true,
bold: true,
extra_newlines: true,
underline_full_block: true
},
"PointBlank::DOM::SetextHeading2" => {
center: true,
underline_block: true
},
"PointBlank::DOM::ATXHeading1" => {
center: true,
bold: true,
extra_newlines: true,
underline_full_block: true
},
"PointBlank::DOM::ATXHeading2" => {
center: true,
underline_block: true
},
"PointBlank::DOM::ATXHeading3" => {
underline: true,
bold: true
},
"PointBlank::DOM::ATXHeading4" => {
bold: true,
underline: true
},
"PointBlank::DOM::ATXHeading5" => {
underline: true
},
"PointBlank::DOM::ATXHeading6" => {
underline: true
},
"PointBlank::DOM::InlineImage" => {
underline: true
},
"PointBlank::DOM::InlineLink" => {
underline: true
},
"PointBlank::DOM::InlinePre" => {},
"PointBlank::DOM::InlineEmphasis" => {
italics: true
},
"PointBlank::DOM::InlineStrong" => {
bold: true
},
"PointBlank::DOM::ULListElement" => {
bullet: true,
increase_level: true
},
"PointBlank::DOM::OLListElement" => {
numbered: true,
increase_level: true
},
"PointBlank::DOM::QuoteBlock" => {
leftline: true,
increase_level: true
},
"PointBlank::DOM::HorizontalRule" => {
hrule: true
}
}.freeze
DEFAULT_EFFECT_PRIORITY = {
hrule: 10_500,
numbered: 10_000,
leftline: 9500,
bullet: 9000,
indent: 8500,
underline_full_block: 8000,
underline_block: 7500,
extra_newlines: 7000,
center: 6000,
rjust: 5500,
box: 5000,
underline: 4000,
italics: 3500,
bold: 3000,
fg: 2500,
bg: 2000,
strikethrough: 1500
}.freeze
# Class for managing styles and style overrides
class StyleManager
class << self
# Define a default style for specified class
# @param key [String] class name
# @param style [Hash] style
# @return [void]
def define_style(key, style)
@style ||= DEFAULT_STYLE.dup
@style[key] = style
end
# Define an effect priority value
# @param key [String] effect name
# @param priority [Integer] value of the priority
# @return [void]
def define_effect_priority(key, priority)
@effect_priority ||= DEFAULT_EFFECT_PRIORITY.dup
@effect_priority[key] = priority
end
# Get computed style
# @return [Hash]
def style
@style ||= DEFAULT_STYLE.dup
end
# Get computed effect priority
# @return [Hash]
def effect_priority
@effect_priority ||= DEFAULT_EFFECT_PRIORITY.dup
end
end
def initialize(overrides)
@style = self.class.style
@effect_priority = self.class.effect_priority
@style = @style.merge(overrides["style"]) if overrides["style"]
end
attr_reader :style, :effect_priority
end
end
# Primary document renderer
class Plainterm
include ::MMMD::TextManager
# @param input [String]
# @param options [Hash]
def initialize(input, options)
@doc = input
@color_mode = options.fetch("color", true)
@ansi_mode = options.fetch("ansi", true)
style_manager = PlaintermConstants::StyleManager.new(options)
@style = style_manager.style
@effect_priority = style_manager.effect_priority
@effects = @effect_priority.to_a.sort_by(&:last).map(&:first)
@options = options
@options["hsize"] ||= 80
end
# Return rendered text
# @return [String]
def render
_render(@doc, @options)
end
private
def _render(element, options, inline: false, level: 0, index: 0)
modeswitch = element.is_a?(::PointBlank::DOM::LeafBlock) ||
element.is_a?(::PointBlank::DOM::Paragraph)
inline ||= modeswitch
level += calculate_level_increase(element)
text = if element.children.empty?
element.content
else
element.children.map.with_index do |child, index|
_render(child, options, inline: inline,
level: level,
index: index)
end.join(inline ? '' : "\n\n")
end
run_filters(text, element, level: level,
modeswitch: modeswitch,
index: index)
end
def run_filters(text, element, level:, modeswitch:, index:)
element_style = @style[element.class.name]
return text unless element_style
hsize = @options["hsize"] - (4 * level)
text = wordwrap(text, hsize) if modeswitch
params = element_style.dup
params[:hsize] = hsize
params[:number] = index + 1
@effects.each do |effect|
text = method(effect).call(text, params) if element_style[effect]
end
text
end
def calculate_level_increase(element)
level = 0
element_style = @style[element.class.name]
level += 1 if element_style && element_style[:increase_level]
level
end
end
end
end

lib/mmmd/util.rb Normal file

@ -0,0 +1,61 @@
# frozen_string_literal: true
require 'json'
module MMMD
# Utils for working with entities in strings
module EntityUtils
ENTITY_DATA = JSON.parse(File.read("#{__dir__}/entities.json"))
# Decode html entities in string
# @param string [String]
# @return [String]
def self.decode_entities(string)
string = string.gsub(/&#\d{1,7};/) do |match|
# Skip the leading "&#" and the trailing ";" to get the decimal digits
match[2..-2].to_i.chr("UTF-8")
end
string = string.gsub(/&#[xX][\dA-Fa-f]{1,6};/) do |match|
match[3..-2].to_i(16).chr("UTF-8")
end
string.gsub(/&\w+;/) do |match|
ENTITY_DATA[match] ? ENTITY_DATA[match]["characters"] : match
end
end
# Encode unsafe html entities in string (ASCII-compatible)
# @param string [String]
# @return [String]
# @sg-ignore
def self.encode_entities_ascii(string)
string.gsub("&", "&amp;")
.gsub("<", "&lt;")
.gsub(">", "&gt;")
.gsub('"', "&quot;")
.gsub("'", "&#39;")
.gsub(/[^\x00-\x7F]/) do |match|
# "&#x...;" is a hexadecimal reference, so emit the codepoint in base 16
"&#x#{match.codepoints[0].to_s(16)};"
end
end
# Encode unsafe html entities in string
# @param string [String]
# @return [String]
# @sg-ignore
def self.encode_entities(string)
string.gsub("&", "&amp;")
.gsub("<", "&lt;")
.gsub(">", "&gt;")
.gsub('"', "&quot;")
.gsub("'", "&#39;")
end
# Encode uri components that may break HTML syntax
# @param string [String]
# @return [String]
def self.encode_uri(string)
string.gsub('"', "%22")
.gsub("'", "%27")
.gsub(" ", "%20")
end
end
end

@ -1,634 +0,0 @@
# frozen_string_literal: true
module RBMark
# Module for representing parsing-related constructs
module Parsing
# Abstract scanner interface implementation
class Scanner
def initialize
@variants = []
end
# Scan text
# @param text [String]
# @return [Array<RBMark::DOM::DOMObject>]
def scan(_text)
raise StandardError, "Abstract method called"
# ...
end
attr_accessor :variants
end
# Line-level scanner for blocks
class LineScanner < Scanner
# (see ::RBMark::Parsing::Scanner#scan)
def scan(text, buffer: "", blocks: [], mode: nil)
prepare
lines = text.lines
lines.each_with_index do |line, index|
buffer += line
ahead = lines.fetch(index + 1, nil)
blocks, buffer, mode = try_begin(line,
blocks,
buffer,
mode,
lookahead: ahead)
if mode&.end?(line, lookahead: ahead, blocks: blocks, buffer: buffer)
blocks, buffer, mode = flush(blocks, buffer, mode)
end
end
flush(blocks, buffer, mode)[0]
end
# Predict mode for given line
# @param line [String]
# @return [Object]
def select_mode(line, **message)
@variants.find do |variant|
variant[0].begin?(line, **message)
end&.at(0)
end
private
# Attempt to open a new mode and, if possible, call :begin to prepare the block
def try_begin(line, blocks, buffer, mode, lookahead: nil)
return blocks, buffer, mode if mode
mode = select_mode(line, lookahead: lookahead,
blocks: blocks,
buffer: buffer)
blocks.append(mode.begin(line)) if mode.respond_to?(:begin)
[blocks, buffer, mode]
end
# Assign self as parent to all variants
# @return [void]
def prepare
@variants.each do |variant|
unless variant[0].is_a? ::RBMark::Parsing::BlockVariant
raise StandardError, "#{variant} is not a BlockVariant"
end
variant[0].parent = self
end
@variants.sort_by!(&:last)
end
# Flush the buffer using given mode
# @param blocks [Array<RBMark::DOM::DOMObject>]
# @param buffer [String]
# @param mode [Object]
# @return [Array(Array<RBMark::DOM::DOMObject>, String, ::RBMark::Parsing::Variant)]
def flush(blocks, buffer, mode)
return blocks, "" if buffer == ""
mode.end(blocks.last, buffer) if mode.respond_to?(:end)
blocks.append(mode.flush(buffer)) if mode.respond_to?(:flush)
if mode.respond_to?(:restructure)
blocks, buffer, mode = mode.restructure(blocks, buffer, mode)
else
buffer = ""
mode = nil
end
[blocks, buffer, mode]
end
end
# Abstract variant interface
class Variant
end
# Abstract block-level variant
class BlockVariant < Variant
# Check if a block begins on this line
# @param line [String]
# @param opts [Hash] options hash
# @option [String, nil] :lookahead next line over
# @option [Array<::RBMark::Parsing::BlockVariant>] :blocks current list of blocks
# @option [String] :buffer current state of buffer
# @return [Boolean]
def begin?(_line, **_opts)
raise StandardError, "Abstract method called"
end
# Check if a block ends on this line
# @param line [String]
# @param opts [Hash] options hash
# @option [String, nil] :lookahead next line over
# @option [Array<::RBMark::Parsing::BlockVariant>] :blocks current list of blocks
# @option [String] :buffer current state of buffer
# @return [Boolean]
def end?(_line, **_opts)
raise StandardError, "Abstract method called"
end
# @!method begin(buffer)
# Open a block to be later filled in by BlockVariant#end
# @param buffer [String]
# @return [::RBMark::DOM::DOMObject]
# @!method end(block, buffer)
# Finalize a block opened by begin
# @param buffer [String]
# @return [void]
# @!method flush(buffer)
# Flush buffer and create a new DOM object
# @param buffer [String]
# @return [::RBMark::DOM::DOMObject]
# @!method restructure(blocks, buffer, mode)
# Restructure current set of blocks (if method is defined)
# @param blocks [Array<::RBMark::DOM::DOMObject>]
# @param buffer [String]
# @param mode [::RBMark::Parsing::Variant]
# @return [Array(Array<RBMark::DOM::DOMObject>, String, ::RBMark::Parsing::Variant)]
attr_accessor :parent
end
# Paragraph breaking variant
class BreakerVariant < BlockVariant
# Check that a paragraph matches the breaker
# @param buffer [String]
# @return [Class, nil]
def match(_buffer)
raise StandardError, "Abstract method called"
end
# @!method preprocess(buffer)
# preprocess buffer
# @param buffer [String]
# @return [String]
end
# Paragraph variant
class ParagraphVariant < BlockVariant
# (see BlockVariant#begin?)
def begin?(line, **_opts)
line.match?(/\S/)
end
# (see BlockVariant#end?)
def end?(line, lookahead: nil, **_opts)
return true if check_paragraph_breakers(line)
if lookahead
return false if check_paragraph_breakers(lookahead)
return false if lookahead.match(/^ {4}/)
!parent.select_mode(lookahead).is_a?(self.class)
else
true
end
end
# (see BlockVariant#flush)
# @sg-ignore
def flush(buffer)
dom_class = nil
breaker = parent.variants.find do |x|
x[0].is_a?(::RBMark::Parsing::BreakerVariant) &&
(dom_class = x[0].match(buffer))
end&.first
buffer = breaker.preprocess(buffer) if breaker.respond_to?(:preprocess)
(dom_class or ::RBMark::DOM::Paragraph).parse(buffer.strip)
end
private
def check_paragraph_breakers(line)
breakers = parent.variants.filter_map do |x|
x[0] if x[0].is_a? ::RBMark::Parsing::BreakerVariant
end
breakers.any? { |x| x.begin?(line, breaks_paragraph: true) }
end
end
# Thematic break variant
class ThematicBreakVariant < BlockVariant
# (see BlockVariant#begin?)
def begin?(line, **_opts)
line.match?(/^(?:[- ]{3,}|[_ ]{3,}|[* ]{3,})$/) &&
line.match?(/^ {0,3}[-_*]/) &&
(
line.count("-") >= 3 ||
line.count("_") >= 3 ||
line.count("*") >= 3
)
end
# (see BlockVariant#end?)
def end?(_line, **_opts)
true
end
# (see BlockVariant#flush)
def flush(_buffer)
::RBMark::DOM::HorizontalRule.new
end
end
# ATX Heading variant
class ATXHeadingVariant < BlockVariant
# (see BlockVariant#begin?)
def begin?(line, **_opts)
line.match?(/^ {0,3}\#{1,6}(?: .*|)$/)
end
# (see BlockVariant#end?)
def end?(_line, **_opts)
true
end
# (see BlockVariant#flush)
def flush(buffer)
lvl, content = buffer.match(/^ {0,3}(\#{1,6})( .*|)$/)[1..2]
content = content.gsub(/( #+|)$/, "")
heading(lvl).parse(content.strip)
end
private
def heading(lvl)
case lvl.length
when 1 then ::RBMark::DOM::Heading1
when 2 then ::RBMark::DOM::Heading2
when 3 then ::RBMark::DOM::Heading3
when 4 then ::RBMark::DOM::Heading4
when 5 then ::RBMark::DOM::Heading5
when 6 then ::RBMark::DOM::Heading6
end
end
end
# Paragraph closing variant
class BlankSeparator < BreakerVariant
# (see BlockVariant#begin?)
def begin?(line, breaks_paragraph: nil, **_opts)
breaks_paragraph &&
line.match?(/^ {0,3}$/)
end
# (see BlockVariant#end?)
def end?(_line, **_opts)
true
end
# (see BreakerVariant#match)
def match(_buffer)
nil
end
end
# Setext heading variant
class SetextHeadingVariant < BreakerVariant
# (see BlockVariant#begin?)
def begin?(line, breaks_paragraph: nil, **_opts)
breaks_paragraph &&
line.match?(/^ {0,3}(?:-+|=+) *$/)
end
# (see BlockVariant#end?)
def end?(_line, **_opts)
true
end
# (see BreakerVariant#match)
def match(buffer)
return nil unless preprocess(buffer).match(/\S/)
heading(buffer.lines.last)
end
# (see BreakerVariant#preprocess)
def preprocess(buffer)
buffer.lines[..-2].join
end
private
def heading(buffer)
case buffer
when /^ {0,3}-+ *$/ then ::RBMark::DOM::Heading2
when /^ {0,3}=+ *$/ then ::RBMark::DOM::Heading1
end
end
end
# Indented literal block variant
class IndentedBlockVariant < BlockVariant
# (see BlockVariant#begin?)
def begin?(line, **_opts)
line.match?(/^(?: {4}|\t)/)
end
# (see BlockVariant#end?)
def end?(_line, lookahead: nil, **_opts)
!lookahead&.match?(/^(?: {4}.*|\s*|\t)$/)
end
# (see BlockVariant#flush)
def flush(buffer)
text = buffer.lines.map { |x| "#{x.chomp[4..]}\n" }.join
block = ::RBMark::DOM::IndentBlock.new
block.content = text # TODO: replace this with inline text
block
end
end
# Fenced code block
class FencedCodeBlock < BlockVariant
# (see BlockVariant#begin?)
def begin?(line, **_opts)
line.match?(/^(?:`{3,}[^`]*|~{3,}.*)$/)
end
# (see BlockVariant#end?)
def end?(line, blocks: nil, buffer: nil, **_opts)
buffer.lines.length > 1 and
line.strip == blocks.last.properties[:expected_closer]
end
# (see BlockVariant#begin)
def begin(buffer)
block = ::RBMark::DOM::CodeBlock.new
block.properties[:expected_closer] = buffer.match(/^(?:`{3,}|~{3,})/)[0]
block.properties[:infostring] = buffer.match(/^(?:`{3,}|~{3,})(.*)$/)[1]
.strip
block
end
# (see BlockVariant#end)
def end(block, buffer)
# TODO: replace this with inline text
block.properties.delete(:expected_closer)
block.content = buffer.lines[1..-2].join
end
end
end
# Module for representing abstract object hierarchy
module DOM
# Abstract container
class DOMObject
class << self
# Hook for initializing variables
# @param subclass [Class]
def inherited(subclass)
super
@subclasses ||= []
@subclasses.append(subclass)
subclass.variants = @variants.dup
subclass.variants ||= []
subclass.atomic_mode = @atomic_mode
subclass.scanner_class = @scanner_class
end
# Add potential sub-element variant
# @param cls [Class] Variant subclass
def variant(cls, prio: 1)
unless cls < ::RBMark::Parsing::Variant
raise StandardError, "#{cls} is not a Variant subclass"
end
@variants.append([cls, prio])
@subclasses&.each do |subclass|
subclass.variant(cls, prio: prio)
end
end
# Set scanner class
# @param cls [Class] DOMObject subclass
def scanner(cls)
unless cls < ::RBMark::Parsing::Scanner
raise StandardError, "#{cls} is not a Scanner subclass"
end
@scanner_class = cls
@subclasses&.each do |subclass|
subclass.scanner(cls)
end
end
# Prepare scanner and variants
# @return [void]
def prepare
return if @prepared
@scanner = @scanner_class.new
@scanner.variants = @variants.map { |x| [x[0].new, x[1]] }
end
# Parse text from the given context
# @param text [String]
# @return [self]
def parse(text)
prepare unless @atomic_mode
instance = new
if @atomic_mode
instance.content = text
else
instance.append(*@scanner.scan(text))
end
instance
end
# Create a new instance of class or referenced class
# @return [self, Class]
def create
if @alt_for
@alt_for.new
else
new
end
end
# Set the atomic flag
# @return [void]
def atomic
@atomic_mode = true
end
attr_accessor :variants, :scanner_class, :alt_for, :atomic_mode
end
def initialize
@content = nil
@children = []
@properties = {}
end
# Set certain property in the properties hash
# @param properties [Hash] properties to update
def property(**properties)
@properties.update(**properties)
end
# Add child to container
# @param child [DOMObject]
def append(*children)
unless children.all? { |x| x.is_a? DOMObject }
raise StandardError, "one of #{children.inspect} is not a DOMObject"
end
@children.append(*children)
end
# Insert a child into the container
# @param child [DOMObject]
# @param index [Integer]
def insert(index, child)
raise StandardError, "not a DOMObject" unless child.is_a? DOMObject
@children.insert(index, child)
end
# Delete a child from container
# @param index [Integer]
def delete_at(index)
@children.delete_at(index)
end
# Get a child from the container
# @param key [Integer]
def [](key)
@children[key]
end
# Set text content of a DOMObject
# @param text [String]
def content=(text)
raise StandardError, "not a String" unless text.is_a? String
@content = text
end
# Get text content of a DOMObject
# @return [String, nil]
attr_reader :content, :children, :properties
end
# Inline text
class Text < DOMObject
end
# Inline preformatted text
class InlinePre < DOMObject
end
# Inline formattable text
class InlineFormattable < DOMObject
atomic
end
# Bold text
class InlineBold < InlineFormattable
end
# Italics text
class InlineItalics < InlineFormattable
end
# Inline italics text (alternative)
class InlineAltItalics < InlineFormattable
end
# Underline text
class InlineUnder < InlineFormattable
end
# Strikethrough text
class InlineStrike < InlineFormattable
end
# Hyperreferenced text
class InlineLink < InlineFormattable
end
# Image
class InlineImage < InlinePre
end
# Linebreak
class InlineBreak < DOMObject
end
# Document root
class Document < DOMObject
scanner ::RBMark::Parsing::LineScanner
variant ::RBMark::Parsing::ATXHeadingVariant
variant ::RBMark::Parsing::ThematicBreakVariant
variant ::RBMark::Parsing::SetextHeadingVariant
variant ::RBMark::Parsing::IndentedBlockVariant
variant ::RBMark::Parsing::FencedCodeBlock
variant ::RBMark::Parsing::BlankSeparator, prio: 9998
variant ::RBMark::Parsing::ParagraphVariant, prio: 9999
end
# Paragraph in a document (separated by 2 newlines)
class Paragraph < InlineFormattable
atomic
end
# Heading level 1
class Heading1 < InlineFormattable
end
# Heading level 2
class Heading2 < Heading1
end
# Heading level 3
class Heading3 < Heading1
end
# Heading level 4
class Heading4 < Heading1
end
# Heading level 5
class Heading5 < Heading1
end
# Heading level 6
class Heading6 < Heading1
end
# Preformatted code block
class CodeBlock < DOMObject
end
# Quote block
class QuoteBlock < Document
end
# Table
class TableBlock < DOMObject
end
# List element
class ListElement < Document
end
# Unordered list
class ULBlock < DOMObject
end
# Ordered list block
class OLBlock < DOMObject
end
# Indent block
class IndentBlock < DOMObject
end
# Horizontal rule
class HorizontalRule < DOMObject
atomic
end
end
end

@ -1,9 +0,0 @@
# frozen_string_literal: true
module RBMark
# Renderers from Markdown to expected output format
module Renderers
end
end
require_relative 'renderers/html'

@ -1,132 +0,0 @@
# frozen_string_literal: true
require 'rbmark'
module RBMark
module Renderers
# HTML Renderer
class HTML
ELEMENT_MAP = {
"RBMark::DOM::InlinePre" => {
tag: "code",
inline: true
},
"RBMark::DOM::InlineBreak" => {
tag: "br",
inline: true
},
"RBMark::DOM::InlineBold" => {
tag: "strong",
inline: true
},
"RBMark::DOM::InlineItalics" => {
tag: "em",
inline: true
},
"RBMark::DOM::InlineUnder" => {
tag: "span",
inline: true,
style: "text-decoration: underline;"
},
"RBMark::DOM::InlineStrike" => {
tag: "s",
inline: true
},
"RBMark::DOM::InlineLink" => {
tag: "link",
href: true,
inline: true
},
"RBMark::DOM::InlineImage" => {
tag: "img",
src: true,
inline: true
},
"RBMark::DOM::ULBlock" => {
tag: "ul"
},
"RBMark::DOM::OLBlock" => {
tag: "ol"
},
"RBMark::DOM::IndentBlock" => {
tag: "pre"
},
"RBMark::DOM::ListElement" => {
tag: "li"
},
"RBMark::DOM::Paragraph" => {
tag: "p"
},
"RBMark::DOM::Heading1" => {
tag: "h1"
},
"RBMark::DOM::Heading2" => {
tag: "h2"
},
"RBMark::DOM::Heading3" => {
tag: "h3"
},
"RBMark::DOM::Heading4" => {
tag: "h4"
},
"RBMark::DOM::Heading5" => {
tag: "h5"
},
"RBMark::DOM::Heading6" => {
tag: "h6"
},
"RBMark::DOM::Document" => {
tag: "main"
},
"RBMark::DOM::CodeBlock" => {
tag: "pre",
outer: {
tag: "code"
}
},
"RBMark::DOM::QuoteBlock" => {
tag: "blockquote"
},
"RBMark::DOM::HorizontalRule" => {
tag: "hr"
},
"RBMark::DOM::Text" => nil
}.freeze
def initialize(dom, options)
@document = dom
@options = options
end
# Render document to HTML
def render
preambule if @options['preambule']
_render(@document, indent = 2)
postambule if @options['postambule']
end
private
def _render(element, indent = 0)
def preambule
@options['preambule'] or <<~TEXT
<!DOCTYPE HTML>
<html>
<head>
#{@document['head']}
</head>
<body>
TEXT
end
def postambule
@options['postambule'] or <<~TEXT
</body>
</html>
TEXT
end
end
end
end

mmmd.gemspec Normal file

@ -0,0 +1,21 @@
# frozen_string_literal: true
Gem::Specification.new do |spec|
spec.name = "mmmd"
spec.version = "0.1.2"
spec.summary = "Modular, compliant Markdown processor"
spec.description = <<~DESC
MMMD (short for Mark My Manuscript Down) is a Markdown processor
(as in "parser and translator") with a CLI interface utility and
multiple modes of output (currently HTML and terminal).
DESC
spec.authors = ["Yessiest"]
spec.license = "AGPL-3.0-or-later"
spec.email = "yessiest@text.512mb.org"
spec.homepage = "https://adastra7.net/git/Yessiest/rubymark"
spec.files = Dir["lib/**/*"]
spec.bindir = "bin"
spec.executables << "mmmdpp"
spec.extra_rdoc_files = Dir["*.md"]
spec.required_ruby_version = ">= 3.0.0"
end

@ -1,21 +0,0 @@
# frozen_string_literal: true
Gem::Specification.new do |s|
s.name = 'rbmark'
s.version = '0.5'
s.summary = <<~SUMMARY
Modular, extensible, HTML-agnostic Markdown parser
SUMMARY
s.description = <<~TEXT
RBMark is a Markdown parser that represents Markdown in a DOM-like
object structure, allowing for other interfaces to produce more
complex translators from Markdown to any given format.
TEXT
s.authors = ['yessiest']
s.email = 'yessiest@text.512mb.org'
s.license = 'Apache-2.0'
s.homepage = 'https://adastra7.net/git/Yessiest/rubymark'
s.files = Dir['lib/**/*.rb'] + Dir['bin/*']
s.required_ruby_version = '>= 3.0.0'
s.executables = ['mdpp']
end

security.md Normal file

@ -0,0 +1,21 @@
Security acknowledgements
=========================
While special care has been taken to prevent some of the more common
vulnerabilities that might arise from using this parser, it does not prevent
certain issues **which should be acknowledged**.
- It is possible to inject a form of one-click XSS into the website. In
particular, there are no restrictions placed on URLs embedded within links
(as per the CommonMark specification). As such, something as
simple as `[test](<javascript:dangerous code here>)` would be more than enough
to employ such an exploit.
- While generally speaking the parser is stable on most tests, and prevents
stray HTML tokens from occurring in the output text where appropriate, due to
the nontrivial nature of the task some form of XSS injection may still
occur. If such an incident occurs, please report it to the current maintainer
of the project.
- User input should NOT be trusted when it comes to applying options to
rendering. Some renderers, such as the HTML renderer, allow modifying the
style parameter for rendered tags, which, when control of it is passed to an
untrusted party, may become an XSS attack vector.
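
Since the parser itself places no restrictions on link URLs, an application that
renders untrusted Markdown may want to filter link destinations on its own side.
The following is only a minimal sketch of one possible approach, a scheme
allowlist. The names `SAFE_SCHEMES` and `safe_url?` are hypothetical and are not
part of MMMD, and the set of allowed schemes is an assumption that should be
adjusted per application.

```ruby
# Hypothetical allowlist of URL schemes considered safe to emit in links.
# This is an illustrative assumption, not something MMMD provides.
SAFE_SCHEMES = %w[http https mailto].freeze

# Returns true if the URL is relative or uses an allowlisted scheme.
def safe_url?(url)
  # Browsers tend to ignore whitespace and control characters inside a
  # scheme, so remove them before inspecting the URL.
  cleaned = url.to_s.gsub(/[\x00-\x20]/, "")
  # Extract the scheme (the part before the first ':'), if there is one.
  scheme = cleaned[/\A([a-z][a-z0-9+.\-]*):/i, 1]
  # No scheme means a relative URL or a fragment, which is allowed here.
  return true if scheme.nil?

  SAFE_SCHEMES.include?(scheme.downcase)
end

safe_url?("https://example.com/page") # => true
safe_url?("javascript:alert(1)")      # => false
```

A check like this would be applied to every link and image destination before
the rendered HTML is shown to other users. It does not replace reviewing the
renderer options mentioned above.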


@ -1,102 +0,0 @@
# frozen_string_literal: true
require 'minitest/autorun'
require_relative '../lib/rbmark'
# Test ATX Heading parsing compliance with CommonMark v0.31.2
class TestATXHeadings < Minitest::Test
def test_simple_heading1
doc = ::RBMark::DOM::Document.parse(<<~DOC)
# ATX Heading level 1
Paragraph
DOC
assert_instance_of(::RBMark::DOM::Heading1, doc.children[0])
end
def test_simple_heading2
doc = ::RBMark::DOM::Document.parse(<<~DOC)
## ATX Heading level 2
Paragraph
DOC
assert_instance_of(::RBMark::DOM::Heading2, doc.children[0])
end
def test_simple_heading3
doc = ::RBMark::DOM::Document.parse(<<~DOC)
### ATX Heading level 3
Paragraph
DOC
assert_instance_of(::RBMark::DOM::Heading3, doc.children[0])
end
def test_simple_heading4
doc = ::RBMark::DOM::Document.parse(<<~DOC)
#### ATX Heading level 4
Paragraph
DOC
assert_instance_of(::RBMark::DOM::Heading4, doc.children[0])
end
def test_simple_heading5
doc = ::RBMark::DOM::Document.parse(<<~DOC)
##### ATX Heading level 5
Paragraph
DOC
assert_instance_of(::RBMark::DOM::Heading5, doc.children[0])
end
def test_simple_heading6
doc = ::RBMark::DOM::Document.parse(<<~DOC)
###### ATX Heading level 6
Paragraph
DOC
assert_instance_of(::RBMark::DOM::Heading6, doc.children[0])
end
def test_simple_not_a_heading
doc = ::RBMark::DOM::Document.parse(<<~DOC)
####### NOT a heading
DOC
assert_instance_of(::RBMark::DOM::Paragraph, doc.children[0])
end
def test_breaking_paragraph
doc = ::RBMark::DOM::Document.parse(<<~DOC)
Paragraph 1
# ATX Heading level 1
Paragraph 2
DOC
assert_instance_of(::RBMark::DOM::Paragraph, doc.children[0])
assert_instance_of(::RBMark::DOM::Heading1, doc.children[1])
assert_instance_of(::RBMark::DOM::Paragraph, doc.children[2])
end
def test_heading_sans_space
doc = ::RBMark::DOM::Document.parse(<<~DOC)
#NOT an ATX heading
DOC
assert_instance_of(::RBMark::DOM::Paragraph, doc.children[0])
end
def test_heading_escaped
doc = ::RBMark::DOM::Document.parse(<<~DOC)
\\# Escaped ATX heading
DOC
assert_instance_of(::RBMark::DOM::Paragraph, doc.children[0])
end
def test_spaces
doc = ::RBMark::DOM::Document.parse(<<~DOC)
#### Heading level 4
### Heading level 3
## Heading level 2
# Heading level 1
# NOT a heading
DOC
assert_instance_of(::RBMark::DOM::Heading4, doc.children[0])
assert_instance_of(::RBMark::DOM::Heading3, doc.children[1])
assert_instance_of(::RBMark::DOM::Heading2, doc.children[2])
assert_instance_of(::RBMark::DOM::Heading1, doc.children[3])
refute_instance_of(::RBMark::DOM::Heading1, doc.children[4])
end
end


@ -1,147 +0,0 @@
# frozen_string_literal: true
require 'minitest/autorun'
require_relative '../lib/rbmark'
# Test Setext Heading parsing compliance with CommonMark v0.31.2
class TestSetextHeadings < Minitest::Test
def test_simple_heading1
doc = ::RBMark::DOM::Document.parse(<<~DOC)
Foo *bar*
=========
Foo *bar*
---------
DOC
assert_instance_of(::RBMark::DOM::Heading1, doc.children[0])
assert_instance_of(::RBMark::DOM::Heading2, doc.children[1])
end
def test_multiline_span
doc = ::RBMark::DOM::Document.parse(<<~DOC)
Foo *bar
baz*
====
DOC
assert_instance_of(::RBMark::DOM::Heading1, doc.children[0])
assert_equal(1, doc.children.length)
end
def test_span_inlining
doc = ::RBMark::DOM::Document.parse(<<~DOC)
start
Foo *bar
baz
====
DOC
assert_instance_of(::RBMark::DOM::Heading1, doc.children[1])
skip
end
def test_line_length
doc = ::RBMark::DOM::Document.parse(<<~DOC)
Foo
------------------------------
Foo
=
DOC
assert_instance_of(::RBMark::DOM::Heading2, doc.children[0])
assert_instance_of(::RBMark::DOM::Heading1, doc.children[1])
end
def test_content_indent
skip # TODO: implement this
end
def test_marker_indent
doc = ::RBMark::DOM::Document.parse(<<~DOC)
Foo
------------------------------
Foo
=
Foo
=
Foo
=
DOC
refute_instance_of(::RBMark::DOM::Heading2, doc.children[0])
assert_instance_of(::RBMark::DOM::Heading1, doc.children[1])
assert_instance_of(::RBMark::DOM::Heading1, doc.children[2])
assert_instance_of(::RBMark::DOM::Heading1, doc.children[3])
end
def test_no_internal_spaces
doc = ::RBMark::DOM::Document.parse(<<~DOC)
Foo
-- - -
Foo
== =
DOC
refute_instance_of(::RBMark::DOM::Heading2, doc.children[0])
refute_instance_of(::RBMark::DOM::Heading1, doc.children[0])
end
def test_block_level_priority
doc = ::RBMark::DOM::Document.parse(<<~DOC)
` Foo
------
`
DOC
assert_instance_of(::RBMark::DOM::Heading2, doc.children[0])
assert_instance_of(::RBMark::DOM::Paragraph, doc.children[1])
end
def test_paragraph_breaking_only
doc = ::RBMark::DOM::Document.parse(<<~DOC)
> text
------
DOC
skip # TODO: implement this
end
def test_paragraph_breaking_only_lazy_continuation
doc = ::RBMark::DOM::Document.parse(<<~DOC)
> text
continuation line
------
DOC
skip # TODO: implement this
end
def test_headings_back_to_back
doc = ::RBMark::DOM::Document.parse(<<~DOC)
heading1
------
heading2
------
heading3
======
DOC
assert_instance_of(::RBMark::DOM::Heading2, doc.children[0])
assert_instance_of(::RBMark::DOM::Heading2, doc.children[1])
assert_instance_of(::RBMark::DOM::Heading1, doc.children[2])
end
def test_no_empty_headings
doc = ::RBMark::DOM::Document.parse(<<~DOC)
======
DOC
refute_instance_of(::RBMark::DOM::Heading1, doc.children[0])
end
def test_thematic_breaks
doc = ::RBMark::DOM::Document.parse(<<~DOC)
----
----
DOC
refute_instance_of(::RBMark::DOM::Heading2, doc.children[0])
refute_instance_of(::RBMark::DOM::Heading2, doc.children[1])
end
end


@ -1,102 +0,0 @@
# frozen_string_literal: true
require 'minitest/autorun'
require_relative '../lib/rbmark'
# Test ATX Heading parsing compliance with CommonMark v0.31.2
class TestATXHeadings < Minitest::Test
def test_simple_heading1
doc = ::RBMark::DOM::Document.parse(<<~DOC)
# ATX Heading level 1
Paragraph
DOC
assert_instance_of(::RBMark::DOM::Heading1, doc.children[0])
end
def test_simple_heading2
doc = ::RBMark::DOM::Document.parse(<<~DOC)
## ATX Heading level 2
Paragraph
DOC
assert_instance_of(::RBMark::DOM::Heading2, doc.children[0])
end
def test_simple_heading3
doc = ::RBMark::DOM::Document.parse(<<~DOC)
### ATX Heading level 3
Paragraph
DOC
assert_instance_of(::RBMark::DOM::Heading3, doc.children[0])
end
def test_simple_heading4
doc = ::RBMark::DOM::Document.parse(<<~DOC)
#### ATX Heading level 4
Paragraph
DOC
assert_instance_of(::RBMark::DOM::Heading4, doc.children[0])
end
def test_simple_heading5
doc = ::RBMark::DOM::Document.parse(<<~DOC)
##### ATX Heading level 5
Paragraph
DOC
assert_instance_of(::RBMark::DOM::Heading5, doc.children[0])
end
def test_simple_heading6
doc = ::RBMark::DOM::Document.parse(<<~DOC)
###### ATX Heading level 6
Paragraph
DOC
assert_instance_of(::RBMark::DOM::Heading6, doc.children[0])
end
def test_simple_not_a_heading
doc = ::RBMark::DOM::Document.parse(<<~DOC)
####### NOT a heading
DOC
assert_instance_of(::RBMark::DOM::Paragraph, doc.children[0])
end
def test_breaking_paragraph
doc = ::RBMark::DOM::Document.parse(<<~DOC)
Paragraph 1
# ATX Heading level 1
Paragraph 2
DOC
assert_instance_of(::RBMark::DOM::Paragraph, doc.children[0])
assert_instance_of(::RBMark::DOM::Heading1, doc.children[1])
assert_instance_of(::RBMark::DOM::Paragraph, doc.children[2])
end
def test_heading_sans_space
doc = ::RBMark::DOM::Document.parse(<<~DOC)
#NOT an ATX heading
DOC
assert_instance_of(::RBMark::DOM::Paragraph, doc.children[0])
end
def test_heading_escaped
doc = ::RBMark::DOM::Document.parse(<<~DOC)
\\# Escaped ATX heading
DOC
assert_instance_of(::RBMark::DOM::Paragraph, doc.children[0])
end
def test_spaces
doc = ::RBMark::DOM::Document.parse(<<~DOC)
#### Heading level 4
### Heading level 3
## Heading level 2
# Heading level 1
# NOT a heading
DOC
assert_instance_of(::RBMark::DOM::Heading4, doc.children[0])
assert_instance_of(::RBMark::DOM::Heading3, doc.children[1])
assert_instance_of(::RBMark::DOM::Heading2, doc.children[2])
assert_instance_of(::RBMark::DOM::Heading1, doc.children[3])
refute_instance_of(::RBMark::DOM::Heading1, doc.children[4])
end
end


@ -1,97 +0,0 @@
# frozen_string_literal: true
require 'minitest/autorun'
require_relative '../lib/rbmark'
# Test indented code block parsing compliance with CommonMark v0.31.2
class TestSetextHeadings < Minitest::Test
def test_simple_indent
doc = ::RBMark::DOM::Document.parse(<<~DOC)
text
indented code block
without space mangling
int main() {
printf("Hello world!\\n");
}
DOC
assert_instance_of(::RBMark::DOM::IndentBlock, doc.children[1])
end
def test_list_item_precedence
skip # TODO: implement this
end
def test_numbered_list_item_precedence
skip # TODO: implement this
end
def test_check_indent_contents
skip # TODO: yet again please implement this at some point thanks
end
def test_long_chunk
doc = ::RBMark::DOM::Document.parse(<<~DOC)
text
indented code block
without space mangling
int main() {
printf("Hello world!\\n");
}
there are many space changes here and blank lines that
should *NOT* affect the way this is parsed
DOC
assert_instance_of(::RBMark::DOM::IndentBlock, doc.children[1])
end
def test_does_not_interrupt_paragraph
doc = ::RBMark::DOM::Document.parse(<<~DOC)
Paragraph begins here
paragraph does the stupid wacky shit that somebody thinks is very funny
paragraph keeps doing that shit
DOC
assert_instance_of(::RBMark::DOM::Paragraph, doc.children[0])
assert_equal(1, doc.children.length)
end
def test_begins_at_first_sight_of_four_spaces
doc = ::RBMark::DOM::Document.parse(<<~DOC)
text
This is an indent block
This is a paragraph
DOC
assert_instance_of(::RBMark::DOM::Paragraph, doc.children[0])
assert_instance_of(::RBMark::DOM::IndentBlock, doc.children[1])
assert_instance_of(::RBMark::DOM::Paragraph, doc.children[2])
end
def test_interrupts_all_other_blocks
doc = ::RBMark::DOM::Document.parse(<<~DOC)
# Heading
foo
Heading
------
foo
----
DOC
assert_instance_of(::RBMark::DOM::Heading1, doc.children[0])
assert_instance_of(::RBMark::DOM::IndentBlock, doc.children[1])
assert_instance_of(::RBMark::DOM::Heading2, doc.children[2])
assert_instance_of(::RBMark::DOM::IndentBlock, doc.children[3])
assert_instance_of(::RBMark::DOM::HorizontalRule, doc.children[4])
end
def test_check_blank_lines_contents
skip # TODO: PLEASE I FUCKING BEG YOU IMPLEMENT THIS
end
def test_check_contents_trailing_spaces
skip # TODO: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAa
end
end


@ -1,147 +0,0 @@
# frozen_string_literal: true
require 'minitest/autorun'
require_relative '../lib/rbmark'
# Test Setext Heading parsing compliance with CommonMark v0.31.2
class TestSetextHeadings < Minitest::Test
def test_simple_heading1
doc = ::RBMark::DOM::Document.parse(<<~DOC)
Foo *bar*
=========
Foo *bar*
---------
DOC
assert_instance_of(::RBMark::DOM::Heading1, doc.children[0])
assert_instance_of(::RBMark::DOM::Heading2, doc.children[1])
end
def test_multiline_span
doc = ::RBMark::DOM::Document.parse(<<~DOC)
Foo *bar
baz*
====
DOC
assert_instance_of(::RBMark::DOM::Heading1, doc.children[0])
assert_equal(1, doc.children.length)
end
def test_span_inlining
doc = ::RBMark::DOM::Document.parse(<<~DOC)
start
Foo *bar
baz
====
DOC
assert_instance_of(::RBMark::DOM::Heading1, doc.children[1])
skip
end
def test_line_length
doc = ::RBMark::DOM::Document.parse(<<~DOC)
Foo
------------------------------
Foo
=
DOC
assert_instance_of(::RBMark::DOM::Heading2, doc.children[0])
assert_instance_of(::RBMark::DOM::Heading1, doc.children[1])
end
def test_content_indent
skip # TODO: implement this
end
def test_marker_indent
doc = ::RBMark::DOM::Document.parse(<<~DOC)
Foo
------------------------------
Foo
=
Foo
=
Foo
=
DOC
refute_instance_of(::RBMark::DOM::Heading2, doc.children[0])
assert_instance_of(::RBMark::DOM::Heading1, doc.children[1])
assert_instance_of(::RBMark::DOM::Heading1, doc.children[2])
assert_instance_of(::RBMark::DOM::Heading1, doc.children[3])
end
def test_no_internal_spaces
doc = ::RBMark::DOM::Document.parse(<<~DOC)
Foo
-- - -
Foo
== =
DOC
refute_instance_of(::RBMark::DOM::Heading2, doc.children[0])
refute_instance_of(::RBMark::DOM::Heading1, doc.children[0])
end
def test_block_level_priority
doc = ::RBMark::DOM::Document.parse(<<~DOC)
` Foo
------
`
DOC
assert_instance_of(::RBMark::DOM::Heading2, doc.children[0])
assert_instance_of(::RBMark::DOM::Paragraph, doc.children[1])
end
def test_paragraph_breaking_only
doc = ::RBMark::DOM::Document.parse(<<~DOC)
> text
------
DOC
skip # TODO: implement this
end
def test_paragraph_breaking_only_lazy_continuation
doc = ::RBMark::DOM::Document.parse(<<~DOC)
> text
continuation line
------
DOC
skip # TODO: implement this
end
def test_headings_back_to_back
doc = ::RBMark::DOM::Document.parse(<<~DOC)
heading1
------
heading2
------
heading3
======
DOC
assert_instance_of(::RBMark::DOM::Heading2, doc.children[0])
assert_instance_of(::RBMark::DOM::Heading2, doc.children[1])
assert_instance_of(::RBMark::DOM::Heading1, doc.children[2])
end
def test_no_empty_headings
doc = ::RBMark::DOM::Document.parse(<<~DOC)
======
DOC
refute_instance_of(::RBMark::DOM::Heading1, doc.children[0])
end
def test_thematic_breaks
doc = ::RBMark::DOM::Document.parse(<<~DOC)
----
----
DOC
refute_instance_of(::RBMark::DOM::Heading2, doc.children[0])
refute_instance_of(::RBMark::DOM::Heading2, doc.children[1])
end
end


@ -1,127 +0,0 @@
# frozen_string_literal: true
require 'minitest/autorun'
require_relative '../lib/rbmark'
# Test thematic break parsing compliance with CommonMark v0.31.2
class TestThematicBreaks < Minitest::Test
def test_simple
doc = ::RBMark::DOM::Document.parse(<<~DOC)
---
***
___
DOC
assert_instance_of(::RBMark::DOM::HorizontalRule, doc.children[0])
assert_instance_of(::RBMark::DOM::HorizontalRule, doc.children[1])
assert_instance_of(::RBMark::DOM::HorizontalRule, doc.children[2])
end
def test_simple_invalid
doc = ::RBMark::DOM::Document.parse(<<~DOC)
+++
DOC
refute_instance_of(::RBMark::DOM::HorizontalRule, doc.children[0])
doc = ::RBMark::DOM::Document.parse(<<~DOC)
===
DOC
refute_instance_of(::RBMark::DOM::HorizontalRule, doc.children[0])
end
def test_simple_less_characters
doc = ::RBMark::DOM::Document.parse(<<~DOC)
--
**
__
DOC
refute_instance_of(::RBMark::DOM::HorizontalRule, doc.children[0])
refute_instance_of(::RBMark::DOM::HorizontalRule, doc.children[1])
refute_instance_of(::RBMark::DOM::HorizontalRule, doc.children[2])
end
def test_indentation
doc = ::RBMark::DOM::Document.parse(<<~DOC)
***
***
***
***
***
DOC
assert_instance_of(::RBMark::DOM::HorizontalRule, doc.children[0])
assert_instance_of(::RBMark::DOM::HorizontalRule, doc.children[1])
assert_instance_of(::RBMark::DOM::HorizontalRule, doc.children[2])
assert_instance_of(::RBMark::DOM::HorizontalRule, doc.children[3])
refute_instance_of(::RBMark::DOM::HorizontalRule, doc.children[4])
end
def test_indentation_mixed_classes
doc = ::RBMark::DOM::Document.parse(<<~DOC)
Foo
***
DOC
refute_instance_of(::RBMark::DOM::HorizontalRule, doc.children.last)
end
def test_line_length
doc = ::RBMark::DOM::Document.parse(<<~DOC)
_________________________________
DOC
assert_instance_of(::RBMark::DOM::HorizontalRule, doc.children[0])
end
def test_mixed_spaces
doc = ::RBMark::DOM::Document.parse(<<~DOC)
- - -
** * ** * ** * **
- - - -
- - - -
DOC
assert_instance_of(::RBMark::DOM::HorizontalRule, doc.children[0])
assert_instance_of(::RBMark::DOM::HorizontalRule, doc.children[1])
assert_instance_of(::RBMark::DOM::HorizontalRule, doc.children[2])
assert_instance_of(::RBMark::DOM::HorizontalRule, doc.children[3])
end
def test_mixed_characters
doc = ::RBMark::DOM::Document.parse(<<~DOC)
_ _ _ _ a
a------
---a---
DOC
refute_instance_of(::RBMark::DOM::HorizontalRule, doc.children[0])
refute_instance_of(::RBMark::DOM::HorizontalRule, doc.children[2])
refute_instance_of(::RBMark::DOM::HorizontalRule, doc.children[3])
end
def test_mixed_markers
doc = ::RBMark::DOM::Document.parse(<<~DOC)
*-*
DOC
refute_instance_of(::RBMark::DOM::HorizontalRule, doc.children[0])
end
def test_interrupt_list
doc = ::RBMark::DOM::Document.parse(<<~DOC)
- foo
***
- bar
DOC
assert_instance_of(::RBMark::DOM::HorizontalRule, doc.children[1])
end
def test_interrupt_paragraph
doc = ::RBMark::DOM::Document.parse(<<~DOC)
foo
***
bar
DOC
assert_instance_of(::RBMark::DOM::HorizontalRule, doc.children[1])
end
end

21
view_structure.rb Normal file

@ -0,0 +1,21 @@
# frozen_string_literal: true
require_relative 'lib/mmmd/blankshell.rb'
structure = PointBlank::DOM::Document.parse(File.read(ARGV[0]))
def red(string)
"\033[31m#{string}\033[0m"
end
def yellow(string)
"\033[33m#{string}\033[0m"
end
def prettyprint(doc, indent = 0)
closed = doc.properties[:closed]
puts "#{yellow(doc.class.name.gsub(/\w+::DOM::/,""))}#{red(closed ? "(c)" : "")}: #{doc.content.inspect}"
doc.children.each do |child|
print red("#{" " * indent} - ")
prettyprint(child, indent + 4)
end
end
prettyprint(structure)