Compare commits

...

No commits in common. "master" and "a022377f08680f09fd53cc10685038b8a17abe97" have entirely different histories.

20 changed files with 357 additions and 6261 deletions

4
.gitignore vendored
@@ -1,4 +0,0 @@
/lib/example.md
/*.gem
/.yardoc
test*

22
.solargraph.yml Normal file
@@ -0,0 +1,22 @@
---
include:
- "**/*.rb"
exclude:
- spec/**/*
- test/**/*
- vendor/**/*
- ".bundle/**/*"
require: ["minitest"]
domains: []
reporters:
- rubocop
- require_not_found
formatter:
rubocop:
cops: safe
except: []
only: []
extra_args: []
require_paths: []
plugins: []
max_files: 5000

@@ -1,59 +0,0 @@
MMMD (Mark My Message Down)
============
(Originally titled Rubymark)
Modular, compliant Markdown parser in Ruby
Installation
------------
This package is available as a gem over at
[rubygems.org](https://rubygems.org/gems/mmmd).
Installing it is as simple as executing `gem install mmmd`
Usage
-----
This package is generally intended as a library, but it also
includes a CLI tool which permits the usage of the library
for simple document translation tasks.
Examples:
```sh
# Render the file in a terminal-oriented format
$ mmmdpp file.md -
# Read the markdown contents directly from input, output to stdout
$ external-program | mmmdpp - -
# Render file.md to a complete webpage
$ mmmdpp -r HTML file.md file.html
# Render file.md into a complete webpage and add extra tags to head and
# wrap all images with a figure tag with figcaption containing image title
$ mmmdpp -r HTML -o '"head": ["<meta charset=\"UTF-8\">", "<style>img { max-width: 90%; }</style>"]' -o '"mapping"."PointBlank::DOM::InlineImage".figcaption: true' - -
# Render file.md into a set of HTML tags, without anything extra
$ mmmdpp -r HTML -o '"nowrap": true' file.md file.html
```
Many more usage options are documented on the project's Wiki page.
License
-------
Copyright 2025 yessiest@text.512mb.org
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

@@ -1,173 +0,0 @@
#!/usr/bin/env ruby
# frozen_string_literal: true
require 'io/console/size'
require 'optionparser'
require 'json'
require 'mmmd'
class ParserError < StandardError
end
class OptionNavigator
def initialize
@options = {}
end
# Read a definition
# @param define [String]
def read_definition(define)
locstring, value = deconstruct(define)
assign(locstring, JSON.parse(value))
end
attr_reader :options
private
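# Check that the character at index is not escaped, i.e. that it is
# preceded by an even number of backslashes
def check_unescaped(str, index)
return true if index.zero?
reverse_index = index - 1
count = 0
while reverse_index >= 0 && str[reverse_index] == "\\"
count += 1
reverse_index -= 1
end
count.even?
end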
def find_unescaped(str, pattern, index)
found = str.index(pattern, index)
return nil unless found
until check_unescaped(str, found)
index = found + 1
found = str.index(pattern, index)
return nil unless found
end
found
end
def deconstruct(locstring)
parts = []
buffer = ""
part = nil
until locstring.empty?
case locstring[0]
when '"'
raise ParserError, 'separator missing' unless buffer.empty?
closepart = find_unescaped(locstring, '"', 1)
raise ParserError, 'unclosed string' unless closepart
buffer = locstring[0..closepart]
part = buffer[1..-2]
locstring = locstring[closepart + 1..]
when '.'
parts.append(part)
buffer = ""
part = nil
locstring = locstring[1..]
when '['
raise ParserError, 'separator missing' unless buffer.empty?
closepart = find_unescaped(locstring, ']', 1)
raise ParserError, 'unclosed index' unless closepart
buffer = locstring[0..closepart]
part = buffer[1..-2].to_i
locstring = locstring.delete_prefix(buffer)
when ':'
locstring = locstring.delete_prefix(':')
break
else
raise ParserError, 'separator missing' unless buffer.empty?
match = locstring.match(/\A\w+/)
raise ParserError, "unexpected character: #{locstring[0]}" unless match
buffer = match[0]
part = buffer.to_sym
locstring = locstring.delete_prefix(buffer)
end
end
parts.append(part) if part
[parts, locstring]
end
def assign(keys, value)
current = @options
while keys.length > 1
current_key = keys.shift
unless current[current_key]
next_key = keys.first
case next_key
when Integer
current[current_key] = []
when String
current[current_key] = {}
when Symbol
current[current_key] = {}
end
end
current = current[current_key]
end
current[keys.shift] = value
end
end
options = {
include: [],
nav: OptionNavigator.new
}
parser = OptionParser.new do |opts|
opts.banner = "Usage: mmmdpp [OPTIONS] (input|-) (output|-)"
opts.on("-r", "--renderer [STRING]", String,
"Specify renderer to use for this document") do |renderer|
options[:renderer] = renderer
end
opts.on("-i", "--include [STRING]", String,
"Script to execute before rendering. " \
"May be specified multiple times.") do |inc|
options[:include].append(inc)
end
opts.on("-e", "--extension [STRING]", String,
"Enable extension") do |inc|
options[:include].append("#{__dir__}/../lib/mmmd/extensions/#{inc}.rb")
end
opts.on("-o", "--option [STRING]", String,
"Add option string. Can be repeated. Format: <key>: <JSON value>\n"\
"<key>: (<\"string\">|<symbol>|<[integer]>)"\
"[.(<\"string\">|<symbol>|<[integer]>)[...]]\n"\
"Example: \"style\".\"CodeBlock\".literal.[0]: 50") do |value|
options[:nav].read_definition(value) if value
end
end
parser.parse!
unless ARGV[1]
warn parser.help
exit 1
end
Renderers = {
"HTML" => -> { ::MMMD::Renderers::HTML },
"Plainterm" => -> { ::MMMD::Renderers::Plainterm }
}.freeze
options[:include].each { |name| Kernel.load(name) }
renderer_opts = options[:nav].options
renderer_opts["hsize"] ||= IO.console_size[1]
input = ARGV[0] == "-" ? $stdin.read : File.read(ARGV[0])
output = ARGV[1] == "-" ? $stdout : File.open(ARGV[1], "w")
doc = MMMD.parse(input)
rclass = Renderers[options[:renderer] || "Plainterm"]
raise StandardError, "unknown renderer: #{options[:renderer]}" unless rclass
renderer = rclass.call.new(doc, renderer_opts)
output.puts(renderer.render)
output.close
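
The `-o` key-path syntax handled by `OptionNavigator` above can be illustrated with a much smaller sketch. The helper names `parse_path`, `assign_path` and `read_definition` below are hypothetical and are not part of mmmdpp; the sketch assumes quoted keys contain no escaped quotes and leaves out the detailed error reporting of the original.

```ruby
require 'json'

# Hypothetical helper: split a key path such as
#   "style"."CodeBlock".literal.[0]
# into ["style", "CodeBlock", :literal, 0].
# Assumes quoted keys contain no escaped quotes.
def parse_path(path)
  path.scan(/"([^"]*)"|\[(\d+)\]|(\w+)/).map do |str, idx, sym|
    if str
      str
    elsif idx
      idx.to_i
    else
      sym.to_sym
    end
  end
end

# Hypothetical helper: store value under the nested keys, creating an
# Array when the next key is an Integer and a Hash otherwise.
def assign_path(root, keys, value)
  *parents, last = keys
  current = root
  parents.each_with_index do |key, i|
    current[key] ||= keys[i + 1].is_a?(Integer) ? [] : {}
    current = current[key]
  end
  current[last] = value
  root
end

# Hypothetical helper: read one "<key path>: <JSON value>" definition.
def read_definition(root, definition)
  segment = /"[^"]*"|\[\d+\]|\w+/
  match = definition.match(/\A(#{segment}(?:\.#{segment})*):(.*)\z/m)
  raise ArgumentError, "malformed definition: #{definition}" unless match

  assign_path(root, parse_path(match[1]), JSON.parse(match[2]))
end
```

Calling `read_definition({}, '"style"."CodeBlock".literal.[0]: 50')` builds a hash nested under `"style"` and `"CodeBlock"`, with an array stored under the symbol key `:literal`.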

@@ -1,14 +0,0 @@
# frozen_string_literal: true
require_relative 'mmmd/blankshell'
require_relative 'mmmd/renderers'
# Extensible, multi-format markdown processor
module MMMD
# Parse a Markdown document into a DOM form
# @param doc [String]
# @return [::PointBlank::DOM::Document]
def self.parse(doc)
::PointBlank::DOM::Document.parse(doc)
end
end

File diff suppressed because it is too large

File diff suppressed because it is too large

@@ -1,355 +0,0 @@
# frozen_string_literal: true
require_relative '../blankshell'
module PointBlank
module Parsing
# Table overlay
class TableParser < ::PointBlank::Parsing::NullParser
# (see ::PointBlank::Parsing::NullParser#begin?)
def self.begin?(line)
check_line(line) && !check_separator(line)
end
# Check that a line is a separator
# @param line [String]
# @return [Boolean]
def self.check_separator(line)
line.split("|")
.reject { |p| p.strip.empty? }
.all? { |p| p.strip.match?(/^:?(?:---+|===+):?$/) }
end
# Check that a line is an actual table line
# @param line [String]
# @return [Boolean]
def self.check_line(line)
line.match?(/^\A {0,3}\S/) &&
find_unescaped(line, "|") &&
line.match?(/[^|]+\|/)
end
# Find the first occurrence of an unescaped pattern
# @param string [String]
# @param pattern [Regexp, String]
# @return [Integer, nil]
def self.find_unescaped(string, pattern)
initial = 0
while (index = string.index(pattern, initial))
return index if check_unescaped(index, string)
initial = index + 1
end
nil
end
# Check that the symbol at this index is not escaped
# @param index [Integer]
# @param string [String]
# @return [Boolean]
def self.check_unescaped(index, string)
return true if index.zero?
count = 0
index -= 1
while index >= 0 && string[index] == "\\"
count += 1
index -= 1
end
(count % 2).zero?
end
# (see ::PointBlank::Parsing::NullParser#close)
def close(block, lazy: false)
return ::PointBlank::DOM::Paragraph unless @correct
nil
end
# (see ::PointBlank::Parsing::NullParser#consume)
def consume(line, _parent = nil, lazy: false)
return [nil, nil] if lazy
return [nil, nil] unless check_line(line)
unless @attempted
@enclosed = true if line.match?(/^\s*\|.+?\|\s*$/)
@attempted = true
end
@correct ||= check_separator(line)
[line, nil]
end
attr_reader :enclosed
private
def check_separator(line)
line.split("|")
.reject { |p| p.strip.empty? }
.all? { |p| p.strip.match?(/^:?===+:?$/) }
end
def check_line(line)
!self.class.find_unescaped(line, "|").nil? &&
line.match?(/[^|]+\|/)
end
end
# Table row
class TableRowParser < ::PointBlank::Parsing::NullParser
# (see ::PointBlank::Parsing::NullParser#begin?)
def self.begin?(line)
check_line(line) && !check_separator(line)
end
# Check that a line is a separator
# @param line [String]
# @return [Boolean]
def self.check_separator(line)
line.split("|")
.reject { |p| p.strip.empty? }
.all? { |p| p.strip.match?(/^:?(?:---+|===+):?$/) }
end
# Check that a line is an actual table line
# @param line [String]
# @return [Boolean]
def self.check_line(line)
line.match?(/^\A {0,3}\S/) &&
find_unescaped(line, "|") &&
line.match?(/[^|]+\|/)
end
# Find the first occurrence of an unescaped pattern
# @param string [String]
# @param pattern [Regexp, String]
# @return [Integer, nil]
def self.find_unescaped(string, pattern)
initial = 0
while (index = string.index(pattern, initial))
return index if check_unescaped(index, string)
initial = index + 1
end
nil
end
# Check that the symbol at this index is not escaped
# @param index [Integer]
# @param string [String]
# @return [Boolean]
def self.check_unescaped(index, string)
return true if index.zero?
count = 0
index -= 1
while index >= 0 && string[index] == "\\"
count += 1
index -= 1
end
(count % 2).zero?
end
def consume(line, parent = nil, lazy: false)
line = line.gsub(/[\\\s]+$/, '')
if parent.parser.enclosed
line = line
.strip
.delete_prefix("|")
.delete_suffix("|")
end
return [nil, nil] if @consumed && !check_separator(line)
@consumed = check_header(line) || check_separator(line)
push("|#{line}|\n")
[line, nil]
end
private
def check_separator(line)
line.split("|")
.reject { |p| p.strip.empty? }
.all? { |p| p.strip.match?(/^:?---+:?$/) }
end
def check_header(line)
line.split("|")
.reject { |p| p.strip.empty? }
.all? { |p| p.strip.match?(/^:?===+:?$/) }
end
def check_line(line)
!self.class.find_unescaped(line, "|").nil? &&
line.match?(/[^|]+\|/)
end
end
# Table Row overlay (decides the type of row used)
class TableRowOverlay < NullOverlay
# (see ::PointBlank::Parsing::NullOverlay#process)
def process(block, lazy: false)
output = check_underlines(block.content.lines.last)
block.content = block.content.lines[0..-2].join("")
output
end
private
# Check which type of row this particular row should be
def check_underlines(line)
if check_header(line)
::PointBlank::DOM::TableHeaderRow
else
::PointBlank::DOM::TableRow
end
end
# Check if the line is a header
def check_header(line)
line.split("|")
.reject { |p| p.strip.empty? }
.all? { |p| p.strip.match?(/^:?===+:?$/) }
end
end
# Table column separator
class TableColumnInline < NullInline
# (see ::PointBlank::Parsing::NullInline#tokenize)
def self.tokenize(string, *_lookaround)
iterate_tokens(string, /[|\n]/) do |_before, text, matched|
next text unless matched
sym = text[0]
[sym, self, sym == '|' ? :open : :wrap]
end
end
# (see ::PointBlank::Parsing::NullInline#forward_walk)
def self.forward_walk(parts)
buffer = []
current = []
bin_idx = 0
skip_first = true
parts.each_with_index do |part, idx|
next current.append(part) unless part.is_a?(Array) &&
part[1] == self
next (skip_first = false) if skip_first
if part.last == :open
buffer.append([]) if buffer.length < bin_idx + 1
buffer[bin_idx] += current + ["\n"]
bin_idx += 1
else
bin_idx = 0
skip_first = true
end
current = []
end
[build(merge_lines(buffer.first)),
buffer[1..].map { |x| build(merge_lines(x)) }]
end
# Merge line runs so that the content looks correct
# @param current [Array<String, Array>]
def self.merge_lines(current)
result = []
current.each do |part|
next result.append(part) unless part.is_a? String
if result.last.is_a? String
result[-1] += part.lstrip.gsub(/ +\n?/," ")
else
result.append(part.lstrip.gsub(/ +\n?/," "))
end
end
result[-1] = result.last.rstrip if result.last.is_a? String
result
end
end
# Header row table column separator
# (exists because of a bug in handling parser_for)
class TableHeaderColumnInline < TableColumnInline
end
end
module DOM
# Table column
class TableColumn < ::PointBlank::DOM::InlineElement
define_parser ::PointBlank::Parsing::TableColumnInline
end
# Table column root (virtual)
class TableColumnRoot < ::PointBlank::DOM::InlineRoot
define_scanner ::PointBlank::Parsing::StackScanner
define_child TableColumn
end
# Table header column
class TableHeaderColumn < ::PointBlank::DOM::InlineElement
define_parser ::PointBlank::Parsing::TableHeaderColumnInline
end
# Table header column root (virtual)
class TableHeaderColumnRoot < ::PointBlank::DOM::InlineRoot
define_scanner ::PointBlank::Parsing::StackScanner
define_child TableHeaderColumn
end
# Table header row
class TableHeaderRow < ::PointBlank::DOM::DOMObject
define_parser ::PointBlank::Parsing::TableRowParser
define_conversion ::PointBlank::DOM::TableHeaderColumnRoot
end
# Table row
class TableRow < ::PointBlank::DOM::DOMObject
define_parser ::PointBlank::Parsing::TableRowParser
define_overlay ::PointBlank::Parsing::TableRowOverlay
define_conversion ::PointBlank::DOM::TableColumnRoot
end
# Table
class Table < ::PointBlank::DOM::DOMObject
define_parser ::PointBlank::Parsing::TableParser
define_child ::PointBlank::DOM::TableRow, 300
define_child ::PointBlank::DOM::TableHeaderRow
end
# Document extension
::PointBlank::DOM::Block.class_eval do
define_child ::PointBlank::DOM::Table, 1500
end
Block.subclasses.map(&:upsource)
end
end
# Touch to do autoloading
MMMD::Renderers::HTML.yield_self
module MMMD
module Renderers
module HTMLConstants
if defined? MapManager
MapManager.define_mapping "PointBlank::DOM::Table", {
tag: "table"
}
MapManager.define_mapping "PointBlank::DOM::TableRow", {
tag: "tr"
}
MapManager.define_mapping "PointBlank::DOM::TableHeaderRow", {
tag: "tr"
}
MapManager.define_mapping "PointBlank::DOM::TableColumn", {
tag: "td"
}
MapManager.define_mapping "PointBlank::DOM::TableHeaderColumn", {
tag: "th"
}
end
end
end
end
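
The separator and escaped-pipe checks in the table parser above can be tried in isolation with the following sketch. `separator_line?` and `first_unescaped_pipe` are hypothetical standalone helpers modelled on `check_separator` and `find_unescaped`; they are not part of the extension itself.

```ruby
# Hypothetical standalone check, modelled on check_separator above:
# every non-empty cell must look like ---, :---, ---: or :---:
# (or the same shapes built from = signs).
def separator_line?(line)
  cells = line.split("|").map(&:strip).reject(&:empty?)
  # Deviation from the original: a line with no cells is rejected,
  # whereas `all?` on an empty list would return true.
  return false if cells.empty?

  cells.all? { |cell| cell.match?(/\A:?(?:-{3,}|={3,}):?\z/) }
end

# Hypothetical helper: index of the first "|" that is not escaped
# by an odd number of preceding backslashes, or nil if there is none.
def first_unescaped_pipe(line)
  line.each_char.with_index do |char, index|
    next unless char == "|"

    backslashes = line[0...index][/\\*\z/].length
    return index if backslashes.even?
  end
  nil
end
```

Counting the backslashes directly before the pipe is what lets an escaped pipe stay part of the cell text while a doubled backslash still leaves the pipe acting as a column separator.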

@@ -1,11 +0,0 @@
# frozen_string_literal: true
$LOAD_PATH.append(__dir__)
module MMMD
# Renderers from Markdown to expected output format
module Renderers
autoload :HTML, 'renderers/html'
autoload :Plainterm, 'renderers/plainterm'
end
end

@@ -1,379 +0,0 @@
# frozen_string_literal: true
require_relative "../util"
module MMMD
module Renderers
module HTMLConstants
ELEMENT_MAP = {
"PointBlank::DOM::InlinePre" => {
tag: "code",
style: "white-space: pre;"
},
"PointBlank::DOM::InlineBreak" => {
tag: "br"
},
"PointBlank::DOM::InlineStrong" => {
tag: "strong"
},
"PointBlank::DOM::InlineEmphasis" => {
tag: "em"
},
"PointBlank::DOM::InlineUnder" => {
tag: "span",
style: "text-decoration: underline;"
},
"PointBlank::DOM::InlineStrike" => {
tag: "s"
},
"PointBlank::DOM::InlineLink" => {
tag: "a",
href: true,
title: true
},
"PointBlank::DOM::InlineImage" => {
tag: "img",
src: true,
inline: true,
alt: true,
title: true
},
"PointBlank::DOM::ULBlock" => {
tag: "ul"
},
"PointBlank::DOM::OLBlock" => {
tag: "ol"
},
"PointBlank::DOM::IndentBlock" => {
tag: "pre",
codeblock: true
},
"PointBlank::DOM::ULListElement" => {
tag: "li"
},
"PointBlank::DOM::OLListElement" => {
tag: "li"
},
"PointBlank::DOM::Paragraph" => {
tag: "p"
},
"PointBlank::DOM::SetextHeading1" => {
tag: "h1"
},
"PointBlank::DOM::SetextHeading2" => {
tag: "h2"
},
"PointBlank::DOM::ATXHeading1" => {
tag: "h1"
},
"PointBlank::DOM::ATXHeading2" => {
tag: "h2"
},
"PointBlank::DOM::ATXHeading3" => {
tag: "h3"
},
"PointBlank::DOM::ATXHeading4" => {
tag: "h4"
},
"PointBlank::DOM::ATXHeading5" => {
tag: "h5"
},
"PointBlank::DOM::ATXHeading6" => {
tag: "h6"
},
"PointBlank::DOM::Document" => {
tag: "main"
},
"PointBlank::DOM::CodeBlock" => {
tag: "pre",
outer: {
tag: "code"
},
codeblock: true
},
"PointBlank::DOM::QuoteBlock" => {
tag: "blockquote"
},
"PointBlank::DOM::HorizontalRule" => {
tag: "hr",
inline: true
},
"PointBlank::DOM::Text" => {
sanitize: true
},
"PointBlank::DOM::InlineAutolink" => {
tag: "a",
href: true
}
}.freeze
# Class for managing styles and style overrides
class MapManager
class << self
# Define a default mapping for specified class
# @param key [String] class name
# @param mapping [Hash] mapping
# @return [void]
def define_mapping(key, mapping)
@mapping ||= ELEMENT_MAP.dup
@mapping[key] = mapping
end
# Get computed mapping
# @return [Hash]
def mapping
@mapping ||= ELEMENT_MAP.dup
end
end
def initialize(overrides)
@mapping = self.class.mapping
overrides["mapping"]&.each do |key, value|
next unless @mapping[key]
@mapping[key] = @mapping[key].merge(value)
end
end
attr_reader :mapping
end
end
# HTML Renderer
class HTML
def initialize(dom, options)
@document = dom
@options = options
@options["linewrap"] ||= 80
@options["init_level"] ||= 2
@options["indent"] ||= 2
mapmanager = HTMLConstants::MapManager.new(options)
@mapping = mapmanager.mapping
return unless @options["nowrap"]
@options["init_level"] = 0
@mapping.delete("PointBlank::DOM::Document")
end
# Render document to HTML
def render
text = _render(@document, @options, level: @options["init_level"])
@options["init_level"].times { text = indent(text) }
if @options["nowrap"]
remove_pre_spaces(text)
else
[
preambule,
remove_pre_spaces(text),
postambule
].join("\n")
end
end
private
# Find and remove extra spaces within preformatted text
# @param string [String]
# @return [String]
def remove_pre_spaces(string)
output = []
buffer = []
open = nil
string.lines.each do |line|
opentoken = line.match?(/<pre>/)
closetoken = line.match?(/<\/pre>/)
if closetoken
open = false
buffer = strip_leading_spaces_in_buffer(buffer)
output.append(*buffer)
buffer = []
end
(open ? buffer : output).append(line)
open = true if opentoken && !closetoken
end
output.append(*buffer) unless buffer.empty?
output.join('')
end
# Strip leading spaces in the buffer
# @param buffer [Array<String>]
# @return [Array<String>]
def strip_leading_spaces_in_buffer(buffer)
minprefix = buffer.map { |x| x.match(/^ */)[0] }
.min_by(&:length)
buffer.map do |line|
line.delete_prefix(minprefix)
end
end
# Word wrapping algorithm
# @param text [String]
# @param width [Integer]
# @return [String]
def wordwrap(text, width)
words = text.split(/( +|<[^>]+>)/)
output = []
line = ""
length = 0
until words.empty?
word = words.shift
wordlength = word.length
if length + wordlength + 1 > width
output.append(line.lstrip)
line = word
length = wordlength
next
end
length += wordlength
line += word
end
output.append(line.lstrip)
output.join("\n")
end
def _render(element, options, inline: false, level: 0, literaltext: false)
modeswitch = figure_out_modeswitch(element) unless inline
inline ||= modeswitch
level += 1 unless inline
text = if element.children.empty?
element.content
else
literal = @mapping[element.class.name]
&.fetch(:inline, false) ||
literaltext
element.children.map do |child|
_render(child, options, inline: inline,
level: level,
literaltext: literal)
end.join(inline ? '' : "\n")
end
run_filters(text, element, level: level,
inline: inline,
modeswitch: modeswitch,
literaltext: literaltext)
end
def figure_out_modeswitch(element)
element.is_a?(::PointBlank::DOM::LeafBlock) ||
element.is_a?(::PointBlank::DOM::Paragraph) ||
element.is_a?(::PointBlank::DOM::InlineElement)
end
def run_filters(text, element, level:, inline:, modeswitch:,
literaltext:)
element_style = @mapping[element.class.name]
return text unless element_style
return text if literaltext
codeblock = element_style[:codeblock]
hsize = @options["linewrap"] - (level * @options["indent"])
text = wordwrap(text, hsize) if modeswitch && !codeblock
if element_style[:sanitize]
text = MMMD::EntityUtils.encode_entities(text)
end
if element_style[:inline]
innerclose(element, element_style, text)
else
openclose(text, element, element_style,
codeblock ? false : inline)
end
end
def openclose(text, element, element_style, inline)
opentag, closetag = construct_tags(element_style, element)
if inline
opentag + text + closetag
else
[opentag,
indent(text.rstrip),
closetag].join("\n")
end
end
def innerclose(element, style, text)
props = element.properties
tag = ""
tag += "<figure>" if style[:figcaption]
tag += "<#{style[:tag]}"
tag += " style=#{style[:style].inspect}" if style[:style]
tag += " href=#{read_link(element)}" if style[:href]
tag += " alt=#{text.inspect}" if style[:alt]
tag += " src=#{read_link(element)}" if style[:src]
tag += " title=#{read_title(element)}" if style[:title] && props[:title]
tag += ">"
if style[:figcaption]
tag += "<figcaption>#{text}</figcaption></figure>"
end
if style[:outer]
outeropen, outerclose = construct_tags(style[:outer], element)
tag = outeropen + tag + outerclose
end
tag
end
def construct_tags(style, element, text = "")
return ["", ""] unless style && style[:tag]
props = element.properties
opentag = "<#{style[:tag]}"
# Wrap the tag in a figure instead of appending a duplicate of it
opentag = "<figure>#{opentag}" if style[:figcaption]
closetag = "</#{style[:tag]}>"
if style[:figcaption]
closetag += "<figcaption>#{text}</figcaption></figure>"
end
opentag += " style=#{style[:style].inspect}" if style[:style]
opentag += " href=#{read_link(element)}" if style[:href]
opentag += " src=#{read_link(element)}" if style[:src]
opentag += " title=#{read_title(element)}" if style[:title] &&
props[:title]
opentag += ">"
if style[:outer]
outeropen, outerclose = construct_tags(style[:outer], element)
opentag = outeropen + opentag
closetag += outerclose
end
[opentag, closetag]
end
def read_title(element)
title = element.properties[:title]
title = ::MMMD::EntityUtils.encode_entities(title)
title.inspect
end
def read_link(element)
link = element.properties[:uri]
link.inspect
end
def indent(text)
text.lines.map do |line|
"#{' ' * @options['indent']}#{line}"
end.join('')
end
def preambule
head = @options['head']
headinfo = "#{indent(<<~HEAD.rstrip)}\n " if head
<head>
#{indent(head.is_a?(Array) ? head.join("\n") : head)}
</head>
HEAD
headinfo ||= " "
@options['preambule'] or <<~TEXT.rstrip
<!DOCTYPE HTML>
<html>
#{headinfo}<body>
TEXT
end
def postambule
@options['postambule'] or <<~TEXT
</body>
</html>
TEXT
end
end
end
end
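
The way `MapManager` layers user overrides (the `"mapping"` option) on top of the default element map can be sketched as follows. `default_map` and `merged_mapping` are hypothetical names, and the two entries are only a trimmed-down stand-in for `ELEMENT_MAP`. Unlike the original, this sketch copies the defaults first so that they stay untouched between renders.

```ruby
# Hypothetical, trimmed-down defaults standing in for ELEMENT_MAP.
def default_map
  {
    "PointBlank::DOM::InlineImage" => { tag: "img", src: true, inline: true },
    "PointBlank::DOM::Paragraph" => { tag: "p" }
  }
end

# Hypothetical helper: merge per-class overrides into a copy of the
# defaults. Keys that have no default entry are ignored, as in the
# original MapManager#initialize.
def merged_mapping(defaults, overrides)
  mapping = defaults.dup
  (overrides["mapping"] || {}).each do |key, value|
    next unless mapping[key]

    mapping[key] = mapping[key].merge(value)
  end
  mapping
end
```

With this shape, an override such as `{ "mapping" => { "PointBlank::DOM::InlineImage" => { figcaption: true } } }` adds `figcaption: true` to the image entry and keeps its existing keys.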

@@ -1,463 +0,0 @@
# frozen_string_literal: true
# Attempt to source a provider for the wide char width calculator
# (TODO)
module MMMD
# Module for managing terminal output
module TextManager
# ANSI SGR escape code for bg color
# @param text [String]
# @param options [Hash]
# @return [String]
def bg(text, options)
color = options['bg']
if color.is_a? Integer
"\e[48;5;#{color}m#{text}\e[49m"
elsif color.is_a? String and color.match?(/\A#[A-Fa-f0-9]{6}\Z/)
vector = color.scan(/[A-Fa-f0-9]{2}/).map { |x| x.to_i(16) }
"\e[48;2;#{vector[0]};#{vector[1]};#{vector[2]}m#{text}\e[49m"
else
Kernel.warn "WARNING: Invalid color - #{color}"
text
end
end
# ANSI SGR escape code for fg color
# @param text [String]
# @param options [Hash]
# @return [String]
def fg(text, options)
color = options['fg']
if color.is_a? Integer
"\e[38;5;#{color}m#{text}\e[39m"
elsif color.is_a? String and color.match?(/\A#[A-Fa-f0-9]{6}\Z/)
vector = color.scan(/[A-Fa-f0-9]{2}/).map { |x| x.to_i(16) }
"\e[38;2;#{vector[0]};#{vector[1]};#{vector[2]}m#{text}\e[39m"
else
Kernel.warn "WARNING: Invalid color - #{color}"
text
end
end
# ANSI SGR escape code for bold text
# @param text [String]
# @param options [Hash]
# @return [String]
def bold(text, _options)
"\e[1m#{text}\e[22m"
end
# ANSI SGR escape code for italics text
# @param text [String]
# @param options [Hash]
# @return [String]
def italics(text, _options)
"\e[3m#{text}\e[23m"
end
# ANSI SGR escape code for underline text
# @param text [String]
# @param options [Hash]
# @return [String]
def underline(text, _options)
"\e[4m#{text}\e[24m"
end
# ANSI SGR escape code for strikethrough text
# @param text [String]
# @param options [Hash]
# @return [String]
def strikethrough(text, _options)
"\e[9m#{text}\e[29m"
end
# Word wrapping algorithm
# @param text [String]
# @param width [Integer]
# @return [String]
def wordwrap(text, width)
words = text.split(/( +)/)
output = []
line = ""
length = 0
until words.empty?
word = words.shift
wordlength = smort_length(word)
if wordlength > width
words.prepend(word[width..])
word = word[..width - 1]
wordlength = smort_length(word)
end
if length + wordlength + 1 > width
output.append(line.lstrip)
line = word
length = wordlength
next
end
length += wordlength
line += word
end
output.append(line.lstrip)
output.join("\n")
end
# (TODO: smorter stronger better faster)
# SmЯt™ word length
# @param text [String]
# @return [Integer]
def smort_length(text)
text.gsub(/\e\[[^m]+m/, '').length
end
# Left-justify a line while ignoring terminal control codes
# @param text [String]
# @param size [Integer]
# @return [String]
def ljust_cc(text, size)
text.lines.map do |line|
textlength = smort_length(line)
textlength < size ? line + " " * (size - textlength) : line
end.join("\n")
end
# Right-justify a line while ignoring terminal control codes
# @param text [String]
# @param size [Integer]
# @return [String]
def rjust_cc(text, size)
text.lines.map do |line|
textlength = smort_length(line)
textlength < size ? " " * (size - textlength) + line : line
end.join("\n")
end
# Center-justify a line while ignoring terminal control codes
# @param text [String]
# @param size [Integer]
# @return [String]
def center_cc(text, size)
text.lines.map do |line|
textlength = smort_length(line)
if textlength < size
freelength = size - textlength
rightlength = freelength / 2
leftlength = freelength - rightlength
" " * leftlength + line + " " * rightlength
else
line
end
end.join("\n")
end
# Draw a screen-width box around text
# @param text [String]
# @param options [Hash]
# @return [String]
def box(text, options)
size = options[:hsize] - 2
text = wordwrap(text, (size * 0.8).floor).lines.filter_map do |line|
"#{ljust_cc(line, size)}" unless line.empty?
end.join("\n")
<<~TEXT
#{'─' * size}╮
#{text}
#{'─' * size}╯
TEXT
end
# Draw a horizontal rule
def hrule(_text, options)
size = options[:hsize]
" #{'─' * (size - 2)} "
end
# Draw text right-justified
def rjust(text, options)
size = options[:hsize]
wordwrap(text, (size * 0.8).floor).lines.filter_map do |line|
rjust_cc(line, size) unless line.empty?
end.join("\n")
end
# Draw text centered
def center(text, options)
size = options[:hsize]
wordwrap(text, (size * 0.8).floor).lines.filter_map do |line|
center_cc(line, size) unless line.empty?
end.join("\n")
end
# Underline the last line of the text piece
def underline_block(text, options)
textlines = text.lines
last = "".match(/()()()/)
textlines.each do |x|
current = x.match(/\A(\s*)(.+?)(\s*)\Z/)
last = current if smort_length(current[2]) > smort_length(last[2])
end
ltxt = last[1]
ctxt = textlines.last.slice(last.offset(2)[0]..last.offset(2)[1] - 1)
rtxt = last[3]
textlines[-1] = [ltxt, underline(ctxt, options), rtxt].join('')
textlines.join("")
end
# Add extra newlines around the text
def extra_newlines(text, options)
size = options[:hsize]
textlines = text.lines
textlines.prepend("#{' ' * size}\n")
textlines.append("\n#{' ' * size}\n")
textlines.join("")
end
# Underline last line edge to edge
def underline_full_block(text, options)
textlines = text.lines
last_line = textlines.last.match(/^.*$/)[0]
textlines[-1] = "#{underline(last_line, options)}\n"
textlines.join("")
end
# Indent all lines
def indent(text, _options)
_indent(text)
end
# Indent all lines (inner)
def _indent(text)
text.lines.map do |line|
" #{line}"
end.join("")
end
# Left overline all lines
def leftline(text, _options)
text.lines.map do |line|
"#{line}"
end.join("")
end
# Bulletpoints
def bullet(text, _options)
"-#{_indent(text)[1..]}"
end
# Numbers
def numbered(text, options)
number = options[:number]
length = number.to_s.length + 1
(length / 4 + 1).times { text = _indent(text) }
"#{number}.#{text[length..]}"
end
end
module Renderers
module PlaintermConstants
DEFAULT_STYLE = {
"PointBlank::DOM::Paragraph" => {
indent: true,
increase_level: true
},
"PointBlank::DOM::Text" => {},
"PointBlank::DOM::SetextHeading1" => {
center: true,
bold: true,
extra_newlines: true,
underline_full_block: true
},
"PointBlank::DOM::SetextHeading2" => {
center: true,
underline_block: true
},
"PointBlank::DOM::ATXHeading1" => {
center: true,
bold: true,
extra_newlines: true,
underline_full_block: true
},
"PointBlank::DOM::ATXHeading2" => {
center: true,
underline_block: true
},
"PointBlank::DOM::ATXHeading3" => {
underline: true,
bold: true
},
"PointBlank::DOM::ATXHeading4" => {
bold: true,
underline: true
},
"PointBlank::DOM::ATXHeading5" => {
underline: true
},
"PointBlank::DOM::ATXHeading6" => {
underline: true
},
"PointBlank::DOM::InlineImage" => {
underline: true
},
"PointBlank::DOM::InlineLink" => {
underline: true
},
"PointBlank::DOM::InlinePre" => {},
"PointBlank::DOM::InlineEmphasis" => {
italics: true
},
"PointBlank::DOM::InlineStrong" => {
bold: true
},
"PointBlank::DOM::ULListElement" => {
bullet: true,
increase_level: true
},
"PointBlank::DOM::OLListElement" => {
numbered: true,
increase_level: true
},
"PointBlank::DOM::QuoteBlock" => {
leftline: true,
increase_level: true
},
"PointBlank::DOM::HorizontalRule" => {
hrule: true
}
}.freeze
DEFAULT_EFFECT_PRIORITY = {
hrule: 10_500,
numbered: 10_000,
leftline: 9500,
bullet: 9000,
indent: 8500,
underline_full_block: 8000,
underline_block: 7500,
extra_newlines: 7000,
center: 6000,
rjust: 5500,
box: 5000,
underline: 4000,
italics: 3500,
bold: 3000,
fg: 2500,
bg: 2000,
strikethrough: 1500
}.freeze
# Class for managing styles and style overrides
class StyleManager
class << self
# Define a default style for specified class
# @param key [String] class name
# @param style [Hash] style
# @return [void]
def define_style(key, style)
@style ||= DEFAULT_STYLE.dup
@style[key] = style
end
# Define an effect priority value
# @param key [String] effect name
# @param priority [Integer] value of the priority
# @return [void]
def define_effect_priority(key, priority)
@effect_priority ||= DEFAULT_EFFECT_PRIORITY.dup
@effect_priority[key] = priority
end
# Get computed style
# @return [Hash]
def style
@style ||= DEFAULT_STYLE.dup
end
# Get computed effect priority
# @return [Hash]
def effect_priority
@effect_priority ||= DEFAULT_EFFECT_PRIORITY.dup
end
end
def initialize(overrides)
@style = self.class.style
@effect_priority = self.class.effect_priority
overrides["style"]&.each do |key, value|
next unless @style[key]
@style[key] = @style[key].merge(value)
end
end
attr_reader :style, :effect_priority
end
end
# Primary document renderer
class Plainterm
include ::MMMD::TextManager
# @param input [::PointBlank::DOM::Document]
# @param options [Hash]
def initialize(input, options)
@doc = input
@color_mode = options.fetch("color", true)
@ansi_mode = options.fetch("ansi", true)
style_manager = PlaintermConstants::StyleManager.new(options)
@style = style_manager.style
@effect_priority = style_manager.effect_priority
@effects = @effect_priority.to_a.sort_by(&:last).map(&:first)
@options = options
@options["hsize"] ||= 80
end
# Return rendered text
# @return [String]
def render
_render(@doc, @options)
end
private
def _render(element, options, inline: false, level: 0, index: 0)
modeswitch = element.is_a?(::PointBlank::DOM::LeafBlock) ||
element.is_a?(::PointBlank::DOM::Paragraph)
inline ||= modeswitch
level += calculate_level_increase(element)
text = if element.children.empty?
element.content
else
element.children.map.with_index do |child, index|
_render(child, options, inline: inline,
level: level,
index: index)
end.join(inline ? '' : "\n\n")
end
run_filters(text, element, level: level,
modeswitch: modeswitch,
index: index)
end
def run_filters(text, element, level:, modeswitch:, index:)
element_style = @style[element.class.name]
return text unless element_style
hsize = @options["hsize"] - (4 * level)
text = wordwrap(text, hsize) if modeswitch
params = element_style.dup
params[:hsize] = hsize
params[:number] = index + 1
@effects.each do |effect|
text = method(effect).call(text, params) if element_style[effect]
end
text
end
def calculate_level_increase(element)
level = 0
element_style = @style[element.class.name]
level += 1 if element_style && element_style[:increase_level]
level
end
end
end
end

View File

@ -1,61 +0,0 @@
# frozen_string_literal: true
require 'json'
module MMMD
# Utils for working with entities in strings
module EntityUtils
ENTITY_DATA = JSON.parse(File.read("#{__dir__}/entities.json"))
# Decode html entities in string
# @param string [String]
# @return [String]
def self.decode_entities(string)
string = string.gsub(/&#\d{1,7};/) do |match|
match[2..-2].to_i.chr("UTF-8")
end
string = string.gsub(/&#[xX][\dA-Fa-f]{1,6};/) do |match|
match[3..-2].to_i(16).chr("UTF-8")
end
string.gsub(/&\w+;/) do |match|
ENTITY_DATA[match] ? ENTITY_DATA[match]["characters"] : match
end
end
# Encode unsafe html entities in string (ASCII-compatible)
# @param string [String]
# @return [String]
# @sg-ignore
def self.encode_entities_ascii(string)
string.gsub("&", "&amp;")
.gsub("<", "&lt;")
.gsub(">", "&gt;")
.gsub('"', "&quot;")
.gsub("'", "&#39;")
.gsub(/[^\x00-\x7F]/) do |match|
"&#x#{match.codepoints[0].to_s(16)};"
end
end
# Encode unsafe html entities in string
# @param string [String]
# @return [String]
# @sg-ignore
def self.encode_entities(string)
string.gsub("&", "&amp;")
.gsub("<", "&lt;")
.gsub(">", "&gt;")
.gsub('"', "&quot;")
.gsub("'", "&#39;")
end
# Encode uri components that may break HTML syntax
# @param string [String]
# @return [String]
def self.encode_uri(string)
string.gsub('"', "%22")
.gsub("'", "%27")
.gsub(" ", "%20")
end
end
end

217
markdown.rb Normal file
View File

@ -0,0 +1,217 @@
## Filter-based Markdown translator.
#
module Markdown
## Superclass that defines behaviour of all translators
# @abstract Don't use directly - it only defines the ability to chain translators
class AbstractTranslator
attr_accessor :input
attr_accessor :output
def initialize()
@chain = []
end
def +(nextTranslator)
@chain.append nextTranslator
return self
end
def to_html
output = @output
@chain.each { |x|
x = x.new(output) if x.class == Class
x.to_html
output = x.output
}
return output
end
end
module_function
def html_highlighter; @html_highlighter end
def html_highlighter= v; @html_highlighter = v end
## Translator for linear tags in Markdown.
# A linear tag is any tag that starts anywhere on the line, and closes on the same exact line.
class LinearTagTranslator < AbstractTranslator
def initialize(text)
@input = text
@output = text
super()
end
def to_html
@output = @input
# Newline
.gsub(/\s{2}[\n\r]/,"<br/>")
# Inline code (discord style)
.gsub(/(?<!\\)``(.*?[^\\])``/) {
code = Regexp.last_match[1]
"<code>#{code.gsub /[*`~_!\[]/,"\\\\\\0"}</code>"
}
# Inline code (Markdown style)
.gsub(/(?<!\\)`(.*?[^\\])`/) {
code = Regexp.last_match[1]
"<code>#{code.gsub /[*`~_!\[]/,"\\\\\\0"}</code>"
}
# Bold-italics
.gsub(/(?<!\\)\*\*\*(.*?[^\\])\*\*\*/,"<i><b>\\1</b></i>")
# Bold
.gsub(/(?<!\\)\*\*(.*?[^\\])\*\*/,"<b>\\1</b>")
# Italics
.gsub(/(?<!\\)\*(.*?[^\\])\*/,"<i>\\1</i>")
# Strikethrough
.gsub(/(?<!\\)~~(.*?[^\\])~~/,"<s>\\1</s>")
# Underline
.gsub(/(?<!\\)__(.*?[^\\])__/,"<span style=\"text-decoration: underline\">\\1</span>")
# Image
.gsub(/(?<!\\)!\[(.*)\]\((.*)\)/,"<img src=\"\\2\" alt=\"\\1\" />")
# Link
.gsub(/(?<!\\)\[(.*)\]\((.*)\)/,"<a href=\"\\2\">\\1</a>")
super
end
end
## Translator for linear leftmost tags.
# Leftmost linear tags open on the leftmost end of the string, and close once the line ends. These tags do not need to be explicitly closed.
class LeftmostTagTranslator < AbstractTranslator
def initialize(text)
@input = text
@output = text
super()
end
def to_html
# Headers
@output = @input.split("\n").map do |x|
x.gsub(/^(?<!\\)(\#{1,4})([^\n\r]*)/) {
level,content = Regexp.last_match[1..2]
"<h#{level.length}>"+content+"</h#{level.length}>"
}.gsub(/^\-{3,}/,"<hr>")
end.join("\n")
super
end
end
## Translator for code blocks in markdown
# Code blocks can have syntax highlighting. This class implements an attribute for providing a syntax highlighter, one handler per requested output.
class CodeBlockTranslator < AbstractTranslator
def initialize(text)
@input = text
@output = text
super()
end
def to_html
@output = @input.gsub(/(?:\n|^)```([\w_-]*)([\s\S]+?)```/) {
language,code = Regexp.last_match[1..2]
code = Markdown::html_highlighter.call(language,code) if Markdown::html_highlighter
"<pre><code>#{code.gsub /[|#*`~_!\[]/,"\\\\\\0"}</code></pre>"
}
super()
end
end
## Translator for quotes in Markdown.
# These deserve their own place in hell. As if the "yaml with triangle brackets instead of spaces" syntax wasn't horrible enough, each quote is its own markdown context.
class QuoteTranslator < AbstractTranslator
def initialize(text)
if text.is_a? Array then
@lines = text
elsif text.is_a? String then
@lines = text.split("\n")
end
@output = text
super()
end
def input= (v)
@lines = v.split("\n")
@output = v
end
def input
@lines.join("\n")
end
def to_html
stack = []
range = []
@lines.each_with_index { |x,index|
if x.match /^\s*> ?/ then
range[0] = index if not range[0]
range[1] = index
else
stack.append(range[0]..range[1]) if range[0] and range[1]
range = []
end
}
stack.append(range[0]..range[1]) if range[0] and range[1]
stack.reverse.each { |r|
@lines[r.begin] = "<blockquote>\n"+@lines[r.begin]
@lines[r.end] = @lines[r.end]+"\n</blockquote>"
@lines[r] = @lines[r].map { |line|
line.sub /^(\s*)> ?/,"\\1 "
}
@lines[r] = QuoteTranslator.new(@lines[r]).to_html
}
@output = @lines.join("\n")
super
end
end
## Table parser
# translates tables from a format in markdown to an html table
class TableTranslator < AbstractTranslator
def initialize(text)
@input = text
@output = text
super()
end
def to_html
lines = @output.split("\n")
table_testline = -1
table_start = -1
table_column_count = 0
tables = []
cur_table = []
lines.each_with_index { |line,index|
if (table_start != -1) and (line.match /^\s*\|([^\|]*\|){#{table_column_count-1}}$/) then
if (table_testline == -1) then
if (line.match /^\s*\|(\-*\|){#{table_column_count-1}}$/) then
table_testline = 1
else
table_start = -1
cur_table = []
end
else
cur_table.push (line.split("|").filter_map { |x| x.strip if x.match /\S+/ })
end
elsif (table_start != -1) then
obj = {table: cur_table, start: table_start, end: index}
tables.push(obj)
table_start = -1
cur_table = []
table_testline = -1
table_column_count = 0
end
if (table_start == -1) and (line.start_with? /\s*\|/ ) and (line.match /^\s*\|.*\|/) then
table_start = index
table_column_count = line.count "|"
cur_table.push (line.split("|").filter_map { |x| x.strip if x.match /\S+/ })
end
}
if cur_table != [] then
obj = {table: cur_table, start:table_start, end: lines.count-1}
tables.push(obj)
end
tables.reverse.each { |x|
lines[x[:start]..x[:end]] = (x[:table].map do |a2d|
(a2d.map { |x| (x.start_with? "#") ? " <th>"+x.sub(/^#\s+/,"")+"</th>" : " <td>"+x+"</td>"}).prepend(" <tr>").append(" </tr>")
end).flatten.prepend("<table>").append("</table>")
}
@output = lines.join("\n")
super()
end
end
# Backslash cleaner
# Cleans excessive backslashes after the translation
class BackslashTranslator < AbstractTranslator
def initialize(text)
@input = text
@output = text
super()
end
def to_html
@output = @input.gsub(/\\(.)/,"\\1")
super
end
end
end

View File

@ -1,21 +0,0 @@
# frozen_string_literal: true
Gem::Specification.new do |spec|
spec.name = "mmmd"
spec.version = "0.1.3"
spec.summary = "Modular, compliant Markdown processor"
spec.description = <<~DESC
MMMD (short for Mark My Manuscript Down) is a Markdown processor
(as in "parser and translator") with a CLI interface utility and
multiple modes of output (currently HTML and terminal).
DESC
spec.authors = ["Yessiest"]
spec.license = "AGPL-3.0-or-later"
spec.email = "yessiest@text.512mb.org"
spec.homepage = "https://adastra7.net/git/Yessiest/rubymark"
spec.files = Dir["lib/**/*"]
spec.bindir = "bin"
spec.executables << "mmmdpp"
spec.extra_rdoc_files = Dir["*.md"]
spec.required_ruby_version = ">= 3.0.0"
end

118
test.rb Normal file
View File

@ -0,0 +1,118 @@
require_relative "markdown"
puts Markdown::LinearTagTranslator.new(<<CODE
*Italics*
**Bold**
***Bolitalics***
__underline__
__underline plus ***bolitalics***__
___invalid underline___
~~strikethrough ~~
`code that ignores ***all*** __Markdown__ [tags](https://nevergonnagiveyouup)`
me: google en passant
them: [holy hell!](https://google.com/q?=en+passant)
CODE
).to_html
puts Markdown::LeftmostTagTranslator.new(<<CODE
# Header v1
## Header v2
### Header v3
#### Header v4
##### Invalid header
#### Not a header
*** Also #### Not a header ***
CODE
).to_html
puts Markdown::QuoteTranslator.new(<<CODE
> Quote begins
>
> yea
> # header btw
> > nextlevel quote
> > more quote
> > those are quotes
> > yes
> > > third level quote
> > > yes
> > second level again
> > > third level again
> > second level oioioi
> >
> > > third
> > >
> > >
> > >
>
>
>
> fin
CODE
).to_html
puts Markdown::CodeBlockTranslator.new(<<CODE
```markdown
shmarkshmark
# pee pee
# piss
**ass**
__cock__
cock__
piss__
`shmark shmark`
```
CODE
).to_html
test = (Markdown::CodeBlockTranslator.new(<<TEXT
# Markdown garbage gallery
## Header level 2
### Header level 3
#### Header level 4
__[Underlined Link](https://google.com)__
__**unreal shitworks**__
split
---
![Fucking image idk](https://external-content.duckduckgo.com/iu/?u=https%3A%2F%2Ftse3.explicit.bing.net%2Fth%3Fid%3DOIP.qX1HmpFNHyaTfXv-SLnAJgHaDD%26pid%3DApi&f=1&ipt=dc0e92fdd701395eda76714338060dcf91c7ff9e228f108d8af6e1ba3decd1c2&ipo=images)
> Here's a bunch of shit i guess lmao idk
```markdown
test
test
test
|1|2|3|
|-|-|-|
|a|b|c|
| uneven rows | test | yes |
|-|-|-|
| sosiska | dinozavri | suda pihaem |
| sosiska 2 | vitalya 2 | brat 2 |
*** test ***
piss
cock
__cock__
# hi
```
> ok
> here i go pissing
> ***time to take a piss***
> > pissing
> > "what the hell are you doing"
> > i'm taking a pieeees
> > "why areyou not jomping at me thats what yourshupposed to do
> > I might do it focking later
> > ok
> # bug
> __cum__
__mashup__
| # sosiska | sosiska | suda pihaem |
|-|-|-|
| # 2 | chuvak ya ukral tvayu sardelku ))0)))0))))))) | __blya ((9((9((9)__ |
| # azazaz lalka sasI | test | test |
TEXT
)+Markdown::QuoteTranslator+Markdown::LeftmostTagTranslator+Markdown::LinearTagTranslator+Markdown::TableTranslator+Markdown::BackslashTranslator)
.to_html
write = File.new("/tmp/test.html","w")
write.write(test)
write.close

View File

@ -1,21 +0,0 @@
# frozen_string_literal: true
require_relative 'lib/mmmd/blankshell.rb'
structure = PointBlank::DOM::Document.parse(File.read(ARGV[0]))
def red(string)
"\033[31m#{string}\033[0m"
end
def yellow(string)
"\033[33m#{string}\033[0m"
end
def prettyprint(doc, indent = 0)
closed = doc.properties[:closed]
puts "#{yellow(doc.class.name.gsub(/\w+::DOM::/,""))}#{red(closed ? "(c)" : "")}: #{doc.content.inspect}"
doc.children.each do |child|
print red("#{" " * indent} - ")
prettyprint(child, indent + 4)
end
end
prettyprint(structure)

View File

@ -1,278 +0,0 @@
Architecture of madness
=======================
Prelude
-------
It needs to be stressed that making the parser modular while keeping it
relatively simple was a laborious undertaking. There has not been a standard
more hostile towards the people who dare attempt to implement it than
CommonMark. It should also be noted that, despite being called a
"Standard" in this document, it is less widely adopted than the GitHub
Flavored Markdown syntax. GitHub Flavored Markdown, however, is merely
a subset of this parser's model, albeit one requiring a few extensions.
Current state (as of March 02, 2025)
------------------------------------
This parser processes text in what can be boiled down to three phases.
- Block/Line phase
- Overlay phase
- Inline phase
It should be noted that all phases have their own related parser
classes, and a shared behaviour system, where each parser takes control
at some point, and may win ambiguous cases by having higher priority
(see `#define_child`, `#define_overlay` methods for priority parameter)
### Block/Line phase ###
The first phase breaks down blocks, line by line, into block structures.
Blocks (preferably inherited from the Block class) can contain other blocks.
(i.e. QuoteBlock, ULBlock, OLBlock). Other blocks (known as leaf blocks)
may not contain anything else (except inline content, more on that later).
Blocks are designed to be parsed independently. This means that it *should*
be possible to tear out any standard block and make it not get parsed.
This, however, isn't thoroughly tested for.
Blocks as proper, real classes have a certain lifecycle to follow when
being constructed:
1. Open condition
- A block needs to find its first marker on the current line to open
(see `#begin?` method)
- Once it's open, it's immediately initialized and fed the line it just
read (but now as an object, not as a class) (see `#consume` method)
2. Marker/Line consumption
- While it should be kept open, the block parser instance will
keep reading input through the `#consume` method, returning a pair
of modified line (after consuming its tokens from it) and
a boolean value indicating permission of lazy continuation
(if it's a block like a QuoteBlock or ULBlock that can be lazily
overflowed).
Every line the parser needs to record needs to be pushed
through the `#push` method.
3. Closure
- If the current line no longer belongs to the current block
(if the block should have been closed on the previous line),
it simply needs to `return` a pair of `nil`, and a boolean value for
permission of lazy continuation
- If a block should be closed on the current line, it should capture it,
keep track of the "closed" state, then `return` `nil` on the next call
of `#consume`
- Once a block is closed, it:
1. Receives its content from the parser
2. Parser receives the "close" method call
3. (optional) Parser may have a callable method `#applyprops`. If
it exists, it gets called with the current constructed block.
4. (optional) All overlays assigned to this block's class are
processed on the contents of this block (more on that in
Overlay phase)
5. (optional) Parser may return a different class, which
the current block should be cast into (Overlays may change
the class as well)
6. (optional) If a block can respond to `#parse_inner` method, it
will get called, allowing the block to parse its own contents.
- After this point, the block is no longer touched until the document
fully gets processed.
4. Inline processing
- (Applies only to Paragraph and any child of LeafBlock)
When the document gets fully processed, the contents of the current
block are taken, assigned to an InlineRoot instance, and then parsed
in Inline mode
5. Completion
- The resulting document is then returned.
While there is a lot of functionality available in designing blocks, it is
not necessary for the simplest of the block kinds available. The simplest
example of a block parser is likely the ThematicBreakParser class, which
implements the only 2 methods needed for a block parser to function.
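The lifecycle above can be sketched with a stand-alone class. This is not the actual ThematicBreakParser: the class name, the marker regex and the exact method signatures are assumptions made for illustration. It only shows the `begin?`/`consume` contract, where `consume` returns a pair of the remaining line (or `nil` once the block is over) and the lazy continuation flag.

```ruby
# Hypothetical, stand-alone sketch of the begin?/consume protocol.
# Class name, regex and signatures are assumptions, not PointBlank's code.
class SketchThematicBreakParser
  # Three or more matching -, * or _ markers, optionally spaced out.
  MARKER = /\A {0,3}([-*_])(?:[ \t]*\1){2,}[ \t]*\z/

  # Open condition: checked on the class, before any instance exists.
  def self.begin?(line)
    MARKER.match?(line.chomp)
  end

  attr_reader :lines

  def initialize
    @lines = []
    @closed = false
  end

  # Returns [remaining_line_or_nil, lazy_continuation_allowed].
  def consume(line, _parent = nil, lazy: false)
    # A thematic break spans a single line: refuse everything after it.
    return [nil, false] if @closed || lazy

    @closed = true
    push(line)
    ["", false]
  end

  private

  # Record a line that belongs to this block.
  def push(line)
    @lines << line
  end
end

if SketchThematicBreakParser.begin?("---")
  parser = SketchThematicBreakParser.new
  p parser.consume("---")           # the opening line is captured
  p parser.consume("some text")     # the block reports closure with nil
end
```

Tracking the "closed" state and returning `nil` on the next call mirrors the closure rule described in step 3.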
While parsing text, a block may use additional info:
- In consume method: `lazy` hasharg, if the current line is being processed
in lazy continuation mode (likely only ever matters for Paragraph); and
`parent` - the parent block containing this block.
Block interpretations are tried in decreasing order of their priority
value, as applied using the `#define_child` method.
For blocks to be properly indexed, they need to be a valid child or
a valid descendant (meaning reachable through child chain) of the
Document class.
### Overlay phase ###
Overlay phase doesn't start at some specific point in time. Rather,
Overlay phase happens for every block individually - when that block
closes.
Overlay mechanism can be applied to any DOMObject type, so long as its
close method is called at some point (this may not be of interest to
people that do not implement custom syntax, as it generally translates
to "only block level elements get their overlays processed")
Overlay mechanism provides the ability to perform some action on the block
right after it gets closed and right before it gets interpreted by the
inline phase. Overlays may do the following:
- Change the block's class
(by returning a class from the `#process` method)
- Change the block's content (by directly editing it)
- Change the block's properties (by modifying its `properties` hash)
Overlay interpretations are tried in decreasing order of their priority
value, as defined using the `#define_overlay` method.
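A rough sketch of the overlay idea, under stated assumptions: `SketchBlock`, `SketchHeading`, `SketchUnderlineOverlay` and `apply_overlays` are made-up names, and stopping at the first overlay that returns a class is a guess rather than documented behaviour. It shows an overlay editing content and properties and requesting a class change through the return value of `process`.

```ruby
# Hypothetical sketch of overlay processing; none of these names exist
# in the library.
SketchBlock = Struct.new(:content, :properties)
class SketchHeading < SketchBlock; end

class SketchUnderlineOverlay
  # Returns the class the block should be cast into, or nil to keep it.
  def self.process(block)
    lines = block.content.lines
    return nil unless lines.length >= 2 && lines.last.match?(/\A=+\s*\z/)

    block.content = lines[0..-2].join.chomp  # edit the content directly
    block.properties[:level] = 1             # edit the properties hash
    SketchHeading                            # request a class change
  end
end

# overlays maps overlay classes to priority values; higher values go first.
def apply_overlays(block, overlays)
  overlays.sort_by { |_overlay, priority| -priority }.each do |overlay, _priority|
    new_class = overlay.process(block)
    return new_class.new(block.content, block.properties) if new_class
  end
  block
end

block = SketchBlock.new("Title\n=====", {})
p apply_overlays(block, { SketchUnderlineOverlay => 10 })
```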
### Inline phase ###
Once all blocks have been processed, and all overlays have been applied
to their respective block types, the hook in the Document class's
`#parser` method executes inline parsing phase of all leaf blocks
(descendants of the `LeafBlock` class) and paragraphs.
The outer class encompassing all inline children of a block is
`InlineRoot`. As such, if an inline element is to ever appear within the
text, it needs to be reachable as a child or a descendant of InlineRoot.
Inline parsing works in three parts:
- First, the contents are tokenized (every parser marks its own tokens)
- Second, the forward walk procedure is called
- Third, the reverse walk procedure is called
This process is repeated for every group of parsers with equal priority.
At any given time, only parsers of equal priority run in the same step.
The process then moves to the next step, with the parsers of a higher
priority value. As counter-intuitive as this is, this means that
it moves on to the parsers of _lower_ priority.
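The step ordering can be illustrated with a few lines of Ruby. The parser names and priority values here are invented for the example, and `priority_groups` is a hypothetical helper: it only demonstrates grouping by equal priority value and visiting the groups in ascending order of that value.

```ruby
# Hypothetical sketch: group inline parsers by priority value and order
# the groups so that the lowest value (highest priority) runs first.
def priority_groups(parsers)
  parsers.group_by { |_name, priority| priority }
         .sort_by { |priority, _group| priority }
         .map { |_priority, group| group.map(&:first) }
end

parsers = {
  "CodeInline" => 0,
  "LinkInline" => 1,
  "EmphInline" => 2,
  "StrongInline" => 2
}

priority_groups(parsers).each_with_index do |group, step|
  puts "step #{step}: #{group.join(', ')}"
end
```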
At the very end of the process, the remaining strings are concatenated
within the mixed array of inlines and strings, and turned into Text
nodes, after which the contents of the array are appended as children to
the root node.
This process is recursively applied to all elements which may have child
elements. This is ensured when an inline parser calls the "build"
utility method.
The inline parser is a class that implements static methods `tokenize`
and either `forward_walk` or `reverse_walk`. Both may be implemented at
the same time, but this isn't advisable.
The tokenization process is characterized by calling every parser in the
current group with every string in tokens array using the `tokenize`
method. It is expected that the parser breaks the string down into an
array of other strings and tokens. A token is an array where the first
element is the literal text representation of the token, the second
value is the class of the parser, and the _last_ value (_not third_) is
the `:close` or `:open` symbol (though functionally it may hold any
symbol value). Any additional information the parser may need in later
stages may be stored between the last element and the second element.
Example:
Input:
"_this _is a string of_ tokens_"
Output:
[["_", ::PointBlank::Parsing::EmphInline, :open],
"this ",
["_", ::PointBlank::Parsing::EmphInline, :open],
"is a string of",
["_", ::PointBlank::Parsing::EmphInline, :close],
" tokens",
["_", ::PointBlank::Parsing::EmphInline, :close]]
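A minimal tokenizer that reproduces the example above might look as follows. `SketchEmphInline` is a stand-in class, and the open/close rule (an underscore opens when directly followed by a non-space character, otherwise it closes) is a deliberate simplification of the CommonMark flanking rules, so intraword underscores are not handled correctly.

```ruby
# Hypothetical sketch of a tokenize method for a single input string.
class SketchEmphInline
  # Breaks a string into plain strings and [text, parser_class, kind] tokens.
  def self.tokenize(string)
    tokens = []
    buffer = +""
    string.each_char.with_index do |char, index|
      if char == "_"
        tokens << buffer unless buffer.empty?
        buffer = +""
        following = string[index + 1]
        opens = !following.nil? && !following.match?(/\s/)
        tokens << ["_", self, opens ? :open : :close]
      else
        buffer << char
      end
    end
    tokens << buffer unless buffer.empty?
    tokens
  end
end

pp SketchEmphInline.tokenize("_this _is a string of_ tokens_")
```

Extra data a parser needs in later stages would go between the class and the trailing symbol, which is why the kind is always read from the last position.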
The forward walk is characterized by calling parsers which implement the
`#forward_walk` method. When the main class encounters an opening token
in `forward_walk`, it will call the `#forward_walk` method of the class
that represents this token. It is expected that the parser class will
then attempt to build the first available occurrence of the inline
element it represents, after which it will return the array of all
tokens and strings that it was passed where the first element will be
the newly constructed inline element. If it is unable to close the
block, it should simply return the original contents, unmodified.
Example:
Original text:
this is outside the inline `this is inside the inline` and this
is right after the inline `and this is the next inline`
Input:
[["`", ::PointBlank::Parsing::CodeInline, :open],
"this is inside the inline",
["`", ::PointBlank::Parsing::CodeInline, :close],
" and this is right after the inline ",
["`", ::PointBlank::Parsing::CodeInline, :open],
"and this is the next inline",
["`", ::PointBlank::Parsing::CodeInline, :close]]
Output:
[<::PointBlank::DOM::InlineCode
@content = "this is inside the inline">,
" and this is right after the inline ",
["`", ::PointBlank::Parsing::CodeInline, :open],
"and this is the next inline",
["`", ::PointBlank::Parsing::CodeInline, :close]]
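The forward walk contract can be sketched as below. `SketchCodeInline` and `SketchInlineCode` are illustrative stand-ins, and the sketch assumes the array handed to `forward_walk` starts with the opening token, as in the example above.

```ruby
# Hypothetical sketch of a forward walk for inline code.
SketchInlineCode = Struct.new(:content)

class SketchCodeInline
  # parts starts with this parser's opening token. Returns a new array with
  # the first closed span replaced by an inline element, or parts unchanged.
  def self.forward_walk(parts)
    closer = parts.each_with_index.find do |part, index|
      index.positive? && part.is_a?(Array) &&
        part[1] == self && part.last == :close
    end
    return parts unless closer

    close_index = closer.last
    # Tokens caught inside the span fall back to their literal text.
    content = parts[1...close_index]
              .map { |part| part.is_a?(Array) ? part.first : part }
              .join
    [SketchInlineCode.new(content)] + parts[(close_index + 1)..]
  end
end

tick = "`"
input = [[tick, SketchCodeInline, :open],
         "this is inside the inline",
         [tick, SketchCodeInline, :close],
         " and this is right after the inline "]
pp SketchCodeInline.forward_walk(input)
```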
The reverse walk is characterized by calling parsers which implement the
`#reverse_walk` method when the main class encounters a closing token
for this class (the one that contains the `:close` symbol in the last
position of the token information array). After that the main class will
call the parser's `#reverse_walk` method with the current list of
tokens, inlines and strings. It is expected that the parser will then
collect all the blocks, strings and inlines that fit within the block
closed by the last element in the list, and once it encounters the
appropriate opening token for the closing token in the last position of
the array, it will then replace the elements fitting within that inline
with a class containing all the collected elements. If it is unable to
find a matching opening token for the closing token in the last
position, it should simply return the original contents, unmodified.
Example:
Original text:
blah blah something something lots of text before the emphasis
_this is emphasized `and this is an inline` but it's still
emphasized_
Input:
["blah blah something something lots of text before the emphasis",
["_", ::PointBlank::Parsing::EmphInline, :open],
"this is emphasized",
<::PointBlank::DOM::InlineCode,
@content = "and this is an inline">,
" but it's still emphasized",
["_", ::PointBlank::Parsing::EmphInline, :close]]
Output:
["blah blah something something lots of text before the emphasis",
<::PointBlank::DOM::InlineEmphasis,
children = [...,
<::PointBlank::DOM::InlineCode ...>
...]>]
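A matching sketch for the reverse walk, again with made-up names (`SketchEmphWalker`, `SketchInlineEmphasis`, `SketchCodeNode`): it looks at the closing token in the last position, searches backwards for the nearest opening token of the same parser, and wraps everything in between.

```ruby
# Hypothetical sketch of a reverse walk for emphasis.
SketchInlineEmphasis = Struct.new(:children)
SketchCodeNode = Struct.new(:content)

class SketchEmphWalker
  # parts ends with this parser's closing token. Returns a new array with
  # the enclosed span wrapped into an element, or parts unchanged.
  def self.reverse_walk(parts)
    closing = parts.last
    return parts unless closing.is_a?(Array) &&
                        closing[1] == self && closing.last == :close

    open_index = parts.rindex do |part|
      part.is_a?(Array) && part[1] == self && part.last == :open
    end
    return parts unless open_index

    children = parts[(open_index + 1)...-1]
    parts[0...open_index] + [SketchInlineEmphasis.new(children)]
  end
end

input = ["text before the emphasis ",
         ["_", SketchEmphWalker, :open],
         "this is emphasized ",
         SketchCodeNode.new("and this is an inline"),
         " but it's still emphasized",
         ["_", SketchEmphWalker, :close]]
pp SketchEmphWalker.reverse_walk(input)
```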
Both `#forward_walk` and `#reverse_walk` are not restricted to making
just the changes discussed above, and can arbitrarily modify the token
arrays. That, however, should be done with great care, so as to not
accidentally break compatibility with other parsers.
To ensure that the collected tokens in the `#reverse_walk` and
`#forward_walk` are processed correctly, the collected arrays of
tokens, blocks and inlines should be built into an object that
represents this parser using the `build` method (it will automatically
attempt to find the correct class to construct using the
`#define_parser` directive in the DOMObject subclass definition)

View File

@ -1,82 +0,0 @@
HTML renderer
=============
HTML renderer does exactly what it says on the tin - renders markdown to HTML.
It offers the ability to modify the way output is generated, as well as tags
which are used for every block.
Global options
--------------
Global options are applied to the root of the configuration hash for the
renderer. They can be applied using the following pattern via command
line:
```
$ mmmdpp -o '"nameOfGlobalOption": <value, in JSON element form>' ...
# i.e.
$ mmmdpp -o '"linewrap": 65' ...
```
The following global options can be provided:
- `linewrap` - line wrapping, in number of text columns (80 by default)
- `init_level` - initial indent level of generated text (2 by default)
- `indent` - number of spaces per indent (2 by default)
- `nowrap` - do not output wrapping code, only direct translations of
markdown elements (false by default)
- `head` - array of head elements to add to the default template
output ([] by default)
- `preambule` - text contents to embed before the translation output
(part of template containing head element by default)
- `postambule` - text contents to embed after the translation output
(part of template by default)
Per-class overrides
-------------------
Applying a per-class override via command line options works like this:
```
# see the following paragraph for all known block classes
$ mmmdpp -o '"mapping"."Block::ClassName".override: <value, in JSON element form>' ...
# i.e.
$ mmmdpp -o '"mapping"."PointBlank::DOM::Paragraph".inline: true' ...
```
For library usage, these options roughly translate to the following hash, passed
as the second argument to object initializer:
```
{
"mapping" => {
"PointBlank::DOM::Paragraph" => {
inline: true
}
}
}
```
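To make the relation between the command line form and the hash more concrete, here is a small sketch. It is not the option parser used by `mmmdpp`: `parse_option` is a hypothetical helper, and the rule that quoted path segments become string keys while bare segments become symbols is only inferred from the examples above.

```ruby
require "json"

# Hypothetical sketch: turn '"a"."b".c: <json>' into a nested hash.
def parse_option(option)
  # The path ends at the first colon followed by whitespace; the "::"
  # inside class names never matches because no space follows it.
  path, _separator, raw_value = option.partition(/:\s+/)
  keys = path.scan(/"([^"]*)"|(\w+)/).map do |quoted, bare|
    quoted || bare.to_sym
  end
  value = JSON.parse(raw_value)
  # Wrap the value from the innermost key outwards.
  keys.reverse.inject(value) { |inner, key| { key => inner } }
end

pp parse_option('"mapping"."PointBlank::DOM::Paragraph".inline: true')
pp parse_option('"linewrap": 65')
```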
The following options can be applied to every class element:
- `tag` - name of the tag this class should be mapped to. (i.e.
'PointBlank::DOM::Paragraph' => 'p', 'PointBlank::DOM::InlineEmphasis' =>
'em')
- `sanitize` - sanitize entities in text. shouldn't really be used anywhere
outside of text
- `inline` - the tag should be considered self-closing
- `codeblock` - special case. disables wordwrap, makes the block uninlined
regardless of containing tags.
- `figcaption` - wrap tag into a `<figure>` tag. text contained in the tag is
copied into a caption.
- `outer` - hash of parameters for an outer wrapping tag. can be defined
recursively. all options in this list apply.
- `href` - add a href attribute. works only on links and classes containing
a `:uri` attribute
- `title` - add a title attribute. works only on classes that have a
`:title` attribute
- `style` - define an inline CSS style to embed into the tag.
- `src` - add an src attribute. works only on images and classes containing
a `:uri` attribute
- `alt` - add an alt attribute. works only on classes that have a
`:title` attribute

View File

@ -1,131 +0,0 @@
Plainterm renderer
==================
Plainterm renderer renders markdown documents into a prettier format for reading
in the terminal. It uses certain control codes to make it work. Styles applied
to various elements can be changed.
Applicable global options
-------------------------
Global options are applied to the root of the configuration hash for the
renderer. They can be applied using the following pattern via command
line:
```
$ mmmdpp -o '"nameOfGlobalOption": <value, in JSON element form>' ...
# i.e.
$ mmmdpp -o '"hsize": 65' ...
```
- `style` - override the style of certain elements. See "Style overrides"
- `hsize` - horizontal size to align the contents to. Automatically
set to the width of the current terminal by default, or, if unavailable,
to 80 columns.
Style overrides
---------------
Style overrides provide per-class overrides for element style. It's essentially
a stylesheet applied to the element.
Applying a style override via command line options works like this:
```
# see the following paragraph for all known block classes
$ mmmdpp -o '"style"."Block::ClassName".override: <value, in JSON element form>' ...
# i.e.
$ mmmdpp -o '"style"."PointBlank::DOM::Paragraph".indent: false' ...
```
For library usage, these options roughly translate to the following hash, passed
as the second argument to object initializer:
```
{
"style" => {
"PointBlank::DOM::Paragraph" => {
indent: false
}
}
}
```
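How such a hash combines with the defaults can be sketched as follows. The merge mirrors what the style manager in the renderer source does (overrides are merged per class, and classes without a default style are skipped), but `merge_style` and `SKETCH_DEFAULT_STYLE` are illustrative names and the default values shown are assumptions.

```ruby
# Hypothetical sketch of merging style overrides into per-class defaults.
SKETCH_DEFAULT_STYLE = {
  "PointBlank::DOM::Paragraph" => { indent: true, increase_level: true },
  "PointBlank::DOM::InlineStrong" => { bold: true }
}.freeze

def merge_style(defaults, options)
  # Copy the per-class hashes so the defaults stay untouched.
  style = defaults.transform_values(&:dup)
  (options["style"] || {}).each do |class_name, changes|
    next unless style[class_name] # classes without defaults are ignored

    style[class_name] = style[class_name].merge(changes)
  end
  style
end

options = {
  "style" => {
    "PointBlank::DOM::Paragraph" => { indent: false }
  }
}
pp merge_style(SKETCH_DEFAULT_STYLE, options)
```

Only the keys named in the override change; everything else keeps its default.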
Applicable style overrides:
- `indent` (boolean) - increase indentation
- `increase_level` (boolean) - decrease horizontal space occupied (needed with
indent)
- `center` (boolean) - center text
- `bold` (boolean) - render text in bold
- `italics` (boolean) - render text in italics
- `strikethrough` (boolean) - render text with strikethrough
- `bg` (text, #RGB) - set color background
- `fg` (text, #RGB) - set color foreground
- `box` (boolean) - render contents in an ascii box
- `rjust` (boolean) - right-justify text
- `extra_newlines` (boolean) - add extra newlines around the text block
- `underline_block` (boolean) - underline text block, from left visual boundary
of text to right visual boundary of text
- `underline_full_block` (boolean) - underline text block, from left border to
right border
- `bullet` (boolean) - add bullet to block, used for bullet lists
- `numbered` (boolean) - add numbered point to block, used for ordered lists
- `leftline` (boolean) - draw a line on the left side of the block, top to
bottom
Style defaults
--------------
These are the defaults applied to each class of text block
- `PointBlank::DOM::Paragraph`:
- `indent`
- `increase_level`
- `PointBlank::DOM::Text`:
- none applied by default
- `PointBlank::DOM::SetextHeading1` (underline style of heading in markdown,
level 1)
- `center`
- `bold`
- `extra_newlines`
- `underline_full_block`
- `PointBlank::DOM::SetextHeading2` (underline style of heading in markdown,
level 2)
- `center`
- `underline_block`
- `PointBlank::DOM::ATXHeading1` (hash-symbol prefix style of heading,
level 1)
- (same as SetextHeading1)
- `PointBlank::DOM::ATXHeading2` (hash-symbol heading, level 2)
- (same as SetextHeading2)
- `PointBlank::DOM::ATXHeading3` (hash-symbol heading, level 3)
- `underline`
- `bold`
- `PointBlank::DOM::ATXHeading4` (hash-symbol heading, level 4)
- `bold`
- `underline`
- `PointBlank::DOM::ATXHeading5` (hash-symbol heading, level 5)
- `underline`
- `PointBlank::DOM::ATXHeading6` (hash-symbol heading, level 6)
- `underline`
- `PointBlank::DOM::InlineImage` (image link)
- `underline`
- `PointBlank::DOM::InlineLink` (link)
- `underline`
- `PointBlank::DOM::InlinePre` (inline code)
- none by default
- `PointBlank::DOM::InlineEmphasis`
- `italics`
- `PointBlank::DOM::InlineStrong` (strong emphasis)
- `bold`
- `PointBlank::DOM::ULListElement` (element of an unordered list)
- `bullet`
- `increase_level`
- `PointBlank::DOM::OLListElement` (element of an ordered list)
- `numbered`
- `increase_level`
- `PointBlank::DOM::QuoteBlock`
- `leftline`
- `increase_level`
- `PointBlank::DOM::HorizontalRule`
- `hrule`

View File

@ -1,21 +0,0 @@
Security acknowledgements
=========================
While special care has been taken to prevent some of the more common
vulnerabilities that might arise from using this parser, it does not prevent
certain issues **which should be acknowledged**.
- It is possible to inject a form of one-click XSS into the website. In
particular, there are no restrictions placed on urls embedded within the links
(as per the description of CommonMark specification). As such, something as
simple as `[test](<javascript:dangerous code here>)` would be more than enough
to employ such an exploit.
- While the parser generally behaves stably on most tests and prevents
stray HTML tokens from occurring in the output text where appropriate, due to
the nontrivial nature of the task some form of XSS injection may still
occur. If such an incident occurs, please report it to the current maintainer
of the project.
- User input should NOT be trusted when it comes to applying options to
rendering. Some renderers, such as the HTML renderer, allow modifying the
style parameter for rendered tags, which, when control of it is handed to an
untrusted party, may become an XSS attack vector.
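
For the link issue from the first point, an application embedding the parser could filter link targets on its own before publishing the output. The following is a minimal sketch using only Ruby's standard `uri` library; `safe_link?` and `SKETCH_SAFE_SCHEMES` are hypothetical names, the allowlist is an assumption, and this check is not part of the parser or a complete defence on its own.

```ruby
require "uri"

# Hypothetical allowlist of URL schemes considered acceptable in links.
SKETCH_SAFE_SCHEMES = %w[http https mailto].freeze

# Accepts relative links and allowlisted schemes, rejects everything
# else, including targets that fail to parse at all.
def safe_link?(href)
  uri = URI.parse(href.strip)
  uri.scheme.nil? || SKETCH_SAFE_SCHEMES.include?(uri.scheme.downcase)
rescue URI::InvalidURIError
  false
end

["https://example.com/page", "/docs/intro", "javascript:alert(1)"].each do |href|
  puts "#{href.inspect}: #{safe_link?(href) ? 'allowed' : 'rejected'}"
end
```

An allowlist is used instead of a blocklist so that unknown schemes are rejected by default.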