Documentation

NQuadsParser
in package
implements Parser, QuadIterator Uses TmpStreamParserTrait

Parses only n-quads and n-triples, but does it fast (thanks to parsing in chunks and extensive use of regular expressions).

Tags
author

zozlak

Interfaces, Classes, Traits and Enums

Parser
QuadIterator

Table of Contents

BLANKNODE  = '(_:[^\\s<.]+)'
BLANKNODE1_STRICT  = '_:'
BLANKNODE2_STRICT  = '[0-9_:A-Za-z\\x{00C0}-\\x{00D6}\\x{00D8}-\\x{00F6}\\x{00F8}-\\x{02FF}\\x{0370}-\\x{037D}\\x{037F}-\\x{1FFF}\\x{200C}-\\x{200D}\\x{2070}-\\x{218F}\\x{2C00}-\\x{2FEF}\\x{3001}-\\x{D7FF}\\x{F900}-\\x{FDCF}\\x{FDF0}-\\x{FFFD}\\x{10000}-\\x{EFFFF}]'
BLANKNODE3_STRICT  = '[-0-9_:A-Za-z\\x{00B7}\\x{00C0}-\\x{00D6}\\x{00D8}-\\x{00F6}\\x{00F8}-\\x{02FF}\\x{0300}-\\x{037D}\\x{037F}-\\x{1FFF}\\x{200C}-\\x{200D}\\x{203F}-\\x{2040}\\x{2070}-\\x{218F}\\x{2C00}-\\x{2FEF}\\x{3001}-\\x{D7FF}\\x{F900}-\\x{FDCF}\\x{FDF0}-\\x{FFFD}\\x{10000}-\\x{EFFFF}.]'
BLANKNODE4_STRICT  = '[-0-9_:A-Za-z\\x{00B7}\\x{00C0}-\\x{00D6}\\x{00D8}-\\x{00F6}\\x{00F8}-\\x{02FF}\\x{0300}-\\x{037D}\\x{037F}-\\x{1FFF}\\x{200C}-\\x{200D}\\x{203F}-\\x{2040}\\x{2070}-\\x{218F}\\x{2C00}-\\x{2FEF}\\x{3001}-\\x{D7FF}\\x{F900}-\\x{FDCF}\\x{FDF0}-\\x{FFFD}\\x{10000}-\\x{EFFFF}]'
COMMENT  = '\\s*(?>#.*)?'
COMMENT2  = '\\s*#.*'
COMMENT2_STRICT  = '\\s*#[^\\x0D\\x0A]*'
COMMENT_STRICT  = '\\s*(?>#[^\\x0D\\x0A]*)?'
EOL  = '[\\x0D\\x0A]+'
IRIREF  = '<([^>]+)>'
IRIREF_STRICT  = '<((?>[^\\x{00}-\\x{20}<>"{}|^`\\\\]|\\\\u[0-9A-Fa-f]{4}|\\\\U[0-9A-Fa-f]{8})*)>'
LANGTAG  = '@([-a-zA-Z0-9]+)'
LANGTAG_STRICT  = '@([a-zA-Z]+(?>-[a-zA-Z0-9]+)*)'
LITERAL  = '"((?>[^"]|\\")*)"'
LITERAL_STRICT  = '"((?>[^\\x{22}\\x{5C}\\x{0A}\\x{0D}]|\\\\[tbnrf"\'\\\\]|\\\\u[0-9A-Fa-f]{4}|\\\\U[0-9A-Fa-f]{8})*)"'
MODE_QUADS  = 2
MODE_QUADS_STAR  = 4
MODE_TRIPLES  = 1
MODE_TRIPLES_STAR  = 3
READ_BUF_SIZE  = 8096
STAR_END  = '%\\G\\s*>>%'
STAR_START  = '%\\G\\s*<<%'
UCHAR  = '\\\\u[0-9A-Fa-f]{4}|\\\\U[0-9A-Fa-f]{8}'
$dataFactory  : DataFactory
$input  : StreamInterface
$level  : int
Recursion level of the star parser
$line  : string
Input line
$linesBuffer  : SplQueue
$mode  : int
$offset  : int
Character offset within a parsed line (used by the star parser)
$quads  : Generator
$readBuffer  : string
$regexp  : string
$regexpCommentLine  : string
$regexpGraph  : string
$regexpLineEnd  : string
$regexpObjGraph  : string
$regexpPred  : string
$regexpSbjPred  : string
$tmpStream  : resource|null
$unescapeMap  : array<string|int, mixed>
See https://www.w3.org/TR/n-quads/#grammar-production-ECHAR
__construct()  : mixed
Creates the parser.
__destruct()  : mixed
current()  : Quad
key()  : mixed
next()  : void
parse()  : QuadIterator
parseStream()  : QuadIterator
rewind()  : void
valid()  : bool
closeTmpStream()  : void
makeQuad()  : Quad
Converts regex matches array into a Quad.
parseStar()  : Quad
quadGenerator()  : Generator<string|int, Quad>
readLine()  : string
starQuadGenerator()  : Generator<string|int, Quad>
unescape()  : string

Constants

BLANKNODE2_STRICT

public mixed BLANKNODE2_STRICT = '[0-9_:A-Za-z\\x{00C0}-\\x{00D6}\\x{00D8}-\\x{00F6}\\x{00F8}-\\x{02FF}\\x{0370}-\\x{037D}\\x{037F}-\\x{1FFF}\\x{200C}-\\x{200D}\\x{2070}-\\x{218F}\\x{2C00}-\\x{2FEF}\\x{3001}-\\x{D7FF}\\x{F900}-\\x{FDCF}\\x{FDF0}-\\x{FFFD}\\x{10000}-\\x{EFFFF}]'

BLANKNODE3_STRICT

public mixed BLANKNODE3_STRICT = '[-0-9_:A-Za-z\\x{00B7}\\x{00C0}-\\x{00D6}\\x{00D8}-\\x{00F6}\\x{00F8}-\\x{02FF}\\x{0300}-\\x{037D}\\x{037F}-\\x{1FFF}\\x{200C}-\\x{200D}\\x{203F}-\\x{2040}\\x{2070}-\\x{218F}\\x{2C00}-\\x{2FEF}\\x{3001}-\\x{D7FF}\\x{F900}-\\x{FDCF}\\x{FDF0}-\\x{FFFD}\\x{10000}-\\x{EFFFF}.]'

BLANKNODE4_STRICT

public mixed BLANKNODE4_STRICT = '[-0-9_:A-Za-z\\x{00B7}\\x{00C0}-\\x{00D6}\\x{00D8}-\\x{00F6}\\x{00F8}-\\x{02FF}\\x{0300}-\\x{037D}\\x{037F}-\\x{1FFF}\\x{200C}-\\x{200D}\\x{203F}-\\x{2040}\\x{2070}-\\x{218F}\\x{2C00}-\\x{2FEF}\\x{3001}-\\x{D7FF}\\x{F900}-\\x{FDCF}\\x{FDF0}-\\x{FFFD}\\x{10000}-\\x{EFFFF}]'

COMMENT2_STRICT

public mixed COMMENT2_STRICT = '\\s*#[^\\x0D\\x0A]*'

COMMENT_STRICT

public mixed COMMENT_STRICT = '\\s*(?>#[^\\x0D\\x0A]*)?'

IRIREF_STRICT

public mixed IRIREF_STRICT = '<((?>[^\\x{00}-\\x{20}<>"{}|^`\\\\]|\\\\u[0-9A-Fa-f]{4}|\\\\U[0-9A-Fa-f]{8})*)>'

LANGTAG_STRICT

public mixed LANGTAG_STRICT = '@([a-zA-Z]+(?>-[a-zA-Z0-9]+)*)'

LITERAL_STRICT

public mixed LITERAL_STRICT = '"((?>[^\\x{22}\\x{5C}\\x{0A}\\x{0D}]|\\\\[tbnrf"\'\\\\]|\\\\u[0-9A-Fa-f]{4}|\\\\U[0-9A-Fa-f]{8})*)"'

UCHAR

public mixed UCHAR = '\\\\u[0-9A-Fa-f]{4}|\\\\U[0-9A-Fa-f]{8}'

Properties

$level

Recursion level of the star parser

private int $level

$offset

Character offset within a parsed line (used by the star parser)

private int $offset
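The STAR_START and STAR_END patterns listed above begin with \G, which anchors a match at the offset passed to preg_match() instead of letting the regex engine scan forward. The following is a minimal sketch of how an offset-driven matcher might use them. The helper name matchAt() and the sample line are assumptions made for illustration, not part of this class.

```php
<?php
// Patterns copied from the STAR_START / STAR_END constants documented on this page.
const STAR_START_SKETCH = '%\\G\\s*<<%';
const STAR_END_SKETCH   = '%\\G\\s*>>%';

/**
 * Hypothetical helper: tries to match $regex exactly at $offset.
 * Returns the offset just after the match, or null when there is no match.
 */
function matchAt(string $regex, string $line, int $offset): ?int
{
    if (preg_match($regex, $line, $m, 0, $offset) === 1) {
        return $offset + strlen($m[0]);
    }
    return null;
}

$line = '<< <s> <p> <o> >> <p2> <o2> .';

var_dump(matchAt(STAR_START_SKETCH, $line, 0));  // int(2): "<<" found at the very start
var_dump(matchAt(STAR_START_SKETCH, $line, 2));  // NULL: a plain IRI follows, not "<<"
var_dump(matchAt(STAR_END_SKETCH, $line, 14));   // int(17): " >>" consumed from offset 14
var_dump(matchAt(STAR_END_SKETCH, $line, 0));    // NULL: \G forbids skipping ahead to ">>"
```

Because of \G, a caller only has to carry a single integer offset through the line, which fits the role described for $offset.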

$unescapeMap

See https://www.w3.org/TR/n-quads/#grammar-production-ECHAR

private array<string|int, mixed> $unescapeMap

Methods

__construct()

Creates the parser.

public __construct(DataFactory $dataFactory[, bool $strict = false ][, int $mode = self::MODE_QUADS_STAR ]) : mixed

The parser's behavior is controlled by the $strict and $mode parameter values.

When $strict = true, regular expressions that strictly follow the formal n-triples/n-quads grammar are used (see https://www.w3.org/TR/n-quads/#sec-grammar and https://www.w3.org/TR/n-triples/#n-triples-grammar). When $strict = false, simplified regular expressions are used. The simplified variants parse slightly faster and are much easier to debug. All data that is valid according to the strict syntax is parsed correctly in simplified mode, so unless you need to verify that the input is 100% correct RDF, you can stick to simplified mode.
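To make the difference concrete, the sketch below applies the simplified and strict patterns for IRIs and language tags (copied from the constants on this page) to a few sample terms. The helper matchesWhole() and the sample inputs are assumptions made for illustration. The parser itself combines these fragments into larger line-level expressions, so this only shows how the individual fragments behave.

```php
<?php
// Patterns copied from the IRIREF* and LANGTAG* constants documented on this page.
const IRIREF_SIMPLE_SKETCH  = '<([^>]+)>';
const IRIREF_STRICT_SKETCH  = '<((?>[^\\x{00}-\\x{20}<>"{}|^`\\\\]|\\\\u[0-9A-Fa-f]{4}|\\\\U[0-9A-Fa-f]{8})*)>';
const LANGTAG_SIMPLE_SKETCH = '@([-a-zA-Z0-9]+)';
const LANGTAG_STRICT_SKETCH = '@([a-zA-Z]+(?>-[a-zA-Z0-9]+)*)';

/** Hypothetical helper: true when $pattern matches the whole $term. */
function matchesWhole(string $pattern, string $term): bool
{
    return preg_match('%\\A' . $pattern . '\\z%u', $term) === 1;
}

// A space is not allowed inside an IRI by the formal grammar.
$badIri = '<http://example.com/a b>';
var_dump(matchesWhole(IRIREF_SIMPLE_SKETCH, $badIri)); // bool(true)
var_dump(matchesWhole(IRIREF_STRICT_SKETCH, $badIri)); // bool(false)

// A language tag must start with a letter in the formal grammar.
$badTag = '@-en';
var_dump(matchesWhole(LANGTAG_SIMPLE_SKETCH, $badTag)); // bool(true)
var_dump(matchesWhole(LANGTAG_STRICT_SKETCH, $badTag)); // bool(false)

// Well-formed terms are accepted by both variants.
var_dump(matchesWhole(IRIREF_STRICT_SKETCH, '<http://example.com/a>')); // bool(true)
var_dump(matchesWhole(LANGTAG_STRICT_SKETCH, '@en-US'));                // bool(true)
```

In other words, the simplified patterns accept a superset of what the strict ones accept, which is why malformed input can pass unnoticed when $strict = false.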

Parameters
$dataFactory : DataFactory

factory to be used to generate RDF terms.

$strict : bool = false

should strict RDF syntax be enforced?

$mode : int = self::MODE_QUADS_STAR

parsing mode - one of the modes listed below. Note that \quickRdfIo\NQuadsParser::MODE_QUADS_STAR is able to parse all the others and there should be no significant performance difference between parsing modes. The main reason for using a non-default one is to ensure the input data follow a given format.

  • \quickRdfIo\NQuadsParser::MODE_TRIPLES
  • \quickRdfIo\NQuadsParser::MODE_QUADS
  • \quickRdfIo\NQuadsParser::MODE_TRIPLES_STAR
  • \quickRdfIo\NQuadsParser::MODE_QUADS_STAR
Return values
mixed

__destruct()

public __destruct() : mixed
Return values
mixed

makeQuad()

Converts regex matches array into a Quad.

private makeQuad(array<string|int, ?string> &$matches) : Quad
Parameters
$matches : array<string|int, ?string>
Return values
Quad

readLine()

private readLine() : string
Tags
throws
LogicException
Return values
string
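
This page does not describe how readLine() works internally. The class summary mentions parsing in chunks, and the class lists READ_BUF_SIZE, $readBuffer and $linesBuffer, so the following is only a sketch of what chunked line reading could look like. The function name readLinesInChunks() and its logic are assumptions, not the actual implementation. The default chunk size mirrors the documented READ_BUF_SIZE value and the split pattern mirrors EOL.

```php
<?php
/**
 * Hypothetical sketch: yields non-empty lines from a stream that is read
 * in fixed-size chunks rather than line by line.
 *
 * @param resource $stream    readable stream
 * @param int      $chunkSize number of bytes requested per read
 */
function readLinesInChunks($stream, int $chunkSize = 8096): Generator
{
    $buffer = '';
    while (!feof($stream)) {
        $chunk = fread($stream, $chunkSize);
        if ($chunk === false) {
            break;
        }
        $buffer .= $chunk;
        // Split on runs of CR/LF (same character class as the EOL constant).
        $lines = preg_split('/[\\x0D\\x0A]+/', $buffer);
        // The last piece may be an incomplete line, so keep it for the next chunk.
        $buffer = array_pop($lines);
        foreach ($lines as $line) {
            if ($line !== '') {
                yield $line;
            }
        }
    }
    if ($buffer !== '') {
        yield $buffer; // final line without a trailing EOL
    }
}

// Usage with an in-memory stream and a deliberately tiny chunk size.
$stream = fopen('php://memory', 'w+');
fwrite($stream, "l1\nl2\r\nl3");
rewind($stream);
$lines = iterator_to_array(readLinesInChunks($stream, 4), false);
fclose($stream);
var_dump($lines === ['l1', 'l2', 'l3']); // bool(true)
```

Reading large blocks and splitting them with one regular expression call keeps the number of I/O and function calls low, which is one plausible reason for the speed claimed in the class summary.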

unescape()

private unescape(string $value) : string
Parameters
$value : string
Return values
string
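
The $unescapeMap property points to the ECHAR grammar production, and the UCHAR constant covers \uXXXX and \UXXXXXXXX escapes. The sketch below shows one way such unescaping could be done in a single pass. The names ECHAR_MAP_SKETCH, codePointToUtf8() and unescapeSketch() are assumptions made for illustration, and the actual unescape() method may work differently.

```php
<?php
// ECHAR escapes from the N-Quads grammar: \t \b \n \r \f \" \' \\
const ECHAR_MAP_SKETCH = [
    't' => "\t", 'b' => "\x08", 'n' => "\n", 'r' => "\r", 'f' => "\f",
    '"' => '"', "'" => "'", '\\' => '\\',
];

/** Hypothetical helper: encodes a Unicode code point as UTF-8 bytes. */
function codePointToUtf8(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12)) . chr(0x80 | (($cp >> 6) & 0x3F))
            . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18)) . chr(0x80 | (($cp >> 12) & 0x3F))
        . chr(0x80 | (($cp >> 6) & 0x3F)) . chr(0x80 | ($cp & 0x3F));
}

/** Hypothetical sketch: replaces ECHAR and UCHAR escapes in one pass. */
function unescapeSketch(string $value): string
{
    $regex = '/\\\\(?:([tbnrf"\'\\\\])|u([0-9A-Fa-f]{4})|U([0-9A-Fa-f]{8}))/';
    return preg_replace_callback($regex, function (array $m): string {
        if ($m[1] !== '') {
            return ECHAR_MAP_SKETCH[$m[1]];      // ECHAR
        }
        $hex = $m[2] !== '' ? $m[2] : $m[3];     // UCHAR: \uXXXX or \UXXXXXXXX
        return codePointToUtf8((int) hexdec($hex));
    }, $value);
}

var_dump(unescapeSketch('a\\tb') === "a\tb");            // bool(true)
var_dump(unescapeSketch('caf\\u00E9') === "caf\u{E9}");  // bool(true)
var_dump(unescapeSketch('\\\\n') === '\\n');             // bool(true): escaped backslash, then "n"
```

Handling all escapes in one regular expression pass matters for inputs such as an escaped backslash followed by "n": replacing escape sequences one kind at a time could misread it as a newline.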
