Documentation

NQuadsParser
implements ParserInterface, QuadIteratorInterface uses TmpStreamParserTrait, StreamSkipBomTrait

Parses only n-quads and n-triples, but does it fast (thanks to parsing in chunks and extensive use of regular expressions).

Tags
author

zozlak

Table of Contents

Interfaces

ParserInterface
QuadIteratorInterface

Constants

BLANKNODE  = '(_:[^\s<.]+)'
BLANKNODE1_STRICT  = '_:'
BLANKNODE2_STRICT  = '[0-9_:A-Za-z\x{00C0}-\x{00D6}\x{00D8}-\x{00F6}\x{00F8}-\x{02FF}\x{0370}-\x{037D}\x{037F}-\x{1FFF}\x{200C}-\x{200D}\x{2070}-\x{218F}\x{2C00}-\x{2FEF}\x{3001}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFFD}\x{10000}-\x{EFFFF}]'
BLANKNODE3_STRICT  = '[-0-9_:A-Za-z\x{00B7}\x{00C0}-\x{00D6}\x{00D8}-\x{00F6}\x{00F8}-\x{02FF}\x{0300}-\x{037D}\x{037F}-\x{1FFF}\x{200C}-\x{200D}\x{203F}-\x{2040}\x{2070}-\x{218F}\x{2C00}-\x{2FEF}\x{3001}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFFD}\x{10000}-\x{EFFFF}.]'
BLANKNODE4_STRICT  = '[-0-9_:A-Za-z\x{00B7}\x{00C0}-\x{00D6}\x{00D8}-\x{00F6}\x{00F8}-\x{02FF}\x{0300}-\x{037D}\x{037F}-\x{1FFF}\x{200C}-\x{200D}\x{203F}-\x{2040}\x{2070}-\x{218F}\x{2C00}-\x{2FEF}\x{3001}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFFD}\x{10000}-\x{EFFFF}]'
COMMENT  = '\s*(?>#.*)?'
COMMENT2  = '\s*#.*'
COMMENT2_STRICT  = '\s*#[^\x0D\x0A]*'
COMMENT_STRICT  = '\s*(?>#[^\x0D\x0A]*)?'
EOL  = '[\x0D\x0A]+'
IRIREF  = '<([^>]+)>'
IRIREF_STRICT  = '<((?>[^\x{00}-\x{20}<>"{}|^`\\\\]|\\\\u[0-9A-Fa-f]{4}|\\\\U[0-9A-Fa-f]{8})*)>'
LANGTAG  = '@([-a-zA-Z0-9]+)'
LANGTAG_STRICT  = '@([a-zA-Z]+(?>-[a-zA-Z0-9]+)*)'
LITERAL  = '"((?>[^\x{22}\x{5C}\x{0A}\x{0D}]|\\\\[tbnrf"\'\\\\]|\\\\u[0-9A-Fa-f]{4}|\\\\U[0-9A-Fa-f]{8})*)"'
MODE_QUADS  = 2
MODE_QUADS_STAR  = 4
MODE_TRIPLES  = 1
MODE_TRIPLES_STAR  = 3
READ_BUF_SIZE  = 8096
STAR_END  = '%\G\s*>>%'
STAR_START  = '%\G\s*<<%'
UCHAR  = '\\\\u[0-9A-Fa-f]{4}|\\\\U[0-9A-Fa-f]{8}'

Properties

$bomUtf8  : mixed
$dataFactory  : DataFactoryInterface
$input  : StreamInterface
$invalidBoms2B  : mixed
$invalidBoms3B  : mixed
$invalidBoms4B  : mixed
$level  : int
Recursion level of the star parser
$line  : string
Input line
$linesBuffer  : SplQueue<string|int, string>
$mode  : int
$offset  : int
Character offset within a parsed line (used by the star parser)
$quads  : Generator<string|int, QuadInterface>
$readBuffer  : string
$regexp  : string
$regexpCommentLine  : string
$regexpGraph  : string
$regexpLineEnd  : string
$regexpObjGraph  : string
$regexpPred  : string
$regexpSbjPred  : string
$tmpStream  : resource|null
$unescapeMap  : array<string, string>
See https://www.w3.org/TR/n-quads/#grammar-production-ECHAR

Methods

__construct()  : mixed
Creates the parser.
__destruct()  : mixed
current()  : QuadInterface
key()  : mixed
next()  : void
parse()  : QuadIteratorInterface
parseStream()  : QuadIteratorInterface
rewind()  : void
valid()  : bool
closeTmpStream()  : void
makeQuad()  : QuadInterface
Converts regex matches array into a Quad.
parseStar()  : QuadInterface
quadGenerator()  : Generator<string|int, QuadInterface>
readLine()  : string
skipBom()  : void
starQuadGenerator()  : Generator<string|int, QuadInterface>
unescape()  : string

Constants

BLANKNODE2_STRICT

public mixed BLANKNODE2_STRICT = '[0-9_:A-Za-z\x{00C0}-\x{00D6}\x{00D8}-\x{00F6}\x{00F8}-\x{02FF}\x{0370}-\x{037D}\x{037F}-\x{1FFF}\x{200C}-\x{200D}\x{2070}-\x{218F}\x{2C00}-\x{2FEF}\x{3001}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFFD}\x{10000}-\x{EFFFF}]'

BLANKNODE3_STRICT

public mixed BLANKNODE3_STRICT = '[-0-9_:A-Za-z\x{00B7}\x{00C0}-\x{00D6}\x{00D8}-\x{00F6}\x{00F8}-\x{02FF}\x{0300}-\x{037D}\x{037F}-\x{1FFF}\x{200C}-\x{200D}\x{203F}-\x{2040}\x{2070}-\x{218F}\x{2C00}-\x{2FEF}\x{3001}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFFD}\x{10000}-\x{EFFFF}.]'

BLANKNODE4_STRICT

public mixed BLANKNODE4_STRICT = '[-0-9_:A-Za-z\x{00B7}\x{00C0}-\x{00D6}\x{00D8}-\x{00F6}\x{00F8}-\x{02FF}\x{0300}-\x{037D}\x{037F}-\x{1FFF}\x{200C}-\x{200D}\x{203F}-\x{2040}\x{2070}-\x{218F}\x{2C00}-\x{2FEF}\x{3001}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFFD}\x{10000}-\x{EFFFF}]'

COMMENT2_STRICT

public mixed COMMENT2_STRICT = '\s*#[^\x0D\x0A]*'

COMMENT_STRICT

public mixed COMMENT_STRICT = '\s*(?>#[^\x0D\x0A]*)?'

IRIREF_STRICT

public mixed IRIREF_STRICT = '<((?>[^\x{00}-\x{20}<>"{}|^`\\\\]|\\\\u[0-9A-Fa-f]{4}|\\\\U[0-9A-Fa-f]{8})*)>'

LANGTAG_STRICT

public mixed LANGTAG_STRICT = '@([a-zA-Z]+(?>-[a-zA-Z0-9]+)*)'

LITERAL

public mixed LITERAL = '"((?>[^\x{22}\x{5C}\x{0A}\x{0D}]|\\\\[tbnrf"\'\\\\]|\\\\u[0-9A-Fa-f]{4}|\\\\U[0-9A-Fa-f]{8})*)"'

UCHAR

public mixed UCHAR = '\\\\u[0-9A-Fa-f]{4}|\\\\U[0-9A-Fa-f]{8}'

Properties

$invalidBoms2B

private mixed $invalidBoms2B = ["\xef\xff" => "UTF-16 BE", "\xff\xfe" => "UTF-16 LE"]

$invalidBoms3B

private mixed $invalidBoms3B = ["+/v" => "UTF-7", "\xf7dL" => "UTF-1", "\x0e\xfe\xff" => "SCSU", "\xfb\xee(" => "BOCU-1"]

$invalidBoms4B

private mixed $invalidBoms4B = ["\x00\x00\xfe\xff" => "UTF-32 BE", "\xff\xfe\x00\x00" => "UTF-32 LE", "\xddsfs" => "UTF-EBCDIC", "\x841\x953" => "GB18030"]

$level

Recursion level of the star parser

private int $level

$linesBuffer

private SplQueue<string|int, string> $linesBuffer

$offset

Character offset within a parsed line (used by the star parser)

private int $offset

$unescapeMap

See https://www.w3.org/TR/n-quads/#grammar-production-ECHAR

private array<string, string> $unescapeMap

Methods

__construct()

Creates the parser.

public __construct(DataFactoryInterface $dataFactory[, bool $strict = false ][, int $mode = self::MODE_QUADS_STAR ]) : mixed

The parser's behavior is determined by the $strict and $mode parameter values.

When $strict = true, regular expressions that strictly follow the formal n-triples/n-quads grammar are used (see https://www.w3.org/TR/n-quads/#sec-grammar and https://www.w3.org/TR/n-triples/#n-triples-grammar). When $strict = false, simplified regular expressions are used. The simplified variants parse slightly faster and are much easier to debug. All data that is valid under the strict syntax is also parsed correctly in simplified mode, so unless you need to verify that the input is 100% correct RDF, you can stick to simplified mode.
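
The relationship between the strict and simplified expressions can be illustrated with the language tag patterns. The sketch below copies the bodies of the LANGTAG and LANGTAG_STRICT constants; the matchesFully() helper and the anchoring with ^ and \z are assumptions made for this illustration and are not part of the parser.

```php
<?php
// Pattern bodies copied from the LANGTAG and LANGTAG_STRICT constants.
const LANGTAG        = '@([-a-zA-Z0-9]+)';
const LANGTAG_STRICT = '@([a-zA-Z]+(?>-[a-zA-Z0-9]+)*)';

// Hypothetical helper: anchors a pattern body and tests the whole input against it.
function matchesFully(string $patternBody, string $input): bool {
    return preg_match('%^' . $patternBody . '\z%', $input) === 1;
}

// A tag that is valid under the strict grammar matches both patterns.
var_dump(matchesFully(LANGTAG_STRICT, '@en-GB')); // bool(true)
var_dump(matchesFully(LANGTAG, '@en-GB'));        // bool(true)

// A malformed tag is rejected by the strict pattern but accepted by the simplified one.
var_dump(matchesFully(LANGTAG_STRICT, '@-en'));   // bool(false)
var_dump(matchesFully(LANGTAG, '@-en'));          // bool(true)
```

This is why simplified mode is safe for input that is already known to be valid, while strict mode is the one to use for validation.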

Parameters
$dataFactory : DataFactoryInterface

factory to be used to generate RDF terms.

$strict : bool = false

should strict RDF syntax be enforced?

$mode : int = self::MODE_QUADS_STAR

parsing mode, one of the modes listed below. Note that \quickRdfIo\NQuadsParser::MODE_QUADS_STAR can parse input in all the other formats, and there should be no significant performance difference between the parsing modes. The main reason to use a non-default mode is to ensure that the input data follows a given format.

  • \quickRdfIo\NQuadsParser::MODE_TRIPLES
  • \quickRdfIo\NQuadsParser::MODE_QUADS
  • \quickRdfIo\NQuadsParser::MODE_TRIPLES_STAR
  • \quickRdfIo\NQuadsParser::MODE_QUADS_STAR

makeQuad()

Converts regex matches array into a Quad.

private makeQuad(array<string|int, string|null> &$matches) : QuadInterface
Parameters
$matches : array<string|int, string|null>
Return values
QuadInterface

readLine()

private readLine() : string
Tags
throws
LogicException
Return values
string
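
The class summary mentions parsing in chunks. A minimal sketch of that general idea is shown here: the stream is read in fixed-size blocks and split into lines on runs of CR/LF (the same character class as the EOL constant), with the trailing incomplete line carried over to the next block. The readLinesInChunks() function is hypothetical and works on a plain PHP stream resource; it is not the library's readLine() implementation, which operates on a StreamInterface.

```php
<?php
// Hypothetical generator: yields complete lines from a stream read in fixed-size chunks.
function readLinesInChunks($stream, int $chunkSize = 8096): Generator {
    $buffer = '';
    while (!feof($stream)) {
        $chunk = fread($stream, $chunkSize);
        if ($chunk === false) {
            break; // read error, stop reading
        }
        $buffer .= $chunk;
        // Split on runs of CR/LF, as in the EOL constant.
        $lines = preg_split('%[\x0D\x0A]+%', $buffer);
        // The last piece may be an incomplete line; keep it for the next chunk.
        $buffer = array_pop($lines);
        foreach ($lines as $line) {
            if ($line !== '') {
                yield $line;
            }
        }
    }
    // Whatever remains after the end of the stream is the final line.
    if ($buffer !== '') {
        yield $buffer;
    }
}
```

Because line boundaries are found with a single regular expression per chunk instead of a read call per line, the number of I/O operations stays small even for very long inputs.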

skipBom()

private skipBom(StreamInterface $stream) : void
Parameters
$stream : StreamInterface
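
As an illustration of byte order mark handling, the following sketch strips a leading UTF-8 BOM from a string and rejects input that starts with a UTF-16 or UTF-32 BOM. The stripUtf8Bom() function, the exception type, and the order of checks are assumptions made for this example; the BOM byte sequences are the standard Unicode ones and are not copied from the property values listed on this page. The library's skipBom() works on a StreamInterface and may behave differently.

```php
<?php
// Hypothetical helper operating on a string instead of a stream.
function stripUtf8Bom(string $data): string {
    // Standard Unicode BOMs of unsupported encodings (assumed values).
    // Longest first, so that UTF-32 LE is not mistaken for UTF-16 LE.
    $foreign = [
        "\x00\x00\xFE\xFF" => 'UTF-32 BE',
        "\xFF\xFE\x00\x00" => 'UTF-32 LE',
        "\xFE\xFF"         => 'UTF-16 BE',
        "\xFF\xFE"         => 'UTF-16 LE',
    ];
    foreach ($foreign as $bom => $name) {
        if (strncmp($data, $bom, strlen($bom)) === 0) {
            throw new RuntimeException("Input is encoded in $name, only UTF-8 is supported");
        }
    }
    // Drop the UTF-8 BOM if present, otherwise return the data unchanged.
    $utf8Bom = "\xEF\xBB\xBF";
    return strncmp($data, $utf8Bom, 3) === 0 ? substr($data, 3) : $data;
}
```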

unescape()

private unescape(string $value) : string
Parameters
$value : string
Return values
string
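
A minimal sketch of what unescaping an N-Quads literal involves, based on the ECHAR and UCHAR grammar productions referenced by $unescapeMap and the UCHAR constant. The unescapeNQuads() function is hypothetical and is not the library's private unescape() method; it also assumes the mbstring extension is available for mb_chr().

```php
<?php
// Hypothetical helper: resolves ECHAR and UCHAR escapes in a single pass.
function unescapeNQuads(string $value): string {
    // ECHAR map per the N-Quads grammar: \t \b \n \r \f \" \' \\
    $echar = [
        't' => "\t", 'b' => "\x08", 'n' => "\n", 'r' => "\r",
        'f' => "\f", '"' => '"', "'" => "'", '\\' => '\\',
    ];
    return preg_replace_callback(
        // A backslash followed by uXXXX, UXXXXXXXX, or a single ECHAR character.
        '%\\\\(?:u([0-9A-Fa-f]{4})|U([0-9A-Fa-f]{8})|([tbnrf"\'\\\\]))%',
        function (array $m) use ($echar): string {
            if (isset($m[3])) {
                return $echar[$m[3]];
            }
            $hex = $m[1] !== '' ? $m[1] : $m[2];
            $char = mb_chr((int) hexdec($hex), 'UTF-8');
            // Leave the escape untouched if the code point is not valid.
            return $char === false ? $m[0] : $char;
        },
        $value
    );
}
```

Handling all escape kinds in one pass matters: an escaped backslash followed by the letter n must yield a backslash and an n, not a newline, which a sequence of independent replacements could get wrong.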

        