NQuadsParser
in package
implements ParserInterface, QuadIteratorInterface
uses TmpStreamParserTrait

Parses only N-Quads and N-Triples, but does so quickly (thanks to parsing in chunks and extensive use of regular expressions).

Tags
author

zozlak

Interfaces, Classes, Traits and Enums

ParserInterface
QuadIteratorInterface

Table of Contents

BLANKNODE  = '(_:[^\\s<.]+)'
BLANKNODE1_STRICT  = '_:'
BLANKNODE2_STRICT  = '[0-9_:A-Za-z\\x{00C0}-\\x{00D6}\\x{00D8}-\\x{00F6}\\x{00F8}-\\x{02FF}\\x{0370}-\\x{037D}\\x{037F}-\\x{1FFF}\\x{200C}-\\x{200D}\\x{2070}-\\x{218F}\\x{2C00}-\\x{2FEF}\\x{3001}-\\x{D7FF}\\x{F900}-\\x{FDCF}\\x{FDF0}-\\x{FFFD}\\x{10000}-\\x{EFFFF}]'
BLANKNODE3_STRICT  = '[-0-9_:A-Za-z\\x{00B7}\\x{00C0}-\\x{00D6}\\x{00D8}-\\x{00F6}\\x{00F8}-\\x{02FF}\\x{0300}-\\x{037D}\\x{037F}-\\x{1FFF}\\x{200C}-\\x{200D}\\x{203F}-\\x{2040}\\x{2070}-\\x{218F}\\x{2C00}-\\x{2FEF}\\x{3001}-\\x{D7FF}\\x{F900}-\\x{FDCF}\\x{FDF0}-\\x{FFFD}\\x{10000}-\\x{EFFFF}.]'
BLANKNODE4_STRICT  = '[-0-9_:A-Za-z\\x{00B7}\\x{00C0}-\\x{00D6}\\x{00D8}-\\x{00F6}\\x{00F8}-\\x{02FF}\\x{0300}-\\x{037D}\\x{037F}-\\x{1FFF}\\x{200C}-\\x{200D}\\x{203F}-\\x{2040}\\x{2070}-\\x{218F}\\x{2C00}-\\x{2FEF}\\x{3001}-\\x{D7FF}\\x{F900}-\\x{FDCF}\\x{FDF0}-\\x{FFFD}\\x{10000}-\\x{EFFFF}]'
COMMENT  = '\\s*(?>#.*)?'
COMMENT2  = '\\s*#.*'
COMMENT2_STRICT  = '\\s*#[^\\x0D\\x0A]*'
COMMENT_STRICT  = '\\s*(?>#[^\\x0D\\x0A]*)?'
EOL  = '[\\x0D\\x0A]+'
IRIREF  = '<([^>]+)>'
IRIREF_STRICT  = '<((?>[^\\x{00}-\\x{20}<>"{}|^`\\\\]|\\\\u[0-9A-Fa-f]{4}|\\\\U[0-9A-Fa-f]{8})*)>'
LANGTAG  = '@([-a-zA-Z0-9]+)'
LANGTAG_STRICT  = '@([a-zA-Z]+(?>-[a-zA-Z0-9]+)*)'
LITERAL  = '"((?>[^"]|\\")*)"'
LITERAL_STRICT  = '"((?>[^\\x{22}\\x{5C}\\x{0A}\\x{0D}]|\\\\[tbnrf"\'\\\\]|\\\\u[0-9A-Fa-f]{4}|\\\\U[0-9A-Fa-f]{8})*)"'
MODE_QUADS  = 2
MODE_QUADS_STAR  = 4
MODE_TRIPLES  = 1
MODE_TRIPLES_STAR  = 3
READ_BUF_SIZE  = 8096
STAR_END  = '%\\G\\s*>>%'
STAR_START  = '%\\G\\s*<<%'
UCHAR  = '\\\\u[0-9A-Fa-f]{4}|\\\\U[0-9A-Fa-f]{8}'
$dataFactory  : DataFactoryInterface
$input  : StreamInterface
$level  : int
Recursion level of the star parser
$line  : string
Input line
$linesBuffer  : SplQueue<string|int, string>
$mode  : int
$offset  : int
Character offset within a parsed line (used by the star parser)
$quads  : Generator<string|int, QuadInterface>
$readBuffer  : string
$regexp  : string
$regexpCommentLine  : string
$regexpGraph  : string
$regexpLineEnd  : string
$regexpObjGraph  : string
$regexpPred  : string
$regexpSbjPred  : string
$tmpStream  : resource|null
$unescapeMap  : array<string, string>
See https://www.w3.org/TR/n-quads/#grammar-production-ECHAR
__construct()  : mixed
Creates the parser.
__destruct()  : mixed
current()  : QuadInterface
key()  : mixed
next()  : void
parse()  : QuadIteratorInterface
parseStream()  : QuadIteratorInterface
rewind()  : void
valid()  : bool
closeTmpStream()  : void
makeQuad()  : QuadInterface
Converts regex matches array into a Quad.
parseStar()  : QuadInterface
quadGenerator()  : Generator<string|int, QuadInterface>
readLine()  : string
starQuadGenerator()  : Generator<string|int, QuadInterface>
unescape()  : string

Constants

BLANKNODE1_STRICT

public mixed BLANKNODE1_STRICT = '_:'
Tags

BLANKNODE2_STRICT

public mixed BLANKNODE2_STRICT = '[0-9_:A-Za-z\\x{00C0}-\\x{00D6}\\x{00D8}-\\x{00F6}\\x{00F8}-\\x{02FF}\\x{0370}-\\x{037D}\\x{037F}-\\x{1FFF}\\x{200C}-\\x{200D}\\x{2070}-\\x{218F}\\x{2C00}-\\x{2FEF}\\x{3001}-\\x{D7FF}\\x{F900}-\\x{FDCF}\\x{FDF0}-\\x{FFFD}\\x{10000}-\\x{EFFFF}]'
Tags

BLANKNODE3_STRICT

public mixed BLANKNODE3_STRICT = '[-0-9_:A-Za-z\\x{00B7}\\x{00C0}-\\x{00D6}\\x{00D8}-\\x{00F6}\\x{00F8}-\\x{02FF}\\x{0300}-\\x{037D}\\x{037F}-\\x{1FFF}\\x{200C}-\\x{200D}\\x{203F}-\\x{2040}\\x{2070}-\\x{218F}\\x{2C00}-\\x{2FEF}\\x{3001}-\\x{D7FF}\\x{F900}-\\x{FDCF}\\x{FDF0}-\\x{FFFD}\\x{10000}-\\x{EFFFF}.]'
Tags

BLANKNODE4_STRICT

public mixed BLANKNODE4_STRICT = '[-0-9_:A-Za-z\\x{00B7}\\x{00C0}-\\x{00D6}\\x{00D8}-\\x{00F6}\\x{00F8}-\\x{02FF}\\x{0300}-\\x{037D}\\x{037F}-\\x{1FFF}\\x{200C}-\\x{200D}\\x{203F}-\\x{2040}\\x{2070}-\\x{218F}\\x{2C00}-\\x{2FEF}\\x{3001}-\\x{D7FF}\\x{F900}-\\x{FDCF}\\x{FDF0}-\\x{FFFD}\\x{10000}-\\x{EFFFF}]'
Tags
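
The four BLANKNODE*_STRICT constants are pattern fragments rather than complete expressions. A minimal sketch of how they might be combined is given below. It assumes the composition follows the BLANK_NODE_LABEL production of the N-Quads grammar ('_:', one start character, then optionally any number of middle characters closed by a character that is not a dot). The variables hold ASCII-only stand-ins for the documented constants, and isBlankNodeLabel() is a hypothetical helper, not part of this class.

```php
<?php
// ASCII-only stand-ins for the documented constants. The real constants also
// cover the Unicode ranges listed above and would need the "u" modifier.
$bn1 = '_:';               // BLANKNODE1_STRICT: label prefix
$bn2 = '[0-9_:A-Za-z]';    // BLANKNODE2_STRICT: first character
$bn3 = '[-0-9_:A-Za-z.]';  // BLANKNODE3_STRICT: middle characters, dot allowed
$bn4 = '[-0-9_:A-Za-z]';   // BLANKNODE4_STRICT: last character, dot not allowed

// Assumed composition: prefix, first character, optional tail.
$blankNode = '%^' . $bn1 . $bn2 . '(?:' . $bn3 . '*' . $bn4 . ')?$%';

// Hypothetical helper: true when $label matches the composed pattern.
function isBlankNodeLabel(string $pattern, string $label): bool
{
    return preg_match($pattern, $label) === 1;
}

var_dump(isBlankNodeLabel($blankNode, '_:b.1')); // a dot inside the label
var_dump(isBlankNodeLabel($blankNode, '_:b1.')); // a dot at the end of the label
```

Keeping separate classes for the middle and the last character is what lets a label contain dots without ending in one, so the dot that terminates a statement is not swallowed by the label.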

COMMENT2_STRICT

public mixed COMMENT2_STRICT = '\\s*#[^\\x0D\\x0A]*'
Tags

COMMENT_STRICT

public mixed COMMENT_STRICT = '\\s*(?>#[^\\x0D\\x0A]*)?'
Tags

IRIREF_STRICT

public mixed IRIREF_STRICT = '<((?>[^\\x{00}-\\x{20}<>"{}|^`\\\\]|\\\\u[0-9A-Fa-f]{4}|\\\\U[0-9A-Fa-f]{8})*)>'
Tags

LANGTAG_STRICT

public mixed LANGTAG_STRICT = '@([a-zA-Z]+(?>-[a-zA-Z0-9]+)*)'
Tags

LITERAL_STRICT

public mixed LITERAL_STRICT = '"((?>[^\\x{22}\\x{5C}\\x{0A}\\x{0D}]|\\\\[tbnrf"\'\\\\]|\\\\u[0-9A-Fa-f]{4}|\\\\U[0-9A-Fa-f]{8})*)"'
Tags

MODE_TRIPLES_STAR

public mixed MODE_TRIPLES_STAR = 3
Tags

UCHAR

public mixed UCHAR = '\\\\u[0-9A-Fa-f]{4}|\\\\U[0-9A-Fa-f]{8}'
Tags

Properties

$level

Recursion level of the star parser

private int $level
Tags

$linesBuffer

private SplQueue<string|int, string> $linesBuffer
Tags

$offset

Character offset within a parsed line (used by the star parser)

private int $offset
Tags

$regexpCommentLine

private string $regexpCommentLine
Tags

$unescapeMap

See https://www.w3.org/TR/n-quads/#grammar-production-ECHAR

private array<string, string> $unescapeMap
Tags

Methods

__construct()

Creates the parser.

public __construct(DataFactoryInterface $dataFactory[, bool $strict = false ][, int $mode = self::MODE_QUADS_STAR ]) : mixed

The parser's behavior is determined by the $strict and $mode parameter values.

When $strict = true, regular expressions that strictly follow the formal N-Triples/N-Quads grammar are used (see https://www.w3.org/TR/n-quads/#sec-grammar and https://www.w3.org/TR/n-triples/#n-triples-grammar). When $strict = false, simplified regular expressions are used. The simplified variants parse slightly faster and are (much) easier to debug. All data that is valid according to the strict syntax is parsed correctly in simplified mode, so unless you need to verify that the input is 100% correct RDF, you can stay with simplified mode.
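
The difference between the two families of expressions can be illustrated with the language tag patterns. The sketch below copies the values of the LANGTAG and LANGTAG_STRICT constants documented on this page and tests them in isolation. matchesWhole() is a hypothetical helper introduced only for this illustration, and anchoring each fragment with ^ and $ is an assumption made to test it on its own; the parser embeds these fragments in larger expressions.

```php
<?php
// Values copied from the LANGTAG and LANGTAG_STRICT constants.
$simplified = '@([-a-zA-Z0-9]+)';
$strict     = '@([a-zA-Z]+(?>-[a-zA-Z0-9]+)*)';

// Hypothetical helper: true when the fragment matches the whole input.
function matchesWhole(string $fragment, string $input): bool
{
    return preg_match('%^' . $fragment . '$%', $input) === 1;
}

$wellFormed = '@en-US';
$malformed  = '@-en'; // starts with a hyphen, which the strict grammar forbids

var_dump(matchesWhole($simplified, $wellFormed)); // bool(true)
var_dump(matchesWhole($strict, $wellFormed));     // bool(true)
var_dump(matchesWhole($simplified, $malformed));  // bool(true)
var_dump(matchesWhole($strict, $malformed));      // bool(false)
```

Both patterns accept the well-formed tag, while only the simplified one accepts the malformed tag. That is why simplified mode is not suitable for validating input.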

Parameters
$dataFactory : DataFactoryInterface

factory to be used to generate RDF terms.

$strict : bool = false

should strict RDF syntax be enforced?

$mode : int = self::MODE_QUADS_STAR

parsing mode - one of the modes listed below. Note that \quickRdfIo\NQuadsParser::MODE_QUADS_STAR can parse all the others and there should be no significant performance difference between parsing modes. The main reason for choosing a non-default mode is to ensure the input data follows a given format.

  • \quickRdfIo\NQuadsParser::MODE_TRIPLES
  • \quickRdfIo\NQuadsParser::MODE_QUADS
  • \quickRdfIo\NQuadsParser::MODE_TRIPLES_STAR
  • \quickRdfIo\NQuadsParser::MODE_QUADS_STAR
Tags
Return values
mixed

__destruct()

public __destruct() : mixed
Tags
Return values
mixed

key()

public key() : mixed
Tags
Return values
mixed

next()

public next() : void
Tags
Return values
void

rewind()

public rewind() : void
Tags
Return values
void

valid()

public valid() : bool
Tags
Return values
bool

makeQuad()

Converts regex matches array into a Quad.

private makeQuad(array<string|int, ?string> &$matches) : QuadInterface
Parameters
$matches : array<string|int, ?string>
Tags
Return values
QuadInterface

readLine()

private readLine() : string
Tags
throws
LogicException
Return values
string
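
The class description mentions parsing in chunks, and the READ_BUF_SIZE, EOL, $readBuffer and $linesBuffer members suggest a buffered line reader. The sketch below shows one way such a reader could work. It is not the actual readLine() implementation: readLinesInChunks() is a hypothetical function, and only the default chunk size and the end-of-line pattern are taken from this page.

```php
<?php
// Hypothetical chunked line reader. It reads fixed-size chunks, splits them
// on the EOL pattern and carries an incomplete trailing line to the next read.
function readLinesInChunks($stream, int $bufSize = 8096): Generator
{
    $buffer = '';
    while (!feof($stream)) {
        $buffer .= (string) fread($stream, $bufSize);
        // Same pattern as the EOL constant, wrapped in % delimiters.
        $lines = preg_split('%[\\x0D\\x0A]+%', $buffer);
        // The last element may be an incomplete line, so keep it for later.
        $buffer = (string) array_pop($lines);
        foreach ($lines as $line) {
            if ($line !== '') {
                yield $line;
            }
        }
    }
    if ($buffer !== '') {
        yield $buffer; // final line without a trailing line break
    }
}

// Usage with an in-memory stream and a tiny chunk size to force splits.
$stream = fopen('php://memory', 'w+');
fwrite($stream, "<a> <b> <c> .\r\n<d> <e> <f> .\n<g> <h> <i> .");
rewind($stream);
foreach (readLinesInChunks($stream, 4) as $line) {
    echo $line, "\n";
}
```

Reading in larger chunks and splitting them in memory keeps the number of stream reads low, which is presumably where part of the speed-up mentioned in the class description comes from.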

unescape()

private unescape(string $value) : string
Parameters
$value : string
Tags
Return values
string
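
For orientation, the sketch below shows how the escape sequences described by the ECHAR and UCHAR grammar productions can be decoded in a single pass. It is an illustration under assumptions, not the code of unescape(): unescapeSketch() is a hypothetical function, its map only mirrors what $unescapeMap is documented to represent, and it relies on the mbstring extension for mb_chr().

```php
<?php
// Hypothetical single-pass decoder for ECHAR and UCHAR escape sequences.
function unescapeSketch(string $value): string
{
    // ECHAR: a backslash followed by one of t b n r f " ' \
    $echar = [
        't' => "\t", 'b' => "\x08", 'n' => "\n", 'r' => "\r", 'f' => "\f",
        '"' => '"', "'" => "'", '\\' => '\\',
    ];
    // Alternatives: \uXXXX, \UXXXXXXXX, then ECHAR. One pass guarantees that
    // an escaped backslash is never re-read as the start of another escape.
    $pattern = '%\\\\u([0-9A-Fa-f]{4})|\\\\U([0-9A-Fa-f]{8})|\\\\([tbnrf"\'\\\\])%';
    $result = preg_replace_callback(
        $pattern,
        function (array $m) use ($echar): string {
            if (isset($m[3])) {
                return $echar[$m[3]];
            }
            $hex  = $m[1] !== '' ? $m[1] : $m[2];
            $char = mb_chr((int) hexdec($hex), 'UTF-8');
            // Leave the sequence untouched if it is not a valid code point.
            return $char === false ? $m[0] : $char;
        },
        $value
    );
    return $result ?? $value;
}

var_dump(unescapeSketch('line1\\nline2') === "line1\nline2");
var_dump(unescapeSketch('\\u0041') === 'A');
```

Handling all three forms in one regular expression avoids the ordering problems of chained replacements, for example an escaped backslash followed by the letter u being misread as the start of a UCHAR sequence.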
