NQuadsParser
in package
implements
ParserInterface, QuadIteratorInterface
uses
TmpStreamParserTrait, StreamSkipBomTrait
Parses only N-Quads and N-Triples, but does so quickly thanks to parsing in chunks and extensive use of regular expressions.
Table of Contents
Constants
- BLANKNODE = '(_:[^\s<.]+)'
- BLANKNODE1_STRICT = '_:'
- BLANKNODE2_STRICT = '[0-9_:A-Za-z\x{00C0}-\x{00D6}\x{00D8}-\x{00F6}\x{00F8}-\x{02FF}\x{0370}-\x{037D}\x{037F}-\x{1FFF}\x{200C}-\x{200D}\x{2070}-\x{218F}\x{2C00}-\x{2FEF}\x{3001}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFFD}\x{10000}-\x{EFFFF}]'
- BLANKNODE3_STRICT = '[-0-9_:A-Za-z\x{00B7}\x{00C0}-\x{00D6}\x{00D8}-\x{00F6}\x{00F8}-\x{02FF}\x{0300}-\x{037D}\x{037F}-\x{1FFF}\x{200C}-\x{200D}\x{203F}-\x{2040}\x{2070}-\x{218F}\x{2C00}-\x{2FEF}\x{3001}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFFD}\x{10000}-\x{EFFFF}.]'
- BLANKNODE4_STRICT = '[-0-9_:A-Za-z\x{00B7}\x{00C0}-\x{00D6}\x{00D8}-\x{00F6}\x{00F8}-\x{02FF}\x{0300}-\x{037D}\x{037F}-\x{1FFF}\x{200C}-\x{200D}\x{203F}-\x{2040}\x{2070}-\x{218F}\x{2C00}-\x{2FEF}\x{3001}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFFD}\x{10000}-\x{EFFFF}]'
- COMMENT = '\s*(?>#.*)?'
- COMMENT2 = '\s*#.*'
- COMMENT2_STRICT = '\s*#[^\x0D\x0A]*'
- COMMENT_STRICT = '\s*(?>#[^\x0D\x0A]*)?'
- EOL = '[\x0D\x0A]+'
- IRIREF = '<([^>]+)>'
- IRIREF_STRICT = '<((?>[^\x{00}-\x{20}<>"{}|^`\\\\]|\\\\u[0-9A-Fa-f]{4}|\\\\U[0-9A-Fa-f]{8})*)>'
- LANGTAG = '@([-a-zA-Z0-9]+)'
- LANGTAG_STRICT = '@([a-zA-Z]+(?>-[a-zA-Z0-9]+)*)'
- LITERAL = '"((?>[^\x{22}\x{5C}\x{0A}\x{0D}]|\\\\[tbnrf"\'\\\\]|\\\\u[0-9A-Fa-f]{4}|\\\\U[0-9A-Fa-f]{8})*)"'
- MODE_QUADS = 2
- MODE_QUADS_STAR = 4
- MODE_TRIPLES = 1
- MODE_TRIPLES_STAR = 3
- READ_BUF_SIZE = 8096
- STAR_END = '%\G\s*>>%'
- STAR_START = '%\G\s*<<%'
- UCHAR = '\\\\u[0-9A-Fa-f]{4}|\\\\U[0-9A-Fa-f]{8}'
Properties
- $bomUtf8 : mixed
- $dataFactory : DataFactoryInterface
- $input : StreamInterface
- $invalidBoms2B : mixed
- $invalidBoms3B : mixed
- $invalidBoms4B : mixed
- $level : int
- Recursion level of the star parser
- $line : string
- Input line
- $linesBuffer : SplQueue<string|int, string>
- $mode : int
- $offset : int
- Character offset within a parsed line (used by the star parser)
- $quads : Generator<string|int, QuadInterface>
- $readBuffer : string
- $regexp : string
- $regexpCommentLine : string
- $regexpGraph : string
- $regexpLineEnd : string
- $regexpObjGraph : string
- $regexpPred : string
- $regexpSbjPred : string
- $tmpStream : resource|null
- $unescapeMap : array<string, string>
- See https://www.w3.org/TR/n-quads/#grammar-production-ECHAR
Methods
- __construct() : mixed
- Creates the parser.
- __destruct() : mixed
- current() : QuadInterface
- key() : mixed
- next() : void
- parse() : QuadIteratorInterface
- parseStream() : QuadIteratorInterface
- rewind() : void
- valid() : bool
- closeTmpStream() : void
- makeQuad() : QuadInterface
- Converts regex matches array into a Quad.
- parseStar() : QuadInterface
- quadGenerator() : Generator<string|int, QuadInterface>
- readLine() : string
- skipBom() : void
- starQuadGenerator() : Generator<string|int, QuadInterface>
- unescape() : string
Constants
BLANKNODE
public
mixed
BLANKNODE
= '(_:[^\s<.]+)'
BLANKNODE1_STRICT
public
mixed
BLANKNODE1_STRICT
= '_:'
BLANKNODE2_STRICT
public
mixed
BLANKNODE2_STRICT
= '[0-9_:A-Za-z\x{00C0}-\x{00D6}\x{00D8}-\x{00F6}\x{00F8}-\x{02FF}\x{0370}-\x{037D}\x{037F}-\x{1FFF}\x{200C}-\x{200D}\x{2070}-\x{218F}\x{2C00}-\x{2FEF}\x{3001}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFFD}\x{10000}-\x{EFFFF}]'
BLANKNODE3_STRICT
public
mixed
BLANKNODE3_STRICT
= '[-0-9_:A-Za-z\x{00B7}\x{00C0}-\x{00D6}\x{00D8}-\x{00F6}\x{00F8}-\x{02FF}\x{0300}-\x{037D}\x{037F}-\x{1FFF}\x{200C}-\x{200D}\x{203F}-\x{2040}\x{2070}-\x{218F}\x{2C00}-\x{2FEF}\x{3001}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFFD}\x{10000}-\x{EFFFF}.]'
BLANKNODE4_STRICT
public
mixed
BLANKNODE4_STRICT
= '[-0-9_:A-Za-z\x{00B7}\x{00C0}-\x{00D6}\x{00D8}-\x{00F6}\x{00F8}-\x{02FF}\x{0300}-\x{037D}\x{037F}-\x{1FFF}\x{200C}-\x{200D}\x{203F}-\x{2040}\x{2070}-\x{218F}\x{2C00}-\x{2FEF}\x{3001}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFFD}\x{10000}-\x{EFFFF}]'
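The four BLANKNODE*_STRICT constants line up with the pieces of the BLANK_NODE_LABEL production in the N-Quads grammar: a prefix, a first character, middle characters (dot allowed) and a last character (no dot). The sketch below shows one way such pieces can be joined. It uses ASCII-only stand-ins for the character classes, and both the constant names and the way they are combined are assumptions, not the parser's actual code.

```php
<?php
// ASCII-only stand-ins for the character classes above (an assumption made
// for brevity; the real constants also cover the non-ASCII ranges).
const BN_PREFIX = '_:';              // same value as BLANKNODE1_STRICT
const BN_FIRST  = '[0-9_:A-Za-z]';   // stand-in for BLANKNODE2_STRICT
const BN_MIDDLE = '[-0-9_:A-Za-z.]'; // stand-in for BLANKNODE3_STRICT (dot allowed)
const BN_LAST   = '[-0-9_:A-Za-z]';  // stand-in for BLANKNODE4_STRICT (no dot)

// Grammar: BLANK_NODE_LABEL ::= '_:' (PN_CHARS_U | [0-9]) ((PN_CHARS | '.')* PN_CHARS)?
// Returns the blank node label at the start of $input, or null if there is none.
function blankNodeLabel(string $input): ?string
{
    $pattern = '%^(' . BN_PREFIX . BN_FIRST . '(?:' . BN_MIDDLE . '*' . BN_LAST . ')?)%';
    return preg_match($pattern, $input, $m) === 1 ? $m[1] : null;
}

var_dump(blankNodeLabel('_:b1.'));  // the trailing dot is not part of the label
var_dump(blankNodeLabel('_:.a'));   // a label cannot start with a dot
```

Splitting the middle and last character classes is what keeps a statement-terminating dot out of the label while still allowing dots inside it.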
COMMENT
public
mixed
COMMENT
= '\s*(?>#.*)?'
COMMENT2
public
mixed
COMMENT2
= '\s*#.*'
COMMENT2_STRICT
public
mixed
COMMENT2_STRICT
= '\s*#[^\x0D\x0A]*'
COMMENT_STRICT
public
mixed
COMMENT_STRICT
= '\s*(?>#[^\x0D\x0A]*)?'
EOL
public
mixed
EOL
= '[\x0D\x0A]+'
IRIREF
public
mixed
IRIREF
= '<([^>]+)>'
IRIREF_STRICT
public
mixed
IRIREF_STRICT
= '<((?>[^\x{00}-\x{20}<>"{}|^`\\\\]|\\\\u[0-9A-Fa-f]{4}|\\\\U[0-9A-Fa-f]{8})*)>'
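To illustrate the difference between the two IRI patterns, the sketch below copies both constant values and wraps them in a small helper. The helper name, the `%` delimiters, the anchors and the `u` modifier are assumptions made for this example; the parser embeds the patterns in larger expressions that are not shown on this page.

```php
<?php
// Patterns copied from the IRIREF and IRIREF_STRICT constants above.
const IRIREF_SIMPLE = '<([^>]+)>';
const IRIREF_FULL   = '<((?>[^\x{00}-\x{20}<>"{}|^`\\\\]|\\\\u[0-9A-Fa-f]{4}|\\\\U[0-9A-Fa-f]{8})*)>';

// Returns the captured IRI, or null when the whole input is not an IRIREF
// according to the given pattern.
function matchIri(string $pattern, string $input): ?string
{
    return preg_match('%^' . $pattern . '$%u', $input, $m) === 1 ? $m[1] : null;
}

$withSpace = '<http://example.org/a b>';
var_dump(matchIri(IRIREF_SIMPLE, $withSpace)); // accepted: anything up to '>'
var_dump(matchIri(IRIREF_FULL, $withSpace));   // rejected: space is not allowed
```

The simplified pattern accepts any run of characters up to the closing bracket, while the strict one excludes control characters, spaces and a handful of punctuation characters, and allows \u and \U escapes.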
LANGTAG
public
mixed
LANGTAG
= '@([-a-zA-Z0-9]+)'
LANGTAG_STRICT
public
mixed
LANGTAG_STRICT
= '@([a-zA-Z]+(?>-[a-zA-Z0-9]+)*)'
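A short sketch of how the two language tag patterns differ, using the constant values copied from above. The helper name, delimiters and anchors are assumptions made for this example only.

```php
<?php
// Patterns copied from the LANGTAG and LANGTAG_STRICT constants above.
const LANGTAG_SIMPLE = '@([-a-zA-Z0-9]+)';
const LANGTAG_FULL   = '@([a-zA-Z]+(?>-[a-zA-Z0-9]+)*)';

// Returns the captured tag (without '@'), or null when the whole input
// is not a language tag according to the given pattern.
function matchLangTag(string $pattern, string $input): ?string
{
    return preg_match('%^' . $pattern . '$%', $input, $m) === 1 ? $m[1] : null;
}

var_dump(matchLangTag(LANGTAG_FULL, '@en-US'));  // well-formed tag
var_dump(matchLangTag(LANGTAG_SIMPLE, '@-en'));  // tolerated by the simplified pattern
var_dump(matchLangTag(LANGTAG_FULL, '@-en'));    // rejected by the strict pattern
```

The strict pattern requires a leading run of letters and non-empty subtags separated by single hyphens; the simplified one only checks the allowed character set.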
LITERAL
public
mixed
LITERAL
= '"((?>[^\x{22}\x{5C}\x{0A}\x{0D}]|\\\\[tbnrf"\'\\\\]|\\\\u[0-9A-Fa-f]{4}|\\\\U[0-9A-Fa-f]{8})*)"'
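The LITERAL pattern captures the raw (still escaped) body of a quoted literal. A minimal sketch follows, with the constant value copied from above; the helper name, delimiters and `u` modifier are assumptions of this example.

```php
<?php
// Pattern copied from the LITERAL constant above.
const LITERAL_PATTERN = '"((?>[^\x{22}\x{5C}\x{0A}\x{0D}]|\\\\[tbnrf"\'\\\\]|\\\\u[0-9A-Fa-f]{4}|\\\\U[0-9A-Fa-f]{8})*)"';

// Returns the raw body of a literal found at the start of $input,
// or null when the input does not start with a well-formed literal.
function literalBody(string $input): ?string
{
    return preg_match('%^' . LITERAL_PATTERN . '%u', $input, $m) === 1 ? $m[1] : null;
}

var_dump(literalBody('"say \\"hi\\"" .'));   // escaped quotes stay inside the body
var_dump(literalBody('"bad \\x escape"'));   // \x is not a valid escape sequence
```

A backslash is only accepted as part of one of the listed escape sequences, so an escaped quote never ends the literal and an unknown escape makes the match fail.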
MODE_QUADS
public
mixed
MODE_QUADS
= 2
MODE_QUADS_STAR
public
mixed
MODE_QUADS_STAR
= 4
MODE_TRIPLES
public
mixed
MODE_TRIPLES
= 1
MODE_TRIPLES_STAR
public
mixed
MODE_TRIPLES_STAR
= 3
READ_BUF_SIZE
public
mixed
READ_BUF_SIZE
= 8096
STAR_END
public
mixed
STAR_END
= '%\G\s*>>%'
STAR_START
public
mixed
STAR_START
= '%\G\s*<<%'
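Both STAR_START and STAR_END begin with `\G`, which anchors a match at the offset passed to `preg_match()`. The sketch below shows that mechanism in isolation; the helper and the way it advances the offset are assumptions, and the private parseStar() method may work differently.

```php
<?php
// Patterns copied from the STAR_START and STAR_END constants above.
const STAR_START_RE = '%\G\s*<<%';
const STAR_END_RE   = '%\G\s*>>%';

// Tries $pattern exactly at $offset; returns the offset just past the
// match, or null when the pattern does not match at that position.
function matchAt(string $pattern, string $line, int $offset): ?int
{
    if (preg_match($pattern, $line, $m, 0, $offset) !== 1) {
        return null;
    }
    return $offset + strlen($m[0]);
}

$line = '<< <s> <p> <o> >> <p2> <o2> .';
var_dump(matchAt(STAR_START_RE, $line, 0));  // a quoted triple opens at the line start
var_dump(matchAt(STAR_START_RE, $line, 2));  // no second '<<' directly after the first
```

Because of `\G`, the pattern never scans ahead for a later `<<` or `>>`; it either matches at the current offset (after optional whitespace) or fails, which suits a parser that walks through a line position by position.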
UCHAR
public
mixed
UCHAR
= '\\\\u[0-9A-Fa-f]{4}|\\\\U[0-9A-Fa-f]{8}'
Properties
$bomUtf8
private
mixed
$bomUtf8
= ""
$dataFactory
private
DataFactoryInterface
$dataFactory
$input
private
StreamInterface
$input
$invalidBoms2B
private
mixed
$invalidBoms2B
= ["\xef\xff" => "UTF-16 BE", "\xff\xfe" => "UTF-16 LE"]
$invalidBoms3B
private
mixed
$invalidBoms3B
= ["+/v" => "UTF-7", "\xf7dL" => "UTF-1", "\x0e\xfe\xff" => "SCSU", "\xfb\xee(" => "BOCU-1"]
$invalidBoms4B
private
mixed
$invalidBoms4B
= ["\x00\x00\xfe\xff" => "UTF-32 BE", "\xff\xfe\x00\x00" => "UTF-32 LE", "\xddsfs" => "UTF-EBCDIC", "\x841\x953" => "GB18030"]
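The three $invalidBoms* maps group byte order marks by length. A possible way to use such maps is sketched below, with the byte sequences copied verbatim from the defaults listed above. The function name and the longest-first lookup order are assumptions of this sketch, not a description of skipBom().

```php
<?php
// Byte sequences copied verbatim from the property defaults listed above,
// keyed by BOM length in bytes.
const INVALID_BOMS = [
    4 => ["\x00\x00\xfe\xff" => 'UTF-32 BE', "\xff\xfe\x00\x00" => 'UTF-32 LE', "\xddsfs" => 'UTF-EBCDIC', "\x841\x953" => 'GB18030'],
    3 => ["+/v" => 'UTF-7', "\xf7dL" => 'UTF-1', "\x0e\xfe\xff" => 'SCSU', "\xfb\xee(" => 'BOCU-1'],
    2 => ["\xef\xff" => 'UTF-16 BE', "\xff\xfe" => 'UTF-16 LE'],
];

// Returns the name of the unsupported encoding whose BOM starts $head,
// or null when none of the listed BOMs is present.
function unsupportedBom(string $head): ?string
{
    // Longest BOMs first: the UTF-32 LE mark starts with the UTF-16 LE one.
    foreach (INVALID_BOMS as $length => $boms) {
        $prefix = substr($head, 0, $length);
        if (isset($boms[$prefix])) {
            return $boms[$prefix];
        }
    }
    return null;
}

var_dump(unsupportedBom("\xff\xfe\x00\x00" . '<s>'));
var_dump(unsupportedBom('<http://example.org/s>'));
```

Checking the 4-byte marks before the 2-byte ones matters because a UTF-32 LE stream would otherwise be reported as UTF-16 LE.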
$level
Recursion level of the star parser
private
int
$level
$line
Input line
private
string
$line
$linesBuffer
private
SplQueue<string|int, string>
$linesBuffer
$mode
private
int
$mode
$offset
Character offset within a parsed line (used by the star parser)
private
int
$offset
$quads
private
Generator<string|int, QuadInterface>
$quads
$readBuffer
private
string
$readBuffer
$regexp
private
string
$regexp
$regexpCommentLine
private
string
$regexpCommentLine
$regexpGraph
private
string
$regexpGraph
$regexpLineEnd
private
string
$regexpLineEnd
$regexpObjGraph
private
string
$regexpObjGraph
$regexpPred
private
string
$regexpPred
$regexpSbjPred
private
string
$regexpSbjPred
$tmpStream
private
resource|null
$tmpStream
$unescapeMap
See https://www.w3.org/TR/n-quads/#grammar-production-ECHAR
private
array<string, string>
$unescapeMap
Methods
__construct()
Creates the parser.
public
__construct(DataFactoryInterface $dataFactory[, bool $strict = false ][, int $mode = self::MODE_QUADS_STAR ]) : mixed
The parser's behavior is controlled by the $strict and $mode
parameters.
When $strict = true,
regular expressions that strictly follow the formal N-Triples/N-Quads
grammar are used (see https://www.w3.org/TR/n-quads/#sec-grammar and
https://www.w3.org/TR/n-triples/#n-triples-grammar). When $strict = false,
simplified regular expressions are used. The simplified variants parse slightly
faster and are much easier to debug. All data that are valid according
to the strict syntax can also be parsed in the simplified mode, so
unless you need to verify that the input is fully valid RDF, you can stay with the
simplified mode.
Parameters
- $dataFactory : DataFactoryInterface
- Factory used to create RDF terms.
- $strict : bool = false
- Should strict RDF syntax be enforced?
- $mode : int = self::MODE_QUADS_STAR
- Parsing mode, one of the modes listed below. Note that \quickRdfIo\NQuadsParser::MODE_QUADS_STAR can parse all the other formats, and there should be no significant performance difference between the modes. The main reason to choose a non-default mode is to ensure that the input data follow a given format.
- \quickRdfIo\NQuadsParser::MODE_TRIPLES
- \quickRdfIo\NQuadsParser::MODE_QUADS
- \quickRdfIo\NQuadsParser::MODE_TRIPLES_STAR
- \quickRdfIo\NQuadsParser::MODE_QUADS_STAR
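A minimal usage sketch of the constructor and parse(), under stated assumptions: the library is installed with Composer, and `quickRdf\DataFactory` (from a separate package, not documented on this page) serves as the DataFactoryInterface implementation. The helper `countQuads()` is hypothetical and returns null when those classes are not available.

```php
<?php
// Assumption: a Composer autoloader sits next to this script.
$autoload = __DIR__ . '/vendor/autoload.php';
if (is_file($autoload)) {
    require_once $autoload;
}

// Parses an N-Quads string and returns the number of quads found,
// or null when the required classes cannot be loaded.
function countQuads(string $nquads, bool $strict = false): ?int
{
    if (!class_exists('quickRdfIo\NQuadsParser') || !class_exists('quickRdf\DataFactory')) {
        return null; // library not installed, nothing to demonstrate
    }
    $parser = new \quickRdfIo\NQuadsParser(
        new \quickRdf\DataFactory(),           // assumed DataFactoryInterface implementation
        $strict,                               // strict or simplified regular expressions
        \quickRdfIo\NQuadsParser::MODE_QUADS   // plain N-Quads input only
    );
    $count = 0;
    foreach ($parser->parse($nquads) as $quad) {
        $count++;
    }
    return $count;
}

$data = "<http://example.org/s> <http://example.org/p> \"o\" <http://example.org/g> .\n";
var_dump(countQuads($data));
```

Passing MODE_QUADS rather than the default MODE_QUADS_STAR is a choice made here to show the third argument; as noted above, its main purpose is to make sure the input follows the given format.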
__destruct()
public
__destruct() : mixed
current()
public
current() : QuadInterface
Return values
QuadInterface
key()
public
key() : mixed
next()
public
next() : void
parse()
public
parse(string $input) : QuadIteratorInterface
Parameters
- $input : string
Return values
QuadIteratorInterface
parseStream()
public
parseStream(resource|StreamInterface $input) : QuadIteratorInterface
Parameters
- $input : resource|StreamInterface
Return values
QuadIteratorInterface
rewind()
public
rewind() : void
valid()
public
valid() : bool
Return values
bool
closeTmpStream()
private
closeTmpStream() : void
makeQuad()
Converts regex matches array into a Quad.
private
makeQuad(array<string|int, string|null> &$matches) : QuadInterface
Parameters
- $matches : array<string|int, string|null>
Return values
QuadInterface
parseStar()
private
parseStar(int $line) : QuadInterface
Parameters
- $line : int
Return values
QuadInterface
quadGenerator()
private
quadGenerator() : Generator<string|int, QuadInterface>
Return values
Generator<string|int, QuadInterface>
readLine()
private
readLine() : string
Return values
string
skipBom()
private
skipBom(StreamInterface $stream) : void
Parameters
- $stream : StreamInterface
starQuadGenerator()
private
starQuadGenerator() : Generator<string|int, QuadInterface>
Return values
Generator<string|int, QuadInterface>
unescape()
private
unescape(string $value) : string
Parameters
- $value : string
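The body of unescape() is not shown on this page. As an illustration of what unescaping an N-Quads literal involves (the ECHAR and UCHAR productions), here is a self-contained sketch. All names in it (`ECHAR_MAP`, `codepointToUtf8()`, `unescapeSketch()`) are hypothetical, and it should not be read as the library's implementation.

```php
<?php
// Hypothetical stand-in for an ECHAR lookup table: escape letter => character.
const ECHAR_MAP = [
    't' => "\t", 'b' => "\x08", 'n' => "\n", 'r' => "\r", 'f' => "\x0C",
    '"' => '"', "'" => "'", '\\' => '\\',
];

// Encodes a Unicode code point as UTF-8 without relying on extensions.
function codepointToUtf8(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12)) . chr(0x80 | (($cp >> 6) & 0x3F)) . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18)) . chr(0x80 | (($cp >> 12) & 0x3F))
        . chr(0x80 | (($cp >> 6) & 0x3F)) . chr(0x80 | ($cp & 0x3F));
}

// Replaces ECHAR (\t, \n, \", ...) and UCHAR (\uXXXX, \UXXXXXXXX) escapes
// in a single left-to-right pass.
function unescapeSketch(string $value): string
{
    $pattern = '/\\\\([tbnrf"\'\\\\])|\\\\u([0-9A-Fa-f]{4})|\\\\U([0-9A-Fa-f]{8})/';
    $result = preg_replace_callback($pattern, function (array $m): string {
        if ($m[1] !== '') {
            return ECHAR_MAP[$m[1]];               // single-character escape
        }
        $hex = ($m[2] ?? '') !== '' ? $m[2] : $m[3]; // \u has 4 hex digits, \U has 8
        return codepointToUtf8((int) hexdec($hex));
    }, $value);
    return $result ?? $value; // keep the input if the regex engine reports an error
}

var_dump(unescapeSketch('line one\\nline two'));
```

Handling all escape kinds in one pass avoids double unescaping: an escaped backslash followed by `n` stays a backslash and an `n` rather than turning into a newline.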