NQuadsParser
implements ParserInterface, QuadIteratorInterface
Uses TmpStreamParserTrait
Parses only N-Quads and N-Triples, but does so quickly thanks to parsing in chunks and extensive use of regular expressions.
Table of Contents
- BLANKNODE = '(_:[^\\s<.]+)'
- BLANKNODE1_STRICT = '_:'
- BLANKNODE2_STRICT = '[0-9_:A-Za-z\\x{00C0}-\\x{00D6}\\x{00D8}-\\x{00F6}\\x{00F8}-\\x{02FF}\\x{0370}-\\x{037D}\\x{037F}-\\x{1FFF}\\x{200C}-\\x{200D}\\x{2070}-\\x{218F}\\x{2C00}-\\x{2FEF}\\x{3001}-\\x{D7FF}\\x{F900}-\\x{FDCF}\\x{FDF0}-\\x{FFFD}\\x{10000}-\\x{EFFFF}]'
- BLANKNODE3_STRICT = '[-0-9_:A-Za-z\\x{00B7}\\x{00C0}-\\x{00D6}\\x{00D8}-\\x{00F6}\\x{00F8}-\\x{02FF}\\x{0300}-\\x{037D}\\x{037F}-\\x{1FFF}\\x{200C}-\\x{200D}\\x{203F}-\\x{2040}\\x{2070}-\\x{218F}\\x{2C00}-\\x{2FEF}\\x{3001}-\\x{D7FF}\\x{F900}-\\x{FDCF}\\x{FDF0}-\\x{FFFD}\\x{10000}-\\x{EFFFF}.]'
- BLANKNODE4_STRICT = '[-0-9_:A-Za-z\\x{00B7}\\x{00C0}-\\x{00D6}\\x{00D8}-\\x{00F6}\\x{00F8}-\\x{02FF}\\x{0300}-\\x{037D}\\x{037F}-\\x{1FFF}\\x{200C}-\\x{200D}\\x{203F}-\\x{2040}\\x{2070}-\\x{218F}\\x{2C00}-\\x{2FEF}\\x{3001}-\\x{D7FF}\\x{F900}-\\x{FDCF}\\x{FDF0}-\\x{FFFD}\\x{10000}-\\x{EFFFF}]'
- COMMENT = '\\s*(?>#.*)?'
- COMMENT2 = '\\s*#.*'
- COMMENT2_STRICT = '\\s*#[^\\x0D\\x0A]*'
- COMMENT_STRICT = '\\s*(?>#[^\\x0D\\x0A]*)?'
- EOL = '[\\x0D\\x0A]+'
- IRIREF = '<([^>]+)>'
- IRIREF_STRICT = '<((?>[^\\x{00}-\\x{20}<>"{}|^`\\\\]|\\\\u[0-9A-Fa-f]{4}|\\\\U[0-9A-Fa-f]{8})*)>'
- LANGTAG = '@([-a-zA-Z0-9]+)'
- LANGTAG_STRICT = '@([a-zA-Z]+(?>-[a-zA-Z0-9]+)*)'
- LITERAL = '"((?>[^"]|\\")*)"'
- LITERAL_STRICT = '"((?>[^\\x{22}\\x{5C}\\x{0A}\\x{0D}]|\\\\[tbnrf"\'\\\\]|\\\\u[0-9A-Fa-f]{4}|\\\\U[0-9A-Fa-f]{8})*)"'
- MODE_QUADS = 2
- MODE_QUADS_STAR = 4
- MODE_TRIPLES = 1
- MODE_TRIPLES_STAR = 3
- READ_BUF_SIZE = 8096
- STAR_END = '%\\G\\s*>>%'
- STAR_START = '%\\G\\s*<<%'
- UCHAR = '\\\\u[0-9A-Fa-f]{4}|\\\\U[0-9A-Fa-f]{8}'
- $dataFactory : DataFactoryInterface
- $input : StreamInterface
- $level : int
- Recursion level of the star parser
- $line : string
- Input line
- $linesBuffer : SplQueue<string|int, string>
- $mode : int
- $offset : int
- Character offset within a parsed line (used by the star parser)
- $quads : Generator<string|int, QuadInterface>
- $readBuffer : string
- $regexp : string
- $regexpCommentLine : string
- $regexpGraph : string
- $regexpLineEnd : string
- $regexpObjGraph : string
- $regexpPred : string
- $regexpSbjPred : string
- $tmpStream : resource|null
- $unescapeMap : array<string, string>
- See https://www.w3.org/TR/n-quads/#grammar-production-ECHAR
- __construct() : mixed
- Creates the parser.
- __destruct() : mixed
- current() : QuadInterface
- key() : mixed
- next() : void
- parse() : QuadIteratorInterface
- parseStream() : QuadIteratorInterface
- rewind() : void
- valid() : bool
- closeTmpStream() : void
- makeQuad() : QuadInterface
- Converts regex matches array into a Quad.
- parseStar() : QuadInterface
- quadGenerator() : Generator<string|int, QuadInterface>
- readLine() : string
- starQuadGenerator() : Generator<string|int, QuadInterface>
- unescape() : string
Constants
BLANKNODE
public
mixed
BLANKNODE
= '(_:[^\\s<.]+)'
BLANKNODE1_STRICT
public
mixed
BLANKNODE1_STRICT
= '_:'
BLANKNODE2_STRICT
public
mixed
BLANKNODE2_STRICT
= '[0-9_:A-Za-z\\x{00C0}-\\x{00D6}\\x{00D8}-\\x{00F6}\\x{00F8}-\\x{02FF}\\x{0370}-\\x{037D}\\x{037F}-\\x{1FFF}\\x{200C}-\\x{200D}\\x{2070}-\\x{218F}\\x{2C00}-\\x{2FEF}\\x{3001}-\\x{D7FF}\\x{F900}-\\x{FDCF}\\x{FDF0}-\\x{FFFD}\\x{10000}-\\x{EFFFF}]'
BLANKNODE3_STRICT
public
mixed
BLANKNODE3_STRICT
= '[-0-9_:A-Za-z\\x{00B7}\\x{00C0}-\\x{00D6}\\x{00D8}-\\x{00F6}\\x{00F8}-\\x{02FF}\\x{0300}-\\x{037D}\\x{037F}-\\x{1FFF}\\x{200C}-\\x{200D}\\x{203F}-\\x{2040}\\x{2070}-\\x{218F}\\x{2C00}-\\x{2FEF}\\x{3001}-\\x{D7FF}\\x{F900}-\\x{FDCF}\\x{FDF0}-\\x{FFFD}\\x{10000}-\\x{EFFFF}.]'
BLANKNODE4_STRICT
public
mixed
BLANKNODE4_STRICT
= '[-0-9_:A-Za-z\\x{00B7}\\x{00C0}-\\x{00D6}\\x{00D8}-\\x{00F6}\\x{00F8}-\\x{02FF}\\x{0300}-\\x{037D}\\x{037F}-\\x{1FFF}\\x{200C}-\\x{200D}\\x{203F}-\\x{2040}\\x{2070}-\\x{218F}\\x{2C00}-\\x{2FEF}\\x{3001}-\\x{D7FF}\\x{F900}-\\x{FDCF}\\x{FDF0}-\\x{FFFD}\\x{10000}-\\x{EFFFF}]'
COMMENT
public
mixed
COMMENT
= '\\s*(?>#.*)?'
COMMENT2
public
mixed
COMMENT2
= '\\s*#.*'
COMMENT2_STRICT
public
mixed
COMMENT2_STRICT
= '\\s*#[^\\x0D\\x0A]*'
COMMENT_STRICT
public
mixed
COMMENT_STRICT
= '\\s*(?>#[^\\x0D\\x0A]*)?'
EOL
public
mixed
EOL
= '[\\x0D\\x0A]+'
IRIREF
public
mixed
IRIREF
= '<([^>]+)>'
IRIREF_STRICT
public
mixed
IRIREF_STRICT
= '<((?>[^\\x{00}-\\x{20}<>"{}|^`\\\\]|\\\\u[0-9A-Fa-f]{4}|\\\\U[0-9A-Fa-f]{8})*)>'
LANGTAG
public
mixed
LANGTAG
= '@([-a-zA-Z0-9]+)'
LANGTAG_STRICT
public
mixed
LANGTAG_STRICT
= '@([a-zA-Z]+(?>-[a-zA-Z0-9]+)*)'
LITERAL
public
mixed
LITERAL
= '"((?>[^"]|\\")*)"'
LITERAL_STRICT
public
mixed
LITERAL_STRICT
= '"((?>[^\\x{22}\\x{5C}\\x{0A}\\x{0D}]|\\\\[tbnrf"\'\\\\]|\\\\u[0-9A-Fa-f]{4}|\\\\U[0-9A-Fa-f]{8})*)"'
MODE_QUADS
public
mixed
MODE_QUADS
= 2
MODE_QUADS_STAR
public
mixed
MODE_QUADS_STAR
= 4
MODE_TRIPLES
public
mixed
MODE_TRIPLES
= 1
MODE_TRIPLES_STAR
public
mixed
MODE_TRIPLES_STAR
= 3
READ_BUF_SIZE
public
mixed
READ_BUF_SIZE
= 8096
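The class description credits parsing in chunks for the parser's speed. A minimal sketch of that general technique is shown below: the input is read in blocks of READ_BUF_SIZE bytes and complete lines are split off with the EOL pattern, while an incomplete tail is kept for the next block. The $readLines helper, the "%" delimiters and the in-memory stream are assumptions made for this example; the actual readLine() implementation is not shown in this documentation and may differ.

```php
<?php
// Values copied from the READ_BUF_SIZE and EOL constants.
$readBufSize = 8096;
$eol         = '[\\x0D\\x0A]+';

// Hypothetical helper: yields lines from a stream resource read in chunks.
$readLines = function ($stream) use ($readBufSize, $eol): Generator {
    $buffer = '';
    while (!feof($stream)) {
        $chunk = fread($stream, $readBufSize);
        if ($chunk === false) {
            break;
        }
        $buffer .= $chunk;
        // Split on line ends; the last element may be an incomplete line,
        // so it is kept in the buffer until the next chunk arrives.
        $lines  = preg_split('%' . $eol . '%', $buffer);
        $buffer = array_pop($lines);
        foreach ($lines as $line) {
            // Skip empty strings, which appear when a line end is cut
            // by a chunk boundary.
            if ($line !== '') {
                yield $line;
            }
        }
    }
    if ($buffer !== '') {
        yield $buffer; // final line without a trailing line end
    }
};

// Usage with an in-memory stream holding two N-Triples lines.
$stream = fopen('php://memory', 'r+');
fwrite($stream, "<http://s> <http://p> <http://o> .\r\n<http://s> <http://p> \"v\" .");
rewind($stream);
$lines = iterator_to_array($readLines($stream), false);
fclose($stream);

print_r($lines);
```

Reading fixed-size blocks keeps the number of read calls low for large inputs, at the cost of having to carry the unfinished last line over to the next block.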
STAR_END
public
mixed
STAR_END
= '%\\G\\s*>>%'
STAR_START
public
mixed
STAR_START
= '%\\G\\s*<<%'
UCHAR
public
mixed
UCHAR
= '\\\\u[0-9A-Fa-f]{4}|\\\\U[0-9A-Fa-f]{8}'
Properties
$dataFactory
private
DataFactoryInterface
$dataFactory
$input
private
StreamInterface
$input
$level
Recursion level of the star parser
private
int
$level
$line
Input line
private
string
$line
$linesBuffer
private
SplQueue<string|int, string>
$linesBuffer
$mode
private
int
$mode
$offset
Character offset within a parsed line (used by the star parser)
private
int
$offset
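The STAR_START and STAR_END patterns begin with \G, which anchors a match at the offset passed to preg_match(). The sketch below shows how a character offset within a line can be used this way. The sample line and the variable names are assumptions made for illustration; the actual parseStar() logic is not shown in this documentation.

```php
<?php
// Patterns copied from the STAR_START and STAR_END constants.
$starStart = '%\\G\\s*<<%';
$starEnd   = '%\\G\\s*>>%';

// Hypothetical input line with a quoted triple as the subject.
$line   = '<< <http://s> <http://p> <http://o> >> <http://q> "v" .';
$offset = 0;

// \G makes the match succeed only if it starts exactly at $offset.
if (preg_match($starStart, $line, $m, 0, $offset) === 1) {
    $offset += strlen($m[0]); // move past the opening "<<"
}
$afterStart = $offset;

// At this position no ">>" follows, so the end pattern does not match.
$endHere = preg_match($starEnd, $line, $m, 0, $offset);

// Jump to the whitespace in front of ">>" and try again.
$offset   = strpos($line, ' >>');
$endThere = preg_match($starEnd, $line, $m, 0, $offset);

var_dump($afterStart, $endHere, $endThere);
```

Because the patterns are anchored with \G, the same line can be scanned from an arbitrary offset without slicing it into substrings.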
$quads
private
Generator<string|int, QuadInterface>
$quads
$readBuffer
private
string
$readBuffer
$regexp
private
string
$regexp
$regexpCommentLine
private
string
$regexpCommentLine
$regexpGraph
private
string
$regexpGraph
$regexpLineEnd
private
string
$regexpLineEnd
$regexpObjGraph
private
string
$regexpObjGraph
$regexpPred
private
string
$regexpPred
$regexpSbjPred
private
string
$regexpSbjPred
$tmpStream
private
resource|null
$tmpStream
$unescapeMap
See https://www.w3.org/TR/n-quads/#grammar-production-ECHAR
private
array<string, string>
$unescapeMap
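The ECHAR production referenced for $unescapeMap covers the escape sequences \t, \b, \n, \r, \f, \", \' and \\. A minimal sketch of how such a map can be applied with strtr() follows. The map contents and the sample string are assumptions derived from the grammar, not the actual value of $unescapeMap, and UCHAR sequences (\uXXXX, \UXXXXXXXX) are not handled here.

```php
<?php
// Assumed map from ECHAR escape sequences to the characters they denote.
$unescapeMap = [
    '\\t'  => "\t",
    '\\b'  => "\x08",
    '\\n'  => "\n",
    '\\r'  => "\r",
    '\\f'  => "\f",
    '\\"'  => '"',
    "\\'"  => "'",
    '\\\\' => '\\',
];

// strtr() scans the string once and never re-examines its own output,
// so an escaped backslash followed by "n" is not turned into a newline.
$escaped   = 'line1\\nline2 \\"quoted\\" and a backslash: \\\\n';
$unescaped = strtr($escaped, $unescapeMap);

echo $unescaped, "\n";
```

A chain of str_replace() calls would not be equivalent, because later replacements could act on the output of earlier ones.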
Methods
__construct()
Creates the parser.
public
__construct(DataFactoryInterface $dataFactory[, bool $strict = false ][, int $mode = self::MODE_QUADS_STAR ]) : mixed
The parser's behavior depends on the values of the $strict and $mode
parameters.
When $strict = true
regular expressions that strictly follow the formal N-Triples/N-Quads
grammar are used (see https://www.w3.org/TR/n-quads/#sec-grammar and
https://www.w3.org/TR/n-triples/#n-triples-grammar). When $strict = false
simplified regular expressions are used. The simplified variants parse slightly
faster and are much easier to debug. All data that is valid according
to the strict syntax can be parsed correctly in the simplified mode, so
unless you need to verify that the input is fully valid RDF, you can stick to
the simplified mode.
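To illustrate the difference between the two variants, the sketch below applies the simplified and the strict language tag patterns (copied from the LANGTAG and LANGTAG_STRICT constants) to the same inputs. The "%" delimiters, the ^...$ anchors and the $matchesFully helper are assumptions made for this example; the parser composes its regular expressions internally and may do so differently.

```php
<?php
// Patterns copied from the LANGTAG and LANGTAG_STRICT constants.
$langtag       = '@([-a-zA-Z0-9]+)';
$langtagStrict = '@([a-zA-Z]+(?>-[a-zA-Z0-9]+)*)';

// Hypothetical helper: checks whether the whole $value matches $pattern.
$matchesFully = function (string $pattern, string $value): bool {
    return preg_match('%^' . $pattern . '$%u', $value) === 1;
};

foreach (['@en', '@en-US', '@-en', '@12'] as $tag) {
    printf(
        "%-7s simplified: %-3s strict: %s\n",
        $tag,
        $matchesFully($langtag, $tag) ? 'yes' : 'no',
        $matchesFully($langtagStrict, $tag) ? 'yes' : 'no'
    );
}
```

Well-formed tags such as @en-US pass both patterns, while a malformed tag such as @-en is accepted only by the simplified one. This is the kind of input that strict mode is meant to reject.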
Parameters
- $dataFactory : DataFactoryInterface
-
factory to be used to generate RDF terms.
- $strict : bool = false
-
should strict RDF syntax be enforced?
- $mode : int = self::MODE_QUADS_STAR
-
parsing mode, one of the modes listed below. Note that \quickRdfIo\NQuadsParser::MODE_QUADS_STAR can parse the input accepted by all the other modes and there should be no significant performance difference between the modes. The main reason to use a non-default mode is to ensure that the input data follow a given format.
- \quickRdfIo\NQuadsParser::MODE_TRIPLES
- \quickRdfIo\NQuadsParser::MODE_QUADS
- \quickRdfIo\NQuadsParser::MODE_TRIPLES_STAR
- \quickRdfIo\NQuadsParser::MODE_QUADS_STAR
Return values
mixed
__destruct()
public
__destruct() : mixed
Return values
mixed
current()
public
current() : QuadInterface
Return values
QuadInterface
key()
public
key() : mixed
Return values
mixed
next()
public
next() : void
Return values
void
parse()
public
parse(string $input) : QuadIteratorInterface
Parameters
- $input : string
Return values
QuadIteratorInterface
parseStream()
public
parseStream(mixed $input) : QuadIteratorInterface
Parameters
- $input : mixed
Return values
QuadIteratorInterface
rewind()
public
rewind() : void
Return values
void
valid()
public
valid() : bool
Return values
bool
closeTmpStream()
private
closeTmpStream() : void
Return values
void
makeQuad()
Converts regex matches array into a Quad.
private
makeQuad(array<string|int, ?string> &$matches) : QuadInterface
Parameters
- $matches : array<string|int, ?string>
Return values
QuadInterface
parseStar()
private
parseStar() : QuadInterface
Return values
QuadInterface
quadGenerator()
private
quadGenerator() : Generator<string|int, QuadInterface>
Return values
Generator<string|int, QuadInterface>
readLine()
private
readLine() : string
Return values
string
starQuadGenerator()
private
starQuadGenerator() : Generator<string|int, QuadInterface>
Return values
Generator<string|int, QuadInterface>
unescape()
private
unescape(string $value) : string
Parameters
- $value : string