grafeno.pipeline module
The pipeline module allows the user to write full pipelines of experiments in a dict, which can then be loaded and run by the library with one function call:
from grafeno import pipeline

experiment = {
    'text': 'Colorless green ideas sleep furiously.',
    'parser': 'freeling',
    'transformers': [ 'all' ],
    'linearizers': [ 'triplets' ]
}
result = pipeline.run(experiment)
print(result)
Pipeline Formatting
The following attributes for the pipeline dict are supported.
Note
The pipeline is designed to be easily serialized to and loaded from a string format such as YAML, so a repeatable experiment is just a text file stating which operations to perform and with what arguments.
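As a rough illustration of the note above, the sketch below stores the introductory experiment as a string and loads it back into a dict. JSON is used here only because its parser ships with the Python standard library; that is an assumption of this example, not something grafeno requires, and a YAML file would be handled the same way with a YAML parser. The name `experiment_text` is hypothetical.

```python
import json

# Hypothetical serialized experiment; the keys are the pipeline attributes
# used in the introductory example.
experiment_text = """
{
  "text": "Colorless green ideas sleep furiously.",
  "parser": "freeling",
  "transformers": ["all"],
  "linearizers": ["triplets"]
}
"""

# Loading gives back a plain dict, the form that pipeline.run takes.
experiment = json.loads(experiment_text)

# With grafeno and the chosen parser installed, the experiment would then run as:
#   from grafeno import pipeline
#   result = pipeline.run(experiment)
```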
Input
Input to the pipeline is required. It can be an already constructed graph; otherwise, text, parser and transformers are needed.
- graph: a Graph
- text: a raw natural language text.
- parser: what parser to use to process the text. Possible values are:
  - freeling: http://nlp.lsi.upc.edu/freeling/node/1
  - spacy: https://spacy.io
Note
This is just a shortcut for using as first transformer a module named <parser_type>_parser. This allows parsers to be changed easily and independently from the rest of the pipeline.
Warning
To use a specific parser, it must be installed and available to grafeno. For freeling, the analyze executable must be in the path; for spacy, the module must be importable.
- transformers: list of transformer names to use (see grafeno.transformers)
- transformer_args: dict of arguments for the transformers
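The input rules and the parser warning above can be turned into a small pre-flight check. This is only a sketch under stated assumptions: `check_input` and `parser_available` are hypothetical helpers, not part of grafeno, and they only mirror the requirements described in this section.

```python
import importlib.util
import shutil


def check_input(experiment):
    """Return a list of problems found in the input part of a pipeline dict."""
    # An already constructed graph satisfies the input requirement on its own.
    if 'graph' in experiment:
        return []
    # Otherwise text, parser and transformers are all needed.
    required = ('text', 'parser', 'transformers')
    return ['missing attribute: ' + key for key in required if key not in experiment]


def parser_available(name):
    """Best-effort check that the named parser can be found on this machine."""
    if name == 'freeling':
        # freeling needs the `analyze` executable somewhere on the PATH.
        return shutil.which('analyze') is not None
    if name == 'spacy':
        # spacy needs to be importable as a Python module.
        return importlib.util.find_spec('spacy') is not None
    raise ValueError('unknown parser: ' + repr(name))


# This experiment lacks 'transformers', so one problem is reported.
experiment = {'text': 'Colorless green ideas sleep furiously.', 'parser': 'spacy'}
problems = check_input(experiment)
print(problems)
```

Running such a check before calling the library can give a clearer message than a failure deep inside a parser.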
Operation
- operations: a list of dicts, each with an op attribute giving the operation name; the rest of the entries are used as parameters for the operation.
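To make the shape of an operations list concrete, the sketch below separates the `op` name from the remaining parameters of each dict. The helper `split_operation` and the operation names `example_filter` and `example_merge` are hypothetical and chosen only for illustration; they are not taken from grafeno.

```python
def split_operation(operation):
    """Separate the operation name from its parameters."""
    params = dict(operation)   # copy, so the pipeline dict is left untouched
    name = params.pop('op')    # raises KeyError if the dict has no 'op' attribute
    return name, params


# Hypothetical operation names and parameters, for illustration only.
operations = [
    {'op': 'example_filter', 'threshold': 0.5},
    {'op': 'example_merge'},
]

for operation in operations:
    name, params = split_operation(operation)
    print(name, params)
```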
Output
The pipeline returns a text if a linearizers attribute is present; otherwise, the raw graph obtained is returned.
- linearizers: list of linearizer names to use (see grafeno.linearizers)
- linearizer_args: dict of arguments for the linearizers
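The output rule above depends only on whether the linearizers attribute is present, which the following sketch spells out. The helper `returns_text` is hypothetical and simply restates the rule; it does not call grafeno.

```python
def returns_text(experiment):
    """True if the pipeline dict asks for linearized text rather than a raw graph."""
    return 'linearizers' in experiment


base = {
    'text': 'Colorless green ideas sleep furiously.',
    'parser': 'freeling',
    'transformers': ['all'],
}
# Same experiment, with a linearizer added to request text output.
with_linearizer = dict(base, linearizers=['triplets'])

print(returns_text(base))
print(returns_text(with_linearizer))
```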
See also
Some pre-built pipelines, written in YAML, can be found in the config directory: Pre-built Pipelines.
grafeno.pipeline.run(pipeline)
Run a complete pipeline of graph operations.
Parameters: pipeline : dict
The pipeline description.
Returns: The result from running the pipeline with the provided arguments.