grafeno.pipeline module

The pipeline module lets the user describe a full experiment pipeline as a dict, which the library can then load and run with a single function call:

from grafeno import pipeline

experiment = {
    'text': 'Colorless green ideas sleep furiously.',
    'parser': 'freeling',
    'transformers': [ 'all' ],
    'linearizers': [ 'triplets' ]
}

result = pipeline.run(experiment)
print(result)

Pipeline Formatting

The following attributes for the pipeline dict are supported.

Note

The pipeline is designed so that it can be easily serialized and loaded from a string format such as YAML, making repeatable experiments as easy as writing into a text file what operations to perform, and with what arguments.
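As a rough sketch of the serialization idea in the note above, a pipeline description can be kept as text and parsed back into a dict before running it. JSON is used here only because it ships with Python; it stands in for YAML, which would need a separate YAML loader. The names load_pipeline and run_experiment are hypothetical helpers, not part of grafeno.

```python
import json

# A pipeline description stored as plain text (JSON here, as a stand-in for YAML).
PIPELINE_TEXT = """
{
    "text": "Colorless green ideas sleep furiously.",
    "parser": "freeling",
    "transformers": ["all"],
    "linearizers": ["triplets"]
}
"""


def load_pipeline(source):
    """Parse a serialized pipeline description into a dict (hypothetical helper)."""
    experiment = json.loads(source)
    if not isinstance(experiment, dict):
        raise ValueError('a pipeline description must be a mapping')
    return experiment


def run_experiment(description):
    """Run a loaded description (hypothetical wrapper around pipeline.run)."""
    # Imported lazily so that loading works even without grafeno installed.
    from grafeno import pipeline
    return pipeline.run(description)


experiment = load_pipeline(PIPELINE_TEXT)
print(experiment['parser'], experiment['linearizers'])
```

Calling run_experiment(experiment) would then hand the dict to pipeline.run, provided grafeno and the chosen parser are installed.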

Input

Input to the pipeline is required. It is either an already constructed graph or, failing that, a text together with a parser and transformers.

  • graph: a Graph

  • text: a raw natural language text.

  • parser: what parser to use to process the text. The parsers named in this section are freeling and spacy.

    Note

    This is just a shortcut for placing a module named <parser_type>_parser first in the list of transformers. This allows parsers to be swapped easily and independently of the rest of the pipeline.

    Warning

    A parser can only be used if it is installed and available to grafeno. For freeling, the analyze executable must be in the path; for spacy, the module must be importable.

  • transformers: list of transformer names to use (see grafeno.transformers)

  • transformer_args: dict of arguments for the transformers
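The input rules above can be sketched as a small check. This is illustrative only: describe_input and effective_transformers are hypothetical helpers, not part of grafeno, and they merely mirror the rules as stated in this section, including the <parser_type>_parser naming from the note.

```python
def describe_input(experiment):
    """Report which kind of input a pipeline dict provides (hypothetical helper)."""
    if 'graph' in experiment:
        return 'graph'
    # Without a graph, text, parser and transformers are all expected.
    missing = [key for key in ('text', 'parser', 'transformers')
               if key not in experiment]
    if missing:
        raise ValueError('missing input attributes: ' + ', '.join(missing))
    return 'text'


def effective_transformers(experiment):
    """Spell out the parser shortcut as an explicit first transformer.

    Assumes the naming rule <parser_type>_parser described in the note.
    """
    return [experiment['parser'] + '_parser'] + list(experiment['transformers'])


experiment = {
    'text': 'Colorless green ideas sleep furiously.',
    'parser': 'freeling',
    'transformers': ['all'],
}

print(describe_input(experiment))          # text
print(effective_transformers(experiment))  # ['freeling_parser', 'all']
```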

Operation

  • operations: a list of dicts. Each dict has an op attribute holding the operation name; its remaining entries are passed as parameters to that operation.
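To make the shape of an operations entry concrete, the sketch below separates the op name from the remaining parameters. The helper split_operation is hypothetical, and the operation name example_op and its threshold parameter are made up for illustration; they are not real grafeno operations.

```python
def split_operation(step):
    """Separate the operation name from its parameters (hypothetical helper)."""
    params = dict(step)      # copy so the pipeline dict is left untouched
    name = params.pop('op')  # raises KeyError if the op attribute is missing
    return name, params


experiment = {
    'graph': None,  # placeholder; a real pipeline would hold a Graph here
    'operations': [
        # 'example_op' and 'threshold' are made-up names for illustration
        {'op': 'example_op', 'threshold': 0.5},
    ],
}

for step in experiment['operations']:
    name, params = split_operation(step)
    print(name, params)  # example_op {'threshold': 0.5}
```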

Output

The pipeline returns a text if a linearizers attribute is present; otherwise it returns the raw graph obtained.

  • linearizers: list of linearizer names to use (see grafeno.linearizers)
  • linearizer_args: dict of arguments for the linearizers
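The output rule can be illustrated with two otherwise identical descriptions. The helper expected_output_kind is hypothetical and only restates the rule above; it does not call grafeno.

```python
def expected_output_kind(experiment):
    """Return 'text' or 'graph' according to the rule above (hypothetical helper)."""
    return 'text' if 'linearizers' in experiment else 'graph'


with_linearizer = {
    'text': 'Colorless green ideas sleep furiously.',
    'parser': 'freeling',
    'transformers': ['all'],
    'linearizers': ['triplets'],
}

# The same pipeline without the linearizers attribute.
graph_only = {key: value for key, value in with_linearizer.items()
              if key != 'linearizers'}

print(expected_output_kind(with_linearizer))  # text
print(expected_output_kind(graph_only))       # graph
```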

See also

Some pre-built pipelines can be found in the config directory, written in YAML: Pre-built Pipelines.

grafeno.pipeline.run(pipeline)

Run a complete pipeline of graph operations.

Parameters:

pipeline : dict

The pipeline description.

Returns:

The result from running the pipeline with the provided arguments.