
Python Scripting

Introduction

Python is a widely popular programming language, largely thanks to its gentle learning curve and the many libraries designed specifically for data processing and analysis.

Inside NetFlow you can create your own Python scripts (and even more complex objects) to further personalize your workspace and processes.

Within NetFlow you can use Python for:

  • Parsing flat files.

  • Transforming data.

  • Generating files (reports, visualizations, …).

  • Training and using ML models.

In its most basic form, all you have to do is implement a Python function called do_work. This function is then automatically executed by NetFlow and the Python Server. Inputs and outputs are integrated with the NetFlow Process context.

Input arguments for this function are prepared by NetFlow and can be:

  • one or more Pandas DataFrames

  • a file object

  • additional arguments passed as a Dictionary.

Output of the function should be one of the following:

  • Dictionary of Pandas DataFrames.

  • Dictionary of files (encoded as base64 strings).

In the example below, the do_work function takes a file argument as input and returns a Dictionary of Pandas DataFrames. The result is then stored inside NetFlow and can be used further in other Node Types. This is a basic example of a parser script.

PY
import pandas as pd

def do_work(file, parameters):
    df = pd.read_csv(file, sep=";")
    for key, value in parameters.items():
        df[key] = value

    return {
        "csv_parsed" : df
    }

Types of Python scripts

Parser scripts

Parser scripts are used to extract tabular data from files. Before a Python script can access files, they need to be imported into the process. This can be done by:

Once files are read into the process, they can be processed (parsed) with a Python script. To do so, add one of the following nodes to the process:

If there are multiple files to be processed in one instance, the Python script is executed once for every file. All results (tables) are then merged and can be used later in the process.

Transformation scripts

Transformation scripts are used to transform existing tables and/or create new tables from tables available in the process. The general use case is a special transformation that is not available in other node types; in that case a custom Python script can be written and executed inside the process.

It is recommended to use the Python Script General node for transformation scripts. This node allows you to return both transformed tables and any generated files.
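A transformation script follows the same do_work pattern, but receives a table instead of a file. The sketch below assumes a single input DataFrame and a parameters dict (the exact signature depends on how the node passes tables), and the column names are purely illustrative:

```python
import pandas as pd

def do_work(df, parameters=None):
    """
    Example transformation: take one input table, add a computed
    column, and return the result as a Dictionary of DataFrames.
    """
    parameters = parameters or {}
    out = df.copy()
    # Hypothetical business rule: total = price * quantity.
    out["total"] = out["price"] * out["quantity"]
    return {"transformed": out}
```

As with the parser example, the returned Dictionary key ("transformed") becomes the name under which the table is available to later nodes.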

Report scripts

Report scripts are used to generate files with a Python script. This could be a simple text (CSV) file with some data, or a much more complex PDF file with text, images, tables, … Generated files become part of the process, and users can directly download them or use them later in the process pipeline.

It is recommended to use the Python Script General node for report scripts. This node allows you to return both transformed tables and any generated files.
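Since generated files are returned as base64-encoded strings, a report script can encode its output with Python's standard base64 module. A minimal sketch, assuming the input is a DataFrame and that the Dictionary key serves as the file name:

```python
import base64
import pandas as pd

def do_work(df, parameters=None):
    """
    Example report script: render a table to CSV text and return
    it as a Dictionary of files encoded as base64 strings.
    """
    csv_bytes = df.to_csv(index=False, sep=";").encode("utf-8")
    encoded = base64.b64encode(csv_bytes).decode("ascii")
    return {"report.csv": encoded}
```

The same pattern works for binary formats such as PDF: whatever bytes your reporting library produces, encode them with base64.b64encode before returning.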


Tips for developing Python scripts

Before you upload a Python script to NetFlow, it is recommended that the script is already tested and verified to work correctly. You can also develop a script inside NetFlow from scratch, but debugging options are currently limited to the NetFlow Debugger Environment. It is therefore recommended to set up a simple local testing environment.

Here is an example of a simple local setup. Create a new folder MyProject and add three files to it:

  • main.py : Where the main script is written, together with the do_work function required by NetFlow.

  • test.py : Where the main script is tested. This simulates NetFlow executing the do_work function.

  • demo_data.csv : An example of a flat file which you want to parse and/or transform.

In main.py, write your script and implement the do_work function for NetFlow. In the end you will copy the content of this file into NetFlow. Here is an example:

PY
import pandas as pd

def do_work(file, parameters=dict()):
    """
    Function that will be executed by NetFlow. It is recommended
    that this function stays lean (i.e. does not contain a lot of
    code). Instead, write the main functionality in separate
    function(s) and call them from here.
    """
    return parse_custom_file(file, parameters)

def parse_custom_file(file, parameters):
    """
    Parse the given file into a DataFrame.
    """
    df = pd.read_csv(file, sep=";")
    for key, value in parameters.items():
        df[key] = value

    return {
        "csv_parsed" : df
    }

In test.py, import the do_work function from main.py and call it on your test data. Here is an example:

PY
# This file is used to test the python script 
# to be used inside NetFlow

from main import do_work

# Create a parameters dict. Inside NetFlow this is generated
# by NetFlow itself.
parameters = {
    "Filename" : "demo_data.csv",
    "ParentFolder" : "MyProject"
}
# Open the file in binary mode and call the do_work function.
# It is important that the file is opened in binary mode!
with open('demo_data.csv', mode='rb') as f:
    result = do_work(f, parameters)

This example covers parsing scripts only, but the same setup works for any other script type. You just have to set up two files: one with the main script code and one for testing.
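While iterating you do not even need a file on disk: io.BytesIO can stand in for the binary file handle that NetFlow passes to do_work. A self-contained sketch (the parser is inlined here for illustration; in your own setup you would import it from main.py):

```python
import io
import pandas as pd

def do_work(file, parameters=dict()):
    # Same parser logic as in the main.py example above.
    df = pd.read_csv(file, sep=";")
    for key, value in parameters.items():
        df[key] = value
    return {"csv_parsed": df}

# Simulate the flat file in memory, in binary mode,
# just like NetFlow provides it.
fake_file = io.BytesIO(b"a;b\n1;2\n3;4\n")
result = do_work(fake_file, {"Filename": "demo_data.csv"})
assert result["csv_parsed"].shape == (2, 3)  # two rows: a, b, Filename
```

This keeps the test fast and avoids coupling it to files on disk, which is convenient when trying out edge cases (empty files, missing columns, unusual separators).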
