# The Target Learning Pipeline

The process for target learning is separated into two steps:

1. Pre-processing
2. Learning

In the following, a short description of the two steps is given and the important Python files and scripts are listed. It is also explained how the Python tool Sacred is used to precisely document experiment runs.

The home folder of the project is on the target-learning branch, in the directory `.../Tools/PythonTargetLearing/`.

## Pre-processing

The pre-processing takes trajectories and converts them into density maps in CSV file format, which are then used for learning in the second step.

The pre-processing methods from `preprocessing/preprocessing.py` are called from a Jupyter notebook that executes the individual steps. An example notebook can be found under `PythonTargetLearning/notebooks`. The separate steps are:

* parsing the trajectory file
* conversion to density maps
* filtering out duplicate density maps
* filtering out unwanted distributions, e.g. 50/50
* creation of one single output file containing a matrix in which each row holds the density map for one time step
* ...

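The conversion and filtering steps above can be sketched as follows. This is a minimal illustration only: the trajectory tuple format, grid resolution and cell size are assumptions made here, and the actual implementation lives in `preprocessing/preprocessing.py`.

```python
import csv
from collections import defaultdict

GRID_W, GRID_H = 4, 3   # hypothetical density-map resolution
CELL = 1.0              # hypothetical cell size

def to_density_maps(trajectories):
    """Count pedestrians per grid cell for every time step.

    Assumes (pedestrian_id, time_step, x, y) tuples as input.
    """
    maps = defaultdict(lambda: [0.0] * (GRID_W * GRID_H))
    for _ped, step, x, y in trajectories:
        col = min(int(x / CELL), GRID_W - 1)
        row = min(int(y / CELL), GRID_H - 1)
        maps[step][row * GRID_W + col] += 1.0
    return [maps[s] for s in sorted(maps)]

def drop_duplicates(rows):
    """Filter out density maps that merely repeat the previous time step."""
    out = []
    for r in rows:
        if not out or r != out[-1]:
            out.append(r)
    return out

trajectories = [(1, 0, 0.5, 0.5), (2, 0, 3.5, 2.5),
                (1, 1, 0.5, 0.5), (2, 1, 3.5, 2.5),  # unchanged -> duplicate
                (1, 2, 1.5, 0.5), (2, 2, 3.5, 2.5)]
rows = drop_duplicates(to_density_maps(trajectories))

# single output file: a matrix with one density map per row
with open('density_maps.csv', 'w', newline='') as f:
    csv.writer(f).writerows(rows)
```

Here the duplicate map at time step 1 is dropped, so only two rows end up in the output matrix.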
## Learning

The machine learning can be started from `scripts/main.py`. The user has to set the correct input and output directories. The creation and setup of the random forest is done in `rf/regression.py`. The main file requires:

* a config.json file (e.g. `experiment.json` under `scripts`) defining the division into training and testing data sets as well as other experiment parameters
* the number of cores for the parallel execution of the learning process
* the location for the output files

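The exact keys of the config file are project specific and not reproduced here. As a hedged illustration, a config could define the train/test division like this; the key names `training_fraction` and `number_of_cores` are assumptions, not the documented schema of `experiment.json`:

```python
import json

# Hypothetical experiment config; the real keys live in experiment.json.
config = json.loads('{"training_fraction": 0.8, "number_of_cores": 4}')

def split_rows(rows, fraction):
    """Divide the density-map rows into training and testing sets."""
    cut = int(len(rows) * fraction)
    return rows[:cut], rows[cut:]

train_rows, test_rows = split_rows(list(range(10)), config["training_fraction"])
# with fraction 0.8, 8 rows go to training and 2 to testing
```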
Example: To execute an experiment, simply write e.g. `python scripts\main.py with scripts\t_junction\hybrid.json "number_of_cores=4" --force --filestorage ../runs/hybrid`. (Note: `--force` is required by the sacred library.)

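The forest setup itself is done in `rf/regression.py` and is not reproduced here. A minimal sketch of how a regression forest can be created and parallelised over the requested number of cores, under the assumption that scikit-learn is the backing library:

```python
# Sketch only: assumes scikit-learn; the project's actual forest setup
# is defined in rf/regression.py.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.random((100, 12))   # 100 density maps with 12 cells each (dummy data)
y = X.sum(axis=1)           # dummy regression target

forest = RandomForestRegressor(
    n_estimators=50,
    n_jobs=4,               # the "number_of_cores" experiment parameter
    random_state=0,
)
forest.fit(X, y)
pred = forest.predict(X[:5])
```

`n_jobs` is what makes the learning process run on several cores in parallel.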
### Sacred File Storage System

The Python library Sacred is used to document all machine learning runs. The code for the file storage is under `utils/sacred/`. All methods whose results have to be stored are annotated with `@ingredient.capture`. For these methods, Sacred injects the special parameters `_run`, `_log` and `_config`. The `_run` parameter saves all runtime computation results, such as the errors of the random forest. The `_log` parameter writes to a standard logging output file. The `_config` parameter gives read access to the current configuration values.

Sacred will in turn create an output folder containing the following files:

* ...