
Project initialization

Jérôme BUISINE 4 years ago
Parent
commit
dc0463b6b5

+ 25 - 0
.gitignore

@@ -0,0 +1,25 @@
+# project data
+data/*
+saved_models/*
+threshold_map/*
+models_info/*
+custom_norm/*
+learned_zones/*
+corr_indices/*
+.ipynb_checkpoints
+
+# simulate_models.csv
+
+fichiersSVD_light
+
+.python-version
+__pycache__
+
+# by default avoid model files and png files
+saved_models/*.h5
+*.png
+!saved_models/*.png
+.vscode
+
+# simulate models .csv file
+simulate_models*.csv

+ 9 - 0
LICENSE

@@ -0,0 +1,9 @@
+MIT License
+Copyright (c) 2019 prise-3d
+
+Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+

+ 181 - 0
README.md

@@ -0,0 +1,181 @@
+# Noise detection using SVM
+
+## Requirements
+
+```
+pip install -r requirements.txt
+```
+
+Generate all the data needed for each metric (this requires the whole dataset; contact us to obtain it).
+
+```bash
+python generate_all_data.py --metric all
+```
+
+For noise detection, many metrics are available:
+- lab
+- mscn
+- mscn_revisited
+- low_bits_2
+- low_bits_4
+- low_bits_5
+- low_bits_6
+- low_bits_4_shifted_2
+
+You can also specify the metric you want to compute and an image step to skip some images:
+```bash
+python generate_all_data.py --metric mscn --step 50
+```
+
+- **step**: keep an image only if its id % 50 == 0 (the assumption is that evenly spaced data lets the model fit better).
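
Review note: the `--step` filter described above can be sketched as follows. This is a minimal illustration, assuming image ids are plain integers; `keep_every` is a hypothetical helper and not a function of this repository.

```python
def keep_every(image_ids, step):
    """Keep only the image ids that are multiples of `step`."""
    return [i for i in image_ids if i % step == 0]


# Hypothetical image ids going from 20 to 200 samples by steps of 10.
ids = list(range(20, 201, 10))
kept = keep_every(ids, 50)
print(kept)
```

With `--step 50`, only the ids divisible by 50 would remain in the generated data.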
+
+## How to use
+
+### Multiple directories and scripts are available:
+
+
+- **fichiersSVD_light/\***: information files for every scene (zones of each scene, SVD descriptor files, and so on).
+- **train_model.py**: script used to train one of the available models.
+- **data/\***: folder which will contain all the *.train* and *.test* files used to train a model.
+- **saved_models/\*.joblib**: all saved scikit-learn models.
+- **models_info/\***: all markdown files generated to give quick information about model performance and prediction. This folder also contains **models_comparisons.csv**, obtained after running the runAll_maxwell.sh script.
+- **modules/\***: contains all the modules useful for the whole project (such as configuration variables).
+
+### Scripts for generating data files
+
+Three scripts can be used to generate the data needed to fit a model:
+- **generate_data_model.py**: zones are specified and stay fixed for each scene.
+- **generate_data_model_random.py**: zones are chosen randomly (only the number of zones is specified).
+- **generate_data_model_random_maxwell.py**: zones are chosen randomly (only the number of zones is specified). Only maxwell scenes are used.
+
+
+**Remark**: all Python scripts accept the *--help* option.
+
+```
+python generate_data_model.py --help
+
+python generate_data_model.py --output xxxx --interval 0,20  --kind svdne --scenes "A, B, D" --zones "0, 1, 2" --percent 0.7 --sep ':' --rowindex 1 --custom custom_min_max_filename
+```
+
+Parameters explained:
+- **output**: filename of the data (which will be split into two parts, *.train* and *.test*, according to your choices).
+- **interval**: the interval of data you want to use from the SVD vector.
+- **kind**: kind of data ['svd', 'svdn', 'svdne']: not normalized, normalized per vector, or normalized over the whole dataset.
+- **scenes**: scenes used for the training dataset.
+- **zones**: zones used for the training dataset.
+- **percent**: fraction of each zone's data to take (chosen randomly).
+- **sep**: separator used in the output csv file.
+- **rowindex**: if 1, each value of a row is prefixed with its index: 1:xxxxx, 2:xxxxxx, ..., n:xxxxxx.
+- **custom**: specify it if you want your data normalized using the interval and not the whole singular values vector. In that case, the value of this parameter is the output filename which will store the min and max values found. This file will be useful later to make predictions with the model (optional parameter).
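
Review note: the three `--kind` values can be illustrated with a small sketch. It assumes that normalization means min-max scaling to [0, 1], which the README does not state explicitly; `normalize` is a hypothetical helper standing in for the `ipfml` utilities used by the scripts, and the dataset bounds below are made up.

```python
def normalize(values, min_val=None, max_val=None):
    """Min-max scale `values` to [0, 1].

    Without explicit bounds the vector's own min and max are used
    ('svdn'-like). With bounds shared by the whole dataset, or by the
    selected interval when --custom is given, results are comparable
    across vectors ('svdne'-like).
    """
    lo = min(values) if min_val is None else min_val
    hi = max(values) if max_val is None else max_val
    return [(v - lo) / (hi - lo) for v in values]


vector = [2.0, 4.0, 6.0]
raw = vector                                  # 'svd': values left untouched
per_vector = normalize(vector)                # 'svdn': bounds taken from the vector
dataset_wide = normalize(vector, 0.0, 8.0)    # 'svdne': assumed dataset bounds
print(raw, per_vector, dataset_wide)
```

A constant vector (min equal to max) would divide by zero here, so a real implementation needs to guard that case.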
+
+### Train model
+
+This is an example of how to train a model:
+
+```bash
+python train_model.py --data 'data/xxxxx.train' --output 'model_file_to_save' --choice 'model_choice'
+```
+
+Expected values for the **choice** parameter are ['svm_model', 'ensemble_model', 'ensemble_model_v2'].
+
+### Predict image using model
+
+Now that we have a trained model, we can use it with an image as input:
+
+```bash
+python predict_noisy_image_svd.py --image path/to/image.png --interval "x,x" --model saved_models/xxxxxx.joblib --metric 'lab' --mode 'svdn' --custom 'min_max_filename'
+```
+
+- **metric**: the metric choice must be one of those listed above.
+- **custom**: filename containing the custom min and max values of your data interval. This file is generated using the **custom** parameter of one of the **generate_data_model\*.py** scripts (optional parameter).
+
+The model returns only 0 or 1:
+- 1 means the image is detected as noisy.
+- 0 means the image does not seem noisy.
+
+Every SVD metric developed needs:
+- Its name added to the *metric_choices_labels* global array variable of the **modules/utils/config.py** file.
+- A specification of how the metric is computed in the *get_svd_data* method of the **modules/utils/data_type.py** file.
+
+### Predict scene using model
+
+Now that we have a trained model, we can use it with the data of a whole scene as input:
+
+```bash
+python prediction_scene.py --data path/to/xxxx.csv --model saved_models/xxxx.joblib --output xxxxx --scene xxxx
+```
+**Remark**: the *scene* parameter must be the exact name of the scene.
+
+### Visualize data
+
+All scripts named **display_\*.py** are used to display data information or results.
+
+Just use the --help option to get more information.
+
+### Simulate model on scene
+
+All scripts named **predict_seuil_expe\*.py** are used to simulate model prediction during the rendering process. Do not forget the **custom** parameter filename if necessary.
+
+Once the simulation is done, check your **threshold_map/%MODEL_NAME%/simulation\_curves\_zones\_\*/** folder and use it with the **display_simulation_curves.py** script.
+
+## Other scripts
+
+### Test model on all scene data
+
+To check whether a model generalizes well, a bash script is available:
+
+```bash
+bash testModelByScene.sh '100' '110' 'saved_models/xxxx.joblib' 'svdne' 'lab'
+```
+
+Parameters list:
+- 1: Start of the interval of SVD data to use
+- 2: End of the interval of SVD data to use
+- 3: Model we want to test
+- 4: Kind of input data used by the trained model
+- 5: Metric used by the model
+
+
+### Get threshold map
+
+The main objective of this project is to predict noise perception on a photorealistic image as well as a human does. The human threshold is available from the training data, so a script was developed to give the threshold predicted by the model and compare it with the expected one.
+
+```bash
+python predict_seuil_expe.py --interval "x,x" --model 'saved_models/xxxx.joblib' --mode ["svd", "svdn", "svdne"] --metric ['lab', 'mscn', ...] --limit_detection xx --custom 'custom_min_max_filename'
+```
+
+Parameters list:
+- **model**: saved model file to use.
+- **interval**: the interval of data you want to use from the SVD vector.
+- **mode**: kind of data ['svd', 'svdn', 'svdne']: not normalized, normalized per vector, or normalized over the whole dataset.
+- **limit_detection**: number of not noisy images that must be found before stopping and returning the threshold (integer).
+- **custom**: custom filename where the min and max values are stored (optional parameter).
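
Review note: the role of `limit_detection` can be sketched as below. The README does not say whether the not noisy predictions must be consecutive or which image index is returned, so this sketch assumes consecutive predictions and returns the index at which the limit is reached; `find_threshold` is a hypothetical helper, not the repository's code.

```python
def find_threshold(predictions, limit_detection):
    """Return the image index at which `limit_detection` consecutive
    'not noisy' (0) predictions have been seen, or None if never reached.

    `predictions` is an iterable of (image_index, label) pairs ordered by
    increasing number of samples; label 1 means noisy, 0 means not noisy.
    """
    consecutive = 0
    for image_index, label in predictions:
        # Assumption: a noisy prediction resets the counter.
        consecutive = consecutive + 1 if label == 0 else 0
        if consecutive >= limit_detection:
            return image_index
    return None


# Made-up predictions for one zone.
preds = [(20, 1), (40, 1), (60, 0), (80, 1), (100, 0), (120, 0)]
threshold = find_threshold(preds, 2)
print(threshold)
```

A larger `limit_detection` makes the predicted threshold more robust to isolated "not noisy" predictions, at the cost of returning a later image.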
+
+### Display model performance information
+
+Another script was developed to display the performance of a model in Markdown format.
+
+The content is divided into two parts:
+- Predicted performance on all scenes
+- Threshold maps obtained from the model on each scene
+
+The previous script must already have been run in order to obtain and display the threshold maps in this markdown file.
+
+```bash
+python save_model_result_in_md.py --interval "xx,xx" --model saved_models/xxxx.joblib --mode ["svd", "svdn", "svdne"] --metric ['lab', 'mscn']
+```
+
+Parameters list:
+- **model**: saved model file to use.
+- **interval**: the interval of data you want to use from the SVD vector.
+- **mode**: kind of data ['svd', 'svdn', 'svdne']: not normalized, normalized per vector, or normalized over the whole dataset.
+
+The Markdown file with all the information is saved in the **models_info** folder, named after the model.
+
+### Others...
+
+All the other bash scripts are used to combine and run multiple model combinations.
+
+## License
+
+[The MIT license](https://github.com/prise-3d/Thesis-NoiseDetection-metrics/blob/master/LICENSE)

File diff suppressed because it is too large
+ 355 - 0
analysis/.ipynb


File diff suppressed because it is too large
+ 388 - 0
analysis/wavelet_filters_analysis.ipynb


+ 120 - 0
display_simulation_curves.py

@@ -0,0 +1,120 @@
+import numpy as np
+import pandas as pd
+
+import matplotlib.pyplot as plt
+import os, sys, argparse
+
+from modules.utils.data import get_svd_data
+
+from modules.utils import config as cfg
+
+learned_zones_folder = cfg.learned_zones_folder
+models_name          = cfg.models_names_list
+label_freq           = 6
+
+def display_curves(folder_path, model_name):
+    """
+    @brief Method used to display simulation given .csv files
+    @param folder_path, folder which contains all .csv files obtained during simulation
+    @param model_name, current name of model
+    @return nothing
+    """
+
+    for name in models_name:
+        if name in model_name:
+            data_filename = model_name
+            learned_zones_folder_path = os.path.join(learned_zones_folder, data_filename)
+
+    data_files = [x for x in os.listdir(folder_path) if '.png' not in x]
+
+    scene_names = [f.split('_')[3] for f in data_files]
+
+    for id, f in enumerate(data_files):
+
+        print(scene_names[id])
+        path_file = os.path.join(folder_path, f)
+
+        scenes_zones_used_file_path = os.path.join(learned_zones_folder_path, scene_names[id] + '.csv')
+
+        zones_used = []
+
+        with open(scenes_zones_used_file_path, 'r') as f:
+            zones_used = [int(x) for x in f.readline().split(';') if x != '']
+
+        print(zones_used)
+
+        df = pd.read_csv(path_file, header=None, sep=";")
+
+        fig=plt.figure(figsize=(35, 22))
+        fig.suptitle("Detection simulation for " + scene_names[id] + " scene", fontsize=20)
+
+        for index, row in df.iterrows():
+
+            row = np.asarray(row)
+
+            threshold = row[2]
+            start_index = row[3]
+            step_value = row[4]
+
+            counter_index = 0
+
+            current_value = start_index
+
+            while(current_value < threshold):
+                counter_index += 1
+                current_value += step_value
+
+            fig.add_subplot(4, 4, (index + 1))
+            plt.plot(row[5:])
+
+            if index in zones_used:
+                ax = plt.gca()
+                ax.set_facecolor((0.9, 0.95, 0.95))
+
+            # draw a red vertical line at the x position of the threshold
+            plt.plot([counter_index, counter_index], [-2, 2], lw=2, color='red')
+
+            if index % 4 == 0:
+                plt.ylabel('Not noisy / Noisy', fontsize=20)
+
+            if index >= 12:
+                plt.xlabel('Samples per pixel', fontsize=20)
+
+            x_labels = [id * step_value + start_index for id, val in enumerate(row[5:]) if id % label_freq == 0]
+
+            x = [v for v in np.arange(0, len(row[5:])+1) if v % label_freq == 0]
+
+            plt.xticks(x, x_labels, rotation=45)
+            plt.ylim(-1, 2)
+
+        plt.savefig(os.path.join(folder_path, scene_names[id] + '_simulation_curve.png'))
+        #plt.show()
+
+def main():
+
+    parser = argparse.ArgumentParser(description="Display simulations curves from simulation data")
+
+    parser.add_argument('--folder', type=str, help='Folder which contains simulations data for scenes')
+    parser.add_argument('--model', type=str, help='Name of the model used for simulations')
+
+    args = parser.parse_args()
+
+    p_folder = args.folder
+
+    if args.model:
+        p_model = args.model
+    else:
+        # find p_model from folder if model arg not given (folder path need to have model name)
+        if p_folder.split('/')[-1]:
+            p_model = p_folder.split('/')[-1]
+        else:
+            p_model = p_folder.split('/')[-2]
+    
+    print(p_model)
+
+    display_curves(p_folder, p_model)
+
+    print(p_folder)
+
+if __name__== "__main__":
+    main()
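
Review note: the `while` loop in `display_curves` that converts a threshold into an x position has a closed form. The sketch below assumes integer values and a positive step; both function names are hypothetical and only restate the loop of the script.

```python
def steps_to_threshold_loop(threshold, start, step):
    """Same counting loop as in display_curves."""
    counter = 0
    current = start
    while current < threshold:
        counter += 1
        current += step
    return counter


def steps_to_threshold(threshold, start, step):
    """Closed form: ceil((threshold - start) / step), never negative."""
    # -(-a // b) is integer ceiling division for a positive b.
    return max(0, -(-(threshold - start) // step))


for threshold in (10, 70, 75, 200):
    assert steps_to_threshold(threshold, 20, 10) == steps_to_threshold_loop(threshold, 20, 10)

print(steps_to_threshold(70, 20, 10), steps_to_threshold(75, 20, 10))
```

The closed form avoids iterating once per step, which only matters for long curves, but it also makes the rounding behaviour explicit.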

+ 74 - 0
generateAndTrain_maxwell_custom.sh

@@ -0,0 +1,74 @@
+#!/bin/bash
+
+if [ -z "$1" ]
+  then
+    echo "No argument supplied"
+    echo "Need of vector size"
+    exit 1
+fi
+
+if [ -z "$2" ]
+  then
+    echo "No argument supplied"
+    echo "Need of metric information"
+    exit 1
+fi
+
+result_filename="models_info/models_comparisons.csv"
+VECTOR_SIZE=200
+size=$1
+metric=$2
+
+# selection of four scenes (only maxwell)
+scenes="A, D, G, H"
+
+half=$(($size/2))
+start=-$half
+for counter in {0..4}; do
+    end=$(($start+$size))
+
+    if [ "$end" -gt "$VECTOR_SIZE" ]; then
+        start=$(($VECTOR_SIZE-$size))
+        end=$(($VECTOR_SIZE))
+    fi
+
+    if [ "$start" -lt "0" ]; then
+        start=$((0))
+        end=$(($size))
+    fi
+
+    for nb_zones in {4,6,8,10,12}; do
+
+        echo $start $end
+
+        for mode in {"svd","svdn","svdne"}; do
+            for model in {"svm_model","ensemble_model","ensemble_model_v2"}; do
+
+                FILENAME="data/${model}_N${size}_B${start}_E${end}_nb_zones_${nb_zones}_${metric}_${mode}"
+                MODEL_NAME="${model}_N${size}_B${start}_E${end}_nb_zones_${nb_zones}_${metric}_${mode}"
+                CUSTOM_MIN_MAX_FILENAME="N${size}_B${start}_E${end}_nb_zones_${nb_zones}_${metric}_${mode}_min_max"
+
+                echo $FILENAME
+
+                # only compute if necessary (lets the run resume if the server goes down)
+                if grep -q "${MODEL_NAME}" "${result_filename}"; then
+
+                    echo "${MODEL_NAME} results already generated..."
+                else
+                    python generate_data_model_random.py --output ${FILENAME} --interval "${start},${end}" --kind ${mode} --metric ${metric} --scenes "${scenes}" --nb_zones "${nb_zones}" --percent 1 --renderer "maxwell" --step 40 --random 1 --custom ${CUSTOM_MIN_MAX_FILENAME}
+                    python train_model.py --data ${FILENAME} --output ${MODEL_NAME} --choice ${model}
+
+                    #python predict_seuil_expe_maxwell.py --interval "${start},${end}" --model "saved_models/${MODEL_NAME}.joblib" --mode "${mode}" --metric ${metric} --limit_detection '2' --custom ${CUSTOM_MIN_MAX_FILENAME}
+                    python save_model_result_in_md_maxwell.py --interval "${start},${end}" --model "saved_models/${MODEL_NAME}.joblib" --mode "${mode}" --metric ${metric}
+                fi
+            done
+        done
+    done
+
+    if [ "$counter" -eq "0" ]; then
+        start=$(($start+50-$half))
+    else
+        start=$(($start+50))
+    fi
+
+done
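
Review note: the interval arithmetic of this script is easier to follow when written out. The sketch below mirrors the bash logic in Python, assuming the constants of the script (vector size 200, five windows, stride 50); `sliding_intervals` is a hypothetical name.

```python
def sliding_intervals(size, vector_size=200, n_windows=5, stride=50):
    """Reproduce the (start, end) pairs computed by the bash loop."""
    half = size // 2
    start = -half
    intervals = []
    for counter in range(n_windows):
        end = start + size
        # clamp the window to the right edge of the vector
        if end > vector_size:
            start = vector_size - size
            end = vector_size
        # clamp the window to the left edge of the vector
        if start < 0:
            start = 0
            end = size
        intervals.append((start, end))
        # after the first (clamped) window, realign on multiples of the stride
        if counter == 0:
            start = start + stride - half
        else:
            start = start + stride
    return intervals


windows = sliding_intervals(20)
print(windows)
```

For a size of 20 the windows end up centred on 10, 50, 100, 150 and 190, the first and last being shifted so that they stay inside the vector.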

+ 74 - 0
generateAndTrain_maxwell_custom_center.sh

@@ -0,0 +1,74 @@
+#!/bin/bash
+
+if [ -z "$1" ]
+  then
+    echo "No argument supplied"
+    echo "Need of vector size"
+    exit 1
+fi
+
+if [ -z "$2" ]
+  then
+    echo "No argument supplied"
+    echo "Need of metric information"
+    exit 1
+fi
+
+result_filename="models_info/models_comparisons.csv"
+VECTOR_SIZE=200
+size=$1
+metric=$2
+
+# selection of four scenes (only maxwell)
+scenes="A, D, G, H"
+
+half=$(($size/2))
+start=-$half
+for counter in {0..4}; do
+    end=$(($start+$size))
+
+    if [ "$end" -gt "$VECTOR_SIZE" ]; then
+        start=$(($VECTOR_SIZE-$size))
+        end=$(($VECTOR_SIZE))
+    fi
+
+    if [ "$start" -lt "0" ]; then
+        start=$((0))
+        end=$(($size))
+    fi
+
+    for nb_zones in {4,6,8,10,12}; do
+
+        echo $start $end
+
+        for mode in {"svd","svdn","svdne"}; do
+            for model in {"svm_model","ensemble_model","ensemble_model_v2"}; do
+
+                FILENAME="data/${model}_N${size}_B${start}_E${end}_nb_zones_${nb_zones}_${metric}_${mode}"
+                MODEL_NAME="${model}_N${size}_B${start}_E${end}_nb_zones_${nb_zones}_${metric}_${mode}"
+                CUSTOM_MIN_MAX_FILENAME="N${size}_B${start}_E${end}_nb_zones_${nb_zones}_${metric}_${mode}_min_max"
+
+                echo $FILENAME
+
+                # only compute if necessary (lets the run resume if the server goes down)
+                if grep -q "${MODEL_NAME}" "${result_filename}"; then
+
+                    echo "${MODEL_NAME} results already generated..."
+                else
+                    python generate_data_model_random_center.py --output ${FILENAME} --interval "${start},${end}" --kind ${mode} --metric ${metric} --scenes "${scenes}" --nb_zones "${nb_zones}" --percent 1 --renderer "maxwell" --step 10 --random 1 --custom ${CUSTOM_MIN_MAX_FILENAME}
+                    python train_model.py --data ${FILENAME} --output ${MODEL_NAME} --choice ${model}
+
+                    #python predict_seuil_expe_maxwell.py --interval "${start},${end}" --model "saved_models/${MODEL_NAME}.joblib" --mode "${mode}" --metric ${metric} --limit_detection '2' --custom ${CUSTOM_MIN_MAX_FILENAME}
+                    python save_model_result_in_md_maxwell.py --interval "${start},${end}" --model "saved_models/${MODEL_NAME}.joblib" --mode "${mode}" --metric ${metric}
+                fi
+            done
+        done
+    done
+
+    if [ "$counter" -eq "0" ]; then
+        start=$(($start+50-$half))
+    else
+        start=$(($start+50))
+    fi
+
+done

+ 74 - 0
generateAndTrain_maxwell_custom_split.sh

@@ -0,0 +1,74 @@
+#!/bin/bash
+
+if [ -z "$1" ]
+  then
+    echo "No argument supplied"
+    echo "Need of vector size"
+    exit 1
+fi
+
+if [ -z "$2" ]
+  then
+    echo "No argument supplied"
+    echo "Need of metric information"
+    exit 1
+fi
+
+result_filename="models_info/models_comparisons.csv"
+VECTOR_SIZE=200
+size=$1
+metric=$2
+
+# selection of four scenes (only maxwell)
+scenes="A, D, G, H"
+
+half=$(($size/2))
+start=-$half
+for counter in {0..4}; do
+    end=$(($start+$size))
+
+    if [ "$end" -gt "$VECTOR_SIZE" ]; then
+        start=$(($VECTOR_SIZE-$size))
+        end=$(($VECTOR_SIZE))
+    fi
+
+    if [ "$start" -lt "0" ]; then
+        start=$((0))
+        end=$(($size))
+    fi
+
+    for nb_zones in {4,6,8,10,12}; do
+
+        echo $start $end
+
+        for mode in {"svd","svdn","svdne"}; do
+            for model in {"svm_model","ensemble_model","ensemble_model_v2"}; do
+
+                FILENAME="data/${model}_N${size}_B${start}_E${end}_nb_zones_${nb_zones}_${metric}_${mode}"
+                MODEL_NAME="${model}_N${size}_B${start}_E${end}_nb_zones_${nb_zones}_${metric}_${mode}"
+                CUSTOM_MIN_MAX_FILENAME="N${size}_B${start}_E${end}_nb_zones_${nb_zones}_${metric}_${mode}_min_max"
+
+                echo $FILENAME
+
+                # only compute if necessary (lets the run resume if the server goes down)
+                if grep -q "${MODEL_NAME}" "${result_filename}"; then
+
+                    echo "${MODEL_NAME} results already generated..."
+                else
+                    python generate_data_model_random_split.py --output ${FILENAME} --interval "${start},${end}" --kind ${mode} --metric ${metric} --scenes "${scenes}" --nb_zones "${nb_zones}" --percent 1 --renderer "maxwell" --step 10 --random 1 --custom ${CUSTOM_MIN_MAX_FILENAME}
+                    python train_model.py --data ${FILENAME} --output ${MODEL_NAME} --choice ${model}
+
+                    #python predict_seuil_expe_maxwell.py --interval "${start},${end}" --model "saved_models/${MODEL_NAME}.joblib" --mode "${mode}" --metric ${metric} --limit_detection '2' --custom ${CUSTOM_MIN_MAX_FILENAME}
+                    python save_model_result_in_md_maxwell.py --interval "${start},${end}" --model "saved_models/${MODEL_NAME}.joblib" --mode "${mode}" --metric ${metric}
+                fi
+            done
+        done
+    done
+
+    if [ "$counter" -eq "0" ]; then
+        start=$(($start+50-$half))
+    else
+        start=$(($start+50))
+    fi
+
+done

+ 223 - 0
generate_all_data.py

@@ -0,0 +1,223 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+"""
+Created on Fri Sep 14 21:02:42 2018
+
+@author: jbuisine
+"""
+
+from __future__ import print_function
+import sys, os, getopt
+import numpy as np
+import random
+import time
+import json
+
+from modules.utils.data import get_svd_data
+from PIL import Image
+from ipfml import processing, metrics, utils
+from skimage import color
+
+from modules.utils import config as cfg
+
+# getting configuration information
+config_filename         = cfg.config_filename
+zone_folder             = cfg.zone_folder
+min_max_filename        = cfg.min_max_filename_extension
+
+# define all scenes values
+scenes_list             = cfg.scenes_names
+scenes_indexes          = cfg.scenes_indices
+choices                 = cfg.normalization_choices
+path                    = cfg.dataset_path
+zones                   = cfg.zones_indices
+seuil_expe_filename     = cfg.seuil_expe_filename
+
+metric_choices          = cfg.metric_choices_labels
+output_data_folder      = cfg.output_data_folder
+
+generic_output_file_svd = '_random.csv'
+
+def generate_data_svd(data_type, mode):
+    """
+    @brief Method which generates all .csv files from scenes
+    @param data_type,  metric choice
+    @param mode, normalization choice
+    @return nothing
+    """
+
+    scenes = os.listdir(path)
+    # remove min max file from scenes folder
+    scenes = [s for s in scenes if min_max_filename not in s]
+
+    # keep in memory min and max data found from data_type
+    min_val_found = sys.maxsize
+    max_val_found = 0
+
+    data_min_max_filename = os.path.join(path, data_type + min_max_filename)
+
+    # go through each scene
+    for id_scene, folder_scene in enumerate(scenes):
+
+        print(folder_scene)
+        scene_path = os.path.join(path, folder_scene)
+
+        config_file_path = os.path.join(scene_path, config_filename)
+
+        with open(config_file_path, "r") as config_file:
+            last_image_name = config_file.readline().strip()
+            prefix_image_name = config_file.readline().strip()
+            start_index_image = config_file.readline().strip()
+            end_index_image = config_file.readline().strip()
+            step_counter = int(config_file.readline().strip())
+
+        # getting output filename
+        output_svd_filename = data_type + "_" + mode + generic_output_file_svd
+
+        # construct each zones folder name
+        zones_folder = []
+        svd_output_files = []
+
+        # get zones list info
+        for index in zones:
+            index_str = str(index)
+            if len(index_str) < 2:
+                index_str = "0" + index_str
+
+            current_zone = "zone"+index_str
+            zones_folder.append(current_zone)
+
+            zone_path = os.path.join(scene_path, current_zone)
+            svd_file_path = os.path.join(zone_path, output_svd_filename)
+
+            # add writer into list
+            svd_output_files.append(open(svd_file_path, 'w'))
+
+
+        current_counter_index = int(start_index_image)
+        end_counter_index = int(end_index_image)
+
+
+        while(current_counter_index <= end_counter_index):
+
+            current_counter_index_str = str(current_counter_index)
+
+            while len(start_index_image) > len(current_counter_index_str):
+                current_counter_index_str = "0" + current_counter_index_str
+
+            img_path = os.path.join(scene_path, prefix_image_name + current_counter_index_str + ".png")
+
+            current_img = Image.open(img_path)
+            img_blocks = processing.divide_in_blocks(current_img, (200, 200))
+
+            for id_block, block in enumerate(img_blocks):
+
+                ###########################
+                # Metric computation part #
+                ###########################
+
+                data = get_svd_data(data_type, block)
+
+                ##################
+                # Data mode part #
+                ##################
+
+                # modify data depending on mode
+                if mode == 'svdne':
+
+                    # getting max and min information from min_max_filename
+                    with open(data_min_max_filename, 'r') as f:
+                        min_val = float(f.readline())
+                        max_val = float(f.readline())
+
+                    data = utils.normalize_arr_with_range(data, min_val, max_val)
+
+                if mode == 'svdn':
+                    data = utils.normalize_arr(data)
+
+                # save min and max found from dataset in order to normalize data using whole data known
+                if mode == 'svd':
+
+                    current_min = data.min()
+                    current_max = data.max()
+
+                    if current_min < min_val_found:
+                        min_val_found = current_min
+
+                    if current_max > max_val_found:
+                        max_val_found = current_max
+
+                # now write data into current writer
+                current_file = svd_output_files[id_block]
+
+                # add of index
+                current_file.write(current_counter_index_str + ';')
+
+                for val in data:
+                    current_file.write(str(val) + ";")
+
+                current_file.write('\n')
+
+            start_index_image_int = int(start_index_image)
+            print(data_type + "_" + mode + "_" + folder_scene + " - " + "{0:.2f}".format((current_counter_index - start_index_image_int) / (end_counter_index - start_index_image_int)* 100.) + "%")
+            sys.stdout.write("\033[F")
+
+            current_counter_index += step_counter
+
+        for f in svd_output_files:
+            f.close()
+
+        print('\n')
+
+    # save the min and max values found over the whole dataset
+    if mode == 'svd':
+        with open(data_min_max_filename, 'w') as f:
+            f.write(str(min_val_found) + '\n')
+            f.write(str(max_val_found) + '\n')
+
+    print("%s_%s : end of data generation\n" % (data_type, mode))
+
+
+def main():
+
+    # default value of p_step
+    p_step = 1
+
+    # TODO : use of argparse
+    if len(sys.argv) <= 1:
+        print('Run with default parameters...')
+        print('python generate_all_data.py --metric all')
+        print('python generate_all_data.py --metric lab')
+        print('python generate_all_data.py --metric lab')
+        sys.exit(2)
+    try:
+        # "m:" and "metric=" take a value; "h" and "help" do not
+        opts, args = getopt.getopt(sys.argv[1:], "hm:", ["help", "metric="])
+    except getopt.GetoptError:
+        # print help information and exit:
+        print('python generate_all_data.py --metric all')
+        sys.exit(2)
+    for o, a in opts:
+        if o in ("-h", "--help"):
+            print('python generate_all_data.py --metric all')
+            sys.exit()
+        elif o in ("-m", "--metric"):
+            p_metric = a
+
+            if p_metric != 'all' and p_metric not in metric_choices:
+                assert False, "Invalid metric choice"
+        else:
+            assert False, "unhandled option"
+
+    # generate all or specific metric data
+    if p_metric == 'all':
+        for m in metric_choices:
+            generate_data_svd(m, 'svd')
+            generate_data_svd(m, 'svdn')
+            generate_data_svd(m, 'svdne')
+    else:
+        generate_data_svd(p_metric, 'svd')
+        generate_data_svd(p_metric, 'svdn')
+        generate_data_svd(p_metric, 'svdne')
+
+if __name__== "__main__":
+    main()
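
Review note: the `TODO : use of argparse` in `main()` could be addressed along these lines. This is only a sketch: the metric names are copied from the README (the script itself reads them from `cfg.metric_choices_labels`), the `--step` option is the one documented in the README, and `build_parser` is a hypothetical name.

```python
import argparse

# Metric names copied from the README for illustration only.
METRIC_CHOICES = ['lab', 'mscn', 'mscn_revisited', 'low_bits_2', 'low_bits_4',
                  'low_bits_5', 'low_bits_6', 'low_bits_4_shifted_2']


def build_parser():
    parser = argparse.ArgumentParser(
        description="Generate SVD data files for each scene")
    parser.add_argument('--metric', required=True,
                        choices=['all'] + METRIC_CHOICES,
                        help="metric to compute, or 'all'")
    parser.add_argument('--step', type=int, default=1,
                        help="keep an image only if its id %% step == 0")
    return parser


# Parse an explicit argument list instead of sys.argv for the demonstration.
args = build_parser().parse_args(['--metric', 'mscn', '--step', '50'])
print(args.metric, args.step)
```

With `choices`, an invalid metric is rejected by the parser itself, which would replace the `assert False` branches.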

+ 6 - 0
generate_all_simulate_curves.sh

@@ -0,0 +1,6 @@
+for file in "threshold_map"/*; do
+
+    echo ${file}
+
+    python display_simulation_curves.py --folder ${file}
+done

+ 276 - 0
generate_data_model.py

@@ -0,0 +1,276 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+"""
+Created on Fri Sep 14 21:02:42 2018
+
+@author: jbuisine
+"""
+
+from __future__ import print_function
+import sys, os, argparse
+import numpy as np
+import random
+import time
+import json
+
+from PIL import Image
+from ipfml import processing, metrics, utils
+
+from modules.utils import config as cfg
+from modules.utils import data as dt
+
+# getting configuration information
+config_filename         = cfg.config_filename
+learned_folder          = cfg.learned_zones_folder
+min_max_filename        = cfg.min_max_filename_extension
+
+# define all scenes values
+scenes_list             = cfg.scenes_names
+scenes_indexes          = cfg.scenes_indices
+choices                 = cfg.normalization_choices
+path                    = cfg.dataset_path
+zones                   = cfg.zones_indices
+seuil_expe_filename     = cfg.seuil_expe_filename
+
+renderer_choices        = cfg.renderer_choices
+normalization_choices   = cfg.normalization_choices
+metric_choices          = cfg.metric_choices_labels
+output_data_folder      = cfg.output_data_folder
+custom_min_max_folder   = cfg.min_max_custom_folder
+min_max_ext             = cfg.min_max_filename_extension
+zones_indices           = cfg.zones_indices
+
+generic_output_file_svd = '_random.csv'
+
+min_value_interval = sys.maxsize
+max_value_interval = 0
+
+def construct_new_line(path_seuil, interval, line, choice, each, norm):
+    begin, end = interval
+
+    line_data = line.split(';')
+    seuil = line_data[0]
+    metrics = line_data[begin+1:end+1]
+
+    metrics = [float(m) for id, m in enumerate(metrics) if id % each == 0 ]
+
+    if norm:
+        if choice == 'svdne':
+            metrics = utils.normalize_arr_with_range(metrics, min_value_interval, max_value_interval)
+        if choice == 'svdn':
+            metrics = utils.normalize_arr(metrics)
+
+    with open(path_seuil, "r") as seuil_file:
+        seuil_learned = int(seuil_file.readline().strip())
+
+    if seuil_learned > int(seuil):
+        line = '1'
+    else:
+        line = '0'
+
+    for idx, val in enumerate(metrics):
+        line += ';'
+        line += str(val)
+    line += '\n'
+
+    return line
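
Review note: the labelling rule of `construct_new_line` can be isolated in a pure function. The sketch below leaves out normalization and file access and takes the human threshold as an argument; `build_labelled_line` is a hypothetical name, and the input line is made up.

```python
def build_labelled_line(line, interval, human_threshold, each=1, sep=';'):
    """Turn 'index;v1;v2;...' into 'label;v_begin;...;v_end'.

    The label is '1' (noisy) when the image index is below the human
    threshold, '0' otherwise, as in construct_new_line.
    """
    begin, end = interval
    fields = line.strip().split(sep)
    image_index = int(fields[0])
    values = [float(v) for i, v in enumerate(fields[begin + 1:end + 1])
              if i % each == 0]
    label = '1' if human_threshold > image_index else '0'
    return sep.join([label] + [str(v) for v in values]) + '\n'


labelled = build_labelled_line("00020;0.5;0.25;0.125;", (0, 2), 100)
print(labelled, end='')
```

Passing the threshold explicitly would also avoid reopening the threshold file for every line.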
+
+def get_min_max_value_interval(_scenes_list, _interval, _metric):
+
+    global min_value_interval, max_value_interval
+
+    scenes = os.listdir(path)
+
+    # remove min max file from scenes folder
+    scenes = [s for s in scenes if min_max_filename not in s]
+
+    for id_scene, folder_scene in enumerate(scenes):
+
+        # only consider the scenes passed in _scenes_list
+        if folder_scene in _scenes_list:
+
+            scene_path = os.path.join(path, folder_scene)
+
+            zones_folder = []
+            # create zones list
+            for index in zones:
+                index_str = str(index)
+                if len(index_str) < 2:
+                    index_str = "0" + index_str
+                zones_folder.append("zone"+index_str)
+
+            for id_zone, zone_folder in enumerate(zones_folder):
+                zone_path = os.path.join(scene_path, zone_folder)
+                data_filename = _metric + "_svd" + generic_output_file_svd
+                data_file_path = os.path.join(zone_path, data_filename)
+
+                # read every row of the zone data file
+                with open(data_file_path) as f:
+                    lines = f.readlines()
+
+                # update the global min and max over the selected interval
+                for line in lines:
+
+                    begin, end = _interval
+
+                    line_data = line.split(';')
+                    metrics = line_data[begin+1:end+1]
+                    metrics = [float(m) for m in metrics]
+
+                    min_value = min(metrics)
+                    max_value = max(metrics)
+
+                    if min_value < min_value_interval:
+                        min_value_interval = min_value
+
+                    if max_value > max_value_interval:
+                        max_value_interval = max_value
+
+
+def generate_data_model(_filename, _interval, _choice, _metric, _scenes = scenes_list, _zones = zones_indices, _percent = 1, _step=1, _each=1, _norm=False, _custom=False):
+
+    output_train_filename = _filename + ".train"
+    output_test_filename = _filename + ".test"
+
+    if not '/' in output_train_filename:
+        raise Exception("Please select filename with directory path to save data. Example : data/dataset")
+
+    # create path if not exists
+    if not os.path.exists(output_data_folder):
+        os.makedirs(output_data_folder)
+
+    train_file = open(output_train_filename, 'w')
+    test_file = open(output_test_filename, 'w')
+
+    for id_scene, folder_scene in enumerate(scenes_list):
+
+        # only take care of maxwell scenes
+        scene_path = os.path.join(path, folder_scene)
+
+        zones_indices = zones
+
+        # write into file
+        folder_learned_path = os.path.join(learned_folder, _filename.split('/')[1])
+
+        if not os.path.exists(folder_learned_path):
+            os.makedirs(folder_learned_path)
+
+        file_learned_path = os.path.join(folder_learned_path, folder_scene + '.csv')
+
+        with open(file_learned_path, 'w') as f:
+            for i in _zones:
+                f.write(str(i) + ';')
+
+        for id_zone, index_folder in enumerate(zones_indices):
+
+            index_str = str(index_folder)
+            if len(index_str) < 2:
+                index_str = "0" + index_str
+            current_zone_folder = "zone" + index_str
+
+            zone_path = os.path.join(scene_path, current_zone_folder)
+
+            # if custom normalization choices then we use svd values not already normalized
+            if _custom:
+                data_filename = _metric + "_svd" + generic_output_file_svd
+            else:
+                data_filename = _metric + "_" + _choice + generic_output_file_svd
+
+            data_file_path = os.path.join(zone_path, data_filename)
+
+            # getting number of line and read randomly lines
+            f = open(data_file_path)
+            lines = f.readlines()
+
+            num_lines = len(lines)
+
+            lines_indexes = np.arange(num_lines)
+            random.shuffle(lines_indexes)
+
+            path_seuil = os.path.join(zone_path, seuil_expe_filename)
+
+            counter = 0
+            # check if user select current scene and zone to be part of training data set
+            for index in lines_indexes:
+
+                image_index = int(lines[index].split(';')[0])
+                percent = counter / num_lines
+
+                if image_index % _step == 0:
+                    line = construct_new_line(path_seuil, _interval, lines[index], _choice, _each, _norm)
+
+                    if index_folder in _zones and folder_scene in _scenes and percent <= _percent:
+                        train_file.write(line)
+                    else:
+                        test_file.write(line)
+
+                counter += 1
+
+            f.close()
+
+    train_file.close()
+    test_file.close()
+
+
+def main():
+
+    # getting all params
+    parser = argparse.ArgumentParser(description="Generate data for model using correlation matrix information from data")
+
+    parser.add_argument('--output', type=str, help='output file name desired (.train and .test)')
+    parser.add_argument('--interval', type=str, help='Interval value to keep from svd', default='0, 200')
+    parser.add_argument('--kind', type=str, help='Kind of normalization level wished', choices=normalization_choices)
+    parser.add_argument('--metric', type=str, help='Metric data choice', choices=metric_choices)
+    parser.add_argument('--scenes', type=str, help='List of scenes to use for training data')
+    parser.add_argument('--zones', type=str, help='Zones indices to use for training data set')
+    parser.add_argument('--percent', type=float, help='Percent of data use for train and test dataset (by default 1)', default=1.0)
+    parser.add_argument('--step', type=int, help='Photo step to keep for build datasets', default=1)
+    parser.add_argument('--each', type=int, help='Each features to keep from interval', default=1)
+    parser.add_argument('--renderer', type=str, help='Renderer choice in order to limit scenes used', choices=renderer_choices, default='all')
+    parser.add_argument('--custom', type=str, help='Name of custom min max file if use of renormalization of data', default=False)
+
+    args = parser.parse_args()
+
+    p_filename = args.output
+    p_interval = list(map(int, args.interval.split(',')))
+    p_kind     = args.kind
+    p_metric   = args.metric
+    p_scenes   = args.scenes.split(',')
+    p_zones    = list(map(int, args.zones.split(',')))
+    p_percent  = args.percent
+    p_step     = args.step
+    p_each     = args.each
+    p_renderer = args.renderer
+    p_custom   = args.custom
+
+    # list all possibles choices of renderer
+    scenes_list = dt.get_renderer_scenes_names(p_renderer)
+    scenes_indices = dt.get_renderer_scenes_indices(p_renderer)
+
+    # getting scenes from indexes user selection
+    scenes_selected = []
+
+    for scene_id in p_scenes:
+        index = scenes_indices.index(scene_id.strip())
+        scenes_selected.append(scenes_list[index])
+
+    # find min max value if necessary to renormalize data
+    if p_custom:
+        get_min_max_value_interval(scenes_list, p_interval, p_metric)
+
+        # write new file to save (create the same folder that is written to)
+        min_max_folder_path = os.path.join(os.path.dirname(__file__), custom_min_max_folder)
+
+        if not os.path.exists(min_max_folder_path):
+            os.makedirs(min_max_folder_path)
+
+        min_max_filename_path = os.path.join(min_max_folder_path, p_custom)
+
+        with open(min_max_filename_path, 'w') as f:
+            f.write(str(min_value_interval) + '\n')
+            f.write(str(max_value_interval) + '\n')
+
+    # create database using img folder (generate first time only)
+    generate_data_model(p_filename, p_interval, p_kind, p_metric, scenes_selected, p_zones, p_percent, p_step, p_each, _norm=p_custom, _custom=p_custom)
+
+if __name__== "__main__":
+    main()
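
The row format produced by `construct_new_line` above can be sketched as a pure function. This is a minimal illustration under stated assumptions, not the repository's code: the name `build_labeled_line` and its `threshold` argument are hypothetical, the threshold is passed in rather than read from the zone's threshold file, and normalization is left out.

```python
def build_labeled_line(threshold, line, interval, each=1):
    """Return 'label;f1;f2;...' (newline-terminated) for one data row.

    Hypothetical helper: `line` is a semicolon-separated row whose first
    field is the image index, followed by the feature values.
    """
    begin, end = interval
    fields = line.strip().split(';')
    image_index = int(fields[0])

    # features start after the leading image index, hence the +1 offset
    selected = fields[begin + 1:end + 1]

    # keep one feature out of every `each`
    features = [float(v) for i, v in enumerate(selected) if i % each == 0]

    # label is '1' while the zone threshold exceeds the image index
    label = '1' if threshold > image_index else '0'

    return ';'.join([label] + [str(v) for v in features]) + '\n'


print(build_labeled_line(500, "200;0.1;0.2;0.3;0.4\n", (0, 4), each=2), end='')
```

Keeping the threshold as an argument avoids reopening the threshold file for every row, which the original function does.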

+ 303 - 0
generate_data_model_random.py

@@ -0,0 +1,303 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+"""
+Created on Fri Sep 14 21:02:42 2018
+
+@author: jbuisine
+"""
+
+from __future__ import print_function
+import sys, os, argparse
+import numpy as np
+import random
+import time
+import json
+
+from PIL import Image
+from ipfml import processing, metrics, utils
+
+from modules.utils import config as cfg
+from modules.utils import data as dt
+
+# getting configuration information
+config_filename         = cfg.config_filename
+learned_folder          = cfg.learned_zones_folder
+min_max_filename        = cfg.min_max_filename_extension
+
+# define all scenes values
+all_scenes_list         = cfg.scenes_names
+all_scenes_indices      = cfg.scenes_indices
+
+normalization_choices   = cfg.normalization_choices
+path                    = cfg.dataset_path
+zones                   = cfg.zones_indices
+seuil_expe_filename     = cfg.seuil_expe_filename
+
+renderer_choices        = cfg.renderer_choices
+metric_choices          = cfg.metric_choices_labels
+output_data_folder      = cfg.output_data_folder
+custom_min_max_folder   = cfg.min_max_custom_folder
+min_max_ext             = cfg.min_max_filename_extension
+
+generic_output_file_svd = '_random.csv'
+
+min_value_interval      = sys.maxsize
+max_value_interval      = 0
+
+def construct_new_line(path_seuil, interval, line, choice, each, norm):
+    begin, end = interval
+
+    line_data = line.split(';')
+    seuil = line_data[0]
+    metrics = line_data[begin+1:end+1]
+
+    # keep only if modulo result is 0 (keep only each wanted values)
+    metrics = [float(m) for id, m in enumerate(metrics) if id % each == 0]
+
+    # TODO : check if it's always necessary to do that (loss of information for svd)
+    if norm:
+
+        if choice == 'svdne':
+            metrics = utils.normalize_arr_with_range(metrics, min_value_interval, max_value_interval)
+        if choice == 'svdn':
+            metrics = utils.normalize_arr(metrics)
+
+    with open(path_seuil, "r") as seuil_file:
+        seuil_learned = int(seuil_file.readline().strip())
+
+    if seuil_learned > int(seuil):
+        line = '1'
+    else:
+        line = '0'
+
+    for idx, val in enumerate(metrics):
+        line += ';'
+        line += str(val)
+    line += '\n'
+
+    return line
+
+def get_min_max_value_interval(_scenes_list, _interval, _metric):
+
+    global min_value_interval, max_value_interval
+
+    scenes = os.listdir(path)
+
+    # remove min max file from scenes folder
+    scenes = [s for s in scenes if min_max_filename not in s]
+
+    for id_scene, folder_scene in enumerate(scenes):
+
+        # only take care of maxwell scenes
+        if folder_scene in _scenes_list:
+
+            scene_path = os.path.join(path, folder_scene)
+
+            zones_folder = []
+            # create zones list
+            for index in zones:
+                index_str = str(index)
+                if len(index_str) < 2:
+                    index_str = "0" + index_str
+                zones_folder.append("zone"+index_str)
+
+            for id_zone, zone_folder in enumerate(zones_folder):
+
+                zone_path = os.path.join(scene_path, zone_folder)
+
+                # if custom normalization choices then we use svd values not already normalized
+                data_filename = _metric + "_svd"+ generic_output_file_svd
+
+                data_file_path = os.path.join(zone_path, data_filename)
+
+                # read every row of the zone data file
+                with open(data_file_path) as f:
+                    lines = f.readlines()
+
+                # update the global min and max over the selected interval
+                for line in lines:
+
+                    begin, end = _interval
+
+                    line_data = line.split(';')
+
+                    metrics = line_data[begin+1:end+1]
+                    metrics = [float(m) for m in metrics]
+
+                    min_value = min(metrics)
+                    max_value = max(metrics)
+
+                    if min_value < min_value_interval:
+                        min_value_interval = min_value
+
+                    if max_value > max_value_interval:
+                        max_value_interval = max_value
+
+
+def generate_data_model(_scenes_list, _filename, _interval, _choice, _metric, _scenes, _nb_zones = 4, _percent = 1, _random=0, _step=1, _each=1, _custom = False):
+
+    output_train_filename = _filename + ".train"
+    output_test_filename = _filename + ".test"
+
+    if not '/' in output_train_filename:
+        raise Exception("Please select filename with directory path to save data. Example : data/dataset")
+
+    # create path if not exists
+    if not os.path.exists(output_data_folder):
+        os.makedirs(output_data_folder)
+
+    train_file_data = []
+    test_file_data  = []
+
+    for id_scene, folder_scene in enumerate(_scenes_list):
+
+        scene_path = os.path.join(path, folder_scene)
+
+        # copy so that shuffling does not reorder the shared config list in place
+        zones_indices = list(zones)
+
+        # shuffle list of zones (=> randomly choose zones)
+        # only in random mode
+        if _random:
+            random.shuffle(zones_indices)
+
+        # store zones learned
+        learned_zones_indices = zones_indices[:_nb_zones]
+
+        # write into file
+        folder_learned_path = os.path.join(learned_folder, _filename.split('/')[1])
+
+        if not os.path.exists(folder_learned_path):
+            os.makedirs(folder_learned_path)
+
+        file_learned_path = os.path.join(folder_learned_path, folder_scene + '.csv')
+
+        with open(file_learned_path, 'w') as f:
+            for i in learned_zones_indices:
+                f.write(str(i) + ';')
+
+        for id_zone, index_folder in enumerate(zones_indices):
+
+            index_str = str(index_folder)
+            if len(index_str) < 2:
+                index_str = "0" + index_str
+            current_zone_folder = "zone" + index_str
+
+            zone_path = os.path.join(scene_path, current_zone_folder)
+
+            # if custom normalization choices then we use svd values not already normalized
+            if _custom:
+                data_filename = _metric + "_svd"+ generic_output_file_svd
+            else:
+                data_filename = _metric + "_" + _choice + generic_output_file_svd
+
+            data_file_path = os.path.join(zone_path, data_filename)
+
+            # getting number of line and read randomly lines
+            f = open(data_file_path)
+            lines = f.readlines()
+
+            num_lines = len(lines)
+
+            # randomly shuffle image
+            if _random:
+                random.shuffle(lines)
+
+            path_seuil = os.path.join(zone_path, seuil_expe_filename)
+
+            counter = 0
+            # check if user select current scene and zone to be part of training data set
+            for data in lines:
+
+                percent = counter / num_lines
+                image_index = int(data.split(';')[0])
+
+                if image_index % _step == 0:
+                    line = construct_new_line(path_seuil, _interval, data, _choice, _each, _custom)
+
+                    if id_zone < _nb_zones and folder_scene in _scenes and percent <= _percent:
+                        train_file_data.append(line)
+                    else:
+                        test_file_data.append(line)
+
+                counter += 1
+
+            f.close()
+
+    train_file = open(output_train_filename, 'w')
+    test_file = open(output_test_filename, 'w')
+
+    for line in train_file_data:
+        train_file.write(line)
+
+    for line in test_file_data:
+        test_file.write(line)
+
+    train_file.close()
+    test_file.close()
+
+
+def main():
+
+    # getting all params
+    parser = argparse.ArgumentParser(description="Generate data for model using correlation matrix information from data")
+
+    parser.add_argument('--output', type=str, help='output file name desired (.train and .test)')
+    parser.add_argument('--interval', type=str, help='Interval value to keep from svd', default='0, 200')
+    parser.add_argument('--kind', type=str, help='Kind of normalization level wished', choices=normalization_choices)
+    parser.add_argument('--metric', type=str, help='Metric data choice', choices=metric_choices)
+    parser.add_argument('--scenes', type=str, help='List of scenes to use for training data')
+    parser.add_argument('--nb_zones', type=int, help='Number of zones to use for training data set', default=4)
+    parser.add_argument('--random', type=int, help='Data will be randomly filled or not', choices=[0, 1], default=0)
+    parser.add_argument('--percent', type=float, help='Percent of data used for train and test dataset (by default 1)', default=1.0)
+    parser.add_argument('--step', type=int, help='Photo step to keep for build datasets', default=1)
+    parser.add_argument('--each', type=int, help='Each features to keep from interval', default=1)
+    parser.add_argument('--renderer', type=str, help='Renderer choice in order to limit scenes used', choices=renderer_choices, default='all')
+    parser.add_argument('--custom', type=str, help='Name of custom min max file if use of renormalization of data', default=False)
+
+    args = parser.parse_args()
+
+    p_filename = args.output
+    p_interval = list(map(int, args.interval.split(',')))
+    p_kind     = args.kind
+    p_metric   = args.metric
+    p_scenes   = args.scenes.split(',')
+    p_nb_zones = args.nb_zones
+    p_random   = args.random
+    p_percent  = args.percent
+    p_step     = args.step
+    p_each     = args.each
+    p_renderer = args.renderer
+    p_custom   = args.custom
+
+
+    # list all possibles choices of renderer
+    scenes_list = dt.get_renderer_scenes_names(p_renderer)
+    scenes_indices = dt.get_renderer_scenes_indices(p_renderer)
+
+    # getting scenes from indexes user selection
+    scenes_selected = []
+
+    for scene_id in p_scenes:
+        index = scenes_indices.index(scene_id.strip())
+        scenes_selected.append(scenes_list[index])
+
+    # find min max value if necessary to renormalize data
+    if p_custom:
+        get_min_max_value_interval(scenes_list, p_interval, p_metric)
+
+        # write new file to save (create the same folder that is written to)
+        min_max_folder_path = os.path.join(os.path.dirname(__file__), custom_min_max_folder)
+
+        if not os.path.exists(min_max_folder_path):
+            os.makedirs(min_max_folder_path)
+
+        min_max_filename_path = os.path.join(min_max_folder_path, p_custom)
+
+        with open(min_max_filename_path, 'w') as f:
+            f.write(str(min_value_interval) + '\n')
+            f.write(str(max_value_interval) + '\n')
+
+    # create database using img folder (generate first time only)
+    generate_data_model(scenes_list, p_filename, p_interval, p_kind, p_metric, scenes_selected, p_nb_zones, p_percent, p_random, p_step, p_each, p_custom)
+
+if __name__== "__main__":
+    main()
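
The train/test routing rule used in `generate_data_model` above can be read in isolation as follows. This is a sketch only: `route_rows` and its parameters are hypothetical names, and it assumes the rows of one zone are already loaded (and shuffled, if wanted) before the call.

```python
def route_rows(rows, zone_rank, scene, selected_scenes, nb_zones, percent=1.0):
    """Split the rows of one zone into (train, test) lists.

    Hypothetical helper: a row goes to the training set only when the zone
    is among the first `nb_zones` zones, the scene was selected, and the
    row lies within the first `percent` fraction of the zone's rows.
    """
    train, test = [], []
    total = len(rows)

    for counter, row in enumerate(rows):
        in_train = (
            zone_rank < nb_zones
            and scene in selected_scenes
            and counter / total <= percent
        )
        (train if in_train else test).append(row)

    return train, test


train, test = route_rows(['a', 'b', 'c', 'd'], 0, 'A', ['A'], nb_zones=4, percent=0.5)
print(train, test)
```

Because the comparison is `<=`, the row sitting exactly at the `percent` boundary still lands in the training set, which mirrors the condition in the script.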

+ 314 - 0
generate_data_model_random_center.py

@@ -0,0 +1,314 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+"""
+Created on Fri Sep 14 21:02:42 2018
+
+@author: jbuisine
+"""
+
+from __future__ import print_function
+import sys, os, argparse
+import numpy as np
+import random
+import time
+import json
+
+from PIL import Image
+from ipfml import processing, metrics, utils
+
+from modules.utils import config as cfg
+from modules.utils import data as dt
+
+# getting configuration information
+config_filename         = cfg.config_filename
+learned_folder          = cfg.learned_zones_folder
+min_max_filename        = cfg.min_max_filename_extension
+
+# define all scenes values
+all_scenes_list         = cfg.scenes_names
+all_scenes_indices      = cfg.scenes_indices
+
+normalization_choices   = cfg.normalization_choices
+path                    = cfg.dataset_path
+zones                   = cfg.zones_indices
+seuil_expe_filename     = cfg.seuil_expe_filename
+
+renderer_choices        = cfg.renderer_choices
+metric_choices          = cfg.metric_choices_labels
+output_data_folder      = cfg.output_data_folder
+custom_min_max_folder   = cfg.min_max_custom_folder
+min_max_ext             = cfg.min_max_filename_extension
+
+generic_output_file_svd = '_random.csv'
+
+min_value_interval      = sys.maxsize
+max_value_interval      = 0
+abs_gap_data            = 150
+
+
+def construct_new_line(seuil_learned, interval, line, choice, each, norm):
+    begin, end = interval
+
+    line_data = line.split(';')
+    seuil = line_data[0]
+    metrics = line_data[begin+1:end+1]
+
+    # keep only if modulo result is 0 (keep only each wanted values)
+    metrics = [float(m) for id, m in enumerate(metrics) if id % each == 0]
+
+    # TODO : check if it's always necessary to do that (loss of information for svd)
+    if norm:
+
+        if choice == 'svdne':
+            metrics = utils.normalize_arr_with_range(metrics, min_value_interval, max_value_interval)
+        if choice == 'svdn':
+            metrics = utils.normalize_arr(metrics)
+
+    if seuil_learned > int(seuil):
+        line = '1'
+    else:
+        line = '0'
+
+    for idx, val in enumerate(metrics):
+        line += ';'
+        line += str(val)
+    line += '\n'
+
+    return line
+
+def get_min_max_value_interval(_scenes_list, _interval, _metric):
+
+    global min_value_interval, max_value_interval
+
+    scenes = os.listdir(path)
+
+    # remove min max file from scenes folder
+    scenes = [s for s in scenes if min_max_filename not in s]
+
+    for id_scene, folder_scene in enumerate(scenes):
+
+        # only take care of maxwell scenes
+        if folder_scene in _scenes_list:
+
+            scene_path = os.path.join(path, folder_scene)
+
+            zones_folder = []
+            # create zones list
+            for index in zones:
+                index_str = str(index)
+                if len(index_str) < 2:
+                    index_str = "0" + index_str
+                zones_folder.append("zone"+index_str)
+
+            for id_zone, zone_folder in enumerate(zones_folder):
+
+                zone_path = os.path.join(scene_path, zone_folder)
+
+                # if custom normalization choices then we use svd values not already normalized
+                data_filename = _metric + "_svd"+ generic_output_file_svd
+
+                data_file_path = os.path.join(zone_path, data_filename)
+
+                # read every row of the zone data file
+                with open(data_file_path) as f:
+                    lines = f.readlines()
+
+                # update the global min and max over the selected interval
+                for line in lines:
+
+                    begin, end = _interval
+
+                    line_data = line.split(';')
+
+                    metrics = line_data[begin+1:end+1]
+                    metrics = [float(m) for m in metrics]
+
+                    min_value = min(metrics)
+                    max_value = max(metrics)
+
+                    if min_value < min_value_interval:
+                        min_value_interval = min_value
+
+                    if max_value > max_value_interval:
+                        max_value_interval = max_value
+
+
+def generate_data_model(_scenes_list, _filename, _interval, _choice, _metric, _scenes, _nb_zones = 4, _percent = 1, _random=0, _step=1, _each=1, _custom = False):
+
+    output_train_filename = _filename + ".train"
+    output_test_filename = _filename + ".test"
+
+    if not '/' in output_train_filename:
+        raise Exception("Please select filename with directory path to save data. Example : data/dataset")
+
+    # create path if not exists
+    if not os.path.exists(output_data_folder):
+        os.makedirs(output_data_folder)
+
+    train_file_data = []
+    test_file_data  = []
+
+    for id_scene, folder_scene in enumerate(_scenes_list):
+
+        scene_path = os.path.join(path, folder_scene)
+
+        # copy so that shuffling does not reorder the shared config list in place
+        zones_indices = list(zones)
+
+        # shuffle list of zones (=> randomly choose zones)
+        # only in random mode
+        if _random:
+            random.shuffle(zones_indices)
+
+        # store zones learned
+        learned_zones_indices = zones_indices[:_nb_zones]
+
+        # write into file
+        folder_learned_path = os.path.join(learned_folder, _filename.split('/')[1])
+
+        if not os.path.exists(folder_learned_path):
+            os.makedirs(folder_learned_path)
+
+        file_learned_path = os.path.join(folder_learned_path, folder_scene + '.csv')
+
+        with open(file_learned_path, 'w') as f:
+            for i in learned_zones_indices:
+                f.write(str(i) + ';')
+
+        for id_zone, index_folder in enumerate(zones_indices):
+
+            index_str = str(index_folder)
+            if len(index_str) < 2:
+                index_str = "0" + index_str
+            current_zone_folder = "zone" + index_str
+
+            zone_path = os.path.join(scene_path, current_zone_folder)
+
+            # if custom normalization choices then we use svd values not already normalized
+            if _custom:
+                data_filename = _metric + "_svd"+ generic_output_file_svd
+            else:
+                data_filename = _metric + "_" + _choice + generic_output_file_svd
+
+            data_file_path = os.path.join(zone_path, data_filename)
+
+            # getting number of line and read randomly lines
+            f = open(data_file_path)
+            lines = f.readlines()
+
+            num_lines = len(lines)
+
+            # randomly shuffle image
+            if _random:
+                random.shuffle(lines)
+
+            path_seuil = os.path.join(zone_path, seuil_expe_filename)
+
+            with open(path_seuil, "r") as seuil_file:
+                seuil_learned = int(seuil_file.readline().strip())
+
+            counter = 0
+            # check if user select current scene and zone to be part of training data set
+            for data in lines:
+
+                percent = counter / num_lines
+                image_index = int(data.split(';')[0])
+
+                if image_index % _step == 0:
+
+                    # seuil_learned was already read once for this zone above
+                    gap_threshold = abs(seuil_learned - image_index)
+
+                    # only keep data near to threshold of zone image
+                    if gap_threshold <= abs_gap_data:
+
+                        line = construct_new_line(seuil_learned, _interval, data, _choice, _each, _custom)
+
+                        if id_zone < _nb_zones and folder_scene in _scenes and percent <= _percent:
+                            train_file_data.append(line)
+                        else:
+                            test_file_data.append(line)
+
+                counter += 1
+
+            f.close()
+
+    train_file = open(output_train_filename, 'w')
+    test_file = open(output_test_filename, 'w')
+
+    for line in train_file_data:
+        train_file.write(line)
+
+    for line in test_file_data:
+        test_file.write(line)
+
+    train_file.close()
+    test_file.close()
+
+
+def main():
+
+    # getting all params
+    parser = argparse.ArgumentParser(description="Generate data for model using correlation matrix information from data")
+
+    parser.add_argument('--output', type=str, help='output file name desired (.train and .test)')
+    parser.add_argument('--interval', type=str, help='Interval value to keep from svd', default='0, 200')
+    parser.add_argument('--kind', type=str, help='Kind of normalization level wished', choices=normalization_choices)
+    parser.add_argument('--metric', type=str, help='Metric data choice', choices=metric_choices)
+    parser.add_argument('--scenes', type=str, help='List of scenes to use for training data')
+    parser.add_argument('--nb_zones', type=int, help='Number of zones to use for training data set', default=4)
+    parser.add_argument('--random', type=int, help='Data will be randomly filled or not', choices=[0, 1], default=0)
+    parser.add_argument('--percent', type=float, help='Percent of data used for train and test dataset (by default 1)', default=1.0)
+    parser.add_argument('--step', type=int, help='Photo step to keep for build datasets', default=1)
+    parser.add_argument('--each', type=int, help='Each features to keep from interval', default=1)
+    parser.add_argument('--renderer', type=str, help='Renderer choice in order to limit scenes used', choices=renderer_choices, default='all')
+    parser.add_argument('--custom', type=str, help='Name of custom min max file if use of renormalization of data', default=False)
+
+    args = parser.parse_args()
+
+    p_filename = args.output
+    p_interval = list(map(int, args.interval.split(',')))
+    p_kind     = args.kind
+    p_metric   = args.metric
+    p_scenes   = args.scenes.split(',')
+    p_nb_zones = args.nb_zones
+    p_random   = args.random
+    p_percent  = args.percent
+    p_step     = args.step
+    p_each     = args.each
+    p_renderer = args.renderer
+    p_custom   = args.custom
+
+
+    # list all possibles choices of renderer
+    scenes_list = dt.get_renderer_scenes_names(p_renderer)
+    scenes_indices = dt.get_renderer_scenes_indices(p_renderer)
+
+    # getting scenes from indexes user selection
+    scenes_selected = []
+
+    for scene_id in p_scenes:
+        index = scenes_indices.index(scene_id.strip())
+        scenes_selected.append(scenes_list[index])
+
+    # find min max value if necessary to renormalize data
+    if p_custom:
+        get_min_max_value_interval(scenes_list, p_interval, p_metric)
+
+        # write new file to save (create the same folder that is written to)
+        min_max_folder_path = os.path.join(os.path.dirname(__file__), custom_min_max_folder)
+
+        if not os.path.exists(min_max_folder_path):
+            os.makedirs(min_max_folder_path)
+
+        min_max_filename_path = os.path.join(min_max_folder_path, p_custom)
+
+        with open(min_max_filename_path, 'w') as f:
+            f.write(str(min_value_interval) + '\n')
+            f.write(str(max_value_interval) + '\n')
+
+    # create database using img folder (generate first time only)
+    generate_data_model(scenes_list, p_filename, p_interval, p_kind, p_metric, scenes_selected, p_nb_zones, p_percent, p_random, p_step, p_each, p_custom)
+
+if __name__== "__main__":
+    main()
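
What sets `generate_data_model_random_center.py` apart is that it keeps only rows whose image index lies within `abs_gap_data` of the zone threshold. The filter can be sketched on its own as below. The function name `near_threshold` and the `max_gap` parameter are hypothetical, and the default of 150 simply mirrors the `abs_gap_data` value set in that file.

```python
def near_threshold(lines, threshold, max_gap=150):
    """Keep rows whose image index is at most `max_gap` away from `threshold`.

    Hypothetical helper: each line is a semicolon-separated row whose first
    field is the image index.
    """
    kept = []
    for line in lines:
        image_index = int(line.split(';')[0])
        if abs(threshold - image_index) <= max_gap:
            kept.append(line)
    return kept


rows = ["100;0.1", "400;0.2", "560;0.3", "900;0.4"]
print(near_threshold(rows, threshold=500))
```

Filtering on distance to the threshold concentrates the dataset on rows close to the label boundary. Whether that helps the model is not something the source states.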

+ 313 - 0
generate_data_model_random_split.py

@@ -0,0 +1,313 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+"""
+Created on Fri Sep 14 21:02:42 2018
+
+@author: jbuisine
+"""
+
+from __future__ import print_function
+import sys, os, argparse
+import numpy as np
+import random
+import time
+import json
+
+from PIL import Image
+from ipfml import processing, metrics, utils
+
+from modules.utils import config as cfg
+from modules.utils import data as dt
+
+# getting configuration information
+config_filename         = cfg.config_filename
+learned_folder          = cfg.learned_zones_folder
+min_max_filename        = cfg.min_max_filename_extension
+
+# define all scenes values
+all_scenes_list         = cfg.scenes_names
+all_scenes_indices      = cfg.scenes_indices
+
+normalization_choices   = cfg.normalization_choices
+path                    = cfg.dataset_path
+zones                   = cfg.zones_indices
+seuil_expe_filename     = cfg.seuil_expe_filename
+
+renderer_choices        = cfg.renderer_choices
+metric_choices          = cfg.metric_choices_labels
+output_data_folder      = cfg.output_data_folder
+custom_min_max_folder   = cfg.min_max_custom_folder
+min_max_ext             = cfg.min_max_filename_extension
+
+generic_output_file_svd = '_random.csv'
+
+min_value_interval      = sys.maxsize
+max_value_interval      = 0
+abs_gap_data            = 100
+
+
+def construct_new_line(seuil_learned, interval, line, choice, each, norm):
+    begin, end = interval
+
+    line_data = line.split(';')
+    seuil = line_data[0]
+    metrics = line_data[begin+1:end+1]
+
+    # keep only if modulo result is 0 (keep only each wanted values)
+    metrics = [float(m) for id, m in enumerate(metrics) if id % each == 0]
+
+    # TODO : check if it's always necessary to do that (loss of information for svd)
+    if norm:
+
+        if choice == 'svdne':
+            metrics = utils.normalize_arr_with_range(metrics, min_value_interval, max_value_interval)
+        if choice == 'svdn':
+            metrics = utils.normalize_arr(metrics)
+
+    if seuil_learned > int(seuil):
+        line = '1'
+    else:
+        line = '0'
+
+    for idx, val in enumerate(metrics):
+        line += ';'
+        line += str(val)
+    line += '\n'
+
+    return line
+
+def get_min_max_value_interval(_scenes_list, _interval, _metric):
+
+    global min_value_interval, max_value_interval
+
+    scenes = os.listdir(path)
+
+    # remove min max file from scenes folder
+    scenes = [s for s in scenes if min_max_filename not in s]
+
+    for id_scene, folder_scene in enumerate(scenes):
+
+        # only take care of maxwell scenes
+        if folder_scene in _scenes_list:
+
+            scene_path = os.path.join(path, folder_scene)
+
+            zones_folder = []
+            # create zones list
+            for index in zones:
+                index_str = str(index)
+                if len(index_str) < 2:
+                    index_str = "0" + index_str
+                zones_folder.append("zone"+index_str)
+
+            for id_zone, zone_folder in enumerate(zones_folder):
+
+                zone_path = os.path.join(scene_path, zone_folder)
+
+                # if custom normalization choices then we use svd values not already normalized
+                data_filename = _metric + "_svd"+ generic_output_file_svd
+
+                data_file_path = os.path.join(zone_path, data_filename)
+
+                # read every line of the data file (the file is closed on exit)
+                with open(data_file_path) as f:
+                    lines = f.readlines()
+
+                # update the global min and max with the values found on each line
+                for line in lines:
+
+                    begin, end = _interval
+
+                    line_data = line.split(';')
+
+                    metrics = line_data[begin+1:end+1]
+                    metrics = [float(m) for m in metrics]
+
+                    min_value = min(metrics)
+                    max_value = max(metrics)
+
+                    if min_value < min_value_interval:
+                        min_value_interval = min_value
+
+                    if max_value > max_value_interval:
+                        max_value_interval = max_value
+
+
+def generate_data_model(_scenes_list, _filename, _interval, _choice, _metric, _scenes, _nb_zones = 4, _percent = 1, _random=0, _step=1, _each=1, _custom = False):
+
+    output_train_filename = _filename + ".train"
+    output_test_filename = _filename + ".test"
+
+    if '/' not in output_train_filename:
+        raise Exception("Please select filename with directory path to save data. Example : data/dataset")
+
+    # create path if not exists
+    if not os.path.exists(output_data_folder):
+        os.makedirs(output_data_folder)
+
+    train_file_data = []
+    test_file_data  = []
+
+    for id_scene, folder_scene in enumerate(_scenes_list):
+
+        scene_path = os.path.join(path, folder_scene)
+
+        zones_indices = zones
+
+        # shuffle list of zones (=> randomly choose zones)
+        # only in random mode
+        if _random:
+            random.shuffle(zones_indices)
+
+        # store zones learned
+        learned_zones_indices = zones_indices[:_nb_zones]
+
+        # write into file
+        folder_learned_path = os.path.join(learned_folder, _filename.split('/')[1])
+
+        if not os.path.exists(folder_learned_path):
+            os.makedirs(folder_learned_path)
+
+        file_learned_path = os.path.join(folder_learned_path, folder_scene + '.csv')
+
+        with open(file_learned_path, 'w') as f:
+            for i in learned_zones_indices:
+                f.write(str(i) + ';')
+
+        for id_zone, index_folder in enumerate(zones_indices):
+
+            index_str = str(index_folder)
+            if len(index_str) < 2:
+                index_str = "0" + index_str
+            current_zone_folder = "zone" + index_str
+
+            zone_path = os.path.join(scene_path, current_zone_folder)
+
+            # if custom normalization choices then we use svd values not already normalized
+            if _custom:
+                data_filename = _metric + "_svd"+ generic_output_file_svd
+            else:
+                data_filename = _metric + "_" + _choice + generic_output_file_svd
+
+            data_file_path = os.path.join(zone_path, data_filename)
+
+            # read all lines (they are shuffled below when random mode is enabled)
+            f = open(data_file_path)
+            lines = f.readlines()
+
+            num_lines = len(lines)
+
+            # randomly shuffle image
+            if _random:
+                random.shuffle(lines)
+
+            path_seuil = os.path.join(zone_path, seuil_expe_filename)
+
+            with open(path_seuil, "r") as seuil_file:
+                seuil_learned = int(seuil_file.readline().strip())
+
+            counter = 0
+            # dispatch each kept line to the training or the test set
+            for data in lines:
+
+                percent = counter / num_lines
+                image_index = int(data.split(';')[0])
+
+                if image_index % _step == 0:
+
+                    # seuil_learned was already read once for this zone above
+                    gap_threshold = abs(seuil_learned - image_index)
+
+                    if gap_threshold > abs_gap_data:
+
+                        line = construct_new_line(seuil_learned, _interval, data, _choice, _each, _custom)
+
+                        if id_zone < _nb_zones and folder_scene in _scenes and percent <= _percent:
+                            train_file_data.append(line)
+                        else:
+                            test_file_data.append(line)
+
+                counter += 1
+
+            f.close()
+
+    train_file = open(output_train_filename, 'w')
+    test_file = open(output_test_filename, 'w')
+
+    for line in train_file_data:
+        train_file.write(line)
+
+    for line in test_file_data:
+        test_file.write(line)
+
+    train_file.close()
+    test_file.close()
+
+
+def main():
+
+    # getting all params
+    parser = argparse.ArgumentParser(description="Generate data for model using correlation matrix information from data")
+
+    parser.add_argument('--output', type=str, help='output file name desired (.train and .test)')
+    parser.add_argument('--interval', type=str, help='Interval value to keep from svd', default='0,200')
+    parser.add_argument('--kind', type=str, help='Kind of normalization level wished', choices=normalization_choices)
+    parser.add_argument('--metric', type=str, help='Metric data choice', choices=metric_choices)
+    parser.add_argument('--scenes', type=str, help='List of scenes to use for training data')
+    parser.add_argument('--nb_zones', type=int, help='Number of zones to use for training data set')
+    parser.add_argument('--random', type=int, help='Data will be randomly filled or not', choices=[0, 1])
+    parser.add_argument('--percent', type=float, help='Percent of data used for train and test dataset (by default 1)', default=1)
+    parser.add_argument('--step', type=int, help='Image step to keep when building datasets', default=1)
+    parser.add_argument('--each', type=int, help='Keep one feature out of every `each` in the interval', default=1)
+    parser.add_argument('--renderer', type=str, help='Renderer choice in order to limit scenes used', choices=renderer_choices, default='all')
+    parser.add_argument('--custom', type=str, help='Name of custom min max file if use of renormalization of data', default=False)
+
+    args = parser.parse_args()
+
+    p_filename = args.output
+    p_interval = list(map(int, args.interval.split(',')))
+    p_kind     = args.kind
+    p_metric   = args.metric
+    p_scenes   = args.scenes.split(',')
+    p_nb_zones = args.nb_zones
+    p_random   = args.random
+    p_percent  = args.percent
+    p_step     = args.step
+    p_each     = args.each
+    p_renderer = args.renderer
+    p_custom   = args.custom
+
+
+    # list all scenes available for the chosen renderer
+    scenes_list = dt.get_renderer_scenes_names(p_renderer)
+    scenes_indices = dt.get_renderer_scenes_indices(p_renderer)
+
+    # getting scenes from indexes user selection
+    scenes_selected = []
+
+    for scene_id in p_scenes:
+        index = scenes_indices.index(scene_id.strip())
+        scenes_selected.append(scenes_list[index])
+
+    # find min max value if necessary to renormalize data
+    if p_custom:
+        get_min_max_value_interval(scenes_list, p_interval, p_metric)
+
+        # write new file to save
+        if not os.path.exists(custom_min_max_folder):
+            os.makedirs(custom_min_max_folder)
+
+        min_max_folder_path = os.path.join(os.path.dirname(__file__), custom_min_max_folder)
+        min_max_filename_path = os.path.join(min_max_folder_path, p_custom)
+
+        with open(min_max_filename_path, 'w') as f:
+            f.write(str(min_value_interval) + '\n')
+            f.write(str(max_value_interval) + '\n')
+
+    # create database using img folder (generate first time only)
+    generate_data_model(scenes_list, p_filename, p_interval, p_kind, p_metric, scenes_selected, p_nb_zones, p_percent, p_random, p_step, p_each, p_custom)
+
+if __name__== "__main__":
+    main()
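
The labelling rule in `construct_new_line` can be sketched in isolation. This is a minimal illustration, assuming the semicolon-separated layout used above (image index first, then feature values). The helper name `label_line` is hypothetical and normalization is left out on purpose.

```python
def label_line(threshold, interval, raw_line, each=1):
    """Build a 'label;f1;f2;...' line from a raw 'index;v1;v2;...' line."""
    begin, end = interval
    fields = raw_line.strip().split(';')
    image_index = int(fields[0])

    # Keep the requested slice of features, then one value out of every `each`.
    features = [float(v) for v in fields[begin + 1:end + 1]]
    features = features[::each]

    # The label is '1' while the image index is still below the threshold.
    label = '1' if threshold > image_index else '0'
    return ';'.join([label] + [str(v) for v in features]) + '\n'


print(label_line(500, (0, 4), '200;0.5;0.25;0.125;0.0625', each=2), end='')
```

Slicing with `[::each]` keeps the same positions as the `index % each == 0` filter in the script.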

+ 12 - 0
list_files.sh

@@ -0,0 +1,12 @@
+search_dir=$1
+sentence=$2
+
+for entry in "$search_dir"/*
+do
+    if [ -f "$entry" ]; then
+        if grep -q "$sentence" "$entry"; then
+            echo "$entry"
+        fi
+    fi
+done
+

+ 0 - 0
modules/__init__.py


+ 75 - 0
modules/models.py

@@ -0,0 +1,75 @@
+from sklearn.model_selection import GridSearchCV
+from sklearn.linear_model import LogisticRegression
+from sklearn.ensemble import RandomForestClassifier, VotingClassifier
+from sklearn.neighbors import KNeighborsClassifier
+from sklearn.ensemble import GradientBoostingClassifier
+import sklearn.svm as svm
+
+
+def _get_best_model(X_train, y_train):
+
+    Cs = [0.001, 0.01, 0.1, 1, 10, 100, 1000]
+    gammas = [0.001, 0.01, 0.1, 1, 5, 10, 100]
+    param_grid = {'kernel':['rbf'], 'C': Cs, 'gamma' : gammas}
+
+    svc = svm.SVC(probability=True)
+    clf = GridSearchCV(svc, param_grid, cv=10, scoring='accuracy', verbose=10)
+
+    clf.fit(X_train, y_train)
+
+    model = clf.best_estimator_
+
+    return model
+
+def svm_model(X_train, y_train):
+
+    return _get_best_model(X_train, y_train)
+
+
+def ensemble_model(X_train, y_train):
+
+    svm_model = _get_best_model(X_train, y_train)
+
+    lr_model = LogisticRegression(solver='liblinear', multi_class='ovr', random_state=1)
+    rf_model = RandomForestClassifier(n_estimators=100, random_state=1)
+
+    ensemble_model = VotingClassifier(estimators=[
+       ('svm', svm_model), ('lr', lr_model), ('rf', rf_model)], voting='soft', weights=[1,1,1])
+
+    ensemble_model.fit(X_train, y_train)
+
+    return ensemble_model
+
+
+def ensemble_model_v2(X_train, y_train):
+
+    svm_model = _get_best_model(X_train, y_train)
+    knc_model = KNeighborsClassifier(n_neighbors=2)
+    gbc_model = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0, max_depth=1, random_state=0)
+    lr_model = LogisticRegression(solver='liblinear', multi_class='ovr', random_state=1)
+    rf_model = RandomForestClassifier(n_estimators=100, random_state=1)
+
+    ensemble_model = VotingClassifier(estimators=[
+       ('lr', lr_model),
+       ('knc', knc_model),
+       ('gbc', gbc_model),
+       ('svm', svm_model),
+       ('rf', rf_model)],
+       voting='soft', weights=[1, 1, 1, 1, 1])
+
+    ensemble_model.fit(X_train, y_train)
+
+    return ensemble_model
+
+def get_trained_model(choice, X_train, y_train):
+
+    if choice == 'svm_model':
+        return svm_model(X_train, y_train)
+
+    if choice == 'ensemble_model':
+        return ensemble_model(X_train, y_train)
+
+    if choice == 'ensemble_model_v2':
+        return ensemble_model_v2(X_train, y_train)
+
+
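
Both ensemble builders use `voting='soft'`, which means the final class is chosen from the weighted average of each estimator's class probabilities. The sketch below reproduces that rule in plain Python for one sample. The function name `soft_vote` and the probability values are made up for illustration and are not part of the repository.

```python
def soft_vote(probabilities, weights):
    """Return (winning class index, weighted average of class probabilities)."""
    total = sum(weights)
    n_classes = len(probabilities[0])
    averaged = [
        sum(w * p[c] for w, p in zip(weights, probabilities)) / total
        for c in range(n_classes)
    ]
    # The predicted class is the one with the highest averaged probability.
    winner = max(range(n_classes), key=averaged.__getitem__)
    return winner, averaged


# Three hypothetical models scoring one sample as [P(class 0), P(class 1)].
votes = [[0.25, 0.75], [0.5, 0.5], [0.75, 0.25]]
winner, averaged = soft_vote(votes, [2, 1, 1])
print(winner, averaged)
```

With equal weights, as in `ensemble_model` and `ensemble_model_v2`, this reduces to a plain mean of the probabilities.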

+ 0 - 0
modules/utils/__init__.py


+ 41 - 0
modules/utils/config.py

@@ -0,0 +1,41 @@
+import numpy as np
+
+zone_folder                     = "zone"
+output_data_folder              = 'data'
+dataset_path                    = 'fichiersSVD_light'
+threshold_map_folder            = 'threshold_map'
+models_information_folder       = 'models_info'
+saved_models_folder             = 'saved_models'
+min_max_custom_folder           = 'custom_norm'
+learned_zones_folder            = 'learned_zones'
+correlation_indices_folder      = 'corr_indices'
+
+csv_model_comparisons_filename  = "models_comparisons.csv"
+seuil_expe_filename             = 'seuilExpe'
+min_max_filename_extension      = "_min_max_values"
+config_filename                 = "config"
+
+models_names_list               = ["svm_model","ensemble_model","ensemble_model_v2","deep_keras"]
+
+# define all scenes values
+renderer_choices                = ['all', 'maxwell', 'igloo', 'cycle']
+
+scenes_names                    = ['Appart1opt02', 'Bureau1', 'Cendrier', 'Cuisine01', 'EchecsBas', 'PNDVuePlongeante', 'SdbCentre', 'SdbDroite', 'Selles']
+scenes_indices                  = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I']
+
+maxwell_scenes_names            = ['Appart1opt02', 'Cuisine01', 'SdbCentre', 'SdbDroite']
+maxwell_scenes_indices          = ['A', 'D', 'G', 'H']
+
+igloo_scenes_names              = ['Bureau1', 'PNDVuePlongeante']
+igloo_scenes_indices            = ['B', 'F']
+
+cycle_scenes_names              = ['EchecsBas', 'Selles']
+cycle_scenes_indices            = ['E', 'I']
+
+normalization_choices           = ['svd', 'svdn', 'svdne']
+zones_indices                   = np.arange(16)
+
+metric_choices_labels           = ['lab', 'mscn', 'low_bits_2', 'low_bits_3', 'low_bits_4', 'low_bits_5', 'low_bits_6','low_bits_4_shifted_2', 'sub_blocks_stats', 'sub_blocks_area', 'sub_blocks_stats_reduced', 'sub_blocks_area_normed', 'mscn_var_4', 'mscn_var_16', 'mscn_var_64', 'mscn_var_16_max', 'mscn_var_64_max', 'ica_diff', 'svd_trunc_diff', 'ipca_diff', 'svd_reconstruct', 'highest_sv_std_filters', 'lowest_sv_std_filters']
+
+keras_epochs                    = 500
+keras_batch                     = 32

+ 360 - 0
modules/utils/data.py

@@ -0,0 +1,360 @@
+from ipfml import processing, metrics, utils
+from modules.utils.config import *
+
+from PIL import Image
+from skimage import color
+from sklearn.decomposition import FastICA
+from sklearn.decomposition import IncrementalPCA
+from sklearn.decomposition import TruncatedSVD
+from numpy.linalg import svd as lin_svd
+
+from scipy.signal import medfilt2d, wiener, cwt
+
+import numpy as np
+
+
+_scenes_names_prefix   = '_scenes_names'
+_scenes_indices_prefix = '_scenes_indices'
+
+# store all variables from current module context
+context_vars = vars()
+
+
+def get_svd_data(data_type, block):
+    """
+    Return the feature vector computed from `block` for the given `data_type`
+    """
+
+    if data_type == 'lab':
+
+        block_file_path = '/tmp/lab_img.png'
+        block.save(block_file_path)
+        data = processing.get_LAB_L_SVD_s(Image.open(block_file_path))
+
+    if data_type == 'mscn':
+
+        img_mscn_revisited = processing.rgb_to_mscn(block)
+
+        # save tmp as img
+        img_output = Image.fromarray(img_mscn_revisited.astype('uint8'), 'L')
+        mscn_revisited_file_path = '/tmp/mscn_revisited_img.png'
+        img_output.save(mscn_revisited_file_path)
+        img_block = Image.open(mscn_revisited_file_path)
+
+        # extract from temp image
+        data = metrics.get_SVD_s(img_block)
+
+    """if data_type == 'mscn':
+
+        img_gray = np.array(color.rgb2gray(np.asarray(block))*255, 'uint8')
+        img_mscn = processing.calculate_mscn_coefficients(img_gray, 7)
+        img_mscn_norm = processing.normalize_2D_arr(img_mscn)
+
+        img_mscn_gray = np.array(img_mscn_norm*255, 'uint8')
+
+        data = metrics.get_SVD_s(img_mscn_gray)
+    """
+
+    if data_type == 'low_bits_6':
+
+        low_bits_6 = processing.rgb_to_LAB_L_low_bits(block, 6)
+        data = metrics.get_SVD_s(low_bits_6)
+
+    if data_type == 'low_bits_5':
+
+        low_bits_5 = processing.rgb_to_LAB_L_low_bits(block, 5)
+        data = metrics.get_SVD_s(low_bits_5)
+
+    if data_type == 'low_bits_4':
+
+        low_bits_4 = processing.rgb_to_LAB_L_low_bits(block, 4)
+        data = metrics.get_SVD_s(low_bits_4)
+
+    if data_type == 'low_bits_3':
+
+        low_bits_3 = processing.rgb_to_LAB_L_low_bits(block, 3)
+        data = metrics.get_SVD_s(low_bits_3)
+
+    if data_type == 'low_bits_2':
+
+        low_bits_2 = processing.rgb_to_LAB_L_low_bits(block, 2)
+        data = metrics.get_SVD_s(low_bits_2)
+
+    if data_type == 'low_bits_4_shifted_2':
+
+        data = metrics.get_SVD_s(processing.rgb_to_LAB_L_bits(block, (3, 6)))
+
+    if data_type == 'sub_blocks_stats':
+
+        block = np.asarray(block)
+        width, height, _= block.shape
+        sub_width, sub_height = int(width / 4), int(height / 4)
+
+        sub_blocks = processing.divide_in_blocks(block, (sub_width, sub_height))
+
+        data = []
+
+        for sub_b in sub_blocks:
+
+            # by default use the whole LAB L channel
+            l_svd_data = np.array(processing.get_LAB_L_SVD_s(sub_b))
+
+            # get information we want from svd
+            data.append(np.mean(l_svd_data))
+            data.append(np.median(l_svd_data))
+            data.append(np.percentile(l_svd_data, 25))
+            data.append(np.percentile(l_svd_data, 75))
+            data.append(np.var(l_svd_data))
+
+            area_under_curve = utils.integral_area_trapz(l_svd_data, dx=100)
+            data.append(area_under_curve)
+
+        # convert into numpy array after computing all stats
+        data = np.asarray(data)
+
+    if data_type == 'sub_blocks_stats_reduced':
+
+        block = np.asarray(block)
+        width, height, _= block.shape
+        sub_width, sub_height = int(width / 4), int(height / 4)
+
+        sub_blocks = processing.divide_in_blocks(block, (sub_width, sub_height))
+
+        data = []
+
+        for sub_b in sub_blocks:
+
+            # by default use the whole LAB L channel
+            l_svd_data = np.array(processing.get_LAB_L_SVD_s(sub_b))
+
+            # get information we want from svd
+            data.append(np.mean(l_svd_data))
+            data.append(np.median(l_svd_data))
+            data.append(np.percentile(l_svd_data, 25))
+            data.append(np.percentile(l_svd_data, 75))
+            data.append(np.var(l_svd_data))
+
+        # convert into numpy array after computing all stats
+        data = np.asarray(data)
+
+    if data_type == 'sub_blocks_area':
+
+        block = np.asarray(block)
+        width, height, _= block.shape
+        sub_width, sub_height = int(width / 8), int(height / 8)
+
+        sub_blocks = processing.divide_in_blocks(block, (sub_width, sub_height))
+
+        data = []
+
+        for sub_b in sub_blocks:
+
+            # by default use the whole LAB L channel
+            l_svd_data = np.array(processing.get_LAB_L_SVD_s(sub_b))
+
+            area_under_curve = utils.integral_area_trapz(l_svd_data, dx=50)
+            data.append(area_under_curve)
+
+        # convert into numpy array after computing all stats
+        data = np.asarray(data)
+
+    if data_type == 'sub_blocks_area_normed':
+
+        block = np.asarray(block)
+        width, height, _= block.shape
+        sub_width, sub_height = int(width / 8), int(height / 8)
+
+        sub_blocks = processing.divide_in_blocks(block, (sub_width, sub_height))
+
+        data = []
+
+        for sub_b in sub_blocks:
+
+            # by default use the whole LAB L channel
+            l_svd_data = np.array(processing.get_LAB_L_SVD_s(sub_b))
+            l_svd_data = utils.normalize_arr(l_svd_data)
+
+            area_under_curve = utils.integral_area_trapz(l_svd_data, dx=50)
+            data.append(area_under_curve)
+
+        # convert into numpy array after computing all stats
+        data = np.asarray(data)
+
+    if data_type == 'mscn_var_4':
+
+        data = _get_mscn_variance(block, (100, 100))
+
+    if data_type == 'mscn_var_16':
+
+        data = _get_mscn_variance(block, (50, 50))
+
+    if data_type == 'mscn_var_64':
+
+        data = _get_mscn_variance(block, (25, 25))
+
+    if data_type == 'mscn_var_16_max':
+
+        data = _get_mscn_variance(block, (50, 50))
+        data = np.asarray(data)
+        size = int(len(data) / 4)
+        indices = data.argsort()[-size:][::-1]
+        data = data[indices]
+
+    if data_type == 'mscn_var_64_max':
+
+        data = _get_mscn_variance(block, (25, 25))
+        data = np.asarray(data)
+        size = int(len(data) / 4)
+        indices = data.argsort()[-size:][::-1]
+        data = data[indices]
+
+    if data_type == 'ica_diff':
+        current_image = metrics.get_LAB_L(block)
+
+        ica = FastICA(n_components=50)
+
+        # fit_transform already fits the model, so no separate fit call is needed
+        image_ica = ica.fit_transform(current_image)
+        image_restored = ica.inverse_transform(image_ica)
+
+        final_image = utils.normalize_2D_arr(image_restored)
+        final_image = np.array(final_image * 255, 'uint8')
+
+        sv_values = utils.normalize_arr(metrics.get_SVD_s(current_image))
+        ica_sv_values = utils.normalize_arr(metrics.get_SVD_s(final_image))
+
+        data = abs(np.array(sv_values) - np.array(ica_sv_values))
+
+    if data_type == 'svd_trunc_diff':
+
+        current_image = metrics.get_LAB_L(block)
+
+        svd = TruncatedSVD(n_components=30, n_iter=100, random_state=42)
+        transformed_image = svd.fit_transform(current_image)
+        restored_image = svd.inverse_transform(transformed_image)
+
+        reduced_image = (current_image - restored_image)
+
+        U, s, V = metrics.get_SVD(reduced_image)
+        data = s
+
+    if data_type == 'ipca_diff':
+
+        current_image = metrics.get_LAB_L(block)
+
+        transformer = IncrementalPCA(n_components=20, batch_size=25)
+        transformed_image = transformer.fit_transform(current_image)
+        restored_image = transformer.inverse_transform(transformed_image)
+
+        reduced_image = (current_image - restored_image)
+
+        U, s, V = metrics.get_SVD(reduced_image)
+        data = s
+
+    if data_type == 'svd_reconstruct':
+
+        reconstructed_interval = (90, 200)
+        begin, end = reconstructed_interval
+
+        lab_img = metrics.get_LAB_L(block)
+        lab_img = np.array(lab_img, 'uint8')
+
+        U, s, V = lin_svd(lab_img, full_matrices=True)
+
+        smat = np.zeros((end-begin, end-begin), dtype=float)
+        smat[:, :] = np.diag(s[begin:end])
+        output_img = np.dot(U[:, begin:end],  np.dot(smat, V[begin:end, :]))
+
+        output_img = np.array(output_img, 'uint8')
+
+        data = metrics.get_SVD_s(output_img)
+
+    if 'sv_std_filters' in data_type:
+
+        # convert into lab by default to apply filters
+        lab_img = metrics.get_LAB_L(block)
+        arr = np.array(lab_img)
+        images = []
+        
+        # Apply list of filter on arr
+        images.append(medfilt2d(arr, [3, 3]))
+        images.append(medfilt2d(arr, [5, 5]))
+        images.append(wiener(arr, [3, 3]))
+        images.append(wiener(arr, [5, 5]))
+        
+        # By default computation of current block image
+        s_arr = metrics.get_SVD_s(arr)
+        sv_vector = [s_arr]
+
+        # for each new image apply SVD and get SV 
+        for img in images:
+            s = metrics.get_SVD_s(img)
+            sv_vector.append(s)
+            
+        sv_array = np.array(sv_vector)
+        
+        # do not name this variable `len`: it would shadow the builtin in the whole function
+        _, nb_values = sv_array.shape
+
+        sv_std = []
+
+        # normalize each column of singular values and compute its standard deviation
+        for i in range(nb_values):
+            sv_array[:, i] = utils.normalize_arr(sv_array[:, i])
+            sv_std.append(np.std(sv_array[:, i]))
+        
+        indices = []
+
+        if 'lowest' in data_type:
+            indices = get_lowest_values(sv_std, 200)
+
+        if 'highest' in data_type:
+            indices = get_highest_values(sv_std, 200)
+
+        # data are arranged following std trend computed
+        data = s_arr[indices]
+
+    return data
+
+
+def get_highest_values(arr, n):
+    return np.array(arr).argsort()[-n:][::-1]
+
+
+def get_lowest_values(arr, n):
+    return np.array(arr).argsort()[:n]
+
+
+def _get_mscn_variance(block, sub_block_size=(50, 50)):
+
+    blocks = processing.divide_in_blocks(block, sub_block_size)
+
+    data = []
+
+    for block in blocks:
+        mscn_coefficients = processing.get_mscn_coefficients(block)
+        flat_coeff = mscn_coefficients.flatten()
+        data.append(np.var(flat_coeff))
+
+    return np.sort(data)
+
+
+def get_renderer_scenes_indices(renderer_name):
+
+    if renderer_name not in renderer_choices:
+        raise ValueError("Unknown renderer name")
+
+    if renderer_name == 'all':
+        return scenes_indices
+    else:
+        return context_vars[renderer_name + _scenes_indices_prefix]
+
+def get_renderer_scenes_names(renderer_name):
+
+    if renderer_name not in renderer_choices:
+        raise ValueError("Unknown renderer name")
+
+    if renderer_name == 'all':
+        return scenes_names
+    else:
+        return context_vars[renderer_name + _scenes_names_prefix]
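
The two lookup functions above resolve a renderer to its scene lists through `vars()`. One possible alternative is sketched below: keep only the indices per renderer in a dictionary and derive the names from the global lists, so each scene name is written once. The names `RENDERER_INDICES` and `renderer_scenes` are hypothetical; the scene values follow `scenes_names` and `scenes_indices` in `modules/utils/config.py`.

```python
SCENES_NAMES = ['Appart1opt02', 'Bureau1', 'Cendrier', 'Cuisine01', 'EchecsBas',
                'PNDVuePlongeante', 'SdbCentre', 'SdbDroite', 'Selles']
SCENES_INDICES = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I']

# Hypothetical table: only the indices are stored per renderer.
RENDERER_INDICES = {
    'all': SCENES_INDICES,
    'maxwell': ['A', 'D', 'G', 'H'],
    'igloo': ['B', 'F'],
    'cycle': ['E', 'I'],
}


def renderer_scenes(renderer_name):
    """Return (indices, names) of the scenes rendered by `renderer_name`."""
    if renderer_name not in RENDERER_INDICES:
        raise ValueError("Unknown renderer name: " + renderer_name)

    name_by_index = dict(zip(SCENES_INDICES, SCENES_NAMES))
    indices = RENDERER_INDICES[renderer_name]
    return indices, [name_by_index[i] for i in indices]


print(renderer_scenes('cycle'))
```

Deriving names from indices avoids keeping two spellings of the same scene in sync.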
+

+ 12 - 0
modules/utils/filters.py

@@ -0,0 +1,12 @@
+import cv2
+import numpy as np
+from scipy.signal import medfilt2d, wiener, cwt
+
+
+def get_filters(arr):
+
+    filters = []
+
+    # TODO : get all needed filters and append to filters array
+    
+    return filters 

+ 145 - 0
predict_noisy_image_svd.py

@@ -0,0 +1,145 @@
+from sklearn.externals import joblib
+
+import numpy as np
+
+from ipfml import processing, utils
+from PIL import Image
+
+import sys, os, argparse, json
+
+from keras.models import model_from_json
+
+from modules.utils import config as cfg
+from modules.utils import data as dt
+
+path                  = cfg.dataset_path
+min_max_ext           = cfg.min_max_filename_extension
+metric_choices        = cfg.metric_choices_labels
+normalization_choices = cfg.normalization_choices
+
+custom_min_max_folder = cfg.min_max_custom_folder
+
+def main():
+
+    # getting all params
+    parser = argparse.ArgumentParser(description="Script which detects if an image is noisy or not using specific model")
+
+    parser.add_argument('--image', type=str, help='Image path')
+    parser.add_argument('--interval', type=str, help='Interval value to keep from svd', default='0,200')
+    parser.add_argument('--model', type=str, help='.joblib or .json file (sklearn or keras model)')
+    parser.add_argument('--mode', type=str, help='Kind of normalization level wished', choices=normalization_choices)
+    parser.add_argument('--metric', type=str, help='Metric data choice', choices=metric_choices)
+    parser.add_argument('--custom', type=str, help='Name of custom min max file if use of renormalization of data', default=False)
+
+    args = parser.parse_args()
+
+    p_img_file   = args.image
+    p_model_file = args.model
+    p_interval   = list(map(int, args.interval.split(',')))
+    p_mode       = args.mode
+    p_metric     = args.metric
+    p_custom     = args.custom
+
+    if '.joblib' in p_model_file:
+        kind_model = 'sklearn'
+    elif '.json' in p_model_file:
+        kind_model = 'keras'
+    else:
+        raise ValueError("Model file must be a .joblib or a .json file")
+
+    if 'corr' in p_model_file:
+        corr_model = True
+
+        indices_corr_path = os.path.join(cfg.correlation_indices_folder, p_model_file.split('/')[1].replace('.json', '').replace('.joblib', '') + '.csv')
+
+        with open(indices_corr_path, 'r') as f:
+            data_corr_indices = [int(x) for x in f.readline().split(';') if x != '']
+    else:
+        corr_model = False
+
+
+    if kind_model == 'sklearn':
+        # load of model file
+        model = joblib.load(p_model_file)
+
+    if kind_model == 'keras':
+        with open(p_model_file, 'r') as f:
+            json_model = json.load(f)
+            model = model_from_json(json_model)
+            model.load_weights(p_model_file.replace('.json', '.h5'))
+
+            model.compile(loss='binary_crossentropy',
+                        optimizer='adam',
+                        metrics=['accuracy'])
+
+    # load image
+    img = Image.open(p_img_file)
+
+    data = dt.get_svd_data(p_metric, img)
+
+    # get interval values
+    begin, end = p_interval
+
+    # check if custom min max file is used
+    if p_custom:
+
+        if corr_model:
+            test_data = data[data_corr_indices]
+        else:
+            test_data = data[begin:end]
+
+        if p_mode == 'svdne':
+
+            # set min_max_filename if custom use
+            min_max_file_path = custom_min_max_folder + '/' +  p_custom
+
+            # need to read min_max_file
+            file_path = os.path.join(os.path.dirname(__file__), min_max_file_path)
+            with open(file_path, 'r') as f:
+                min_val = float(f.readline().replace('\n', ''))
+                max_val = float(f.readline().replace('\n', ''))
+
+            test_data = utils.normalize_arr_with_range(test_data, min_val, max_val)
+
+        if p_mode == 'svdn':
+            test_data = utils.normalize_arr(test_data)
+
+    else:
+
+        # check mode to normalize data
+        if p_mode == 'svdne':
+
+            # use the default min max file of the metric
+            min_max_file_path = path + '/' + p_metric + min_max_ext
+
+            # need to read min_max_file
+            file_path = os.path.join(os.path.dirname(__file__), min_max_file_path)
+            with open(file_path, 'r') as f:
+                min_val = float(f.readline().replace('\n', ''))
+                max_val = float(f.readline().replace('\n', ''))
+
+            l_values = utils.normalize_arr_with_range(data, min_val, max_val)
+
+        elif p_mode == 'svdn':
+            l_values = utils.normalize_arr(data)
+        else:
+            l_values = data
+
+        # select features from the normalized values computed above
+        if corr_model:
+            test_data = np.asarray(l_values)[data_corr_indices]
+        else:
+            test_data = np.asarray(l_values)[begin:end]
+
+
+    # get prediction of model
+    if kind_model == 'sklearn':
+        prediction = model.predict([test_data])[0]
+
+    if kind_model == 'keras':
+        test_data = np.asarray(test_data).reshape(1, len(test_data), 1)
+        prediction = model.predict_classes([test_data])[0][0]
+
+    # output expected by other scripts
+    print(prediction)
+
+if __name__== "__main__":
+    main()
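
The custom `svdne` branch above selects an interval of the feature vector and rescales it with bounds read from a file. The sketch below shows that sequence in plain Python. It assumes that `utils.normalize_arr_with_range` performs ordinary min-max scaling, which this diff does not confirm; `normalize_with_range` and `parse_interval` are hypothetical stand-ins, and the values are made up.

```python
def parse_interval(text):
    """Turn a 'begin,end' string into a pair of integers."""
    begin, end = (int(part) for part in text.split(','))
    return begin, end


def normalize_with_range(values, min_value, max_value):
    """Min-max scale `values` with bounds computed elsewhere (assumed behaviour)."""
    span = max_value - min_value
    return [(v - min_value) / span for v in values]


singular_values = [10.0, 8.0, 6.0, 4.0, 2.0]
begin, end = parse_interval('1, 4')

# Slice first, then normalize, as in the custom branch of the script.
features = normalize_with_range(singular_values[begin:end], 2.0, 10.0)
print(features)
```

Note that `parse_interval` expects the value without surrounding quote characters, for example `--interval "0,200"` typed in a shell.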

+ 232 - 0
predict_seuil_expe.py

@@ -0,0 +1,232 @@
+from sklearn.externals import joblib
+
+import numpy as np
+
+from ipfml import processing, utils
+from PIL import Image
+
+import sys, os, getopt
+import subprocess
+import time
+
+from modules.utils import config as cfg
+
+config_filename           = cfg.config_filename
+scenes_path               = cfg.dataset_path
+min_max_filename          = cfg.min_max_filename_extension
+threshold_expe_filename   = cfg.seuil_expe_filename
+
+threshold_map_folder      = cfg.threshold_map_folder
+threshold_map_file_prefix = cfg.threshold_map_folder + "_"
+
+zones                     = cfg.zones_indices
+
+tmp_filename              = '/tmp/__model__img_to_predict.png'
+
+current_dirpath = os.getcwd()
+
+def main():
+
+    p_custom = False
+
+    # TODO : use of argparse
+    
+    if len(sys.argv) <= 1:
+        print('Missing parameters, expected usage:')
+        print('python predict_seuil_expe.py --interval "0,20" --model path/to/xxxx.joblib --mode svdn --metric lab --limit_detection xx --custom min_max_filename')
+        sys.exit(2)
+    try:
+        opts, args = getopt.getopt(sys.argv[1:], "ht:m:o:l:c:", ["help=", "interval=", "model=", "mode=", "metric=", "limit_detection=", "custom="])
+    except getopt.GetoptError:
+        # print help information and exit:
+        print('python predict_seuil_expe.py --interval "xx,xx" --model path/to/xxxx.joblib --mode svdn --metric lab --limit_detection xx --custom min_max_filename')
+        sys.exit(2)
+    for o, a in opts:
+        if o == "-h":
+            print('python predict_seuil_expe.py --interval "xx,xx" --model path/to/xxxx.joblib --mode svdn --metric lab --limit_detection xx --custom min_max_filename')
+            sys.exit()
+        elif o in ("-t", "--interval"):
+            p_interval = a
+        elif o in ("-mo", "--model"):
+            p_model_file = a
+        elif o in ("-o", "--mode"):
+            p_mode = a
+
+            if p_mode != 'svdn' and p_mode != 'svdne' and p_mode != 'svd':
+                assert False, "Mode not recognized"
+
+        elif o in ("-me", "--metric"):
+            p_metric = a
+        elif o in ("-l", "--limit_detection"):
+            p_limit = int(a)
+        elif o in ("-c", "--custom"):
+            p_custom = a
+        else:
+            assert False, "unhandled option"
+
+    scenes = os.listdir(scenes_path)
+
+    scenes = [s for s in scenes if not min_max_filename in s]
+
+    # iterate over each scene
+    for id_scene, folder_scene in enumerate(scenes):
+
+        print(folder_scene)
+
+        scene_path = os.path.join(scenes_path, folder_scene)
+
+        config_path = os.path.join(scene_path, config_filename)
+
+        with open(config_path, "r") as config_file:
+            last_image_name = config_file.readline().strip()
+            prefix_image_name = config_file.readline().strip()
+            start_index_image = config_file.readline().strip()
+            end_index_image = config_file.readline().strip()
+            step_counter = int(config_file.readline().strip())
+
+        threshold_expes = []
+        threshold_expes_detected = []
+        threshold_expes_counter = []
+        threshold_expes_found = []
+
+        # get zones list info
+        for index in zones:
+            index_str = str(index)
+            if len(index_str) < 2:
+                index_str = "0" + index_str
+            zone_folder = "zone"+index_str
+
+            threshold_path_file = os.path.join(os.path.join(scene_path, zone_folder), threshold_expe_filename)
+
+            with open(threshold_path_file) as f:
+                threshold = int(f.readline())
+                threshold_expes.append(threshold)
+
+                # Initialize default data to get detected model threshold found
+                threshold_expes_detected.append(False)
+                threshold_expes_counter.append(0)
+                threshold_expes_found.append(int(end_index_image)) # by default use max
+
+        current_counter_index = int(start_index_image)
+        end_counter_index = int(end_index_image)
+
+        print(current_counter_index)
+        check_all_done = False
+
+        while(current_counter_index <= end_counter_index and not check_all_done):
+
+            current_counter_index_str = str(current_counter_index)
+
+            while len(start_index_image) > len(current_counter_index_str):
+                current_counter_index_str = "0" + current_counter_index_str
+
+            img_path = os.path.join(scene_path, prefix_image_name + current_counter_index_str + ".png")
+
+            current_img = Image.open(img_path)
+            img_blocks = processing.divide_in_blocks(current_img, (200, 200))
+
+
+            check_all_done = all(d == True for d in threshold_expes_detected)
+
+            for id_block, block in enumerate(img_blocks):
+
+                # check only if necessary for this scene (not already detected)
+                if not threshold_expes_detected[id_block]:
+
+                    tmp_file_path = tmp_filename.replace('__model__',  p_model_file.split('/')[-1].replace('.joblib', '_'))
+                    block.save(tmp_file_path)
+
+                    python_cmd = "python predict_noisy_image_svd.py --image " + tmp_file_path + \
+                                    " --interval '" + p_interval + \
+                                    "' --model " + p_model_file  + \
+                                    " --mode " + p_mode + \
+                                    " --metric " + p_metric
+
+                    # specify use of custom file for min max normalization
+                    if p_custom:
+                        python_cmd = python_cmd + ' --custom ' + p_custom
+
+
+                    ## call command ##
+                    p = subprocess.Popen(python_cmd, stdout=subprocess.PIPE, shell=True)
+
+                    (output, err) = p.communicate()
+
+                    ## Wait for result ##
+                    p_status = p.wait()
+
+                    prediction = int(output)
+
+                    if prediction == 0:
+                        threshold_expes_counter[id_block] = threshold_expes_counter[id_block] + 1
+                    else:
+                        threshold_expes_counter[id_block] = 0
+
+                    if threshold_expes_counter[id_block] == p_limit:
+                        threshold_expes_detected[id_block] = True
+                        threshold_expes_found[id_block] = current_counter_index
+
+                    print(str(id_block) + " : " + str(current_counter_index) + "/" + str(threshold_expes[id_block]) + " => " + str(prediction))
+
+            current_counter_index += step_counter
+            print("------------------------")
+            print("Scene " + str(id_scene + 1) + "/" + str(len(scenes)))
+            print("------------------------")
+
+        # end of scene => display of results
+
+        # construct path using model name for saving threshold map folder
+        model_threshold_path = os.path.join(threshold_map_folder, p_model_file.split('/')[-1].replace('.joblib', ''))
+
+        # create threshold model path if necessary
+        if not os.path.exists(model_threshold_path):
+            os.makedirs(model_threshold_path)
+
+        abs_dist = []
+
+        map_filename = os.path.join(model_threshold_path, threshold_map_file_prefix + folder_scene)
+        f_map = open(map_filename, 'w')
+
+        line_information = ""
+
+        # default header
+        f_map.write('|  |    |    |  |\n')
+        f_map.write('---|----|----|---\n')
+        for id, threshold in enumerate(threshold_expes_found):
+
+            line_information += str(threshold) + " / " + str(threshold_expes[id]) + " | "
+            abs_dist.append(abs(threshold - threshold_expes[id]))
+
+            if (id + 1) % 4 == 0:
+                f_map.write(line_information + '\n')
+                line_information = ""
+
+        f_map.write(line_information + '\n')
+
+        min_abs_dist = min(abs_dist)
+        max_abs_dist = max(abs_dist)
+        avg_abs_dist = sum(abs_dist) / len(abs_dist)
+
+        f_map.write('\nScene information : ')
+        f_map.write('\n- BEGIN : ' + str(start_index_image))
+        f_map.write('\n- END : ' + str(end_index_image))
+
+        f_map.write('\n\nDistances information : ')
+        f_map.write('\n- MIN : ' + str(min_abs_dist))
+        f_map.write('\n- MAX : ' + str(max_abs_dist))
+        f_map.write('\n- AVG : ' + str(avg_abs_dist))
+
+        f_map.write('\n\nOther information : ')
+        f_map.write('\n- Detection limit : ' + str(p_limit))
+
+        # by default print last line
+        f_map.close()
+
+        print("Scene " + str(id_scene + 1) + "/" + str(len(scenes)) + " Done..")
+        print("------------------------")
+
+        time.sleep(10)
+
+
+if __name__== "__main__":
+    main()

+ 237 - 0
predict_seuil_expe_maxwell.py

@@ -0,0 +1,237 @@
+import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
+
+import numpy as np
+
+from ipfml import processing
+from PIL import Image
+
+import sys, os, getopt
+import subprocess
+import time
+
+
+from modules.utils import config as cfg
+
+config_filename           = cfg.config_filename
+scenes_path               = cfg.dataset_path
+min_max_filename          = cfg.min_max_filename_extension
+threshold_expe_filename   = cfg.seuil_expe_filename
+
+threshold_map_folder      = cfg.threshold_map_folder
+threshold_map_file_prefix = cfg.threshold_map_folder + "_"
+
+zones                     = cfg.zones_indices
+maxwell_scenes            = cfg.maxwell_scenes_names
+
+tmp_filename              = '/tmp/__model__img_to_predict.png'
+
+current_dirpath = os.getcwd()
+
+def main():
+
+    # by default..
+    p_custom = False
+
+    # TODO : use of argparse
+
+    if len(sys.argv) <= 1:
+        print('Run with default parameters...')
+        print('python predict_seuil_expe_maxwell.py --interval "0,20" --model path/to/xxxx.joblib --mode svdn --metric lab --limit_detection xx --custom min_max_filename')
+        sys.exit(2)
+    try:
+        opts, args = getopt.getopt(sys.argv[1:], "ht:m:o:l:c:", ["help", "interval=", "model=", "mode=", "metric=", "limit_detection=", "custom="])
+    except getopt.GetoptError:
+        # print help information and exit:
+        print('python predict_seuil_expe_maxwell.py --interval "xx,xx" --model path/to/xxxx.joblib --mode svdn --metric lab --limit_detection xx --custom min_max_filename')
+        sys.exit(2)
+    for o, a in opts:
+        if o in ("-h", "--help"):
+            print('python predict_seuil_expe_maxwell.py --interval "xx,xx" --model path/to/xxxx.joblib --mode svdn --metric lab --limit_detection xx --custom min_max_filename')
+            sys.exit()
+        elif o in ("-t", "--interval"):
+            p_interval = a
+        elif o in ("-m", "--model"):
+            p_model_file = a
+        elif o in ("-o", "--mode"):
+            p_mode = a
+
+            if p_mode != 'svdn' and p_mode != 'svdne' and p_mode != 'svd':
+                assert False, "Mode not recognized"
+
+        elif o == "--metric":  # no short form: "-m" is already taken by --model
+            p_metric = a
+        elif o in ("-l", "--limit_detection"):
+            p_limit = int(a)
+        elif o in ("-c", "--custom"):
+            p_custom = a
+        else:
+            assert False, "unhandled option"
+
+    scenes = os.listdir(scenes_path)
+
+    scenes = [s for s in scenes if s in maxwell_scenes]
+
+    # iterate over each scene
+    for id_scene, folder_scene in enumerate(scenes):
+
+        # only take maxwell scenes into consideration
+        if folder_scene in maxwell_scenes:
+
+            print(folder_scene)
+
+            scene_path = os.path.join(scenes_path, folder_scene)
+
+            config_path = os.path.join(scene_path, config_filename)
+
+            with open(config_path, "r") as config_file:
+                last_image_name = config_file.readline().strip()
+                prefix_image_name = config_file.readline().strip()
+                start_index_image = config_file.readline().strip()
+                end_index_image = config_file.readline().strip()
+                step_counter = int(config_file.readline().strip())
+
+            threshold_expes = []
+            threshold_expes_detected = []
+            threshold_expes_counter = []
+            threshold_expes_found = []
+
+            # get zones list info
+            for index in zones:
+                index_str = str(index)
+                if len(index_str) < 2:
+                    index_str = "0" + index_str
+                zone_folder = "zone"+index_str
+
+                threshold_path_file = os.path.join(os.path.join(scene_path, zone_folder), threshold_expe_filename)
+
+                with open(threshold_path_file) as f:
+                    threshold = int(f.readline())
+                    threshold_expes.append(threshold)
+
+                    # Initialize default data to get detected model threshold found
+                    threshold_expes_detected.append(False)
+                    threshold_expes_counter.append(0)
+                    threshold_expes_found.append(int(end_index_image)) # by default use max
+
+            current_counter_index = int(start_index_image)
+            end_counter_index = int(end_index_image)
+
+            print(current_counter_index)
+            check_all_done = False
+
+            while(current_counter_index <= end_counter_index and not check_all_done):
+
+                current_counter_index_str = str(current_counter_index)
+
+                while len(start_index_image) > len(current_counter_index_str):
+                    current_counter_index_str = "0" + current_counter_index_str
+
+                img_path = os.path.join(scene_path, prefix_image_name + current_counter_index_str + ".png")
+
+                current_img = Image.open(img_path)
+                img_blocks = processing.divide_in_blocks(current_img, (200, 200))
+
+
+                check_all_done = all(d == True for d in threshold_expes_detected)
+
+                for id_block, block in enumerate(img_blocks):
+
+                    # check only if necessary for this scene (not already detected)
+                    if not threshold_expes_detected[id_block]:
+
+                        tmp_file_path = tmp_filename.replace('__model__',  p_model_file.split('/')[-1].replace('.joblib', '_'))
+                        block.save(tmp_file_path)
+
+                        python_cmd = "python predict_noisy_image_svd.py --image " + tmp_file_path + \
+                                        " --interval '" + p_interval + \
+                                        "' --model " + p_model_file  + \
+                                        " --mode " + p_mode + \
+                                        " --metric " + p_metric
+
+                        # specify use of custom file for min max normalization
+                        if p_custom:
+                            python_cmd = python_cmd + ' --custom ' + p_custom
+
+                        ## call command ##
+                        p = subprocess.Popen(python_cmd, stdout=subprocess.PIPE, shell=True)
+
+                        (output, err) = p.communicate()
+
+                        ## Wait for result ##
+                        p_status = p.wait()
+
+                        prediction = int(output)
+
+                        if prediction == 0:
+                            threshold_expes_counter[id_block] = threshold_expes_counter[id_block] + 1
+                        else:
+                            threshold_expes_counter[id_block] = 0
+
+                        if threshold_expes_counter[id_block] == p_limit:
+                            threshold_expes_detected[id_block] = True
+                            threshold_expes_found[id_block] = current_counter_index
+
+                        print(str(id_block) + " : " + str(current_counter_index) + "/" + str(threshold_expes[id_block]) + " => " + str(prediction))
+
+                current_counter_index += step_counter
+                print("------------------------")
+                print("Scene " + str(id_scene + 1) + "/" + str(len(scenes)))
+                print("------------------------")
+
+            # end of scene => display of results
+
+            # construct path using model name for saving threshold map folder
+            model_threshold_path = os.path.join(threshold_map_folder, p_model_file.split('/')[-1].replace('.joblib', ''))
+
+            # create threshold model path if necessary
+            if not os.path.exists(model_threshold_path):
+                os.makedirs(model_threshold_path)
+
+            abs_dist = []
+
+            map_filename = os.path.join(model_threshold_path, threshold_map_file_prefix + folder_scene)
+            f_map = open(map_filename, 'w')
+
+            line_information = ""
+
+            # default header
+            f_map.write('|  |    |    |  |\n')
+            f_map.write('---|----|----|---\n')
+            for id, threshold in enumerate(threshold_expes_found):
+
+                line_information += str(threshold) + " / " + str(threshold_expes[id]) + " | "
+                abs_dist.append(abs(threshold - threshold_expes[id]))
+
+                if (id + 1) % 4 == 0:
+                    f_map.write(line_information + '\n')
+                    line_information = ""
+
+            f_map.write(line_information + '\n')
+
+            min_abs_dist = min(abs_dist)
+            max_abs_dist = max(abs_dist)
+            avg_abs_dist = sum(abs_dist) / len(abs_dist)
+
+            f_map.write('\nScene information : ')
+            f_map.write('\n- BEGIN : ' + str(start_index_image))
+            f_map.write('\n- END : ' + str(end_index_image))
+
+            f_map.write('\n\nDistances information : ')
+            f_map.write('\n- MIN : ' + str(min_abs_dist))
+            f_map.write('\n- MAX : ' + str(max_abs_dist))
+            f_map.write('\n- AVG : ' + str(avg_abs_dist))
+
+            f_map.write('\n\nOther information : ')
+            f_map.write('\n- Detection limit : ' + str(p_limit))
+
+            # by default print last line
+            f_map.close()
+
+            print("Scene " + str(id_scene + 1) + "/" + str(len(scenes)) + " Done..")
+            print("------------------------")
+
+            time.sleep(10)
+
+
+if __name__== "__main__":
+    main()

+ 196 - 0
predict_seuil_expe_maxwell_curve.py

@@ -0,0 +1,196 @@
+import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
+
+import numpy as np
+
+from ipfml import processing
+from PIL import Image
+
+import sys, os, getopt
+import subprocess
+import time
+
+from modules.utils import config as cfg
+
+config_filename           = cfg.config_filename
+scenes_path               = cfg.dataset_path
+min_max_filename          = cfg.min_max_filename_extension
+threshold_expe_filename   = cfg.seuil_expe_filename
+
+threshold_map_folder      = cfg.threshold_map_folder
+threshold_map_file_prefix = cfg.threshold_map_folder + "_"
+
+zones                     = cfg.zones_indices
+maxwell_scenes            = cfg.maxwell_scenes_names
+
+simulation_curves_zones   = "simulation_curves_zones_"
+tmp_filename              = '/tmp/__model__img_to_predict.png'
+
+current_dirpath = os.getcwd()
+
+
+def main():
+
+    p_custom = False
+
+    # TODO : use of argparse
+    
+    if len(sys.argv) <= 1:
+        print('Run with default parameters...')
+        print('python predict_seuil_expe_maxwell_curve.py --interval "0,20" --model path/to/xxxx.joblib --mode svdn --metric lab --limit_detection xx --custom min_max_filename')
+        sys.exit(2)
+    try:
+        opts, args = getopt.getopt(sys.argv[1:], "ht:m:o:l:c:", ["help", "interval=", "model=", "mode=", "metric=", "limit_detection=", "custom="])
+    except getopt.GetoptError:
+        # print help information and exit:
+        print('python predict_seuil_expe_maxwell_curve.py --interval "xx,xx" --model path/to/xxxx.joblib --mode svdn --metric lab --limit_detection xx --custom min_max_filename')
+        sys.exit(2)
+    for o, a in opts:
+        if o in ("-h", "--help"):
+            print('python predict_seuil_expe_maxwell_curve.py --interval "xx,xx" --model path/to/xxxx.joblib --mode svdn --metric lab --limit_detection xx --custom min_max_filename')
+            sys.exit()
+        elif o in ("-t", "--interval"):
+            p_interval = a
+        elif o in ("-m", "--model"):
+            p_model_file = a
+        elif o in ("-o", "--mode"):
+            p_mode = a
+
+            if p_mode != 'svdn' and p_mode != 'svdne' and p_mode != 'svd':
+                assert False, "Mode not recognized"
+
+        elif o == "--metric":  # no short form: "-m" is already taken by --model
+            p_metric = a
+        elif o in ("-l", "--limit_detection"):
+            p_limit = int(a)
+        elif o in ("-c", "--custom"):
+            p_custom = a
+        else:
+            assert False, "unhandled option"
+
+    scenes = os.listdir(scenes_path)
+
+    scenes = [s for s in scenes if s in maxwell_scenes]
+
+    print(scenes)
+
+    # iterate over each scene
+    for id_scene, folder_scene in enumerate(scenes):
+
+        # only take maxwell scenes into consideration
+        if folder_scene in maxwell_scenes:
+
+            print(folder_scene)
+
+            scene_path = os.path.join(scenes_path, folder_scene)
+
+            config_path = os.path.join(scene_path, config_filename)
+
+            with open(config_path, "r") as config_file:
+                last_image_name = config_file.readline().strip()
+                prefix_image_name = config_file.readline().strip()
+                start_index_image = config_file.readline().strip()
+                end_index_image = config_file.readline().strip()
+                step_counter = int(config_file.readline().strip())
+
+            threshold_expes = []
+            threshold_expes_found = []
+            block_predictions_str = []
+
+            # get zones list info
+            for index in zones:
+                index_str = str(index)
+                if len(index_str) < 2:
+                    index_str = "0" + index_str
+                zone_folder = "zone"+index_str
+
+                threshold_path_file = os.path.join(os.path.join(scene_path, zone_folder), threshold_expe_filename)
+
+                with open(threshold_path_file) as f:
+                    threshold = int(f.readline())
+                    threshold_expes.append(threshold)
+
+                    # Initialize default data to get detected model threshold found
+                    threshold_expes_found.append(int(end_index_image)) # by default use max
+
+                block_predictions_str.append(index_str + ";" + p_model_file + ";" + str(threshold) + ";" + str(start_index_image) + ";" + str(step_counter))
+
+            current_counter_index = int(start_index_image)
+            end_counter_index = int(end_index_image)
+
+            print(current_counter_index)
+
+            while(current_counter_index <= end_counter_index):
+
+                current_counter_index_str = str(current_counter_index)
+
+                while len(start_index_image) > len(current_counter_index_str):
+                    current_counter_index_str = "0" + current_counter_index_str
+
+                img_path = os.path.join(scene_path, prefix_image_name + current_counter_index_str + ".png")
+
+                current_img = Image.open(img_path)
+                img_blocks = processing.divide_in_blocks(current_img, (200, 200))
+
+                for id_block, block in enumerate(img_blocks):
+
+                    # check only if necessary for this scene (not already detected)
+                    #if not threshold_expes_detected[id_block]:
+
+                        tmp_file_path = tmp_filename.replace('__model__',  p_model_file.split('/')[-1].replace('.joblib', '_'))
+                        block.save(tmp_file_path)
+
+                        python_cmd = "python predict_noisy_image_svd.py --image " + tmp_file_path + \
+                                        " --interval '" + p_interval + \
+                                        "' --model " + p_model_file  + \
+                                        " --mode " + p_mode + \
+                                        " --metric " + p_metric
+
+                        # specify use of custom file for min max normalization
+                        if p_custom:
+                            python_cmd = python_cmd + ' --custom ' + p_custom
+
+                        ## call command ##
+                        p = subprocess.Popen(python_cmd, stdout=subprocess.PIPE, shell=True)
+
+                        (output, err) = p.communicate()
+
+                        ## Wait for result ##
+                        p_status = p.wait()
+
+                        prediction = int(output)
+
+                        # save here in specific file of block all the predictions done
+                        block_predictions_str[id_block] = block_predictions_str[id_block] + ";" + str(prediction)
+
+                        print(str(id_block) + " : " + str(current_counter_index) + "/" + str(threshold_expes[id_block]) + " => " + str(prediction))
+
+                current_counter_index += step_counter
+                print("------------------------")
+                print("Scene " + str(id_scene + 1) + "/" + str(len(scenes)))
+                print("------------------------")
+
+            # end of scene => display of results
+
+            # construct path using model name for saving threshold map folder
+            model_threshold_path = os.path.join(threshold_map_folder, p_model_file.split('/')[-1].replace('.joblib', ''))
+
+            # create threshold model path if necessary
+            if not os.path.exists(model_threshold_path):
+                os.makedirs(model_threshold_path)
+
+            map_filename = os.path.join(model_threshold_path, simulation_curves_zones + folder_scene)
+            f_map = open(map_filename, 'w')
+
+            for line in block_predictions_str:
+                f_map.write(line + '\n')
+            f_map.close()
+
+            print("Scene " + str(id_scene + 1) + "/" + str(len(scenes)) + " Done..")
+            print("------------------------")
+
+            print("Model predictions are saved into %s" % map_filename)
+            time.sleep(10)
+
+
+if __name__== "__main__":
+    main()

+ 123 - 0
prediction_scene.py

@@ -0,0 +1,123 @@
+import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
+
+import numpy as np
+
+import pandas as pd
+from sklearn.metrics import accuracy_score
+from keras.models import Sequential
+from keras.layers import Conv1D, MaxPooling1D
+from keras.layers import Activation, Dropout, Flatten, Dense, BatchNormalization
+from keras import backend as K
+from keras.models import model_from_json
+from keras.wrappers.scikit_learn import KerasClassifier
+
+import sys, os, getopt
+import json
+
+from modules.utils import config as cfg
+
+output_model_folder = cfg.saved_models_folder
+
+def main():
+
+    # TODO : use of argparse
+    p_scene = None  # --scene is optional; this default avoids a NameError when it is omitted
+    if len(sys.argv) <= 1:
+        print('Run with default parameters...')
+        print('python prediction_scene.py --data xxxx.csv --model xxxx.joblib --output xxxx --scene xxxx')
+        sys.exit(2)
+    try:
+        opts, args = getopt.getopt(sys.argv[1:], "hd:m:o:s:", ["help", "data=", "model=", "output=", "scene="])
+    except getopt.GetoptError:
+        # print help information and exit:
+        print('python prediction_scene.py --data xxxx.csv --model xxxx.joblib --output xxxx --scene xxxx')
+        sys.exit(2)
+    for o, a in opts:
+        if o in ("-h", "--help"):
+            print('python prediction_scene.py --data xxxx.csv --model xxxx.joblib --output xxxx --scene xxxx')
+            sys.exit()
+        elif o in ("-d", "--data"):
+            p_data_file = a
+        elif o in ("-m", "--model"):
+            p_model_file = a
+        elif o in ("-o", "--output"):
+            p_output = a
+        elif o in ("-s", "--scene"):
+            p_scene = a
+        else:
+            assert False, "unhandled option"
+
+    if '.joblib' in p_model_file:
+        kind_model = 'sklearn'
+        model_ext = '.joblib'
+
+    if '.json' in p_model_file:
+        kind_model = 'keras'
+        model_ext = '.json'
+
+    if not os.path.exists(output_model_folder):
+        os.makedirs(output_model_folder)
+
+    dataset = pd.read_csv(p_data_file, header=None, sep=";")
+
+    y_dataset = dataset.iloc[:, 0]
+    x_dataset = dataset.iloc[:, 1:]
+
+    noisy_dataset = dataset[dataset.iloc[:, 0] == 1]
+    not_noisy_dataset = dataset[dataset.iloc[:, 0] == 0]
+
+    y_noisy_dataset = noisy_dataset.iloc[:, 0]
+    x_noisy_dataset = noisy_dataset.iloc[:, 1:]
+
+    y_not_noisy_dataset = not_noisy_dataset.iloc[:, 0]
+    x_not_noisy_dataset = not_noisy_dataset.iloc[:, 1:]
+
+    if kind_model == 'keras':
+        with open(p_model_file, 'r') as f:
+            json_model = json.load(f)
+            model = model_from_json(json_model)
+            model.load_weights(p_model_file.replace('.json', '.h5'))
+
+            model.compile(loss='binary_crossentropy',
+                  optimizer='adam',
+                  metrics=['accuracy'])
+
+        _, vector_size = np.array(x_dataset).shape
+
+        # reshape all data
+        x_dataset = np.array(x_dataset).reshape(len(x_dataset), vector_size, 1)
+        x_noisy_dataset = np.array(x_noisy_dataset).reshape(len(x_noisy_dataset), vector_size, 1)
+        x_not_noisy_dataset = np.array(x_not_noisy_dataset).reshape(len(x_not_noisy_dataset), vector_size, 1)
+
+
+    if kind_model == 'sklearn':
+        model = joblib.load(p_model_file)
+
+    if kind_model == 'keras':
+        y_pred = model.predict_classes(x_dataset)
+        y_noisy_pred = model.predict_classes(x_noisy_dataset)
+        y_not_noisy_pred = model.predict_classes(x_not_noisy_dataset)
+
+    if kind_model == 'sklearn':
+        y_pred = model.predict(x_dataset)
+        y_noisy_pred = model.predict(x_noisy_dataset)
+        y_not_noisy_pred = model.predict(x_not_noisy_dataset)
+
+    accuracy_global = accuracy_score(y_dataset, y_pred)
+    accuracy_noisy = accuracy_score(y_noisy_dataset, y_noisy_pred)
+    accuracy_not_noisy = accuracy_score(y_not_noisy_dataset, y_not_noisy_pred)
+
+    if(p_scene):
+        print(p_scene + " | " + str(accuracy_global) + " | " + str(accuracy_noisy) + " | " + str(accuracy_not_noisy))
+    else:
+        print(str(accuracy_global) + " \t | " + str(accuracy_noisy) + " \t | " + str(accuracy_not_noisy))
+
+        with open(p_output, 'w') as f:
+            f.write("Global accuracy found %s\n" % str(accuracy_global))
+            f.write("Noisy accuracy found %s\n" % str(accuracy_noisy))
+            f.write("Not noisy accuracy found %s\n" % str(accuracy_not_noisy))
+            for prediction in y_pred:
+                f.write(str(prediction) + '\n')
+
+if __name__== "__main__":
+    main()

+ 11 - 0
requirements.txt

@@ -0,0 +1,11 @@
+IPFML
+scikit-learn
+scikit-image
+tensorflow
+keras
+image_slicer
+Pillow
+pydot
+matplotlib
+path.py
+pandas

+ 24 - 0
runAll_maxwell.sh

@@ -0,0 +1,24 @@
+#!/bin/bash
+
+# erase "models_info/models_comparisons.csv" file and write new header
+file_path='models_info/models_comparisons.csv'
+
+erased=$1
+
+if [ "${erased}" == "Y" ]; then
+    echo "Previous data file erased..."
+    rm -f "${file_path}"
+    mkdir -p models_info
+    touch ${file_path}
+
+    # add header
+    echo 'model_name; vector_size; start; end; nb_zones; metric; mode; train_size; val_size; test_size; train_pct_size; val_pct_size; test_pct_size; train_acc; val_acc; test_acc; all_acc; F1_train; recall_train; roc_auc_train; F1_val; recall_val; roc_auc_val; F1_test; recall_test; roc_auc_test; F1_all; recall_all; roc_auc_all;' >> ${file_path}
+
+fi
+
+for size in {"4","8","16","26","32","40"}; do
+
+    for metric in {"lab","mscn","low_bits_2","low_bits_3","low_bits_4","low_bits_5","low_bits_6","low_bits_4_shifted_2"}; do
+        bash generateAndTrain_maxwell.sh ${size} ${metric}
+    done
+done

+ 24 - 0
runAll_maxwell_custom.sh

@@ -0,0 +1,24 @@
+#!/bin/bash
+
+# erase "models_info/models_comparisons.csv" file and write new header
+file_path='models_info/models_comparisons.csv'
+
+erased=$1
+
+if [ "${erased}" == "Y" ]; then
+    echo "Previous data file erased..."
+    rm -f "${file_path}"
+    mkdir -p models_info
+    touch ${file_path}
+
+    # add header
+    echo 'model_name; vector_size; start; end; nb_zones; metric; mode; train_size; val_size; test_size; train_pct_size; val_pct_size; test_pct_size; train_acc; val_acc; test_acc; all_acc; F1_train; recall_train; roc_auc_train; F1_val; recall_val; roc_auc_val; F1_test; recall_test; roc_auc_test; F1_all; recall_all; roc_auc_all;' >> ${file_path}
+
+fi
+
+for size in {"4","8","16","26","32","40"}; do
+
+    for metric in {"lab","mscn","low_bits_2","low_bits_3","low_bits_4","low_bits_5","low_bits_6","low_bits_4_shifted_2","ica_diff","svd_trunc_diff","ipca_diff","svd_reconstruct"}; do
+        bash generateAndTrain_maxwell_custom.sh ${size} ${metric}
+    done
+done

+ 24 - 0
runAll_maxwell_custom_center.sh

@@ -0,0 +1,24 @@
+#!/bin/bash
+
+# erase "models_info/models_comparisons.csv" file and write new header
+file_path='models_info/models_comparisons.csv'
+
+erased=$1
+
+if [ "${erased}" == "Y" ]; then
+    echo "Previous data file erased..."
+    rm -f "${file_path}"
+    mkdir -p models_info
+    touch ${file_path}
+
+    # add header
+    echo 'model_name; vector_size; start; end; nb_zones; metric; mode; train_size; val_size; test_size; train_pct_size; val_pct_size; test_pct_size; train_acc; val_acc; test_acc; all_acc; F1_train; recall_train; roc_auc_train; F1_val; recall_val; roc_auc_val; F1_test; recall_test; roc_auc_test; F1_all; recall_all; roc_auc_all;' >> ${file_path}
+
+fi
+
+for size in {"4","8","16","26","32","40"}; do
+
+    for metric in {"lab","mscn","low_bits_2","low_bits_3","low_bits_4","low_bits_5","low_bits_6","low_bits_4_shifted_2","ica_diff","svd_trunc_diff","ipca_diff","svd_reconstruct"}; do
+        bash generateAndTrain_maxwell_custom_center.sh ${size} ${metric}
+    done
+done

+ 24 - 0
runAll_maxwell_custom_split.sh

@@ -0,0 +1,24 @@
+#!/bin/bash
+
+# erase "models_info/models_comparisons.csv" file and write new header
+file_path='models_info/models_comparisons.csv'
+
+erased=$1
+
+if [ "${erased}" == "Y" ]; then
+    echo "Previous data file erased..."
+    rm ${file_path}
+    mkdir -p models_info
+    touch ${file_path}
+
+    # write the header
+    echo 'model_name; vector_size; start; end; nb_zones; metric; mode; train_size; val_size; test_size; train_pct_size; val_pct_size; test_pct_size; train_acc; val_acc; test_acc; all_acc; F1_train; recall_train; roc_auc_train; F1_val; recall_val; roc_auc_val; F1_test; recall_test; roc_auc_test; F1_all; recall_all; roc_auc_all;' >> ${file_path}
+
+fi
+
+for size in {"4","8","16","26","32","40"}; do
+
+    for metric in {"lab","mscn","low_bits_2","low_bits_3","low_bits_4","low_bits_5","low_bits_6","low_bits_4_shifted_2","ica_diff","svd_trunc_diff","ipca_diff","svd_reconstruct"}; do
+        bash generateAndTrain_maxwell_custom_split.sh ${size} ${metric}
+    done
+done

+ 62 - 0
run_maxwell_simulation.sh

@@ -0,0 +1,62 @@
+#!/bin/bash
+
+# file which contains model names we want to use for simulation
+simulate_models="simulate_models.csv"
+
+# selection of four scenes (only maxwell)
+scenes="A, D, G, H"
+VECTOR_SIZE=200
+
+for size in {"4","8","16","26","32","40"}; do
+    for metric in {"lab","mscn","mscn_revisited","low_bits_2","low_bits_3","low_bits_4","low_bits_5","low_bits_6","low_bits_4_shifted_2"}; do
+
+        half=$(($size/2))
+        start=-$half
+
+        for counter in {0..4}; do
+             end=$(($start+$size))
+
+             if [ "$end" -gt "$VECTOR_SIZE" ]; then
+                 start=$(($VECTOR_SIZE-$size))
+                 end=$(($VECTOR_SIZE))
+             fi
+
+             if [ "$start" -lt "0" ]; then
+                 start=$((0))
+                 end=$(($size))
+             fi
+
+             for nb_zones in {4,6,8,10,12,14}; do
+
+                 for mode in {"svd","svdn","svdne"}; do
+                     for model in {"svm_model","ensemble_model","ensemble_model_v2"}; do
+
+                        FILENAME="data/${model}_N${size}_B${start}_E${end}_nb_zones_${nb_zones}_${metric}_${mode}"
+                        MODEL_NAME="${model}_N${size}_B${start}_E${end}_nb_zones_${nb_zones}_${metric}_${mode}"
+
+                        if grep -xq "${MODEL_NAME}" "${simulate_models}"; then
+                            echo "Run simulation for model ${MODEL_NAME}"
+
+                            # by default regenerate model
+                            python generate_data_model_random.py --output ${FILENAME} --interval "${start},${end}" --kind ${mode} --metric ${metric} --scenes "${scenes}" --nb_zones "${nb_zones}" --percent 1 --renderer "maxwell" --step 40 --random 1
+
+                            python train_model.py --data ${FILENAME} --output ${MODEL_NAME} --choice ${model}
+
+                            python predict_seuil_expe_maxwell_curve.py --interval "${start},${end}" --model "saved_models/${MODEL_NAME}.joblib" --mode "${mode}" --metric ${metric} --limit_detection '2'
+
+                            python save_model_result_in_md_maxwell.py --interval "${start},${end}" --model "saved_models/${MODEL_NAME}.joblib" --mode "${mode}" --metric ${metric}
+
+                        fi
+                    done
+                done
+            done
+
+            if [ "$counter" -eq "0" ]; then
+                start=$(($start+50-$half))
+            else
+                start=$(($start+50))
+            fi
+
+        done
+    done
+done
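
The interval logic in `run_maxwell_simulation.sh` is easier to follow outside the nested loops. The following is a minimal Python sketch of the same arithmetic, not part of the commit: the function name `window_intervals` and its keyword parameters are hypothetical, while the values 200, 50 and 5 mirror `VECTOR_SIZE`, the `+50` step and the `{0..4}` counter used in the script.

```python
def window_intervals(size, vector_size=200, step=50, nb_windows=5):
    """Return the (start, end) intervals visited by the shell loop for one size."""
    half = size // 2          # same integer division as $(($size/2))
    start = -half
    intervals = []

    for counter in range(nb_windows):
        end = start + size

        # clamp the window to the right edge of the feature vector
        if end > vector_size:
            start = vector_size - size
            end = vector_size

        # clamp the window to the left edge of the feature vector
        if start < 0:
            start = 0
            end = size

        intervals.append((start, end))

        # the first window was shifted to 0, so the first step is shortened by `half`
        if counter == 0:
            start = start + step - half
        else:
            start = start + step

    return intervals
```

For `size=4` this gives the intervals `(0, 4)`, `(48, 52)`, `(98, 102)`, `(148, 152)` and `(196, 200)`: the first and last windows are pinned to the vector boundaries, and the others are centred on multiples of 50.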

+ 63 - 0
run_maxwell_simulation_custom.sh

@@ -0,0 +1,63 @@
+#!/bin/bash
+
+# file which contains model names we want to use for simulation
+simulate_models="simulate_models.csv"
+
+# selection of four scenes (only maxwell)
+scenes="A, D, G, H"
+VECTOR_SIZE=200
+
+for size in {"4","8","16","26","32","40"}; do
+    for metric in {"lab","mscn","mscn_revisited","low_bits_2","low_bits_3","low_bits_4","low_bits_5","low_bits_6","low_bits_4_shifted_2","ica_diff","ipca_diff","svd_trunc_diff","svd_reconstruct"}; do
+
+        half=$(($size/2))
+        start=-$half
+
+        for counter in {0..4}; do
+             end=$(($start+$size))
+
+             if [ "$end" -gt "$VECTOR_SIZE" ]; then
+                 start=$(($VECTOR_SIZE-$size))
+                 end=$(($VECTOR_SIZE))
+             fi
+
+             if [ "$start" -lt "0" ]; then
+                 start=$((0))
+                 end=$(($size))
+             fi
+
+             for nb_zones in {4,6,8,10,12,14}; do
+
+                 for mode in {"svd","svdn","svdne"}; do
+                     for model in {"svm_model","ensemble_model","ensemble_model_v2"}; do
+
+                        FILENAME="data/${model}_N${size}_B${start}_E${end}_nb_zones_${nb_zones}_${metric}_${mode}"
+                        MODEL_NAME="${model}_N${size}_B${start}_E${end}_nb_zones_${nb_zones}_${metric}_${mode}"
+                        CUSTOM_MIN_MAX_FILENAME="N${size}_B${start}_E${end}_nb_zones_${nb_zones}_${metric}_${mode}_min_max"
+
+                        if grep -xq "${MODEL_NAME}" "${simulate_models}"; then
+                            echo "Run simulation for model ${MODEL_NAME}"
+
+                            # by default regenerate model
+                            python generate_data_model_random.py --output ${FILENAME} --interval "${start},${end}" --kind ${mode} --metric ${metric} --scenes "${scenes}" --nb_zones "${nb_zones}" --percent 1 --renderer "maxwell" --step 40 --random 1 --custom ${CUSTOM_MIN_MAX_FILENAME}
+
+                            python train_model.py --data ${FILENAME} --output ${MODEL_NAME} --choice ${model}
+
+                            python predict_seuil_expe_maxwell_curve.py --interval "${start},${end}" --model "saved_models/${MODEL_NAME}.joblib" --mode "${mode}" --metric ${metric} --limit_detection '2' --custom ${CUSTOM_MIN_MAX_FILENAME}
+
+                            python save_model_result_in_md_maxwell.py --interval "${start},${end}" --model "saved_models/${MODEL_NAME}.joblib" --mode "${mode}" --metric ${metric}
+
+                        fi
+                    done
+                done
+            done
+
+            if [ "$counter" -eq "0" ]; then
+                start=$(($start+50-$half))
+            else
+                start=$(($start+50))
+            fi
+
+        done
+    done
+done
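
Both simulation scripts run a model only when its name appears as a complete line of `simulate_models.csv` (`grep -xq`). The sketch below reproduces the naming scheme and that whole-line test in Python. The helper names `build_model_name` and `is_selected` are hypothetical, and the plain string comparison is an assumption that matches `grep -x` only because the generated names contain no regular expression metacharacters.

```python
def build_model_name(model, size, start, end, nb_zones, metric, mode):
    """Build the model name the same way the MODEL_NAME shell variable does."""
    return f"{model}_N{size}_B{start}_E{end}_nb_zones_{nb_zones}_{metric}_{mode}"


def is_selected(model_name, lines):
    """Mimic `grep -x`: a line matches only if it equals the name exactly."""
    return any(line.rstrip("\n") == model_name for line in lines)
```

A typical use would be `is_selected(name, open("simulate_models.csv"))`, so a name that is only a prefix of a listed model is not picked up.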

+ 157 - 0
train_model.py

@@ -0,0 +1,157 @@
+from sklearn.model_selection import train_test_split
+from sklearn.model_selection import GridSearchCV
+from sklearn.linear_model import LogisticRegression
+from sklearn.ensemble import RandomForestClassifier, VotingClassifier
+
+import sklearn.svm as svm
+from sklearn.utils import shuffle
+import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
+from sklearn.metrics import accuracy_score, f1_score
+from sklearn.model_selection import cross_val_score
+
+import numpy as np
+import pandas as pd
+import sys, os, getopt
+
+from modules.utils import config as cfg
+from modules import models as mdl
+
+saved_models_folder = cfg.saved_models_folder
+models_list         = cfg.models_names_list
+
+current_dirpath = os.getcwd()
+output_model_folder = os.path.join(current_dirpath, saved_models_folder)
+
+
+def main():
+
+    # TODO : use argparse
+    
+    if len(sys.argv) <= 2:
+        print('python train_model.py --data xxxx --output xxxx --choice svm_model')
+        sys.exit(2)
+    try:
+        opts, args = getopt.getopt(sys.argv[1:], "hd:o:c:", ["help", "data=", "output=", "choice="])
+    except getopt.GetoptError:
+        # print help information and exit:
+        print('python train_model.py --data xxxx --output xxxx --choice svm_model')
+        sys.exit(2)
+    for o, a in opts:
+        if o in ("-h", "--help"):
+            print('python train_model.py --data xxxx --output xxxx --choice svm_model')
+            sys.exit()
+        elif o in ("-d", "--data"):
+            p_data_file = a
+        elif o in ("-o", "--output"):
+            p_output = a
+        elif o in ("-c", "--choice"):
+            p_choice = a
+
+            if p_choice not in models_list:
+                assert False, "Unknown model choice"
+
+        else:
+            assert False, "unhandled option"
+
+    if not os.path.exists(output_model_folder):
+        os.makedirs(output_model_folder)
+
+    ########################
+    # 1. Get and prepare data
+    ########################
+    dataset_train = pd.read_csv(p_data_file + '.train', header=None, sep=";")
+    dataset_test = pd.read_csv(p_data_file + '.test', header=None, sep=";")
+
+    # default first shuffle of data
+    dataset_train = shuffle(dataset_train)
+    dataset_test = shuffle(dataset_test)
+
+    # get dataset with equal number of classes occurrences
+    noisy_df_train = dataset_train[dataset_train.iloc[:, 0] == 1]
+    not_noisy_df_train = dataset_train[dataset_train.iloc[:, 0] == 0]
+    nb_noisy_train = len(noisy_df_train.index)
+
+    noisy_df_test = dataset_test[dataset_test.iloc[:, 0] == 1]
+    not_noisy_df_test = dataset_test[dataset_test.iloc[:, 0] == 0]
+    nb_noisy_test = len(noisy_df_test.index)
+
+    final_df_train = pd.concat([not_noisy_df_train[0:nb_noisy_train], noisy_df_train])
+    final_df_test = pd.concat([not_noisy_df_test[0:nb_noisy_test], noisy_df_test])
+
+    # shuffle data another time
+    final_df_train = shuffle(final_df_train)
+    final_df_test = shuffle(final_df_test)
+
+    final_df_train_size = len(final_df_train.index)
+    final_df_test_size = len(final_df_test.index)
+
+    # use the whole balanced dataset for training (column 0 is the label)
+    x_dataset_train = final_df_train.iloc[:, 1:]
+    x_dataset_test = final_df_test.iloc[:, 1:]
+
+    y_dataset_train = final_df_train.iloc[:, 0]
+    y_dataset_test = final_df_test.iloc[:, 0]
+
+    #######################
+    # 2. Construction of the model : Ensemble model structure
+    #######################
+
+    print("-------------------------------------------")
+    print("Train dataset size: ", final_df_train_size)
+    model = mdl.get_trained_model(p_choice, x_dataset_train, y_dataset_train)
+
+    #######################
+    # 3. Cross validation : estimate accuracy with 5-fold cross validation
+    #######################
+    val_scores = cross_val_score(model, x_dataset_train, y_dataset_train, cv=5)
+    print("Accuracy: %0.2f (+/- %0.2f)" % (val_scores.mean(), val_scores.std() * 2))
+
+    ######################
+    # 4. Test : Validation and test dataset from .test dataset
+    ######################
+
+    # validation and test sets are each sized to one third of the training set
+    val_set_size = int(final_df_train_size/3)
+    test_set_size = val_set_size
+
+    total_validation_size = val_set_size + test_set_size
+
+    if final_df_test_size > total_validation_size:
+        x_dataset_test = x_dataset_test[0:total_validation_size]
+        y_dataset_test = y_dataset_test[0:total_validation_size]
+
+    X_test, X_val, y_test, y_val = train_test_split(x_dataset_test, y_dataset_test, test_size=0.5, random_state=1)
+
+    y_test_model = model.predict(X_test)
+    y_val_model = model.predict(X_val)
+
+    val_accuracy = accuracy_score(y_val, y_val_model)
+    test_accuracy = accuracy_score(y_test, y_test_model)
+
+    val_f1 = f1_score(y_val, y_val_model)
+    test_f1 = f1_score(y_test, y_test_model)
+
+
+    ###################
+    # 5. Output : Print all information
+    ###################
+
+    print("Validation dataset size ", val_set_size)
+    print("Validation: ", val_accuracy)
+    print("Validation F1: ", val_f1)
+    print("Test dataset size ", test_set_size)
+    print("Test: ", test_accuracy)
+    print("Test F1: ", test_f1)
+
+
+    ##################
+    # 6. Save model : create path if not exists
+    ##################
+
+    if not os.path.exists(saved_models_folder):
+        os.makedirs(saved_models_folder)
+
+    joblib.dump(model, os.path.join(output_model_folder, p_output + '.joblib'))
+
+if __name__ == "__main__":
+    main()
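
The `TODO : use argparse` note in `train_model.py` could be addressed along the following lines. This is only a sketch under stated assumptions: `MODELS_LIST` is a stand-in for `cfg.models_names_list` (its three values are the ones the simulation scripts pass to `--choice`), and `build_parser` / `parse_args` are hypothetical helper names.

```python
import argparse

# Assumption: stand-in for cfg.models_names_list from modules.utils.config.
MODELS_LIST = ["svm_model", "ensemble_model", "ensemble_model_v2"]


def build_parser():
    """Declare the three options currently parsed with getopt."""
    parser = argparse.ArgumentParser(description="Train a noise detection model")
    parser.add_argument("--data", required=True,
                        help="dataset prefix, without the .train/.test suffix")
    parser.add_argument("--output", required=True,
                        help="name of the model file to save")
    parser.add_argument("--choice", required=True, choices=MODELS_LIST,
                        help="kind of model to train")
    return parser


def parse_args(argv=None):
    """Parse argv (defaults to sys.argv[1:] when argv is None)."""
    return build_parser().parse_args(argv)
```

With this approach a missing option or an unknown `--choice` value makes `argparse` print a usage message and exit with status 2, which would replace the manual `assert False` checks and the repeated usage strings.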