
update of the whole project to enable use of new dataset

Jérôme BUISINE 4 years ago
parent
commit
30d257e0f7
33 changed files with 169 additions and 3596 deletions
  1. README.md (+4 -99)
  2. custom_config.py (+14 -3)
  3. data_attributes.py (+42 -21)
  4. display/display_reconstructed_image_from_humans.py (+0 -147)
  5. display/display_reconstructed_image_from_simulation.py (+0 -184)
  6. display/display_simulation_curves.py (+0 -128)
  7. find_best_attributes.py (+14 -8)
  8. generate/generate_all_data.py (+17 -17)
  9. generate/generate_data_model.py (+55 -63)
  10. generate/generate_data_model_random_all.py (+0 -299)
  11. generate/generate_data_model_random_center.py (+0 -310)
  12. generate/generate_data_model_random_split.py (+0 -309)
  13. models.py (+3 -3)
  14. others/save_model_result_in_md.py (+0 -93)
  15. others/save_model_result_in_md_maxwell.py (+0 -324)
  16. others/testModelByScene.sh (+0 -62)
  17. others/testModelByScene_maxwell.sh (+0 -70)
  18. prediction/predict_seuil_expe.py (+0 -214)
  19. prediction/predict_seuil_expe_curve_opti_scene.py (+0 -169)
  20. prediction/predict_seuil_expe_curve_scene.py (+0 -166)
  21. prediction/predict_seuil_expe_maxwell.py (+0 -216)
  22. prediction/predict_seuil_expe_maxwell_curve.py (+0 -174)
  23. prediction/predict_seuil_expe_maxwell_curve_opti.py (+0 -176)
  24. prediction/prediction_scene.py (+0 -114)
  25. requirements.txt (+3 -2)
  26. run.sh (+0 -3)
  27. run/runAll_maxwell_custom.sh (+0 -35)
  28. run/runAll_maxwell_custom_optimization_attributes.sh (+0 -37)
  29. run/runAll_maxwell_custom_optimization_filters.sh (+0 -38)
  30. simulation/generate_all_simulate_curves.sh (+0 -6)
  31. simulation/run_maxwell_simulation_filters_statistics.sh (+0 -39)
  32. simulation/run_maxwell_simulation_filters_statistics_opti.sh (+0 -56)
  33. train_model.py (+17 -11)

+ 4 - 99
README.md

@@ -17,25 +17,14 @@ Filters list:
 pip install -r requirements.txt
 ```
 
-Generate all needed data for each features (which requires the whole dataset. In order to get it, you need to contact us).
-
-```bash
-python generate/generate_all_data.py --feature all
-```
-
-
 ## Project structure
 
 ### Link to your dataset
 
-You have to create a symbolic link to your own database which respects this structure:
+You need a database that respects this structure:
 
 - dataset/
   - Scene1/
-    - zone00/
-    - ...
-    - zone15/
-      - seuilExpe (file which contains threshold samples of zone image perceived by human)
     - Scene1_00050.png
     - Scene1_00070.png
     - ...
@@ -45,12 +34,6 @@ You have to create a symbolic link to your own database which respects this stru
     - ...
   - ...
 
-Create your symbolic link:
-
-```
-ln -s /path/to/your/data dataset
-```
-
 ### Code architecture description
 
 - **modules/\***: contains all modules usefull for the whole project (such as configuration variables)
@@ -58,14 +41,6 @@ ln -s /path/to/your/data dataset
 - **generate/\***: contains python scripts for generate data from scenes (described later)
 - **data_processing/\***: all python scripts for generate custom dataset for models
 - **prediction/\***: all python scripts for predict new threshold from computed models
-- **simulation/\***: contains all bash scripts used for run simulation from models
-- **display/\***: contains all python scripts used for display Scene information (such as Singular values...)
-- **run/\***: bash scripts to run few step at once : 
-  - generate custom dataset
-  - train model
-  - keep model performance
-  - run simulation (if necessary)
-- **others/\***: folders which contains others scripts such as script for getting performance of model on specific scene and write it into Mardown file.
 - **data_attributes.py**: files which contains all extracted features implementation from an image.
 - **custom_config.py**: override the main configuration project of `modules/config/global_config.py`
 - **train_model.py**: script which is used to run specific model available.
@@ -73,79 +48,9 @@ ln -s /path/to/your/data dataset
 ### Generated data directories:
 
 - **data/\***: folder which will contain all generated *.train* & *.test* files in order to train model.
-- **saved_models/\***: all scikit learn or keras models saved.
-- **models_info/\***: all markdown files generated to get quick information about model performance and prediction obtained after running `run/runAll_*.sh` script.
-- **results/**:  This folder contains `model_comparisons.csv` file used for store models performance.
-
-
-## How to use ?
-
-**Remark**: Note here that all python script have *--help* command.
-
-```
-python generate_data_model.py --help
-```
-
-Parameters explained:
-- **feature**: feature choice wished
-- **output**: filename of data (which will be split into two parts, *.train* and *.test* relative to your choices). Need to be into `data` folder.
-- **interval**: the interval of data you want to use from SVD vector.
-- **kind**: kind of data ['svd', 'svdn', 'svdne']; not normalize, normalize vector only and normalize together.
-- **scenes**: scenes choice for training dataset.
-- **zones**: zones to take for training dataset.
-- **step**: specify if all pictures are used or not using step process.
-- **percent**: percent of data amount of zone to take (choose randomly) of zone
-- **custom**: specify if you want your data normalized using interval and not the whole singular values vector. If it is, the value of this parameter is the output filename which will store the min and max value found. This file will be usefull later to make prediction with model (optional parameter).
-
-### Train model
-
-This is an example of how to train a model
-
-```bash
-python train_model.py --data 'data/xxxx' --output 'model_file_to_save' --choice 'model_choice'
-```
-
-Expected values for the **choice** parameter are ['svm_model', 'ensemble_model', 'ensemble_model_v2'].
-
-### Predict image using model
-
-Now we have a model trained, we can use it with an image as input:
-
-```bash
-python prediction/predict_noisy_image_svd.py --image path/to/image.png --interval "x,x" --model saved_models/xxxxxx.joblib --feature 'lab' --mode 'svdn' --custom 'min_max_filename'
-```
-
-- **feature**: feature choice need to be one of the listed above.
-- **custom**: specify filename with custom min and max from your data interval. This file was generated using **custom** parameter of one of the **generate_data_model\*.py** script (optional parameter).
-
-The model will return only 0 or 1:
-- 1 means noisy image is detected.
-- 0 means image seem to be not noisy.
-
-All SVD features developed need:
-- Name added into *feature_choices_labels* global array variable of `custom_config.py` file.
-- A specification of how you compute the feature into *get_image_features* method of `data_attributes.py` file.
-
-### Predict scene using model
-
-Now we have a model trained, we can use it with an image as input:
-
-```bash
-python prediction_scene.py --data path/to/xxxx.csv --model saved_model/xxxx.joblib --output xxxxx --scene xxxx
-```
-**Remark**: *scene* parameter expected need to be the correct name of the Scene.
-
-### Visualize data
-
-All scripts with names **display/display_\*.py** are used to display data information or results.
-
-Just use --help option to get more information.
-
-### Simulate model on scene
-
-All scripts named **prediction/predict_seuil_expe\*.py** are used to simulate model prediction during rendering process. Do not forget the **custom** parameter filename if necessary.
-
-Once you have simulation done. Checkout your **threshold_map/%MODEL_NAME%/simulation\_curves\_zones\_\*/** folder and use it with help of **display_simulation_curves.py** script.
+- **data/saved_models/\***: all scikit learn or keras models saved.
+- **data/models_info/\***: all markdown files generated to get quick information about model performance and prediction obtained after running `run/runAll_*.sh` script.
+- **data/results/**:  This folder contains `model_comparisons.csv` file used for store models performance.
 
 ## License
 

+ 14 - 3
custom_config.py

@@ -1,17 +1,28 @@
 from modules.config.attributes_config import *
 
+import os
+
 # store all variables from global config
 context_vars = vars()
 
 # folders
-logs_folder                             = 'logs'
-backup_folder                           = 'backups'
+
+output_data_folder              = 'data'
+output_data_generated           = os.path.join(output_data_folder, 'generated')
+output_datasets                 = os.path.join(output_data_folder, 'datasets')
+output_zones_learned            = os.path.join(output_data_folder, 'learned_zones')
+output_models                   = os.path.join(output_data_folder, 'saved_models')
+output_results_folder           = os.path.join(output_data_folder, 'results')
+output_logs_folder              = os.path.join(output_data_folder, 'logs')
+output_backup_folder            = os.path.join(output_data_folder, 'backups')
+
+results_information_folder      = os.path.join(output_data_folder, 'results')
 
 ## min_max_custom_folder           = 'custom_norm'
 ## correlation_indices_folder      = 'corr_indices'
 
 # variables
-features_choices_labels                 = ['filters_statistics']
+features_choices_labels                 = features_choices_labels + ['filters_statistics']
 optimization_filters_result_filename    = 'optimization_comparisons_filters.csv'
 optimization_attributes_result_filename = 'optimization_comparisons_attributes.csv'
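
The new `output_*` settings above all hang off a single `data` root. A minimal sketch of how such a layout could be created up front, assuming a hypothetical helper `ensure_output_folders` (it is not part of this commit) and reusing the sub-folder names from the config:

```python
import os
import tempfile

# Sub-folder names mirror the new custom_config.py values;
# the helper itself is hypothetical and only illustrates the layout.
OUTPUT_SUBFOLDERS = ['generated', 'datasets', 'learned_zones',
                     'saved_models', 'results', 'logs', 'backups']


def ensure_output_folders(root='data'):
    """Create root/<subfolder> for every configured output folder."""
    created = []
    for name in OUTPUT_SUBFOLDERS:
        path = os.path.join(root, name)
        # exist_ok=True replaces the separate os.path.exists() check
        os.makedirs(path, exist_ok=True)
        created.append(path)
    return created


if __name__ == '__main__':
    # Use a temporary directory so the demo leaves nothing behind
    with tempfile.TemporaryDirectory() as tmp:
        for folder in ensure_output_folders(os.path.join(tmp, 'data')):
            print(folder)
```

Creating every folder in one place would let scripts drop their individual `os.path.exists` / `os.makedirs` pairs, though whether that fits the project is a design choice for the maintainers.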
 

+ 42 - 21
data_attributes.py

@@ -4,7 +4,7 @@ import sys
 
 # image transform imports
 from PIL import Image
-from skimage import color
+from skimage import color, restoration
 from sklearn.decomposition import FastICA
 from sklearn.decomposition import IncrementalPCA
 from sklearn.decomposition import TruncatedSVD
@@ -12,6 +12,7 @@ from numpy.linalg import svd as lin_svd
 from scipy.signal import medfilt2d, wiener, cwt
 import pywt
 import cv2
+import gzip
 
 from ipfml.processing import transform, compression, segmentation
 from ipfml import utils
@@ -38,15 +39,16 @@ def get_image_features(data_type, block):
         # compute all filters statistics
         def get_stats(arr, I_filter):
 
-            # e1       = np.abs(arr - I_filter)
-            # L        = np.array(e1)
-            # mu0      = np.mean(L)
-            # A        = L - mu0
-            # H        = A * A
-            # E        = np.sum(H) / (img_width * img_height)
-            # P        = np.sqrt(E)
+            e1       = np.abs(arr - I_filter)
+            L        = np.array(e1)
+            mu0      = np.mean(L)
+            A        = L - mu0
+            H        = A * A
+            E        = np.sum(H) / (img_width * img_height)
+            P        = np.sqrt(E)
 
-            return np.mean(I_filter), np.std(I_filter)
+            return mu0, P
+            # return np.mean(I_filter), np.std(I_filter)
 
         stats = []
 
@@ -89,25 +91,44 @@ def get_image_features(data_type, block):
         
         data = np.array(data)
 
+    if 'statistics_extended' in data_type:
+
+        data = get_image_features('filters_statistics', block)
+
+        # add kolmogorov complexity
+        bytes_data = np.array(block).tobytes()
+        compress_data = gzip.compress(bytes_data)
+
+        data = np.append(data, sys.getsizeof(compress_data))
+
+        # add sobel complexity (kernel size of 5)
+        sobelx = cv2.Sobel(lab_img, cv2.CV_64F, 1, 0, ksize=5)
+        sobely = cv2.Sobel(lab_img, cv2.CV_64F, 0, 1, ksize=5)
+
+        sobel_mag = np.array(np.hypot(sobelx, sobely), 'uint8')  # magnitude
+
+        data = np.append(data, np.std(sobel_mag))
+
+    if 'lab' in data_type:
+
+        data = transform.get_LAB_L_SVD_s(block)
+
     return data
 
 
 def w2d(arr, mode='haar', level=1):
-    #convert to float   
+    #convert to float    
     imArray = arr
-    np.divide(imArray, 255)
-
-    # compute coefficients 
-    coeffs=pywt.wavedec2(imArray, mode, level=level)
 
-    #Process Coefficients
-    coeffs_H=list(coeffs)  
-    coeffs_H[0] *= 0
+    sigma = restoration.estimate_sigma(imArray, average_sigmas=True, multichannel=False)
+    imArray_H = restoration.denoise_wavelet(imArray, sigma=sigma, wavelet='db1', mode='soft', 
+        wavelet_levels=2, 
+        multichannel=False, 
+        convert2ycbcr=False, 
+        method='VisuShrink', 
+        rescale_sigma=True)
 
-    # reconstruction
-    imArray_H = pywt.waverec2(coeffs_H, mode)
-    imArray_H *= 255
-    imArray_H = np.uint8(imArray_H)
+    # imArray_H *= 100
 
     return imArray_H
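
The `statistics_extended` branch above uses the size of the gzip-compressed block as a stand-in for Kolmogorov complexity. The idea can be sketched with the standard library alone; the blocks below are synthetic and `compressed_size` is a hypothetical helper name. The sketch uses `len()` on the compressed bytes, which counts only the payload, whereas `sys.getsizeof` also includes the interpreter's object overhead.

```python
import gzip
import random


def compressed_size(raw_bytes):
    """Return the gzip-compressed length in bytes (a rough complexity proxy)."""
    return len(gzip.compress(raw_bytes))


# Synthetic 64x64 single-channel "blocks" (illustrative data, not from the dataset)
flat_block = bytes([128]) * (64 * 64)

rng = random.Random(0)  # fixed seed so the demo is repeatable
noisy_block = bytes(rng.randrange(256) for _ in range(64 * 64))

# A uniform block compresses far better than a noisy one
print(compressed_size(flat_block), compressed_size(noisy_block))
```

A noisy block is expected to yield a larger value than a smooth one, which is what makes the measure usable as a noise-related feature.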
 

+ 0 - 147
display/display_reconstructed_image_from_humans.py

@@ -1,147 +0,0 @@
-# main imports
-import numpy as np
-import pandas as pd
-import math
-import time
-
-import os, sys, argparse
-
-# image processing imports
-import matplotlib.pyplot as plt
-from PIL import Image
-
-# modules imports
-sys.path.insert(0, '') # trick to enable import of main folder module
-
-import custom_config as cfg
-from data_attributes import get_image_features
-from modules.utils import data as dt
-
-# other variables
-learned_zones_folder = cfg.learned_zones_folder
-models_name          = cfg.models_names_list
-
-# utils information
-zone_width, zone_height = (200, 200)
-scene_width, scene_height = (800, 800)
-nb_x_parts = math.floor(scene_width / zone_width)
-
-
-def reconstruct_image(scene_name, output):
-    """
-    @brief Method used to display simulation given .csv files
-    @param scene_name, scene name used
-    @param output, the output filename
-    @return nothing
-    """
-
-    # compute zone start index
-    zones_coordinates = []
-    for zone_index in cfg.zones_indices:
-        x_zone = (zone_index % nb_x_parts) * zone_width
-        y_zone = (math.floor(zone_index / nb_x_parts)) * zone_height
-
-        zones_coordinates.append((x_zone, y_zone))
-
-    scene_folder = os.path.join(cfg.dataset_path, scene_name)
-
-    folder_scene_elements = os.listdir(scene_folder)
-
-    zones_folder = [zone for zone in folder_scene_elements if 'zone' in zone]
-    zones_folder = sorted(zones_folder)
-
-    scenes_images = [img for img in folder_scene_elements if cfg.scene_image_extension in img]
-    scenes_images = sorted(scenes_images)
-
-    # 1. find thresholds from scene
-    human_thresholds = []
-
-    for zone_folder in zones_folder:
-        zone_path = os.path.join(scene_folder, zone_folder)
-        
-        with open(os.path.join(zone_path, cfg.seuil_expe_filename)) as f:
-            human_thresholds.append(int(f.readline()))
-
-    # 2. find images for each zone which are attached to these human thresholds by the model
-    zone_images_index = []
-
-    for threshold in human_thresholds:
-
-        current_image_index = 0
-
-        for image_name in scenes_images:
-
-            image_quality = dt.get_scene_image_quality(image_name)
-
-            if image_quality > threshold:
-                current_image_index = image_quality
-                break
-
-
-        str_index = str(current_image_index)
-        while len(str_index) < 5:
-            str_index = "0" + str_index
-
-        zone_images_index.append(str_index)
-
-    images_zones = []
-    line_images_zones = []
-    # get image using threshold by zone
-    for id, zone_index in enumerate(zone_images_index):
-        filtered_images = [img for img in scenes_images if zone_index in img]
-        
-        if len(filtered_images) > 0:
-            image_name = filtered_images[0]
-        else:
-            image_name = scenes_images[-1]
-        
-        image_path = os.path.join(scene_folder, image_name)
-        selected_image = Image.open(image_path)
-
-        x_zone, y_zone = zones_coordinates[id]
-        zone_image = np.array(selected_image)[y_zone:y_zone+zone_height, x_zone:x_zone+zone_width]
-        line_images_zones.append(zone_image)
-
-        if int(id + 1) % int(scene_width / zone_width) == 0:
-            images_zones.append(np.concatenate(line_images_zones, axis=1))
-            line_images_zones = []
-
-
-    # 3. reconstructed the image using these zones
-    reconstructed_image = np.concatenate(images_zones, axis=0)
-
-    # 4. Save the image with generated name based on scene
-    reconstructed_pil_img = Image.fromarray(reconstructed_image)
-
-    folders = output.split('/')
-    if len(folders) > 1:
-        output_folder = '/'.join(folders[:len(folders) - 1])
-        
-        if not os.path.exists(output_folder):
-            os.makedirs(output_folder)
-
-    reconstructed_pil_img.save(output)
-
-
-def main():
-
-    parser = argparse.ArgumentParser(description="Compute and save reconstructed images from human thresholds")
-
-    parser.add_argument('--scene', type=str, help='Scene index to use', choices=cfg.scenes_indices)
-    parser.add_argument('--output', type=str, help='Output reconstructed image path and filename')
-
-    args = parser.parse_args()
-
-    p_scene = args.scene
-    p_output = args.output
-    
-    scenes_list = cfg.scenes_names
-    scenes_indices = cfg.scenes_indices
-
-    scene_index = scenes_indices.index(p_scene.strip())
-    scene_name = scenes_list[scene_index]
-
-    reconstruct_image(scene_name, p_output)
-
-if __name__== "__main__":
-    main()

+ 0 - 184
display/display_reconstructed_image_from_simulation.py

@@ -1,184 +0,0 @@
-# main imports
-import numpy as np
-import pandas as pd
-import math
-import time
-
-import os, sys, argparse
-
-# image processing imports
-import matplotlib.pyplot as plt
-from PIL import Image
-
-# modules imports
-sys.path.insert(0, '') # trick to enable import of main folder module
-
-import custom_config as cfg
-from data_attributes import get_image_features
-
-# other variables
-learned_zones_folder = cfg.learned_zones_folder
-models_name          = cfg.models_names_list
-
-# utils information
-zone_width, zone_height = (200, 200)
-scene_width, scene_height = (800, 800)
-nb_x_parts = math.floor(scene_width / zone_width)
-
-
-def reconstruct_image(folder_path, model_name, p_limit):
-    """
-    @brief Method used to display simulation given .csv files
-    @param folder_path, folder which contains all .csv files obtained during simulation
-    @param model_name, current name of model
-    @return nothing
-    """
-
-    for name in models_name:
-        if name in model_name:
-            data_filename = model_name
-            learned_zones_folder_path = os.path.join(learned_zones_folder, data_filename)
-
-    data_files = [x for x in os.listdir(folder_path) if '.png' not in x]
-
-    scene_names = [f.split('_')[3] for f in data_files]
-
-    # compute zone start index
-    zones_coordinates = []
-    for index, zone_index in enumerate(cfg.zones_indices):
-        x_zone = (zone_index % nb_x_parts) * zone_width
-        y_zone = (math.floor(zone_index / nb_x_parts)) * zone_height
-
-        zones_coordinates.append((x_zone, y_zone))
-
-    print(zones_coordinates)
-
-    for id, f in enumerate(data_files):
-
-        scene_name = scene_names[id]
-        path_file = os.path.join(folder_path, f)
-
-        # TODO : check if necessary to keep information about zone learned when displaying data
-        scenes_zones_used_file_path = os.path.join(learned_zones_folder_path, scene_name + '.csv')
-
-        zones_used = []
-
-        if os.path.exists(scenes_zones_used_file_path):
-            with open(scenes_zones_used_file_path, 'r') as f:
-                zones_used = [int(x) for x in f.readline().split(';') if x != '']
-
-        # 1. find estimated threshold for each zone scene using `data_files` and p_limit
-        model_thresholds = []
-        df = pd.read_csv(path_file, header=None, sep=";")
-
-        for index, row in df.iterrows():
-
-            row = np.asarray(row)
-
-            #threshold = row[2]
-            start_index = row[3]
-            step_value = row[4]
-            rendering_predictions = row[5:]
-
-            nb_generated_image = 0
-            nb_not_noisy_prediction = 0
-
-            for prediction in rendering_predictions:
-                
-                if int(prediction) == 0:
-                    nb_not_noisy_prediction += 1
-                else:
-                    nb_not_noisy_prediction = 0
-
-                # exit loop if limit is targeted
-                if nb_not_noisy_prediction >= p_limit:
-                    break
-
-                nb_generated_image += 1
-            
-            current_threshold = start_index + step_value * nb_generated_image
-            model_thresholds.append(current_threshold)
-
-        # 2. find images for each zone which are attached to this estimated threshold by the model
-
-        zone_images_index = []
-
-        for est_threshold in model_thresholds:
-
-            str_index = str(est_threshold)
-            while len(str_index) < 5:
-                str_index = "0" + str_index
-
-            zone_images_index.append(str_index)
-
-        scene_folder = os.path.join(cfg.dataset_path, scene_name)
-        
-        scenes_images = [img for img in os.listdir(scene_folder) if cfg.scene_image_extension in img]
-        scenes_images = sorted(scenes_images)
-
-        images_zones = []
-        line_images_zones = []
-        # get image using threshold by zone
-        for id, zone_index in enumerate(zone_images_index):
-            filtered_images = [img for img in scenes_images if zone_index in img]
-            
-            if len(filtered_images) > 0:
-                image_name = filtered_images[0]
-            else:
-                image_name = scenes_images[-1]
-            
-            #print(image_name)
-            image_path = os.path.join(scene_folder, image_name)
-            selected_image = Image.open(image_path)
-
-            x_zone, y_zone = zones_coordinates[id]
-            zone_image = np.array(selected_image)[y_zone:y_zone+zone_height, x_zone:x_zone+zone_width]
-            line_images_zones.append(zone_image)
-
-            if int(id + 1) % int(scene_width / zone_width) == 0:
-                images_zones.append(np.concatenate(line_images_zones, axis=1))
-                print(len(line_images_zones))
-                line_images_zones = []
-
-
-        # 3. reconstructed the image using these zones
-        reconstructed_image = np.concatenate(images_zones, axis=0)
-
-        # 4. Save the image with generated name based on scene, model and `p_limit`
-        reconstructed_pil_img = Image.fromarray(reconstructed_image)
-
-        output_path = os.path.join(folder_path, scene_names[id] + '_reconstruction_limit_' + str(p_limit) + '.png')
-
-        reconstructed_pil_img.save(output_path)
-
-
-def main():
-
-    parser = argparse.ArgumentParser(description="Display simulations curves from simulation data")
-
-    parser.add_argument('--folder', type=str, help='Folder which contains simulations data for scenes')
-    parser.add_argument('--model', type=str, help='Name of the model used for simulations')
-    parser.add_argument('--limit', type=int, help='Detection limit to target to stop rendering (number of times model tells image has not more noise)')
-
-    args = parser.parse_args()
-
-    p_folder = args.folder
-    p_limit  = args.limit
-
-    if args.model:
-        p_model = args.model
-    else:
-        # find p_model from folder if model arg not given (folder path need to have model name)
-        if p_folder.split('/')[-1]:
-            p_model = p_folder.split('/')[-1]
-        else:
-            p_model = p_folder.split('/')[-2]
-    
-    print(p_model)
-
-    reconstruct_image(p_folder, p_model, p_limit)
-
-    print(p_folder)
-
-if __name__== "__main__":
-    main()

+ 0 - 128
display/display_simulation_curves.py

@@ -1,128 +0,0 @@
-# main imports
-import numpy as np
-import pandas as pd
-
-import os, sys, argparse
-
-# image processing imports
-import matplotlib.pyplot as plt
-
-# modules imports
-sys.path.insert(0, '') # trick to enable import of main folder module
-
-import custom_config as cfg
-from data_attributes import get_image_features
-
-# other variables
-learned_zones_folder = cfg.learned_zones_folder
-models_name          = cfg.models_names_list
-label_freq           = 6
-
-def display_curves(folder_path, model_name):
-    """
-    @brief Method used to display simulation given .csv files
-    @param folder_path, folder which contains all .csv files obtained during simulation
-    @param model_name, current name of model
-    @return nothing
-    """
-
-    for name in models_name:
-        if name in model_name:
-            data_filename = model_name
-            learned_zones_folder_path = os.path.join(learned_zones_folder, data_filename)
-
-    data_files = [x for x in os.listdir(folder_path) if '.png' not in x]
-
-    scene_names = [f.split('_')[3] for f in data_files]
-
-    for id, f in enumerate(data_files):
-
-        print(scene_names[id])
-        path_file = os.path.join(folder_path, f)
-
-        scenes_zones_used_file_path = os.path.join(learned_zones_folder_path, scene_names[id] + '.csv')
-
-        # by default zone used is empty
-        zones_used = []
-
-        if os.path.exists(scenes_zones_used_file_path):
-            with open(scenes_zones_used_file_path, 'r') as f:
-                zones_used = [int(x) for x in f.readline().split(';') if x != '']
-
-        print(zones_used)
-
-        df = pd.read_csv(path_file, header=None, sep=";")
-
-        fig=plt.figure(figsize=(35, 22))
-        fig.suptitle("Detection simulation for " + scene_names[id] + " scene", fontsize=20)
-
-        for index, row in df.iterrows():
-
-            row = np.asarray(row)
-
-            threshold = row[2]
-            start_index = row[3]
-            step_value = row[4]
-
-            counter_index = 0
-
-            current_value = start_index
-
-            while(current_value < threshold):
-                counter_index += 1
-                current_value += step_value
-
-            fig.add_subplot(4, 4, (index + 1))
-            plt.plot(row[5:])
-
-            if index in zones_used:
-                ax = plt.gca()
-                ax.set_facecolor((0.9, 0.95, 0.95))
-
-            # draw vertical line from (70,100) to (70, 250)
-            plt.plot([counter_index, counter_index], [-2, 2], 'k-', lw=2, color='red')
-
-            if index % 4 == 0:
-                plt.ylabel('Not noisy / Noisy', fontsize=20)
-
-            if index >= 12:
-                plt.xlabel('Samples per pixel', fontsize=20)
-
-            x_labels = [id * step_value + start_index for id, val in enumerate(row[5:]) if id % label_freq == 0]
-
-            x = [v for v in np.arange(0, len(row[5:])+1) if v % label_freq == 0]
-
-            plt.xticks(x, x_labels, rotation=45)
-            plt.ylim(-1, 2)
-
-        plt.savefig(os.path.join(folder_path, scene_names[id] + '_simulation_curve.png'))
-        #plt.show()
-
-def main():
-
-    parser = argparse.ArgumentParser(description="Display simulations curves from simulation data")
-
-    parser.add_argument('--folder', type=str, help='Folder which contains simulations data for scenes')
-    parser.add_argument('--model', type=str, help='Name of the model used for simulations')
-
-    args = parser.parse_args()
-
-    p_folder = args.folder
-
-    if args.model:
-        p_model = args.model
-    else:
-        # find p_model from folder if model arg not given (folder path need to have model name)
-        if p_folder.split('/')[-1]:
-            p_model = p_folder.split('/')[-1]
-        else:
-            p_model = p_folder.split('/')[-2]
-    
-    print(p_model)
-
-    display_curves(p_folder, p_model)
-
-    print(p_folder)
-
-if __name__== "__main__":
-    main()

+ 14 - 8
find_best_attributes.py

@@ -52,7 +52,7 @@ def validator(solution):
 
 # init solution (26 attributes)
 def init():
-    return BinarySolution([], 26).random(validator)
+    return BinarySolution([], number_of_values).random(validator)
 
 def loadDataset(filename):
 
@@ -95,13 +95,18 @@ def main():
 
     parser = argparse.ArgumentParser(description="Train and find best filters to use for model")
 
-    parser.add_argument('--data', type=str, help='dataset filename prefix (without .train and .test)')
-    parser.add_argument('--choice', type=str, help='model choice from list of choices', choices=models_list)
+    parser.add_argument('--data', type=str, help='dataset filename prefix (without .train and .test)', required=True)
+    parser.add_argument('--choice', type=str, help='model choice from list of choices', choices=models_list, required=True)
+    parser.add_argument('--length', type=int, help='max data length (must be specified for the evaluator)', required=True)
 
     args = parser.parse_args()
 
     p_data_file = args.data
     p_choice    = args.choice
+    p_length    = args.length
+
+    global number_of_values
+    number_of_values = p_length
 
     print(p_data_file)
 
@@ -109,8 +114,8 @@ def main():
     x_train, y_train, x_test, y_test = loadDataset(p_data_file)
 
     # create `logs` folder if necessary
-    if not os.path.exists(cfg.logs_folder):
-        os.makedirs(cfg.logs_folder)
+    if not os.path.exists(cfg.output_logs_folder):
+        os.makedirs(cfg.output_logs_folder)
 
     logging.basicConfig(format='%(asctime)s %(message)s', filename='logs/%s.log' % p_data_file.split('/')[-1], level=logging.DEBUG)
 
@@ -130,6 +135,7 @@ def main():
         y_train_filters = y_train
         x_test_filters = x_test.iloc[:, indices]
 
+        # TODO : use of GPU implementation of SVM
         model = mdl.get_trained_model(p_choice, x_train_filters, y_train_filters)
         
         y_test_model = model.predict(x_test_filters)
@@ -143,10 +149,10 @@ def main():
 
         return test_roc_auc
 
-    if not os.path.exists(cfg.backup_folder):
-        os.makedirs(cfg.backup_folder)
+    if not os.path.exists(cfg.output_backup_folder):
+        os.makedirs(cfg.output_backup_folder)
 
-    backup_file_path = os.path.join(cfg.backup_folder, p_data_file.split('/')[-1] + '.csv')
+    backup_file_path = os.path.join(cfg.output_backup_folder, p_data_file.split('/')[-1] + '.csv')
 
     # prepare optimization algorithm
     updators = [SimpleBinaryMutation(), SimpleMutation(), SimpleCrossover()]
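
The `--length` option added in this file feeds the solution size used by `init()`. A minimal sketch of that flow, assuming the value is parsed as an integer and passed through a closure instead of a module-level global; `build_parser` and `make_init` are hypothetical names, the model names are example values taken from the removed README text, and the list returned by `init()` is only a stand-in for `BinarySolution([], number_of_values).random(validator)`:

```python
import argparse


def build_parser(models_list):
    """Build a parser with the same three required options as the diff."""
    parser = argparse.ArgumentParser(description="Train and find best filters to use for model")
    parser.add_argument('--data', type=str, required=True,
                        help='dataset filename prefix (without .train and .test)')
    parser.add_argument('--choice', type=str, choices=models_list, required=True,
                        help='model choice from list of choices')
    # type=int so the value can be used directly as a solution size
    parser.add_argument('--length', type=int, required=True,
                        help='max data length')
    return parser


def make_init(number_of_values):
    """Return an init() closure that carries the size without a global."""
    def init():
        # Stand-in for the real solution object: a zero vector of the right size
        return [0] * number_of_values
    return init


if __name__ == '__main__':
    models = ['svm_model', 'ensemble_model', 'ensemble_model_v2']  # example values
    args = build_parser(models).parse_args(
        ['--data', 'data/xxxx', '--choice', 'svm_model', '--length', '26'])
    print(len(make_init(args.length)()))
```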

+ 17 - 17
generate/generate_all_data.py

@@ -24,19 +24,15 @@ zone_folder             = cfg.zone_folder
 min_max_filename        = cfg.min_max_filename_extension
 
 # define all scenes values
-scenes_list             = cfg.scenes_names
-scenes_indexes          = cfg.scenes_indices
 choices                 = cfg.normalization_choices
-path                    = cfg.dataset_path
 zones                   = cfg.zones_indices
-seuil_expe_filename     = cfg.seuil_expe_filename
 
 features_choices        = cfg.features_choices_labels
-output_data_folder      = cfg.output_data_folder
+output_data_folder      = cfg.output_data_generated
 
 generic_output_file_svd = '_random.csv'
 
-def generate_data_svd(data_type, mode):
+def generate_data_svd(data_type, mode, dataset, output):
     """
     @brief Method which generates all .csv files from scenes
     @param data_type,  feature choice
@@ -44,7 +40,7 @@ def generate_data_svd(data_type, mode):
     @return nothing
     """
 
-    scenes = os.listdir(path)
+    scenes = os.listdir(dataset)
     # remove min max file from scenes folder
     scenes = [s for s in scenes if min_max_filename not in s]
 
@@ -52,13 +48,13 @@ def generate_data_svd(data_type, mode):
     min_val_found = sys.maxsize
     max_val_found = 0
 
-    data_min_max_filename = os.path.join(path, data_type + min_max_filename)
+    data_min_max_filename = os.path.join(dataset, data_type + min_max_filename)
 
     # go ahead each scenes
     for folder_scene in scenes:
 
         print(folder_scene)
-        scene_path = os.path.join(path, folder_scene)
+        scene_path = os.path.join(dataset, folder_scene)
 
         # getting output filename
         output_svd_filename = data_type + "_" + mode + generic_output_file_svd
@@ -116,7 +112,7 @@ def generate_data_svd(data_type, mode):
                     data = utils.normalize_arr_with_range(data, min_val, max_val)
 
                 if mode == 'svdn':
-                    data = utils.normalize_arr(data)
+                    data = utils.normalize_arr_with_range(data)
 
                 # save min and max found from dataset in order to normalize data using whole data known
                 if mode == 'svd':
@@ -164,26 +160,30 @@ def main():
 
    
     parser.add_argument('--feature', type=str, 
-                                    help="feature choice in order to compute data (use 'all' if all features are needed)")
+                                    help="feature choice in order to compute data (use 'all' if all features are needed)", required=True)
+    parser.add_argument('--dataset', type=str, help='dataset folder with all scenes', required=True)
+    parser.add_argument('--output', type=str, help='output expected name of generated file', required=True)
 
     args = parser.parse_args()
 
     p_feature = args.feature
+    p_dataset = args.dataset
+    p_output  = args.output
 
     # generate all or specific feature data
     if p_feature == 'all':
         for m in features_choices:
-            generate_data_svd(m, 'svd')
-            generate_data_svd(m, 'svdn')
-            generate_data_svd(m, 'svdne')
+            generate_data_svd(m, 'svd', p_dataset, p_output)
+            generate_data_svd(m, 'svdn', p_dataset, p_output)
+            generate_data_svd(m, 'svdne', p_dataset, p_output)
     else:
 
         if p_feature not in features_choices:
             raise ValueError('Unknown feature choice : ', features_choices)
             
-        generate_data_svd(p_feature, 'svd')
-        generate_data_svd(p_feature, 'svdn')
-        generate_data_svd(p_feature, 'svdne')
+        generate_data_svd(p_feature, 'svd', p_dataset, p_output)
+        generate_data_svd(p_feature, 'svdn', p_dataset, p_output)
+        generate_data_svd(p_feature, 'svdne', p_dataset, p_output)
 
 if __name__== "__main__":
     main()
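
With this change, `generate_all_data.py` takes the dataset location from the command line instead of `cfg.dataset_path`. The following is a minimal sketch of the resulting interface, reconstructed from the hunks above. The feature labels, paths and the `modes_to_generate` helper are placeholders for illustration; the real script reads its labels from `cfg.features_choices_labels` and calls `generate_data_svd` directly.

```python
import argparse

# Placeholder labels; the real list comes from cfg.features_choices_labels.
FEATURES_CHOICES = ["feature_a", "feature_b"]
MODES = ("svd", "svdn", "svdne")


def build_parser():
    """Parser with the three arguments the diff marks as required."""
    parser = argparse.ArgumentParser(description="Sketch of the generate_all_data.py CLI")
    parser.add_argument("--feature", type=str, required=True,
                        help="feature choice (use 'all' for every feature)")
    parser.add_argument("--dataset", type=str, required=True,
                        help="dataset folder with all scenes")
    parser.add_argument("--output", type=str, required=True,
                        help="output expected name of generated file")
    return parser


def modes_to_generate(feature):
    """List the (feature, mode) pairs that main() would hand to generate_data_svd."""
    if feature == "all":
        features = FEATURES_CHOICES
    elif feature in FEATURES_CHOICES:
        features = [feature]
    else:
        raise ValueError("Unknown feature choice: " + feature)
    return [(f, m) for f in features for m in MODES]


args = build_parser().parse_args(
    ["--feature", "feature_a", "--dataset", "data/scenes", "--output", "generated"]
)
calls = modes_to_generate(args.feature)
print(args.dataset, calls)
```

Omitting any of the three arguments makes `argparse` exit with a usage error, which is the behaviour `required=True` introduces in this commit.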

+ 55 - 63
generate/generate_data_model.py

@@ -18,20 +18,16 @@ from data_attributes import get_image_features
 
 
 # getting configuration information
-learned_folder          = cfg.learned_zones_folder
+learned_folder          = cfg.output_zones_learned
 min_max_filename        = cfg.min_max_filename_extension
 
 # define all scenes variables
-scenes_list             = cfg.scenes_names
-scenes_indexes          = cfg.scenes_indices
-path                    = cfg.dataset_path
 zones                   = cfg.zones_indices
 seuil_expe_filename     = cfg.seuil_expe_filename
 
-renderer_choices        = cfg.renderer_choices
 normalization_choices   = cfg.normalization_choices
 features_choices        = cfg.features_choices_labels
-output_data_folder      = cfg.output_data_folder
+output_data_folder      = cfg.output_datasets
 custom_min_max_folder   = cfg.min_max_custom_folder
 min_max_ext             = cfg.min_max_filename_extension
 zones_indices           = cfg.zones_indices
@@ -41,7 +37,7 @@ generic_output_file_svd = '_random.csv'
 min_value_interval = sys.maxsize
 max_value_interval = 0
 
-def construct_new_line(path_seuil, interval, line, choice, each, norm):
+def construct_new_line(threshold, interval, line, choice, each, norm):
     begin, end = interval
 
     line_data = line.split(';')
@@ -56,10 +52,7 @@ def construct_new_line(path_seuil, interval, line, choice, each, norm):
         if choice == 'svdn':
             features = utils.normalize_arr(features)
 
-    with open(path_seuil, "r") as seuil_file:
-        seuil_learned = int(seuil_file.readline().strip())
-
-    if seuil_learned > int(seuil):
+    if threshold > int(seuil):
         line = '1'
     else:
         line = '0'
@@ -71,7 +64,7 @@ def construct_new_line(path_seuil, interval, line, choice, each, norm):
 
     return line
 
-def get_min_max_value_interval(_scenes_list, _interval, _feature):
+def get_min_max_value_interval(path, _scenes_list, _interval, _feature):
 
     global min_value_interval, max_value_interval
 
@@ -123,13 +116,10 @@ def get_min_max_value_interval(_scenes_list, _interval, _feature):
                         max_value_interval = max_value
 
 
-def generate_data_model(_filename, _interval, _choice, _feature, _scenes = scenes_list, _zones = zones_indices, _percent = 1, _step=1, _each=1, _norm=False, _custom=False):
-
-    output_train_filename = _filename + ".train"
-    output_test_filename = _filename + ".test"
+def generate_data_model(_filename, _data_path, _interval, _choice, _feature, _thresholds, _learned_zones, _step=1, _each=1, _norm=False, _custom=False):
 
-    if not '/' in output_train_filename:
-        raise Exception("Please select filename with directory path to save data. Example : data/dataset")
+    output_train_filename = os.path.join(output_data_folder, _filename + ".train")
+    output_test_filename = os.path.join(output_data_folder,_filename + ".test")
 
     # create path if not exists
     if not os.path.exists(output_data_folder):
@@ -138,24 +128,15 @@ def generate_data_model(_filename, _interval, _choice, _feature, _scenes = scene
     train_file = open(output_train_filename, 'w')
     test_file = open(output_test_filename, 'w')
 
-    for folder_scene in scenes_list:
-
-        # only take care of maxwell scenes
-        scene_path = os.path.join(path, folder_scene)
-
-        zones_indices = zones
-
-        # write into file
-        folder_learned_path = os.path.join(learned_folder, _filename.split('/')[1])
-
-        if not os.path.exists(folder_learned_path):
-            os.makedirs(folder_learned_path)
+    # get zone indices
+    zones_indices = np.arange(16)
 
-        file_learned_path = os.path.join(folder_learned_path, folder_scene + '.csv')
+    for folder_scene in _thresholds:
 
-        with open(file_learned_path, 'w') as f:
-            for i in _zones:
-                f.write(str(i) + ';')
+        # get train zones
+        train_zones = _learned_zones[folder_scene]
+        scene_thresholds = _thresholds[folder_scene]
+        scene_path = os.path.join(_data_path, folder_scene)
 
         for id_zone, index_folder in enumerate(zones_indices):
 
@@ -183,19 +164,16 @@ def generate_data_model(_filename, _interval, _choice, _feature, _scenes = scene
             lines_indexes = np.arange(num_lines)
             random.shuffle(lines_indexes)
 
-            path_seuil = os.path.join(zone_path, seuil_expe_filename)
-
             counter = 0
             # check if user select current scene and zone to be part of training data set
             for index in lines_indexes:
 
                 image_index = int(lines[index].split(';')[0])
-                percent = counter / num_lines
 
                 if image_index % _step == 0:
-                    line = construct_new_line(path_seuil, _interval, lines[index], _choice, _each, _norm)
+                    line = construct_new_line(scene_thresholds[id_zone], _interval, lines[index], _choice, _each, _norm)
 
-                    if id_zone in _zones and folder_scene in _scenes and percent <= _percent:
+                    if id_zone in train_zones:
                         train_file.write(line)
                     else:
                         test_file.write(line)
@@ -213,48 +191,63 @@ def main():
     # getting all params
     parser = argparse.ArgumentParser(description="Generate data for model using correlation matrix information from data")
 
-    parser.add_argument('--output', type=str, help='output file name desired (.train and .test)')
-    parser.add_argument('--interval', type=str, help='Interval value to keep from svd', default='"0, 200"')
+    parser.add_argument('--output', type=str, help='output file name desired (.train and .test)', required=True)
+    parser.add_argument('--data', type=str, help='folder which contains data of dataset', required=True)
+    parser.add_argument('--thresholds', type=str, help='file with scene list information and thresholds', required=True)
+    parser.add_argument('--selected_zones', type=str, help='file which contains all selected zones of scene', required=True)  
+    parser.add_argument('--interval', type=str, help='Interval value to keep from svd', default='"0, 200"', required=True)
     parser.add_argument('--kind', type=str, help='Kind of normalization level wished', choices=normalization_choices)
-    parser.add_argument('--feature', type=str, help='feature data choice', choices=features_choices)
-    parser.add_argument('--scenes', type=str, help='List of scenes to use for training data')
-    parser.add_argument('--zones', type=str, help='Zones indices to use for training data set')
-    parser.add_argument('--percent', type=float, help='Percent of data use for train and test dataset (by default 1)', default=1.0)
+    parser.add_argument('--feature', type=str, help='feature data choice', choices=features_choices, required=True)
     parser.add_argument('--step', type=int, help='Photo step to keep for build datasets', default=1)
     parser.add_argument('--each', type=int, help='Each features to keep from interval', default=1)
-    parser.add_argument('--renderer', type=str, help='Renderer choice in order to limit scenes used', choices=renderer_choices, default='all')
     parser.add_argument('--custom', type=str, help='Name of custom min max file if use of renormalization of data', default=False)
 
     args = parser.parse_args()
 
     p_filename = args.output
+    p_data     = args.data
+    p_thresholds = args.thresholds
+    p_selected_zones = args.selected_zones
     p_interval = list(map(int, args.interval.split(',')))
     p_kind     = args.kind
     p_feature  = args.feature
-    p_scenes   = args.scenes.split(',')
-    p_zones    = list(map(int, args.zones.split(',')))
-    p_percent  = args.percent
     p_step     = args.step
     p_each     = args.each
-    p_renderer = args.renderer
     p_custom   = args.custom
 
-    # list all possibles choices of renderer
-    scenes_list = dt.get_renderer_scenes_names(p_renderer)
-    scenes_indices = dt.get_renderer_scenes_indices(p_renderer)
+    # 1. retrieve human_thresholds
+    human_thresholds = {}
+
+    # extract thresholds
+    with open(p_thresholds) as f:
+        thresholds_line = f.readlines()
+
+        for line in thresholds_line:
+            data = line.split(';')
+            del data[-1] # remove unused last element `\n`
+            current_scene = data[0]
+            thresholds_scene = data[1:]
 
-    # getting scenes from indexes user selection
-    scenes_selected = []
+            # TODO : check if really necessary
+            if current_scene != '50_shades_of_grey':
+                human_thresholds[current_scene] = [ int(threshold) for threshold in  thresholds_scene ]
 
-    for scene_id in p_scenes:
-        index = scenes_indices.index(scene_id.strip())
-        scenes_selected.append(scenes_list[index])
+    # 2. get selected zones
+    selected_zones = {}
+    with(open(p_selected_zones, 'r')) as f:
 
-    print(scenes_selected)
+        for line in f.readlines():
+
+            data = line.split(';')
+            del data[-1]
+            scene_name = data[0]
+            thresholds = data[1:]
+
+            selected_zones[scene_name] = [ int(t) for t in thresholds ]
 
     # find min max value if necessary to renormalize data
     if p_custom:
-        get_min_max_value_interval(scenes_list, p_interval, p_feature)
+        get_min_max_value_interval(p_data, selected_zones, p_interval, p_feature)
 
         # write new file to save
         if not os.path.exists(custom_min_max_folder):
@@ -267,9 +260,8 @@ def main():
             f.write(str(min_value_interval) + '\n')
             f.write(str(max_value_interval) + '\n')
 
-
     # create database using img folder (generate first time only)
-    generate_data_model(p_filename, p_interval, p_kind, p_feature, scenes_selected, p_zones, p_percent, p_step, p_each, p_custom)
+    generate_data_model(p_filename, p_data, p_interval, p_kind, p_feature, human_thresholds, selected_zones, p_step, p_each, p_custom)
 
 if __name__== "__main__":
-    main()
+    main()
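
The new `main()` reads two semicolon-separated files: one maps each scene to its per-zone human thresholds, the other maps each scene to the zones used for training. The sketch below mirrors that parsing, the labelling rule in `construct_new_line` and the train/test routing in `generate_data_model`. It assumes, as `del data[-1]` suggests, that every line ends with a trailing `;` before the newline. Scene names and values are made up, and only four zones are used to keep the example short (the script iterates over 16).

```python
import io


def parse_scene_ints(handle):
    """Parse lines shaped like 'scene;v0;v1;...;' into {scene: [int, ...]}."""
    result = {}
    for line in handle:
        fields = line.split(";")
        del fields[-1]  # drop the leftover newline element after the final ';'
        if not fields:
            continue  # skip blank lines
        result[fields[0]] = [int(v) for v in fields[1:]]
    return result


def label(threshold, image_index):
    """Same rule as construct_new_line: '1' while the image is below the threshold."""
    return "1" if threshold > image_index else "0"


def route(id_zone, train_zones):
    """Same rule as generate_data_model: selected zones go to train, the rest to test."""
    return "train" if id_zone in train_zones else "test"


# Hypothetical file contents, one scene with four zones.
thresholds = parse_scene_ints(io.StringIO("sceneA;400;600;500;300;\n"))
selected_zones = parse_scene_ints(io.StringIO("sceneA;0;2;\n"))

labels = [label(t, 500) for t in thresholds["sceneA"]]
routes = [route(z, selected_zones["sceneA"]) for z in range(4)]
print(labels, routes)
```

Because the split now depends only on zone membership, the former `--percent`, `--scenes` and `--zones` options have no counterpart in this flow.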

+ 0 - 299
generate/generate_data_model_random_all.py

@@ -1,299 +0,0 @@
-# main imports
-import sys, os, argparse
-import numpy as np
-import pandas as pd
-import random
-
-# image processing imports
-from PIL import Image
-
-from ipfml import utils
-
-# modules imports
-sys.path.insert(0, '') # trick to enable import of main folder module
-
-import custom_config as cfg
-from modules.utils import data as dt
-from data_attributes import get_image_features
-
-
-# getting configuration information
-learned_folder          = cfg.learned_zones_folder
-min_max_filename        = cfg.min_max_filename_extension
-
-# define all scenes variables
-all_scenes_list         = cfg.scenes_names
-all_scenes_indices      = cfg.scenes_indices
-
-normalization_choices   = cfg.normalization_choices
-path                    = cfg.dataset_path
-zones                   = cfg.zones_indices
-seuil_expe_filename     = cfg.seuil_expe_filename
-
-renderer_choices        = cfg.renderer_choices
-features_choices        = cfg.features_choices_labels
-output_data_folder      = cfg.output_data_folder
-custom_min_max_folder   = cfg.min_max_custom_folder
-min_max_ext             = cfg.min_max_filename_extension
-
-generic_output_file_svd = '_random.csv'
-
-min_value_interval      = sys.maxsize
-max_value_interval      = 0
-
-def construct_new_line(path_seuil, interval, line, choice, each, norm):
-    begin, end = interval
-
-    line_data = line.split(';')
-    seuil = line_data[0]
-    features = line_data[begin+1:end+1]
-
-    # keep only if modulo result is 0 (keep only each wanted values)
-    features = [float(m) for id, m in enumerate(features) if id % each == 0]
-
-    # TODO : check if it's always necessary to do that (loss of information for svd)
-    if norm:
-
-        if choice == 'svdne':
-            features = utils.normalize_arr_with_range(features, min_value_interval, max_value_interval)
-        if choice == 'svdn':
-            features = utils.normalize_arr(features)
-
-    with open(path_seuil, "r") as seuil_file:
-        seuil_learned = int(seuil_file.readline().strip())
-
-    if seuil_learned > int(seuil):
-        line = '1'
-    else:
-        line = '0'
-
-    for val in features:
-        line += ';'
-        line += str(val)
-    line += '\n'
-
-    return line
-
-
-def get_min_max_value_interval(_scenes_list, _interval, _feature):
-
-    global min_value_interval, max_value_interval
-
-    scenes = os.listdir(path)
-
-    # remove min max file from scenes folder
-    scenes = [s for s in scenes if min_max_filename not in s]
-
-    for folder_scene in scenes:
-
-        # only take care of maxwell scenes
-        if folder_scene in _scenes_list:
-
-            scene_path = os.path.join(path, folder_scene)
-
-            zones_folder = []
-            # create zones list
-            for index in zones:
-                index_str = str(index)
-                if len(index_str) < 2:
-                    index_str = "0" + index_str
-                zones_folder.append("zone"+index_str)
-
-            for zone_folder in zones_folder:
-
-                zone_path = os.path.join(scene_path, zone_folder)
-
-                # if custom normalization choices then we use svd values not already normalized
-                data_filename = _feature + "_svd"+ generic_output_file_svd
-
-                data_file_path = os.path.join(zone_path, data_filename)
-
-                # getting number of line and read randomly lines
-                f = open(data_file_path)
-                lines = f.readlines()
-
-                # check if user select current scene and zone to be part of training data set
-                for line in lines:
-
-                    begin, end = _interval
-
-                    line_data = line.split(';')
-
-                    features = line_data[begin+1:end+1]
-                    features = [float(m) for m in features]
-
-                    min_value = min(features)
-                    max_value = max(features)
-
-                    if min_value < min_value_interval:
-                        min_value_interval = min_value
-
-                    if max_value > max_value_interval:
-                        max_value_interval = max_value
-
-
-def generate_data_model(_scenes_list, _filename, _interval, _choice, _feature, _scenes, _nb_zones = 4, _percent = 1, _random=0, _step=1, _each=1, _custom = False):
-
-    output_train_filename = _filename + ".train"
-    output_test_filename = _filename + ".test"
-
-    if not '/' in output_train_filename:
-        raise Exception("Please select filename with directory path to save data. Example : data/dataset")
-
-    # create path if not exists
-    if not os.path.exists(output_data_folder):
-        os.makedirs(output_data_folder)
-
-    train_file_data = []
-    test_file_data  = []
-
-    for folder_scene in _scenes_list:
-
-        scene_path = os.path.join(path, folder_scene)
-
-        zones_indices = zones
-
-        # shuffle list of zones (=> randomly choose zones)
-        # only in random mode
-        if _random:
-            random.shuffle(zones_indices)
-
-        # store zones learned
-        learned_zones_indices = zones_indices[:_nb_zones]
-
-        # write into file
-        folder_learned_path = os.path.join(learned_folder, _filename.split('/')[1])
-
-        if not os.path.exists(folder_learned_path):
-            os.makedirs(folder_learned_path)
-
-        file_learned_path = os.path.join(folder_learned_path, folder_scene + '.csv')
-
-        with open(file_learned_path, 'w') as f:
-            for i in learned_zones_indices:
-                f.write(str(i) + ';')
-
-        for id_zone, index_folder in enumerate(zones_indices):
-
-            index_str = str(index_folder)
-            if len(index_str) < 2:
-                index_str = "0" + index_str
-            current_zone_folder = "zone" + index_str
-
-            zone_path = os.path.join(scene_path, current_zone_folder)
-
-            # if custom normalization choices then we use svd values not already normalized
-            if _custom:
-                data_filename = _feature + "_svd" + generic_output_file_svd
-            else:
-                data_filename = _feature + "_" + _choice + generic_output_file_svd
-
-            data_file_path = os.path.join(zone_path, data_filename)
-
-            # getting number of line and read randomly lines
-            f = open(data_file_path)
-            lines = f.readlines()
-
-            num_lines = len(lines)
-
-            # randomly shuffle image
-            if _random:
-                random.shuffle(lines)
-
-            path_seuil = os.path.join(zone_path, seuil_expe_filename)
-
-            counter = 0
-            # check if user select current scene and zone to be part of training data set
-            for data in lines:
-
-                percent = counter / num_lines
-                image_index = int(data.split(';')[0])
-
-                if image_index % _step == 0:
-                    line = construct_new_line(path_seuil, _interval, data, _choice, _each, _custom)
-
-                    if id_zone < _nb_zones and folder_scene in _scenes and percent <= _percent:
-                        train_file_data.append(line)
-                    else:
-                        test_file_data.append(line)
-
-                counter += 1
-
-            f.close()
-
-    train_file = open(output_train_filename, 'w')
-    test_file = open(output_test_filename, 'w')
-
-    for line in train_file_data:
-        train_file.write(line)
-
-    for line in test_file_data:
-        test_file.write(line)
-
-    train_file.close()
-    test_file.close()
-
-
-def main():
-
-    # getting all params
-    parser = argparse.ArgumentParser(description="Generate data for model using correlation matrix information from data")
-
-    parser.add_argument('--output', type=str, help='output file name desired (.train and .test)')
-    parser.add_argument('--interval', type=str, help='Interval value to keep from svd', default='"0, 200"')
-    parser.add_argument('--kind', type=str, help='Kind of normalization level wished', choices=normalization_choices)
-    parser.add_argument('--feature', type=str, help='feature data choice', choices=features_choices)
-    parser.add_argument('--scenes', type=str, help='List of scenes to use for training data')
-    parser.add_argument('--nb_zones', type=int, help='Number of zones to use for training data set')
-    parser.add_argument('--random', type=int, help='Data will be randomly filled or not', choices=[0, 1])
-    parser.add_argument('--percent', type=float, help='Percent of data use for train and test dataset (by default 1)')
-    parser.add_argument('--step', type=int, help='Photo step to keep for build datasets', default=1)
-    parser.add_argument('--each', type=int, help='Each features to keep from interval', default=1)
-    parser.add_argument('--renderer', type=str, help='Renderer choice in order to limit scenes used', choices=renderer_choices, default='all')
-    parser.add_argument('--custom', type=str, help='Name of custom min max file if use of renormalization of data', default=False)
-
-    args = parser.parse_args()
-
-    p_filename = args.output
-    p_interval = list(map(int, args.interval.split(',')))
-    p_kind     = args.kind
-    p_feature  = args.feature
-    p_scenes   = args.scenes.split(',')
-    p_nb_zones = args.nb_zones
-    p_random   = args.random
-    p_percent  = args.percent
-    p_step     = args.step
-    p_each     = args.each
-    p_renderer = args.renderer
-    p_custom   = args.custom
-
-    # list all possibles choices of renderer
-    scenes_list = dt.get_renderer_scenes_names(p_renderer)
-    scenes_indices = dt.get_renderer_scenes_indices(p_renderer)
-
-    # getting scenes from indexes user selection
-    scenes_selected = []
-
-    for scene_id in p_scenes:
-        index = scenes_indices.index(scene_id.strip())
-        scenes_selected.append(scenes_list[index])
-
-    # find min max value if necessary to renormalize data
-    if p_custom:
-        get_min_max_value_interval(scenes_list, p_interval, p_feature)
-
-        # write new file to save
-        if not os.path.exists(custom_min_max_folder):
-            os.makedirs(custom_min_max_folder)
-
-        min_max_filename_path = os.path.join(custom_min_max_folder, p_custom)
-
-        with open(min_max_filename_path, 'w') as f:
-            f.write(str(min_value_interval) + '\n')
-            f.write(str(max_value_interval) + '\n')
-
-    # create database using img folder (generate first time only)
-    generate_data_model(scenes_list, p_filename, p_interval, p_kind, p_feature, scenes_selected, p_nb_zones, p_percent, p_random, p_step, p_each, p_custom)
-
-if __name__== "__main__":
-    main()
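
The script deleted above drew its training zones at random (`random.shuffle` followed by `zones_indices[:_nb_zones]`). In the new flow, zones come from the `--selected_zones` file instead. The helper below is hypothetical and not part of this commit: it sketches one way to produce such a file with random zones, in the `scene;z;z;...;` layout that the new `main()` parses. The function name, the seed parameter and the scene names are assumptions.

```python
import io
import random


def write_random_selected_zones(scenes, nb_zones, handle, nb_total_zones=16, seed=None):
    """Write one 'scene;z;z;...;' line per scene with nb_zones distinct zone indices."""
    rng = random.Random(seed)
    for scene in scenes:
        zones = sorted(rng.sample(range(nb_total_zones), nb_zones))
        handle.write(scene + ";" + ";".join(str(z) for z in zones) + ";\n")


# Write to an in-memory buffer here; a real run would open a file instead.
buffer = io.StringIO()
write_random_selected_zones(["sceneA", "sceneB"], nb_zones=4, handle=buffer, seed=42)
lines = buffer.getvalue().splitlines()
print(lines)
```

Fixing the seed keeps the selection reproducible across runs, which the in-place shuffle in the deleted script did not guarantee.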

+ 0 - 310
generate/generate_data_model_random_center.py

@@ -1,310 +0,0 @@
-# main imports
-import sys, os, argparse
-import numpy as np
-import pandas as pd
-import random
-
-# image processing imports
-from PIL import Image
-
-from ipfml import utils
-
-# modules imports
-sys.path.insert(0, '') # trick to enable import of main folder module
-
-import custom_config as cfg
-from modules.utils import data as dt
-from data_attributes import get_image_features
-
-
-# getting configuration information
-learned_folder          = cfg.learned_zones_folder
-min_max_filename        = cfg.min_max_filename_extension
-
-# define all scenes variables
-all_scenes_list         = cfg.scenes_names
-all_scenes_indices      = cfg.scenes_indices
-
-normalization_choices   = cfg.normalization_choices
-path                    = cfg.dataset_path
-zones                   = cfg.zones_indices
-seuil_expe_filename     = cfg.seuil_expe_filename
-
-renderer_choices        = cfg.renderer_choices
-features_choices        = cfg.features_choices_labels
-output_data_folder      = cfg.output_data_folder
-custom_min_max_folder   = cfg.min_max_custom_folder
-min_max_ext             = cfg.min_max_filename_extension
-
-generic_output_file_svd = '_random.csv'
-
-min_value_interval      = sys.maxsize
-max_value_interval      = 0
-abs_gap_data            = 150
-
-
-def construct_new_line(seuil_learned, interval, line, choice, each, norm):
-    begin, end = interval
-
-    line_data = line.split(';')
-    seuil = line_data[0]
-    features = line_data[begin+1:end+1]
-
-    # keep only if modulo result is 0 (keep only each wanted values)
-    features = [float(m) for id, m in enumerate(features) if id % each == 0]
-
-    # TODO : check if it's always necessary to do that (loss of information for svd)
-    if norm:
-
-        if choice == 'svdne':
-            features = utils.normalize_arr_with_range(features, min_value_interval, max_value_interval)
-        if choice == 'svdn':
-            features = utils.normalize_arr(features)
-
-    if seuil_learned > int(seuil):
-        line = '1'
-    else:
-        line = '0'
-
-    for val in features:
-        line += ';'
-        line += str(val)
-    line += '\n'
-
-    return line
-
-def get_min_max_value_interval(_scenes_list, _interval, _feature):
-
-    global min_value_interval, max_value_interval
-
-    scenes = os.listdir(path)
-
-    # remove min max file from scenes folder
-    scenes = [s for s in scenes if min_max_filename not in s]
-
-    for folder_scene in scenes:
-
-        # only take care of maxwell scenes
-        if folder_scene in _scenes_list:
-
-            scene_path = os.path.join(path, folder_scene)
-
-            zones_folder = []
-            # create zones list
-            for index in zones:
-                index_str = str(index)
-                if len(index_str) < 2:
-                    index_str = "0" + index_str
-                zones_folder.append("zone"+index_str)
-
-            for zone_folder in zones_folder:
-
-                zone_path = os.path.join(scene_path, zone_folder)
-
-                # if custom normalization choices then we use svd values not already normalized
-                data_filename = _feature + "_svd"+ generic_output_file_svd
-
-                data_file_path = os.path.join(zone_path, data_filename)
-
-                # getting number of line and read randomly lines
-                f = open(data_file_path)
-                lines = f.readlines()
-
-                # check if user select current scene and zone to be part of training data set
-                for line in lines:
-
-                    begin, end = _interval
-
-                    line_data = line.split(';')
-
-                    features = line_data[begin+1:end+1]
-                    features = [float(m) for m in features]
-
-                    min_value = min(features)
-                    max_value = max(features)
-
-                    if min_value < min_value_interval:
-                        min_value_interval = min_value
-
-                    if max_value > max_value_interval:
-                        max_value_interval = max_value
-
-
-def generate_data_model(_scenes_list, _filename, _interval, _choice, _feature, _scenes, _nb_zones = 4, _percent = 1, _random=0, _step=1, _each=1, _custom = False):
-
-    output_train_filename = _filename + ".train"
-    output_test_filename = _filename + ".test"
-
-    if not '/' in output_train_filename:
-        raise Exception("Please select filename with directory path to save data. Example : data/dataset")
-
-    # create path if not exists
-    if not os.path.exists(output_data_folder):
-        os.makedirs(output_data_folder)
-
-    train_file_data = []
-    test_file_data  = []
-
-    for folder_scene in _scenes_list:
-
-        scene_path = os.path.join(path, folder_scene)
-
-        zones_indices = zones
-
-        # shuffle list of zones (=> randomly choose zones)
-        # only in random mode
-        if _random:
-            random.shuffle(zones_indices)
-
-        # store zones learned
-        learned_zones_indices = zones_indices[:_nb_zones]
-
-        # write into file
-        folder_learned_path = os.path.join(learned_folder, _filename.split('/')[1])
-
-        if not os.path.exists(folder_learned_path):
-            os.makedirs(folder_learned_path)
-
-        file_learned_path = os.path.join(folder_learned_path, folder_scene + '.csv')
-
-        with open(file_learned_path, 'w') as f:
-            for i in learned_zones_indices:
-                f.write(str(i) + ';')
-
-        for id_zone, index_folder in enumerate(zones_indices):
-
-            index_str = str(index_folder)
-            if len(index_str) < 2:
-                index_str = "0" + index_str
-            current_zone_folder = "zone" + index_str
-
-            zone_path = os.path.join(scene_path, current_zone_folder)
-
-            # if custom normalization choices then we use svd values not already normalized
-            if _custom:
-                data_filename = _feature + "_svd"+ generic_output_file_svd
-            else:
-                data_filename = _feature + "_" + _choice + generic_output_file_svd
-
-            data_file_path = os.path.join(zone_path, data_filename)
-
-            # getting number of line and read randomly lines
-            f = open(data_file_path)
-            lines = f.readlines()
-
-            num_lines = len(lines)
-
-            # randomly shuffle image
-            if _random:
-                random.shuffle(lines)
-
-            path_seuil = os.path.join(zone_path, seuil_expe_filename)
-
-            with open(path_seuil, "r") as seuil_file:
-                seuil_learned = int(seuil_file.readline().strip())
-
-            counter = 0
-            # check if user select current scene and zone to be part of training data set
-            for data in lines:
-
-                percent = counter / num_lines
-                image_index = int(data.split(';')[0])
-
-                if image_index % _step == 0:
-
-                    with open(path_seuil, "r") as seuil_file:
-                        seuil_learned = int(seuil_file.readline().strip())
-
-                    gap_threshold = abs(seuil_learned - image_index)
-
-                    # only keep data near to threshold of zone image
-                    if gap_threshold <= abs_gap_data:
-
-                        line = construct_new_line(seuil_learned, _interval, data, _choice, _each, _custom)
-
-                        if id_zone < _nb_zones and folder_scene in _scenes and percent <= _percent:
-                            train_file_data.append(line)
-                        else:
-                            test_file_data.append(line)
-
-                counter += 1
-
-            f.close()
-
-    train_file = open(output_train_filename, 'w')
-    test_file = open(output_test_filename, 'w')
-
-    for line in train_file_data:
-        train_file.write(line)
-
-    for line in test_file_data:
-        test_file.write(line)
-
-    train_file.close()
-    test_file.close()
-
-
-def main():
-
-    # getting all params
-    parser = argparse.ArgumentParser(description="Generate data for model using correlation matrix information from data")
-
-    parser.add_argument('--output', type=str, help='output file name desired (.train and .test)')
-    parser.add_argument('--interval', type=str, help='Interval value to keep from svd', default='"0, 200"')
-    parser.add_argument('--kind', type=str, help='Kind of normalization level wished', choices=normalization_choices)
-    parser.add_argument('--feature', type=str, help='feature data choice', choices=features_choices)
-    parser.add_argument('--scenes', type=str, help='List of scenes to use for training data')
-    parser.add_argument('--nb_zones', type=int, help='Number of zones to use for training data set')
-    parser.add_argument('--random', type=int, help='Data will be randomly filled or not', choices=[0, 1])
-    parser.add_argument('--percent', type=float, help='Percent of data use for train and test dataset (by default 1)')
-    parser.add_argument('--step', type=int, help='Photo step to keep for build datasets', default=1)
-    parser.add_argument('--each', type=int, help='Each features to keep from interval', default=1)
-    parser.add_argument('--renderer', type=str, help='Renderer choice in order to limit scenes used', choices=renderer_choices, default='all')
-    parser.add_argument('--custom', type=str, help='Name of custom min max file if use of renormalization of data', default=False)
-
-    args = parser.parse_args()
-
-    p_filename = args.output
-    p_interval = list(map(int, args.interval.split(',')))
-    p_kind     = args.kind
-    p_feature  = args.feature
-    p_scenes   = args.scenes.split(',')
-    p_nb_zones = args.nb_zones
-    p_random   = args.random
-    p_percent  = args.percent
-    p_step     = args.step
-    p_each     = args.each
-    p_renderer = args.renderer
-    p_custom   = args.custom
-
-
-    # list all possibles choices of renderer
-    scenes_list = dt.get_renderer_scenes_names(p_renderer)
-    scenes_indices = dt.get_renderer_scenes_indices(p_renderer)
-
-    # getting scenes from indexes user selection
-    scenes_selected = []
-
-    for scene_id in p_scenes:
-        index = scenes_indices.index(scene_id.strip())
-        scenes_selected.append(scenes_list[index])
-
-    # find min max value if necessary to renormalize data
-    if p_custom:
-        get_min_max_value_interval(scenes_list, p_interval, p_feature)
-
-        # write new file to save
-        if not os.path.exists(custom_min_max_folder):
-            os.makedirs(custom_min_max_folder)
-
-        min_max_filename_path = os.path.join(custom_min_max_folder, p_custom)
-
-        with open(min_max_filename_path, 'w') as f:
-            f.write(str(min_value_interval) + '\n')
-            f.write(str(max_value_interval) + '\n')
-
-    # create database using img folder (generate first time only)
-    generate_data_model(scenes_list, p_filename, p_interval, p_kind, p_feature, scenes_selected, p_nb_zones, p_percent, p_random, p_step, p_each, p_custom)
-
-if __name__== "__main__":
-    main()

+ 0 - 309
generate/generate_data_model_random_split.py

@@ -1,309 +0,0 @@
-# main imports
-import sys, os, argparse
-import numpy as np
-import pandas as pd
-import random
-
-# image processing imports
-from PIL import Image
-
-from ipfml import utils
-
-# modules imports
-sys.path.insert(0, '') # trick to enable import of main folder module
-
-import custom_config as cfg
-from modules.utils import data as dt
-from data_attributes import get_image_features
-
-
-# getting configuration information
-learned_folder          = cfg.learned_zones_folder
-min_max_filename        = cfg.min_max_filename_extension
-
-# define all scenes variables
-all_scenes_list         = cfg.scenes_names
-all_scenes_indices      = cfg.scenes_indices
-
-normalization_choices   = cfg.normalization_choices
-path                    = cfg.dataset_path
-zones                   = cfg.zones_indices
-seuil_expe_filename     = cfg.seuil_expe_filename
-
-renderer_choices        = cfg.renderer_choices
-features_choices        = cfg.features_choices_labels
-output_data_folder      = cfg.output_data_folder
-custom_min_max_folder   = cfg.min_max_custom_folder
-min_max_ext             = cfg.min_max_filename_extension
-
-generic_output_file_svd = '_random.csv'
-
-min_value_interval      = sys.maxsize
-max_value_interval      = 0
-abs_gap_data            = 100
-
-
-def construct_new_line(seuil_learned, interval, line, choice, each, norm):
-    begin, end = interval
-
-    line_data = line.split(';')
-    seuil = line_data[0]
-    features = line_data[begin+1:end+1]
-
-    # keep only if modulo result is 0 (keep only each wanted values)
-    features = [float(m) for id, m in enumerate(features) if id % each == 0]
-
-    # TODO : check if it's always necessary to do that (loss of information for svd)
-    if norm:
-
-        if choice == 'svdne':
-            features = utils.normalize_arr_with_range(features, min_value_interval, max_value_interval)
-        if choice == 'svdn':
-            features = utils.normalize_arr(features)
-
-    if seuil_learned > int(seuil):
-        line = '1'
-    else:
-        line = '0'
-
-    for val in features:
-        line += ';'
-        line += str(val)
-    line += '\n'
-
-    return line
-
-def get_min_max_value_interval(_scenes_list, _interval, _feature):
-
-    global min_value_interval, max_value_interval
-
-    scenes = os.listdir(path)
-
-    # remove min max file from scenes folder
-    scenes = [s for s in scenes if min_max_filename not in s]
-
-    for folder_scene in scenes:
-
-        # only take care of maxwell scenes
-        if folder_scene in _scenes_list:
-
-            scene_path = os.path.join(path, folder_scene)
-
-            zones_folder = []
-            # create zones list
-            for index in zones:
-                index_str = str(index)
-                if len(index_str) < 2:
-                    index_str = "0" + index_str
-                zones_folder.append("zone"+index_str)
-
-            for zone_folder in zones_folder:
-
-                zone_path = os.path.join(scene_path, zone_folder)
-
-                # if custom normalization choices then we use svd values not already normalized
-                data_filename = _feature + "_svd"+ generic_output_file_svd
-
-                data_file_path = os.path.join(zone_path, data_filename)
-
-                # getting number of line and read randomly lines
-                f = open(data_file_path)
-                lines = f.readlines()
-
-                # check if user select current scene and zone to be part of training data set
-                for line in lines:
-
-                    begin, end = _interval
-
-                    line_data = line.split(';')
-
-                    features = line_data[begin+1:end+1]
-                    features = [float(m) for m in features]
-
-                    min_value = min(features)
-                    max_value = max(features)
-
-                    if min_value < min_value_interval:
-                        min_value_interval = min_value
-
-                    if max_value > max_value_interval:
-                        max_value_interval = max_value
-
-
-def generate_data_model(_scenes_list, _filename, _interval, _choice, _feature, _scenes, _nb_zones = 4, _percent = 1, _random=0, _step=1, _each=1, _custom = False):
-
-    output_train_filename = _filename + ".train"
-    output_test_filename = _filename + ".test"
-
-    if not '/' in output_train_filename:
-        raise Exception("Please select filename with directory path to save data. Example : data/dataset")
-
-    # create path if not exists
-    if not os.path.exists(output_data_folder):
-        os.makedirs(output_data_folder)
-
-    train_file_data = []
-    test_file_data  = []
-
-    for folder_scene in _scenes_list:
-
-        scene_path = os.path.join(path, folder_scene)
-
-        zones_indices = zones
-
-        # shuffle list of zones (=> randomly choose zones)
-        # only in random mode
-        if _random:
-            random.shuffle(zones_indices)
-
-        # store zones learned
-        learned_zones_indices = zones_indices[:_nb_zones]
-
-        # write into file
-        folder_learned_path = os.path.join(learned_folder, _filename.split('/')[1])
-
-        if not os.path.exists(folder_learned_path):
-            os.makedirs(folder_learned_path)
-
-        file_learned_path = os.path.join(folder_learned_path, folder_scene + '.csv')
-
-        with open(file_learned_path, 'w') as f:
-            for i in learned_zones_indices:
-                f.write(str(i) + ';')
-
-        for id_zone, index_folder in enumerate(zones_indices):
-
-            index_str = str(index_folder)
-            if len(index_str) < 2:
-                index_str = "0" + index_str
-            current_zone_folder = "zone" + index_str
-
-            zone_path = os.path.join(scene_path, current_zone_folder)
-
-            # if custom normalization choices then we use svd values not already normalized
-            if _custom:
-                data_filename = _feature + "_svd"+ generic_output_file_svd
-            else:
-                data_filename = _feature + "_" + _choice + generic_output_file_svd
-
-            data_file_path = os.path.join(zone_path, data_filename)
-
-            # getting number of line and read randomly lines
-            f = open(data_file_path)
-            lines = f.readlines()
-
-            num_lines = len(lines)
-
-            # randomly shuffle image
-            if _random:
-                random.shuffle(lines)
-
-            path_seuil = os.path.join(zone_path, seuil_expe_filename)
-
-            with open(path_seuil, "r") as seuil_file:
-                seuil_learned = int(seuil_file.readline().strip())
-
-            counter = 0
-            # check if user select current scene and zone to be part of training data set
-            for data in lines:
-
-                percent = counter / num_lines
-                image_index = int(data.split(';')[0])
-
-                if image_index % _step == 0:
-
-                    with open(path_seuil, "r") as seuil_file:
-                        seuil_learned = int(seuil_file.readline().strip())
-
-                    gap_threshold = abs(seuil_learned - image_index)
-
-                    if gap_threshold > abs_gap_data:
-
-                        line = construct_new_line(seuil_learned, _interval, data, _choice, _each, _custom)
-
-                        if id_zone < _nb_zones and folder_scene in _scenes and percent <= _percent:
-                            train_file_data.append(line)
-                        else:
-                            test_file_data.append(line)
-
-                counter += 1
-
-            f.close()
-
-    train_file = open(output_train_filename, 'w')
-    test_file = open(output_test_filename, 'w')
-
-    for line in train_file_data:
-        train_file.write(line)
-
-    for line in test_file_data:
-        test_file.write(line)
-
-    train_file.close()
-    test_file.close()
-
-
-def main():
-
-    # getting all params
-    parser = argparse.ArgumentParser(description="Generate data for model using correlation matrix information from data")
-
-    parser.add_argument('--output', type=str, help='output file name desired (.train and .test)')
-    parser.add_argument('--interval', type=str, help='Interval value to keep from svd', default='0, 200')
-    parser.add_argument('--kind', type=str, help='Kind of normalization level wished', choices=normalization_choices)
-    parser.add_argument('--feature', type=str, help='feature data choice', choices=features_choices)
-    parser.add_argument('--scenes', type=str, help='List of scenes to use for training data')
-    parser.add_argument('--nb_zones', type=int, help='Number of zones to use for training data set')
-    parser.add_argument('--random', type=int, help='Data will be randomly filled or not', choices=[0, 1])
-    parser.add_argument('--percent', type=float, help='Percent of data use for train and test dataset (by default 1)')
-    parser.add_argument('--step', type=int, help='Photo step to keep for build datasets', default=1)
-    parser.add_argument('--each', type=int, help='Each features to keep from interval', default=1)
-    parser.add_argument('--renderer', type=str, help='Renderer choice in order to limit scenes used', choices=renderer_choices, default='all')
-    parser.add_argument('--custom', type=str, help='Name of custom min max file if use of renormalization of data', default=False)
-
-    args = parser.parse_args()
-
-    p_filename = args.output
-    p_interval = list(map(int, args.interval.split(',')))
-    p_kind     = args.kind
-    p_feature  = args.feature
-    p_scenes   = args.scenes.split(',')
-    p_nb_zones = args.nb_zones
-    p_random   = args.random
-    p_percent  = args.percent
-    p_step     = args.step
-    p_each     = args.each
-    p_renderer = args.renderer
-    p_custom   = args.custom
-
-
-    # list all possibles choices of renderer
-    scenes_list = dt.get_renderer_scenes_names(p_renderer)
-    scenes_indices = dt.get_renderer_scenes_indices(p_renderer)
-
-    # getting scenes from indexes user selection
-    scenes_selected = []
-
-    for scene_id in p_scenes:
-        index = scenes_indices.index(scene_id.strip())
-        scenes_selected.append(scenes_list[index])
-
-    # find min max value if necessary to renormalize data
-    if p_custom:
-        get_min_max_value_interval(scenes_list, p_interval, p_feature)
-
-        # write new file to save
-        if not os.path.exists(custom_min_max_folder):
-            os.makedirs(custom_min_max_folder)
-
-        min_max_filename_path = os.path.join(custom_min_max_folder, p_custom)
-
-        with open(min_max_filename_path, 'w') as f:
-            f.write(str(min_value_interval) + '\n')
-            f.write(str(max_value_interval) + '\n')
-
-    # create database using img folder (generate first time only)
-    generate_data_model(scenes_list, p_filename, p_interval, p_kind, p_feature, scenes_selected, p_nb_zones, p_percent, p_random, p_step, p_each, p_custom)
-
-if __name__== "__main__":
-    main()
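
Note on the removed script above: the labelling rule in `construct_new_line` and the gap filter in `generate_data_model` can be restated compactly. The sketch below is illustrative only. The helper names `label_for`, `keep_sample` and `build_line` are hypothetical and do not exist in the repository, and it assumes semicolon-separated rows whose first field is the image index, as in the deleted code. Min/max normalization is left out.

```python
ABS_GAP_DATA = 100  # same value as abs_gap_data in the removed script


def label_for(seuil_learned, image_index):
    """Return '1' when the image index is below the learned threshold, else '0'."""
    return '1' if seuil_learned > image_index else '0'


def keep_sample(seuil_learned, image_index, abs_gap=ABS_GAP_DATA):
    """Keep only samples whose index is farther than abs_gap from the threshold."""
    return abs(seuil_learned - image_index) > abs_gap


def build_line(seuil_learned, row, interval, each=1):
    """Rebuild a dataset line: label first, then every `each`-th feature of the interval."""
    begin, end = interval
    fields = row.strip().split(';')
    image_index = int(fields[0])
    # fields[0] is the image index, so the feature slice is shifted by one
    features = [float(v) for i, v in enumerate(fields[begin + 1:end + 1]) if i % each == 0]
    return ';'.join([label_for(seuil_learned, image_index)] + [str(v) for v in features]) + '\n'


print(build_line(500, "200;0.5;0.25;0.125\n", (0, 2)))
```

The file removed just before this one applies the opposite filter (`gap_threshold <= abs_gap_data`), keeping only samples close to the threshold.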

+ 3 - 3
models.py

@@ -10,13 +10,13 @@ import sklearn.svm as svm
 
 def _get_best_model(X_train, y_train):
 
-    #Cs = [0.001, 0.01, 0.1, 1, 10, 100, 1000]
-    Cs = [1, 2, 4, 8, 16, 32]
+    Cs = [0.001, 0.01, 0.1, 1, 10, 100, 1000]
+    #Cs = [1, 2, 4, 8, 16, 32]
     gammas = [0.001, 0.01, 0.1, 1, 5, 10, 100]
     param_grid = {'kernel':['rbf'], 'C': Cs, 'gamma' : gammas}
 
     svc = svm.SVC(probability=True)
-    clf = GridSearchCV(svc, param_grid, cv=10, scoring='accuracy', verbose=0)
+    clf = GridSearchCV(svc, param_grid, cv=10, scoring='accuracy', verbose=2)
 
     clf.fit(X_train, y_train)
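
To gauge the cost of the wider `Cs` grid restored in this hunk: with 7 values of `C`, 7 values of `gamma`, one kernel and `cv=10`, the search evaluates 49 candidates over 10 folds each. With scikit-learn's default `refit=True`, one more fit follows on the full training set. A quick standard-library check, as a sketch rather than a benchmark:

```python
from itertools import product

# Values copied from the hunk above
Cs = [0.001, 0.01, 0.1, 1, 10, 100, 1000]
gammas = [0.001, 0.01, 0.1, 1, 5, 10, 100]
kernels = ['rbf']
cv_folds = 10

# Every (kernel, C, gamma) combination is one grid candidate
candidates = list(product(kernels, Cs, gammas))
n_fits = len(candidates) * cv_folds

print(len(candidates), n_fits)
```

The previous grid (`Cs = [1, 2, 4, 8, 16, 32]`) had 42 candidates, so the change adds 70 cross-validation fits per search. `verbose=2` makes that progress visible.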
 

+ 0 - 93
others/save_model_result_in_md.py

@@ -1,93 +0,0 @@
-# main imports
-import numpy as np
-import sys, os, argparse
-import subprocess
-import time
-
-# models imports
-from sklearn.externals import joblib
-
-# image processing imports
-from PIL import Image
-
-# modules imports
-sys.path.insert(0, '') # trick to enable import of main folder module
-
-import custom_config as cfg
-
-# variables and parameters
-threshold_map_folder      = cfg.threshold_map_folder
-threshold_map_file_prefix = cfg.threshold_map_folder + "_"
-
-markdowns_folder          = cfg.models_information_folder
-zones                     = cfg.zones_indices
-
-current_dirpath = os.getcwd()
-
-def main():
-
-    parser = argparse.ArgumentParser(description="Display SVD data of scene zone")
-
-    parser.add_argument('--interval', type=str, help='Interval value to keep from svd', default='0, 200')
-    parser.add_argument('--model', type=str, help='.joblib or .json file (sklearn or keras model)')
-    parser.add_argument('--feature', type=str, help='Feature data choice', choices=cfg.features_choices_labels)
-    parser.add_argument('--mode', type=str, help='Kind of normalization level wished', choices=cfg.normalization_choices)
-
-    args = parser.parse_args()
-    
-    p_interval   = list(map(int, args.interval.split(',')))
-    p_model_file = args.model
-    p_metric     = args.feature
-    p_mode       = args.mode
-
-
-    # call model and get global result in scenes
-
-    begin, end = p_interval
-
-    bash_cmd = "bash others/testModelByScene.sh '" + str(begin) + "' '" + str(end) + "' '" + p_model_file + "' '" + p_mode + "' '" + p_metric + "'"
-    print(bash_cmd)
-
-    ## call command ##
-    p = subprocess.Popen(bash_cmd, stdout=subprocess.PIPE, shell=True)
-
-    (output, err) = p.communicate()
-
-    ## Wait for result ##
-    p_status = p.wait()
-
-    if not os.path.exists(markdowns_folder):
-        os.makedirs(markdowns_folder)
-
-    # get model name to construct model
-    md_model_path = os.path.join(markdowns_folder, p_model_file.split('/')[-1].replace('.joblib', '.md'))
-
-    with open(md_model_path, 'w') as f:
-        f.write(output.decode("utf-8"))
-
-        # read each threshold_map information if exists
-        model_map_info_path = os.path.join(threshold_map_folder, p_model_file.replace('saved_models/', ''))
-
-        if not os.path.exists(model_map_info_path):
-            f.write('\n\n No threshold map information')
-        else:
-            maps_files = os.listdir(model_map_info_path)
-
-            # get all map information
-            for t_map_file in maps_files:
-
-                file_path = os.path.join(model_map_info_path, t_map_file)
-                with open(file_path, 'r') as map_file:
-
-                    title_scene =  t_map_file.replace(threshold_map_file_prefix, '')
-                    f.write('\n\n## ' + title_scene + '\n')
-                    content = map_file.readlines()
-
-                    # getting each map line information
-                    for line in content:
-                        f.write(line)
-
-        f.close()
-
-if __name__== "__main__":
-    main()

+ 0 - 324
others/save_model_result_in_md_maxwell.py

@@ -1,324 +0,0 @@
-# main imports
-import numpy as np
-import pandas as pd
-
-import sys, os, argparse
-import subprocess
-import time
-import json
-
-# models imports
-from sklearn.utils import shuffle
-from sklearn.externals import joblib
-from sklearn.metrics import accuracy_score, f1_score, recall_score, roc_auc_score
-from sklearn.model_selection import cross_val_score
-from sklearn.model_selection import StratifiedKFold
-from sklearn.model_selection import train_test_split
-
-from keras.models import Sequential
-from keras.layers import Conv1D, MaxPooling1D
-from keras.layers import Activation, Dropout, Flatten, Dense, BatchNormalization
-from keras.wrappers.scikit_learn import KerasClassifier
-from keras import backend as K
-from keras.models import model_from_json
-
-# image processing imports
-from ipfml import processing
-from PIL import Image
-
-# modules imports
-sys.path.insert(0, '') # trick to enable import of main folder module
-
-import custom_config as cfg
-
-# variables and parameters
-threshold_map_folder        = cfg.threshold_map_folder
-threshold_map_file_prefix   = cfg.threshold_map_folder + "_"
-
-markdowns_folder            = cfg.models_information_folder
-final_csv_model_comparisons = cfg.csv_model_comparisons_filename
-models_name                 = cfg.models_names_list
-
-zones                       = cfg.zones_indices
-
-current_dirpath = os.getcwd()
-
-
-def main():
-
-    kind_model = 'keras'
-    model_ext = ''
-    
-    parser = argparse.ArgumentParser(description="Display SVD data of scene zone")
-
-    parser.add_argument('--interval', type=str, help='Interval value to keep from svd', default='0, 200')
-    parser.add_argument('--model', type=str, help='.joblib or .json file (sklearn or keras model)')
-    parser.add_argument('--feature', type=str, help='feature data choice', choices=cfg.features_choices_labels)
-    parser.add_argument('--mode', type=str, help='Kind of normalization level wished', choices=cfg.normalization_choices)
-
-    args = parser.parse_args()
-
-    p_interval   = list(map(int, args.interval.split(',')))
-    p_model_file = args.model
-    p_feature    = args.feature
-    p_mode       = args.mode
-
-
-    # call model and get global result in scenes
-    begin, end = p_interval
-
-    bash_cmd = "bash others/testModelByScene_maxwell.sh '" + str(begin) + "' '" + str(end) + "' '" + p_model_file + "' '" + p_mode + "' '" + p_feature + "'"
-
-    print(bash_cmd)
-
-    ## call command ##
-    p = subprocess.Popen(bash_cmd, stdout=subprocess.PIPE, shell=True)
-
-    (output, err) = p.communicate()
-
-    ## Wait for result ##
-    p_status = p.wait()
-
-    if not os.path.exists(markdowns_folder):
-        os.makedirs(markdowns_folder)
-
-    # get model name to construct model
-
-    if '.joblib' in p_model_file:
-        kind_model = 'sklearn'
-        model_ext = '.joblib'
-
-    if '.json' in p_model_file:
-        kind_model = 'keras'
-        model_ext = '.json'
-
-    md_model_path = os.path.join(markdowns_folder, p_model_file.split('/')[-1].replace(model_ext, '.md'))
-
-    with open(md_model_path, 'w') as f:
-        f.write(output.decode("utf-8"))
-
-        # read each threshold_map information if exists
-        model_map_info_path = os.path.join(threshold_map_folder, p_model_file.replace('saved_models/', ''))
-
-        if not os.path.exists(model_map_info_path):
-            f.write('\n\n No threshold map information')
-        else:
-            maps_files = os.listdir(model_map_info_path)
-
-            # get all map information
-            for t_map_file in maps_files:
-
-                file_path = os.path.join(model_map_info_path, t_map_file)
-                with open(file_path, 'r') as map_file:
-
-                    title_scene =  t_map_file.replace(threshold_map_file_prefix, '')
-                    f.write('\n\n## ' + title_scene + '\n')
-                    content = map_file.readlines()
-
-                    # getting each map line information
-                    for line in content:
-                        f.write(line)
-
-        f.close()
-
-    # Keep model information to compare
-    current_model_name = p_model_file.split('/')[-1].replace(model_ext, '')
-
-    # Prepare writing in .csv file into results folder
-    output_final_file_path = os.path.join(cfg.results_information_folder, final_csv_model_comparisons)
-
-    if not os.path.exists(cfg.results_information_folder):
-        os.makedirs(cfg.results_information_folder)
-
-    output_final_file = open(output_final_file_path, "a")
-
-    print(current_model_name)
-    # reconstruct data filename
-    for name in models_name:
-        if name in current_model_name:
-            data_filename = current_model_name
-            current_data_file_path = os.path.join('data', data_filename)
-
-    print("Current data file ")
-    print(current_data_file_path)
-    model_scores = []
-
-    ########################
-    # 1. Get and prepare data
-    ########################
-    dataset_train = pd.read_csv(current_data_file_path + '.train', header=None, sep=";")
-    dataset_test = pd.read_csv(current_data_file_path + '.test', header=None, sep=";")
-
-    # default first shuffle of data
-    dataset_train = shuffle(dataset_train)
-    dataset_test = shuffle(dataset_test)
-
-    # get dataset with equal number of classes occurences
-    noisy_df_train = dataset_train[dataset_train.ix[:, 0] == 1]
-    not_noisy_df_train = dataset_train[dataset_train.ix[:, 0] == 0]
-    nb_noisy_train = len(noisy_df_train.index)
-
-    noisy_df_test = dataset_test[dataset_test.ix[:, 0] == 1]
-    not_noisy_df_test = dataset_test[dataset_test.ix[:, 0] == 0]
-    nb_noisy_test = len(noisy_df_test.index)
-
-    final_df_train = pd.concat([not_noisy_df_train[0:nb_noisy_train], noisy_df_train])
-    final_df_test = pd.concat([not_noisy_df_test[0:nb_noisy_test], noisy_df_test])
-
-    # shuffle data another time
-    final_df_train = shuffle(final_df_train)
-    final_df_test = shuffle(final_df_test)
-
-    final_df_train_size = len(final_df_train.index)
-    final_df_test_size = len(final_df_test.index)
-
-    # use of the whole data set for training
-    x_dataset_train = final_df_train.ix[:,1:]
-    x_dataset_test = final_df_test.ix[:,1:]
-
-    y_dataset_train = final_df_train.ix[:,0]
-    y_dataset_test = final_df_test.ix[:,0]
-
-    #######################
-    # 2. Getting model
-    #######################
-
-    if kind_model == 'keras':
-        with open(p_model_file, 'r') as f:
-            json_model = json.load(f)
-            model = model_from_json(json_model)
-            model.load_weights(p_model_file.replace('.json', '.h5'))
-
-            model.compile(loss='binary_crossentropy',
-                        optimizer='adam',
-                        metrics=['accuracy'])
-
-        # reshape all input data
-        x_dataset_train = np.array(x_dataset_train).reshape(len(x_dataset_train), end, 1)
-        x_dataset_test = np.array(x_dataset_test).reshape(len(x_dataset_test), end, 1)
-
-
-    if kind_model == 'sklearn':
-        model = joblib.load(p_model_file)
-
-    #######################
-    # 3. Fit model : use of cross validation to fit model
-    #######################
-
-    if kind_model == 'keras':
-        model.fit(x_dataset_train, y_dataset_train, validation_split=0.20, epochs=cfg.keras_epochs, batch_size=cfg.keras_batch)
-
-    if kind_model == 'sklearn':
-        model.fit(x_dataset_train, y_dataset_train)
-
-        train_accuracy = cross_val_score(model, x_dataset_train, y_dataset_train, cv=5)
-
-    ######################
-    # 4. Test : Validation and test dataset from .test dataset
-    ######################
-
-    # we need to specify validation size to 20% of whole dataset
-    val_set_size = int(final_df_train_size/3)
-    test_set_size = val_set_size
-
-    total_validation_size = val_set_size + test_set_size
-
-    if final_df_test_size > total_validation_size:
-        x_dataset_test = x_dataset_test[0:total_validation_size]
-        y_dataset_test = y_dataset_test[0:total_validation_size]
-
-    X_test, X_val, y_test, y_val = train_test_split(x_dataset_test, y_dataset_test, test_size=0.5, random_state=1)
-
-    if kind_model == 'keras':
-        y_test_model = model.predict_classes(X_test)
-        y_val_model = model.predict_classes(X_val)
-
-        y_train_model = model.predict_classes(x_dataset_train)
-
-        train_accuracy = accuracy_score(y_dataset_train, y_train_model)
-
-    if kind_model == 'sklearn':
-        y_test_model = model.predict(X_test)
-        y_val_model = model.predict(X_val)
-
-        y_train_model = model.predict(x_dataset_train)
-
-    val_accuracy = accuracy_score(y_val, y_val_model)
-    test_accuracy = accuracy_score(y_test, y_test_model)
-
-    train_f1 = f1_score(y_dataset_train, y_train_model)
-    train_recall = recall_score(y_dataset_train, y_train_model)
-    train_roc_auc = roc_auc_score(y_dataset_train, y_train_model)
-
-    val_f1 = f1_score(y_val, y_val_model)
-    val_recall = recall_score(y_val, y_val_model)
-    val_roc_auc = roc_auc_score(y_val, y_val_model)
-
-    test_f1 = f1_score(y_test, y_test_model)
-    test_recall = recall_score(y_test, y_test_model)
-    test_roc_auc = roc_auc_score(y_test, y_test_model)
-
-    if kind_model == 'keras':
-        # stats of all dataset
-        all_x_data = np.concatenate([x_dataset_train, X_test, X_val])
-        all_y_data = np.concatenate([y_dataset_train, y_test, y_val])
-        all_y_model = model.predict_classes(all_x_data)
-
-    if kind_model == 'sklearn':
-        # stats of all dataset
-        all_x_data = pd.concat([x_dataset_train, X_test, X_val])
-        all_y_data = pd.concat([y_dataset_train, y_test, y_val])
-        all_y_model = model.predict(all_x_data)
-
-    all_accuracy = accuracy_score(all_y_data, all_y_model)
-    all_f1_score = f1_score(all_y_data, all_y_model)
-    all_recall_score = recall_score(all_y_data, all_y_model)
-    all_roc_auc_score = roc_auc_score(all_y_data, all_y_model)
-
-    # stats of dataset sizes
-    total_samples = final_df_train_size + val_set_size + test_set_size
-
-    model_scores.append(final_df_train_size)
-    model_scores.append(val_set_size)
-    model_scores.append(test_set_size)
-
-    model_scores.append(final_df_train_size / total_samples)
-    model_scores.append(val_set_size / total_samples)
-    model_scores.append(test_set_size / total_samples)
-
-    # add of scores
-    model_scores.append(train_accuracy)
-    model_scores.append(val_accuracy)
-    model_scores.append(test_accuracy)
-    model_scores.append(all_accuracy)
-
-    model_scores.append(train_f1)
-    model_scores.append(train_recall)
-    model_scores.append(train_roc_auc)
-
-    model_scores.append(val_f1)
-    model_scores.append(val_recall)
-    model_scores.append(val_roc_auc)
-
-    model_scores.append(test_f1)
-    model_scores.append(test_recall)
-    model_scores.append(test_roc_auc)
-
-    model_scores.append(all_f1_score)
-    model_scores.append(all_recall_score)
-    model_scores.append(all_roc_auc_score)
-
-    # TODO : improve...
-    # check if it's always the case...
-    nb_zones = current_data_file_path.split('_')[7]
-
-    final_file_line = current_model_name + '; ' + str(end - begin) + '; ' + str(begin) + '; ' + str(end) + '; ' + str(nb_zones) + '; ' + p_feature + '; ' + p_mode
-
-    for s in model_scores:
-        final_file_line += '; ' + str(s)
-
-    output_final_file.write(final_file_line + '\n')
-
-
-if __name__== "__main__":
-    main()
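
Note on the "20% of whole dataset" comment in the removed script above: it follows from the arithmetic. The validation and test sizes are each set to one third of the balanced training size, which works out to roughly a 60/20/20 split of the total. A small sketch (the function name `split_ratios` is hypothetical):

```python
def split_ratios(train_size):
    """Mirror the size bookkeeping of the removed script."""
    val_size = int(train_size / 3)   # validation set: one third of the training size
    test_size = val_size             # test set: same size as validation
    total = train_size + val_size + test_size
    return train_size / total, val_size / total, test_size / total


print(split_ratios(300))
```

The removed script only truncates the `.test` data when it holds more rows than `val_set_size + test_set_size`. With a smaller `.test` file the recorded sizes no longer match the rows actually used.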

+ 0 - 62
others/testModelByScene.sh

@@ -1,62 +0,0 @@
-#!/bin/bash
-
-if [ -z "$1" ]
-  then
-    echo "No first argument supplied"
-    echo "Need of begin vector index"
-    exit 1
-fi
-
-if [ -z "$2" ]
-  then
-    echo "No second argument supplied"
-    echo "Need of end vector index"
-    exit 1
-fi
-
-if [ -z "$3" ]
-  then
-    echo "No third argument supplied"
-    echo "Need of model input"
-    exit 1
-fi
-
-if [ -z "$4" ]
-  then
-    echo "No fourth argument supplied"
-    echo "Need of mode file : 'svd', 'svdn', svdne"
-    exit 1
-fi
-
-if [ -z "$5" ]
-  then
-    echo "No fifth argument supplied"
-    echo "Need of feature : 'lab', 'mscn'"
-    exit 1
-fi
-
-INPUT_BEGIN=$1
-INPUT_END=$2
-INPUT_MODEL=$3
-INPUT_MODE=$4
-INPUT_FEATURE=$5
-
-zones="0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15"
-
-echo "**Model :** ${INPUT_MODEL}"
-echo "**feature :** ${INPUT_FEATURE}"
-echo "**Mode :** ${INPUT_MODE}"
-echo "**Vector range :** [${INPUT_BEGIN}, ${INPUT_END}]"
-echo ""
-echo " # | GLOBAL | NOISY | NOT NOISY"
-echo "---|--------|-------|----------"
-
-for scene in {"A","B","C","D","E","F","G","H","I"}; do
-
-  FILENAME="data/data_${INPUT_MODE}_${INPUT_FEATURE}_B${INPUT_BEGIN}_E${INPUT_END}_scene${scene}"
-
-  python generate/generate_data_model.py --output ${FILENAME} --interval "${INPUT_BEGIN},${INPUT_END}" --kind ${INPUT_MODE} --feature ${INPUT_FEATURE} --scenes "${scene}" --zones "${zones}" --percent 1 --sep ";" --rowindex "0"
-
-  python prediction/prediction_scene.py --data "$FILENAME.train" --model ${INPUT_MODEL} --output "${INPUT_MODEL}_Scene${scene}_mode_${INPUT_MODE}_feature_${INPUT_FEATURE}.prediction" --scene ${scene}
-
-done

+ 0 - 70
others/testModelByScene_maxwell.sh

@@ -1,70 +0,0 @@
-#!/bin/bash
-
-if [ -z "$1" ]
-  then
-    echo "No first argument supplied"
-    echo "Need of begin vector index"
-    exit 1
-fi
-
-if [ -z "$2" ]
-  then
-    echo "No second argument supplied"
-    echo "Need of end vector index"
-    exit 1
-fi
-
-if [ -z "$3" ]
-  then
-    echo "No third argument supplied"
-    echo "Need of model input"
-    exit 1
-fi
-
-if [ -z "$4" ]
-  then
-    echo "No fourth argument supplied"
-    echo "Need of mode file : 'svd', 'svdn', svdne"
-    exit 1
-fi
-
-if [ -z "$5" ]
-  then
-    echo "No fifth argument supplied"
-    echo "Need of feature : 'lab', 'mscn'"
-    exit 1
-fi
-
-if [ -z "$6" ]
-  then
-    echo "No sixth argument supplied"
-fi
-
-
-
-INPUT_BEGIN=$1
-INPUT_END=$2
-INPUT_MODEL=$3
-INPUT_MODE=$4
-INPUT_FEATURE=$5
-
-zones="0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15"
-
-echo "**Model :** ${INPUT_MODEL}"
-echo "**feature :** ${INPUT_FEATURE}"
-echo "**Mode :** ${INPUT_MODE}"
-echo "**Vector range :** [${INPUT_BEGIN}, ${INPUT_END}]"
-echo ""
-echo " # | GLOBAL | NOISY | NOT NOISY"
-echo "---|--------|-------|----------"
-
-# only take maxwell scenes
-for scene in {"A","D","G","H"}; do
-
-  FILENAME="data/data_${INPUT_MODE}_${INPUT_FEATURE}_B${INPUT_BEGIN}_E${INPUT_END}_scene${scene}"
-
-  python generate/generate_data_model.py --output ${FILENAME} --interval "${INPUT_BEGIN},${INPUT_END}" --kind ${INPUT_MODE} --feature ${INPUT_FEATURE} --scenes "${scene}" --zones "${zones}" --percent 1
-
-  python prediction/prediction_scene.py --data "$FILENAME.train" --model ${INPUT_MODEL} --output "${INPUT_MODEL}_Scene${scene}_mode_${INPUT_MODE}_feature_${INPUT_FEATURE}.prediction" --scene ${scene}
-
-done

+ 0 - 214
prediction/predict_seuil_expe.py

@@ -1,214 +0,0 @@
-# main imports
-import sys, os, argparse
-import subprocess
-import time
-import numpy as np
-
-# image processing imports
-from ipfml.processing import segmentation
-from PIL import Image
-
-# models imports
-from sklearn.externals import joblib
-
-# modules imports
-sys.path.insert(0, '') # trick to enable import of main folder module
-
-import custom_config as cfg
-from modules.utils import data as dt
-
-
-# variables and parameters
-scenes_path               = cfg.dataset_path
-min_max_filename          = cfg.min_max_filename_extension
-threshold_expe_filename   = cfg.seuil_expe_filename
-
-threshold_map_folder      = cfg.threshold_map_folder
-threshold_map_file_prefix = cfg.threshold_map_folder + "_"
-
-zones                     = cfg.zones_indices
-normalization_choices     = cfg.normalization_choices
-features_choices          = cfg.features_choices_labels
-
-tmp_filename              = '/tmp/__model__img_to_predict.png'
-
-current_dirpath = os.getcwd()
-
-def main():
-
-    p_custom = False
-
-    parser = argparse.ArgumentParser(description="Script which predicts threshold using specific model")
-
-    parser.add_argument('--interval', type=str, help='Interval value to keep from svd', default='0, 200')
-    parser.add_argument('--model', type=str, help='.joblib or .json file (sklearn or keras model)')
-    parser.add_argument('--mode', type=str, help='Kind of normalization level wished', choices=normalization_choices)
-    parser.add_argument('--feature', type=str, help='Feature data choice', choices=features_choices)
-    parser.add_argument('--limit_detection', type=int, help='Specify number of same prediction to stop threshold prediction', default=2)
-    parser.add_argument('--custom', type=str, help='Name of custom min max file if use of renormalization of data', default=False)
-
-    args = parser.parse_args()
-
-    p_interval   = list(map(int, args.interval.split(',')))
-    p_model_file = args.model
-    p_mode       = args.mode
-    p_feature     = args.feature
-    p_limit      = args.limit_detection
-    p_custom     = args.custom
-
-    scenes = os.listdir(scenes_path)
-    scenes = [s for s in scenes if not min_max_filename in s]
-
-    # go ahead each scenes
-    for id_scene, folder_scene in enumerate(scenes):
-
-        print(folder_scene)
-
-        scene_path = os.path.join(scenes_path, folder_scene)
-
-        threshold_expes = []
-        threshold_expes_detected = []
-        threshold_expes_counter = []
-        threshold_expes_found = []
-
-        # get all images of folder
-        scene_images = sorted([os.path.join(scene_path, img) for img in os.listdir(scene_path) if cfg.scene_image_extension in img])
-
-        start_quality_image = dt.get_scene_image_quality(scene_images[0])
-        end_quality_image   = dt.get_scene_image_quality(scene_images[-1])
-       
-        # get zones list info
-        for index in zones:
-            index_str = str(index)
-            if len(index_str) < 2:
-                index_str = "0" + index_str
-            zone_folder = "zone"+index_str
-
-            threshold_path_file = os.path.join(os.path.join(scene_path, zone_folder), threshold_expe_filename)
-
-            with open(threshold_path_file) as f:
-                threshold = int(f.readline())
-                threshold_expes.append(threshold)
-
-                # Initialize default data to get detected model threshold found
-                threshold_expes_detected.append(False)
-                threshold_expes_counter.append(0)
-                threshold_expes_found.append(end_quality_image) # by default use max
-
-        check_all_done = False
-
-        # for each images
-        for img_path in scene_images:
-
-            current_img = Image.open(img_path)
-            current_quality_image = dt.get_scene_image_quality(img_path)
-            current_image_potfix = dt.get_scene_image_postfix(img_path)
-
-            img_blocks = segmentation.divide_in_blocks(current_img, (200, 200))
-            current_img = Image.open(img_path)
-            img_blocks = segmentation.divide_in_blocks(current_img, (200, 200))
-
-            check_all_done = all(d == True for d in threshold_expes_detected)
-
-            if check_all_done:
-                break
-
-            for id_block, block in enumerate(img_blocks):
-
-                # check only if necessary for this scene (not already detected)
-                if not threshold_expes_detected[id_block]:
-
-                    tmp_file_path = tmp_filename.replace('__model__',  p_model_file.split('/')[-1].replace('.joblib', '_'))
-                    block.save(tmp_file_path)
-
-                    python_cmd = "python prediction/predict_noisy_image_svd.py --image " + tmp_file_path + \
-                                    " --interval '" + ",".join(map(str, p_interval)) + \
-                                    "' --model " + p_model_file  + \
-                                    " --mode " + p_mode + \
-                                    " --feature " + p_feature
-
-                    # specify use of custom file for min max normalization
-                    if p_custom:
-                        python_cmd = python_cmd + ' --custom ' + p_custom
-
-
-                    ## call command ##
-                    p = subprocess.Popen(python_cmd, stdout=subprocess.PIPE, shell=True)
-
-                    (output, err) = p.communicate()
-
-                    ## Wait for result ##
-                    p_status = p.wait()
-
-                    prediction = int(output)
-
-                    if prediction == 0:
-                        threshold_expes_counter[id_block] = threshold_expes_counter[id_block] + 1
-                    else:
-                        threshold_expes_counter[id_block] = 0
-
-                    if threshold_expes_counter[id_block] == p_limit:
-                        threshold_expes_detected[id_block] = True
-                        threshold_expes_found[id_block] = current_quality_image
-
-                    print(str(id_block) + " : " + current_image_potfix + "/" + str(threshold_expes[id_block]) + " => " + str(prediction))
-
-            print("------------------------")
-            print("Scene " + str(id_scene + 1) + "/" + str(len(scenes)))
-            print("------------------------")
-
-        # end of scene => display of results
-
-        # construct path using model name for saving threshold map folder
-        model_treshold_path = os.path.join(threshold_map_folder, p_model_file.split('/')[-1].replace('.joblib', ''))
-
-        # create threshold model path if necessary
-        if not os.path.exists(model_treshold_path):
-            os.makedirs(model_treshold_path)
-
-        abs_dist = []
-
-        map_filename = os.path.join(model_treshold_path, threshold_map_file_prefix + folder_scene)
-        f_map = open(map_filename, 'w')
-
-        line_information = ""
-
-        # default header
-        f_map.write('|  |    |    |  |\n')
-        f_map.write('---|----|----|---\n')
-        for id, threshold in enumerate(threshold_expes_found):
-
-            line_information += str(threshold) + " / " + str(threshold_expes[id]) + " | "
-            abs_dist.append(abs(threshold - threshold_expes[id]))
-
-            if (id + 1) % 4 == 0:
-                f_map.write(line_information + '\n')
-                line_information = ""
-
-        f_map.write(line_information + '\n')
-
-        min_abs_dist = min(abs_dist)
-        max_abs_dist = max(abs_dist)
-        avg_abs_dist = sum(abs_dist) / len(abs_dist)
-
-        f_map.write('\nScene information : ')
-        f_map.write('\n- BEGIN : ' + str(start_quality_image))
-        f_map.write('\n- END : ' + str(end_quality_image))
-
-        f_map.write('\n\nDistances information : ')
-        f_map.write('\n- MIN : ' + str(min_abs_dist))
-        f_map.write('\n- MAX : ' + str(max_abs_dist))          
-        f_map.write('\n- AVG : ' + str(avg_abs_dist))
-
-        f_map.write('\n\nOther information : ')
-        f_map.write('\n- Detection limit : ' + str(p_limit))
-
-        # by default print last line
-        f_map.close()
-
-        print("Scene " + str(id_scene + 1) + "/" + str(len(scenes)) + " Done..")
-        print("------------------------")
-
-
-if __name__== "__main__":
-    main()

+ 0 - 169
prediction/predict_seuil_expe_curve_opti_scene.py

@@ -1,169 +0,0 @@
-# main imports
-import sys, os, argparse
-import subprocess
-import time
-import numpy as np
-
-# image processing imports
-from ipfml.processing import segmentation
-from PIL import Image
-
-# models imports
-from sklearn.externals import joblib
-
-# modules imports
-sys.path.insert(0, '') # trick to enable import of main folder module
-
-import custom_config as cfg
-from modules.utils import data as dt
-
-
-# variables and parameters
-scenes_path               = cfg.dataset_path
-min_max_filename          = cfg.min_max_filename_extension
-threshold_expe_filename   = cfg.seuil_expe_filename
-
-threshold_map_folder      = cfg.threshold_map_folder
-threshold_map_file_prefix = cfg.threshold_map_folder + "_"
-
-zones                     = cfg.zones_indices
-normalization_choices     = cfg.normalization_choices
-features_choices          = cfg.features_choices_labels
-
-simulation_curves_zones   = "simulation_curves_zones_"
-tmp_filename              = '/tmp/__model__img_to_predict.png'
-
-current_dirpath = os.getcwd()
-
-
-def main():
-
-    p_custom = False
-        
-    parser = argparse.ArgumentParser(description="Script which predicts threshold using specific model")
-
-    parser.add_argument('--solution', type=str, help='Data of solution to specify filters to use')
-    parser.add_argument('--model', type=str, help='.joblib or .json file (sklearn or keras model)')
-    parser.add_argument('--mode', type=str, help='Kind of normalization level wished', choices=normalization_choices)
-    parser.add_argument('--feature', type=str, help='feature data choice', choices=features_choices)
-    parser.add_argument('--scene', type=str, help='scene to use for simulation', choices=cfg.scenes_indices)
-    #parser.add_argument('--limit_detection', type=int, help='Specify number of same prediction to stop threshold prediction', default=2)
-    parser.add_argument('--custom', type=str, help='Name of custom min max file if use of renormalization of data', default=False)
-    parser.add_argument('--filter', type=str, help='filter reduction solution used', choices=cfg.filter_reduction_choices)
-
-    args = parser.parse_args()
-
-    # keep p_solution as it is
-    p_solution   = args.solution
-    p_model_file = args.model
-    p_mode       = args.mode
-    p_feature    = args.feature
-    p_scene      = args.scene
-    #p_limit      = args.limit
-    p_custom     = args.custom
-    p_filter     = args.filter
-
-    # get scene name using index
-    
-    # list all possibles choices of renderer
-    scenes_list = cfg.scenes_names
-    scenes_indices = cfg.scenes_indices
-
-    scene_index = scenes_indices.index(p_scene.strip())
-    scene_name = scenes_list[scene_index]
-
-    print(scene_name)
-    scene_path = os.path.join(scenes_path, scene_name)
-
-    threshold_expes = []
-    threshold_expes_found = []
-    block_predictions_str = []
-
-    # get all images of folder
-    scene_images = sorted([os.path.join(scene_path, img) for img in os.listdir(scene_path) if cfg.scene_image_extension in img])
-
-    start_quality_image = dt.get_scene_image_quality(scene_images[0])
-    end_quality_image   = dt.get_scene_image_quality(scene_images[-1])
-    # using first two images find the step of quality used
-    quality_step_image  = dt.get_scene_image_quality(scene_images[1]) - start_quality_image
-
-    # get zones list info
-    for index in zones:
-        index_str = str(index)
-        if len(index_str) < 2:
-            index_str = "0" + index_str
-        zone_folder = "zone"+index_str
-
-        threshold_path_file = os.path.join(os.path.join(scene_path, zone_folder), threshold_expe_filename)
-
-        with open(threshold_path_file) as f:
-            threshold = int(f.readline())
-            threshold_expes.append(threshold)
-
-            # Initialize default data to get detected model threshold found
-            threshold_expes_found.append(end_quality_image) # by default use max
-
-        block_predictions_str.append(index_str + ";" + p_model_file + ";" + str(threshold) + ";" + str(start_quality_image) + ";" + str(quality_step_image))
-
-
-    # for each images
-    for img_path in scene_images:
-
-        current_img = Image.open(img_path)
-        current_quality_image = dt.get_scene_image_quality(img_path)
-
-        img_blocks = segmentation.divide_in_blocks(current_img, (200, 200))
-
-        for id_block, block in enumerate(img_blocks):
-
-            # check only if necessary for this scene (not already detected)
-            #if not threshold_expes_detected[id_block]:
-
-                tmp_file_path = tmp_filename.replace('__model__',  p_model_file.split('/')[-1].replace('.joblib', '_'))
-                block.save(tmp_file_path)
-
-                python_cmd_line = "python prediction/predict_noisy_image_svd_" + p_filter + ".py --image {0} --solution '{1}' --model {2} --mode {3} --feature {4}"
-                python_cmd = python_cmd_line.format(tmp_file_path, p_solution, p_model_file, p_mode, p_feature) 
-
-                # specify use of custom file for min max normalization
-                if p_custom:
-                    python_cmd = python_cmd + ' --custom ' + p_custom
-
-                ## call command ##
-                p = subprocess.Popen(python_cmd, stdout=subprocess.PIPE, shell=True)
-
-                (output, err) = p.communicate()
-
-                ## Wait for result ##
-                p_status = p.wait()
-
-                prediction = int(output)
-
-                # save here in specific file of block all the predictions done
-                block_predictions_str[id_block] = block_predictions_str[id_block] + ";" + str(prediction)
-
-                print(str(id_block) + " : " + str(current_quality_image) + "/" + str(threshold_expes[id_block]) + " => " + str(prediction))
-
-    # end of scene => display of results
-
-    # construct path using model name for saving threshold map folder
-    model_threshold_path = os.path.join(threshold_map_folder, p_model_file.split('/')[-1].replace('.joblib', ''))
-
-    # create threshold model path if necessary
-    if not os.path.exists(model_threshold_path):
-        os.makedirs(model_threshold_path)
-
-    map_filename = os.path.join(model_threshold_path, simulation_curves_zones + scene_name)
-    f_map = open(map_filename, 'w')
-
-    for line in block_predictions_str:
-        f_map.write(line + '\n')
-    f_map.close()
-
-    print("------------------------")
-
-    print("Model predictions are saved into %s" % map_filename)
-
-
-if __name__== "__main__":
-    main()

+ 0 - 166
prediction/predict_seuil_expe_curve_scene.py

@@ -1,166 +0,0 @@
-# main imports
-import sys, os, argparse
-import subprocess
-import time
-import numpy as np
-
-# image processing imports
-from ipfml.processing import segmentation
-from PIL import Image
-
-# models imports
-from sklearn.externals import joblib
-
-# modules imports
-sys.path.insert(0, '') # trick to enable import of main folder module
-
-import custom_config as cfg
-from modules.utils import data as dt
-
-
-# variables and parameters
-scenes_path               = cfg.dataset_path
-min_max_filename          = cfg.min_max_filename_extension
-threshold_expe_filename   = cfg.seuil_expe_filename
-
-threshold_map_folder      = cfg.threshold_map_folder
-threshold_map_file_prefix = cfg.threshold_map_folder + "_"
-
-zones                     = cfg.zones_indices
-normalization_choices     = cfg.normalization_choices
-features_choices          = cfg.features_choices_labels
-
-simulation_curves_zones   = "simulation_curves_zones_"
-tmp_filename              = '/tmp/__model__img_to_predict.png'
-
-current_dirpath = os.getcwd()
-
-
-def main():
-
-    p_custom = False
-        
-    parser = argparse.ArgumentParser(description="Script which predicts threshold using specific model")
-
-    parser.add_argument('--model', type=str, help='.joblib or .json file (sklearn or keras model)')
-    parser.add_argument('--mode', type=str, help='Kind of normalization level wished', choices=normalization_choices)
-    parser.add_argument('--feature', type=str, help='feature data choice', choices=features_choices)
-    parser.add_argument('--scene', type=str, help='scene to use for simulation', choices=cfg.scenes_indices)
-    #parser.add_argument('--limit_detection', type=int, help='Specify number of same prediction to stop threshold prediction', default=2)
-    parser.add_argument('--custom', type=str, help='Name of custom min max file if use of renormalization of data', default=False)
-    parser.add_argument('--filter', type=str, help='filter reduction solution used', choices=cfg.filter_reduction_choices)
-
-    args = parser.parse_args()
-
-    # retrieve parsed arguments
-    p_model_file = args.model
-    p_mode       = args.mode
-    p_feature    = args.feature
-    p_scene      = args.scene
-    #p_limit      = args.limit
-    p_custom     = args.custom
-    p_filter     = args.filter
-
-    # get scene name using index
-    
-    # list all possibles choices of renderer
-    scenes_list = cfg.scenes_names
-    scenes_indices = cfg.scenes_indices
-
-    scene_index = scenes_indices.index(p_scene.strip())
-    scene_name = scenes_list[scene_index]
-
-    print(scene_name)
-    scene_path = os.path.join(scenes_path, scene_name)
-
-    threshold_expes = []
-    threshold_expes_found = []
-    block_predictions_str = []
-
-    # get all images of folder
-    scene_images = sorted([os.path.join(scene_path, img) for img in os.listdir(scene_path) if cfg.scene_image_extension in img])
-
-    start_quality_image = dt.get_scene_image_quality(scene_images[0])
-    end_quality_image   = dt.get_scene_image_quality(scene_images[-1])
-    # using first two images find the step of quality used
-    quality_step_image  = dt.get_scene_image_quality(scene_images[1]) - start_quality_image
-
-    # get zones list info
-    for index in zones:
-        index_str = str(index)
-        if len(index_str) < 2:
-            index_str = "0" + index_str
-        zone_folder = "zone"+index_str
-
-        threshold_path_file = os.path.join(os.path.join(scene_path, zone_folder), threshold_expe_filename)
-
-        with open(threshold_path_file) as f:
-            threshold = int(f.readline())
-            threshold_expes.append(threshold)
-
-            # Initialize default data to get detected model threshold found
-            threshold_expes_found.append(end_quality_image) # by default use max
-
-        block_predictions_str.append(index_str + ";" + p_model_file + ";" + str(threshold) + ";" + str(start_quality_image) + ";" + str(quality_step_image))
-
-
-    # for each images
-    for img_path in scene_images:
-
-        current_img = Image.open(img_path)
-        current_quality_image = dt.get_scene_image_quality(img_path)
-
-        img_blocks = segmentation.divide_in_blocks(current_img, (200, 200))
-
-        for id_block, block in enumerate(img_blocks):
-
-            # check only if necessary for this scene (not already detected)
-            #if not threshold_expes_detected[id_block]:
-
-                tmp_file_path = tmp_filename.replace('__model__',  p_model_file.split('/')[-1].replace('.joblib', '_'))
-                block.save(tmp_file_path)
-
-                python_cmd_line = "python prediction/predict_noisy_image_svd.py --image {0} --model {1} --mode {2} --feature {3}"
-                python_cmd = python_cmd_line.format(tmp_file_path, p_model_file, p_mode, p_feature) 
-
-                # specify use of custom file for min max normalization
-                if p_custom:
-                    python_cmd = python_cmd + ' --custom ' + p_custom
-
-                ## call command ##
-                p = subprocess.Popen(python_cmd, stdout=subprocess.PIPE, shell=True)
-
-                (output, err) = p.communicate()
-
-                ## Wait for result ##
-                p_status = p.wait()
-
-                prediction = int(output)
-
-                # save here in specific file of block all the predictions done
-                block_predictions_str[id_block] = block_predictions_str[id_block] + ";" + str(prediction)
-
-                print(str(id_block) + " : " + str(current_quality_image) + "/" + str(threshold_expes[id_block]) + " => " + str(prediction))
-
-
-    # construct path using model name for saving threshold map folder
-    model_threshold_path = os.path.join(threshold_map_folder, p_model_file.split('/')[-1].replace('.joblib', ''))
-
-    # create threshold model path if necessary
-    if not os.path.exists(model_threshold_path):
-        os.makedirs(model_threshold_path)
-
-    map_filename = os.path.join(model_threshold_path, simulation_curves_zones + scene_name)
-    f_map = open(map_filename, 'w')
-
-    for line in block_predictions_str:
-        f_map.write(line + '\n')
-    f_map.close()
-
-    print("------------------------")
-
-    print("Model predictions are saved into %s" % map_filename)
-
-
-if __name__== "__main__":
-    main()

+ 0 - 216
prediction/predict_seuil_expe_maxwell.py

@@ -1,216 +0,0 @@
-# main imports
-import sys, os, argparse
-import subprocess
-import time
-import numpy as np
-
-# image processing imports
-from ipfml.processing import segmentation
-from PIL import Image
-
-# models imports
-from sklearn.externals import joblib
-
-# modules imports
-sys.path.insert(0, '') # trick to enable import of main folder module
-
-import custom_config as cfg
-from modules.utils import data as dt
-
-
-# variables and parameters
-scenes_path               = cfg.dataset_path
-min_max_filename          = cfg.min_max_filename_extension
-threshold_expe_filename   = cfg.seuil_expe_filename
-
-threshold_map_folder      = cfg.threshold_map_folder
-threshold_map_file_prefix = cfg.threshold_map_folder + "_"
-
-zones                     = cfg.zones_indices
-maxwell_scenes            = cfg.maxwell_scenes_names
-normalization_choices     = cfg.normalization_choices
-features_choices          = cfg.features_choices_labels
-
-tmp_filename              = '/tmp/__model__img_to_predict.png'
-
-current_dirpath = os.getcwd()
-
-def main():
-
-    # by default..
-    p_custom = False
-
-    parser = argparse.ArgumentParser(description="Script which predicts threshold using specific model")
-
-    parser.add_argument('--interval', type=str, help='Interval value to keep from svd', default='0, 200')
-    parser.add_argument('--model', type=str, help='.joblib or .json file (sklearn or keras model)')
-    parser.add_argument('--mode', type=str, help='Kind of normalization level wished', choices=normalization_choices)
-    parser.add_argument('--feature', type=str, help='Feature data choice', choices=features_choices)
-    parser.add_argument('--limit_detection', type=int, help='Specify number of same prediction to stop threshold prediction', default=2)
-    parser.add_argument('--custom', type=str, help='Name of custom min max file if use of renormalization of data', default=False)
-
-    args = parser.parse_args()
-
-    p_interval   = list(map(int, args.interval.split(',')))
-    p_model_file = args.model
-    p_mode       = args.mode
-    p_feature    = args.feature
-    p_limit      = args.limit_detection
-    p_custom     = args.custom
-
-    scenes = os.listdir(scenes_path)
-    scenes = [s for s in scenes if s in maxwell_scenes]
-
-    # go ahead each scenes
-    for id_scene, folder_scene in enumerate(scenes):
-
-        # only take in consideration maxwell scenes
-        if folder_scene in maxwell_scenes:
-
-            print(folder_scene)
-
-            scene_path = os.path.join(scenes_path, folder_scene)
-
-            threshold_expes = []
-            threshold_expes_detected = []
-            threshold_expes_counter = []
-            threshold_expes_found = []
-
-            # get all images of folder
-            scene_images = sorted([os.path.join(scene_path, img) for img in os.listdir(scene_path) if cfg.scene_image_extension in img])
-
-            start_quality_image = dt.get_scene_image_quality(scene_images[0])
-            end_quality_image   = dt.get_scene_image_quality(scene_images[-1])
-    
-
-            # get zones list info
-            for index in zones:
-                index_str = str(index)
-                if len(index_str) < 2:
-                    index_str = "0" + index_str
-                zone_folder = "zone"+index_str
-
-                threshold_path_file = os.path.join(os.path.join(scene_path, zone_folder), threshold_expe_filename)
-
-                with open(threshold_path_file) as f:
-                    threshold = int(f.readline())
-                    threshold_expes.append(threshold)
-
-                    # Initialize default data to get detected model threshold found
-                    threshold_expes_detected.append(False)
-                    threshold_expes_counter.append(0)
-                    threshold_expes_found.append(end_quality_image) # by default use max
-
-            check_all_done = False
-
-            # for each images
-            for img_path in scene_images:
-
-                current_img = Image.open(img_path)
-                current_postfix_image = dt.get_scene_image_postfix(img_path)
-
-                img_blocks = segmentation.divide_in_blocks(current_img, (200, 200))
-
-                check_all_done = all(d == True for d in threshold_expes_detected)
-
-                if check_all_done:
-                    break
-
-                for id_block, block in enumerate(img_blocks):
-
-                    # check only if necessary for this scene (not already detected)
-                    if not threshold_expes_detected[id_block]:
-
-                        tmp_file_path = tmp_filename.replace('__model__',  p_model_file.split('/')[-1].replace('.joblib', '_'))
-                        block.save(tmp_file_path)
-
-                        python_cmd = "python prediction/predict_noisy_image_svd.py --image " + tmp_file_path + \
-                                        " --interval '" + ",".join(map(str, p_interval)) + \
-                                        "' --model " + p_model_file  + \
-                                        " --mode " + p_mode + \
-                                        " --feature " + p_feature
-
-                        # specify use of custom file for min max normalization
-                        if p_custom:
-                            python_cmd = python_cmd + ' --custom ' + p_custom
-
-                        ## call command ##
-                        p = subprocess.Popen(python_cmd, stdout=subprocess.PIPE, shell=True)
-
-                        (output, err) = p.communicate()
-
-                        ## Wait for result ##
-                        p_status = p.wait()
-
-                        prediction = int(output)
-
-                        if prediction == 0:
-                            threshold_expes_counter[id_block] = threshold_expes_counter[id_block] + 1
-                        else:
-                            threshold_expes_counter[id_block] = 0
-
-                        if threshold_expes_counter[id_block] == p_limit:
-                            threshold_expes_detected[id_block] = True
-                            threshold_expes_found[id_block] = int(current_postfix_image)
-
-                        print(str(id_block) + " : " + current_postfix_image + "/" + str(threshold_expes[id_block]) + " => " + str(prediction))
-
-                print("------------------------")
-                print("Scene " + str(id_scene + 1) + "/" + str(len(maxwell_scenes)))
-                print("------------------------")
-
-            # end of scene => display of results
-
-            # construct path using model name for saving threshold map folder
-            model_treshold_path = os.path.join(threshold_map_folder, p_model_file.split('/')[-1].replace('.joblib', ''))
-
-            # create threshold model path if necessary
-            if not os.path.exists(model_treshold_path):
-                os.makedirs(model_treshold_path)
-
-            abs_dist = []
-
-            map_filename = os.path.join(model_treshold_path, threshold_map_file_prefix + folder_scene)
-            f_map = open(map_filename, 'w')
-
-            line_information = ""
-
-            # default header
-            f_map.write('|  |    |    |  |\n')
-            f_map.write('---|----|----|---\n')
-            for id, threshold in enumerate(threshold_expes_found):
-
-                line_information += str(threshold) + " / " + str(threshold_expes[id]) + " | "
-                abs_dist.append(abs(threshold - threshold_expes[id]))
-
-                if (id + 1) % 4 == 0:
-                    f_map.write(line_information + '\n')
-                    line_information = ""
-
-            f_map.write(line_information + '\n')
-
-            min_abs_dist = min(abs_dist)
-            max_abs_dist = max(abs_dist)
-            avg_abs_dist = sum(abs_dist) / len(abs_dist)
-
-            f_map.write('\nScene information : ')
-            f_map.write('\n- BEGIN : ' + str(start_quality_image))
-            f_map.write('\n- END : ' + str(end_quality_image))
-
-            f_map.write('\n\nDistances information : ')
-            f_map.write('\n- MIN : ' + str(min_abs_dist))
-            f_map.write('\n- MAX : ' + str(max_abs_dist))
-            f_map.write('\n- AVG : ' + str(avg_abs_dist))
-
-            f_map.write('\n\nOther information : ')
-            f_map.write('\n- Detection limit : ' + str(p_limit))
-
-            # by default print last line
-            f_map.close()
-
-            print("Scene " + str(id_scene + 1) + "/" + str(len(scenes)) + " Done..")
-            print("------------------------")
-
-
-if __name__== "__main__":
-    main()

+ 0 - 174
prediction/predict_seuil_expe_maxwell_curve.py

@@ -1,174 +0,0 @@
-# main imports
-import sys, os, argparse
-import subprocess
-import time
-import numpy as np
-
-# image processing imports
-from ipfml.processing import segmentation
-from PIL import Image
-
-# models imports
-from sklearn.externals import joblib
-
-# modules imports
-sys.path.insert(0, '') # trick to enable import of main folder module
-
-import custom_config as cfg
-from modules.utils import data as dt
-
-
-# variables and parameters
-scenes_path               = cfg.dataset_path
-min_max_filename          = cfg.min_max_filename_extension
-threshold_expe_filename   = cfg.seuil_expe_filename
-
-threshold_map_folder      = cfg.threshold_map_folder
-threshold_map_file_prefix = cfg.threshold_map_folder + "_"
-
-zones                     = cfg.zones_indices
-maxwell_scenes            = cfg.maxwell_scenes_names
-normalization_choices     = cfg.normalization_choices
-features_choices          = cfg.features_choices_labels
-
-simulation_curves_zones   = "simulation_curves_zones_"
-tmp_filename              = '/tmp/__model__img_to_predict.png'
-
-current_dirpath = os.getcwd()
-
-
-def main():
-
-    p_custom = False
-        
-    parser = argparse.ArgumentParser(description="Script which predicts threshold using specific model")
-
-    parser.add_argument('--interval', type=str, help='Interval value to keep from svd', default='0, 200')
-    parser.add_argument('--model', type=str, help='.joblib or .json file (sklearn or keras model)')
-    parser.add_argument('--mode', type=str, help='Kind of normalization level wished', choices=normalization_choices)
-    parser.add_argument('--feature', type=str, help='feature data choice', choices=features_choices)
-    #parser.add_argument('--limit_detection', type=int, help='Specify number of same prediction to stop threshold prediction', default=2)
-    parser.add_argument('--custom', type=str, help='Name of custom min max file if use of renormalization of data', default=False)
-
-    args = parser.parse_args()
-
-    # keep p_interval as it is
-    p_interval   = args.interval
-    p_model_file = args.model
-    p_mode       = args.mode
-    p_feature    = args.feature
-    #p_limit      = args.limit
-    p_custom     = args.custom
-
-    scenes = os.listdir(scenes_path)
-    scenes = [s for s in scenes if s in maxwell_scenes]
-
-    print(scenes)
-
-    # go ahead each scenes
-    for id_scene, folder_scene in enumerate(scenes):
-
-        # only take in consideration maxwell scenes
-        if folder_scene in maxwell_scenes:
-
-            print(folder_scene)
-
-            scene_path = os.path.join(scenes_path, folder_scene)
-
-            threshold_expes = []
-            threshold_expes_found = []
-            block_predictions_str = []
-
-            # get all images of folder
-            scene_images = sorted([os.path.join(scene_path, img) for img in os.listdir(scene_path) if cfg.scene_image_extension in img])
-
-            start_quality_image = dt.get_scene_image_quality(scene_images[0])
-            end_quality_image   = dt.get_scene_image_quality(scene_images[-1])
-            # using first two images find the step of quality used
-            quality_step_image  = dt.get_scene_image_quality(scene_images[1]) - start_quality_image
-
-            # get zones list info
-            for index in zones:
-                index_str = str(index)
-                if len(index_str) < 2:
-                    index_str = "0" + index_str
-                zone_folder = "zone"+index_str
-
-                threshold_path_file = os.path.join(os.path.join(scene_path, zone_folder), threshold_expe_filename)
-
-                with open(threshold_path_file) as f:
-                    threshold = int(f.readline())
-                    threshold_expes.append(threshold)
-
-                    # Initialize default data to get detected model threshold found
-                    threshold_expes_found.append(end_quality_image) # by default use max
-
-                block_predictions_str.append(index_str + ";" + p_model_file + ";" + str(threshold) + ";" + str(start_quality_image) + ";" + str(quality_step_image))
-
-
-            # for each images
-            for img_path in scene_images:
-
-                current_img = Image.open(img_path)
-                current_quality_image = dt.get_scene_image_quality(img_path)
-
-                img_blocks = segmentation.divide_in_blocks(current_img, (200, 200))
-
-                for id_block, block in enumerate(img_blocks):
-
-                    # check only if necessary for this scene (not already detected)
-                    #if not threshold_expes_detected[id_block]:
-
-                        tmp_file_path = tmp_filename.replace('__model__',  p_model_file.split('/')[-1].replace('.joblib', '_'))
-                        block.save(tmp_file_path)
-
-                        python_cmd_line = "python prediction/predict_noisy_image_svd.py --image {0} --interval '{1}' --model {2} --mode {3} --feature {4}"
-                        python_cmd = python_cmd_line.format(tmp_file_path, p_interval, p_model_file, p_mode, p_feature) 
-
-                        # specify use of custom file for min max normalization
-                        if p_custom:
-                            python_cmd = python_cmd + ' --custom ' + p_custom
-
-                        ## call command ##
-                        p = subprocess.Popen(python_cmd, stdout=subprocess.PIPE, shell=True)
-
-                        (output, err) = p.communicate()
-
-                        ## Wait for result ##
-                        p_status = p.wait()
-
-                        prediction = int(output)
-
-                        # save here in specific file of block all the predictions done
-                        block_predictions_str[id_block] = block_predictions_str[id_block] + ";" + str(prediction)
-
-                        print(str(id_block) + " : " + str(current_quality_image) + "/" + str(threshold_expes[id_block]) + " => " + str(prediction))
-
-                print("------------------------")
-                print("Scene " + str(id_scene + 1) + "/" + str(len(scenes)))
-                print("------------------------")
-
-            # end of scene => display of results
-
-            # construct path using model name for saving threshold map folder
-            model_threshold_path = os.path.join(threshold_map_folder, p_model_file.split('/')[-1].replace('.joblib', ''))
-
-            # create threshold model path if necessary
-            if not os.path.exists(model_threshold_path):
-                os.makedirs(model_threshold_path)
-
-            map_filename = os.path.join(model_threshold_path, simulation_curves_zones + folder_scene)
-            f_map = open(map_filename, 'w')
-
-            for line in block_predictions_str:
-                f_map.write(line + '\n')
-            f_map.close()
-
-            print("Scene " + str(id_scene + 1) + "/" + str(len(maxwell_scenes)) + " Done..")
-            print("------------------------")
-
-            print("Model predictions are saved into %s" % map_filename)
-
-
-if __name__== "__main__":
-    main()
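
Review note: the removed script started one Python process per image block through `subprocess.Popen(..., shell=True)` with a formatted command string, and it called `p.wait()` after `p.communicate()` had already waited for the process. If a per-block prediction loop is reintroduced for the new dataset, the same call can be made without a shell. The sketch below is only an illustration under assumptions: `run_prediction` is a hypothetical helper name, and the demo command is a stand-in for `prediction/predict_noisy_image_svd.py`, whose output format is not visible in this diff.

```python
import subprocess
import sys


def run_prediction(cmd):
    """Run a prediction command and return the integer it prints on stdout.

    `cmd` is a list of arguments, so no shell quoting is involved.
    check=True raises CalledProcessError if the child exits with an error.
    """
    result = subprocess.run(cmd, stdout=subprocess.PIPE, check=True)
    return int(result.stdout.decode().strip())


if __name__ == "__main__":
    # Stand-in command (assumption): a child interpreter that prints a label.
    demo_cmd = [sys.executable, "-c", "print(1)"]
    print(run_prediction(demo_cmd))
```

Passing the arguments as a list also avoids quoting problems with values such as the `--interval` string, which the removed code wrapped in single quotes by hand.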

+ 0 - 176
prediction/predict_seuil_expe_maxwell_curve_opti.py

@@ -1,176 +0,0 @@
-# main imports
-import sys, os, argparse
-import subprocess
-import time
-import numpy as np
-
-# image processing imports
-from ipfml.processing import segmentation
-from PIL import Image
-
-# models imports
-from sklearn.externals import joblib
-
-# modules imports
-sys.path.insert(0, '') # trick to enable import of main folder module
-
-import custom_config as cfg
-from modules.utils import data as dt
-
-
-# variables and parameters
-scenes_path               = cfg.dataset_path
-min_max_filename          = cfg.min_max_filename_extension
-threshold_expe_filename   = cfg.seuil_expe_filename
-
-threshold_map_folder      = cfg.threshold_map_folder
-threshold_map_file_prefix = cfg.threshold_map_folder + "_"
-
-zones                     = cfg.zones_indices
-maxwell_scenes            = cfg.maxwell_scenes_names
-normalization_choices     = cfg.normalization_choices
-features_choices          = cfg.features_choices_labels
-
-simulation_curves_zones   = "simulation_curves_zones_"
-tmp_filename              = '/tmp/__model__img_to_predict.png'
-
-current_dirpath = os.getcwd()
-
-
-def main():
-
-    p_custom = False
-        
-    parser = argparse.ArgumentParser(description="Script which predicts threshold using specific model")
-
-    parser.add_argument('--solution', type=str, help='Data of solution to specify filters to use')
-    parser.add_argument('--model', type=str, help='.joblib or .json file (sklearn or keras model)')
-    parser.add_argument('--mode', type=str, help='Kind of normalization level wished', choices=normalization_choices)
-    parser.add_argument('--feature', type=str, help='feature data choice', choices=features_choices)
-    #parser.add_argument('--limit_detection', type=int, help='Specify number of same prediction to stop threshold prediction', default=2)
-    parser.add_argument('--custom', type=str, help='Name of custom min max file if use of renormalization of data', default=False)
-    parser.add_argument('--filter', type=str, help='filter reduction solution used', choices=cfg.filter_reduction_choices)
-
-    args = parser.parse_args()
-
-    # keep p_interval as it is
-    p_solution   = args.solution
-    p_model_file = args.model
-    p_mode       = args.mode
-    p_feature    = args.feature
-    #p_limit      = args.limit
-    p_custom     = args.custom
-    p_filter     = args.filter
-
-    scenes = os.listdir(scenes_path)
-    scenes = [s for s in scenes if s in maxwell_scenes]
-
-    print(scenes)
-
-    # go ahead each scenes
-    for id_scene, folder_scene in enumerate(scenes):
-
-        # only take in consideration maxwell scenes
-        if folder_scene in maxwell_scenes:
-
-            print(folder_scene)
-
-            scene_path = os.path.join(scenes_path, folder_scene)
-
-            threshold_expes = []
-            threshold_expes_found = []
-            block_predictions_str = []
-
-            # get all images of folder
-            scene_images = sorted([os.path.join(scene_path, img) for img in os.listdir(scene_path) if cfg.scene_image_extension in img])
-
-            start_quality_image = dt.get_scene_image_quality(scene_images[0])
-            end_quality_image   = dt.get_scene_image_quality(scene_images[-1])
-            # using first two images find the step of quality used
-            quality_step_image  = dt.get_scene_image_quality(scene_images[1]) - start_quality_image
-
-            # get zones list info
-            for index in zones:
-                index_str = str(index)
-                if len(index_str) < 2:
-                    index_str = "0" + index_str
-                zone_folder = "zone"+index_str
-
-                threshold_path_file = os.path.join(os.path.join(scene_path, zone_folder), threshold_expe_filename)
-
-                with open(threshold_path_file) as f:
-                    threshold = int(f.readline())
-                    threshold_expes.append(threshold)
-
-                    # Initialize default data to get detected model threshold found
-                    threshold_expes_found.append(end_quality_image) # by default use max
-
-                block_predictions_str.append(index_str + ";" + p_model_file + ";" + str(threshold) + ";" + str(start_quality_image) + ";" + str(quality_step_image))
-
-
-            # for each images
-            for img_path in scene_images:
-
-                current_img = Image.open(img_path)
-                current_quality_image = dt.get_scene_image_quality(img_path)
-
-                img_blocks = segmentation.divide_in_blocks(current_img, (200, 200))
-
-                for id_block, block in enumerate(img_blocks):
-
-                    # check only if necessary for this scene (not already detected)
-                    #if not threshold_expes_detected[id_block]:
-
-                        tmp_file_path = tmp_filename.replace('__model__',  p_model_file.split('/')[-1].replace('.joblib', '_'))
-                        block.save(tmp_file_path)
-
-                        python_cmd_line = "python prediction/predict_noisy_image_svd_" + p_filter + ".py --image {0} --solution '{1}' --model {2} --mode {3} --feature {4}"
-                        python_cmd = python_cmd_line.format(tmp_file_path, p_solution, p_model_file, p_mode, p_feature) 
-
-                        # specify use of custom file for min max normalization
-                        if p_custom:
-                            python_cmd = python_cmd + ' --custom ' + p_custom
-
-                        ## call command ##
-                        p = subprocess.Popen(python_cmd, stdout=subprocess.PIPE, shell=True)
-
-                        (output, err) = p.communicate()
-
-                        ## Wait for result ##
-                        p_status = p.wait()
-
-                        prediction = int(output)
-
-                        # save here in specific file of block all the predictions done
-                        block_predictions_str[id_block] = block_predictions_str[id_block] + ";" + str(prediction)
-
-                        print(str(id_block) + " : " + str(current_quality_image) + "/" + str(threshold_expes[id_block]) + " => " + str(prediction))
-
-                print("------------------------")
-                print("Scene " + str(id_scene + 1) + "/" + str(len(scenes)))
-                print("------------------------")
-
-            # end of scene => display of results
-
-            # construct path using model name for saving threshold map folder
-            model_threshold_path = os.path.join(threshold_map_folder, p_model_file.split('/')[-1].replace('.joblib', ''))
-
-            # create threshold model path if necessary
-            if not os.path.exists(model_threshold_path):
-                os.makedirs(model_threshold_path)
-
-            map_filename = os.path.join(model_threshold_path, simulation_curves_zones + folder_scene)
-            f_map = open(map_filename, 'w')
-
-            for line in block_predictions_str:
-                f_map.write(line + '\n')
-            f_map.close()
-
-            print("Scene " + str(id_scene + 1) + "/" + str(len(maxwell_scenes)) + " Done..")
-            print("------------------------")
-
-            print("Model predictions are saved into %s" % map_filename)
-
-
-if __name__== "__main__":
-    main()
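
Review note: both removed curve scripts built the zone folder name by hand-padding the index, concatenated a semicolon-separated header per zone, and wrote the result with an explicit `open`/`close` pair. A compact sketch of those three steps follows, in case the output format needs to be reproduced elsewhere. The function names are hypothetical; the field order is taken from the `block_predictions_str.append(...)` line above.

```python
def zone_folder(index):
    """Return the zone folder name with a two-digit, zero-padded index."""
    return "zone{:02d}".format(index)


def block_header(index, model_file, threshold, start_quality, quality_step):
    """Build the ';'-separated prefix written for one zone.

    Field order: zone index, model file, human threshold,
    first image quality, quality step between images.
    """
    fields = ["{:02d}".format(index), model_file, threshold,
              start_quality, quality_step]
    return ";".join(str(field) for field in fields)


def write_prediction_map(path, lines):
    """Write one line per zone; the context manager closes the file on errors too."""
    with open(path, "w") as f_map:
        for line in lines:
            f_map.write(line + "\n")
```

Predictions for each image are then appended to the header as further `;`-separated values, as the removed loop did.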

+ 0 - 114
prediction/prediction_scene.py

@@ -1,114 +0,0 @@
-# main imports
-import sys, os, argparse
-import numpy as np
-import json
-import pandas as pd
-
-# models imports
-from sklearn.externals import joblib
-from sklearn.metrics import accuracy_score
-from keras.models import Sequential
-from keras.layers import Conv1D, MaxPooling1D
-from keras.layers import Activation, Dropout, Flatten, Dense, BatchNormalization
-from keras import backend as K
-from keras.models import model_from_json
-from keras.wrappers.scikit_learn import KerasClassifier
-
-# modules imports
-sys.path.insert(0, '') # trick to enable import of main folder module
-
-import custom_config as cfg
-
-# parameters and variables
-output_model_folder = cfg.saved_models_folder
-
-def main():
-    
-    parser = argparse.ArgumentParser(description="Give model performance on specific scene")
-
-    parser.add_argument('--data', type=str, help='dataset filename prefix of specific scene (without .train and .test)')
-    parser.add_argument('--model', type=str, help='saved model (Keras or SKlearn) filename with extension')
-    parser.add_argument('--output', type=str, help="filename to store predicted and performance model obtained on scene")
-    parser.add_argument('--scene', type=str, help="scene indice to predict", choices=cfg.scenes_indices)
-
-    args = parser.parse_args()
-
-    p_data_file  = args.data
-    p_model_file = args.model
-    p_output     = args.output
-    p_scene      = args.scene
-
-    if '.joblib' in p_model_file:
-        kind_model = 'sklearn'
-        model_ext = '.joblib'
-
-    if '.json' in p_model_file:
-        kind_model = 'keras'
-        model_ext = '.json'
-
-    if not os.path.exists(output_model_folder):
-        os.makedirs(output_model_folder)
-
-    dataset = pd.read_csv(p_data_file, header=None, sep=";")
-
-    y_dataset = dataset.ix[:,0]
-    x_dataset = dataset.ix[:,1:]
-
-    noisy_dataset = dataset[dataset.ix[:, 0] == 1]
-    not_noisy_dataset = dataset[dataset.ix[:, 0] == 0]
-
-    y_noisy_dataset = noisy_dataset.ix[:, 0]
-    x_noisy_dataset = noisy_dataset.ix[:, 1:]
-
-    y_not_noisy_dataset = not_noisy_dataset.ix[:, 0]
-    x_not_noisy_dataset = not_noisy_dataset.ix[:, 1:]
-
-    if kind_model == 'keras':
-        with open(p_model_file, 'r') as f:
-            json_model = json.load(f)
-            model = model_from_json(json_model)
-            model.load_weights(p_model_file.replace('.json', '.h5'))
-
-            model.compile(loss='binary_crossentropy',
-                  optimizer='adam',
-                  metrics=['accuracy'])
-
-        _, vector_size = np.array(x_dataset).shape
-
-        # reshape all data
-        x_dataset = np.array(x_dataset).reshape(len(x_dataset), vector_size, 1)
-        x_noisy_dataset = np.array(x_noisy_dataset).reshape(len(x_noisy_dataset), vector_size, 1)
-        x_not_noisy_dataset = np.array(x_not_noisy_dataset).reshape(len(x_not_noisy_dataset), vector_size, 1)
-
-
-    if kind_model == 'sklearn':
-        model = joblib.load(p_model_file)
-
-    if kind_model == 'keras':
-        y_pred = model.predict_classes(x_dataset)
-        y_noisy_pred = model.predict_classes(x_noisy_dataset)
-        y_not_noisy_pred = model.predict_classes(x_not_noisy_dataset)
-
-    if kind_model == 'sklearn':
-        y_pred = model.predict(x_dataset)
-        y_noisy_pred = model.predict(x_noisy_dataset)
-        y_not_noisy_pred = model.predict(x_not_noisy_dataset)
-
-    accuracy_global = accuracy_score(y_dataset, y_pred)
-    accuracy_noisy = accuracy_score(y_noisy_dataset, y_noisy_pred)
-    accuracy_not_noisy = accuracy_score(y_not_noisy_dataset, y_not_noisy_pred)
-
-    if(p_scene):
-        print(p_scene + " | " + str(accuracy_global) + " | " + str(accuracy_noisy) + " | " + str(accuracy_not_noisy))
-    else:
-        print(str(accuracy_global) + " \t | " + str(accuracy_noisy) + " \t | " + str(accuracy_not_noisy))
-
-        with open(p_output, 'w') as f:
-            f.write("Global accuracy found %s " % str(accuracy_global))
-            f.write("Noisy accuracy found %s " % str(accuracy_noisy))
-            f.write("Not noisy accuracy found %s " % str(accuracy_not_noisy))
-            for prediction in y_pred:
-                f.write(str(prediction) + '\n')
-
-if __name__== "__main__":
-    main()
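
Review note: in the removed `prediction_scene.py`, `kind_model` was only assigned when the model filename contained `.joblib` or `.json`, so any other filename would have failed later with a `NameError`. A minimal sketch of a stricter check is given below. `detect_model_kind` is a hypothetical name, and matching on the file extension (rather than on a substring, as the original did) is a deliberate change in behavior.

```python
import os


def detect_model_kind(model_file):
    """Map a saved-model filename to 'sklearn' or 'keras' by its extension."""
    _, ext = os.path.splitext(model_file)
    kinds = {".joblib": "sklearn", ".json": "keras"}
    if ext not in kinds:
        # Fail early with a clear message instead of a later NameError.
        raise ValueError("Unsupported model file extension: %r" % ext)
    return kinds[ext]
```

For example, `detect_model_kind("saved_models/model.joblib")` returns `'sklearn'`, while a `.h5` weights file raises `ValueError`.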

+ 3 - 2
requirements.txt

@@ -1,4 +1,4 @@
-IPFML
+ipfml
 sklearn
 scikit-image
 tensorflow
@@ -9,4 +9,5 @@ pydot
 matplotlib
 path.py
 pandas
-opencv-python
+opencv-python
+gzip

+ 0 - 3
run.sh

@@ -1,3 +0,0 @@
-python generate/generate_all_data.py --feature filters_statistics
-python generate/generate_data_model.py --interval 0,26 --kind svdn --feature filters_statistics --scenes A,B,C,D,E,F --zones 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 --output data/26_attributes_data --each 1 --kind svdn
-python train_model.py --data data/26_attributes_data --output 26_attributes_model --choice svm_model

+ 0 - 35
run/runAll_maxwell_custom.sh

@@ -1,35 +0,0 @@
-#! bin/bash
-
-# erase "results/models_comparisons.csv" file and write new header
-list="all, center, split"
-
-if [ -z "$1" ]
-  then
-    echo "No argument supplied"
-    echo "Need argument from [${list}]"
-    exit 1
-fi
-
-if [[ "$1" =~ ^(all|center|split)$ ]]; then
-    echo "$1 is in the list"
-else
-    echo "$1 is not in the list"
-fi
-
-data=$1
-erased=$2
-
-if [ "${erased}" == "Y" ]; then
-    echo "Previous data file erased..."
-    rm ${file_path}
-    mkdir -p results
-    touch ${file_path}
-
-    # add of header
-    echo 'model_name; vector_size; start; end; nb_zones; feature; mode; tran_size; val_size; test_size; train_pct_size; val_pct_size; test_pct_size; train_acc; val_acc; test_acc; all_acc; F1_train; recall_train; roc_auc_train; F1_val; recall_val; roc_auc_val; F1_test; recall_test; roc_auc_test; F1_all; recall_all; roc_auc_all;' >> ${file_path}
-fi
-
-size=26
-feature="filters_statistics"
-
-bash data_processing/generateAndTrain_maxwell_custom.sh ${size} ${feature} ${data}

+ 0 - 37
run/runAll_maxwell_custom_optimization_attributes.sh

@@ -1,37 +0,0 @@
-#! bin/bash
-
-# erase "results/optimization_comparisons.csv" file and write new header
-file_path='results/optimization_comparisons_attributes.csv'
-list="all, center, split"
-
-if [ -z "$1" ]
-  then
-    echo "No argument supplied"
-    echo "Need argument from [${list}]"
-    exit 1
-fi
-
-if [[ "$1" =~ ^(all|center|split)$ ]]; then
-    echo "$1 is in the list"
-else
-    echo "$1 is not in the list"
-fi
-
-data=$1
-erased=$2
-
-if [ "${erased}" == "Y" ]; then
-    echo "Previous data file erased..."
-    rm ${file_path}
-    mkdir -p results
-    touch ${file_path}
-
-    # add of header
-    echo 'data_file; ils_iteration; ls_iteration; best_solution; nb_attributes; nb_filters; fitness (roc test);' >> ${file_path}
-fi
-
-size=26
-feature="filters_statistics"
-filter="attributes"
-
-bash data_processing/generateAndTrain_maxwell_custom_optimization.sh ${size} ${feature} ${data} ${filter}

+ 0 - 38
run/runAll_maxwell_custom_optimization_filters.sh

@@ -1,38 +0,0 @@
-#! bin/bash
-
-# erase "results/optimization_comparisons.csv" file and write new header
-file_path='results/optimization_comparisons_filters.csv'
-list="all, center, split"
-
-if [ -z "$1" ]
-  then
-    echo "No argument supplied"
-    echo "Need argument from [${list}]"
-    exit 1
-fi
-
-if [[ "$1" =~ ^(all|center|split)$ ]]; then
-    echo "$1 is in the list"
-else
-    echo "$1 is not in the list"
-fi
-
-data=$1
-erased=$2
-
-if [ "${erased}" == "Y" ]; then
-    echo "Previous results file erased..."
-    rm ${file_path}
-    mkdir -p results
-    touch ${file_path}
-
-    # add of header
-    echo 'data_file; ils_iteration; ls_iteration; best_solution; nb_filters; fitness (roc test);' >> ${file_path}
-
-fi
-
-size=26
-feature="filters_statistics"
-filter="filters"
-
-bash data_processing/generateAndTrain_maxwell_custom_optimization.sh ${size} ${feature} ${data} ${filter}

+ 0 - 6
simulation/generate_all_simulate_curves.sh

@@ -1,6 +0,0 @@
-for file in "threshold_map"/*; do
-
-    echo ${file}
-
-    python display/display_simulation_curves.py --folder ${file}
-done

+ 0 - 39
simulation/run_maxwell_simulation_filters_statistics.sh

@@ -1,39 +0,0 @@
-#! bin/bash
-
-# file which contains model names we want to use for simulation
-simulate_models="simulate_models.csv"
-
-# selection of four scenes (only maxwell)
-scenes="A,D,G,H"
-
-size="26"
-
-# for feature in {"lab","mscn","low_bits_2","low_bits_3","low_bits_4","low_bits_5","low_bits_6","low_bits_4_shifted_2","ica_diff","svd_trunc_diff","ipca_diff","svd_reconstruct"}; do
-feature="filters_statistics"
-
-for nb_zones in {4,6,8,10,11,12}; do
-    for mode in {"svd","svdn","svdne"}; do
-        for model in {"svm_model","ensemble_model","ensemble_model_v2"}; do
-
-            FILENAME="data/${model}_N${size}_B0_E${size}_nb_zones_${nb_zones}_${feature}_${mode}_all"
-            MODEL_NAME="${model}_N${size}_B0_E${size}_nb_zones_${nb_zones}_${feature}_${mode}_all"
-            CUSTOM_MIN_MAX_FILENAME="N${size}_B0_E${size}_nb_zones_${nb_zones}_${feature}_${mode}_all_min_max"
-
-            #echo $MODEL_NAME
-
-            # only compute if necessary (perhaps server will fall.. Just in case)
-            if grep -xq "${MODEL_NAME}" "${simulate_models}"; then
-
-                #echo "Run simulation for ${MODEL_NAME}..."
-
-                # Use of already generated model
-                # python generate/generate_data_model_random.py --output ${FILENAME} --interval "0,${size}" --kind ${mode} --feature ${feature} --scenes "${scenes}" --nb_zones "${nb_zones}" --percent 1 --renderer "maxwell" --step 40 --random 1 --custom ${CUSTOM_MIN_MAX_FILENAME}
-                # python train_model.py --data ${FILENAME} --output ${MODEL_NAME} --choice ${model}
-
-                echo python prediction/predict_seuil_expe_maxwell_curve.py --interval "0,${size}" --model "saved_models/${MODEL_NAME}.joblib" --mode "${mode}" --feature ${feature} --custom ${CUSTOM_MIN_MAX_FILENAME}
-
-                # python others/save_model_result_in_md_maxwell.py --interval "0,${size}" --model "saved_models/${MODEL_NAME}.joblib" --mode "${mode}" --feature ${feature}
-            fi
-        done
-    done
-done

+ 0 - 56
simulation/run_maxwell_simulation_filters_statistics_opti.sh

@@ -1,56 +0,0 @@
-#! bin/bash
-
-# file which contains model names we want to use for simulation
-list="attributes, filters"
-
-if [ -z "$1" ]
-  then
-    echo "No argument supplied"
-    echo "Need argument from [${list}]"
-    exit 1
-fi
-
-
-# selection of four scenes (only maxwell)
-scenes="A, D, G, H"
-size="26"
-feature="filters_statistics"
-filter=$1
-
-simulate_models="simulate_models_${filter}_all.csv"
-
-
-for nb_zones in {4,6,8,10,12}; do
-    for mode in {"svd","svdn","svdne"}; do
-        for model in {"svm_model","ensemble_model","ensemble_model_v2"}; do
-            for data in {"all","center","split"}; do
-
-                FILENAME="data/${model}_N${size}_B0_E${size}_nb_zones_${nb_zones}_${feature}_${mode}_${data}_${filter}"
-                MODEL_NAME="${model}_N${size}_B0_E${size}_nb_zones_${nb_zones}_${feature}_${mode}_${data}_${filter}"
-                CUSTOM_MIN_MAX_FILENAME="N${size}_B0_E${size}_nb_zones_${nb_zones}_${feature}_${mode}_${data}_${filter}_min_max"
-
-                # only compute if necessary (perhaps server will fall.. Just in case)
-                if grep -q "${FILENAME}" "${simulate_models}"; then
-
-                    echo "Found ${FILENAME}"
-                    line=$(grep -n ${FILENAME} ${simulate_models})
-
-                    # extract solution
-                    IFS=\; read -a fields <<<"$line"
-
-                    SOLUTION=${fields[1]}
-
-                    echo "Run simulation for ${MODEL_NAME}... with ${SOLUTION}"
-
-                    # Use of already generated model
-                    python generate/generate_data_model_random_${data}.py --output ${FILENAME} --interval "0,${size}" --kind ${mode} --feature ${feature} --scenes "${scenes}" --nb_zones "${nb_zones}" --percent 1 --renderer "maxwell" --step 10 --random 1 --custom ${CUSTOM_MIN_MAX_FILENAME}
-                    python train_model_${filter}.py --data ${FILENAME} --output ${MODEL_NAME} --choice ${model} --solution "${SOLUTION}"
-
-                    python prediction/predict_seuil_expe_maxwell_curve_opti.py --solution "${SOLUTION}" --model "saved_models/${MODEL_NAME}.joblib" --mode "${mode}" --feature ${feature} --custom ${CUSTOM_MIN_MAX_FILENAME} --filter ${filter}
-
-                    #python others/save_model_result_in_md_maxwell.py --solution "${SOLUTION}" --model "saved_models/${MODEL_NAME}.joblib" --mode "${mode}" --feature ${feature}
-                fi
-            done
-        done
-    done
-done
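
Review note: the removed shell script looked up `${FILENAME}` in the `simulate_models_*` CSV with `grep`, split the matching line on `;`, and used the second field as the solution. The sketch below expresses the same lookup in Python, the language of the rest of the project. `find_solution` is a hypothetical name, the sample rows in the demo are invented for illustration, and stripping whitespace around the field is an addition the shell version did not perform.

```python
def find_solution(lines, data_file):
    """Return the second ';'-separated field of the first line mentioning data_file.

    Returns None when no line matches or the matching line has a single field.
    """
    for line in lines:
        if data_file in line:
            fields = [field.strip() for field in line.split(";")]
            if len(fields) > 1:
                return fields[1]
    return None


if __name__ == "__main__":
    # Invented rows (assumption): "<data file>;<solution>".
    rows = ["data/model_a;1,0,1", "data/model_b; 0,1,1"]
    print(find_solution(rows, "data/model_b"))
```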

+ 17 - 11
train_model.py

@@ -34,7 +34,7 @@ def main():
 
     parser = argparse.ArgumentParser(description="Train SKLearn model and save it into .joblib file")
 
-    parser.add_argument('--data', type=str, help='dataset filename prefix (without .train and .test)')
+    parser.add_argument('--data', type=str, help='dataset filename prefix (without .train and .test)')
     parser.add_argument('--output', type=str, help='output file name desired for model (without .joblib extension)')
     parser.add_argument('--choice', type=str, help='model choice from list of choices', choices=models_list)
 
@@ -58,12 +58,12 @@ def main():
     dataset_test = shuffle(dataset_test)
 
     # get dataset with equal number of classes occurences
-    noisy_df_train = dataset_train[dataset_train.ix[:, 0] == 1]
-    not_noisy_df_train = dataset_train[dataset_train.ix[:, 0] == 0]
+    noisy_df_train = dataset_train[dataset_train.iloc[:, 0] == 1]
+    not_noisy_df_train = dataset_train[dataset_train.iloc[:, 0] == 0]
     nb_noisy_train = len(noisy_df_train.index)
 
-    noisy_df_test = dataset_test[dataset_test.ix[:, 0] == 1]
-    not_noisy_df_test = dataset_test[dataset_test.ix[:, 0] == 0]
+    noisy_df_test = dataset_test[dataset_test.iloc[:, 0] == 1]
+    not_noisy_df_test = dataset_test[dataset_test.iloc[:, 0] == 0]
     nb_noisy_test = len(noisy_df_test.index)
 
     final_df_train = pd.concat([not_noisy_df_train[0:nb_noisy_train], noisy_df_train])
@@ -77,11 +77,11 @@ def main():
     final_df_test_size = len(final_df_test.index)
 
     # use of the whole data set for training
-    x_dataset_train = final_df_train.ix[:,1:]
-    x_dataset_test = final_df_test.ix[:,1:]
+    x_dataset_train = final_df_train.iloc[:,1:]
+    x_dataset_test = final_df_test.iloc[:,1:]
 
-    y_dataset_train = final_df_train.ix[:,0]
-    y_dataset_test = final_df_test.ix[:,0]
+    y_dataset_train = final_df_train.iloc[:,0]
+    y_dataset_test = final_df_test.iloc[:,0]
 
     #######################
     # 2. Construction of the model : Ensemble model structure
@@ -112,7 +112,7 @@ def main():
         x_dataset_test = x_dataset_test[0:total_validation_size]
         y_dataset_test = y_dataset_test[0:total_validation_size]
 
-    X_test, X_val, y_test, y_val = train_test_split(x_dataset_test, y_dataset_test, test_size=0.5, random_state=1)
+    X_test, X_val, y_test, y_val = train_test_split(x_dataset_test, y_dataset_test, test_size=0.2, random_state=1)
 
     y_test_model = model.predict(X_test)
     y_val_model = model.predict(X_val)
@@ -120,6 +120,12 @@ def main():
     val_accuracy = accuracy_score(y_val, y_val_model)
     test_accuracy = accuracy_score(y_test, y_test_model)
 
+    print('Test dataset 1 ', np.any(y_test_model == 1))
+    print('Test dataset 0 ', np.any(y_test_model == 0))
+
+    print('Val dataset 1 ', np.any(y_val_model == 1))
+    print('Val dataset 0 ', np.any(y_val_model == 0))
+
     val_f1 = f1_score(y_val, y_val_model)
     test_f1 = f1_score(y_test, y_test_model)
 
@@ -131,7 +137,7 @@ def main():
     print("Validation: ", val_accuracy)
     print("Validation F1: ", val_f1)
     print("Test dataset size ", test_set_size)
-    print("Test: ", val_accuracy)
+    print("Test: ", test_accuracy)
     print("Test F1: ", test_f1)
 
     ##################