Introduction 

Spark plugins that train a model and classify data using multinomial or binary logistic regression.

Use Cases

The plugins should support the following use cases:

  1. The user should be able to train a model on the input data.
  2. The user should be able to classify test data using the model built during training.
  3. The user should be able to provide the list of columns (features) to use for training.
  4. The user should be able to provide the list of columns (features) to use for classification.
  5. The user should be able to provide the column to use as the prediction field during training and classification.
  6. The user should be able to provide the number of features to use during training and classification.
  7. The user should be able to provide the number of classes to use during training and classification.
  8. The user should be able to provide the name of the FileSet in which to save the trained model.
  9. The user should be able to provide the path of the FileSet.

 

User Stories

  1. The user should be able to train a model on the input data.
  2. The user should be able to classify test data using the model built during training.


Example

Suppose the trainer plugin receives the following records to train a logistic regression model:

Starter   Dessert   Tip
1         0         0
1         1         1
0         1         0
0         0         0
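To make the roles of the feature columns and the prediction field concrete, the following minimal Python sketch turns each record of the table into a (label, features) pair, with Starter and Dessert as features and Tip as the label. It assumes records arrive as dictionaries keyed by column name; `to_labeled_point` is a hypothetical helper, not part of the plugin or of Spark's API.

```python
from typing import Dict, List, Tuple

# The four example records from the table, as column-name -> value maps.
RECORDS: List[Dict[str, float]] = [
    {"Starter": 1, "Dessert": 0, "Tip": 0},
    {"Starter": 1, "Dessert": 1, "Tip": 1},
    {"Starter": 0, "Dessert": 1, "Tip": 0},
    {"Starter": 0, "Dessert": 0, "Tip": 0},
]


def to_labeled_point(
    record: Dict[str, float], fields_to_classify: List[str], prediction_field: str
) -> Tuple[float, List[float]]:
    """Hypothetical helper: split one record into (label, feature vector).

    Features are taken in the order given by fields_to_classify, so the same
    order has to be used when training and when classifying.
    """
    label = float(record[prediction_field])
    features = [float(record[name]) for name in fields_to_classify]
    return label, features


points = [to_labeled_point(r, ["Starter", "Dessert"], "Tip") for r in RECORDS]
print(points)
```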


After training on these records, the trainer plugin creates a logistic regression model and saves it to the FileSet location provided by the user.
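The training and classification steps can be sketched for this example with plain batch gradient descent on the logistic (log) loss. This is a stand-in technique chosen for readability, not necessarily how Spark MLlib or the plugin trains; the learning rate, iteration count, and 0.5 decision threshold are illustrative assumptions, and `train` and `classify` are hypothetical names.

```python
import math
from typing import List, Tuple

# (label, [Starter, Dessert]) pairs taken from the example table.
TRAINING: List[Tuple[float, List[float]]] = [
    (0.0, [1.0, 0.0]),
    (1.0, [1.0, 1.0]),
    (0.0, [0.0, 1.0]),
    (0.0, [0.0, 0.0]),
]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(data, num_features, learning_rate=0.5, iterations=5000):
    """Fit binary logistic regression weights and an intercept by batch
    gradient descent on the average log-loss."""
    weights = [0.0] * num_features
    bias = 0.0
    n = len(data)
    for _ in range(iterations):
        grad_w = [0.0] * num_features
        grad_b = 0.0
        for label, x in data:
            # Gradient of the log-loss w.r.t. the linear score is (p - label).
            error = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - label
            for j in range(num_features):
                grad_w[j] += error * x[j]
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def classify(model, x, threshold=0.5):
    """Predict class 1.0 when the estimated probability reaches the threshold."""
    weights, bias = model
    p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
    return 1.0 if p >= threshold else 0.0


model = train(TRAINING, num_features=2)
predictions = [classify(model, x) for _, x in TRAINING]
print(model, predictions)
```

Because Tip is 1 only when both Starter and Dessert are 1, the example data are linearly separable and the fitted model classifies all four records correctly. On separable data the unregularized weights keep growing, which is why the sketch stops after a fixed number of iterations. Saving the model to the FileSet is not shown.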

Implementation Tips


Design 

Logistic Regression Trainer

Input JSON Format

{
  "name": "LogisticRegressionTrainer",
  "type": "sparksink",
  "properties": {
    "fileSetName": "logical-regression-model",
    "path": "/home/cdap/model",
    "fieldsToClassify": "Starter,Dessert",
    "predictionField": "Tip",
    "numFeatures": "2",
    "numClasses": "2"
  }
}
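The following is a sketch of how the properties object above might be read and checked before training. The checks (numFeatures equals the number of entries in fieldsToClassify, numClasses is at least 2, and the prediction field is not also a feature) are assumptions for illustration, not documented plugin behavior; `TrainerConfig` and `parse_trainer_config` are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class TrainerConfig:
    """Hypothetical in-memory form of the trainer's "properties" object."""
    file_set_name: str
    path: str
    fields_to_classify: List[str]
    prediction_field: str
    num_features: int
    num_classes: int


def parse_trainer_config(raw: str) -> TrainerConfig:
    props = json.loads(raw)["properties"]
    fields = [f.strip() for f in props["fieldsToClassify"].split(",") if f.strip()]
    config = TrainerConfig(
        file_set_name=props["fileSetName"],
        path=props["path"],
        fields_to_classify=fields,
        prediction_field=props["predictionField"],
        # The example passes numeric properties as strings, so convert them.
        num_features=int(props["numFeatures"]),
        num_classes=int(props["numClasses"]),
    )
    # Assumed checks: one feature per listed field, at least two classes.
    if config.num_features != len(config.fields_to_classify):
        raise ValueError("numFeatures must equal the number of fieldsToClassify")
    if config.num_classes < 2:
        raise ValueError("numClasses must be at least 2")
    if config.prediction_field in config.fields_to_classify:
        raise ValueError("predictionField must not also be a feature")
    return config


EXAMPLE = """{
  "name": "LogisticRegressionTrainer",
  "type": "sparksink",
  "properties": {
    "fileSetName": "logical-regression-model",
    "path": "/home/cdap/model",
    "fieldsToClassify": "Starter,Dessert",
    "predictionField": "Tip",
    "numFeatures": "2",
    "numClasses": "2"
  }
}"""

config = parse_trainer_config(EXAMPLE)
print(config)
```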


Checklist

  •  User stories documented 
  •  User stories reviewed 
  •  Design documented 
  •  Design reviewed 
  •  Feature merged 
  •  Examples and guides 
  •  Integration tests 
  •  Documentation for feature 
  •  Short video demonstrating the feature