Neural Network Programming  (Apr. 2019) (Old site; new site is at https://scinet.courses)

4.1 Assignment 1

Due date: Thursday, May 2nd, 2019 at midnight

Consider the HEPMASS data set, a sample of which can be found in the CSV file here.  This sample data set consists of 29 columns and 200,000 rows.  The data represent simulated events in a high-energy physics experiment (particle collider).

The purpose of this assignment is to build a neural network which will predict whether a given event is a "signal" or "background".  To make this prediction, use as input the values from the columns whose names begin with "f" ("f0", "f1", "f2", etc.). Do not use the "mass" column. Use the "label" column as the target of the data (1 = "signal", 0 = "background").
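As an illustration of the column rules above, here is a minimal sketch of picking the input and target columns from a list of column names. The function name and the example header (a "label" column, "f0" through "f26", and "mass") are assumptions made for this sketch; check them against the actual CSV header.

```python
def select_columns(column_names):
    """Split column names into input features and the target.

    Inputs are the columns whose names begin with "f"; the "mass"
    column is left out and "label" is the target.
    """
    feature_columns = [name for name in column_names if name.startswith("f")]
    target_column = "label"
    return feature_columns, target_column


# Assumed header layout, for illustration only.
example_header = ["label"] + ["f%d" % i for i in range(27)] + ["mass"]
features, target = select_columns(example_header)
print(len(features), "input columns; target is", target)
```

With a pandas DataFrame, the same list can be built from `data.columns` and used to index the frame.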

Create a Python script, called "hep_nn.py", which performs the following steps:

  • reads in the HEP data set given in the link above (the 'pandas' package may be helpful here); you may assume that the CSV file is in the same directory as the script, and the file name may be hard-coded,
  • separates the input and output data from the data set (you may hard-code the columns for this assignment),
  • splits the input and output data into training and testing data sets,
  • builds a neural network, using Keras, to predict the label of the HEP events,
  • trains the network on the training data,
  • evaluates the network on the test data and prints out the result, and
  • creates a plot of the model's training loss as a function of epoch.
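The steps above can be sketched roughly as follows. This is a starting skeleton under stated assumptions, not a reference solution: the file names "hepmass_sample.csv" and "training_loss.png", the 80/20 split, the single 32-node hidden layer, and the epoch and batch-size values are all placeholders chosen for illustration.

```python
import random


def split_rows(n_rows, test_fraction=0.2, seed=0):
    """Return shuffled (train_indices, test_indices) covering every row once."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = int(n_rows * test_fraction)
    return indices[n_test:], indices[:n_test]


def main(csv_path="hepmass_sample.csv"):  # hypothetical file name
    # Heavy imports are kept inside main() so split_rows stays usable alone.
    import matplotlib
    matplotlib.use("Agg")  # write the plot to a file; no display needed
    import matplotlib.pyplot as plt
    import pandas as pd
    from keras.layers import Dense
    from keras.models import Sequential

    print("Reading HEP file.")
    data = pd.read_csv(csv_path)
    # Inputs are the "f" columns; "mass" is skipped, "label" is the target.
    feature_columns = [name for name in data.columns if name.startswith("f")]
    inputs = data[feature_columns].values
    labels = data["label"].values

    train_idx, test_idx = split_rows(len(data))
    x_train, y_train = inputs[train_idx], labels[train_idx]
    x_test, y_test = inputs[test_idx], labels[test_idx]

    print("Building network.")
    model = Sequential()
    model.add(Dense(32, activation="relu", input_shape=(len(feature_columns),)))
    model.add(Dense(1, activation="sigmoid"))  # one output: P(signal)
    model.compile(loss="binary_crossentropy", optimizer="adam",
                  metrics=["accuracy"])

    print("Training network.")
    history = model.fit(x_train, y_train, epochs=20, batch_size=256, verbose=0)

    score = model.evaluate(x_test, y_test, verbose=0)
    print("The test score is", score)

    # Training loss per epoch, as recorded by Keras during fit().
    plt.plot(history.history["loss"])
    plt.xlabel("Epoch")
    plt.ylabel("Training loss")
    plt.savefig("training_loss.png")  # hypothetical output name
```

Adding a call to `main()` at the bottom of the file runs the whole pipeline. Because the model is compiled with `metrics=["accuracy"]`, the list returned by `evaluate` holds the test loss followed by the test accuracy.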

Experiment with your script, varying the parameters in your model (number of hidden layers, number of nodes per layer, activation functions, presence/absence of regularization or dropout or batch normalization, cost function, optimization algorithm) to get the best model you can find.  You should run the training until the loss stops improving, as demonstrated by your plot. The best model I have found consistently returns a test result of about 82% accuracy.  See if you can do better.
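One way to make "the loss stops improving" concrete is a simple plateau check on the list of per-epoch losses. The sketch below is only an illustration; the function name and the `patience` and `min_delta` defaults are assumptions, not values prescribed by the assignment.

```python
def loss_has_plateaued(losses, patience=5, min_delta=1e-4):
    """Return True if the last `patience` epochs did not improve on the
    best earlier loss by at least `min_delta`."""
    if len(losses) <= patience:
        return False  # not enough history to judge yet
    best_before = min(losses[:-patience])
    best_recent = min(losses[-patience:])
    return best_recent > best_before - min_delta


still_improving = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5]
flattened_out = [1.0, 0.8, 0.6, 0.5, 0.5, 0.5, 0.5]
print(loss_has_plateaued(still_improving, patience=3))
print(loss_has_plateaued(flattened_out, patience=3))
```

Keras offers the same idea as a built-in callback, `keras.callbacks.EarlyStopping`, which takes `monitor`, `patience`, and `min_delta` arguments and can be passed to `fit` through its `callbacks` list.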

Your script will be tested from the Linux command line, thus:

$ python hep_nn.py
Using Theano backend.
Reading HEP file.
Building network.
Training network.
The test score is [0.4090987194061279, 0.8174]
$

The script will be graded on functionality, but also on form.  This means your script should use meaningful variable names and be well commented.


Submit your 'hep_nn.py' and the plot of your training loss to the 'Assignment Dropbox'.

Assignments will be graded on a 10 point basis.
The due date is May 2nd, 2019 (midnight).  Late submissions lose 0.5 points per day until the cut-off date of May 9th at 11:00 am.

Last Modified: Wednesday Apr 24, 2019 - 09:22. Revision: 21. Release Date: Wednesday Apr 24, 2019 - 17:00.

