Intro to Programming in Python for Biochemistry (Sept. 2020) (Old site; new site is at https://scinet.courses)

7.6 Structure and clustering

In this assignment, you'll be trying out the k-means and agglomerative clustering methods on protein structures. Please note that this is meant as an exercise and is not intended to give very meaningful results.

Your task is to write a script that

  • uses Biopython's Bio.PDB module to download the protein structure with PDB ID 1FV1.
  • extracts the 3D positions of all the atoms; this is the data that you should try to cluster.
  • performs both k-means and agglomerative clustering with the number of clusters set to 3, 4, 5, and 6.
  • produces plots of the results. The plots should be 2D projections of the 3D data: one scatter plot of the X and Y coordinates, with the colour of each point determined by its cluster number, a second of Y and Z, and a third of X and Z.
  • produces these three plots for each of the two clustering methods and each of the four numbers of clusters. Submit only the plots for the number of clusters that makes the most sense to you: 3 'best' plots for the k-means method and 3 'best' plots for the agglomerative clustering method.
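
The first three steps above could be sketched as follows. This is one possible starting point, not a reference solution. It assumes scikit-learn for the clustering (the assignment does not prescribe a library), and the function names, the `workdir` argument, and the fixed random seed are illustrative choices.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans


def fetch_atom_coords(pdb_id="1FV1", workdir="."):
    """Download a PDB entry and return an (n_atoms, 3) array of atom positions."""
    # Imported here so that cluster_coords below can be used without Biopython.
    from Bio.PDB import PDBList, PDBParser

    # retrieve_pdb_file downloads the entry (needs network access) and
    # returns the path of the local file.
    path = PDBList().retrieve_pdb_file(pdb_id, pdir=workdir, file_format="pdb")
    structure = PDBParser(QUIET=True).get_structure(pdb_id, path)
    return np.array([atom.get_coord() for atom in structure.get_atoms()])


def cluster_coords(coords, n_clusters_options=(3, 4, 5, 6), seed=0):
    """Return a dict mapping (method, k) to an array of cluster labels."""
    labels = {}
    for k in n_clusters_options:
        kmeans = KMeans(n_clusters=k, n_init=10, random_state=seed)
        labels["kmeans", k] = kmeans.fit_predict(coords)
        # Default linkage (Ward) is an assumption; other linkages are possible.
        agglomerative = AgglomerativeClustering(n_clusters=k)
        labels["agglomerative", k] = agglomerative.fit_predict(coords)
    return labels
```

Calling `cluster_coords(fetch_atom_coords())` would then give eight label arrays, one per method and number of clusters, each with one entry per atom.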

Your script may combine the three plots for the same case using subplots if you want.
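
A hedged sketch of the subplot option, assuming matplotlib and that `coords` is an (n_atoms, 3) array with `labels` holding one integer cluster number per atom. The function name `plot_projections`, the colour map, and the figure size are illustrative, not prescribed by the assignment.

```python
import matplotlib

matplotlib.use("Agg")  # write figures to files without needing a display

import matplotlib.pyplot as plt
import numpy as np


def plot_projections(coords, labels, title, filename):
    """Save one figure with the X-Y, Y-Z and X-Z projections side by side."""
    coords = np.asarray(coords)
    axis_names = "XYZ"
    axis_pairs = [(0, 1), (1, 2), (0, 2)]  # (horizontal, vertical) columns

    fig, axes = plt.subplots(1, 3, figsize=(15, 5))
    for ax, (i, j) in zip(axes, axis_pairs):
        # Colour each atom by its cluster number.
        ax.scatter(coords[:, i], coords[:, j], c=labels, cmap="tab10", s=4)
        ax.set_xlabel(axis_names[i])
        ax.set_ylabel(axis_names[j])
    fig.suptitle(title)
    fig.tight_layout()
    fig.savefig(filename, dpi=150)
    plt.close(fig)
    return fig
```

For example, `plot_projections(coords, labels, "k-means, 4 clusters", "kmeans_4.png")` would write a single image containing all three projections for that case.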

Submit your script and the six plots (or two figures with 3 subplots each) by October 23, 2020 at 23:55.

Last Modified: Friday Oct 16, 2020 - 17:49. Revision: 2. Release Date: Friday Oct 16, 2020 - 17:00.
