    Unitgrade-devel

    Note: This is the development version of unitgrade. If you are a student, please see http://gitlab.compute.dtu.dk/tuhe/unitgrade.

    Unitgrade is an automatic report and exam evaluation framework that enables instructors to offer automatically evaluated programming assignments. Unitgrade is built on Python's unittest framework, so tests can be specified in a familiar syntax and will integrate with any modern IDE. What it offers beyond unittest is the ability to collect tests in reports (for automatic evaluation) and an easy and 100% safe mechanism for verifying the students' results and creating additional, hidden tests. A powerful cache system allows instructors to automatically create test answers based on a working solution.

    • 100% Python unittest compatible
    • No configuration files
    • No limitations: If you can unittest it, it works
    • Tests are quick to run and will integrate with your IDE
    • Cache and hint system make tests easy to develop
    • Granular security model:
      • Students get public unittests for easy development of solutions
      • Students use a tamper-resistant file to create submissions which are uploaded
      • Instructors can automatically verify the students' solutions using a Docker VM and run hidden tests
    • Automatic Moss anti-plagiarism detection
    • CMU Autolab integration (Experimental)

    Install

    Simply use pip

    pip install unitgrade-devel

    This will install unitgrade-devel (this package) and all dependencies to get you started.

    Videos

    For videos, see the /videos directory (https://gitlab.compute.dtu.dk/tuhe/unitgrade_private/-/tree/master/videos) or watch them here:

    • First test: https://youtu.be/jC9AzZA5FcQ
    • Framework and hints: https://youtu.be/xyY9Qan1b1Q
    • MOSS plagiarism check: https://youtu.be/Cp4PvOnYozo
    • Hidden tests and Docker: https://youtu.be/vP6ZqeDwC5U
    • Jupyter notebooks: https://youtu.be/B6nzVuFTEsA
    • Autolab: https://youtu.be/h5mqR8iNMwM

    Instructions and examples of use

    The examples can be found in the /examples directory: https://gitlab.compute.dtu.dk/tuhe/unitgrade_private/-/tree/master/examples

    A simple example

    Unitgrade makes the following assumptions:

    • Your code is in Python
    • Whatever you want to do can be specified as a unittest

    Although not required, it is recommended you maintain two versions of the code:

    • A fully-working version (i.e. all tests pass)
    • A public version distributed to students (some code removed)

    I use codesnipper (see http://gitlab.compute.dtu.dk/tuhe/snipper) to synchronize the two versions automatically.
    Let's look at an example. Suppose our course is called cs101, in which case we create three files in our private instructor folder:

    instructor/cs101/homework.py # This contains the students homework
    instructor/cs101/report1.py  # This contains the tests
    instructor/cs101/deploy.py   # A private file to deploy the tests

    The homework

    The homework is just ordinary Python code you would hand out to the students. For instance:

    {{ homework1_py }}

    The test:

    The test consists of individual problems and a report-class. The tests themselves are just regular unittest test cases (we will see a slightly smarter approach in a moment). For instance:

    {{report1_py}}

    A number of tests can be collected into a Report, which will allow us to assign points to the tests and use the more advanced features of the framework later. A complete, minimal example:

    {{report1_all_py}}

    Deployment

    The above is all you need if you simply want to use the framework as a self-check: Students can run the code and see how well they did. In order to begin using the framework for evaluation we need to create a bit more structure. We do that by deploying the report class as follows:

    {{deploy_py}}
    • The first line creates the report1_grade.py script and any additional data files needed by the tests (none in this case)
    • The second line sets up the student directory (remember, we have included the solutions!) and removes the students' solutions. You can check the result in the students folder.

    If you are curious, the grade script looks like this:

    '''WARNING: Modifying, decompiling or otherwise tampering with this script, it's data or the resulting .token file will be investigated as a cheating attempt.'''
    import bz2, base64
    exec(bz2.decompress(base64.b64decode('QlpoOTFBWSZTWY/Cr/0ANxB/gHb2RFR//////+//vv////5gQb3d9962+(etc. etc.)')))

    Using the framework as a student

    After you run the deploy-script, the student directory will contain three files:

    students/cs101/homework1.py      # Homework files without solutions (see for yourself)
    students/cs101/report1.py        # Identical to the instructor-report
    students/cs101/report1_grade.py  # Grade-script which runs the tests in report1.py and generates the .token-file. 

    You can now upload the student directory to the students. The students can run their tests either by running cs101/report1.py in their IDE or by typing:

    python -m cs101.report1

    in the command line. This produces a detailed output of the tests, and the program is fully compatible with a debugger. When the students are happy with their output, they can run (using the command line or IDE):

    python -m cs101.report1_grade

    This runs an identical set of tests and produces the file Report1_handin_10_of_10.token, which the students can upload to get credit.

    • The report1_grade.py script includes all tests and the main parts of the framework and is obfuscated by default. You can apply a much stronger level of protection by using e.g. pyarmor.
    • The .token file includes the outcome of the tests, the time taken, and all Python source code in the package. In other words, the file can be used for manual grading, for plagiarism detection, and for detecting tampering.
    • You can easily use the framework to include the output of functions in the .token file.
    • See below for how to validate the students' results.

    How safe is Unitgrade?

    There are three principal ways of cheating:

    • Break the framework and submit a .token file that 'lies' about the true number of points
    • 'Overfit' the tests by checking for specific inputs and hard-coding the output
    • Plagiarism

    The degree to which these problems need to be mitigated depends on the course, but they are easy to address. To counter the three ways of cheating, I recommend the following:

    • Automatically re-run the students' tests on your computer using Docker (see below) to automatically detect differences between the (claimed) outcome and the (actual) outcome
    • Include a few hidden tests. If the student's tests pass but hidden tests with minor input-argument variations fail, something is probably up
    • Use the built-in Moss plagiarism integration to get a detailed plagiarism report (see below)

    I think the most important things to keep in mind are the following:

    • The _grade.py-script is self-contained (i.e. contains an independent copy of all tests)
    • The _grade.py-script and the .token file are not in an easily editable format.
    • The .token file will contain a copy of all the students' source code, as well as any intermediate outputs returned by tests

    This means that if a student begins to tamper with the framework, all the evidence of the tampering will be readily available, and any inconsistencies will be very difficult to explain away. Therefore, unlike with a report submitted as a .pdf file, a student cannot afterwards claim to have mistaken the Downloads folder for the Desktop and accidentally uploaded a friend's version of some of the code.

    If this is not enough, you can consider using pyarmor on the _grade.py script to create a very difficult challenge for a prospective hacker.

    Example 2: The framework

    One of the main advantages of unitgrade over web-based autograders is that tests are really easy to develop and maintain. To take advantage of this, we simply change the class the questions inherit from to UTestCase (this is still a unittest.TestCase) and we can make use of the cache system. As an example:

    {{ report2_py }}

    Note we have changed the assertion to self.assertEqualC (the C is for cache) and dropped the expected result. Unitgrade will evaluate the test on the working version of the code, compute the result, and make it available to the student. All of this happens in the deploy.py script from before.
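    To make this concrete, here is a minimal sketch of a cached test. It assumes UTestCase can be imported directly from unitgrade, and reverse_list stands in for a homework function; the rendered example above is the authoritative version.

    from unitgrade import UTestCase

    def reverse_list(ls):
        # Stand-in for a function from the homework module; in a real report you
        # would import it from e.g. cs101.homework1.
        return list(reversed(ls))

    class Week1(UTestCase):
        def test_reverse(self):
            # No expected value is written here: assertEqualC compares against a
            # result computed from the working solution and cached at deploy time.
            self.assertEqualC(reverse_list([1, 2, 3]))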

    Nicer titles

    Titles can be set either using Python docstrings or programmatically. An example:

    {{ report2_b_py }}

    When this is run, the titles are shown as follows:

    {{ deploy_txt }}

    What happens behind the scenes when we set self.title is that the result is pre-computed on the instructor's machine and cached. This means the last test will display the correct result regardless of how reverse_list has been implemented by the student. The titles are also shown correctly when the method is run as a unittest.
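    As a rough sketch of the programmatic form (again assuming the unitgrade import and using a stand-in reverse_list; the docstring/self.title behaviour is as described above):

    from unitgrade import UTestCase

    def reverse_list(ls):
        return list(reversed(ls))   # stand-in for the homework function

    class Week1Titles(UTestCase):
        def test_reverse(self):
            """ Checking that lists are reversed correctly """   # the docstring sets the default title
            ls = [1, 2, 3]
            # self.title is evaluated and cached on the instructor's machine, so the student
            # sees the correct reversed list even if their own implementation is wrong.
            self.title = f"Reversing {ls} gives {reverse_list(ls)}"
            self.assertEqualC(reverse_list(ls))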

    Caching computations

    The @cache-decorator offers a direct way to compute the correct result on the instructor's computer and make it available to the student. For instance:

    {{ report2_c_py }}

    The @cache decorator will make sure the output of the function is pre-computed when the test is set up, and that the function will simply return the correct result regardless of the function body. This is very helpful in a few situations:

    • If you have exercises that depend on each other, and you want students to have access to the expected result of earlier methods which they may not have implemented correctly.
    • If you want to use functions the students write to set up appropriate tests without giving away the solution.
    • To simply print out the correct result so it is apparent to the student

    Finally, notice how one of the tests has a return value. This will be automatically saved in the .token file (this is useful for open-ended questions or for security).
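    The sketch below illustrates both ideas: a @cache-decorated helper that hands the instructor-computed value to the student, and a test whose return value is stored in the .token file. The import of cache from unitgrade and the find_primes helper are assumptions for illustration; see the rendered example above for the real code.

    from unitgrade import UTestCase, cache

    def find_primes(n):
        # Stand-in for an earlier exercise the student may not have solved yet.
        return [p for p in range(2, n + 1) if all(p % d for d in range(2, p))]

    class Week1Cached(UTestCase):
        @cache
        def reference_primes(self, n):
            # Evaluated on the instructor's machine at deploy time; students receive
            # the cached return value even if their own find_primes is broken.
            return find_primes(n)

        def test_prime_count(self):
            result = len(self.reference_primes(100))
            self.assertEqualC(result)
            return result   # the returned value is stored in the .token file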

    Example 3: Hidden and secure tests

    To use unitgrade as part of automatic grading, it is recommended that you check the students' output locally and use hidden tests. Fortunately, this is very easy.

    Let's start with the hidden tests. As usual we write a complete report script (report3_complete.py), but this time we use the @hide-decorator to mark tests as hidden:

    {{ report3_complete_py }}

    For simplicity, the non-hidden test will always pass and the hidden test will always fail: this makes it easy to interpret the results in the following.
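    For reference, the pattern looks roughly like this (a minimal sketch; the assumption is that hide can be imported from unitgrade, and the assertions are deliberately trivial to mirror the always-pass/always-fail setup described above):

    from unitgrade import UTestCase, hide

    class Week3Hidden(UTestCase):
        def test_public(self):
            # Included in the student hand-out; passes by construction.
            self.assertEqual(2 + 2, 4)

        @hide
        def test_hidden(self):
            # Stripped from the student version of the report; only the instructor's
            # report3_complete_grade.py runs it, and it fails by construction.
            self.assertEqual(2 + 2, 5)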

    Next we need to create the student report and grade scripts. This can be done as follows:

    {{ deploy_docker_a_py }}

    Just to check, let's have a quick look at the student report script report3.py:

    {{ report3_py }}

    The grade script works as normal, and just to make the example self-contained, let's generate the student's .token file as follows:

    {{ deploy_docker_b_py }}

    Setting up and using Docker

    We are going to run the students' tests in a Docker virtual machine so that we avoid any underhanded stuff, and also because it ensures we get the same result every time (i.e., we can pass the task on to TAs). To do that, you first have to install Docker (easy) and then build a Docker image. We are going to use one of the pre-baked images from https://gitlab.compute.dtu.dk/tuhe/unitgrade_private/-/tree/master/docker_images, which simply consists of a lean Linux distribution with Python 3.8 and whatever packages are found in requirements.txt. If you need more, it is very easy to add. To download and build the Docker image, simply run:

    {{ deploy_docker_c_py }}

    This takes about 2 minutes but only needs to be done once. If you are keeping track, we now have the following:

    • A grade script with all tests, report3_complete_grade.py, which we built when the report was deployed
    • A (student) .token file we simulated, but in general would have downloaded from DTU Learn
    • A Docker image with the right packages

    Next we feed this into unitgrade:

    {{ deploy_docker_d_py }}

    Behind the scenes, this code does the following:

    • Load the Docker image
    • Create a temporary directory where the files from the student's .token file will be placed
    • Put the report3_complete_grade.py script at the right location in the source tree (unitgrade can guess this)
    • Run report3_complete_grade.py and collect the resulting token file

    Just to show that it works, we will load both .token files and print the results:

    {{ deploy_docker_e_py }}

    The results (shown in a (points_obtained, possible_points) format) will be printed as:

    {{example_docker_instructor_cs103_docker_results_txt }}

    As expected, the (failed) hidden tests reduce the total points obtained. Discrepancies are also easy to spot, for instance by naming hidden tests after the test they shadow (e.g. test_something_hidden next to test_something): if the regular test passes but the hidden variant fails, the solution has probably been overfit to the public test.

    Moss plagiarism detection

    You can easily apply Moss to the students' token files. First, get Moss from https://theory.stanford.edu/~aiken/moss/ and create two directories:

    whitelist/   # Whitelisted files. Code from these files is part of the handout to students
    submissions/ # Where you dump student submissions.

    The whitelist directory is optional, and the submissions directory contains student submissions (one folder per student):

    /submissions/<student-id-1>/Report1_74_of_144.token
    /submissions/<student-id-2>/Report1_130_of_144.token
    ...

    The files in the whitelist/ directory and in the per-student submission directories can be either .token files (which are unpacked) or Python files, and they may contain subdirectories: everything will be unpacked and flattened. The simplest way to set it up is to download all submissions from DTU Learn as a zip file and unzip it somewhere. When done, just call Moss as follows:

    {{ moss_example_py }}

    This will generate a report. You can see the example including the report here: https://lab.compute.dtu.dk/tuhe/unitgrade_private/-/tree/master/examples/example_moss

    Smart hinting

    To help students get started, unitgrade will collect hints for solving failed tests from across the codebase and display them. Consider the following homework, in which two problems depend on each other and the instructor has given a couple of hints (example taken from example_hints):

    {{ homework1_hints_py }}

    The report file also contains a single hint:

    {{ report1hints_py }}

    When students run this homework, it will fail and display the hints from the two methods:

    [screenshot: hint output shown to the student]

    What happens behind the scenes is that a code-coverage tool is run on the instructor's computer to determine which methods are actually used in solving a problem, and then the hint texts of those methods are collected and displayed. This feature requires no external configuration; simply write Hints: in the source code.
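    As a rough illustration of the convention (the exact hint formatting is an assumption; the rendered homework example above is authoritative), a hint is simply written in the docstring of a function the student must implement:

    def find_primes(n):
        """
        Return a list of all primes up to (and including) n.

        Hints:
            * Remember that 1 is not a prime number.
            * You can reuse your is_prime function from the previous exercise.
        """
        # Implementation removed from the student hand-out.
        raise NotImplementedError()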

    CMU Autolab support (Experimental)

    CMU Autolab is a mature, free and open-source web-based autograder developed at Carnegie Mellon University and used across the world. You can find more information here: https://autolabproject.com/. It offers all the features you expect from an online autograder:

    • Web-based submission of homework
    • Class-management
    • Built-in TA feedback mechanism
    • Class monitoring/statistics
    • Automatic integration with enrollment data (Autolab supports LDAP and Shibboleth), which means Autolab can be plugged into existing IT infrastructure (including DTU's)
    • CLI Tools

    An important design choice behind CMU Autolab is that grading is entirely based on Makefiles and Docker VMs: if you can make your autograding scheme work as a Makefile that runs code on a Docker image you specify, it will work on Autolab. This makes it very easy for third-party platforms to work with an unmodified version of Autolab. The following contains all the steps needed to compile a Unitgrade test into an Autolab assignment.

    Step 1: Set up Autolab

    Simply follow the guide here: https://docs.autolabproject.com/installation/overview/ to set up Autolab. I used the 'manual' installation, but it should also work with the Docker-compose installation.

    Step 2: Compile a unitgrade test to Autolab lab-assignment format

    Autolab calls hand-ins lab assignments and allows you to import them as .tar files (see the Autolab documentation for more information). We can build these automatically in a few lines, as this example demonstrates. The code for the example can be found in examples/autolab_example. The process consists of two steps. First, you need to build the Docker image for Autolab/Tango used for grading. This is exactly like our earlier example using Docker for Unitgrade, except the image contains a few additional Autolab-specific things. You can find the image here:

    Concretely, the following code will download and build the image (note this code must be run on the same machine that you have installed Autolab on)

    {{ deploy_autolab_a_py }}

    Next, simply call the framework to compile any _grade.py file into an Autolab-compatible .tar file that can be imported from the web interface. The script requires you to specify both the instructor directory and the directory with the files that have been handed out to the students (i.e., the same file-system layout we have seen earlier).

    {{ deploy_autolab_b_py }}

    This will produce a file cs102.tar. Whereas you needed to build the Docker image on the machine where you are running Autolab, you can build the lab assignments on any computer.

    Step 3: Upload the .tar lab-assignment file

    To install the cs102.tar file, simply open your course in Autolab and click the INSTALL ASSESSMENT button. Click Browse and upload the cs102.tar file:

    [screenshot: uploading the cs102.tar file]

    You will immediately see the page for the assignment where you can begin to upload solutions! The solutions are (of course!) .token files, and they will be automatically unpacked and run on Autolab.

    To test it, press the big upload square and select the .token file for the second assignment, found in examples/example_framework/instructor/cs102/Report2_handin_18_of_18.token. The file will now be automatically evaluated and the score registered like any other Autolab assignment:

    [screenshot: evaluated submission and registered score]

    The students can choose to view either the console output or a nicely formatted overview of the individual problems:

    [screenshot: formatted overview of the individual problems]

    and TAs can annotate the students' code directly in Autolab -- here we are making use of the fact that the code is automatically included at the top of the .token file:

    [screenshot: TA annotation of student code]

    Citing

    {{bibtex}}