Project 2: Software Vulnerabilities
===================================

.. toctree::
   :hidden:

.. |ss| raw:: html

   <strike>

.. |se| raw:: html

   </strike>

- **Due:**  |ss| 04/06/2025 |se| 04/13/2025 (Sun) 11:59pm

Introduction
------------

In this project,
you will perform a series of software vulnerability exploits.
You will explore unsafe and insecure programming techniques
and evaluate the efficacy of operating system defenses against them.

Getting started
~~~~~~~~~~~~~~~

**If you have not finished setting up your project environment,
please follow the instructions in**
:doc:`Project Setup <tools>`
**to set it up first.**

We expect you to work on this project remotely on the project server
through SSH.

.. code-block:: sh

    [host] $ ssh NetID@csa-chk22b.utdallas.edu

Once you are successfully logged in,
fetch the source code of Project 2 on the project server.
To do that,
use Git to commit changes you've made since handing in Project 1 (if any),
fetch the latest version of the course repository, and then create a
local branch called ``project2`` based on
our project2 branch ``origin/project2``:

.. code-block:: sh

    $ cd ~/infosec
    $ git pull
    Already up-to-date.
    $ git add -A
    $ git commit -am 'changes to project1 after handin'
    Created commit 734fab7: changes to project1 after handin
         3 files changed, 28 insertions(+), 7 deletions(-)
    $ git push
    Enumerating objects: 5, done.
    Counting objects: 100% (5/5), done.
    Delta compression using up to 4 threads
    Compressing objects: 100% (3/3), done.
    Writing objects: 100% (3/3), 308 bytes | 308.00 KiB/s, done.
    Total 3 (delta 2), reused 0 (delta 0), pack-reused 0
    remote:
    To ssh://s3lab.utdallas.edu:2224/cxk200010/infosec.git
       88682b1..494dc56  project1 -> project1
    $ git checkout -b project2 origin/project2
    Branch project2 set up to track remote branch refs/remotes/origin/project2.
    Switched to a new branch "project2"
    $ git clean -dfx
    $

The ``git checkout -b`` command shown above actually does two things: it
first creates a local branch ``project2`` that is based on the
``origin/project2`` branch provided by the course staff, and second, it
changes the contents of your project directory to reflect the files
stored on the ``project2`` branch.
Git allows switching between existing
branches using ``git checkout branch-name`` (change ``branch-name`` to the branch name you want to switch to), though you should commit any
outstanding changes on one branch before switching to a different one.
The ``git clean -dfx`` command deletes all untracked files left over from
the ``project1`` branch (e.g., ``*.class`` files), which you do not need
for this project.

You will now need to bring your ``student.info`` file
from the ``project1`` branch
into the ``project2`` branch, as follows:

.. code-block:: sh

    $ git checkout project1 student.info
    $ git commit -am "Bring student.info from project1"


Updates for x64
~~~~~~~~~~~~~~~

.. note::

    The environment where you will exploit software vulnerabilities
    is based on the Intel x64 architecture (64-bit).

Your ``project2`` branch needs additional updates to
make it ready for the x64 environment.
To fetch the updates, run the following commands:

.. code-block:: sh

    $ cd ~/infosec
    $ git remote add infosec ssh://git@s3lab.utdallas.edu:2224/instructor/infosec.git
    $ git fetch infosec
    remote: Enumerating objects: 8, done.
    remote: Counting objects: 100% (8/8), done.
    remote: Compressing objects: 100% (8/8), done.
    remote: Total 8 (delta 0), reused 0 (delta 0), pack-reused 0
    Unpacking objects: 100% (8/8), 2.30 KiB | 785.00 KiB/s, done.
    From ssh://s3lab.utdallas.edu:2224/instructor/infosec
     * [new branch]      project1   -> infosec/project1
     * [new branch]      project2   -> infosec/project2
     * [new branch]      project3   -> infosec/project3
    $ git checkout remotes/infosec/project2 targets
    Updated 5 paths from 904b0eb
    $ git commit -am "Updates for x64"
    [project2 c23b800] Updates for x64
     5 files changed, 160 insertions(+), 138 deletions(-)
     rewrite targets/target1.c (77%)
     rewrite targets/target2.c (75%)
     rewrite targets/target3.c (72%)
     rewrite targets/target4.c (85%)
    $ git push
    Enumerating objects: 15, done.
    Counting objects: 100% (15/15), done.
    Delta compression using up to 4 threads
    Compressing objects: 100% (8/8), done.
    Writing objects: 100% (8/8), 1.59 KiB | 1.59 MiB/s, done.
    Total 8 (delta 4), reused 0 (delta 0)
    remote:
    remote: To create a merge request for project2, visit:
    remote:   http://s3lab.utdallas.edu/cxk200010/infosec/-/merge_requests/new?merge_request%5Bsource_branch%5D=project2
    remote:
    To ssh://s3lab.utdallas.edu:2224/cxk200010/infosec.git
       4708d7c..c23b800  project2 -> project2
    $ git remote rm infosec

You can confirm whether the updates were successful by comparing the contents
of the files in the ``targets`` directory (e.g., ``targets/target2.c``)
with those in the instructor's
repository `here <http://s3lab.utdallas.edu/instructor/infosec/-/tree/project2/targets>`__.
If the file contents differ,
follow the instructions again to update the files correctly.


Target programs
~~~~~~~~~~~~~~~

Project 2 includes two directories: ``targets`` and ``exploits``.
In the ``targets`` directory,
you will find the source files of target programs
(``target1.c``, ``target2.c``, ``target3.c``, and ``target4.c``).
These programs are written in C and contain hidden
**software vulnerabilities** that you need to identify and exploit.
**Please do not modify the source code of the target programs.**
The source code of your exploits
needs to be placed in the ``exploits`` directory.

You need to access a virtual machine (VM) named ``attackme``,
which is set up to run the targets and your exploits.
This VM has the tools (e.g., ``gcc`` and ``gdb``) that
we expect you to use to examine the target programs and write
the exploits.
**We do not provide full-featured hacking and code analysis
tools, such as Ghidra and IDA, in this VM.**
Relying on such automated tools would keep you from developing
the knowledge and techniques needed to
identify and exploit software vulnerabilities,
which defeats the main goal of this project.
If you need a legitimate tool that is not installed in the VM,
you can ask the CS 6324 staff to install it,
subject to the instructor's approval.

You can access the VM through SSH from inside the project server:

.. code-block:: sh

    $ ssh attackme
    [attackme] $

The VM shares your home directory (``~/``) with the project server.
Any changes in your home directory, including project files
in the ``infosec`` directory, will be synchronized between the VM
and the project server automatically.

To compile the source code of the target programs,
run the following commands.

.. code-block:: sh

    [attackme] $ cd ~/infosec/targets
    [attackme] $ make
    gcc -ggdb -Wall    target1.c   -o target1
    gcc -ggdb -Wall    target2.c   -o target2
    gcc -ggdb -Wall    target3.c   -o target3
    gcc -ggdb -Wall    target4.c   -o target4
    target4.c: In function 'main':
    target4.c:12: warning: unused variable 'a'
    target4.c:12: warning: unused variable 'b'
    target4.c:12: warning: unused variable 'c'
    target4.c:12: warning: unused variable 'd'


Hand-in procedure
~~~~~~~~~~~~~~~~~

You will turn in your project by pushing your progress
to the repository and tagging the final version of the project.

When you are ready to hand in your project code and report,
please place the report in a file called ``report-project2.pdf``
in the top level of your directory before handing in your work.
After that,
add your report to the Git repository with
``git add report-project2.pdf`` and ``git commit``.

If you have obtained help of any kind while working on this project,
make sure to write the names or URLs of your sources in
``references-project2.txt`` in the top level of your directory,
and add it to the repository with ``git add references-project2.txt``
and ``git commit``.

You need to update ``Makefile`` in the ``exploits`` directory.
This ``Makefile`` will be used by the CS 6324 staff to
run and grade your exploits.
Your exploits can be written in any programming/scripting
language that the VM supports
(e.g., Bash, C, Perl, and Python).

For example, you can create a shell script for your ``exploit1`` and
add the following command into ``Makefile`` to run it:

.. code-block:: sh

    exploit1:
         @echo "# exploit1: Add your commands to run the exploit below"
         /bin/bash exploit1.sh

Or, you can add the following commands into ``Makefile``
for us to run your ``exploit1`` written in C:

.. code-block:: sh

    exploit1:
         @echo "# exploit1: Add your commands to run the exploit below"
         gcc exploit1.c -o exploit1
         ./exploit1

When the CS 6324 staff runs the following command in the VM,
your ``Makefile`` should allow running your ``exploit1``:

.. code-block:: sh

    $ make exploit1

**If you do not follow these rules, you will not receive any credit.**

.. note::

    **Tip**
    If you write an exploit in C,
    you should use a function like ``execve`` to launch the
    target, not a function like ``system``.
    When using ``execve``, pass in ``NULL`` for the environment variables
    so that the target's stack layout is consistent and repeatable
    from run to run.
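
If you write the exploit in Python instead,
the same idea can be sketched with ``os.execve`` and an empty environment.
This is only a minimal sketch:
the helper names ``build_exec_args`` and ``launch`` are hypothetical,
and the target path and attack string you pass in are up to you.

.. code-block:: python

    import os


    def build_exec_args(target_path, attack_string):
        """Return (path, argv, env) suitable for os.execve."""
        # argv[0] is conventionally the program path;
        # argv[1] is the attack string, passed as raw bytes.
        argv = [os.fsencode(target_path), attack_string]
        # An empty mapping gives the target an empty environment,
        # which mirrors passing NULL for the environment in C.
        env = {}
        return target_path, argv, env


    def launch(target_path, attack_string):
        """Replace the current process with the target program."""
        path, argv, env = build_exec_args(target_path, attack_string)
        # os.execve does not return if it succeeds.
        os.execve(path, argv, env)

Calling ``launch`` with the path to a target and your attack string
(as ``bytes``) replaces the Python process with the target,
so any code placed after the call never runs.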

**Make sure your exploits run successfully in the VM**
since that is where we will grade them.
Run the following commands to check if your exploits run
successfully in the ``attackme`` VM.

.. code-block:: sh

    [attackme] $ cd ~/infosec/exploits
    [attackme] $ make

After checking that your exploits run successfully in the VM,
add the new exploit files and any other new files you have created
to the Git repository,
commit, and push. For example,

.. code-block:: sh

    [attackme] $ git add exploit1.sh exploit2.c
    [attackme] $ git commit -am 'project2 complete'
    [attackme] $ git push

Tag your final commit as ``project2-final`` and
push the tag to the repository to submit your progress.
**We only grade the commit with this tag.**

.. code-block:: sh

    [attackme] $ git tag project2-final
    [attackme] $ git push
    [attackme] $ git push origin --tags

    # if you want to change the final tag,
    [attackme] $ git tag -d project2-final # this will delete the local tag
    Deleted tag 'project2-final' (was 75411c7)
    [attackme] $ git push origin :refs/tags/project2-final # this will delete the remote tag
    To ssh://s3lab.utdallas.edu:2224/cxk200010/infosec.git
    - [deleted]         project2-final


Simple Command Line Buffer Overflow (20 pts)
--------------------------------------------

``target1`` is a program that takes a directory as input,
and tells the user how to use the command
``ls`` to list the contents of the directory.
You will log in as a normal user, and your goal is to pass
an argument to the program so it will start a shell
by exploiting **a buffer overflow vulnerability**.

If this program were setuid root,
it would be possible to start a "root" shell.
Although the target program is not actually
setuid or owned by root
(to avoid potential security issues),
please assume that it is setuid root.

.. note::

    You are required to exploit a buffer overflow vulnerability
    to start a shell.
    For example, passing "/bin/bash" as a command line argument
    to the target program to open a shell
    (without exploiting the vulnerability)
    will not earn you any credit.

- In the ``exploits`` directory,
  write an exploit program (e.g., ``exploit1.c`` or ``exploit1.sh``)
  that passes an attack string to the
  target and performs the attack.
  Update ``Makefile`` to (compile and) run your exploit.
  ``make exploit1`` should run your exploit successfully.
- Identify the exact vulnerability in the program that you exploited
  (i.e. function name and line number).
  Explain why it is a vulnerability.
- Explain your attack strategy.
  That is, explain how you determined the correct input to pass
  and what commands are executed.


Buffer Overflow to Rewrite a Return (20 pts)
--------------------------------------------

The attack
~~~~~~~~~~

``target2`` is a program that takes a customer's name as the input
and prints a coupon.
Assume that each customer can execute the program only once
and can therefore get only one coupon.
Your goal is to pass some argument to the program so it will
repeatedly print coupons.
In other words, the argument will make the program execute the
function ``coupon`` repeatedly.

.. note::

    To get full credit, the function ``coupon`` has to execute
    an unlimited number of times.
    If it executes only twice, you will get half the points.

- In the ``exploits`` directory, write an exploit program
  (e.g., ``exploit2.c`` or ``exploit2.sh``) that passes
  the attack string to the target and performs the attack.
  Update ``Makefile`` to (compile and) run your exploit.
  ``make exploit2`` should run your exploit successfully.
- Identify the specific bug/vulnerability that made your attack possible
  (i.e. function name and line number).
  Explain why it is a vulnerability.
- Describe your attack strategy.
  That is, describe the memory addresses involved in your attack,
  and explain how the attack made the program print
  an unlimited number of coupons.

The defense
~~~~~~~~~~~

The project server that hosts the ``attackme`` VM
has an updated operating system with some stack defenses activated.

- Recompile ``target2`` outside the VM.
- Repeat the attack on ``target2`` outside the VM.
  Did the attack work? Comment on your results (i.e. explain why).
- Propose **two** different operating system and/or
  compiler/programming language defenses that can be used to
  prevent this attack from working.
  Discuss the advantages, disadvantages, and
  feasibility of the proposed defenses.


Return to LibC (20 pts)
-----------------------

The attack
~~~~~~~~~~

``target3`` is a program that scans several network packets and
checks if the traffic (concatenation of the packets) matches
any virus signatures.
Suppose ``target3`` is setuid root.
You will log in as a normal user, and the goal is to pass
argument(s) to the program to start a root shell
by mounting **a return-to-libc attack**.

You need to assume that the stack is not executable.
Therefore, you cannot redirect the return address to shellcode
on the stack.

- Draw the layout of the stack frame corresponding to the function
  ``is_virus`` directly after the local variables are initialized.
  For each element on the stack, provide its size.
- In the ``exploits`` directory,
  write an exploit program (e.g., ``exploit3.c`` or ``exploit3.sh``)
  that performs the attack.
  Update ``Makefile`` to (compile and) run your exploit.
  ``make exploit3`` should run your exploit successfully.
- Identify the specific bug in the program and vulnerability
  that made your attack possible
  (i.e. function name and line number).
  Explain why it is a vulnerability.
- Describe your attack strategy.
  That is, explain what memory addresses you used and
  how you figured out those addresses.

The defense
~~~~~~~~~~~

Try repeating the above attack on the project server outside the VM.
The attack should become more difficult now.

- Are you able to get the attack to work?
  If so, explain your method.
  Otherwise, explain what prevented you from completing the attack.
- What specific mechanism(s) make the attack more difficult?


Format String Attacks (40 pts)
------------------------------

``target4`` has **a format-string vulnerability**;
your task is to develop
a scheme to exploit this vulnerability.

The target program asks the user to provide an input,
which will be saved in a buffer called ``user_input``.
The program then prints out the buffer using ``printf``.
Unfortunately, there is a format-string vulnerability in the way
``printf`` is called on the user input.
We want to exploit this vulnerability and
see how much damage we can do.

The program has **two secret values** stored in its memory,
and you are interested in these values.
However, the secret values are unknown to you,
and you cannot find them by reading the binary code.

.. note::

    For the sake of simplicity, we hardcode the secrets
    using constants 0x44 and 0x55, but you can pretend that
    you don't have the source code or the secrets.

Although you do not know the secret values,
in practice, it is not so difficult to find out
their memory addresses
(the range or the exact value),
because in many operating systems,
memory addresses remain unchanged every time you run the program.
The two values are stored at consecutive memory addresses.

- Draw the layout of the stack frame corresponding to the main function
  directly after the local variables are initialized.
  For each element on the stack, provide its size.
- Provide the specific inputs (i.e. both the integer and the string)
  that you need in order to crash the program.
  Write the inputs in ``exploit4.txt`` **at line 1** in the form of
  ``int string``.
  That is, **the integer number**, followed by **a space**,
  followed by **the string**.
  Explain why the program crashes with your inputs.
- Provide the specific inputs (i.e. both the integer and the string)
  that you need in order to print the **address** of the variable
  ``secret[0]``.
  Write the inputs in ``exploit4.txt`` **at line 2** in the form of
  ``int string``.
  Explain why you think this is the correct address.

.. note::

  **Tip**
  You can use GDB to verify that your answer is correct.

- Provide the specific inputs (i.e. both the integer and the string)
  that you need in order to print the **value** of ``secret[0]``.
  Write the inputs in ``exploit4.txt`` **at line 3** in the form of
  ``int string``.
  Explain your strategy.
- Based on your knowledge of how arrays are stored on the heap,
  calculate the **address** of ``secret[1]``.
  Write the inputs in ``exploit4.txt`` **at line 4** in the form of
  ``int string``.
  Explain your strategy.
- Provide the specific inputs (i.e. both the integer and the string)
  that you need in order to print the **value** of ``secret[1]``.
  Write the inputs in ``exploit4.txt`` **at line 5** in the form of
  ``int string``.
  Explain your strategy.
- Provide the specific inputs (i.e. both the integer and the string)
  that you need in order to modify the values of **both**
  ``secret[0]`` and ``secret[1]``.
  Write the inputs in ``exploit4.txt`` **at line 6** in the form of
  ``int string``.
  Explain your strategy.
- Does Address Space Layout Randomization (ASLR) make this attack more
  difficult? Explain.
- What other operating system defenses can be used to
  prevent this attack? Explain.

If ``exploit4.txt`` does not follow the format rules,
you will not receive any credit.
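
Before you submit, you may want to sanity-check the layout of
``exploit4.txt`` yourself.
The following is a minimal sketch of such a check, not the grading script:
the function names are hypothetical,
and it assumes the integer is written in decimal
and that the file has exactly the six answer lines described above.

.. code-block:: python

    import re

    # Assumption: a decimal integer, one space, then a non-empty string.
    LINE_FORMAT = re.compile(r"^-?[0-9]+ \S.*$")


    def check_exploit4_lines(lines, expected_count=6):
        """Return a list of (line_number, problem); empty means no problems."""
        problems = []
        if len(lines) < expected_count:
            problems.append((len(lines) + 1, "missing line"))
        for number, line in enumerate(lines[:expected_count], start=1):
            if not LINE_FORMAT.match(line):
                problems.append((number, "not in 'int string' form"))
        return problems


    def check_exploit4_file(path="exploit4.txt"):
        """Read the file and check its first six lines."""
        # latin-1 maps every byte to one character, so unusual bytes
        # in an input string do not cause a decoding error.
        with open(path, encoding="latin-1", newline="\n") as handle:
            lines = handle.read().split("\n")
        return check_exploit4_lines(lines)

An empty list from ``check_exploit4_file()`` only means that the layout
looks right; it says nothing about whether the inputs themselves work.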

``Makefile`` in the ``exploits`` directory has commands
to print out the content of ``exploit4.txt``
when ``make exploit4`` is executed.
**Please do not change that part.**


Using GDB
---------

In this project,
GDB is the best way to understand how a target program
executes internally when it is given certain inputs at run time.
You can also identify the memory addresses of variables and buffers
to exploit, and print out the content of the stack memory,
including the return address of a function call.
See the `GDB
manual <http://sourceware.org/gdb/current/onlinedocs/gdb/>`__ for a full
guide to GDB commands. Here are some particularly useful commands for
this project.

Ctrl-c
    Halt the program and break in to GDB at the current instruction.
c (or continue)
    Continue execution until the next breakpoint or ``Ctrl-c``.
si (or stepi)
    Execute one machine instruction.
b function or b file\:line (or break)
    Set a breakpoint at the given function or line.
b \*\ *addr* (or break)
    Set a breakpoint at the instruction address *addr*.
set print pretty
    Enable pretty-printing of arrays and structs.
info registers
    Print the general purpose registers, ``rip``, ``eflags``, and the
    segment selectors.
x/\ *N*\ x *addr*
    Display a hex dump of *N* words starting at virtual address *addr*.
    If *N* is omitted, it defaults to 1. *addr* can be any expression.
x/\ *N*\ i *addr*
    Display the *N* assembly instructions starting at *addr*. Using
    ``$rip`` as *addr* will display the instructions at the current
    instruction pointer.
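
When you compare a byte-wise dump (for example, ``x/8xb`` *addr*)
with an address value,
keep in mind that x64 stores multi-byte values in little-endian order.
The sketch below shows the conversion in both directions;
the function names and the address used in the example are made up
for illustration and are not taken from any target.

.. code-block:: python

    import struct


    def pack_address(address):
        """Pack a 64-bit address into 8 bytes, least significant byte first."""
        return struct.pack("<Q", address)


    def unpack_address(raw):
        """Turn 8 little-endian bytes back into an integer address."""
        return struct.unpack("<Q", raw)[0]


    def has_nul_byte(address):
        """Report whether the packed address contains a zero byte."""
        return b"\x00" in pack_address(address)


    # Made-up example address, shown only to illustrate the byte order.
    example = 0x00007FFFFFFFE4D0
    print(pack_address(example).hex(" "))  # d0 e4 ff ff ff 7f 00 00

The ``has_nul_byte`` helper is a reminder that a zero byte ends
a C string, which matters when such bytes are passed to a program
as part of a command line argument.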