ex1_5_1.py 3.49 KiB
    # exercise 1.5.1
    
    import importlib_resources
    
    import numpy as np
    import pandas as pd
    
    # Load the Iris csv data using the Pandas library
    
    filename = importlib_resources.files("dtuimldmtools").joinpath("data/iris.csv")
    
    
    # Print the location of the iris.csv file on your computer. 
    # You should inspect it manually to understand the format and content
    print("\nLocation of the iris.csv file: {}".format(filename))
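    # A minimal sketch of how files(...).joinpath(...) resolves a path inside an
    # installed package. It assumes Python 3.9+ and uses the standard-library
    # importlib.resources module with the stdlib "json" package as a stand-in,
    # since dtuimldmtools may not be installed where you try this. The name
    # demo_resource is hypothetical and used only for illustration.

```python
from importlib import resources

# files() returns a path-like object rooted at the package directory;
# joinpath() appends a relative path to it.
demo_resource = resources.files("json").joinpath("decoder.py")

# is_file() confirms that the resource exists before we try to read it.
print("Resource found:", demo_resource.is_file())
print("Resource name: ", demo_resource.name)
```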
    
    # Load the iris.csv file using pandas
    
    df = pd.read_csv(filename)
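    # A minimal sketch of what pd.read_csv does, using a small in-memory CSV so
    # that it runs without the data file. The column names and rows are made up
    # for illustration and are only an assumption about the layout (four numeric
    # columns followed by a text label); inspect the real iris.csv for its
    # actual header. The toy_ names are hypothetical.

```python
import io

import pandas as pd

# Hypothetical CSV text: a header line followed by three data rows.
toy_csv = io.StringIO(
    "Sepal Length,Sepal Width,Petal Length,Petal Width,Type\n"
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "6.3,3.3,6.0,2.5,Iris-virginica\n"
)

# read_csv accepts a path or any file-like object and, by default,
# uses the first line as the column names.
toy_df = pd.read_csv(toy_csv)

print(toy_df.shape)          # (rows, columns)
print(list(toy_df.columns))  # names taken from the header line
print(toy_df.dtypes)         # the numeric columns are parsed as float64
```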
    
    # Pandas returns a DataFrame (df), which can be used to handle the data.
    # For this course, however, we convert the DataFrame to NumPy arrays, as
    # is also described in the table in the exercise.
    
    raw_data = df.values
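    # A small sketch of what df.values returns when a DataFrame mixes numbers
    # and text, as the iris data does. The toy DataFrame and its column names
    # are made up for illustration.

```python
import numpy as np
import pandas as pd

toy_df = pd.DataFrame(
    {
        "length": [5.1, 7.0],
        "width": [3.5, 3.2],
        "label": ["Iris-setosa", "Iris-versicolor"],
    }
)

# An array needs a single dtype. With mixed column types, the result
# falls back to the generic Python "object" dtype.
toy_raw = toy_df.values

print(type(toy_raw))  # a numpy.ndarray
print(toy_raw.shape)  # (2, 3)
print(toy_raw.dtype)  # object
```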
    
    # Notice that raw_data contains both the information we want to store in an
    # array X (the sepal and petal dimensions) and the information that we wish
    # to store in y (the class labels, that is, the iris species).

    # We start by making the data matrix X by indexing into raw_data.
    # From inspecting the file, we know that the attributes are stored in the
    # first four columns.
    
    cols = range(0, 4)
    
    X = raw_data[:, cols]
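    # A minimal sketch of the column slicing on a toy array. Because the source
    # array has dtype object, the slice does too; casting with astype(float) is
    # an optional extra step shown here as an assumption, not part of the
    # exercise script. All toy_ names are hypothetical.

```python
import numpy as np

# Toy stand-in for raw_data: two numeric columns and one label column.
toy_raw = np.array(
    [[5.1, 3.5, "Iris-setosa"], [7.0, 3.2, "Iris-versicolor"]],
    dtype=object,
)

toy_cols = range(0, 2)        # column indices 0 and 1
toy_X = toy_raw[:, toy_cols]  # all rows, only the selected columns

# The slice keeps the object dtype of the source array. Casting to float
# gives a regular numeric matrix.
toy_X_float = toy_X.astype(float)

print(toy_X.shape)            # (2, 2)
print(toy_X.dtype)            # object
print(toy_X_float.dtype)      # float64
print(toy_X_float.mean(axis=0))
```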
    
    # We can extract the attribute names that came from the header of the csv
    attributeNames = np.asarray(df.columns[cols])
    
    # Before we can store the class index, we need to convert the strings that
    # specify the class of a given object to numerical values. We start by
    # extracting the strings for each sample from the raw data loaded from the csv:
    
    classLabels = raw_data[:, -1]  # -1 takes the last column
    # Then determine which classes are in the data by finding the set of
    # unique class labels
    
    classNames = np.unique(classLabels)
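    # A small sketch of np.unique on a toy label array. Note that the result is
    # sorted, so the order of the class names does not depend on the order in
    # which they appear in the data. The labels below are illustrative only.

```python
import numpy as np

toy_labels = np.array(
    ["Iris-virginica", "Iris-setosa", "Iris-virginica", "Iris-versicolor"],
    dtype=object,
)

# np.unique returns each distinct value once, in sorted order.
toy_names = np.unique(toy_labels)
print(toy_names)

# return_counts=True additionally reports how often each value occurs.
toy_names_again, toy_counts = np.unique(toy_labels, return_counts=True)
print(dict(zip(toy_names_again, toy_counts)))
```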
    
    # We can assign a number to each Iris class by making a
    # Python dictionary, like so:
    classDict = dict(zip(classNames, range(len(classNames))))
    

    # The function zip simply "zips" together the classNames with an integer,
    # like a zipper on a jacket.
    # For instance, you could zip a list ['A', 'B', 'C'] with ['D', 'E', 'F'] to
    # get the pairs ('A', 'D'), ('B', 'E'), and ('C', 'F').
    # A Python dictionary is a data object that stores pairs of a key and a value.
    # This means that when you look up a given key in a dictionary, you
    # get the corresponding stored value. Try highlighting classDict and pressing F9.
    # You'll see that the first (key, value) pair is ('Iris-setosa', 0).
    # If you look up the key 'Iris-setosa' in the dictionary classDict,
    # you will get the value 0. Try it with classDict['Iris-setosa'].
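    # The zip and dictionary behaviour described above can be tried on its own
    # with the sketch below. The variable names are hypothetical, and the
    # enumerate-based variant is shown only as an equivalent alternative.

```python
letters_a = ["A", "B", "C"]
letters_b = ["D", "E", "F"]

# zip pairs up the elements position by position.
pairs = list(zip(letters_a, letters_b))
print(pairs)  # [('A', 'D'), ('B', 'E'), ('C', 'F')]

# Zipping names with 0, 1, 2, ... and passing the pairs to dict()
# gives a name -> index lookup table.
toy_names = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
toy_dict = dict(zip(toy_names, range(len(toy_names))))
print(toy_dict["Iris-setosa"])  # 0

# The same mapping written as a dict comprehension over enumerate().
toy_dict_alt = {name: index for index, name in enumerate(toy_names)}
print(toy_dict == toy_dict_alt)  # True
```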
    
    # With the dictionary, we can look up each data object's class label (the string)
    # in the dictionary and determine which numerical value that object is
    # assigned. This is the class index vector y:
    
    y = np.array([classDict[cl] for cl in classLabels])
    
    # In the above, we have used the concept of "list comprehension", which
    # is a compact way of performing some operation on each element of a list or array.
    # You could read the line as: "For each class label (cl) in the array of
    # class labels (classLabels), use the class label (cl) as the key and look it up
    # in the class dictionary (classDict). Store the result for each class label
    # as an element in a list (because of the brackets []). Finally, convert the
    # list to a numpy array".
    # Try running this to get a feel for the operation:
    # values = [0, 1, 2]
    # new_values = [element + 10 for element in values]
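    # A runnable sketch of the list comprehension and of the label encoding on
    # toy data. The final cross-check with np.unique(..., return_inverse=True)
    # is an alternative route to the same index vector, added here as an
    # illustration rather than as the method used in the exercise.

```python
import numpy as np

# Plain list comprehension: apply an operation to every element.
values = [0, 1, 2]
new_values = [element + 10 for element in values]
print(new_values)  # [10, 11, 12]

# Same pattern for encoding labels: look up each label in a dictionary.
toy_labels = np.array(["b", "a", "b", "c"])
toy_names = np.unique(toy_labels)
toy_dict = dict(zip(toy_names, range(len(toy_names))))
toy_y = np.array([toy_dict[cl] for cl in toy_labels])
print(toy_y)  # [1 0 1 2]

# np.unique can also return the index of each element's unique value,
# which matches the dictionary-based encoding.
_, toy_y_alt = np.unique(toy_labels, return_inverse=True)
print(np.array_equal(toy_y, toy_y_alt))  # True
```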
    
    
    # We can determine the number of data objects and number of attributes using
    # the shape of X
    N, M = X.shape
    
    
    # Finally, the last variable that we need to have the dataset in the
    # "standard representation" for the course is the number of classes, C:
    C = len(classNames)
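
    # A short sketch of how the pieces of the "standard representation" fit
    # together, on made-up toy data. The consistency checks and the use of
    # np.bincount are illustrative additions, and the toy_ names are hypothetical.

```python
import numpy as np

toy_X = np.array([[5.1, 3.5], [7.0, 3.2], [6.3, 3.3]])  # N x M data matrix
toy_y = np.array([0, 1, 1])                              # class index per row
toy_class_names = np.array(["Iris-setosa", "Iris-versicolor"])

toy_N, toy_M = toy_X.shape      # number of data objects and attributes
toy_C = len(toy_class_names)    # number of classes

# y should hold one class index per data object, each in 0..C-1.
assert toy_y.shape == (toy_N,)
assert toy_y.min() >= 0 and toy_y.max() < toy_C

print(toy_N, toy_M, toy_C)                  # 3 2 2
print(np.bincount(toy_y, minlength=toy_C))  # [1 2]
```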