Lab 2: Expressions

# Don't change this cell; just run it. 

import numpy as np
from datascience import *
#import d8error

Lab 2: Expressions

Welcome to Lab 2 for Data Science Focused for Social Sciences! You can’t learn technical subjects without hands-on practice, so labs are an important part of the course.

Collaborating on labs is more than okay -- it’s encouraged! You should rarely remain stuck for more than a few minutes on questions in labs, so ask an instructor or classmate for help. (Explaining things is beneficial, too -- the best way to solidify your knowledge of a subject is to explain it.) Please don’t just share answers, though.

Today’s lab

In today’s lab, you’ll learn how to:

  1. Navigate Jupyter notebooks (like this one);

  2. Write and evaluate some basic expressions in Python, the computer language of the course; and

  3. Do some introductory data analysis.

This lab covers parts of Chapter 3 of the online textbook. You should read the examples in the book, but not right now. Instead, let’s get started!


Part 1: Jupyter Notebooks

This webpage is called a Jupyter Notebook. A notebook is a place to write programs and view their results, and also to write text.

Text cells

In a notebook, each rectangle containing text or code is called a cell.

Text cells (like this one) can be edited by double-clicking on them. They’re written in a simple format called Markdown to add formatting and section headings. You don’t need to learn Markdown, but you might want to.

After you edit a text cell, click the “run cell” button at the top that looks like ▶| or hold down shift + return to confirm any changes. (Try not to delete the instructions of the lab.)

Computer Programming Fundamentals

In computing, source code is text (usually plain text) that conforms to a human-readable programming language and specifies the behavior of a computer. A programmer writes code to produce a program that runs on a computer. A script is a relatively short and simple piece of source code.

Programming syntax is the set of rules that dictates the structure and format of code written in a programming language. It defines the correct way to write commands and instructions so that the computer can understand and execute them. Syntax differs between programming languages.

Python is a popular programming language. It was created by Guido van Rossum and first released in 1991. Python was designed for readability, and its syntax has some similarities to the English language, with influence from mathematics.

Computer program output is the information or result produced by a program. Computer program output can be displayed to the user of the computer program as text, images, audio, or video.


Coding Examples

Coding exercises are one of the best ways to learn good coding practices. Here are some practice problems to help you become familiar with Python.

Write and Run the code below:

print("Hello, world!")

#type code here
Write and Run the code below:

print(1)

print(1.0)

print(1 + 2)

print(1.0 + 2.0)

print("1" + "3")

print(1 - 3)

print(2 * 3)

print(2.0 * 3.0)

print(2 / 3)

print(2.0 / 3.0)

print(2 ** 3)

#type code here
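
The expressions above mix integers, floats, and strings. As a small sketch (the variable names are illustrative and not part of the lab), the built-in type() function shows which kind of value each expression produces:

```python
# Illustrative names; not part of the lab's required code.
int_sum = 1 + 2          # adding two integers gives an integer
float_sum = 1.0 + 2.0    # adding two floats gives a float
text_sum = "1" + "3"     # + joins (concatenates) two strings
quotient = 2 / 3         # / produces a float in Python 3, even for integers
power = 2 ** 3           # ** raises 2 to the power 3

print(int_sum, type(int_sum))
print(float_sum, type(float_sum))
print(text_sum, type(text_sum))
print(quotient, type(quotient))
print(power, type(power))
```

Notice that "1" + "3" does not add numbers at all: because both values are strings, the result is the text 13.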
Write and Run the code below:

print("2 + 3 =", 2 + 3)

#type code here
  • Please describe the output of print("2 + 3 =", 2 + 3), and explain why its two parts are displayed differently.

  • Type your answer here


    The print() function outputs the values placed inside its parentheses as plain text.

    An escape character (or escape sequence) is a character that invokes an alternative interpretation of the characters that follow it in a character sequence. In Python, the escape character is the backslash (\).

    Escape characters are important in computer programming because keyboard characters can have multiple meanings in a programming language's syntax.

    \t - Tab

    \\ - Backslash

    \' - Single Quote

    \" - Double Quote

    \n - New Line
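
One way to see what an escape sequence does is to measure the string it produces. The sketch below uses made-up strings and the built-in len() and repr() functions (neither is required by this lab) to show that each escape sequence is typed with two keystrokes but stored as a single character:

```python
# Each escape sequence is written with two keystrokes but stored as one character.
tabbed = "a\tb"        # a, tab, b
backslash = "\\"       # a single backslash
quote = "\""           # a single double-quote character
two_lines = "hi\nyo"   # h, i, newline, y, o

print(len(tabbed))      # 3
print(len(backslash))   # 1
print(len(quote))       # 1
print(len(two_lines))   # 5
print(repr(two_lines))  # repr() displays the escape sequence instead of acting on it
```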


    Write and Run the code below:

    print("hello\tworld")

    print("\\")

    print("\'")

    print("\"")

    print("hello\nworld")

    #type code here

    Part 2: Data Analysis

    Overview - Population, Sample, & Data

    Data science involves recognizing patterns and making predictions from datasets. The analysis usually starts with a research question. Typically, the research question is something we want to know about a population. The population is the entire group we want to know something about. The population may be people, but it may be other things such as vehicles, objects or animals. For example, we may want to answer the questions:

    1. “What percent of vehicles in the US are hybrid?” (Population: US vehicles).

    2. “What is the average height of a eucalyptus tree?” (Population: eucalyptus trees)

    In most cases, the population is a large group. Often, the population is so large that we cannot collect information from every individual in the population, so we select a sample from the population. We collect data from this sample. Data is information or measurements that will help us answer the research question.

    The sample needs to represent the population well. For example, if we are investigating the heights of eucalyptus trees, we want to be sure not to include any other types of trees in our analysis, and we want to be sure to take samples from different regions.

    To make sense of the data we collect from the sample, we summarize it using graphs and different numerical measures, such as percentages or averages. Datasets contain the data collected and are often organized in table format.

    Data consist of individuals and variables that give us information about those individuals. An individual can be an object or a person. A variable is an attribute, such as a measurement or a label. There are two types of variables: quantitative and categorical.

    Categorical variables take category or label values and place an individual into one of several groups. Each observation can be placed in only one category, and the categories are mutually exclusive. Quantitative variables take numerical values and represent some kind of measurement.
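
To make the vocabulary concrete, here is a minimal sketch using a made-up mini-dataset. The names "name", "kind", and "sugar_grams" and all of the values are hypothetical; they are not taken from the lab's data file. Each dictionary is one individual, and each key is a variable. The rule used here (numbers are quantitative, text is categorical) is only a rough rule of thumb, since numeric codes such as zip codes are categorical even though they look like numbers.

```python
# Hypothetical mini-dataset: each dictionary is one individual (one dessert).
# The keys "name", "kind", and "sugar_grams" are made up for illustration.
desserts = [
    {"name": "Dessert A", "kind": "ice cream", "sugar_grams": 19},
    {"name": "Dessert B", "kind": "sherbet", "sugar_grams": 21},
    {"name": "Dessert C", "kind": "ice cream", "sugar_grams": 14},
]

classification = {}
for variable, value in desserts[0].items():
    # Rough rule of thumb: numbers are quantitative, text labels are categorical.
    if isinstance(value, (int, float)):
        classification[variable] = "quantitative"
    else:
        classification[variable] = "categorical"

for variable in classification:
    print(variable, "is", classification[variable])
```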

    Coding Examples

    The table frozen_desserts, loaded from the file desserts.csv, contains data on 20 different frozen desserts. Each row represents one frozen dessert.

    Run the next cell to load the frozen_desserts table and see its output.

    # Just run this cell
    frozen_desserts = Table.read_table('desserts.csv')
    frozen_desserts.show()
    Use the print function to answer the following questions:
    1. Use print() to output the individuals in the dataset.

    2. Use print() to output the column names of the categorical variables in the dataset.

    3. Use print() to output the column names of the quantitative variables in the dataset.

    4. Would you consider this Data Set a sample of frozen desserts or the entire population of frozen desserts?

    #Type your answer here
    
    Write a programming script using the print() function. Output 6 desserts and the corresponding grams of sugar for each dessert. Use as many print() calls as necessary.

    Hint: Each dessert corresponds to a row of the table.

    Expected Sample Output:

    Lemon Italian Ice 10 Grams of Sugar

    Vanilla Ice Cream 19 Grams of Sugar

    Chocolate Frozen Yogurt 20 Grams of Sugar

    Coconut Gelato 22 Grams of Sugar

    Raspberry Sherbet 19 Grams of Sugar

    Cookies and Cream 18 Grams of Sugar

    #Type your answer here
    Use escape characters to produce the same output with a single print() call.
    #Type your answer here

    Part 3: Visualizations Introduction

    Graph - Dotplot

    In data analysis, our goal is to describe patterns in the data and create a useful summary about a group. A table is not a useful way to view data because patterns are hard to see in a table. Thus, creating a graph of the distribution of the variable is usually the first step in data analysis.

    One type of graph is called a dotplot. A dotplot gives a better summary of the distribution of grams of sugar. In a dotplot, each dot represents one individual. Let’s look at the dotplot of the frozen desserts.

    Example Dotplot

    Here, each dot is a frozen dessert. The numbers on the horizontal axis are the variable values. The variable in this case is sugar in grams per serving. The vertical axis gives the count of desserts.

    In a dotplot we can see the variable values and how many individuals have each value. For example, 2 frozen desserts have 19 grams of sugar and 3 frozen desserts have 21 grams of sugar.
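
The counting that a dotplot does can be sketched in code. The example below assumes a short, made-up list of sugar values (not the lab's dataset) and uses Counter from Python's standard collections module, which this lab does not otherwise require, to tally how many individuals share each value:

```python
from collections import Counter

# Hypothetical sugar values (grams per serving); not the lab's dataset.
sugar = [19, 21, 14, 21, 19, 21, 25]

counts = Counter(sugar)  # maps each value to how many times it appears

# Print one row per value, with one "o" per individual, like a sideways dotplot.
for value in sorted(counts):
    print(value, "o" * counts[value])
```

In this made-up list, 2 individuals have the value 19 and 3 have the value 21, so those rows show two and three dots.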

    Answer the following questions with True or False about the distribution of frozen desserts
    1. The sugar content for these frozen desserts ranges from 10 to 29 grams.

    2. For this group of desserts, typical sugar content ranges from 14 to 30 grams.

    3. More than half of the frozen desserts have over 20 grams of sugar.

    4. It is unusual for one of these frozen desserts to have more than 25 grams of sugar.

    #Type your answer here

    Shape, Center and Spread

    When we describe patterns in data, we use descriptions of shape, center, and spread. We also describe exceptions to the pattern. We call these exceptions outliers. Outliers are notable deviations from the trend.

    Shape

    Common descriptions of shape are:

    A right-skewed distribution has a lot of data at lower variable values with smaller amounts of data at higher variable values. Data cluster on the left of the distribution with a tail of data tapering off to the right.

    A left-skewed distribution has a lot of data at higher variable values with smaller amounts of data at lower variable values. Data cluster on the right of the distribution with a tail of data tapering off to the left.

    A symmetric (or bell-shaped) distribution has a central peak where data are concentrated, with a tail in both directions.

    A uniform distribution has the same amount of data for each value. So the distribution looks rectangular.

    Center

    When we describe a distribution of a quantitative variable, it is helpful to identify a typical value. We choose a single value of the variable to represent the entire group. This is one way to think about the center of the distribution.

    Spread

    We also want to describe how much the data varies among individuals in the group. Variability is another word for spread. We describe the spread in two ways:

    1. Find the range of the data, by looking at the smallest value and the largest value.

    2. Find the interval of typical values to represent common variable values for the group.
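
The first way of describing spread can be sketched with the built-in min() and max() functions. The list below is made up for illustration and is not the lab's dataset:

```python
# Hypothetical sugar values (grams per serving); not the lab's dataset.
sugar = [19, 21, 14, 21, 19, 21, 25]

smallest = min(sugar)
largest = max(sugar)
width = largest - smallest  # the range summarized as a single number

print("The values range from", smallest, "to", largest, "grams.")
print("The width of that range is", width, "grams.")
```

The interval of typical values is a judgment made by looking at where most of the dots sit, so it is not computed here.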

    Look back at the dotplot of the frozen desserts. Use the print function to output the following. Make sure to output each answer as a complete sentence.
    1. Identify the shape of the distribution of frozen desserts.

    2. Identify the center of the distribution of frozen desserts.

    3. Identify the range of the distribution of frozen desserts.

    4. Identify the interval of typical values of the distribution of frozen desserts.

    5. Identify any outliers in the distribution of frozen desserts.

    #Type your answer here
    Produce the same output as question 3.2, but using only one print() call.
    #Type your answer here

    Measuring Center

    One measure of center is the average or mean. This is found by adding all the data values in the distribution and dividing by the total number of values. We usually use the mean as a measure of center when the distribution is symmetrical.

    Another measure of center is the median. The median is the middle of the data when all the values are listed in order. The median divides the data into two equal-sized groups. There is as much data below the median as above it. If the dataset has an even number of values, the median is found by adding the 2 middle numbers and dividing by 2. We usually use the median as a measure of center when the distribution is skewed.
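
Both measures of center can be checked with Python's standard statistics module. This is only a sketch: the module is not required by the lab, and the lists are made up rather than taken from the desserts data.

```python
import statistics

# Hypothetical sugar values (grams per serving); not the lab's dataset.
odd_count = [19, 21, 14, 21, 19, 21, 25]   # 7 values
even_count = [14, 19, 21, 25]              # 4 values

# Mean: add all the values and divide by how many there are.
mean_value = statistics.mean(odd_count)

# Median with an odd number of values: the single middle value once sorted.
median_odd = statistics.median(odd_count)

# Median with an even number of values: the mean of the two middle values.
median_even = statistics.median(even_count)

print("Mean:", mean_value)
print("Median (odd count):", median_odd)
print("Median (even count):", median_even)
```

Here the seven values add up to 140, so the mean is 20. Sorted, they are 14, 19, 19, 21, 21, 21, 25, so the median is 21. For the four-value list, the two middle values are 19 and 21, and their mean is 20.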

    Answer the following questions:
    1. Use the print() function to calculate and output the mean for the Grams of Sugar Column of the Frozen Desserts dataset.

    2. Use the print() function to output the median for the Grams of Sugar Column of the Frozen Desserts dataset.

    3. Use the print() function to output the better measure of center for the Grams of Sugar Column of the Frozen Desserts dataset, mean or median? Be sure to include a sentence supporting your answer.

    #Type your answer here

    Now let’s assume a “Super Sugar bomb” ice cream is added to our frozen desserts dataset. This ice cream has 100 grams of sugar. Here’s the updated dotplot.

    Updated Dotplot
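
To see how a single extreme value affects each measure of center, here is a minimal sketch. It assumes a short, made-up list of sugar values (not the lab's dataset) and uses the standard statistics module, which the lab does not require. The value 100 plays the role of the "Super Sugar bomb".

```python
import statistics

# Hypothetical sugar values; 100 plays the role of the "Super Sugar bomb".
sugar = [19, 21, 14, 21, 19, 21, 25]
sugar_with_bomb = sugar + [100]

mean_before = statistics.mean(sugar)
mean_after = statistics.mean(sugar_with_bomb)
median_before = statistics.median(sugar)
median_after = statistics.median(sugar_with_bomb)

print("Mean before:", mean_before, "and after:", mean_after)
print("Median before:", median_before, "and after:", median_after)
```

In this made-up list the mean jumps from 20 to 30, while the median stays at 21.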
    Answer the following questions:
    1. Use the print() function to calculate and output the mean for the Grams of Sugar Column of the updated Frozen Desserts dataset.

    2. Use the print() function to output the median for the Grams of Sugar Column of the updated Frozen Desserts dataset.

    3. Use the print() function to output the better measure of center for the Grams of Sugar Column of the updated Frozen Desserts dataset, mean or median? Be sure to include a sentence supporting your answer.

    #Type your answer here

    Part 4: Programming Keywords and Practice

    • Here are some keyword references from Lecture or Online resources:

    • Programming Data Types

    • Integer

    • Float

    • String

    • Boolean

    • None

    • Programming Variables (Data Containers)

    • (=) Assignment Operator

    • Programming Comment
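
As a quick illustration of the keywords above, the sketch below assigns one value of each data type to a variable and labels it with a comment. The variable names and values are made up for illustration.

```python
# Illustrative variable names; each one holds a value of a different data type.
count = 20              # Integer: a whole number
grams = 19.5            # Float: a number with a decimal point
flavor = "vanilla"      # String: text inside quotes
is_frozen = True        # Boolean: either True or False
topping = None          # None: a placeholder meaning "no value"

# The = sign assigns the value on the right to the name on the left.
print(type(count))
print(type(grams))
print(type(flavor))
print(type(is_frozen))
print(type(topping))
```

Everything after a # on a line is a comment: Python ignores it, so it is a place to leave notes for human readers.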

    Write and Run the code below:

    Also, add a programming comment labeling the variables in the source code.

    x = 30

    y = 10

    print(x + y)

    print(x - y)

    print(x / y)

    print(x * y)

    print("x + y =", x + y)

    #Type your answer here

    More Programming and Output Practice

    Write and Run the code below:

    num = 10

    print(num)

    print("num =" + 10)

    num = 10
    print(num)
    print("num =" + 10)
    10
    
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    Cell In[22], line 3
          1 num = 10
          2 print(num)
    ----> 3 print("num =" + 10)
    
    TypeError: can only concatenate str (not "int") to str
    Please describe why an error occurred.

    Type your answer here

    Write and Run the code below:

    num = 10

    print(num)

    print("num = ", num)

    print("num =" + str(num))

    message = "I love to code "

    print(message + " everyday ")

    #Type your answer here
    Please describe the process for formatting output and data type alignment.

    Type your answer here
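
The two approaches used above, separating values with commas and joining strings with +, handle spaces differently. Here is a short sketch of that difference. The sep keyword argument appears in the print() documentation; the variable with_space is made up for illustration.

```python
num = 10

# With commas, print() converts each value to text and joins them with a space.
print("num =", num)          # num = 10

# With +, both sides must already be strings, and no space is added.
print("num =" + str(num))    # num =10

# The sep keyword argument changes what print() puts between values.
print("num", num, sep="=")   # num=10

# A joined string can also be stored in a variable for later use.
with_space = "num = " + str(num)
print(with_space)            # num = 10
```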


    More Programming Keywords

    Here are more keywords that are referenced in the Lecture:

    • Built in Functions

    • int()

    • float()

    • str()

    • sum()

    • sorted()

    • help()

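    The built-in help() function is intended for interactive use. It displays the documentation for whatever is passed to it.

    Place the name of a built-in function inside the parentheses. To look up a Python keyword, pass it as a string in quotes, for example help('if').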

    • Write and Run the code below:

    help(print)
    help(sum)
    help(int)
    help(float)
    help(str)

    help(print)
    help(int)
    help(float)
    help(str)
    help(sum)
    help(sorted)
    Help on built-in function print in module builtins:
    
    print(...)
        print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)
        
        Prints the values to a stream, or to sys.stdout by default.
        Optional keyword arguments:
        file:  a file-like object (stream); defaults to the current sys.stdout.
        sep:   string inserted between values, default a space.
        end:   string appended after the last value, default a newline.
        flush: whether to forcibly flush the stream.
    
    Help on built-in function sum in module builtins:
    
    sum(iterable, /, start=0)
        Return the sum of a 'start' value (default: 0) plus an iterable of numbers
        
        When the iterable is empty, return the start value.
        This function is intended specifically for use with numeric values and may
        reject non-numeric types.
    
    Help on class int in module builtins:
    
    class int(object)
     |  int([x]) -> integer
     |  int(x, base=10) -> integer
     |  
     |  Convert a number or string to an integer, or return 0 if no arguments
     |  are given.  If x is a number, return x.__int__().  For floating point
     |  numbers, this truncates towards zero.
     |  
     |  If x is not a number or if base is given, then x must be a string,
     |  bytes, or bytearray instance representing an integer literal in the
     |  given base.  The literal can be preceded by '+' or '-' and be surrounded
     |  by whitespace.  The base defaults to 10.  Valid bases are 0 and 2-36.
     |  Base 0 means to interpret the base from the string as an integer literal.
     |  >>> int('0b100', base=0)
     |  4
     |  
     |  Built-in subclasses:
     |      bool
     |  
     |  Methods defined here:
     |  
     |  __abs__(self, /)
     |      abs(self)
     |  
     |  __add__(self, value, /)
     |      Return self+value.
     |  
     |  __and__(self, value, /)
     |      Return self&value.
     |  
     |  __bool__(self, /)
     |      True if self else False
     |  
     |  __ceil__(...)
     |      Ceiling of an Integral returns itself.
     |  
     |  __divmod__(self, value, /)
     |      Return divmod(self, value).
     |  
     |  __eq__(self, value, /)
     |      Return self==value.
     |  
     |  __float__(self, /)
     |      float(self)
     |  
     |  __floor__(...)
     |      Flooring an Integral returns itself.
     |  
     |  __floordiv__(self, value, /)
     |      Return self//value.
     |  
     |  __format__(self, format_spec, /)
     |      Default object formatter.
     |  
     |  __ge__(self, value, /)
     |      Return self>=value.
     |  
     |  __getattribute__(self, name, /)
     |      Return getattr(self, name).
     |  
     |  __getnewargs__(self, /)
     |  
     |  __gt__(self, value, /)
     |      Return self>value.
     |  
     |  __hash__(self, /)
     |      Return hash(self).
     |  
     |  __index__(self, /)
     |      Return self converted to an integer, if self is suitable for use as an index into a list.
     |  
     |  __int__(self, /)
     |      int(self)
     |  
     |  __invert__(self, /)
     |      ~self
     |  
     |  __le__(self, value, /)
     |      Return self<=value.
     |  
     |  __lshift__(self, value, /)
     |      Return self<<value.
     |  
     |  __lt__(self, value, /)
     |      Return self<value.
     |  
     |  __mod__(self, value, /)
     |      Return self%value.
     |  
     |  __mul__(self, value, /)
     |      Return self*value.
     |  
     |  __ne__(self, value, /)
     |      Return self!=value.
     |  
     |  __neg__(self, /)
     |      -self
     |  
     |  __or__(self, value, /)
     |      Return self|value.
     |  
     |  __pos__(self, /)
     |      +self
     |  
     |  __pow__(self, value, mod=None, /)
     |      Return pow(self, value, mod).
     |  
     |  __radd__(self, value, /)
     |      Return value+self.
     |  
     |  __rand__(self, value, /)
     |      Return value&self.
     |  
     |  __rdivmod__(self, value, /)
     |      Return divmod(value, self).
     |  
     |  __repr__(self, /)
     |      Return repr(self).
     |  
     |  __rfloordiv__(self, value, /)
     |      Return value//self.
     |  
     |  __rlshift__(self, value, /)
     |      Return value<<self.
     |  
     |  __rmod__(self, value, /)
     |      Return value%self.
     |  
     |  __rmul__(self, value, /)
     |      Return value*self.
     |  
     |  __ror__(self, value, /)
     |      Return value|self.
     |  
     |  __round__(...)
     |      Rounding an Integral returns itself.
     |      
     |      Rounding with an ndigits argument also returns an integer.
     |  
     |  __rpow__(self, value, mod=None, /)
     |      Return pow(value, self, mod).
     |  
     |  __rrshift__(self, value, /)
     |      Return value>>self.
     |  
     |  __rshift__(self, value, /)
     |      Return self>>value.
     |  
     |  __rsub__(self, value, /)
     |      Return value-self.
     |  
     |  __rtruediv__(self, value, /)
     |      Return value/self.
     |  
     |  __rxor__(self, value, /)
     |      Return value^self.
     |  
     |  __sizeof__(self, /)
     |      Returns size in memory, in bytes.
     |  
     |  __sub__(self, value, /)
     |      Return self-value.
     |  
     |  __truediv__(self, value, /)
     |      Return self/value.
     |  
     |  __trunc__(...)
     |      Truncating an Integral returns itself.
     |  
     |  __xor__(self, value, /)
     |      Return self^value.
     |  
     |  as_integer_ratio(self, /)
     |      Return integer ratio.
     |      
     |      Return a pair of integers, whose ratio is exactly equal to the original int
     |      and with a positive denominator.
     |      
     |      >>> (10).as_integer_ratio()
     |      (10, 1)
     |      >>> (-10).as_integer_ratio()
     |      (-10, 1)
     |      >>> (0).as_integer_ratio()
     |      (0, 1)
     |  
     |  bit_count(self, /)
     |      Number of ones in the binary representation of the absolute value of self.
     |      
     |      Also known as the population count.
     |      
     |      >>> bin(13)
     |      '0b1101'
     |      >>> (13).bit_count()
     |      3
     |  
     |  bit_length(self, /)
     |      Number of bits necessary to represent self in binary.
     |      
     |      >>> bin(37)
     |      '0b100101'
     |      >>> (37).bit_length()
     |      6
     |  
     |  conjugate(...)
     |      Returns self, the complex conjugate of any int.
     |  
     |  to_bytes(self, /, length, byteorder, *, signed=False)
     |      Return an array of bytes representing an integer.
     |      
     |      length
     |        Length of bytes object to use.  An OverflowError is raised if the
     |        integer is not representable with the given number of bytes.
     |      byteorder
     |        The byte order used to represent the integer.  If byteorder is 'big',
     |        the most significant byte is at the beginning of the byte array.  If
     |        byteorder is 'little', the most significant byte is at the end of the
     |        byte array.  To request the native byte order of the host system, use
     |        `sys.byteorder' as the byte order value.
     |      signed
     |        Determines whether two's complement is used to represent the integer.
     |        If signed is False and a negative integer is given, an OverflowError
     |        is raised.
     |  
     |  ----------------------------------------------------------------------
     |  Class methods defined here:
     |  
     |  from_bytes(bytes, byteorder, *, signed=False) from builtins.type
     |      Return the integer represented by the given array of bytes.
     |      
     |      bytes
     |        Holds the array of bytes to convert.  The argument must either
     |        support the buffer protocol or be an iterable object producing bytes.
     |        Bytes and bytearray are examples of built-in objects that support the
     |        buffer protocol.
     |      byteorder
     |        The byte order used to represent the integer.  If byteorder is 'big',
     |        the most significant byte is at the beginning of the byte array.  If
     |        byteorder is 'little', the most significant byte is at the end of the
     |        byte array.  To request the native byte order of the host system, use
     |        `sys.byteorder' as the byte order value.
     |      signed
     |        Indicates whether two's complement is used to represent the integer.
     |  
     |  ----------------------------------------------------------------------
     |  Static methods defined here:
     |  
     |  __new__(*args, **kwargs) from builtins.type
     |      Create and return a new object.  See help(type) for accurate signature.
     |  
     |  ----------------------------------------------------------------------
     |  Data descriptors defined here:
     |  
     |  denominator
     |      the denominator of a rational number in lowest terms
     |  
     |  imag
     |      the imaginary part of a complex number
     |  
     |  numerator
     |      the numerator of a rational number in lowest terms
     |  
     |  real
     |      the real part of a complex number
    
    
    Write and Run the code below:

    help(sum)
    help(sorted)

    #Type your answer here
    Please describe the output from help

    Type your answer here

    The values produced by built-in functions can be assigned to variables.

    Write and Run the code below:

    x = int(4.0)

    print(x)

    x = float(4)

    print(x)

    x = str(4)

    print(x)

    #Type your answer here
    Describe the outputs produced by the built-in functions above.

    Type your answer here
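
The conversion functions have a few behaviors that are easy to miss. The sketch below uses made-up values and illustrative variable names to show that int() truncates rather than rounds, that numbers written as text can be converted, and that numbers converted to strings are joined rather than added:

```python
# int() drops the fractional part (it truncates toward zero; it does not round).
positive = int(4.9)
negative = int(-4.9)
print(positive)   # 4
print(negative)   # -4

# float() and int() can also read numbers written as text.
from_text_float = float("3.5")
from_text_int = int("42")
print(from_text_float)   # 3.5
print(from_text_int)     # 42

# str() turns a number into text, so + joins instead of adding.
joined = str(4) + str(4)
print(joined)     # 44
```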


    Additional Programming Keywords

    Here are additional keywords that are referenced in the Lecture:

    • Objects

    • Data Structures

    • Python List

    Write and Run the code below:

    Also, add a programming comment labeling the variables in the source code.

    sample = [25, 34, 100, 900, 200, 50]
    print(sample)

    #Type your answer here

    Sum a list

    A list of numbers can be added together using the built-in sum() function.

    Example:

    x = sum([2,4,6,8])

    print(x)

    would output 20.

    Write and run the code below, then use the sum() function to output the sum of the list.

    sample = [25, 34, 100, 900, 200, 50]

    print(sample)

    #Type your answer here

    Mean Average Review

      • A mean can be calculated in many ways using Python. One way is to use sum() to add up the values in a list, as in the previous problem, and then divide by the number of values.

    Example:

    x = sum([2, 4, 6, 8])

    mean_avg = x / 4

    print(mean_avg)
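
The example above divides by 4 because the list has four entries. As a sketch of an alternative, the built-in len() function (not on this lab's keyword list) counts the entries, so the same code keeps working if the list changes length:

```python
values = [2, 4, 6, 8]

total = sum(values)   # 20
count = len(values)   # 4, the number of entries in the list
mean_avg = total / count

print(mean_avg)       # 5.0
```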

    Write and Run the code below: (Output the Mean of the list below using the sum() function)

    sample = [25, 34, 100, 900, 200, 50]

    print(sample)

    #Type your answer here
    Write and Run the code below: (Output the Mean of the list below using the sum() function)

    sample_2 = [-166.0, 1000.67, 5000.23, 98753.5, -1150.98, 230.2, 1.5]

    print(sample_2)

    #type your answer here

    Median Average Calculation Section

    A median is the middle value in an ordered or sorted list.

    Use the sorted() built-in function to sort a list

    Example:

    x = sorted([100, 2, 6, 10, 4])

    print (x)

    The output will be the same list but in numerical order.
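
One detail worth knowing is that sorted() builds a new list and leaves the original untouched. The sketch below demonstrates this with illustrative variable names, and also shows the reverse keyword argument listed in help(sorted):

```python
original = [100, 2, 6, 10, 4]
ordered = sorted(original)

print(ordered)    # [2, 4, 6, 10, 100]
print(original)   # [100, 2, 6, 10, 4] (unchanged)

# The reverse keyword argument sorts from largest to smallest.
descending = sorted(original, reverse=True)
print(descending)   # [100, 10, 6, 4, 2]
```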

    A Data Set can have:

    1. Odd Number of entries: Median is the middle data entry

    2. Even Number of entries: Median is the mean of the two middle data entries
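
The two cases above can be sketched as follows. The lists are made up for illustration, and the code uses list indexing, len(), and the // (floor division) operator, which go slightly beyond the keywords listed in this lab:

```python
# Hypothetical lists, chosen to show both cases.
odd_list = sorted([7, 1, 5])          # [1, 5, 7]
even_list = sorted([7, 1, 5, 3])      # [1, 3, 5, 7]

# Odd number of entries: the median is the single middle entry.
middle = len(odd_list) // 2           # 3 // 2 is 1
odd_median = odd_list[middle]         # the entry at position 1

# Even number of entries: the median is the mean of the two middle entries.
upper = len(even_list) // 2           # 4 // 2 is 2
lower = upper - 1                     # 1
even_median = (even_list[lower] + even_list[upper]) / 2   # (3 + 5) / 2

print(odd_median)    # 5
print(even_median)   # 4.0
```

Positions in a Python list start at 0, which is why position 1 is the middle of a three-entry list.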

    sample = [25,34, 100, 900, 200, 50]
    print(sample)

    Now we will practice finding the median for a list with an odd number of entries.

    For the given list, output the sorted list using the sorted() built-in function AND the median value below

    sample = [25, 34, 1000, 200, 50]

    #Type your answer here
    For the given list, output the sorted list using the sorted() built-in function AND the median value below

    sample_2 = [-166.0, 1000.67, 5000.23, 98753.5, -1150.98, 230.2, 1.5]

    print(sample_2)

    #Type your answer here