# ASSIGNMENT CONFIG
init_cell: true
export_cell: true
files:
- dotplot.png
- dotplot_bomb.png
- desserts.csv
- error.jpg
- gradescope.png
- d8error.py
- errorConfig.json
export_cell:
pdf: false
force_save: false
solutions_pdf: true
template_pdf: false
generate:
points_possible: 100
show_stdout: true
zips: false
# Don't change this cell; just run it.
import numpy as np
from datascience import *
#import d8error
Lab 2: Expressions¶
Welcome to Lab 2 for Data Science Focused for Social Sciences! You can’t learn technical subjects without hands-on practice, so labs are an important part of the course.
Collaborating on labs is more than okay -- it’s encouraged! You should rarely remain stuck for more than a few minutes on questions in labs, so ask an instructor or classmate for help. (Explaining things is beneficial, too -- the best way to solidify your knowledge of a subject is to explain it.) Please don’t just share answers, though.
Today’s lab¶
In today’s lab, you’ll learn how to:
Navigate Jupyter notebooks (like this one);
Write and evaluate some basic expressions in Python, the computer language of the course; and
Perform some introductory data analysis.
This lab covers parts of Chapter 3 of the online textbook. You should read the examples in the book, but not right now. Instead, let’s get started!
Part 1: Jupyter Notebooks¶
This webpage is called a Jupyter Notebook. A notebook is a place to write programs and view their results, and also to write text.
Text cells¶
In a notebook, each rectangle containing text or code is called a cell.
Text cells (like this one) can be edited by double-clicking on them. They’re written in a simple format called Markdown to add formatting and section headings. You don’t need to learn Markdown, but you might want to.
After you edit a text cell, click the “run cell” button at the top that looks like ▶| or hold down shift + return to confirm any changes. (Try not to delete the instructions of the lab.)
Computer Programming Fundamentals¶
In computing, source code is text (usually plain text) that conforms to a human-readable programming language and specifies the behavior of a computer. A programmer writes code to produce a program that runs on a computer. A programming script is a relatively short and simple set of programming source code.
Programming syntax refers to the set of rules that dictate the structure and format of a programming language. It defines how commands and instructions must be written so that the computer can understand and execute them. Syntax differs between programming languages.
The Python Programming Language is a popular programming language. It was created by Guido van Rossum, and released in 1991. Python was designed for readability, and has some similarities to the English language with influence from mathematics.
Computer program output is the information or result produced by a program. Computer program output can be displayed to the user of the computer program as text, images, audio, or video.
Coding Examples¶
Doing coding exercises is one of the best ways to learn coding practices! Here are some practice problems to help you become familiar with Python.
print("Hello, world!")
#type code here
print(1)
print(1.0)
print(1 + 2)
print(1.0 + 2.0)
print("1" + "3")
print(1 - 3)
print(2 * 3)
print(2.0 * 3.0)
print(2 / 3)
print(2.0 / 3.0)
print(2 ** 3)
#type code here
print("2 + 3 =", 2 + 3)
#type code here
Please describe the output produced by [ print("2 + 3 =", 2 + 3) ] and how it differs from the earlier results.
Type your answer here
The print() function outputs the values placed inside its parentheses as plain text.
An escape character (or escape sequence) is a character that invokes an alternative interpretation of the characters that follow it in a character sequence.
Escape characters are important in computer programming because keyboard characters can have multiple meanings in programming language syntax.
\t - Tab
\\ - Backslash
\' - Single Quote
\" - Double Quote
\n - New Line
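As a small illustration of the list above, the sketch below checks that each escape sequence is stored as a single character, even though it takes two keystrokes to type. The variable names are made up for this example.

```python
# Each escape sequence is typed as two characters but stored as one.
tab = "\t"
backslash = "\\"
single_quote = "\'"
double_quote = "\""
newline = "\n"

# len() counts the characters actually stored in each string.
lengths = [len(tab), len(backslash), len(single_quote), len(double_quote), len(newline)]
print(lengths)

# An escaped double quote can appear inside a double-quoted string.
quoted = "She said \"hi\""
print(quoted)
```

Without the backslashes, the second double quote in the last string would end the string early and Python would report a syntax error.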
print("hello\tworld")
print("\\")
print("\'")
print("\"")
print("hello\nworld")
#type code here
Part 2: Data Analysis¶
Overview - Population, Sample, & Data¶
Data science involves recognizing patterns and making predictions from datasets. The analysis usually starts with a research question. Typically, the research question is something we want to know about a population. The population is the entire group we want to know something about. The population may be people, but it may be other things such as vehicles, objects or animals. For example, we may want to answer the questions:
“What percent of vehicles in the US are hybrid?” (Population: US vehicles).
“What is the average height of a eucalyptus tree?” (Population: eucalyptus trees)
In most cases, the population is a large group. Often, the population is so large that we cannot collect information from every individual in the population, so we select a sample from the population. We collect data from this sample. Data is information or measurements that will help us answer the research question.
The sample needs to represent the population well. For example, if we are investigating the heights of eucalyptus trees, we want to be sure not to include any other type of tree in our analysis, and we want to be sure to take samples from different regions.
To make sense of the data we collect from the sample, we summarize it using graphs and different numerical measures, such as percentages or averages. Datasets contain the data collected and are often organized in table format.
Data consist of individuals and variables that give us information about those individuals. An individual can be an object or a person. A variable is an attribute, such as a measurement or a label. There are two types of variables: quantitative and categorical.
Categorical variables take category or label values and place an individual into one of several groups. Each observation can be placed in only one category, and the categories are mutually exclusive. Quantitative variables take numerical values and represent some kind of measurement.
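To make the two variable types concrete, here is a minimal sketch that sorts the variables of one made-up individual into quantitative and categorical groups. The dictionary and its keys are hypothetical and are not taken from the lab's dataset. The rule used (a number means quantitative) is a simplification, since numeric codes such as ZIP codes are really labels.

```python
# Hypothetical data: one individual (a dessert) described by three variables.
dessert = {"name": "Dessert A", "category": "Gelato", "sugar_grams": 22}

# Sort each variable into a group based on the type of its value.
quantitative = []
categorical = []
for variable, value in dessert.items():
    if isinstance(value, (int, float)):
        quantitative.append(variable)   # numbers that measure something
    else:
        categorical.append(variable)    # labels or categories

print("Quantitative:", quantitative)
print("Categorical:", categorical)
```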
Coding Examples¶
The table frozen_desserts, loaded from desserts.csv, contains data on 20 different frozen desserts. Each row represents one such frozen dessert.
Run the next cell to load the frozen_desserts table and see its output.
# Just run this cell
frozen_desserts = Table.read_table('desserts.csv')
frozen_desserts.show()
Use print() to output the individuals in the dataset.
Use print() to output the column names of the categorical variables in the dataset.
Use print() to output the column names of the quantitative variables in the dataset.
Would you consider this dataset a sample of frozen desserts or the entire population of frozen desserts?
#Type your answer here
Hint: Each dessert corresponds to one row of the table.
Expected Sample Output:
Lemon Italian Ice 10 Grams of Sugar
Vanilla Ice Cream 19 Grams of Sugar
Chocolate Frozen Yogurt 20 Grams of Sugar
Coconut Gelato 22 Grams of Sugar
Raspberry Sherbet 19 Grams of Sugar
Cookies and Cream 18 Grams of Sugar
#Type your answer here
#Type your answer here
Part 3: Visualizations Introduction¶
Graph - Dotplot¶
In data analysis, our goal is to describe patterns in the data and create a useful summary about a group. A table is often not the most useful way to view data because patterns are hard to see in rows of numbers. Thus, creating a graph of the distribution of the variable is usually the first step in data analysis.
One type of graph is called a dotplot. A dotplot gives a better summary of the distribution of grams of sugar. In a dotplot, each dot represents one individual. Let’s look at the dotplot of the frozen desserts.
Example Dotplot
Here, each dot is a frozen dessert. The numbers on the horizontal axis are the variable values. The variable in this case is sugar in grams per serving. The vertical axis gives the count of desserts.
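The counting behind a dotplot can be sketched in a few lines. The values below are hypothetical (they are not the lab's dataset), and the plot is printed sideways as text: one row per value, one dot per individual.

```python
from collections import Counter

# Hypothetical sugar values in grams per serving, not the lab's dataset.
sugar = [10, 18, 19, 19, 20, 21, 21, 21, 22]

# Count how many individuals share each value.
counts = Counter(sugar)

# Print one row per value, with one dot per individual.
for value in sorted(counts):
    print(value, "." * counts[value])
```

A real dotplot stacks the dots vertically above a horizontal axis, but the counts it displays are the same.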
In a dotplot we can see the variable values and how many individuals have each value. For example, 2 frozen desserts have 19 grams of sugar and 3 frozen desserts have 21 grams of sugar.
The sugar content for these frozen desserts ranges from 10 to 29 grams.
For this group of desserts, typical sugar content ranges from 14 to 30 grams.
More than half of the frozen desserts have over 20 grams of sugar.
It is unusual for one of these frozen desserts to have more than 25 grams of sugar.
#Type your answer here
Shape, Center and Spread¶
When we describe patterns in data, we use descriptions of shape, center, and spread. We also describe exceptions to the pattern. We call these exceptions outliers. Outliers are notable deviations from the trend.
Shape¶
Common descriptions of shape are:
A right-skewed distribution has a lot of data at lower variable values with smaller amounts of data at higher variable values. Data cluster on the left of the distribution with a tail of data tapering off to the right.
A left-skewed distribution has a lot of data at higher variable values with smaller amounts of data at lower variable values. Data cluster on the right of the distribution with a tail of data tapering off to the left.
A symmetric (or bell-shaped) distribution has a central peak where data is concentrated, with a tail in both directions.
A uniform distribution has the same amount of data for each value. So the distribution looks rectangular.
Center¶
When we describe a distribution of a quantitative variable, it is helpful to identify a typical value. We choose a single value of the variable to represent the entire group. This is one way to think about the center of the distribution.
Spread¶
We also want to describe how much the data varies among individuals in the group. Variability is another word for spread. We describe the spread in two ways:
Find the range of the data, by looking at the smallest value and the largest value.
Find the interval of typical values to represent common variable values for the group.
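As a rough sketch of the first approach, the range can be found with Python's built-in min() and max() functions. The list below is hypothetical and is not the lab's dataset.

```python
# Hypothetical values, all measured in the same unit.
values = [10, 14, 18, 19, 21, 22, 29]

smallest = min(values)
largest = max(values)

# The range can be reported as "smallest to largest" or as their difference.
spread = largest - smallest
print("Smallest:", smallest, "Largest:", largest, "Difference:", spread)
```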
Identify the shape of the distribution of frozen desserts.
Identify the center of the distribution of frozen desserts.
Identify the range of the distribution of frozen desserts.
Identify the interval of typical values of the distribution of frozen desserts.
Identify any outliers in the distribution of frozen desserts.
#Type your answer here
#Type your answer here
Measuring Center¶
One measure of center is the average or mean. This is found by adding all the data values in the distribution and dividing by the total number of values. We usually use the mean as a measure of center when the distribution is symmetrical.
Another measure of center is the median. The median is the middle of the data when all the values are listed in order. The median divides the data into two equal-sized groups. There is as much data below the median as above it. If the dataset has an even number of values, the median is found by adding the 2 middle numbers and dividing by 2. We usually use the median as a measure of center when the distribution is skewed.
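The difference between the two measures can be illustrated with a short sketch. It uses Python's standard statistics module and two hypothetical lists. The second list replaces the largest value with one extreme value.

```python
import statistics

# Hypothetical values; the second list contains one extreme value.
symmetric = [10, 12, 14, 16, 18]
skewed = [10, 12, 14, 16, 98]

mean_symmetric = statistics.mean(symmetric)
median_symmetric = statistics.median(symmetric)
mean_skewed = statistics.mean(skewed)
median_skewed = statistics.median(skewed)

print("Symmetric:", mean_symmetric, median_symmetric)
print("Skewed:", mean_skewed, median_skewed)
```

In this made-up case the mean moves toward the extreme value while the median stays put, which is why the median is usually preferred for skewed data.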
Use the print() function to calculate and output the mean for the Grams of Sugar Column of the Frozen Desserts dataset.
Use the print() function to output the median for the Grams of Sugar Column of the Frozen Desserts dataset.
Use the print() function to output the better measure of center for the Grams of Sugar Column of the Frozen Desserts dataset, mean or median. Be sure to include a sentence supporting your answer.
#Type your answer here
Now let’s assume a “Super Sugar bomb” ice cream is added to our frozen desserts dataset. This ice cream has 100 grams of sugar. Here’s the updated dotplot.
Updated Dotplot
Use the print() function to calculate and output the mean for the Grams of Sugar Column of the Frozen Desserts dataset.
Use the print() function to output the median for the Grams of Sugar Column of the Frozen Desserts dataset.
Use the print() function to output the better measure of center for the Grams of Sugar Column of the Frozen Desserts dataset, mean or median. Be sure to include a sentence supporting your answer.
#Type your answer here
Part 4: Programming Keywords and Practice¶
Here are some keyword references from Lecture or Online resources:
Programming Data Types
Integer
Float
String
Boolean
None
Programming Variables (Data Containers)
(=) Assignment Operator
Programming Comment
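The keywords above can be seen together in one short sketch. The variable names and values are made up for illustration.

```python
# A comment starts with "#" and is ignored when the code runs.
count = 3            # Integer, stored with the = assignment operator
price = 2.5          # Float
flavor = "vanilla"   # String
in_stock = True      # Boolean
topping = None       # None (no value)

# type() reports the data type of the value stored in each variable.
type_names = [type(v).__name__ for v in (count, price, flavor, in_stock, topping)]
print(type_names)
```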
Also, add a programming comment labeling the variables in the source code.
x = 30
y = 10
print(x + y)
print(x - y)
print(x / y)
print(x * y)
print("x + y =", x + y)
#Type your answer here
More Programming and Output Practice¶
num = 10
print(num)
print("num =" + 10)
num = 10
print(num)
print("num =" + 10)
10
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[22], line 3
1 num = 10
2 print(num)
----> 3 print("num =" + 10)
TypeError: can only concatenate str (not "int") to str
Type your answer here
num = 10
print(num)
print("num = ", num)
print("num =" + str(num))
message = "I love to code "
print(message + " everyday ")
#Type your answer here
Type your answer here
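One detail worth noticing when combining text and numbers is spacing. The following sketch, using a made-up variable, compares the two approaches: the + operator joins strings exactly as written, while print() with a comma places a space between the values.

```python
num = 10

# Concatenation joins the pieces exactly as written, so no space is added.
joined = "num =" + str(num)
print(joined)

# With a comma, print() converts each value and separates them with a space.
print("num =", num)
```

The + version needs str() because Python will not join a string and an integer directly.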
More Programming Keywords¶
Here are more keywords that are referenced in the Lecture:
Built-in Functions
int()
float()
str()
sum()
sorted()
help()
The help() built-in function is for interactive use.
Place the name of a built-in function or Python keyword inside the parentheses.
Write and Run the code below:
help(print)
help(sum)
help(int)
help(float)
help(str)
help(print)
help(int)
help(float)
help(str)
help(sum)
help(sorted)
Help on built-in function print in module builtins:
print(...)
print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)
Prints the values to a stream, or to sys.stdout by default.
Optional keyword arguments:
file: a file-like object (stream); defaults to the current sys.stdout.
sep: string inserted between values, default a space.
end: string appended after the last value, default a newline.
flush: whether to forcibly flush the stream.
Help on built-in function sum in module builtins:
sum(iterable, /, start=0)
Return the sum of a 'start' value (default: 0) plus an iterable of numbers
When the iterable is empty, return the start value.
This function is intended specifically for use with numeric values and may
reject non-numeric types.
Help on class int in module builtins:
class int(object)
| int([x]) -> integer
| int(x, base=10) -> integer
|
| Convert a number or string to an integer, or return 0 if no arguments
| are given. If x is a number, return x.__int__(). For floating point
| numbers, this truncates towards zero.
|
| If x is not a number or if base is given, then x must be a string,
| bytes, or bytearray instance representing an integer literal in the
| given base. The literal can be preceded by '+' or '-' and be surrounded
| by whitespace. The base defaults to 10. Valid bases are 0 and 2-36.
| Base 0 means to interpret the base from the string as an integer literal.
| >>> int('0b100', base=0)
| 4
|
| Built-in subclasses:
| bool
|
| Methods defined here:
|
| __abs__(self, /)
| abs(self)
|
| __add__(self, value, /)
| Return self+value.
|
| __and__(self, value, /)
| Return self&value.
|
| __bool__(self, /)
| True if self else False
|
| __ceil__(...)
| Ceiling of an Integral returns itself.
|
| __divmod__(self, value, /)
| Return divmod(self, value).
|
| __eq__(self, value, /)
| Return self==value.
|
| __float__(self, /)
| float(self)
|
| __floor__(...)
| Flooring an Integral returns itself.
|
| __floordiv__(self, value, /)
| Return self//value.
|
| __format__(self, format_spec, /)
| Default object formatter.
|
| __ge__(self, value, /)
| Return self>=value.
|
| __getattribute__(self, name, /)
| Return getattr(self, name).
|
| __getnewargs__(self, /)
|
| __gt__(self, value, /)
| Return self>value.
|
| __hash__(self, /)
| Return hash(self).
|
| __index__(self, /)
| Return self converted to an integer, if self is suitable for use as an index into a list.
|
| __int__(self, /)
| int(self)
|
| __invert__(self, /)
| ~self
|
| __le__(self, value, /)
| Return self<=value.
|
| __lshift__(self, value, /)
| Return self<<value.
|
| __lt__(self, value, /)
| Return self<value.
|
| __mod__(self, value, /)
| Return self%value.
|
| __mul__(self, value, /)
| Return self*value.
|
| __ne__(self, value, /)
| Return self!=value.
|
| __neg__(self, /)
| -self
|
| __or__(self, value, /)
| Return self|value.
|
| __pos__(self, /)
| +self
|
| __pow__(self, value, mod=None, /)
| Return pow(self, value, mod).
|
| __radd__(self, value, /)
| Return value+self.
|
| __rand__(self, value, /)
| Return value&self.
|
| __rdivmod__(self, value, /)
| Return divmod(value, self).
|
| __repr__(self, /)
| Return repr(self).
|
| __rfloordiv__(self, value, /)
| Return value//self.
|
| __rlshift__(self, value, /)
| Return value<<self.
|
| __rmod__(self, value, /)
| Return value%self.
|
| __rmul__(self, value, /)
| Return value*self.
|
| __ror__(self, value, /)
| Return value|self.
|
| __round__(...)
| Rounding an Integral returns itself.
|
| Rounding with an ndigits argument also returns an integer.
|
| __rpow__(self, value, mod=None, /)
| Return pow(value, self, mod).
|
| __rrshift__(self, value, /)
| Return value>>self.
|
| __rshift__(self, value, /)
| Return self>>value.
|
| __rsub__(self, value, /)
| Return value-self.
|
| __rtruediv__(self, value, /)
| Return value/self.
|
| __rxor__(self, value, /)
| Return value^self.
|
| __sizeof__(self, /)
| Returns size in memory, in bytes.
|
| __sub__(self, value, /)
| Return self-value.
|
| __truediv__(self, value, /)
| Return self/value.
|
| __trunc__(...)
| Truncating an Integral returns itself.
|
| __xor__(self, value, /)
| Return self^value.
|
| as_integer_ratio(self, /)
| Return integer ratio.
|
| Return a pair of integers, whose ratio is exactly equal to the original int
| and with a positive denominator.
|
| >>> (10).as_integer_ratio()
| (10, 1)
| >>> (-10).as_integer_ratio()
| (-10, 1)
| >>> (0).as_integer_ratio()
| (0, 1)
|
| bit_count(self, /)
| Number of ones in the binary representation of the absolute value of self.
|
| Also known as the population count.
|
| >>> bin(13)
| '0b1101'
| >>> (13).bit_count()
| 3
|
| bit_length(self, /)
| Number of bits necessary to represent self in binary.
|
| >>> bin(37)
| '0b100101'
| >>> (37).bit_length()
| 6
|
| conjugate(...)
| Returns self, the complex conjugate of any int.
|
| to_bytes(self, /, length, byteorder, *, signed=False)
| Return an array of bytes representing an integer.
|
| length
| Length of bytes object to use. An OverflowError is raised if the
| integer is not representable with the given number of bytes.
| byteorder
| The byte order used to represent the integer. If byteorder is 'big',
| the most significant byte is at the beginning of the byte array. If
| byteorder is 'little', the most significant byte is at the end of the
| byte array. To request the native byte order of the host system, use
| `sys.byteorder' as the byte order value.
| signed
| Determines whether two's complement is used to represent the integer.
| If signed is False and a negative integer is given, an OverflowError
| is raised.
|
| ----------------------------------------------------------------------
| Class methods defined here:
|
| from_bytes(bytes, byteorder, *, signed=False) from builtins.type
| Return the integer represented by the given array of bytes.
|
| bytes
| Holds the array of bytes to convert. The argument must either
| support the buffer protocol or be an iterable object producing bytes.
| Bytes and bytearray are examples of built-in objects that support the
| buffer protocol.
| byteorder
| The byte order used to represent the integer. If byteorder is 'big',
| the most significant byte is at the beginning of the byte array. If
| byteorder is 'little', the most significant byte is at the end of the
| byte array. To request the native byte order of the host system, use
| `sys.byteorder' as the byte order value.
| signed
| Indicates whether two's complement is used to represent the integer.
|
| ----------------------------------------------------------------------
| Static methods defined here:
|
| __new__(*args, **kwargs) from builtins.type
| Create and return a new object. See help(type) for accurate signature.
|
| ----------------------------------------------------------------------
| Data descriptors defined here:
|
| denominator
| the denominator of a rational number in lowest terms
|
| imag
| the imaginary part of a complex number
|
| numerator
| the numerator of a rational number in lowest terms
|
| real
| the real part of a complex number
help(sum)
help(sorted)
#Type your answer here
Type your answer here
The values produced by built-in functions can be assigned to variables.
x = int(4.0)
print(x)
x = float(4)
print(x)
x = str(4)
print(x)
#Type your answer here
Type your answer here
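The conversion functions behave a little differently from one another. Here is a minimal sketch with made-up values. It relies on the behavior described in the help(int) text: converting a float to an integer truncates toward zero.

```python
# int() drops the fractional part of a float (truncates toward zero).
whole = int(4.7)
negative = int(-4.7)

# float() and int() can also read numbers written as text.
from_text = float("3.5")
count = int("12")

# str() turns a number into text, so "+" joins instead of adding.
joined = str(4) + str(2)

print(whole, negative, from_text, count, joined)
```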
Additional Programming Keywords¶
Here are additional keywords that are referenced in the Lecture:
Objects
Data Structures
Python List
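A Python list is one example of a data structure. The sketch below, with hypothetical values and variable names, shows how a list is created and how its entries are reached.

```python
# A list stores several values in order inside square brackets.
scores = [25, 34, 100]   # hypothetical values

first = scores[0]        # positions (indexes) start at 0
last = scores[-1]        # a negative index counts from the end
size = len(scores)       # number of entries in the list

print(first, last, size)
```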
Also, add a programming comment labeling the variables in the source code.
sample = [25, 34, 100, 900, 200, 50]
print(sample)
#Type your answer here
Sum a list¶
A list of numbers can be added together using the built-in sum() function.
Example:
x = sum([2,4,6,8])
print(x)
would output 20.
sample = [25, 34, 100, 900, 200, 50]
print(sample)
#Type your answer here
Mean Average Review¶
A mean average can be calculated in many ways using Python. Update the code below to use sum() on the list from the previous problem when calculating the mean.
Example:
x = sum([2, 4, 6, 8])
mean_avg = x / 4
print(mean_avg)
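One possible variation on this example, offered as a sketch rather than the required approach, lets Python count the entries with the built-in len() function instead of typing the number 4 by hand. The name values is made up for this illustration.

```python
values = [2, 4, 6, 8]

# len() counts the entries, so the divisor stays correct if the list changes.
mean_avg = sum(values) / len(values)
print(mean_avg)
```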
sample = [25, 34, 100, 900, 200, 50]
print(sample)
#Type your answer here
sample_2 = [-166.0, 1000.67, 5000.23, 98753.5, -1150.98, 230.2, 1.5]
print(sample_2)
#type your answer here
Median Average Calculation Section¶
A median is the middle value in an ordered or sorted list.
Use the sorted() built-in function to sort a list.
Example:
x = sorted([100, 2, 6, 10, 4])
print(x)
The output will be the same values sorted in ascending numerical order.
A Data Set can have:
Odd Number of entries: Median is the middle data entry
Even Number of entries: Median is the mean of the two middle data entries
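The two cases above can be combined into one small function. This is only a sketch: the function name and the example lists are hypothetical, and it uses features (def, if, and the // integer-division operator) that may not have been covered in lecture yet.

```python
def median(values):
    """Return the median of a list of numbers."""
    ordered = sorted(values)      # put the values in numerical order
    n = len(ordered)
    middle = n // 2               # index of the middle (or upper-middle) entry
    if n % 2 == 1:
        # Odd number of entries: the median is the middle entry.
        return ordered[middle]
    # Even number of entries: the median is the mean of the two middle entries.
    return (ordered[middle - 1] + ordered[middle]) / 2

odd_result = median([100, 2, 6, 10, 4])   # sorted: 2, 4, 6, 10, 100
even_result = median([100, 2, 6, 10])     # sorted: 2, 6, 10, 100
print(odd_result, even_result)
```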
sample = [25, 34, 100, 900, 200, 50]
print(sample)
Now we will practice finding the median for a list with an odd number of entries.
sample = [25, 34, 1000, 200, 50]
#Type your answer here
sample_2 = [-166.0, 1000.67, 5000.23, 98753.5, -1150.98, 230.2, 1.5]
print(sample_2)
#Type your answer here