Welcome to DS10!
The course is officially listed as CS10 and STAT10, but we call it DS10 or Data Science 10. It counts toward the Harvard College Quantitative Reasoning with Data requirement.
What is this class about?
This course is about combining data, statistical methods, and computation to gain insights and make useful inferences and predictions. This course will take a holistic approach to helping you understand the key elements of data science, from data collection and exploratory data analysis to modeling, evaluation, communication of results, and telling a data story. You will be discussing case studies, developing Python skills for data science, and working on team projects in design sprints to provide you with hands-on experience with the data science process. Throughout the course we will emphasize critical analytical thinking skills and data acumen.
By the end of the course, you will be able to use data and reproducible data science methods to answer questions and guide decision-making. If you want to understand how modern data science works, this is the course for you!
Learning Outcomes
After completing this course, you will be able to:
- Articulate core objectives and quantitative goals to be addressed using data
- Appreciate the complexity of the data science process
- Understand and apply simple but powerful data science tools
- Use statistics and visualization to effectively explore data
- Identify and appreciate the strengths and limitations of data science methods
- Communicate the outcome of data analysis with effective data stories
- Apply basic programming skills for data science in Python
- Judge and incorporate ethical considerations in the data science process
- Work efficiently in teams on data science case studies and a final project
Limited Enrollment
Because this is the first year we are teaching DS10, and to provide the best possible learning experience, the course is limited to 100 students. If you are interested in joining this course, please fill out this survey by Wednesday, January 29, 11:59 pm EST. We will notify you on Friday, January 31. Because data science is transdisciplinary, we will use this survey to assemble a diverse class.
Prerequisites
This course has NO prerequisites. However, please be aware that learning data science and a new programming language like Python is a time-consuming process!
Textbooks
We will be using one required textbook for this class to cover data science methodologies and an optional reference book for data science programming in Python.
The Art of Statistics: How to Learn from Data, by David Spiegelhalter (Required). This book gives a good overview of many of the issues that come up in data science. David Spiegelhalter draws on real-world examples to introduce concepts from statistics and data science. We will use this book for (mandatory) weekly readings.
Python Data Science Handbook: Essential Tools for Working with Data, by Jake VanderPlas (Optional). We are providing Python notebooks that cover all the programming material you need for this course. However, if you are interested in delving deeper into Python as a data science programming language, we recommend this book.
You can purchase both books from the Coop bookstore or from any online retailer.
Course Components
DS10 has four main threads:
- Discussions of real-life case studies that show the data science process at work.
- Learning fundamental methods and tools of data science.
- Weekly labs to teach you Python programming skills for data science.
- Practicing the data science process by conducting projects from beginning to end.
Lectures
The class meets twice a week for lectures and joint class activities. Attending lectures is a crucial component of learning the material presented in this course. Please arrive on time, as we will start promptly with a short graded quiz about the required readings. At the end of each lecture, we will ask you to fill out and submit a one-minute reflection to collect feedback. Submitting the one-minute reflection counts toward your participation grade.
Labs
You are required to attend weekly programming labs during regular class meeting times on Fridays. Labs are interactive tutorials using notebooks that introduce you to programming for data science in Python. Multiple TFs will be on hand to help you complete the Python notebooks. You will work in pairs and hand in one lab solution per pair. Completed lab notebooks must be submitted, and we will release lab solutions after the due date.
Readings & Quizzes
Each lecture includes required readings to ensure that you are prepared for the activities in class. You are expected to complete the assigned readings before class. Short online quizzes given at the beginning of each class will test your knowledge of the readings. Reading quizzes are part of your final grade.
Homework
Weekly homework assignments will give you an opportunity to improve your data science and programming skills. See the homework as an opportunity to learn, not to “earn points”. The homework will be graded holistically to reflect this objective. Each student must hand in their own homework solution and adhere to our collaboration policy (see below).
Design Sprint
A big part of the lectures will be a design sprint. You will work in groups of 2-3 students to ask questions, explore data, and communicate your findings for a given topic and pre-defined dataset over the course of several weeks. The design sprint is good practice for the final project, for which you will use the same process. Your team will submit graded design sprint milestones as part of your homework.
Final Project
A major part of the course is a final data science project. You will work in small groups of 2-3 students to complete a data science project that answers questions you have about a topic of your own choosing. You will follow the same design sprint process that we introduced in class. The final result will be a Python notebook and a data story (article) with a 2-minute video that communicate your project findings to a general audience. A small number of projects will win a Best Project prize.
Peer Assessment
In a large class, assessing each student's participation in team projects is challenging. We will use peer evaluation so that you can assess the members of your project team as well as yourself. Peer assessment is an element of the final project (and thus part of your project grade).
Course Policies
Grading
This course can be taken for a letter grade only; there is no pass/fail option. The course grade comprises:
- Participation (20%)
- Labs (10%)
- Homework Assignments (20%)
- Design Sprint (15%)
- Group Project (35%) (including peer assessment)
Participation includes reading quizzes, lecture attendance, participating in in-class activities, and submitting one-minute reflections.
Labs include lab attendance, actively working in pairs, and submitting lab notebooks.
We will drop your lowest reading quiz score and your lowest homework grade.
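To make the grading arithmetic concrete, here is a minimal sketch of how the weights above combine, assuming each component has already been reduced to a single score on a 0-100 scale. The function names `mean_dropping_lowest` and `course_grade` and the sample scores are hypothetical and for illustration only. The syllabus does not say how the quiz average is combined with attendance and reflections inside the participation component, nor what letter-grade cutoffs apply, so this sketch models neither.

```python
# Hypothetical illustration of the DS10 grade weights; not the official grading script.
# Assumes every component is a single score on a 0-100 scale.

# Component weights from the syllabus (they sum to 100%).
WEIGHTS = {
    "participation": 0.20,
    "labs": 0.10,
    "homework": 0.20,
    "design_sprint": 0.15,
    "group_project": 0.35,
}


def mean_dropping_lowest(scores: list[float]) -> float:
    """Average a list of scores after dropping exactly one lowest score."""
    if len(scores) < 2:
        raise ValueError("need at least two scores to drop the lowest")
    kept = sorted(scores)[1:]  # discard the single lowest score
    return sum(kept) / len(kept)


def course_grade(components: dict[str, float]) -> float:
    """Weighted sum of the component scores (0-100 scale)."""
    missing = WEIGHTS.keys() - components.keys()
    if missing:
        raise KeyError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


if __name__ == "__main__":
    # Made-up homework scores: the 60 is dropped, so the average is 90.
    homework = mean_dropping_lowest([80, 90, 100, 60])
    grade = course_grade({
        "participation": 95,
        "labs": 100,
        "homework": homework,
        "design_sprint": 85,
        "group_project": 90,
    })
    print(f"homework average: {homework:.2f}, course grade: {grade:.2f}")
```

With these made-up numbers the weighted sum works out to 91.25. Note how the 35% group project weight dominates: a 10-point change there moves the course grade by 3.5 points, versus 1 point for labs.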
The design sprint and the group project receive team grades. In general, we do not anticipate that the grades for each group member will be different. However, we reserve the right to assign different grades to each group member based on peer assessments (see above).
Regrading Policy
Any concerns about grading errors must be raised in writing. Requests for regrading must be submitted within 2 days of receiving your graded assignment. Use a dedicated private channel on Piazza and explain in detail why you are requesting a regrade. We will reply within a week. Use the same thread for all communication about this request. Please note that a regrade may also lower your score.
Due Dates & Late Policy
No submissions (labs, homework, quizzes, project milestones, etc.) will be accepted for credit after the deadline. Homework assignments will be posted on Fridays and will be due the following Friday. Lab notebooks are due on Mondays. All submissions are due at 11:59 pm EST on the due date.
If you have a verifiable medical condition that interferes with your coursework please let us know as soon as possible. You will need to provide a written note from a medical professional confirming your inability to participate in course work.
Collaboration Policy
We expect you to adhere to the Harvard Honor Code at all times. Failure to adhere to the honor code and our policies may result in serious penalties, up to and including automatic failure in the course and referral to the Ad Board.
You may discuss your homework and labs with other students in the class, but you are expected to be intellectually honest and give credit where credit is due. In particular:
- You have to write your homework solutions entirely on your own.
- You cannot share written materials or code for homework with anyone else.
- You may not submit the same or similar work to this course that you have submitted or will submit to another course.
- You may not provide or make available solutions to individuals who take or may take this course in the future.
- You may use third-party libraries and example code so long as you give proper attribution. Do not remove any original copyright notices and headers.
Devices in Class
We will use smartphones and laptops throughout the course to facilitate in-class activities and project work. However, research and student feedback clearly show that using devices for activities unrelated to class harms not only your own learning but also that of other students. Therefore, we only allow device usage during activities that require devices. At all other times, you should not be using your device.
Accessibility
Any student receiving accommodations through the Accessible Education Office should email or present their AEO letter to one of the instructors as soon as possible. Failure to do so may prevent us from making appropriate arrangements.
Course Resources
Online Materials
All class handouts, slides, homework, labs, and required readings will be posted on Canvas.
Discussion Forum
We use Piazza as our discussion forum and for all announcements, so it is important that you sign up as soon as possible. Piazza should always be your first resource for seeking answers to your questions. You can also post privately so that only the staff sees your message.
Office Hours
The TFs will provide office hours for questions that you might have throughout the semester. As office hours are usually heavily attended, please consult Piazza as a first option to get help.
Credits
This course has been developed by the instructors together with Salma Abu Ayash, Salma Abdel Magid, Allen Downey, Robert Haussman, and Aditya Ranganathan. Hanspeter Pfister has received generous support to develop this course through a 2019 SEAS Learning Incubator fellowship. All lab notebooks were developed by Prof. Allen Downey, who spent fall 2019 as a visitor at Harvard and heavily influenced the design of the course for the better. The course design greatly benefited from the students who took the 2019 Harvard STAT250 course. We have drawn on materials and examples found online and tried our best to give credit by citing the original sources. Please contact us if you find materials where credit is missing or that you would rather have removed.