Showing posts with label Big Idea - Data. Show all posts

Wednesday, May 15, 2013

Data Visualization Videos

Funny how sometimes you come across several unrelated sources and find a theme. Today it is data visualizations.





Monday, April 15, 2013

Data Unit: Day 8

(This is a day by day of the Data: Lets do the Numbers Unit in CS Principles. The overview is here.)

Done!

Cartoon from the National Archive 1919  - Redrawing Europe's Map


Today the students finished the data portfolios.

I had been thinking this had to be a paper - it is funny in teaching, we tend to fall back to that, worksheets and papers.

When I came in today and reread the requirements I found:

A collaboratively developed artifact that communicates a detailed description of your  group’s investigation, the questions, and your collective findings.  You may use any form  of digital artifact (e.g., a report, video, presentation, visualization, or combinations of  these) that allows you to best communicate your investigation and findings.  You and  your partner will each submit the same artifact.  
So much better. Some groups did Prezis, some did Power Points. Some went ahead and wrote a paper.

They finished these with enough time to write their individual reflections.

An individually written document that addresses the investigation. Each group member  must write her/his own individual document. In writing the individual document you  must adhere to the Task description above and the Requirements description below in  supplying details of your investigation.  

So with the proper setup we finished it in three class days.

Friday, April 12, 2013

Data Unit: Day Seven

(This is a day by day of the Data: Lets do the Numbers Unit in CS Principles. The overview is here.)

So far the Data Portfolio has gone really well. Students are excited about their data sets and questions. Today we moved on to using these data sets to explore the questions they posed last class.


Number Crunching on the Eniac



Today I also realized I have been worrying about this too much. I was coming at this as a math major that hated statistics and absorbed as little of it as possible. I was worried about all the high end statistical analysis they need to do. Really I was making it too complicated. 

By giving them some positive experiences with data early in the unit, and steering the process of asking questions and selecting a data set, I put them in a position to handle the computation part of it with minimal trouble.

I have also realized there are some institutional barriers to getting this portfolio done. Some things teachers really need to think about ahead of time:


  • Computer Equipment and software. Due to budget cuts we still use Office 2003. Students have had difficulty with some of the larger data sets and Excel.
  • Internet filtering: Many data sets were blocked by our school filters. Sites like Google Docs are also blocked, which is making collaboration between students difficult.
  • Limited storage space on the school network makes it hard to store the datasets. 
  • Many of my students do not have Internet access at home. In addition they live an hour or more away from school and staying after is not possible for many families. This is limiting their ability to collaborate outside of school. I am having to provide a lot of class time for them to write.


Bumpass Virginia - a real place

(You can see all the CS Principles documents here)

Here is the listing of what we did:

  • Work with Partner: today they started looking at their data and answering their questions. Today was a half day due to Kindergarten registration so we only had 45 minutes. Several groups are having to come back to the lab to finish up.
  • The prompt for the Data Portfolio rubric is:
apply computational tools and techniques to answer your questions, e.g., by finding patterns in the data, by transforming or translating the data, or by finding connections between the data and other sources of knowledge

Thursday, April 11, 2013

Data Unit: Day Six

(This is a day by day of the Data: Lets do the Numbers Unit in CS Principles. The overview is here.)

Today we finally started working on the Data Portfolio. Today the focus is selecting a data set and creating the questions. My plan is next class they start really working with the data.


From the National Archives - I was looking for a photo of a mine, but found this instead: "Toilet at head of stairs leading from basement in vacant house of company housing project. Industrial Collieries Corporation, Barrackville #41 Mine, Barrackville, Marion County, West Virginia., 06/13/1946" - hilarious!


They were really engaged in this today. I let them choose three people they would like to work with, and three they would not. I then assigned partners based on these requests. It was pretty funny: every student listed themselves either as someone they wanted to work with or as someone they did not want to work with. We all had a good chuckle at this.

The list of what we did is below. They really chose some interesting data sets. Many chose to start with Data.gov and search from there. Some of the topics they have chosen:
  • Israel Palestine
  • Traffic Fatalities as they relate to Speeding
  • Crime Statistics
  • Government Spending in the US
  • Average Income per State
  • Violent Crime Rates
  • Mining accidents

The mining accidents was a bit of a head-scratcher, but it has turned out to be pretty interesting.



Here is the listing of what we did:

  • Fast Start:
    1. Do the Partner Selection survey 
    2. From the work we did before break, describe the kinds of questions we can answer with data. What kinds of questions cannot be answered?
  • Journal Response: Building your Questions - Today we are building your questions about data. With your partner brainstorm 3 areas you would like to investigate with data. The link we used last class to some data sets is below.
  • Meet with teacher to discuss data set selection - we discussed:
    • What is the source?
    • Is it at least 5000 pieces of data?
    • Do you think this data set can answer your questions? How?
  • Journal Response: Develop a set of 3 to 5 questions that will be the focus of the investigation and submit them here.
  • Online: Submit your dataset link
  • Start writing: They started on the collaborative part of the paper.
  • The first part of your paper will include: (this is from the data portfolio rubric)
    • Overview of your investigation: a description of the intent of the investigation and how it will be used to gain insight and knowledge;
    • The set of 3 to 5 questions that you will answer.
    • Explanation and justification of how the data and other sources used in your investigation (if any) are appropriate for exploring and answering the questions.
    • Information about the data set(s): a description of each data set; the URL of the data set; the date on which you accessed the data; and where possible a reference to the data set from  a written work (e.g., an article, book, or blog post).
    • Type these out in Word and submit the file here. ALL partners must have a copy of this file and submit it before leaving today.

Saturday, April 6, 2013

Crunching Public Data - new Online Course

Code Lesson: Crunching Public Data

This course is available for pre-enrollment at Code Lesson.

The course Description:

Go beyond the spreadsheet! Crunching Public Data is an introductory programming course intended specifically for public- and private-sector knowledge workers who want to make sense of data stored in tabular format, then analyze and visualize it in meaningful ways. In the course we make use of actual data from the US government site data.gov to give students real-world experience with data analysis, particularly with data sets in the megabyte to gigabyte range that are larger than spreadsheets can typically handle.

This has a lot of promise for the CS Principles Data Portfolio - can't wait to see what they cover. The plan is to use Python to do the processing.

Friday, March 29, 2013

Data Unit: Day Five

(This is a day by day of the Data: Lets do the Numbers Unit in CS Principles. The overview is here.)


Today is the end of the marking period, so we only had class for 45 minutes.

We started by posting some of the questions they developed last class about the crime data set we have been looking at. I use the LinoIt website a lot for these kinds of activities. It lets them post sticky notes on a website. Some of these questions are shown above. The advantage is you can sort the sticky notes into piles.

Next I had them read through the questions and see if they could come up with two broad categories for the questions. We had to talk it through a bit, but they decided some of the questions could be answered with the data they were given, and some needed more information.

In general this is the difference between being the original researcher and being the end user of data. For the portfolio we are the end user.

For that last part of class we watched Rick Smolan's talk at the Ted Youth conference earlier this year. Skip ahead to minute 11.

When we come back from break they will be starting with their partner on the data portfolio.


Thursday, March 28, 2013

Data Unit: Day Four

(This is a day by day of the Data: Lets do the Numbers Unit in CS Principles. The overview is here.)

We only had half a class today. As a part of the CS Principles Pilot we have to do pre and post surveys about the course and that ate up a lot of teaching time.




Their activity for the rest of class was to start forming questions around data.

Data is Big Idea III: Data and information facilitate the creation of knowledge. 


A. People use computer programs to process information to gain insight and knowledge.
  1. Computers can be used to find patterns in, and test hypotheses about, digitally represented information.
  2. Insight and knowledge can result from translating and transforming digitally represented information.
B. Computing facilitates exploration and the discovery of connections in information.
  1. Big Data (use of large datasets) provides new opportunities and new challenges for extracting information and knowledge.
  2. Scalability, of systems and analytical approaches, is an important consideration when datasets are large.
  3. Metadata can increase the effective use of data or a dataset by providing additional information about various aspects of that data.
C. Computational manipulation of information requires consideration of representation, storage, security, and transmission.
  1. There are trade-offs involved in the many possible ways to represent digital and non-digital information as digital data.
  2. Data is stored in many formats depending on its characteristics—such as size and intended use—so that it can be manipulated computationally.

Again, be careful. Only one of the Key Concepts mentions Big Data. There is a lot of talk about the Big Data aspects of CS Principles. Yes, the students need to work with big data as a part of the portfolio, but not everything in data needs to be BIG.


We are on Spring Break next week, so I am trying to get them ready so when we get back they can finish the Data Portfolio item.

Today I had them work with a partner with the crime data set from last class and come up with five questions they might answer with the data. 

Talk about unexpected results. I had assumed it was obvious what kinds of questions we could answer with a data set. How many times have you done this as a teacher? Made an assumption about what they knew and then were totally surprised by the results.

This is one of the major benefits of regular journaling as a part of the course. It helps you spot these underlying assumptions early and correct them.

Having them write out questions very quickly showed who was on the right track and who needed some more direction. In general I got two types of questions.

First type of question: 
  • Is the funding for the local police departments increasing or decreasing within the small cities?
  • What types of violent crimes were committed?

Second type of question: 
  • Does higher amounts of college education result in lower crime rate? 
  • Does an increase in annual police funding relate to educational level?


See the difference between the two?

Many of the students were asking questions for more data, or information not given in the spreadsheet.

This shows that we need to talk a little bit before we start this part of the unit about what kinds of questions can and cannot be answered with data sets. The first set of questions are really more for reporters, not statisticians. Our focus here is really looking for connections and correlations between the data already collected.
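
To make "looking for connections and correlations" a bit more concrete, here is a minimal Python sketch of one way students could check whether two columns of a spreadsheet move together. It computes a Pearson correlation coefficient by hand. The function name `pearson`, the column names, and the numbers are all made up for illustration; they are not from the class crime data set.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # How much the two columns vary together
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each column varies on its own
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a column never changes")
    return cov / sqrt(var_x * var_y)

# Made-up values for five imaginary cities (NOT real data)
percent_college = [10, 15, 20, 25, 30]
crime_rate = [50, 45, 42, 35, 30]

r = pearson(percent_college, crime_rate)
print(round(r, 2))  # a value near -1 means one column falls as the other rises
```

A value near 1 or -1 only says the two columns are connected in this data. It does not say one causes the other, which is a good discussion point in its own right.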

And think about what would have happened had we started right into the data portfolio. That process asks them to develop questions then find an appropriate data set to answer those questions. Clearly I need to include some instruction on what types of questions are appropriate.

My original plan was to then experiment with the same data set in a database rather than a spreadsheet and compare results. I am abandoning that idea in the interest of time. At this point they need to start working through their portfolios. Now that I know we need to go back and talk about what kinds of questions they can ask, that needs to be our focus.

Next class is a half day so I will only see them for 45 minutes. We will spend the time really looking at the questions they will develop as they start on their portfolios after break.



Tuesday, March 26, 2013

Data Unit: Day Three

(This is a day by day of the Data: Lets do the Numbers Unit in CS Principles. The overview is here.)



Today we started working with some real data sets. I want them to have some experience with a smaller data set before we start the portfolio assessment, which has them work with a data set of at least 5000 pieces of data. We're using one available from Houghton Mifflin about Crime stats. The site has several good smaller data sets that are good for practice.
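
For checking a data set against the "at least 5000 pieces of data" requirement, a short Python sketch like the one below could do the counting. It assumes the data set is a CSV file with a header row, and it assumes a "piece of data" means one non-empty cell; the rubric does not define this, so treat that as my interpretation. The function name `count_data_points` and the sample values are hypothetical.

```python
import csv
import io

def count_data_points(csv_file):
    """Count data rows and non-empty cells below the header row of a CSV file object."""
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the row of column names
    rows = 0
    cells = 0
    for row in reader:
        rows += 1
        cells += sum(1 for value in row if value.strip())
    return rows, cells

# A tiny made-up stand-in data set. A real file would be opened with
# open("some_file.csv", newline="") instead of io.StringIO.
sample = io.StringIO(
    "city,population,violent_crimes\n"
    "A,1000,12\n"
    "B,2500,\n"
    "C,400,3\n"
)

rows, cells = count_data_points(sample)
print(rows, "rows,", cells, "pieces of data")
print("Big enough for the portfolio?", cells >= 5000)
```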

Today I let them download it as a spreadsheet. I am having them start with Excel, then we'll try some queries on the same data using Access next class.

At this point I am not imposing questions for them to answer, I want them to try to form the questions and explore on their own. This is a bit experimental, and we may need to loop back and have them do specific queries. It will depend on how far they are able to get on their own. I think this aspect will vary year to year depending on the strengths and motivational level of your students.


Here is the listing of what we did:


Thursday, March 21, 2013

Data Unit: Day Two

(This is a day by day of the Data: Lets do the Numbers Unit in CS Principles. The overview is here.)

Today we are diving into data.

Coloring

The idea is to get a bit more specific and build some vocabulary around data. We also started looking at our first data sets and talking about what they could be used for.

I started by asking them what they thought the difference is between data and information. They had some great responses, including:

Data is the hard code in a computer, the numbers, the statistics.
Information is the actual wording of what we as people know. 
Data is something that has no format. Information has been converted into audio, text, or video. Data is more scientific. 
Data is raw, unorganized facts that need to be processed. Data can be something simple and seemingly random and useless until it is organized. When data is processed, organized, structured or presented in a given context so as to make it useful, it is called information. 
After they copied the notes (topics listed below), they did a coloring activity to show the difference between analog and digitized data. They blended two colors using colored pencils, then took a digital picture of the color they created. They used the Paint program to zoom in and look at the individual pixels of color the camera captured.
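
The same idea as the coloring activity can be sketched in a few lines of Python. This is only an illustration under some assumptions: colors are (red, green, blue) triples, a finely stepped blend stands in for the smooth pencil blend, and each pixel channel is stored as a whole number from 0 to 255, as in common 8-bit image formats. The function names `blend` and `digitize` are made up for this sketch.

```python
def blend(color_a, color_b, amount):
    """Mix two colors; amount=0.0 is all color_a, amount=1.0 is all color_b."""
    return tuple(a + (b - a) * amount for a, b in zip(color_a, color_b))

def digitize(color):
    """Round each channel to a whole number from 0 to 255, the way an 8-bit pixel stores it."""
    return tuple(min(255, max(0, round(channel))) for channel in color)

red = (255.0, 0.0, 0.0)
yellow = (255.0, 255.0, 0.0)

# 1001 finely spaced shades stand in for the smooth pencil blend
smooth = [blend(red, yellow, step / 1000) for step in range(1001)]

# Digitizing snaps every shade to the nearest whole-number pixel value
pixels = {digitize(color) for color in smooth}

print(len(smooth), "blended shades before digitizing")
print(len(pixels), "distinct pixel colors after digitizing")
```

Many slightly different shades collapse into the same pixel value, which is what the students see when they zoom in with Paint.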

Next we looked at some small data sets and I had them pose questions about the data. After that they completed the Exit Ticket.



Here is the listing of what we did:

  • Fast Start:
    • Discussion Board Entry: What do you think is the difference between data and information?
  • Notes: Data 
    • These covered:
      • data  vs information
      • connected back to the binary we did in unit 1 and that data is stored in binary
      • Data compression
      • Analog vs Digital
    • NPR article on MP3 Compression
  • Activity: color! 
  • Data Sets - with a partner 
    • You will hand in one sheet of paper. Grab a tablet for your group.
    • Navigate to: Data Sets
    • With a partner pick one of the data sets from this website...brainstorm at least five questions you could answer using this data set. 
    • When done - look at the website here. Which of these are data sets? How could you gather information from these sets of data? What problems might you run into?
  • Exit Ticket Day 2 
    • Do you use data in your life (outside of school)? If so, how?
    • How about information?



Wednesday, March 20, 2013

Data Unit: Day One

(This is a day by day of the Data: Lets do the Numbers Unit in CS Principles. The overview is here.)

The idea is to grab their attention and get them really thinking about what data means in their lives.


I started with an activity from an Equity Workshop the CSTA held during this year's SIGCSE conference.

OK - real quick. Are you a member of the CSTA? If you are reading this you should be. Go ahead - it's free. I'll wait.

So the activity goes like this - turn off the monitors and grab paper and a pencil. I do this as our Fast Start, so the instructions are on the board when they walk in.

Tell them you are about to show them some data and they will respond to the following:

  • What is your immediate Reaction?
  • What questions do you have?
I gave them five minutes to respond on paper. Then they have five minutes to discuss with their elbow partner.

Here is the data:
Blood drive at the local high school reveals 20% of the students were HIV positive. 
I picked this because it is something every high schooler will have an opinion about. We had a blood drive this week, so they had lots of questions.

Their questions were fascinating...they wanted to know where the school was. One student hypothesized that it might be in Africa where AIDS is more common.

They wanted to know who collected the data. They pointed out only kids 16 and older can give blood so that might throw the results.

Several groups also asked "is it true?" while working with their partner, but none of them were willing to ask that out loud with the whole group.

The whole point is to get them to really react to data. This is in fact not a true story - I found it on snopes.com (Snopes Article).

Then we talked about WHY they assumed it was true, or were uncomfortable questioning the truth of the data. The two reasons they gave were that their teacher told them and that it had a % sign in it. We all do that sometimes.

We then watched the Ted Talk:  Why Google won't Protect you from Big Brother

To pull everything together they did a journal entry about what we covered. I asked them to:

  • Reflect on what we have talked about today
  • Think about:
    • Why is Facebook free?
    • If data can be false (like our activity) what does this mean for companies collecting data about you?
    • Do you own your data?




Here is the listing of what we did:

    Starting on Data
    Today we are setting the stage for the Data project
  • Fast Start: This activity is from the CSTA equity workshop I attended at SIGCSE last weekend
  • Gallery Walk - Show off your best Code in the Browser piece (from work they did while I was @ SIGCSE)
  • Ted talk - Why Google won't Protect you from Big Brother
  • Look at Google transparency report
  • Journal response
    • Reflect on what we have talked about today
      Think about:
      Why is Facebook free?
      If data can be false (like our activity) what does this mean for companies collecting data about you?
      Do you own your data?
  • Homework - Data Log
    • Between now and next class keep track of times data is being collected about you and your family
    • As a group brainstorm times where data might be collected - what to look for

Monday, March 18, 2013

Big Data - Research

When I start off with a new topic in class I like to gather as much information as possible. Most of it will never make it to the notes or activities the kids see, but I find the more stuff floating around the ol' noggin the better I do with teaching the topic.

Data in particular is difficult. I majored in math - which was great for the theoretical underpinnings of mathematics (totally practical on a daily basis). The field of Operational Research was brand new when I was going through college, so again, not a huge foundation for this whole data thing.

Some of the resources I have been using, in no particular order:

Friday, March 15, 2013

Data Unit: Lets do the Numbers

Today we started the unit on data in CS Principles.

Sketch for Day 1 - (I am a visual planner)

This is probably the trickiest of all of the CS Principles Big Ideas. Data is just a big topic - a lot of ideas under one umbrella. For most Computer Science teachers it is not a topic that was a part of our training and it is not something we are used to teaching. It is a fairly new topic to computer science in general and I do think we are not yet sure how to approach the topic so that it is understandable and engaging for high school kids and also reaches the rigor of a college level course.

My approach for this Unit ("Lets do the Numbers") is to cover a few topics. We'll do the Data portfolio this unit. We need to look at number calculations in coding. This includes some ideas like roundoff error and will also tie back to the topics we did at the beginning of the year around data representation. If we have the time we'll also do some things around modeling and simulation.
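
As a quick illustration of roundoff error and how it ties back to data representation, here is a small Python sketch. It is just one possible classroom example, not part of the official CS Principles materials. It uses only the standard library.

```python
from decimal import Decimal
import math

# Add one tenth ten times. On paper this is exactly 1.
total = 0.0
for _ in range(10):
    total += 0.1  # 0.1 cannot be stored exactly in binary

print(total == 1.0)              # False: tiny errors have built up
print(math.isclose(total, 1.0))  # True: compare with a tolerance instead

# Show the value the computer actually stores for 0.1
print(Decimal(0.1))
```

The stored value of 0.1 is very slightly off because one tenth has no finite expansion in base 2, much like one third has no finite expansion in base 10. That connects this topic straight back to binary.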

This is a change for this year. Last year we got bogged down with coding and lost sight of the data aspects. Really they don't need to master all the minutiae of numeric processing, but you do want them to connect that processing to some pretty big changes happening in computer science right now.

A note of caution on Data - not all Data is BIG. There is a lot of buzz around the idea of Big Data and it is important to implement this with fidelity to the actual CS Principles standards. Big Idea III is Data, and it states:

Data: Data and information facilitate the creation of knowledge.

There are seven "Supporting Concepts" for this. Only one of them mentions Big Data.

(As an aside "supporting concepts??"- I know - sometimes I also wish we just got a topic list like the regular APCS and be done with it. Trust me, you'll get used to it).

So What? The point is not everything your kids will be doing for this topic involves big data sets. A lot of it really is just engaging with the topic and making connections outside the computer science lab.

Here are the thinking points I want my kids walking away with for the data topics:

  • Is it true?
  • What data is being collected about me?
  • How has data changed scientific research?
  • How is data stored?
  • Is data always good?

And really, aren't these basic civics topics for anyone living on the Internet?



The Unit --> Here's a list of what we are doing for the unit


Wednesday, July 25, 2012

Google Science Fair Winners

Google just awarded their Science Fair 2012 awards. The grand prize winner was Brittany Wenger from Florida. She won a $50,000 scholarship, an internship and a trip to the Galapagos Islands. Not too shabby.



Her project focused on breast cancer detection. She wrote a program in Java that analyzed large data sets from the cloud looking for patterns.

"For Wenger, one of the highlights of the experience was meeting famed computer scientist Vint Cerf, who talked with her at length about computer science and neural networks."


Google's site also has some materials for teachers interested in the competition.

Tuesday, July 24, 2012

I Know What You Did on the Web

This graphic from Baynote shows a summary of what common online companies like Facebook and Google are doing with your data. This would make a great discussion board topic relating to Big Data for CS Principles.

Saturday, July 21, 2012

Mapping the Internet

Whenever we start a new topic I love having an activity to set the tone. I came across the Mapping the Internet project a few weeks back.

The project asks people to "Please draw a map of the Internet, as you see it. Indicate your 'home'."

Yesterday I wrote about HTML in my Internet Unplugged unit. I think this would be a fun way to start off that unit.

My Daughter's Maps
They have a taxonomy of these maps. Which of course is data - another big idea from CS Principles. From here it would be interesting to collect the students' drawings and do some discussion board topics about what they drew. Were there common themes? How did your map change after learning about the physical parts of the Internet? Did it change after learning HTML and CSS?


Another thing I try to do is set a presence for computer science in my school. Wherever we can I do art projects and hang them in the halls around the building. This would be another great way to get computer science out of the lab.



Wednesday, June 27, 2012

Aneesh Chopra and Big Data Resources

Much to my family's annoyance I listen to A LOT of podcasts. We have two pre-teen girls and they mostly want to listen to "Call me Maybe" right now rather than chit chat.


Heavy in rotation for the past two weeks has been the Entrepreneurial Thought Leaders Podcast from Stanford University.

This talk by Aneesh Chopra, the United States Chief Technology Officer, is a great overview of some of the ways the US government is trying to keep up with technological changes. I was lucky enough to hear him speak at one of the UVA Tapestry Workshops a few years back when he was still serving as Secretary of Technology here in Virginia. He is always very interesting and I end up walking away with a huge list of things to look up.

In this talk he includes several sources particularly relevant to teaching the CS Principles Data topic.

  • http://www.data.gov/ - free clearing house of data collected by the government. In the talk he mentions several entrepreneurs using this data to start companies.
  • Healthcare - there is some good discussion here on privacy and how large data sets can be used to help develop treatments. This was one of the topics my students really responded to this year.
  • Education - This gives a great example of high school students using data to save money for their school district by looking at electricity usage.
There is a lot more, these were the highlights. The advantage is Stanford breaks the podcast down into small snippets of video making it very usable in a classroom.

While I am on the subject, if you have the chance to get to one of the Tapestry Workshops you need to go. Great information on expanding computer science to a broader group of students. They have expanded to several universities - so there is sure to be one near you. (I am speaking at UVA briefly this Thursday, but don't let that scare you away, the rest of the presenters are amazing.)

Tuesday, June 26, 2012

CS Principles Bookshelf

One of the things I get asked most often about the pilot is "What textbook are you using?" In reality limiting the course to one book isn't possible. Part of the whole point of the curriculum is broadening the focus beyond just teaching coding.

Also, computer science by its very nature is not static. Having a great set of reference books to work from has been essential this past year.


These three were my main resources:
  • Computer Science: An Overview by J. Glenn Brookshear
    • This book did a good job of covering the basics, plus it covered some other areas like artificial intelligence and computer graphics.
  • Computer Science Illuminated by Nell Dale and John Lewis
    • This one takes a layered approach, starting with the information layer of how data is stored and processed and then moving up through applications. Each chapter had a good set of thought questions, beyond just simple vocabulary and multiple choice, that related to the topics covered.
  • Explorations In Computer Science by Mark Meyer
    • This book takes a laboratory approach with a series of hands on applets. Great for brainstorming how to approach a topic.
Some other books I used. If you haven't guessed I am a huge reader, so this is just the tip of the iceberg.
  • Introduction to Computational Science by Angela B. Shiflet and George W. Shiflet
    • This book was new to me this year. One of the hardest areas to teach with the CS Principles course was Large Datasets. While the book is over the head of your average high school student, it is a fabulous resource for teachers. It is all about how computers are used in modeling and simulation and has an entire module on errors in modeling. Each topic covered has several case studies that range from Drug Dosage to Skydiving to Mushroom Fairy Rings. Great examples to pull from.
  •  Code: The Hidden Language of Computer Hardware and Software by Charles Petzold
    • This covers the first steps of representing data as a code - from Morse code all the way through graphics. It is clear and easy to follow with fun examples. It is at a level where high school students could follow most of it, although some passages get a little more mathy than most students are comfortable with. This is one of the books I came back to again and again.
  • The Code Book: The Science of Secrecy from Ancient Egypt to Quantum Cryptography by Simon Singh
    • This is one of my all time favorites. His website also has lots of great interactive tools for teaching on this topic. I have used the digital version in class for several years. I have also developed a set of hands-on cryptography activities, like Caesar sticks and Morse code, for whenever the network is down at school. Understanding some of the history of how humans used encryption to send messages makes the abstraction of computer science easier to understand.

Monday, June 4, 2012

Obama Talks Computer Science

OK - this is an oldie but goodie. Then-Senator Obama:


We are finishing the year in CS Principles by programming arrays and looking at data. Here Obama is being interviewed at Google. The question starts "What is the most efficient way to sort..."

Sunday, November 27, 2011

Top 20 Programming Languages

This is pretty interesting, this site lists the top 20 programming languages, and tracks their popularity over time.

Pascal is up 0.991% in the past year. Who knew?


This could be a fun way to look at the idea of "Big Data" as a part of the CS Principles course.

Tuesday, August 30, 2011

Digital Music

NPR story on the development of MP3's

One of the big themes of the new CS Principles course is how data is stored digitally. This article from NPR covers how MP3's were developed and some of the limitations. I plan on having the students do a discussion board response to this after we have covered binary numbers.