The term “Big Data” has become ubiquitous in higher education, especially in discussions of using data to improve student success. But what exactly is big data? After all, we have had loads of data around for a very long time. If you do an online search you will get many different definitions, including:
According to Wikipedia, “big data is a term for data sets that are so large or complex that traditional data processing application software is inadequate to deal with them.”
Microsoft states, “Big data is the term increasingly used to describe the process of applying serious computing power— the latest in machine learning and artificial intelligence—to seriously massive and often highly complex sets of information.”
The National Institutes of Health suggests, “Big Data is more than just very large data or a large number of data sources. Big Data refers to the complexity, challenges, and new opportunities presented by the combined analysis of data.”
One thing these definitions have in common is that they describe data sets that are large, complex, growing exponentially, and unwieldy. The collection of unstructured data has tremendously increased the amount of data collected. The world creates 100 terabytes of data every day, and it is estimated that 35 zettabytes of data will be created by 2020. A zettabyte is equal to 1 trillion gigabytes, or 10²¹ bytes.
In addition to traditional databases, data is collected from many other sources, such as digital pictures, videos, social media posts, cell phones, web pages, emails, and sensors.
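To make the scale concrete, here is a minimal sketch of the unit arithmetic. It assumes decimal (SI) units, where a gigabyte is 10^9 bytes and a zettabyte is 10^21 bytes; the constant and function names are illustrative only.

```python
# Decimal (SI) byte units; this is an assumption, binary units would differ.
GIGABYTE = 10**9
ZETTABYTE = 10**21


def zettabytes_to_gigabytes(zettabytes: int) -> int:
    """Convert a whole number of zettabytes to gigabytes."""
    return zettabytes * ZETTABYTE // GIGABYTE


one_zb_in_gb = zettabytes_to_gigabytes(1)       # 1 trillion gigabytes
estimate_in_gb = zettabytes_to_gigabytes(35)    # the 35-zettabyte estimate
print(f"1 ZB  = {one_zb_in_gb:,} GB")
print(f"35 ZB = {estimate_in_gb:,} GB")
```

In other words, the 35-zettabyte estimate works out to 35 trillion gigabytes.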
Higher education institutions are collecting traditional data about demographics, classes, and grades. More recently they are also collecting social media data, behavioral data like trips to the cafeteria, engagement metrics, and student location with GPS metrics and geocaching. The purpose of collecting all this data is to mine it and build predictive models that help increase student success. Predictive analytic models in higher education are founded on the notion that students with similar profiles and backgrounds tend to make similar choices and have the same needs. In addition, many vendors are collecting even more information about student activities and behaviors.
Amazon, Google, and Starbucks are just a few examples of companies that collect large amounts of data on our everyday activity. They use it to increase sales while making it easy for us to spend our money with them. The potential of what higher education could do with large amounts of student activity data offers a compelling reason to start collecting more, even without knowing how it could be used. Using data to upsell to students and determine ways to enhance success is at our fingertips. An example is mapping a student’s pattern of going to study hall, tutoring, classes, or even the cafeteria. If the student’s pattern changes, it could be a sign that something is wrong. A big question to ask is whether collecting this data crosses a line of privacy. Will institutions waver on the edge of paternalism?
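As a rough illustration of the pattern-change idea mentioned above, the sketch below flags a week in which a student's visit count departs sharply from that student's own history. Everything here is hypothetical: the function name, the two-standard-deviation threshold, and the sample counts are assumptions for illustration, not a description of any institution's or vendor's method.

```python
from statistics import mean, stdev


def pattern_changed(baseline_visits, recent_visits, threshold=2.0):
    """Return True if recent_visits is more than `threshold` standard
    deviations away from the mean of baseline_visits.

    baseline_visits: weekly visit counts from earlier weeks (hypothetical data).
    recent_visits:   the visit count for the week being checked.
    """
    if len(baseline_visits) < 2:
        raise ValueError("need at least two baseline weeks")
    average = mean(baseline_visits)
    spread = stdev(baseline_visits)
    if spread == 0:
        # A perfectly regular history: any difference counts as a change.
        return recent_visits != average
    return abs(recent_visits - average) / spread > threshold


# Made-up weekly tutoring-center visits for one student.
weekly_tutoring_visits = [3, 4, 3, 5, 4, 3]
print(pattern_changed(weekly_tutoring_visits, 4))  # a typical week
print(pattern_changed(weekly_tutoring_visits, 0))  # the student stopped coming
```

A flag like this only says that the numbers moved. It cannot say why, which is where the privacy and paternalism questions come in.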
According to Scientific American, people have what is called patternicity: we see patterns where they really do not exist. We have heard of people seeing images of Jesus in their toast or in a cloud, and they may also see a pattern in stock market numbers. This is because of the priming effect, which helps our brain and senses interpret stimuli based on expected models. Seeing patterns can be very helpful in solving problems; unfortunately, we do not have a detector in our brain that notifies us when a pattern does not really exist. In higher education, we must be careful that we are not trying to find patterns that do not exist.
Education by its nature is all about ethics. We expect students to be honest, do their own homework, and above all not plagiarize. For those in academia who do research, there are tenets that pertain to ethics, including informed consent, respecting confidentiality, and protecting individuals from harm. With this in mind, we must make sure that institutions are not collecting data just because they can and because it shows a pattern. We must analyze carefully whether the interventions we create based on patterns found in our data sets are helping students and not just conforming to the expectations of society and the institution.
Higher education institutions are no different from any other business that needs to survive. Behind their student success goals, institutions conform to a system that values students getting good grades and making continued progress toward finishing a degree, so that the institutions can build revenue and stay in business. Without continued growth in enrollment, and students persisting to graduation, institutions of higher learning will struggle with funding. At the end of the day, higher education institutions need to get their product to market, which is graduating students. Understanding why data needs to be collected, what can be determined with it, and how to protect it must come before we begin the process of mass collection and analysis.