Big Data vs Your Data

(and its impact on your digital photos)

Have you ever wondered what all the fuss is about big data? Have you considered how big data impacts YOUR data? We are going to address the issues the big data trend creates, specifically as they relate to your digital photo library. A note for fans of the TV series Person of Interest: we will not confirm whether the U.S. Government or some other organization really has “the machine” that is feared and omnipresent in every episode.

Let’s start by seeing if we can quantify what “big data” encompasses. How big IS “big data?” In 2013, estimates from various sources reached four zettabytes of data generated worldwide.1 What is a zettabyte? Suffice it to say, it means a LOT of bytes – or units of information – where one byte equals one character of text. To give you some context, imagine that every person in the United States took a digital photo every second of every day for over a month. All of those photos put together would equal about one zettabyte.
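If you want to check that comparison yourself, here is a rough back-of-the-envelope sketch. The population figure (about 316 million), the 31-day month, and the decimal definition of a zettabyte (10^21 bytes) are my assumptions, not numbers from the sources cited here.

```python
# Back-of-the-envelope check of the "one zettabyte" comparison.
# All constants are assumptions chosen for illustration.
US_POPULATION = 316_000_000   # assumed 2013 U.S. population
SECONDS_PER_DAY = 24 * 60 * 60
DAYS = 31                     # "over a month"
ZETTABYTE = 10**21            # decimal zettabyte, in bytes


def photos_taken(population: int, days: int) -> int:
    """Photos taken if every person shoots one photo per second."""
    return population * SECONDS_PER_DAY * days


def bytes_per_photo(total_bytes: int, population: int, days: int) -> float:
    """Average photo size needed for the photos to add up to total_bytes."""
    return total_bytes / photos_taken(population, days)


size = bytes_per_photo(ZETTABYTE, US_POPULATION, DAYS)
print(f"Implied photo size: {size / 1_000_000:.2f} MB")
```

Under these assumptions the comparison implies photos of a little over one megabyte each, which is a plausible size for a compressed phone picture.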

These zettabytes include more than 500 million photos uploaded and shared every day, along with more than 200 hours of video every minute. Wrap your head around that! And then, consider that the volume of information that people create themselves (from voice calls, emails and texts to uploaded pictures, video, and music) pales in comparison to the amount of digital information created about them each day.

These trends will continue. We are only at the nascent stage of the so-called “Internet of Things,” when our appliances, our vehicles and a growing set of “wearable” devices will be able to communicate with one another. Technological advances have driven down the cost of creating, capturing, managing, and storing information to one-sixth of what it was in 2005.

The “Internet of Things”

The “Internet of Things” is a term used to describe the ability of devices to communicate with one another using embedded sensors that are linked through wired and wireless networks. These devices could include your thermostat, your car, or a pill you swallow so the doctor can monitor the health of your digestive tract. These connected devices use the Internet to transmit, compile, and analyze data.2

There are many definitions of “big data” which may differ depending on whether you are a computer scientist, a financial analyst, or an entrepreneur pitching an idea to a venture capitalist. Most definitions reflect the growing technological ability to capture, aggregate, and process an ever-greater volume, velocity, and variety of data. In other words, “data is now available faster, has greater coverage and scope, and includes new types of observations and measurements that previously were not available.”3 More precisely, big datasets are “large, diverse, complex, longitudinal, and/or distributed datasets generated from instruments, sensors, Internet transactions, email, video, click streams, and/or all other digital sources available today and in the future.”4

What really matters about big data is what it does. Aside from how we define big data as a technological phenomenon, the wide variety of potential uses for big data analytics raises crucial questions about whether our legal, ethical, and social norms are sufficient to protect privacy and other values in a big data world. Unprecedented computational power and sophistication make possible unexpected discoveries, innovations, and advancements in our quality of life. But these capabilities, most of which are not visible or available to the average consumer, also create an asymmetry of power between those who hold the data and those who intentionally or inadvertently supply it.

Part of the challenge, too, lies in understanding the many different contexts in which big data comes into play. Big data may be viewed as property, as a public resource, or as an expression of individual identity.5

Big data applications may be the driver of America’s economic future or a threat to cherished liberties. Big data may be all of these things. Used well, big data analysis can boost economic productivity, drive improved consumer and government services, thwart terrorists, and save lives.6

For example, a genetic researcher at the Broad Institute found that having a large number of genetic datasets makes the critical difference in identifying the meaningful genetic variant for a disease. In this research, a genetic variant related to schizophrenia was not detectable when analyzed in 3,500 cases, and was only weakly identifiable using 10,000 cases, but was suddenly statistically significant with 35,000 cases. As the researcher observed, “There is an inflection point at which everything changes.”7

Privacy laws can restrict access to data necessary for healthcare advances such as this. What is the value of a life (or many lives) versus an individual’s right to the privacy of his or her data? This will always be a point of contention in determining public policy.

Of course, everyone should be concerned about personal privacy, especially when we regularly read about hackers accessing private credit card info from supposedly “secure” computers at large retail companies, or about the NSA eavesdropping on virtually every type of communication that travels across a wire. The following excerpt is from a recent New York Times article8 that was also mentioned on CNN Headline News:

The N.S.A. achieved a technical breakthrough in 2010 when analysts first matched images collected separately in two databases — one in a huge N.S.A. database code-named Pinwale, and another in the government’s main terrorist watch list database, known as Tide — according to N.S.A. documents. That ability to cross-reference images has led to an explosion of analytical uses inside the agency. The agency has created teams of “identity intelligence” analysts who work to combine the facial images with other records about individuals to develop comprehensive portraits of intelligence targets.

But I digress. Our initial goal was to evaluate the risks presented by big data from a personal perspective, relative to our digital photos. When we take a digital photo with most current devices – especially mobile phones, where most digital photos originate today – the data collected and stored with the photo includes, among other items:

  • Camera type and settings
  • Date/time
  • GPS (location)

If the photos are tagged with people then the image metadata also may contain face coordinates and a name. A caption may also be added to the photo. If the photos never left your camera storage card or your personal computer then you would have no exposure and no privacy risk. The only risk you would have would be losing the data if it wasn’t backed up on another storage device.
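To make the metadata point concrete, here is a minimal sketch of how that embedded information could be removed from a JPEG before sharing it. It assumes the metadata lives in a standard EXIF (APP1) segment near the start of the file, and the function name strip_exif is my own. It does not handle every JPEG variant, so treat it as an illustration rather than a finished privacy tool.

```python
import struct


def strip_exif(jpeg_bytes: bytes) -> bytes:
    """Return a copy of a JPEG byte string with EXIF (APP1) segments removed.

    Sketch only: assumes well-formed header segments before the image data.
    """
    if jpeg_bytes[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file (missing start-of-image marker)")
    out = bytearray(b"\xff\xd8")
    pos = 2
    while pos + 4 <= len(jpeg_bytes):
        if jpeg_bytes[pos] != 0xFF:
            raise ValueError("malformed JPEG segment header")
        marker = jpeg_bytes[pos + 1]
        if marker == 0xDA:  # start of scan: compressed image data follows
            break
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", jpeg_bytes[pos + 2:pos + 4])
        segment = jpeg_bytes[pos:pos + 2 + length]
        is_exif = marker == 0xE1 and segment[4:10] == b"Exif\x00\x00"
        if not is_exif:
            out += segment  # keep every segment that is not EXIF
        pos += 2 + length
    out += jpeg_bytes[pos:]  # image data and end-of-image marker, untouched
    return bytes(out)
```

A typical use would be to read a photo with open(path, "rb"), pass the bytes through strip_exif, and write the result to a new file that you share instead of the original. The picture itself is unchanged; only the camera, date/time, and GPS block is dropped.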

The reality is that we are social creatures, and social creatures love to share things. Digital photos are among the most popular things we share.

In the new digital world, information can be captured, copied, shared, and transferred at high fidelity and retained indefinitely. Volumes of data that were once impossible to store for any length of time are now saved cheaply and effectively, forever. Furthermore, digital data often concerns multiple people, making personal control impractical. For example, who owns a photo: the photographer, the people represented in the image, the person who first posted it, or the site to which it was posted? The spread of these new technologies is fundamentally changing the relationship between a person and the data about him or her.

As soon as you upload your photos to Apple’s cloud, Google’s cloud, Facebook’s cloud (or plug in the name of your favorite cloud service), all bets are off for the privacy of your information. In the case of Facebook, you keep ownership of the photo, but the terms of service grant Facebook a broad license to use it. Similarly, as soon as you “privately” send a digital photo to a friend, all bets are off if they share that photo with others.

Let’s review some of the things “bad people” could do with your photos.

  1. Identity theft. With your name and a good photo they could create a fake ID to obtain a credit card or mortgage using your identity – or that of one of your family members.
  2. Finding out that you are on vacation. If you are posting pics from your vacation then someone could determine that you are away from home.
  3. Post fake items using your image. There have been numerous reports of people using someone else’s appearance to support their dating profile or other social network presence. This is more of a nuisance but could have some serious implications in certain situations where foul play resulted during a blind date.
  4. Capture other personal information based on your photos. Examples include: pictures of your car in which your plates are visible; pictures of your house in which your street number is visible; pictures of your kids with tags allowing “them” to know who is in your family. Likewise, valuable property could be targeted for theft if made highly visible.

Some of these items are pretty scary. So what can you do? Here are some basic “best practices” you can follow to manage and mitigate your risks.

  1. When posting pictures to Facebook and Google+ (and Instagram and… any service where the pics may end up in the general public via the InterWeb), be wary of the content of those pics.
  2. Generally don’t tag the photos with more than first names if you are posting to Facebook or Google+.
  3. Store your photos on private cloud storage sites where:
    • You know the company is going to be around for a long time – an established firm with a good reputation.
    • You have a contractual relationship with them – in other words, not a free site where they could shut down at any time.
  4. Be careful posting vacation photos. Share them with your close friends, but not the world!
  5. Periodically – at least twice a year – copy the photos off your phone memory card. And then clean the card to avoid identity theft issues identified earlier in this article. A phone is one of the easiest things to lose, and if you lose thousands of songs it is no big deal because they can be replaced. But if you have thousands of pictures and they only exist on your phone then you wouldn’t be able to replace them and THAT could be a very big deal.
  6. Use photo organization tools that sync the photos on your phone with the photos in your private cloud or with a photo library on your personal computer (for total private control). This gives you a backup of the images that represent the tangible record of your memories. Those moments of your life can be enjoyed by you and others for years to come.
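
For readers comfortable with a little scripting, the periodic copy described in tip 5 could be sketched roughly as follows. The function name back_up_photos, the list of file extensions, and the folder names in the usage note are all hypothetical; adjust them to wherever your phone or memory card actually shows up on your computer.

```python
import shutil
from pathlib import Path

# Assumed set of photo file extensions; extend it for your own devices.
PHOTO_SUFFIXES = {".jpg", ".jpeg", ".png", ".heic"}


def back_up_photos(source: Path, destination: Path) -> list[Path]:
    """Copy photos from source to destination, skipping files already copied.

    Returns the list of newly created backup files.
    """
    destination.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(source.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in PHOTO_SUFFIXES:
            continue
        target = destination / path.relative_to(source)
        # Treat a same-sized file at the same relative path as already saved.
        if target.exists() and target.stat().st_size == path.stat().st_size:
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # copy2 also preserves file timestamps
        copied.append(target)
    return copied
```

For example, back_up_photos(Path("/media/phone/DCIM"), Path.home() / "PhotoBackup") would copy any new pictures while leaving earlier backups alone. The script never deletes anything from the source, so cleaning the card afterward remains a separate, deliberate step.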

There is no doubt that the world of big data will keep growing. It’s my hope that these tips will help you manage your piece of it.

  1. Mary Meeker and Liang Wu, Internet Trends, Kleiner Perkins Caufield & Byers, 2013
  2. Big Data: Seizing Opportunities, Preserving Values, US Government Report
  3. Liran Einav and Jonathan Levin, “The Data Revolution and Economic Analysis”, Working Paper, No. 19035, National Bureau of Economic Research, 2013
  4. National Science Foundation, Solicitation 12-499: Core Techniques and Technologies for Advancing Big Data Science & Engineering (BIGDATA), 2012
  5. Harvard Professor of Science & Technology Studies Sheila Jasanoff argues that framing the policy implications of big data is difficult precisely because it manifests in multiple contexts that each call up different operative concerns, including big data as property (who owns it); big data as common pool resources (who manages it and on what principles); and big data as identity (it is us ourselves, and thus its management raises constitutional questions about rights).
  6. Big Data: Seizing Opportunities, Preserving Values, US Government Report
  7. Manolis Kellis, “Importance of Access to Large Populations,” Big Data Privacy Workshop: Advancing the State of the Art in Technology and Practice, Cambridge, MA, March 3, 2014
  8. New York Times – May 31, 2014