Category Archives: Big Data

Why Consumerization of IT Demands a Change in our Work Culture

Struggling with the increasing number of mobile devices within the enterprise? You’re not alone. I have spent many hours thinking about this challenge – and concluded that it’s not so much a technology challenge as a cultural one.

A decade ago, when McKinsey & Company analyzed the effect of IT investment, it found that companies only became more productive when investments in technology were matched with new ways to work. It seems odd now, but at the time, academics were questioning whether computers had made us more productive at all: we had found some ways to make individual jobs more efficient, but we were not working less, or creating more as a group.

Collaborative working based on Internet communications helped to solve that problem. But it showed us that, often, we know the answer – we just have to be given the freedom to express it. During the Internet revolution we suddenly found that our computers at home were easier to use than the ones at work. While organizations struggled to communicate and share information, the world wide web created a global community for whom it was natural.

Now we have a new opportunity. Millions of us are showing the way from the bottom up, using the devices and applications we like best. In many cases, our employers treat this as a problem or a risk that needs to be stopped. But if we can absorb the positive parts of that change into work culture, the reward is that we can be more creative, produce higher quality work, and improve the working experience of employees.

The effects of consumer technology are already being felt: 70 percent of companies have changed at least one business process, and 20 percent have changed four or more, change management specialist Avanade reports.

To look at how consumer IT can change the way we work, simply look out of the window. When your colleagues arrive for work, they’re checking their personal information, their social networks, arranging their day – on the move, in a few seconds, without thinking about it as a task. Often they are using security that’s hidden from them, accessing data without needing to know where it is stored, and linking to groups of friends or relatives to solve problems and make decisions. We don’t need instruction books for our devices any more. At its best, consumer technology is intuitive by design, in a way that workplace applications rarely are.

But how do we design work that is intuitive, opportunistic, creative, and doesn’t need a manual? That has, so far, been less successful. There’s also evidence that many companies are working harder, but not necessarily more efficiently. The US labor force survey shows that the impact of flexible working has mostly been that we do more overtime.

But maybe that’s to be expected. It took years to apply the benefits of the PC revolution to our office culture, and now we’re starting down a similar road with consumer technology.

Maybe the inhibitors are that we still measure the same things as we did when centralized IT departments were a novelty. We exercise control in silos, leading to inflexible security, and accidentally making it difficult to build cross-functional teamwork. Now that CoIT is a fact, we could potentially measure and incentivize other things – satisfaction with work, the ease of getting to the information we need, whether our processes are really as simple as they could be.

These measures are often foreign to an IT department, but they are second nature to games or smartphone designers, for whom the need for consumers to understand intuitively, to want to complete the task, and to feel that their device is safe and secure without limiting their imagination is the essential measure of success.

But this goes far beyond the IT department. It asks fundamental questions of management: how much do we trust our staff to organize their own work? What are teams for, and how do we make them work? Is efficiency doing the same thing a bit faster, or thinking of new ways to do things? What is the office for, and when should I be in it? The decisions we make, and the way in which we communicate those decisions to our supposedly empowered colleagues, will decide the success of consumerization. As with the PC revolution, it’s not about the device; it’s how we work together when those devices are put in our hands.

To Encrypt or Not to Encrypt?

Overview of and comments on backdoors, frontdoors and the debate around them

To me, privacy means that I can decide to keep my data private, and neither an NSA or other government agent, nor a Facebook, Dropbox or Google employee, can see what is in there if I don’t want them to. This concept of privacy is not compatible with any kind of ‘doors’ – front, back or other – to user data.

Since UK Prime Minister David Cameron suggested earlier this year that encryption should be banned, policy debates have intensified in the US, EU, UK and elsewhere about back- or frontdoors built into encryption systems. Certain parties argue that they need front- or backdoors into tech companies’ data to prevent and fight criminal activity. But both backdoors and frontdoors violate end-user privacy, and as if that weren’t reason enough, they also undermine the world’s overall cybersecurity. Let’s see why.

What are frontdoors and backdoors? – definition and examples

First, let’s get two expressions straight: what is a backdoor and what is a frontdoor.

A backdoor is a covert way to provide an entity with higher-level access to a system than it should normally have. A backdoor is usually disguised as a random security bug, but instead of being an accidental mistake, it is planted intentionally. The key thing is that a backdoor is hidden, even from the system operator, which makes it uncontrollable and hence dangerous: someone who is not supposed to have access can exploit it.

A frontdoor is also a way to give higher access to a system, but in a way that is known to the participants, or at least to the system operator. It is also assured that only the intended entity can use the frontdoor. This is like a master key in a hotel for the maid.

Snowden uncovered several secret operations of the NSA, and that started the current debate on encryption backdoors and frontdoors. NSA director Michael S. Rogers argues for “front doors with a big lock”, meaning that in case of an investigation, the FBI or other authorities should have a legal and technical way to access encrypted content. The Washington Post created a graphic about the proposal – basically, the White House is considering two platforms: one where the authorities can recover encrypted data using a key escrow, and another where the recovery key is split between the platform vendor and the authority. In my view, neither of the proposed options provides a sufficient solution. In particular, they do not guarantee non-US citizens that they are not monitored by what is, to them, a foreign government.
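To make the “split key” idea concrete, here is a minimal sketch, in Python, of splitting a recovery key into two XOR shares, so that neither the vendor’s share nor the authority’s share reveals anything on its own. This is purely illustrative and not taken from the White House proposal; the key and shares are hypothetical values.

```python
import os

def split_key(key: bytes) -> tuple[bytes, bytes]:
    """Split a key into two XOR shares; either share alone is useless."""
    pad = os.urandom(len(key))                       # uniformly random share
    other = bytes(a ^ b for a, b in zip(key, pad))   # key XOR pad
    return pad, other

def recover_key(share1: bytes, share2: bytes) -> bytes:
    """Both shares together reconstruct the original key."""
    return bytes(a ^ b for a, b in zip(share1, share2))

recovery_key = os.urandom(32)  # hypothetical escrowed recovery key
vendor_share, authority_share = split_key(recovery_key)
assert recover_key(vendor_share, authority_share) == recovery_key
```

Note what the sketch also shows: whoever assembles both shares recovers every key the scheme escrows, which is exactly the concentration of risk discussed below.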

If you are new to the security industry, this debate might sound new, but the NSA has a long track record: Der Spiegel reported in 1996 that Crypto AG, a source of Swiss national pride, had placed backdoors in its renowned crypto machines under pressure from the NSA. Another, more recent example is the SP 800-90A standard proposal: researchers suspected that the NSA might have included a backdoor in one of the newly standardized pseudorandom generators (namely, Dual_EC_DRBG). This backdoor could’ve enabled the NSA to monitor anybody, regardless of their citizenship or whether they were using a strong encryption algorithm.

Also, we should not forget to mention the Gemalto SIM encryption key database hack: a joint effort by the NSA and the British GCHQ. To understand why this action is controversial, we need to understand how the GSM (and 3G/4G) network works. The SIM card stores a symmetric encryption key, which is used to encrypt the traffic in the air. Due to the nature of symmetric ciphers, the same key needs to be used to decrypt the content by the GSM core network. For that reason, the SIM keys are stored in a secure, central database, called the Home Location Register, or HLR. HLRs are under the jurisdiction of the local authority. That means the NSA already had a sort of control over domestic encryption keys by default. So why did they need to hack a respected vendor? Because it enabled them to get any user’s data without leaving a single mark. The former was actually admitted by General Keith Alexander in his keynote speech at the Black Hat conference in 2013, even as he denied any covert domestic content monitoring.
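To illustrate the symmetric-key property described above, here is a minimal sketch using AES-GCM from Python’s cryptography package. It demonstrates symmetric encryption in general, not the actual GSM air-interface ciphers, which are different algorithms:

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=128)  # think: the key burned into a SIM
nonce = os.urandom(12)

# The handset encrypts traffic in the air with the key...
ciphertext = AESGCM(key).encrypt(nonce, b"voice frame", None)

# ...and the core network must use the *same* key to decrypt it, which is
# why a central key database such as the HLR has to exist at all.
assert AESGCM(key).decrypt(nonce, ciphertext, None) == b"voice frame"
```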

These things all undermine the credibility of intelligence agencies and, in general, trigger sometimes unfounded suspicion. Not all secret operations are necessarily evil: after DES was introduced by IBM in the early 1970s – it later became the predominant block cipher in the industry – the NSA tweaked its structure. The NSA lowered the key size and changed its deep structure (the S-boxes) without explanation. Many believed that the NSA had planted a backdoor, but it turned out later that the change actually increased the security of DES: the NSA had already discovered possible attacks and prepared against them.

All in all, backdoors in crypto systems are not recent inventions; we have seen several suspicious activities by government agencies throughout the past decades. Let me explain why backdoors and frontdoors are bad.

Why are backdoors and frontdoors bad? – the objective technical reasons

It’s not only ethical, philosophical and political problems that are involved with backdoors and frontdoors. There are also several technical reasons why it is extremely difficult to provide exceptional governmental access without making the whole Internet insecure. A recent MIT report by respected security scientists mentions quite a few challenges that a general governmental “frontdoor” would have to face. They state that introducing frontdoors to encryption systems in a rush, without proper specification and proper system design, could lead to a disaster.

Our world increasingly relies on trustworthy connections through the Internet: individuals and businesses bank online, companies transfer crucial business data over the network, governments communicate with their citizens online, and so on. Due to this high economic dependence, we need to protect whatever goes through it. The Internet was able to become so widespread because it adapted to arising security challenges step by step. Frontdoors and backdoors would undermine its security and nullify the work that has been done so far.

There are four main technical issues with backdoors and frontdoors:
1. New protocols: Installing frontdoors and backdoors requires completely new security protocols, new research and development.
2. Non-immune governmental agencies: Government agencies are not immune to attacks. Imagine the risk of a terrorist hacking a government agency and gaining access to all data about the US population.
3. National governments versus global citizens: In our globalized world, who would decide which government gets the frontdoor?
4. High costs and uncertain results: A system that provides governmental frontdoors is complex and expensive. Who will foot the bill?

1. New protocols: Installing frontdoors and backdoors requires completely new security protocols, new research and development. Current security systems have been designed so that there is no exceptional access, and, more or less, they have been functioning well so far. Forward secrecy is a good example: without it, if any party is compromised at any point in the future, all past traffic could be decrypted (see the sketch after this list). Current security protocols are not perfect, but with backdoors, most of these accomplishments, like forward secrecy, would be ruined. Also, a new protocol that includes frontdoors would need to be analyzed thoroughly before implementation – which may take years. We’ve seen that most ad-hoc, unanalyzed protocols were cracked later on; just remember WEP, the Wi-Fi encryption.
2. Non-immune governmental agencies: The assumption that a governmental agency is unhackable or invulnerable is naïve, and has been proven wrong. Its employees are human too: they can quit, gossip, be bribed or worse. Just think about Snowden: he walked away with a trove of classified information. Decades ago, John Anthony Walker, a US officer, was convicted of spying for the Soviet Union for almost 20 years, between 1968 and 1985. No organization is unhackable: embarrassingly, even Hacking Team, a government supplier of surveillance and tracking software, was hacked in 2015. Damages can be major: in the recent breach of the US Office of Personnel Management, 21.5M Social Security numbers of government personnel were leaked. If some organization had a frontdoor to all communication over the Internet, a breach of it would mean a breach of the entire Internet – a breach like nothing seen before.
3. National governments versus global citizens: In our globalized world, who would decide which government has access to users’ data? Or is this a privilege of the NSA alone? If the NSA has access to users’ data, wouldn’t China or Russia have the right to claim the same? If you are a US citizen working on a project in Europe, should the European government have access to all your personal data? And what if you are working in China or Russia? And what if you are not just in Europe for a short project, but actually living there as an expat? If you say no to any of those questions, then why should the US government have the right to access any foreign citizen’s data? I know these are provocative questions. But in our globalized world, people work, buy and live in multiple countries. International trade could be killed outright by introducing frontdoor requirements at the country level: a US company with factories in Pakistan, suppliers in China and retailers in the EU would have to trust all those governments, because if backdoors and frontdoors were implanted, they would all have the right to access its confidential business data.
4. High costs and uncertain results: Digital Rights Management (DRM) systems are a good example of how key management at a global scale can go wrong. Hollywood and the publishing industry have been trying to introduce a proper DRM platform to prevent piracy, without a breakthrough yet. The similarities between DRM and frontdoors are the following:
• Both require complex cryptographic key management, as the content in DRM is encrypted, or at least scrambled.
• Key management must operate at a global scale, without exception: if a title is published without DRM even in small numbers, pirates can copy and distribute that exception.
• The key management is actually implemented by vendors who have no interest in getting it right; a DVD player vendor, for example, has no incentive to properly protect the DRM key. At the same time, those vendors are under serious competitive cost pressure.

Despite the billions of dollars and many years of research, every DRM system has been cracked so far – just think of DVDs: pirates found the weaknesses and the way around them. With any frontdoor technique the stakes are much higher, so it would be an even more attractive target for criminal hackers. We also have less experience defending such systems than we do with DRM, so any leak could be disastrous to all industries, not just “some” revenue loss for the publishing industry.
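Returning to point 1: forward secrecy, the property that exceptional access would ruin, is typically achieved with ephemeral key exchanges. Here is a minimal sketch using X25519 from Python’s cryptography package; it illustrates the principle, not any specific protocol:

```python
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey

# Each side generates a fresh, throwaway key pair for this session only.
alice = X25519PrivateKey.generate()
bob = X25519PrivateKey.generate()

# Both sides derive the same shared secret for the session...
assert alice.exchange(bob.public_key()) == bob.exchange(alice.public_key())

# ...and then the ephemeral private keys are discarded. Compromising a
# long-term key later cannot decrypt this session, because the session
# secret no longer exists anywhere. A frontdoor would have to keep a copy,
# which is precisely what destroys forward secrecy.
del alice, bob
```

Real protocols additionally run the shared secret through a key-derivation function, but the forward-secrecy property comes from the deletion step.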

Conclusion:

So the answer to the question in the title is “yes, let’s encrypt”. I think encryption is crucial from multiple perspectives: security is important for the Internet ecosystem, and weakening that security can completely backfire on our freedom, economy and personal security. Also, any backdoor or frontdoor plan raises political, philosophical and ethical questions, leading to a debate that I think no one wants to take on. Legislative authorities are trying to address these new issues, but if different countries take different directions, it will undermine the potential growth of the global economy and the Internet ecosystem.

Is It Time to Join the Cloud?

Wondering if joining the cloud is the right move for your company? It’s a question that many CTOs have considered recently, as cloud computing is frequently heralded as the next evolution in managing an IT infrastructure. Moving your IT infrastructure to a cloud could save your company money on hardware and software costs, it could save time by providing maintenance and management for your data, and it could save resources by eliminating the need for equipment storage.

However, any shift from the norm is met with a fair amount of healthy skepticism. And no company should hastily make the switch to cloud computing without first examining the disadvantages as well as the advantages. The reality is that there is no one-size-fits-all answer when it comes to cloud computing; ultimately, the decision must be made by examining the needs and operational requirements of each company and comparing these to the available cloud computing services.

Understanding the benefits of cloud computing first comes by examining the three different levels of service that cloud computing can provide: infrastructure-as-a-service, platform-as-a-service and software-as-a-service.

Infrastructure-as-a-service, also known as hardware-as-a-service, uses virtual machines to connect to a partitioned space on the cloud servers. The local computers connect via the Internet to the cloud server, which does all the heavy lifting. The obvious benefit here is that it eliminates the cost of an in-house infrastructure – companies do not need to invest in capital expenditures like servers, data center space, and network equipment to get up and running. You can still use your own software, but it all runs on the cloud instead of your local computers. This is a great option for small startup companies because they can immediately have access to an enterprise-grade infrastructure for a fixed monthly fee. Vendors of infrastructure-as-a-service include Rackspace, SunGard, Cloudscaling, Amazon, Google and IBM.
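As an illustration of how little infrastructure work IaaS leaves to the customer, here is a hedged sketch of launching a virtual server with Amazon’s boto3 SDK. The region, image ID and key-pair name are placeholders, not real values:

```python
import boto3

ec2 = boto3.resource("ec2", region_name="us-east-1")

# One API call replaces purchasing, racking and cabling a physical server.
instances = ec2.create_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder machine image ID
    InstanceType="t2.micro",          # a small instance for a startup workload
    MinCount=1,
    MaxCount=1,
    KeyName="my-keypair",             # hypothetical SSH key pair
)
print("Launched:", instances[0].id)
```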

The next level of cloud computing is platform-as-a-service. This option provides you with a development platform where you can develop software applications for the web. The cloud provider handles the load for you and ensures your applications scale elastically with the number of users. Think of Facebook as an example of a platform-as-a-service provider: third-party developers can write new applications that Facebook makes available on its social application platform. Google also provides APIs for developers to build web applications. This service is useful for software development companies because the cloud provider facilitates application development without the cost and complexity of buying and managing the underlying hardware, software and hosting capabilities. You have all the facilities required to complete the development life cycle – development, testing, deployment, hosting and maintenance – in the same integrated development environment. This is a useful solution for companies that want to focus exclusively on software development because it relieves their platform woes. For companies that already use a platform internally, the platform-as-a-service advantage is that cloud platforms are designed to scale linearly: cloud development platforms have guidelines that help an application scale to accommodate any number of users. Salesforce.com’s Force.com is an example of a platform-as-a-service vendor.
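To make the platform-as-a-service idea concrete, here is a minimal sketch of the kind of web application you hand over to a PaaS. Flask is used purely as an example framework; the point is that you write only the handler, and the platform supplies the servers, load balancing and scaling:

```python
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/hello")
def hello():
    # The platform, not the developer, decides how many copies of this
    # process run and how traffic is spread across them.
    return jsonify(message="Hello from a PaaS-hosted app")

if __name__ == "__main__":
    app.run()  # run locally; in production the platform hosts the app
```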

The highest level of cloud computing is software-as-a-service, also called software on demand. Here, companies simply use software on a cloud rather than buy it, license it, upgrade it, and patch it on their local machines. Anyone using a service like Yahoo Mail or Google Docs is already using software as a service cloud computing. This is the most popular form of cloud computing because it is highly flexible and minimizes the maintenance of software. This service is best suited for companies that are not specifically in the technology business and simply need their software to be available and require little maintenance. Even companies who already have their own software should look into using software-as-a-service if they spend a lot of time on the maintenance of in-house software. There are many providers of software-as-a-service, including Amazon, Microsoft and Google.

Now, while many of these cloud computing services sound beneficial, there are still some disadvantages to take into account before jumping into a cloud. Keep in mind that all of these services require an Internet connection. If your connection goes out, you won’t be able to connect to the cloud and use the hardware, platform, and/or software that your company requires to operate. In this case, companies may want to still invest in some local infrastructure so operations do not come to a crashing halt.

Another concern is that some companies are apprehensive about turning all their data over to a third party (not to mention, it can be a chore to migrate massive amounts of data to a cloud). How can they be sure their data is protected? What if the cloud server is hacked? While these questions should be investigated, remember that cloud computing services live and die by their reputations, so information assurance is a high priority for all of them.

These fears of cloud computing stem from the fact that your company is at the mercy of a third party. There is a loss of control, and it is not as predictable as having a local infrastructure. If something goes wrong, you have to depend on your cloud provider to respond, and troubleshooting can be very complicated. Many companies are still reluctant to give up control over their data.

But with these warnings in mind, cloud computing has many general advantages that all companies can appreciate. Company data is backed up and secured by your cloud provider. Less equipment and hardware saves space and reduces electricity costs. Users have access to the same data and software no matter how geographically diverse. With less time spent on “keeping the lights on” with in-house maintenance, CTOs can better spend their time and resources on future growth. And with a fixed cost structure for the service, you can better allocate your IT budget.

Companies may want to consider first testing the waters by using an existing cloud offering as an extension of their in-house architecture. Then, if the company is comfortable with the service, they can move new projects to cloud-based services. Finally, the company can migrate their existing applications to the cloud if the cloud is reliable and it makes sense economically.

In the end, it is up to each individual CTO to determine if the advantages of cloud computing make sense for their company. This can only be determined with a thorough assessment of the costs and requirements of their technology needs, compared against the costs and risks of a cloud computing service. While it may be economical for some companies, it may not be for others. But for every company, it is at least worth the time and effort to look into.

If Amazon were in Apple’s position, would it unlock its cloud for the feds?

There’s an easy way to protect your data in the cloud.

As Apple continues to resist FBI demands to unlock a terrorist suspect’s phone, it raises a question: What if Amazon Web Services was ordered to provide access to a customer’s cloud? Would AWS hand the data over to the feds?

Amazon’s terms of service provide us a clue. AWS says it complies with legally binding orders when compelled to do so. Here’s a statement from Amazon’s FAQ on cloud data privacy (which is not written specifically about the Apple-FBI issue):

“We do not disclose customer content unless we’re required to do so to comply with the law or a valid and binding order of a governmental or regulatory body. Governmental and regulatory bodies need to follow the applicable legal process to obtain valid and binding orders, and we review all orders and object to overbroad or otherwise inappropriate ones.”

Most of the time, when ordered to hand over data, Amazon does so. In 2015 AWS received 1,538 subpoenas from law enforcement officials, according to information the company recently began making public. Just over half the time (in 832 cases, or 54% of the time) AWS complied fully with those orders. Another quarter of the time (in 399 cases) Amazon partially responded to the request for information, while in the remaining 20% of cases AWS did not respond to the subpoena.

For customers who are concerned about Amazon handing over their data to the government, there are protections that can be put in place. “There’s a huge market focused on encrypting data stored in the cloud, and giving the customers the keys,” explains 451 Research analyst Adrian Sanabria. If customers use a third-party encryption service to scramble their data and manage the keys themselves, then even if Amazon did hand over the data to the feds, it would be useless. “Yes, it does sometimes create some issues with flexibility and breaking functionality, but it is there as an option if you want it, and (if done properly) AWS (or the government) can’t decrypt the data,” Sanabria says.
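A minimal sketch of the approach Sanabria describes: encrypt locally with a key only you hold, then upload nothing but ciphertext. This uses the Fernet recipe from Python’s cryptography package together with boto3; the bucket name is a placeholder:

```python
import boto3
from cryptography.fernet import Fernet

key = Fernet.generate_key()  # stays with the customer, never sent to AWS
ciphertext = Fernet(key).encrypt(b"sensitive business records")

# Amazon stores, and could be compelled to hand over, only the ciphertext.
s3 = boto3.client("s3")
s3.put_object(Bucket="example-bucket", Key="records.enc", Body=ciphertext)

# Only someone holding the customer-managed key can read the data back.
obj = s3.get_object(Bucket="example-bucket", Key="records.enc")
plaintext = Fernet(key).decrypt(obj["Body"].read())
```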

AWS offers multiple encryption methods, including some that are built automatically into services like S3, the Simple Storage Service, and others that customers manage themselves, such as the Hardware Security Module (HSM). AWS’s marketplace offers a variety of additional encryption and security services from independent software vendors.
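For comparison, the built-in S3 option is a single request parameter; the sketch below assumes the same placeholder bucket. Note that with this route AWS holds the keys, so it could still comply with a binding order:

```python
import boto3

s3 = boto3.client("s3")

# Server-side encryption: AWS encrypts the object at rest, but AWS also
# manages the key, unlike the client-side approach above.
s3.put_object(
    Bucket="example-bucket",
    Key="records.txt",
    Body=b"sensitive business records",
    ServerSideEncryption="AES256",  # or "aws:kms" for a KMS-managed key
)
```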

Amazon says that it notifies customers when there’s been a request for their data to be handed over, unless there’s a compelling reason not to; for example, if it’s clear the cloud service is being used for an illegal purpose.

AWS is more stringent about not providing other types of information to the government. In the second half of 2015 alone, AWS received 249 “National security requests” but did not comply with any of them. AWS also received 78 requests from non-U.S. entities, the vast majority of which (60) the company did not respond to.

AWS did not respond to a request to comment on this story.

Microsoft Azure basically has the same policy, according to the company’s website, saying “We do not provide any government with direct or unfettered access to your data except as you direct or where required by law.”

Even with all the concern over providers or the government being able to access data, Sanabria estimates that only a minority of cloud users encrypt data and manage their own keys.

Internet of Things sparks healthcare cybersecurity concerns, HIMSS16 speaker says

As connectivity continues to expand, cybersecurity should be top of mind for CIOs, CISOs and other hospital executives, according to Eric Miller of Ascension.

The Internet of Things is set to explode. Forecasters expect more than 6 billion objects connected to the Internet this year, and some expect 50 billion by 2020. But with connectivity comes risk.

For healthcare providers trying to leverage what is emerging as the IoT for healthcare – that growing universe of wearable sensors, networked devices and home monitoring systems deployed to collect medical data and even treat patients – ineffective cybersecurity can have potentially dangerous consequences.

“The Internet of Things is different from the Internet of Things for healthcare in terms of risk,” said Eric Miller, senior director of IT at Ascension Information Services.

Miller pointed to a recent initiative in which white hat hackers working with the Mayo Clinic were easily able to hack into numerous connected medical devices, including an infusion pump that delivers drugs and fluids into patients.

One of the hired hackers, in fact, was able to connect an infusion pump to his computer network and manipulate the dosage remotely.

Miller and Paul Unbehagan, chief architect of Avaya, will discuss technologies that enable the security of connected devices and how providers can recognize and mitigate these cyber security risks during a HIMSS16 session on March 1, 2016.

“Our goal is to show how to reduce the risk from connected medical devices in a manageable way,” Miller said. “There’s a process side to it and a technology side, and we will discuss both.”

The session will cover how providers can get a handle on the number and types of Internet of Things for healthcare devices connected to their network; how to apply risk models to device classifications in order to clarify the threat level; how to implement automation to manage the security of the growing number of connected devices; how to evaluate inventory management options against existing technologies; and how to create an implementation plan.

“We want attendees to leave this session with an understanding of how to improve their risk posture for the existing Internet of Things for healthcare as well as the connected devices to come,” he said.

“The Internet of Healthcare Things” will be held Tuesday, March 1, from 1 – 2 p.m. PST in the Sands Expo Convention Center Human Nature Theater.

For Your Eyes Only: Experts Explore Preventing Inadvertent Disclosures During Discovery

The Altep, kCura and Milyli webinar explored best practices for safeguarding information, as well as technological tools for redaction

There may be a number of Scotts in Chicago, but there are fewer with a specific last name attached, and only one with a specific Social Security number. This information – or a telephone number, or a fingerprint, or even the MAC address of a computer – can be used to identify and verify a person.

But of course, for as valuable as personally identifiable information (PII) may be for you, it’s just as valuable to a malicious actor looking to steal and utilize it for nefarious purposes. That’s why, when conducting discovery, protecting that information should be of the utmost importance for organizations, law firms, and discovery vendors.

Three of those legal technology companies joined together to address that security in a recent webinar called “How to Prevent the Disclosure of PII.” The webinar’s panel included Hunter McMahon, vice president of legal and consulting services, Altep; Scott Monaghan, technical project manager, Milyli; Aileen Tien, advice specialist, kCura; and Judy Torres, vice president of information services, Altep.

In order to prevent disclosure, the panelists asked one important question: What exactly is PII? “It really comes down to what information can identify you as an individual,” McMahon said. Such information can be sorted into categories based on how specific and how personal it is, leading McMahon to note that data holders should examine PII to determine whether it is sensitive, private, or restricted.

When examining PII in a system, it’s also important to examine what regulations and laws the PII falls under. These can include a number of federal regulations: HIPAA/HITECH (health PII), GLBA (financial PII), the Privacy Act (PII held by federal agencies), and COPPA (children’s PII). Forty-seven states also have their own information laws, with varying guidelines on breach notification, level of culpability, and more.

Once that information is known, said the panelists, those conducting discovery should turn to the next question: What are the processes in place to protect the data? “Documents that are in the midst of discovery are really an extension of your retention policy… so you have to think about that risk the same way,” McMahon noted.

Torres explained that the proper approach is to assume PII will always be in a document set, even when it seems unlikely that PII exists in the system. For example, she said, do not assume that because a data set contains only documents accessed during work hours, it will not contain PII.

“Most people, when they’re working, are also working at the same time as the people they need to send documents to,” Torres explained. In one case, looking at data from Enron’s collapse, the documents contained more than 7,500 instances of employee PII – including that of employees’ spouses and children – as well as home addresses, credit card numbers, Social Security numbers, and dates of birth.

In order to combat this data lying in the system, it’s important to take a proactive approach, the panel said. “The approach is much like data security in that it’s not going to be perfect, but you can help reduce the risk,” McMahon added.

To protect it in review, those conducting discovery can limit access to documents with PII, limit the ability to print, and limit the ability to download native files. Likewise, teams can employ safeguards during review such as training review teams on classifications of PII, training reviewers on PII workflow, implementing a mechanism for redaction and redaction quality control, and establishing technology encryption.

And even when not using human review, abiding by these protocols can be important. “I see such a trend of more cases using assisted review, so you’re not necessarily having human eyes on every document. So it makes sense to make our best effort to protect PII on documents that may not necessarily have human review,” Torres said.

Properly conducting redactions to make sure nothing is missed can be a pain for reviewers, but Tien walked the webcast’s viewers through an introduction to regular expressions (reg-ex), one of the most common technology tools for PII redaction. In short, reg-ex is a pattern-matching language that allows one to construct a single search string that matches a pattern of characters, such as three numbers or three letters.

For one example, Social Security numbers have a very specific format: XXX-XX-XXXX. Reg-ex can be used to find all constructions of this type, using an input like the following: [0-9]{3}-[0-9]{2}-[0-9]{4}

“With practice, you’ll be able to pick this up like any foreign language,” Tien said.
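As an illustration of the reg-ex approach Tien describes, here is a short Python sketch that finds and redacts SSN-shaped strings. The pattern is deliberately simplified; a production redaction workflow would also validate matches and cover other PII formats:

```python
import re

# SSN shape: three digits, two digits, four digits, separated by hyphens.
SSN_PATTERN = re.compile(r"\b[0-9]{3}-[0-9]{2}-[0-9]{4}\b")

def redact_ssns(text: str) -> str:
    """Replace anything shaped like an SSN with a redaction marker."""
    return SSN_PATTERN.sub("[REDACTED-SSN]", text)

print(redact_ssns("Scott's SSN is 123-45-6789; call 555-0134."))
# -> "Scott's SSN is [REDACTED-SSN]; call 555-0134."
```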

See post Sneaky PII: What’s Hiding in Your Data?