Social Security Number Vulnerability Findings Relied on Supercomputing

Yet another reason to be glad that the Pittsburgh Supercomputing Center (PSC) is right in our backyard. Check out this news release directly from PSC.

Information available on the Internet can, in certain cases, be used to predict individual Social Security numbers, posing a risk of identity theft that policy-makers and individuals should address. This finding, an unexpected consequence of publicly available information in modern economies, was published Monday, July 6, in the Proceedings of the National Academy of Sciences (PNAS) and highlighted in the New York Times (July 7) and other national media. The research relied on computational resources of the TeraGrid, a National Science Foundation cyberinfrastructure program. It would have been difficult, if not impossible, to obtain these findings without these publicly funded, high-performance computing (HPC) resources, says one of the lead researchers, Alessandro Acquisti, a professor at Carnegie Mellon University.

About a year ago, at an important phase in the project, Acquisti, his colleague Ralph Gross (a post-doctoral researcher), and several graduate students working with them began using a large-scale parallel computing system at PSC. "At that stage," said Acquisti, "we had a rough idea of the results, but to go forward we had to try many different variations of the algorithms. It would have been incredibly difficult to do this, or taken much, much longer without access to this system."

After first working with desktop computers, the researchers turned last year to a PSC system called Pople (named for Nobel laureate chemist John Pople of Carnegie Mellon). A Silicon Graphics Altix 4700 installed in March 2008, Pople has 768 cores (processors) and 1.5 terabytes of shared memory, meaning all of the memory is accessible from every core. The SSN runs used up to 400 of Pople's cores and 800 gigabytes of memory, a large memory requirement for which Pople's shared memory was especially well suited.

TeraGrid staff at PSC installed Octave, an open-source language largely compatible with MATLAB, and wrote a script to submit many parallel Octave jobs simultaneously on Pople. This supported the Acquisti team's interactive workflow: doing many runs covering different states and computational strategies, checking and analyzing the results, and rethinking the approach before running more variations. PSC's consulting, said Acquisti, was "extremely helpful."

One fairly unassuming figure in the PNAS paper, notes Acquisti, represents the results of "more than 700,000 regressions over very large sets of data," a number that gives computational scientists a sense of the problem's immense scope.

"This project," said Sergiu Sanielevici, PSC director of scientific applications and user support, who also leads user support and services for the TeraGrid, "exemplifies how powerful systems like Pople can open doors to data-mining and data-centric research in fields not traditionally associated with HPC, such as the social sciences, and make it possible to get answers that would otherwise be impractical or impossible."

PSC supported this project through the NSF TeraGrid program, which allocates large-scale computing resources free of charge to researchers at U.S. universities on the basis of peer-reviewed proposals.

Get super geeky and read even more right here.
