Big Data ==> Big, Dynamic, and Linked Data

About IBM System G

IBM System G is a comprehensive set of graph computing software for the Big Data portfolio.


In the Big Data era, data are linked and form large graphs. However, most traditional IT systems were designed to process independent data, and analyses are mostly done in independent scenarios. Processing connected data has been a big challenge for Big Data Analytics, which requires both traditional big data platforms, for data processing that is easy to parallelize, and novel graph computing platforms, for data that are linked.




From the scientific perspective, Network Science is emerging as a new interdisciplinary field. Entities -- people, information, societies, nations, devices -- connect to each other and form all kinds of intertwined networks. Researchers from multiple disciplines -- electrical engineering, computer science, sociology, public health, economics, management, politics, law, arts, physics, math, etc. -- are interacting with each other to build the common ground of network science. Network theories are being formed to describe the dynamics, behaviors, and structures of networks. A systematic mathematical formalism that enables predictions of network behavior and network interactions is also emerging. Trans-disciplinary approaches are usually required to lay the foundations of this science and to develop the requisite tools. Just as 'Computer Science' was coined as an academic discipline in the 1950s and the first computer science course was taught by IBMers at Columbia University in 1947, we now envision the emergence of 'Network Science'. Graph Computing is the "tool" for Network Science, for storing, processing, analyzing, and visualizing connected data.




IBM System G is the first complete software stack for all aspects of Graph Computing. As shown in the figure above, there are three major aspects of graphs -- graph storage & retrieval, graph topological analysis, and graphical models. A graph database is a tool for efficiently managing large-scale graph data, and is especially important for contextual and relationship analysis. Graph analytics finds the important vertices or edges: those that are more central, that are clustered, or that form abnormal patterns. Graphical models are essential to artificial intelligence, information reasoning, and predictive analysis, which require combining many factors to create actionable insights.

Different combinations of IBM System G components can be selected to fit different solution needs. Graphs may be large or small, static or dynamic, topological or semantic, and property or Bayesian. For instance, for the graph database, small, non-commercial graphs can be stored in an open source DB, while large-scale graphs can be stored in GBase or in the highly efficient, scalable Native Graph Store. Applications can also pick different analytics functions, e.g., PageRank, community clustering, betweenness computation, shortest path computation, etc., for their analytics needs. There are also various graph visualization tools to choose from. IBM System G is flexible: solutions can pick the components they need, while common APIs are provided for the different layers. This flexibility is especially well suited to Service and Cloud environments.
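As a rough illustration of two of the analytics functions named above, PageRank and shortest path computation, here is a minimal Python sketch. It does not use or represent the IBM System G APIs; the toy graph, the function names pagerank and shortest_path, and the parameters damping and iterations are all assumptions made for this example.

```python
from collections import deque

# Hypothetical toy graph (not from IBM System G): directed adjacency lists.
# Every vertex must appear as a key, even if it has no outgoing edges.
GRAPH = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}


def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict of adjacency lists."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by vertices with no out-edges is spread evenly.
        dangling = sum(rank[v] for v in nodes if not graph[v])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in nodes}
        for v in nodes:
            for w in graph[v]:
                # Each vertex splits its rank equally among its out-edges.
                new_rank[w] += damping * rank[v] / len(graph[v])
        rank = new_rank
    return rank


def shortest_path(graph, source, target):
    """Unweighted shortest path by breadth-first search; None if unreachable."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if v == target:
            # Walk the parent links back to the source.
            path = []
            while v is not None:
                path.append(v)
                v = parent[v]
            return path[::-1]
        for w in graph[v]:
            if w not in parent:
                parent[w] = v
                queue.append(w)
    return None


if __name__ == "__main__":
    ranks = pagerank(GRAPH)
    print(max(ranks, key=ranks.get))        # most central vertex by PageRank
    print(shortest_path(GRAPH, "d", "a"))   # path from d to a, if any
```

A sketch like this only works for graphs that fit in one machine's memory; graphs with billions of edges call for the dedicated storage and distributed analytics layers described above.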


IBM System G also includes several derived Network Science Analytics tools based on state-of-the-art research. They include analytic tools for cognitive networks, which are the fundamental form of brains -- large-scale Bayesian Network Tools and Deep Learning Tools. We are also developing a platform to store, analyze, and visualize mammal brain neurons, with the ultimate goal of contributing to the human brain projects. Another aspect of cognitive understanding is detecting and predicting human cognition: for instance, understanding people's emotions from their writing, or predicting how text or visual content arouses people's feelings. IBM System G also includes Spatio-Temporal analytics tools to facilitate the analysis of moving objects, and a set of behavioral analytics tools, e.g., anomaly detection, recommendations, etc.




You can engage IBM System G by:

  (1) utilizing any of its four types of Graph Computing tools: Graph Database, Analytics, Visualization, and Middleware, or any of the four types of derived Network Science Analytics tools including Cognitive Networks, Cognitive Analytics, Spatio-Temporal Analytics, and Behavioral Analytics to make new applications;

  (2) using it through IBM Cloud for your online services; or

  (3) deploying any of its existing six business solutions: Enterprise Expertise, Insider Threat, Social Media, Commerce, Healthcare, and Entertainment & Media.







IBM Research's graph software has won the IEEE BigData 2013 Best Paper Award and the Supercomputing Graph 500 benchmarks. IBM System G runs on regular computers and on IBM Cloud for datasets of up to a few billion nodes and edges. It has powered a production system since early 2009 for enterprise social network analysis and enterprise location. Recently, it has been used to analyze Twitter retweet and interaction graphs of 120,000,000 users with 2,000,000,000 edges, for influence, social bot, and fraud detection. It has also been processing a Bitcoin graph of 200,000,000 vertices and 800,000,000 edges to analyze relationships among accounts and transactions for anomaly detection. A beta service of IBM System G is available on IBM Cloud as of 1Q2015.

Graph 500 tests graphs beyond billions of vertices and edges. In the latest Graph 500 results, from November 2015, IBM's BlueGene Q supercomputers at the Lawrence Livermore National Lab and the Argonne National Laboratory took second and third place, while Japan's K computer at the RIKEN Advanced Institute for Computational Science took first place, all powered by our software. 183 supercomputers from 16 countries participated in this benchmark. Overall, IBM Research's graph software powers 9 of the Top 10 Graph 500 computers. The winner, the K computer with 83K nodes and 663K cores, achieved a graph traversal rate of up to 38,621,400,000 vertices per second. IBM's graph software was chosen because of its high performance and high scalability, which come from leveraging various performance optimization methods on both shared-memory systems with many cores and distributed-memory systems.



Overall, IBM System G has received more than $22M in R&D funding.

In December 2013, IBM System G received the IBM Corporate Outstanding Innovation Award and the IBM Research Outstanding Accomplishment for its contribution of more than $50M (~$80M to $100M) to the IBM Corporation.

In December 2014, IBM System G was recognized again with the same awards for its scientific contribution to Social and Cognitive Network Science, with 29 patents, 89 publications, 4 keynote speeches, 30 invited talks, 7 panelist roles, 6 guest editorships for special issues of journals, 6 long-term (a year or more) academic visitors, 7 best paper awards, and the first IEEE fellow cited for contributions to Network Science.

It has also received various external awards.





For IBMers only: compared to the external website (http://systemg.research.ibm.com), System G's internal website (http://systemg.ibm.com) has an additional 'Resource' tab, which includes slides and more technical documents on the solutions.