Spider Webs, Bow Ties, Scale-Free Networks, and the Deep Web

The World Wide Web conjures up images of a giant spider web where everything is connected to everything else in a random pattern, and you can go from one edge of the web to another just by following the right links. Theoretically, that is what makes the web different from a typical index system: you can follow hyperlinks from one page to another. In the “small world” theory of the web, every web page is thought to be separated from any other web page by an average of about 19 clicks. In 1968, sociologist Stanley Milgram invented small-world theory for social networks by noting that every human was separated from any other human by only six degrees of separation. On the web, small-world theory was supported by early research on a small sampling of web sites. But research conducted jointly by scientists at IBM, Compaq, and AltaVista found something entirely different. These scientists used a web crawler to identify 200 million web pages and follow 1.5 billion links on these pages.

The researchers discovered that the web was not like a spider web at all, but rather like a bow tie. The bow-tie web had a “strongly connected component” (SCC) composed of about 56 million web pages. On the right side of the bow tie was a set of 44 million OUT pages that you could reach from the center, but from which you could not return to the center. OUT pages tended to be corporate intranet and other web site pages that are designed to trap you at the site when you land. On the left side of the bow tie was a set of 44 million IN pages from which you could get to the center, but that you could not reach from the center. These were recently created pages that had not yet been linked to by many center pages. In addition, 43 million pages were classified as “tendrils”: pages that did not link to the center and could not be reached from the center. However, the tendril pages were sometimes linked to IN and/or OUT pages. Occasionally, tendrils linked to one another without passing through the center (these are called “tubes”). Finally, there were 16 million pages totally disconnected from everything.
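To make the bow-tie categories concrete, here is a minimal Python sketch that sorts the pages of a tiny, made-up link graph into SCC, IN, OUT, and everything else. The page names, the `bow_tie` helper, and the choice of `hub_a` as a page known to sit in the core are all assumptions for illustration. This is not the method the IBM, Compaq, and AltaVista researchers used on their crawl, only the same idea at toy scale: a page is in the SCC if you can get from the core to it and back again.

```python
from collections import deque

def reachable(graph, start):
    """Return every page you can reach from `start` by following links (BFS)."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in graph.get(page, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

def reverse(graph):
    """Flip every link, so that 'who can reach me?' becomes a forward search."""
    flipped = {page: [] for page in graph}
    for source, targets in graph.items():
        for target in targets:
            flipped.setdefault(target, []).append(source)
    return flipped

def bow_tie(graph, core_page):
    """Classify pages relative to `core_page`, assumed to be in the SCC."""
    pages = set(graph) | {t for targets in graph.values() for t in targets}
    forward = reachable(graph, core_page)             # core can reach these
    backward = reachable(reverse(graph), core_page)   # these can reach the core
    scc = forward & backward
    return {
        "SCC": scc,
        "IN": backward - scc,
        "OUT": forward - scc,
        "OTHER": pages - forward - backward,  # tendrils, tubes, disconnected
    }

# Hypothetical six-page web: page -> pages it links to.
toy_web = {
    "new_blog": ["hub_a", "tendril"],  # links into the core, nobody links back
    "hub_a": ["hub_b"],
    "hub_b": ["hub_a", "intranet"],
    "intranet": [],                    # reachable from the core, no way back
    "tendril": [],                     # hangs off an IN page, never touches the core
    "island": [],                      # disconnected from everything
}

parts = bow_tie(toy_web, "hub_a")
for name, members in parts.items():
    print(name, sorted(members))
```

In this toy graph the two hub pages form the SCC, `new_blog` is an IN page, `intranet` is an OUT page, and `tendril` and `island` fall outside the bow tie's knot and wings. Separating tendrils from tubes and fully disconnected pages would need extra reachability checks that are left out here.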

Further evidence for the non-random and structured nature of the web is provided in research performed by Albert-László Barabási at the University of Notre Dame. Barabási’s team found that far from being a random, exponentially exploding network of 50 billion web pages, activity on the web was actually highly concentrated in “very-connected super nodes” that provided the connectivity to less well-connected nodes. Barabási dubbed this type of network a “scale-free” network and found parallels in the growth of cancers, disease transmission, and computer viruses. As it turns out, scale-free networks are highly vulnerable to destruction: destroy their super nodes and transmission of messages breaks down rapidly. On the upside, if you are a marketer trying to “spread the message” about your products, place your products on one of the super nodes and watch the news spread. Or build super nodes and attract a huge audience.
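The fragility of super nodes can be seen in a small simulation. The sketch below grows a network by preferential attachment, where each new node links to an existing node with probability proportional to the links that node already has. This is a common textbook way to produce a hub-dominated network, used here as an assumption rather than as Barabási’s actual data or procedure. It then compares the largest connected piece left after removing the 20 best-connected nodes with the piece left after removing 20 random nodes. The function names, network size, and seeds are all illustrative choices.

```python
import random
from collections import defaultdict

def preferential_attachment(n, seed=0):
    """Grow an undirected network of n nodes, one link per new node.

    Each new node attaches to an existing node picked with probability
    proportional to that node's current number of links.
    """
    rng = random.Random(seed)
    neighbors = defaultdict(set)
    neighbors[0].add(1)
    neighbors[1].add(0)
    endpoints = [0, 1]  # every node appears once per link it has
    for new in range(2, n):
        target = rng.choice(endpoints)  # well-linked nodes are picked more often
        neighbors[new].add(target)
        neighbors[target].add(new)
        endpoints += [new, target]
    return dict(neighbors)

def largest_component(neighbors, removed):
    """Size of the biggest connected piece once `removed` nodes are deleted."""
    removed = set(removed)
    seen = set()
    best = 0
    for start in neighbors:
        if start in removed or start in seen:
            continue
        seen.add(start)
        stack = [start]
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for nxt in neighbors[node]:
                if nxt not in removed and nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        best = max(best, size)
    return best

net = preferential_attachment(2000, seed=42)

# The 20 "super nodes": the nodes with the most links.
hubs = sorted(net, key=lambda node: len(net[node]), reverse=True)[:20]
# A control group: 20 nodes picked at random.
random_nodes = random.Random(7).sample(sorted(net), 20)

after_hubs = largest_component(net, hubs)
after_random = largest_component(net, random_nodes)
print("intact network:         ", largest_component(net, []))
print("after removing 20 hubs: ", after_hubs)
print("after removing 20 random:", after_random)
```

Because each new node adds only one link, this toy network is a tree and therefore more fragile than the real web. The contrast between targeted and random removal is the point, not the exact numbers.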

Thus the picture of the web that emerges from this research is quite different from earlier reports. The notion that most pairs of web pages are separated by a handful of links, almost always under 20, and that the number of connections would grow exponentially with the size of the web, is not supported. In fact, there is a 75% chance that there is no path from one randomly chosen page to another. With this knowledge, it becomes clear why even the most advanced web search engines index only a very small percentage of all web pages, and only about 2% of the overall population of internet hosts (about 400 million). Search engines cannot find most web sites because their pages are not well connected or linked to the central core of the web. Another important finding is the identification of a “deep web” composed of over 900 billion web pages that are not easily accessible to the web crawlers that most search engine companies use. Instead, these pages are either proprietary (not available to crawlers and non-subscribers), like the pages of the Wall Street Journal, or are not easily reachable from other web pages. In the last few years, newer search engines (such as the medical search engine Mammaheath) and older ones such as Yahoo have been revised to search the deep web. Because e-commerce revenues depend in part on customers being able to find a web site using search engines, web site managers need to take steps to ensure their web pages are part of the connected central core, or “super nodes,” of the web. One way to do this is to make sure the site has as many links as possible to and from other relevant sites.
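The 75% figure can be roughly checked against the bow-tie page counts given earlier. As a back-of-the-envelope assumption, suppose a path from one page to another exists only when the starting page is in IN or the SCC and the destination page is in the SCC or OUT. This ignores short paths inside IN, inside OUT, or along tendrils and tubes, so it is an approximation and not the researchers’ exact calculation.

```python
# Page counts in millions, taken from the bow-tie description in the text.
sizes = {"SCC": 56, "IN": 44, "OUT": 44, "TENDRILS": 43, "DISCONNECTED": 16}
total = sum(sizes.values())  # 203 million pages

# Assumption: a usable starting page can reach the core (IN or SCC),
# and a usable destination can be reached from the core (SCC or OUT).
p_good_start = (sizes["IN"] + sizes["SCC"]) / total
p_good_destination = (sizes["SCC"] + sizes["OUT"]) / total

p_path = p_good_start * p_good_destination  # about 0.243
p_no_path = 1 - p_path                      # about 0.757

print(f"chance a path exists: {p_path:.1%}")
print(f"chance of no path:    {p_no_path:.1%}")
```

Under these assumptions only about a quarter of randomly chosen page pairs are connected by any path at all, which is in line with the 75% chance of no path quoted above.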
