Wednesday, December 29, 2010

Analysis of Techniques for Detection of Deep Web Search Interface

The volume of information on the web is increasing day by day. The information on the web can be broadly categorized into two types, i.e. the surface web and the deep web. Surface web pages can easily be indexed by conventional techniques, but the deep web, whose size is assumed to be about a thousand times larger than the surface web, cannot be indexed by conventional search techniques. The first stage in extracting deep web information is the detection of deep web search interfaces. A search interface generally consists of HTML forms. Conventionally, deep web information is searched by filling in the HTML forms on a search interface manually, but recent research focuses on accessing and understanding HTML forms automatically. Being the first stage of the deep web extraction process, the detection of deep web search interfaces is one of the important modules of deep web information retrieval. In this paper, a technical analysis of some of the important deep web search interface detection techniques is carried out to identify their relative strengths and limitations with reference to current developments in the field of deep web information retrieval technology.

Keywords : Deep web, hidden web, search interface detection, crawler, random forest.
1. Introduction
The whole process of extracting information from the deep web can be broadly categorized into four steps, i.e. query interface analysis, value allotment, response analysis and navigation, and relevance ranking. Query interface analysis is the first and most important step in deep web information retrieval. In query interface analysis, a crawler requests a web page from a web server. After the page has been fetched, its HTML forms are parsed and processed to produce an internal representation of the page based on the developed model. Query interface analysis can further be broken into three modules: detection of the hidden web search interface, search form schema matching, and domain ontology identification. Of these modules, the detection of the hidden web search interface is the first and foremost step towards deep web information retrieval. A human user can easily identify a deep web search interface, but understanding one through an automatic technique without human intervention is a challenging task [1][2][3][4][5]. Figure 1 depicts the different types of search interfaces.

Fig. 1 : Different types of search interfaces

Related Work
One of the prominent works underlying the detection of deep web search interfaces is the random forest algorithm of Leo Breiman (2001) [6]. When applied to search interface detection, a random forest detects deep web search interfaces using a model based on decision tree classification. A random forest model can be defined as a collection of decision trees, each grown on a bootstrap sample of the training data, so that many different classification trees are generated. To classify a new object from its input vector, the vector is passed down every tree in the forest, each tree gives a classification, and the forest chooses the classification with the most votes over all the trees. The advantages of the random forest algorithm are that it exhibits a substantial performance improvement over single tree classifiers and that injecting the right kind of randomness makes accurate classifiers and regressors. The disadvantage of this algorithm is that, because it selects features at random, it may select unimportant and noisy features in the training data, which leads to poor classification results.
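To make the voting procedure concrete, here is a minimal sketch of a random forest in plain Python, assuming numeric feature vectors have already been extracted from forms. It is not Breiman's reference implementation: nodes are split with the Gini index, and the number of features tried at each node (`n_features`), the depth limit and the three form features in the example are illustrative assumptions.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def build_tree(X, y, n_features, depth, rng):
    """Grow one tree; each node considers only a random subset of features."""
    if depth == 0 or len(set(y)) == 1:
        return majority(y)  # leaf: predicted class
    best = None
    for f in rng.sample(range(len(X[0])), n_features):
        for t in sorted(set(row[f] for row in X)):
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:
        return majority(y)
    _, f, t, left, right = best
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left], n_features, depth - 1, rng),
            build_tree([X[i] for i in right], [y[i] for i in right], n_features, depth - 1, rng))


def predict_tree(node, x):
    while isinstance(node, tuple):  # internal node: (feature, threshold, left, right)
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def random_forest(X, y, n_trees=25, n_features=2, depth=3, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 n_features, depth, rng))
    return forest


def predict_forest(forest, x):
    """Every tree votes; the most voted class wins."""
    return majority([predict_tree(tree, x) for tree in forest])


# Hypothetical form features: [text inputs, submit labelled "search", password fields]
X = [[1, 1, 0], [2, 1, 0], [3, 1, 0], [1, 1, 0],
     [1, 0, 1], [2, 0, 1], [3, 0, 1], [2, 0, 1]]
y = ["search"] * 4 + ["other"] * 4
forest = random_forest(X, y)
print(predict_forest(forest, [2, 1, 0]), predict_forest(forest, [1, 0, 1]))
```

Because every tree sees a different bootstrap sample and a different random feature subset at each node, a single misleading split affects only a few votes; the same randomness is also why a noisy feature can occasionally be chosen.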
One deep web crawler architecture was proposed by Sriram Raghavan and Hector Garcia-Molina (2001) [7], who use a task-specific, human-assisted approach to crawl the hidden web. They identify two basic problems in deep web search: first, the volume of the hidden web is very large, and second, crawlers are needed that can efficiently handle search interfaces, which are designed mainly for humans. The paper designs a model of a task-specific, human-assisted web crawler and realizes it in HiWE (Hidden Web Exposer), a prototype built at Stanford that crawls dynamic pages. HiWE is designed to automatically process, analyze, and submit forms using an internal model of forms and form submissions, and it uses a layout-based information extraction technique (LITE) to process and extract useful information. The advantages of the HiWE architecture are that its application/task-specific approach allows the crawler to concentrate on relevant pages only, and that the human-assisted approach makes automatic form filling possible. The limitations of this architecture are that it is imprecise in responding to partially filled forms and that it cannot identify and respond to simple dependencies between form elements.
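The form-filling step can be sketched as follows, assuming a simplified form model (each element has a label and, for select menus, a finite domain) and a small value table supplied by a human operator for the crawling task. HiWE's actual form model, label matching and value ranking are more elaborate; `FORM`, `VALUE_TABLE` and the helper functions are hypothetical names used only for illustration.

```python
from itertools import product

# Hypothetical internal form model: a label plus, for finite elements
# such as select menus, the domain of values the form accepts.
FORM = [
    {"name": "dept", "label": "Department:", "domain": ["Physics", "Biology", "History"]},
    {"name": "q", "label": "Keyword", "domain": None},  # free-text box
]

# Hypothetical task-specific value table supplied by a human operator.
VALUE_TABLE = {
    "department": ["Physics", "Chemistry"],
    "keyword": ["laser", "quantum"],
}


def normalize(label):
    return label.strip().lower().rstrip(":")


def candidate_values(element, table):
    """Values for one element, or None if its label matches no table entry."""
    values = table.get(normalize(element["label"]))
    if values is None:
        return None
    if element["domain"] is not None:
        # Finite domain: keep only values the form actually offers.
        values = [v for v in values if v in element["domain"]]
    return values or None


def form_submissions(form, table):
    """All complete value assignments; empty if any element cannot be filled."""
    per_element = []
    for element in form:
        values = candidate_values(element, table)
        if values is None:
            return []
        per_element.append([(element["name"], v) for v in values])
    return [dict(combo) for combo in product(*per_element)]


for submission in form_submissions(FORM, VALUE_TABLE):
    print(submission)
```

The sketch simply skips any form whose elements cannot all be matched; handling partially filled forms well is exactly where the original architecture is reported to be imprecise.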
A technique for collecting hidden web pages for data extraction was proposed by Juliano Palmieri Lage et al. (2002) [8]. In this technique, the authors use the concept of web wrappers. A web wrapper is a program that extracts unstructured data from web pages. It takes as input a set of target pages from the web source; this set of target pages is generated automatically by programs called “spiders”, which traverse the web automatically. Hidden web agents assist the wrappers in dealing with the data available on the hidden web. The advantage of this technique is that it can access a large number of web sites from diverse domains; its limitation is that it can access only web sites that follow common navigation patterns. The technique could be further modified to cover additional navigation patterns.
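The division of labour between spiders and wrappers can be illustrated with a small sketch. It assumes an in-memory dictionary `SITE` in place of real HTTP fetches, treats any URL under `/item/` as a target page, and uses a regular-expression wrapper for one hypothetical page layout; it is an illustration of the idea, not the authors' agents.

```python
import re
from collections import deque

# Hypothetical in-memory site standing in for real HTTP fetches.
SITE = {
    "/": '<a href="/list?page=1">Books</a>',
    "/list?page=1": '<a href="/item/1">A</a> <a href="/item/2">B</a> <a href="/list?page=2">next</a>',
    "/list?page=2": '<a href="/item/3">C</a>',
    "/item/1": '<h1>Dune</h1><span class="price">9.99</span>',
    "/item/2": '<h1>Emma</h1><span class="price">4.50</span>',
    "/item/3": '<h1>Ulysses</h1><span class="price">12.00</span>',
}

LINK = re.compile(r'href="([^"]+)"')
RECORD = re.compile(r'<h1>(.*?)</h1><span class="price">([\d.]+)</span>')


def spider(start, is_target):
    """Breadth-first traversal from start; returns target page URLs in visit order."""
    seen, queue, targets = {start}, deque([start]), []
    while queue:
        url = queue.popleft()
        if is_target(url):
            targets.append(url)
        for link in LINK.findall(SITE.get(url, "")):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return targets


def wrapper(html):
    """Map one target page's markup to a structured record (None if no match)."""
    m = RECORD.search(html)
    return {"title": m.group(1), "price": float(m.group(2))} if m else None


targets = spider("/", lambda url: url.startswith("/item/"))
records = [wrapper(SITE[url]) for url in targets]
print(records)
```

The spider only reaches pages linked through the navigation pattern it follows (here, list pages with "next" links), which is the kind of restriction the limitation above refers to.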
A technique for the automated discovery of search interfaces from a set of HTML forms was proposed by Jared Cope, Nick Craswell and David Hawking (2003) [9]. The paper defines a novel technique to automatically detect search interfaces within a group of HTML forms. A decision tree was built with the C4.5 learning algorithm using features generated automatically from HTML markup, giving a classification accuracy of about 85% for general web interfaces. The advantage of this technique is that it discovers search interfaces automatically. Its limitations are that it relies on a single-tree classifier and that the number of generated features is limited by the small data set used. As future work, the authors suggest developing a search engine that combines existing methods for the other stages with the proposed one, together with a technique to eliminate false positives.
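As an illustration of generating features automatically from HTML markup, the following sketch uses Python's standard `html.parser` to turn each `<form>` into a bag of tag, attribute=value and term counts, which a decision tree learner such as C4.5 could then consume. The exact feature set of Cope et al. is not reproduced here; the feature naming scheme is an assumption.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class FormFeatureExtractor(HTMLParser):
    """Collects one bag of markup features for every <form> in a page."""

    def __init__(self):
        super().__init__()
        self.forms = []       # one Counter per form
        self.current = None   # Counter of the form being parsed, if any

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.current = Counter()
            self.forms.append(self.current)
        if self.current is None:
            return  # markup outside any form is ignored
        self.current["tag=" + tag] += 1
        for name, value in attrs:
            if value is None:
                continue
            value = value.lower()
            self.current[f"{tag}.{name}={value}"] += 1   # e.g. "input.type=password"
            for term in re.findall(r"[a-z]+", value):     # words inside attribute values
                self.current["term=" + term] += 1

    def handle_endtag(self, tag):
        if tag == "form":
            self.current = None


page = ('<form action="/search"><input type="text" name="q">'
        '<input type="submit" value="Search"></form>'
        '<form action="/login"><input type="password" name="pw"></form>')
extractor = FormFeatureExtractor()
extractor.feed(page)
extractor.close()
for features in extractor.forms:
    print(dict(features))
```

Features such as `term=search` or `input.type=password` are generated without any hand-written rules; the learner then decides which of them separate search forms from login or registration forms.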
A technique for understanding web query interfaces through best-effort parsing with hidden syntax was proposed by Zhen Zhang et al. (2004) [10]. The paper addresses the problem of understanding web search interfaces by presenting a best-effort parsing framework: a form extractor based on a 2P grammar and a best-effort parser, in the manner of a language parsing framework. The parser interprets an interface by repeatedly applying productions to produce fresh instances until it reaches a fixpoint, i.e. until no fresh instance can be produced. The best-effort parser minimizes wrong interpretations as far as possible, works very fast, and understands the interface to a large extent. The advantages of this technique are that it is very simple and consistent, with no priority among preferences, and that it can handle missing elements in a form; its limitation is that establishing a single global grammar that a machine can apply to all interfaces is a critical issue.
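The fixpoint behaviour described above can be sketched as a bottom-up loop. Here instances are `(symbol, start, end)` spans over a sequence of layout tokens, and each hypothetical production combines two adjacent instances. The real 2P grammar also uses two-dimensional spatial relations and preferences to resolve conflicting parses, which this sketch omits.

```python
# Hypothetical binary productions: (head, left child, right child).
PRODUCTIONS = [
    ("TextOp", "Text", "Textbox"),       # e.g. "Title: [______]"
    ("EnumRB", "RadioButton", "Text"),   # e.g. "(o) Hardcover"
    ("RBList", "EnumRB", "EnumRB"),
    ("RBList", "RBList", "EnumRB"),
]


def parse_to_fixpoint(tokens, productions):
    """tokens: terminal symbols in layout order.

    Instances are (symbol, start, end) spans. Productions are applied
    repeatedly until no fresh instance can be produced (a fixpoint)."""
    instances = {(sym, i, i + 1) for i, sym in enumerate(tokens)}
    changed = True
    while changed:
        changed = False
        for head, left, right in productions:
            for s1, a, b in list(instances):
                if s1 != left:
                    continue
                for s2, c, d in list(instances):
                    if s2 == right and c == b:  # right instance starts where left ends
                        fresh = (head, a, d)
                        if fresh not in instances:
                            instances.add(fresh)
                            changed = True
    return instances


tokens = ["Text", "Textbox", "RadioButton", "Text", "RadioButton", "Text"]
for instance in sorted(parse_to_fixpoint(tokens, PRODUCTIONS), key=lambda s: (s[1], s[2])):
    print(instance)
```

In this toy interface the loop derives a text condition over tokens 0 to 2 and a radio-button list over tokens 2 to 6, then stops because no production can create anything new.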
A technique named “Siphoning hidden-web data through keyword-based interfaces” was proposed by Luciano Barbosa and Juliana Freire (2004) [11]; it retrieves information from hidden web databases by generating a small set of representative keywords and building queries from them. The technique is designed to enhance coverage of the deep web. Its advantage is that it is a simple and completely automated strategy that can be quite effective in practice, leading to very high coverage of the deep web. Its limitation is that it cannot achieve high coverage for collections whose search interface caps the number of results returned. The authors further suggest modifying the algorithm to characterize search interfaces better, so that different notions and levels of security can be achieved.
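A minimal sketch of the siphoning idea, assuming a hypothetical in-memory collection behind a keyword-only search function: starting from a seed keyword, each round queries the interface, adds new results to the sample, and picks as the next query the most frequent term in the retrieved documents that has not been issued yet. The frequency-based term choice is an assumption made for illustration; the `limit` parameter models an interface that caps the number of results.

```python
from collections import Counter

# Hypothetical hidden collection reachable only through keyword search.
COLLECTION = {
    1: "deep web crawler design",
    2: "crawler politeness and scheduling",
    3: "web search interface detection",
    4: "random forest classifier for search forms",
    5: "classifier evaluation on forms",
}


def search(keyword, limit=10):
    """Keyword-only interface: ids of documents containing keyword, capped at limit."""
    return [d for d, text in COLLECTION.items() if keyword in text.split()][:limit]


def siphon(seed, max_queries=10, limit=10):
    retrieved, used, keyword = set(), {seed}, seed
    term_counts = Counter()
    for _ in range(max_queries):
        for doc_id in search(keyword, limit):
            if doc_id not in retrieved:
                retrieved.add(doc_id)
                term_counts.update(COLLECTION[doc_id].split())
        # Next query: most frequent term in the retrieved sample not issued yet.
        candidates = [t for t, _ in term_counts.most_common() if t not in used]
        if not candidates:
            break
        keyword = candidates[0]
        used.add(keyword)
    return retrieved


print(sorted(siphon("web", max_queries=20)))
```

With `limit=1`, the same run from the seed "web" retrieves only document 1, a small illustration of why an interface that caps the number of results limits coverage.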
An improved version of the random forest algorithm was proposed by Deng et al. (2008) [12]. In this improved technique, a weighted feature selection algorithm is used to generate the decision trees. The advantage of the improved algorithm is that its ensemble of decision trees mitigates the difficulty of classifying high-dimensional, sparse search interface data. Its disadvantage is that it is highly sensitive to changes in the training data set.
A further improvement to the random forest algorithm was made by Yunming Ye et al. (2009) [13], who use a feature weighting random forest algorithm to detect hidden web search interfaces. The paper presents a weighted feature selection process in place of random selection. The advantage of this technique is that weighting the feature selection, instead of selecting features at random, reduces the chance of selecting noisy features; its limitation is that only features available in the search forms themselves are used. As future work, the authors suggest investigating more feature weighting methods for constructing random forests.
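The difference between random and weighted feature selection can be sketched as follows. Each feature is weighted here by its information gain with respect to the class label, which is one possible weighting measure and an assumption of this sketch, not necessarily the measure used by Deng et al. or Ye et al. The subset of candidate features is then drawn with probability proportional to these weights, so uninformative features are rarely chosen.

```python
import math
import random
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """Entropy reduction obtained by splitting labels on a discrete feature column."""
    n = len(labels)
    remainder = 0.0
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def feature_weights(X, y):
    """Normalized information-gain weights, one per feature (uniform if all zero)."""
    gains = [max(0.0, information_gain([row[j] for row in X], y)) for j in range(len(X[0]))]
    total = sum(gains)
    if total <= 0:
        return [1.0 / len(gains)] * len(gains)
    return [g / total for g in gains]


def weighted_feature_subset(weights, k, rng):
    """Draw k distinct feature indices; each draw is proportional to weight."""
    remaining = list(range(len(weights)))
    chosen = []
    for _ in range(k):
        w = [weights[j] for j in remaining]
        if sum(w) > 0:
            pick = rng.choices(remaining, weights=w, k=1)[0]
        else:
            pick = rng.choice(remaining)  # only zero-weight features are left
        chosen.append(pick)
        remaining.remove(pick)
    return chosen


# Feature 0 separates the classes; feature 1 carries no information.
X = [[1, 0], [1, 1], [0, 0], [0, 1]]
y = ["search", "search", "other", "other"]
weights = feature_weights(X, y)
print(weights, weighted_feature_subset(weights, 1, random.Random(0)))
```

In a forest, a draw like `weighted_feature_subset` would replace the uniform `random.sample` call used at each node, leaving bootstrapping and voting unchanged.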
An algorithm named “the improved naive Bayesian web text classification algorithm” was proposed by Ping Bai and Junqing Li (2009) [14] for the automatic and effective classification of web pages with reference to a given machine learning model. In conventional techniques, category abstracts are produced by domain experts, either manually or semi-automatically. A conventional naive Bayesian classifier treats all terms as equally important, whereas the improved naive Bayesian web text classification algorithm gives the terms in each page title higher importance than other terms. The strength of this technique is that its text classification results are very accurate; as further scope, the authors suggest making the classification process automatic in an efficient way.
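The title-weighting idea can be sketched as a multinomial naive Bayes classifier with Laplace smoothing in which each term occurrence in a page title counts `TITLE_WEIGHT` times. The weight value, the whitespace tokenizer and the toy training pages are assumptions for illustration, not the parameters of Bai and Li's algorithm.

```python
import math
from collections import Counter, defaultdict

TITLE_WEIGHT = 3.0  # assumption: one title occurrence counts as three body occurrences


def weighted_terms(title, body, title_weight=TITLE_WEIGHT):
    counts = Counter()
    for term in body.lower().split():
        counts[term] += 1.0
    for term in title.lower().split():
        counts[term] += title_weight  # title terms get higher importance
    return counts


class TitleWeightedNaiveBayes:
    def fit(self, pages):
        """pages: iterable of (title, body, label) triples."""
        self.class_pages = Counter()
        self.term_mass = defaultdict(Counter)
        self.vocab = set()
        for title, body, label in pages:
            counts = weighted_terms(title, body)
            self.class_pages[label] += 1
            self.term_mass[label].update(counts)
            self.vocab.update(counts)
        self.n_pages = sum(self.class_pages.values())
        return self

    def predict(self, title, body):
        counts = weighted_terms(title, body)
        best, best_score = None, -math.inf
        for label, n in self.class_pages.items():
            mass = self.term_mass[label]
            total = sum(mass.values())
            score = math.log(n / self.n_pages)  # class prior
            for term, weight in counts.items():
                # Laplace-smoothed likelihood, counted `weight` times
                p = (mass[term] + 1) / (total + len(self.vocab))
                score += weight * math.log(p)
            if score > best_score:
                best, best_score = label, score
        return best


pages = [
    ("cheap flights", "book airline tickets online", "travel"),
    ("hotel deals", "book rooms online today", "travel"),
    ("football scores", "live match results online", "sports"),
    ("tennis news", "match results and rankings", "sports"),
]
model = TitleWeightedNaiveBayes().fit(pages)
print(model.predict("tennis", "book online"))
```

Here the page titled "tennis" is assigned to "sports" even though its body words lean towards "travel", because the single title term outweighs them.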
An approach for the automatic detection and unification of web search query interfaces using domain ontology was proposed by Anuradha and A. K. Sharma (2010) [15]. The technique focuses the crawler on a given topic by using a domain ontology, and it yields pages that contain domain-specific search forms. The strengths of this technique are that results are produced from multiple sources, human effort is reduced, and results are very accurate with less execution time. Its limitation is that it is domain specific.
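A minimal sketch of ontology-based detection and unification, assuming a hypothetical book-domain ontology that maps each concept to a set of synonymous field labels: a form is accepted as a domain-specific search form when at least a threshold fraction of its labels map to ontology concepts, and the matched concepts give a unified schema. The threshold of 0.5 and the exact-match rule are assumptions; the published approach may match labels differently.

```python
# Hypothetical fragment of a "book" domain ontology: concept -> synonymous labels.
BOOK_ONTOLOGY = {
    "title": {"title", "book title", "name of book"},
    "author": {"author", "writer", "author name"},
    "isbn": {"isbn", "isbn number"},
    "publisher": {"publisher", "published by"},
}


def normalize(label):
    return " ".join(label.lower().replace(":", " ").split())


def map_to_concept(label, ontology):
    """Concept whose synonym set contains the normalized label, else None."""
    norm = normalize(label)
    for concept, synonyms in ontology.items():
        if norm in synonyms:
            return concept
    return None


def is_domain_search_form(labels, ontology, threshold=0.5):
    """Returns (decision, unified schema) for a form given its field labels."""
    hits = [c for c in (map_to_concept(label, ontology) for label in labels) if c is not None]
    ratio = len(hits) / len(labels) if labels else 0.0
    return ratio >= threshold, sorted(set(hits))


print(is_domain_search_form(["Book Title:", "Writer", "Price"], BOOK_ONTOLOGY))
print(is_domain_search_form(["Username", "Password"], BOOK_ONTOLOGY))
```

The ontology is what makes the approach accurate within its domain, and also what makes it domain specific: a login form or a form from another domain simply finds no matching concepts.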

Summary of various techniques for detection of deep web search interface
From this survey of deep web search interface detection techniques, it can be concluded that each technique has its own relative strengths and limitations. Table 1 summarizes the techniques, strengths, and limitations of some of the important detection techniques for deep web search interfaces.

Table 1 : Summary of various techniques for detection of deep web search interface

Authors	Technique	Strengths	Limitations
Leo Breiman (2001)	Forest of decision trees as classifiers	A substantial improvement in performance over single tree classifiers.	May include unimportant or noisy features.

Sriram Raghavan et al. (2001)	Hidden Web Exposer (HiWE)	An application-specific approach to hidden web crawling.	Imprecise in filling forms.

Palmieri Lage et al. (2002)	Hidden web agents	Wide coverage of distinct domains.	Restricted to web sites that follow common navigation patterns.

Jared Cope et al. (2003)	Single tree classifier	Automatic discovery of search interfaces; performs well when rules are generated on the same domain.	Long rules, large feature space in training samples, overfitting, unsatisfactory classification precision.

Zhen Zhang et al. (2004)	2P grammar and best-effort parser	Very simple and consistent; no priority among preferences; handles missing elements in forms.	Establishing a single global grammar that a machine can apply globally is critical.

Luciano Barbosa et al. (2004)	Automatic query generation based on a small set of keywords	A simple and completely automated strategy that can be quite effective in practice.	A large domain of keywords has to be generated.

Deng, X. B. et al. (2008)	Weighted feature selection algorithm	Mitigates the problem of classifying high-dimensional, sparse search interfaces using an ensemble of decision trees.	Highly sensitive to changes in the training data set.

Ye, Li, Deng et al. (2009)	Feature weighted selection process	Minimizes the chance of selecting noisy features.	No use of contextual information associated with forms.

Ping Bai et al. (2009)	Naïve Bayesian algorithm	Text classification results are very accurate.	Classification algorithm is not automatic.

Anuradha et al. (2010)	Based on domain ontology	Results are produced from multiple sources; reduces human effort; less execution time; high accuracy.	It is domain specific.

Conclusion
Deep web search interfaces are the entry point for searching deep web information. A deep web crawler should understand and detect deep web search interfaces efficiently to facilitate the subsequent steps of deep web information retrieval. Efficient detection of deep web search interfaces can lead to significantly better retrieval of deep web information, so the first and foremost step of deep web information retrieval is the efficient understanding and detection of deep web search interfaces. In this paper, a technical analysis of some of the techniques for detecting deep web search interfaces has been carried out, and it is concluded that each of them has relative strengths and limitations. To explore deep web information efficiently, a detection technique should be designed that combines these strengths, particularly wide coverage of different domains, an automatic procedure, resistance to noisy and unwanted features, the ability to weigh features according to their importance, an application-specific approach as required, and user friendliness. Finally, the technique for detecting deep web search interfaces should be compatible with current web technology.

This paper was published by: Dilip Kumar Sharma (1) and A. K. Sharma (2).
(1) GLA University, Mathura, UP, India. Email: todilipsharma@rediffmail.com
(2) YMCA University of Science and Technology, Faridabad, Haryana, India

I am interested in the topic addressed by the authors. I hope this is useful to the readers of this blog.

References
[1] Bergman, M. K. (2001). The Deep Web: Surfacing Hidden Value. The Journal of Electronic Publishing, Vol. 7, No. 1.
[2] Peisu, X., Ke, T., and Qinzhen, H. (2008). A Framework of Deep Web Crawler. In Proceedings of the 27th Chinese Control Conference, Kunming, Yunnan, China.
[3] Sharma, D. K., and Sharma, A. K. (2010). Deep Web Information Retrieval Process: A Technical Survey. International Journal of Information Technology & Web Engineering, USA, Vol. 5, No. 1.
[4] Khare, R., An, Y., and Song, Y. (2010). Understanding Deep Web Search Interfaces: A Survey. ACM SIGMOD Record, Vol. 39, No. 1, pp. 33-40.
[5] Sharma, D. K., and Sharma, A. K. (2009). Query Intensive Interface Information Extraction Protocol for Deep Web. In Proceedings of the IEEE International Conference on Intelligent Agent & Multi-Agent Systems, pp. 1-5, IEEE Xplore.
[6] Breiman, L. (2001). Random Forests. Machine Learning, Vol. 45, No. 1, pp. 5-32, Kluwer Academic Publishers.
[7] Raghavan, S., and Garcia-Molina, H. (2001). Crawling the Hidden Web. In Proceedings of the 27th International Conference on Very Large Data Bases, Rome, Italy.
[8] Lage, J. P., et al. (2002). Collecting Hidden Web Pages for Data Extraction. In Proceedings of the 4th International Workshop on Web Information and Data Management, pp. 69-75.
[9] Cope, J., Craswell, N., and Hawking, D. (2003). Automated Discovery of Search Interfaces on the Web. In Proceedings of the Fourteenth Australasian Database Conference (ADC2003), Adelaide, Australia.
[10] Zhang, Z., He, B., and Chang, K. (2004). Understanding Web Query Interfaces: Best-Effort Parsing with Hidden Syntax. In Proceedings of the ACM International Conference on Management of Data, pp. 107-118.
[11] Barbosa, L., and Freire, J. (2004). Siphoning Hidden-Web Data through Keyword-Based Interfaces. In Proceedings of SBBD.
[12] Deng, X. B., Ye, Y. M., Li, H. B., and Huang, J. Z. (2008). An Improved Random Forest Approach for Detection of Hidden Web Search Interfaces. In Proceedings of the Seventh International Conference on Machine Learning and Cybernetics, Kunming, China. IEEE.
[13] Ye, Y., et al. (2009). Feature Weighting Random Forest for Detection of Hidden Web Search Interfaces. Computational Linguistics and Chinese Language Processing, Vol. 13, No. 4, pp. 387-404.
[14] Bai, P., and Li, J. (2009). The Improved Naive Bayesian WEB Text Classification Algorithm. In International Symposium on Computer Network and Multimedia Technology, IEEE Xplore.
[15] Anuradha, and Sharma, A. K. (2010). A Novel Approach for Automatic Detection and Unification of Web Search Query Interfaces Using Domain Ontology. International Journal of Information Technology and Knowledge Management, July-December, Vol. 2, No. 2, pp. 196-199.
Dilip Kumar Sharma holds a B.Sc., B.E. (CSE), M.Tech. (IT), and M.Tech. (CSE), and is pursuing a Ph.D. in Computer Engineering. He is a life member of CSI, IETE, ISTE, ISCA, and SSI, and a member of CSTA, USA. He has attended 21 short-term courses/workshops/seminars organized by various esteemed organizations. He has published 21 research papers in international journals/conferences of repute and participated in 18 international/national conferences. He has been working as a Reader in the Department of Computer Science, IET, at GLA University, Mathura, U.P., since March 2003, and is also the CSI Student Branch Coordinator. His research interests are deep web information retrieval, digital watermarking, and software engineering. He has guided various projects and seminars undertaken by undergraduate and postgraduate students.
Prof. A. K. Sharma received his M.Tech. (CST) with Honours from the University of Roorkee (presently I.I.T. Roorkee) and his Ph.D. (Fuzzy Expert Systems) from JMI, New Delhi, and obtained a second Ph.D. in Information Technology from IIITM, Gwalior, in 2004. He is presently working as Dean, Faculty of Engineering and Technology, and Chairman, Department of Computer Engineering, at YMCA University of Science and Technology, Faridabad. His research interests include fuzzy systems, OOPS, knowledge representation, and Internet technologies. He has guided 9 Ph.D. theses, with 8 more in progress, and has about 175 research publications in international and national journals and conferences. The author of 7 books, he is actively engaged in research related to fuzzy logic, knowledge-based systems, MANETs, and the design of crawlers. Besides being a member of many BOS and academic councils, he has been a Visiting Professor at JMI, IIIT&M, and I.I.T. Roorkee.

Nature Inspired Machine Intelligence

Artificial Neural Networks
Evolutionary Algorithms
Swarm Intelligence
Harmony Search
Simulated Annealing
Membrane Computing
Artificial Immune System (AIS)
DNA Computation
Computing with Words
Artificial Life
Quantum Computation
Hybrid Approaches

If all of the above techniques can be combined and implemented effectively in real time, so that they provide the right information at the right time, the result can be called NIMI (Nature Inspired Machine Intelligence). Whoever develops it, from the R&D division of a small company to Microsoft, Infosys, TCS, or Wipro, can emerge as an indisputable common brand in the common man's day-to-day life. I say this because I believe technology has to be cheaper and should always reach the common man as a help. What is the future of burning carbon and leaving a global footprint on the ozone layer? To avoid this, education and methodologies have to evolve from basic to at least basic+1.

References
[1] A. Abraham, “Neuro-Fuzzy Systems: State-of-the-Art Modeling Techniques”, in Jose Mira and Alberto Prieto, eds., Connectionist Models of Neurons, Learning Processes, and Artificial Intelligence, Springer Verlag Germany, 2001, pp. 269-276.
[2] W. Banzhaf, P. Nordin, E.R. Keller, and F.D. Francone, “Genetic Programming: An Introduction on The Automatic Evolution of Computer Programs and its Applications”, Morgan Kaufmann Publishers, Inc., 1998
[3] Kirkpatrick, S., C. D. Gelatt Jr., M. P. Vecchi, Optimization by Simulated Annealing, Science, 220, 4598, 671-680, 1983.
[4] G. Paun, Computing with membranes, Journal of Computer and System Sciences, 61 (1), 108-143, 2000.
[5] Deutsch, D., “Quantum Theory, the Church-Turing Principle, and the Universal Quantum Computer”, Proc. Roy. Soc. Lond. A400, 97–117, 1985.
[6] A. Abraham, Intelligent Systems: Architectures and Perspectives, Recent Advances in Intelligent Paradigms and Applications, Abraham A., Jain L. and Kacprzyk J. (Eds.), Studies in Fuzziness and Soft Computing, Springer Verlag Germany, ISBN 3790815381, Chapter 1, pp. 1-35, 2002.
[7] Bishop C.M., Neural Networks for Pattern Recognition, Oxford University Press, Oxford, UK, 1995.
[8] Fogel, D. B., Evolutionary Computation: Toward a New Philosophy of Machine Intelligence. IEEE Press, Piscataway, NJ, Second edition, 1999.
[9] Kennedy J. and Eberhart R. Swarm intelligence. Morgan Kaufmann Publishers, Inc., San Francisco, CA, 2001.
[10] Passino, K.M., Biomimicry of Bacterial Foraging for Distributed Optimization and Control, IEEE Control Systems Magazine, pp. 52-67, June 2002.
[11] de Castro, L. N. and Timmis, J. I., Artificial Immune Systems: A New Computational Intelligence Approach, Springer-Verlag, London, 2002.
[12] Amos M., Theoretical and Experimental DNA Computation. Springer, ISBN: 3-540-65773-8, 2005.
[13] Zadeh L.A. and Kacprzyk J. (Eds.) Computing with Words in Information/Intelligent Systems: Foundations, Studies in Fuzziness and Soft Computing, Springer Verlag, Germany, ISBN 379081217X, 1999.
[14] Reynolds, R.G., Michalewicz, Z., Cavaretta, M.J., Using Cultural Algorithms for Constraint Handling in GENOCOP. Proceedings of the Fourth Annual Conference on Evolutionary Programming. MIT Press, Cambridge, pp. 289-305, 1995.
[15] C. Adami, Introduction to Artificial Life. Springer-Verlag New York, Inc., 1998.
[16] Z.W. Geem, J.H. Kim, and G.V. Loganathan, “A new heuristic optimization algorithm: harmony search”, Simulation 76 (2), 60–68, 2001.

Harmony Search Algorithm

Harmony search (HS) is a music-inspired algorithm (Geem et al., 2001) that has been applied to various optimization problems, including music composition, Sudoku puzzles, magic squares, timetabling, tour planning, logistics, web page clustering, text summarization, Internet routing, visual tracking, robotics, energy system dispatch, power system design, cell phone networking, structural design, water network design, dam scheduling, flood model calibration, groundwater management, soil stability analysis, ecological conservation, vehicle routing, heat exchanger design, satellite heat pipe design, offshore structure mooring, RNA structure prediction, medical imaging, medical physics, etc. (Geem, 2009; 2010a). Recently, HS was also applied to astronomical data analysis, in work published in Nature (Deeg et al., 2010).
Each musician in a music performance plays one musical note at a time, and those notes together make a harmony. Likewise, each variable in an optimization problem has one value at a time, and those values together make a solution vector. Just as a music group improves its harmony practice by practice, the algorithm improves its solution vectors iteration by iteration.
The HS algorithm has three basic operations: memory consideration, pitch adjustment, and random selection. With memory consideration, HS chooses a value from the harmony memory (HM); with pitch adjustment, HS chooses a value slightly modified from one in HM; and with random selection, HS chooses a value at random from the entire value range. Instead of a traditional calculus-based derivative, these basic operations constitute a novel stochastic derivative (Geem, 2008) that is used to search for the right direction towards the optimal solution.
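The three operations can be put together in a short sketch for minimizing a continuous function within box bounds. The parameter names follow common HS usage (harmony memory size `hms`, harmony memory considering rate `hmcr`, pitch adjusting rate `par`, bandwidth `bw`); the default values, the bandwidth scaled to each variable's range, and the replace-the-worst update are assumptions of this sketch rather than settings taken from the cited papers.

```python
import random


def harmony_search(objective, bounds, hms=10, hmcr=0.9, par=0.3, bw=0.05,
                   iterations=5000, seed=0):
    """Minimize objective(x) for x inside bounds = [(lo, hi), ...]."""
    rng = random.Random(seed)
    # Harmony memory (HM): hms random solution vectors and their scores.
    memory = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(hms)]
    scores = [objective(h) for h in memory]
    for _ in range(iterations):
        new = []
        for j, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:
                value = memory[rng.randrange(hms)][j]             # memory consideration
                if rng.random() < par:
                    value += rng.uniform(-1, 1) * bw * (hi - lo)  # pitch adjustment
            else:
                value = rng.uniform(lo, hi)                       # random selection
            new.append(min(max(value, lo), hi))                   # keep within bounds
        score = objective(new)
        worst = max(range(hms), key=scores.__getitem__)
        if score < scores[worst]:                                 # replace worst harmony
            memory[worst], scores[worst] = new, score
    best = min(range(hms), key=scores.__getitem__)
    return memory[best], scores[best]


def sphere(x):
    return sum(v * v for v in x)


best, value = harmony_search(sphere, [(-5.0, 5.0), (-5.0, 5.0)])
print(best, value)
```

Memory consideration recombines good values already found, pitch adjustment explores their neighbourhood, and random selection keeps the search from collapsing onto the current memory.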
On more advanced issues in HS, researchers have studied exploratory power (Das et al., 2010), multi-modal solution spaces (Gao et al., 2009), multi-objective optimization (Geem, 2010b), distributed memory (Pan et al., 2010), hybridization (Fesanghary et al., 2008), and parameter-setting-free adaptation (Geem and Sim, 2010). In addition, HS has a unique co-derivative that considers the relationships among variables (Geem, 2011).

References
Das, S., Mukhopadhyay, A., Roy, A., Abraham, A., & Panigrahi, B. K. (2010) Exploratory Power of the Harmony Search Algorithm: Analysis and Improvements for Global Numerical Optimization. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, http://dx.doi.org/10.1109/TSMCB.2010.2046035
Deeg, H. J., Moutou, C., Erikson, A., et al. (2010). A transiting giant planet with a temperature between 250 K and 430 K. Nature, 464, 384-387.
Fesanghary, M., Mahdavi, M., Minary-Jolandan, M., Alizadeh, Y. (2008). Hybridizing harmony search algorithm with sequential quadratic programming for engineering optimization problems. Computer Methods in Applied Mechanics and Engineering, 197(33-40), 3080-3091.
Gao, X. Z., Wang, X., & Ovaska S. J. (2009) Uni-modal and Multi-modal Optimization Using Modified Harmony Search Methods. International Journal of Innovative Computing, Information and Control, 5(10A), 2985-2996.
Geem, Z. W. (2008). Novel Derivative of Harmony Search Algorithm for Discrete Design Variables. Applied Mathematics and Computation, 199(1), 223-230.
Geem, Z. W. (2009). Music-Inspired Harmony Search Algorithms: Theory and Applications. Berlin: Springer.
Geem, Z. W. (2010a). Recent Advances in Harmony Search Algorithm. Berlin: Springer.
Geem, Z. W. (2010b). Multiobjective Optimization of Time-Cost Trade-Off Using Harmony Search. ASCE Journal of Construction Engineering and Management, 136(6), 711-716.
Geem, Z. W. (2011). Stochastic Co-Derivative of Harmony Search Algorithm. International Journal of Mathematical Modelling and Numerical Optimisation, 2(1), 1-12.
Geem, Z. W., Kim, J. H., & Loganathan, G. V. (2001). A New Heuristic Optimization Algorithm: Harmony Search. Simulation, 76(2), 60-68.
Geem, Z.W., Sim, K.-B. (2010). Parameter-Setting-Free Harmony Search Algorithm. Applied Mathematics and Computation, http://dx.doi.org/10.1016/j.amc.2010.09.049
Pan, Q.-K., Suganthan, P.N., Liang, J. J., Tasgetiren, M.F. (2010). A local-best harmony search algorithm with dynamic subpopulations. Engineering Optimization, 42(2), 101 - 117.

Thursday, July 1, 2010

Indian Copyright Act may be amended soon, DRM may become legally breakable

A recent bill introduced by the Union Government seeks to amend the Copyright Act of 1957, attempting to bring Indian copyright laws on par with the World Intellectual Property Organisation (WIPO)’s treaties and to give creative contributors to copyrighted material more rights. The Copyright Amendment Bill 2010, if ratified, will greatly affect the music, film, and photography industries of India.

Importantly, the amendment also proposes that private & personal copying will be treated as 'fair dealing'. This interestingly allows users to break DRM (digital rights management) on their legally purchased content, as long as they are not violating copyright terms. This will allow them to move and use the content on various devices. Amazingly, this provision will also allow developers to make and sell tools to break DRM protection.

These are the other major changes:

Independent authorship rights for lyricists, composers, and singers in films, which presently belong only to the producer and music company for the film.
Addressing the concern of music companies that they cannot derive royalties from “version recordings” of their original songs.
Producer and principal director will be treated as joint first owners of the copyright, which at present belongs only to the producer.
Change in the term of copyright for photographers from 60 years to “life plus 60 years”.
Allowing physically challenged persons to access copyrighted material in specialized formats.
Proposes to make the Copyright Act conform to WIPO’s Internet treaties on anti-circumvention, giving equal rights to both online and offline work.
Statutory/compulsory licensing to allow broadcasting companies to access written, audio, and video works.

As expected, the proposed changes have met with strong opposition and disgruntlement, mainly from music companies, who claim that the Indian film and music industry cannot be compared to any other in the world and needs different laws. Even those who stand to gain from the new provisions are sceptical whether the amendment will ever be passed, as many of the changes run counter to the way the Indian industry has worked for decades, and some changes, such as individual rights for lyricists and composers, cannot be found anywhere else in the world.

The ‘fair dealing’ provision on personal copying of DRM-protected content in India, if ratified, will make a lot of Indians very happy, as well as make a lot of people across the world very envious of them. Here’s to the Copyright Amendment Bill 2010 getting passed!

Microsoft launches Visual Studio 2010 and .NET Framework 4

Microsoft has launched the full-release, general-availability versions of Microsoft Visual Studio 2010 and .NET Framework 4, with a host of new features for developers. To inaugurate its flagship developer products, Microsoft will be holding more than 150 developer events across the globe. Users will have access to many popular extensions for both Visual Studio 2010 and .NET Framework 4, made by over 50 partners and available at the time of release.

Visual Studio 2010 has an all-new editor built on Windows Presentation Foundation, and it supports the ribbon interface, multiple monitors, Windows 7 multitouch, SharePoint functionality, Windows Azure tools, and IntelliTrace, a new feature that helps eradicate non-reproducible bugs. The Ultimate and Premium SKUs also come bundled with Expression Studio, Business & Enterprise Servers, and Microsoft Office.

.NET Framework 4 also features built-in support for industry standards, for high-performance middle-tier applications (including parallel programming, workflow, and service-oriented applications), and for ASP.NET Model-View-Controller, as well as the Dynamic Language Runtime. Developers can also install it side by side with .NET Framework 3.5 and benefit from a runtime that has been reduced in size by 80%, making it faster.

Microsoft will also launch the Release to Web version of Silverlight 4 sometime today, which will include more than 60 customizable pre-written controls, extended out-of-browser capabilities, and enterprise application enhancements.

Visual Studio 2010 is available in four SKUs; pricing for the three MSDN editions is:
Ultimate with MSDN - $11,924 New or $3,841 Renewal
Premium with MSDN - $5,469 New or $2,299 Renewal
Professional with MSDN - $1,199 New or $799 Renewal
Here's what the products will offer:

VS 2010	Features	MSDN Premium Benefits	Additional Software
Ultimate	IntelliTrace Historical Debugging; Comprehensive Testing Tools; Test Case and Test Lab Management; Advanced UML Architecture Tools; Architectural Discovery Tools; Unit Testing with Code Coverage and Test Prioritization; Code Analysis, Metrics and Optimization; Database Development and Testing Tools	250 hours of Azure Usage; TFS License (1 CAL); 4 Support Incidents	Windows OS & Servers; XNA Game Studio; Expression Studio; Office; SQL Server; Business & Enterprise Servers
Premium	Advanced Application Development & Debugging; Unit Testing with Code Coverage and Test Prioritization; Code Analysis, Metrics & Optimization; Database Development and Testing Tools; Read-Only Architectural Diagrams	100 hours of Azure Usage; TFS License (1 CAL); 4 Support Incidents	Windows OS & Servers; XNA Game Studio; Expression Studio; Office; SQL Server; Business & Enterprise Servers
Professional	Application Development & Debugging; Unit Testing	50 hours of Azure Usage; TFS License (1 CAL); 2 Support Incidents	Windows Client and Server OS; SQL Server; XNA Game Studio

The test versions and promos were already out; now the full release is finally here!

Scientists have created bacteria-based artificial life...Nearly???

I say 'nearly' because they have created only the 'software' part, which is no mean feat, considering it took them nearly 15 years to do it. But can the hardware be far behind?

Although we have evolved almost beyond recognition from our single-celled beginnings, the question of how life started has troubled scientists for a long time. As typical scientific methodology prescribes, scientists are trying to answer this big question by attempting to generate life in the laboratory - life from lifeless chemicals. Genetic engineering is the manipulation of the genetic structure, DNA make-up, or genome of a living being, which is like the 'source code' of the organism, present in every cell and determining everything about him/her/it. Although generating living cells from chemicals isn't possible as of now, we have been tinkering with the chemicals that make up DNA to create artificial genomes.

Now, a scientific team - headed by Drs. Craig Venter, Hamilton Smith and Clyde Hutchison of the J. Craig Venter Institute (JCVI), USA - has created the first synthetic bacterial cell by transplanting an artificially created genome into a naturally occurring host cell. In a rough analogy, the operating system has been written, but it is loaded onto borrowed hardware. In 2008, the JCVI team synthesized a small bacterial genome; however, they were unable to activate it in a cell at that time - the Beta version of the OS crashed. This time, the cell has 'booted up' and produced over a billion copies of itself, which contained and displayed the characteristics of the synthetic DNA. This is the first cell controlled completely by a synthetic genome. The genome, a copy of that of the bacterium Mycoplasma mycoides, is also a work of leviathanic proportions: it contains 1.08 million base pairs (like 1.08 million lines of code) and is the largest chemically defined structure ever synthesized in the laboratory. But to say at this point that 'artificial life has been created' would not be very accurate: here, a synthetic genome was inserted into existing mycoplasma cells, and that is not the same as creating life, since truly new life wouldn't require an existing living recipient cell.

Assembly of Mycoplasma mycoides

The new technique could allow us to create brand-new genomes that do exactly what we demand of them. They could produce biofuels for us, gulp up excess carbon dioxide in the atmosphere or clean up an oil spill, like the one that has crippled the shores of Louisiana. But as Dr Helen Wallace of GeneWatch UK - an organisation that monitors developments in genetic technologies - puts it, "If you release new organisms into the environment, you can do more harm than good. By releasing them into areas of pollution, [with the aim of cleaning it up], you could actually be releasing a new kind of pollution because we don't know how these organisms will behave in the environment."

Though we are now writing the software of life, we'd have to also look out for bugs and loopholes. If there's a crash, the damage might not be reversible through a Quick Format.

Quantum Communication goes super-secure

Quantum particles, i.e. small sub-atomic particles like photons, are the carriers of information in Quantum Communication, where the 'quantum state' of a particle determines whether you are sending a 0 or a 1. While encrypting quantum messages was already possible, researcher Robert Malaney has made Quantum Communication even more secure. The University of New South Wales telecommunications researcher has developed a technique called 'unconditional location verification', which ties communication to a fixed, verified location of the recipient. The protocol proposes to send an encrypted key to the three wireless towers closest to the recipient, who must then decrypt it and send the information back instantaneously. The recipient's location is then determined from the known transmission speeds, and further communication takes place only if the recipient is at the expected location. So, even if someone has mastered the decryption of all your classified information, he'd probably need the audacity to stand in your vicinity to use it.
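The article gives no implementation details, but the timing idea behind location verification can be sketched as a simple time-of-flight consistency check. This is a minimal classical illustration under stated assumptions, not Malaney's quantum protocol: it assumes signals travel at the speed of light, ignores the recipient's processing delay, works in flat 2-D coordinates in metres, and uses hypothetical names (`implied_distance`, `location_verified`) and a hypothetical 50 m tolerance.

```python
import math

# Assumption: signals travel at the speed of light (metres per second).
SPEED_OF_LIGHT = 299_792_458.0


def implied_distance(round_trip_s: float) -> float:
    """Distance to the responder implied by a round-trip time.

    Processing delay at the recipient is ignored in this sketch.
    """
    return SPEED_OF_LIGHT * round_trip_s / 2.0


def location_verified(claimed, towers, round_trips, tolerance_m=50.0):
    """Accept the claimed (x, y) position only if every tower's measured
    round-trip time matches the distance from that tower to the claim.

    A responder farther away cannot answer sooner (nothing outruns light),
    so a reply that arrives too late from any tower causes rejection.
    """
    if len(towers) != len(round_trips):
        raise ValueError("need exactly one round-trip time per tower")
    for (tx, ty), rtt in zip(towers, round_trips):
        expected = math.hypot(claimed[0] - tx, claimed[1] - ty)
        if abs(implied_distance(rtt) - expected) > tolerance_m:
            return False
    return True
```

For example, with towers at (0, 0), (3000, 0) and (0, 4000) and a recipient really at (1000, 1000), the exact round-trip times pass the check, while adding one microsecond of delay (roughly 150 m of extra distance) fails it. Using three towers means the timings must agree with a single point, not just a circle around one tower.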

The security this technique provides would be desirable for organisations like banks, intelligence agencies and digital media distributors, which need point-to-point communication. For example, it could strengthen the encryption protecting your bank transactions by cross-checking the location from which you access your account, which in this case might be your home address as registered with the bank.

Windows 8 plans leaked, sound awesome; Windows Store to be Microsoft's App Store

In the last few months, leaked pictures and documents have become a daily affair for us.

However, this one made us sit up and take notice: leaked plans for Windows 8. Seriously, we haven’t yet poured even half our love over the awesome Windows 7, and this thing hit us square this morning. As with all leaks, only time will confirm the authenticity of these numerous slides, but their detail and the obvious directions they point to give them a lot of weight.

These slides, each labelled confidential, were probably used for a behind-closed-doors presentation, and they contain tremendous detail and some very mouth-watering prospects. For eager eyes, here is the list, along with our views.

Kinect-style sensing: When Microsoft put so much effort into creating the wonder that is Kinect (earlier known as ‘Project Natal’), we expected (and wished in our dreams!) that the technology would be ported to Windows too. There is ‘no’ sign of motion gestures for operating Windows, or any mention of motion-based gaming, but the slides do show plans to use motion detection to put Windows to ‘sleep’ or ‘wake’ it up. Yes, as the slides above indicate, the plan is that a user’s arrival or departure would be used to log in or put the computer to sleep. This would also mark the introduction of support for sensors, such as proximity sensors, in Windows. Neat, but we want more of Kinect!

Quick on/off: Boot times are irritating, we know! They may soon, however, be a thing of the past, as Microsoft plans to introduce a feature dubbed ‘Log off+Hibernate’ that would enable quick powering up from 0 watts consumption. This is also in line with the “Big stuff” that … This is an absolute necessity, considering that Windows 8 is as much for handhelds (which need an always-on operating system) as it is for desktops. For more on ‘Windows 8 and handhelds’, read on.

Windows Store: It’s a trend Apple started, and now everyone wants an App Store. But we suppose Windows is the platform that needs one the most. Here’s our view: rivals have always thrown dirt at Microsoft over the vulnerabilities and security loopholes in its operating system. An App Store would give Microsoft not only a huge source of revenue but also a point of centralised vigilance. As one of the slides suggests, developers would be able to make an app available only in certain geographical areas, and also decide which types of devices it would be available for. The concept pictures show an all-encompassing vision, especially the 'developer's dashboard' above, which would be an additional lure for developers, if the billions of Windows users aren't. We can’t stress enough how much difference this would make. Instead of unreliable software from around the net, we can finally have reliable, checked apps. The largest of platforms might be the last to get its App Store, but take our word for it, it would dwarf all others around.

Dell hid the fact that it deliberately shipped 11.8 million potentially faulty computers between 2003 and 2005

Possibly the biggest news of the month, Dell has admitted to some very shady behaviour. Dell is not exactly known for a squeaky-clean record, and its troubles this time around centre on recently unsealed court papers containing a mighty revelation: Dell knowingly shipped 11.8 million OptiPlex PCs that were potentially defective due to a faulty capacitor between May 2003 and June 2005.

The capacitors, mostly manufactured by Nichicon, showed a 97% failure rate in a study conducted soon after the first problems started showing up. So what did Dell do? Rather than recall the series, it told its representatives to hide the problem, kept shipping the PCs, and told customers that the problem was caused by them overworking the system! A way to earn service fees? Possibly. More likely a way to avoid the bad publicity. The same unsealed court papers also reveal that in 2005, Dell apparently paid out a $300 million fine to various companies, including Advanced Internet Technologies, which was the main plaintiff in the matter.

Wednesday, May 19, 2010

Technology: Microsoft Warns of Windows 7 Graphics Flaw

A flaw with the graphics driver in Windows 7 could compromise the stability and security of PCs, Microsoft has warned.

The vulnerability lies in the Windows Canonical Display Driver (cdd.dll) for the 64-bit versions of Windows 7 and Windows Server 2008 R2.

"If exploited, it would likely cause the affected system to stop responding and restart," Jerry Bryant, group manager of response communications, warns on the Microsoft Security Response Center blog. "Code execution, while possible in theory, would be very difficult due to memory randomisation, both in kernel memory and via Address Space Layout Randomisation (ASLR)."

Microsoft claims that the vulnerability only affects machines running the Aero graphics interface, and advises that customers "may choose to disable Windows Aero as a workaround to protect against potential threats" until the company releases a fix.

That said, Microsoft claims that the chances of the flaw being exploited in the wild are low, and has given the bug the lowest possible score on its Exploitability Index.

Further details of the flaw can be found in Microsoft's security advisory.

Tuesday, April 13, 2010

Web sites that can take a punch

The recent, well-publicized cyberattack on Google was just the latest skirmish in a long war. And like most long wars, this one features an arms race, as hackers seek out new security holes, and web site administrators try to close them.

Systems for detecting attacks against networked computers are commercially available, and academic and industrial researchers are constantly improving them. But when a web site is under attack, its only viable defense may be to take its servers offline, which, in the short term, can cost it money in lost revenue and productivity and, in the long term, could hurt its credibility. Indeed, knocking a site offline may be an attacker’s sole intention.

MIT researchers have developed a system to keep web servers — or, for that matter, any Internet-connected computers — running even when they’re under attack. The work was funded largely by the U.S. Defense Department’s Defense Advanced Research Projects Agency (DARPA), and in a pair of tests whose thoroughness is unusual in academia, DARPA hired a group of computer security professionals outside MIT to try to bring down a test network protected by the new system. In both tests, says Martin Rinard, the professor of electrical engineering and computer science who led the research, the system exceeded all the performance criteria that DARPA set for it.

The MIT system was developed by a host of researchers, including not only Rinard but Jeff Perkins, a research scientist at MIT’s Computer Science and Artificial Intelligence Lab, Postdoctoral Fellow Stelios Sidiroglou-Douskos and Professor Michael Ernst, who has since moved to the University of Washington. During normal operation, it monitors the programs running on an Internet-connected computer to determine their normal range of behavior, and during an attack, it simply refuses to let them wander outside that range.

To take a simple example, suppose that a program running on a web server routinely stores data in one of two memory locations — call them A and B. During an attack, malicious code tries to trick the program into storing data at location C instead. The MIT system won’t let it: instead, it sends the data to either location A or location B.
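The confinement idea in the A/B/C example can be illustrated with a toy model. This sketch is not the MIT system's code, which the article does not describe: `StoreGuard` is a hypothetical name, "memory locations" are plain strings, and redirecting to the alphabetically first learned location is an arbitrary stand-in for whatever policy the real system uses.

```python
class StoreGuard:
    """Toy model of learning a program's normal behaviour and enforcing it.

    During the learning phase every store location is recorded; during the
    enforcement phase, stores to unseen locations are redirected to a
    location observed during normal operation.
    """

    def __init__(self) -> None:
        self.observed = set()   # locations seen during normal operation
        self.enforcing = False  # False = learning, True = under attack

    def store(self, location: str) -> str:
        """Return the location the store is actually allowed to use."""
        if not self.enforcing:
            # Learning phase: record normal behaviour and allow the store.
            self.observed.add(location)
            return location
        if location in self.observed:
            # Within the normal range: let it through unchanged.
            return location
        if not self.observed:
            raise RuntimeError("no normal behaviour has been learned yet")
        # Outside the normal range: redirect to a previously seen location
        # (the alphabetically first one, an arbitrary choice for this sketch).
        return min(self.observed)
```

After learning with `store("A")` and `store("B")` and then setting `enforcing = True`, a malicious `store("C")` is redirected to `"A"`, while a normal `store("B")` passes through unchanged.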

Of course, the data may not be of a type that belongs at either of those locations. And the system will modify behaviors that could be even more disruptive than data storage. But in sites with large banks of servers, the MIT system gets several chances to find the best response to an attack. If storing at location A causes one server in the bank to crash, the MIT system will tell the other servers to store it at location B, instead.

“The idea is that you’ve got hundreds of machines out there,” Rinard says. “We’re saying, ‘Okay, fine, you can take out six or 10 of my 200 machines.’” But, he adds, “by observing what happens with the executions of those six or 10 machines, we’ll be able to deploy patches out to protect the rest of the machines.” The entire process of recognizing an attack, testing a number of countermeasures and deploying the most effective ones can take a matter of seconds.
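The trial-and-deploy loop Rinard describes can be sketched as follows. This is a hypothetical simplification, not the MIT implementation: `roll_out`, the candidate names and the `survives` callback (standing in for observing whether a machine stays up under attack) are all assumptions. Each candidate countermeasure is tried on a small, fresh group of machines, and the first one that every machine in its group survives is pushed to the rest of the fleet.

```python
from typing import Callable, Dict, List, Optional, Tuple


def roll_out(candidates: List[str],
             machines: List[int],
             survives: Callable[[str, int], bool],
             machines_per_trial: int = 2) -> Tuple[Optional[str], Dict[int, str]]:
    """Return (chosen countermeasure, machine -> countermeasure plan).

    Machines used in failed trials are assumed to have crashed and are
    left out of the plan. Returns (None, {}) if nothing works.
    """
    remaining = list(machines)
    for patch in candidates:
        # Sacrifice a small, fresh group of machines for this candidate.
        group, remaining = remaining[:machines_per_trial], remaining[machines_per_trial:]
        if not group:
            break  # ran out of machines to experiment on
        if all(survives(patch, m) for m in group):
            # Every trial machine stayed up: deploy to the rest of the fleet.
            return patch, {m: patch for m in group + remaining}
    return None, {}
```

With ten machines and candidates `"store-at-A"` and `"store-at-B"`, where only B keeps machines running, machines 0 and 1 are lost testing A, machines 2 and 3 confirm B, and B is deployed to machines 2 through 9. Losing a few machines per failed candidate mirrors the "six or 10 of my 200 machines" trade-off in the quote.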
Baptism by fire

In the first of DARPA’s two field tests, engineers at a computer security firm — the so-called red team — were given the code for the MIT defense system. (In the real world, a company that marketed such a system would make every effort to keep its code secret, but Rinard says that it’s standard practice in the security field to consider the worst-case scenario.) The red team had several months in which to devise attacks against a hypothetical network protected by the system. During the test itself, no malicious code was allowed to execute on the protected computers, and in 70 percent of cases, the MIT system kept the applications running on those computers from going down. DARPA also set performance goals for the system, such as the amount of extra processing power it required, and the extent to which it altered the applications’ normal operation. In all cases, the system was well within DARPA’s prescribed limits.

The first red-team exercise considered cases in which hackers tried to infect computers with malicious code, and the MIT researchers presented the results of the test at the Association for Computing Machinery’s Symposium on Operating Systems Principles last fall. A second red-team exercise, testing an updated version of the defense system that the MIT researchers developed together with defense contractor BAE Systems, concluded at the end of January. That test evaluated the system’s ability to handle a different kind of attack, which seeks to circumvent security checks that web applications typically perform to ensure that users have permission to access protected information. Although the researchers are still sorting through the data from that test, Sidiroglou-Douskos says that the system’s success rate in keeping applications up and running rose from 70 percent to 90 percent.

Angelos Keromytis, an associate professor of computer science at Columbia University, who works on related techniques for combating cyberattacks, says that the MIT approach is “very original,” but cautions that Web developers may be reluctant to adopt it anytime soon. “They’re wary of a system that changes another system automatically,” Keromytis says. “When they manually make changes to their systems, they break them, so they think that automatically doing it is going to be worse.” Keromytis points out, however, that while DARPA has run a number of red-team exercises evaluating new technologies in a range of areas, “This is probably one of the most successful exercises that I have seen.” The mere fact that DARPA was willing to spend so much money testing the system, Keromytis says, indicates that “they think it’s close enough to a rough prototype that works, which is more than one can say for most academic research.”

Monday, April 5, 2010

iPad Arrives This Saturday

CUPERTINO, California—March 29, 2010—Apple’s magical new iPad will be available in all 221 US Apple® retail stores and most Best Buy stores this Saturday, April 3, beginning at 9 a.m. Starting at just $499, iPad lets users browse the web, read and send email, enjoy and share photos, watch HD videos, listen to music, play games, read ebooks and much more, all using iPad’s revolutionary Multi-Touch™ user interface. iPad is just 0.5 inches thick and weighs just 1.5 pounds—thinner and lighter than any laptop or netbook—and delivers up to 10 hours of battery life.*

“iPad connects users with their apps and content in a far more intimate and fun way than ever before,” said Steve Jobs, Apple’s CEO. “We can’t wait for users to get their hands and fingers on it this weekend.”

Apple retail stores will offer a free Personal Setup service to every customer who buys an iPad at the store, helping them customize their new iPad by setting up their email, loading their favorite apps from the App Store, and more. Also beginning Saturday morning, all US Apple retail stores will host special iPad workshops to help customers learn more about this magical new product.

Pricing & Availability
iPad will be available in Wi-Fi models on April 3 in the US for a suggested retail price of $499 for 16GB, $599 for 32GB, and $699 for 64GB. The Wi-Fi + 3G models will be available in late April for a suggested retail price of $629 for 16GB, $729 for 32GB and $829 for 64GB. iPad will be sold in the US through the Apple Store® (www.apple.com), Apple’s retail stores, most Best Buy stores, select Apple Authorized Resellers and campus bookstores. The iBooks app for iPad including Apple’s iBookstore will be available as a free download from the App Store in the US on April 3.

*Battery life depends on device settings, usage and other factors. Actual results vary.

Apple ignited the personal computer revolution in the 1970s with the Apple II and reinvented the personal computer in the 1980s with the Macintosh. Today, Apple continues to lead the industry in innovation with its award-winning computers, OS X operating system and iLife and professional applications. Apple is also spearheading the digital media revolution with its iPod portable music and video players and iTunes online store, and has entered the mobile phone market with its revolutionary iPhone.

Saturday, March 27, 2010

Yes, You Can Build a Web Company in India. Here’s How.

Silicon Valley and India have a cozy relationship, but a big question has resulted in friction, failed companies and millions in losses: When will the Internet catch on in India in a big way?

A few companies have done well and a few more are coming up, slowly but surely. But there are hardly any true breakout hits.

RedBus is pretty close. It’s essentially an Expedia for bus tickets in India. It sells about 3,500 bus seats per day, is the fourth most-trafficked Web site in India and has at least tripled its revenues year-over-year. The company sells seats for roughly half the bus operators in India, and that’s saying something: this is an insanely fragmented market that had next to zero centralization just a few years ago. All of this has been built in three years on about $1 million in venture funding. (The company raised another $1.3 million in 2008, but it’s still in the bank. Investors include Helion, Inventus and Seedfund.)

I can vouch for the company being cheap. Having spent my morning in the plush eight-acre Infosys headquarters, I found the offices of RedBus a marked contrast. They are split between two buildings located in one of those very chaotic Indian neighborhoods where vendors are shouting, cows are wandering and the smell of open sewers is not too far off. It feels far from the sanitized, steel-and-glass rows of multinationals.

None of this is intended as an insult: co-founder and CEO Phanindra Sama is proud of his cheapness. (Sama is pictured above; sorry it’s so blurry, my camera was having issues.) We met in a no-frills, un-air-conditioned conference room. He didn’t turn on the air conditioning for famed Silicon Valley Indian entrepreneur Kanwal Rekhi when he visited last month either, and Rekhi is an investor in RedBus.

Despite the sweat trickling down my forehead, arms, legs and back throughout the interview, I didn’t want to leave. What Sama and his two co-founders have pulled off in a short period of time with little funding in India is impressive.

Background for Americans: There are two kinds of buses in India: those that make stops and have ticket-takers on board, and those that go to one destination only and sell only pre-paid tickets. There are some 3,000 operators of the latter category and, before RedBus, there was no way to contact them directly. To get a bus ticket, you went to an agent. That agent only had inventory from a few bus lines. To book the ticket, he or she would call the one person in charge of booking every seat on that particular route. There was a long wait, and frequently the routes the agents knew about were sold out, meaning you had to change your travel plans or find another agent who had different sources. Meanwhile, there was no standardization of pricing and commissions. The agent simply wrote the cost on a piece of paper, and if you wanted to ride, you paid it.

Now, RedBus has a central database that gets seats from half of India’s bus operators. It has done so well that it powers the bus ticket applications for most of India’s more general travel sites like MakeMyTrip.com. It also sells an OpenTable-like software-as-a-service product to help bus companies manage their own inventory and better integrate their inventory with RedBus. In terms of seats, it sells less than 1% of the 750,000 rides taken daily, but with several channels and few other easy options, there’s a ton of room to grow a big company.

Sama didn’t set out to build a company. I know that’s a cliché with startups these days, but it’s a rarity in Bangalore where the glamor of being a Web entrepreneur runs high and plenty of TechCrunch-reading kids save up money, quit for a year, try to start a company, and go back to a multinational if it doesn’t hit quickly. When RedBus’s mentor first suggested the company raise $1 million, Sama gasped. He hadn’t even thought in those amounts. His only immediate thought was: “If I had $1 million, I’d put it in the bank and make interest.”

That mentor was Sanjay Anandaram, formerly of Neta, Wipro and other ventures known in both the Silicon Valley and Indian entrepreneur communities. Sama met Anandaram through TIE’s JumpStartUp program. Despite the reach, influence and press of TIE, the uber-Indian networking organization started in the Valley, Sama is the first entrepreneur I’ve met in India who gives it this much credit for his company’s survival.

Specifically, he cites Anandaram’s advice. When RedBus was trying to sell software to the bus lines, it was Anandaram who said: Don’t keep trying to sell the same thing, ask what they need and build that. The bus lines needed to sell seats. So RedBus built a site, and bought the inventory itself from the bus lines to list on the site. Once it proved it could move seats, the operators were happy to pay the company a percentage of seats sold.

Once the company could prove results, it was Anandaram who warned them to undersell expectations: Tell an operator you can sell one seat for them a month, even if you think you can sell fifty. If you sell two, you’ll be a hero, not a disappointment. RedBus has carried that over to fundraising, admittedly forgoing higher valuations because it didn’t want to oversell and under-deliver.

That’s harder than it sounds for an entrepreneur, who is usually the single most bullish person on his company. And it is absolutely shocking in India’s startup culture. I had a blog network tell me on my last morning in India – with a straight face – that it would be doing double the revenues of Gawker in a few years. I like to give entrepreneurs the benefit of the doubt, but I also know the media business. Forgive the generalization, but Indians just love to over-sell. It’s deep in their trader heritage. “You have to sacrifice your ego,” Sama says.

But, especially for a startup in India, the most important piece of advice Sama and his co-founders got from Anandaram might have been this: You are not an Internet company. Because the Internet isn’t more widespread in India, there has to be a core mindset that the Net is an important channel, but just a channel. Just under 50% of RedBus’s business comes from the Net; much of the rest comes via mobile phones.

The company also invested early in two expensive ways of skirting that Web limitation. The first was building its own network of bike couriers to deliver tickets and take payments, à la the hugely successful Chinese online travel company CTrip. The second was investing in seven different call centers throughout India, rather than one central call center. As Sama puts it, if you don’t localize a call center to local slang, languages and customs, the customer service won’t work.

Seriously? An Indian in Bangalore arguing a centralized, remote call center can’t give good customer service? That has about as much globalization-irony as China’s BYD refusing to outsource any of its manufacturing.

For his part, Anandaram noted the founders’ willingness to listen and learn from someone who’d been there. He says the biggest mistakes he sees Indian startups making are not seeking advice, being too obsessed with retaining control, and not valuing sales, marketing and partnerships.

The RedBus story squares with something I’ve been noticing more in my travels to emerging markets: frequently, when entrepreneurs complain about a lack of angel investing or venture capital, what they are really lacking isn’t just the money, it’s the mentorship. This came up in my recent conversation with Pierre Omidyar, whose philanthropic effort, the Omidyar Network, seeks to fund both non-profit and for-profit entrepreneurs, specifically those in the poorest areas of the world. Omidyar Network has money it can give these entrepreneurs, thanks to eBay and the dot-com boom: lots of money. But what the organization increasingly finds lacking is that horrible buzzword, “human capital.”

In Omidyar’s own experience, eBay never touched the $3 million it raised from Benchmark in 1996. But the mentorship he got was well worth giving up 25% of the company. “That’s what is so hard to find around the world,” Omidyar says. “We’re increasingly looking at whether $500,000 worth of human capital could help more than $500,000.”

I know that the idea that VCs bring more than money is ridiculed by most entrepreneurs today, but those are usually entrepreneurs operating in a scene that has seen an explosion of startups, both failed and successful, in the last fifteen years. Even the shyest, most awkward or least connected entrepreneurs in the Valley can find a mentor with little effort. Sometimes we take for granted that that’s not the case in much of the rest of the world.

Lucky for RedBus’s founders, they were an exception.