Springer Science and Business Media LLC
Notable scientific publications
* Data provided for reference only
On Finding Templates on Web Collections
Springer Science and Business Media LLC - Volume 12 - Pages 171-211 - 2009
Templates are pieces of HTML code common to a set of web pages, usually adopted by content providers to enhance the uniformity of layout and navigation of their Web sites. They are usually generated using authoring/publishing tools or by programs that build HTML pages to publish content from a database. Despite their usefulness, the content of templates can negatively affect the quality of results produced by systems that automatically process information available on web sites, such as search engines, clustering and automatic categorization programs. Further, the information in templates is redundant, so processing and storing it just once for a set of pages may save computational resources. In this paper, we present and evaluate methods for detecting templates in a scenario where multiple templates can be found in a collection of Web pages. Most previous work has studied template detection algorithms in a scenario where the collection has just a single template. The scenario with multiple templates is more realistic and, as discussed here, it raises important questions that may require extensions and adjustments to previously proposed template detection algorithms. We show how to apply and evaluate two template detection algorithms in this scenario, creating solutions for detecting multiple templates. The methods studied partition the input collection into clusters that contain common HTML paths and share a high number of HTML nodes, and then apply a single-template detection procedure over each cluster. We also propose a new algorithm for single-template detection based on a restricted form of bottom-up tree mapping that requires only a small set of pages to correctly identify a template and has worst-case linear complexity. Our experimental results over a representative set of Web pages show that our approach is efficient and scalable while obtaining accurate results.
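The cluster-then-detect pipeline described in the abstract can be made concrete with a toy sketch. This is a drastic simplification of the paper's algorithms: the path representation, the Jaccard similarity measure, and the 0.5 threshold are all assumptions made here for illustration.

```python
# Toy multi-template detection: group pages by overlapping HTML tag paths,
# then take the paths shared by every page in a group as its "template".
from html.parser import HTMLParser

class PathCollector(HTMLParser):
    """Collects the set of root-to-tag paths occurring in an HTML page."""
    def __init__(self):
        super().__init__()
        self.stack, self.paths = [], set()
    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)
        self.paths.add("/".join(self.stack))
    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

def tag_paths(html):
    p = PathCollector()
    p.feed(html)
    return p.paths

def detect_templates(pages, threshold=0.5):
    """Group pages whose path sets overlap (Jaccard > threshold); per
    group, keep the intersection of paths as the candidate template."""
    groups = []  # each entry: [shared_paths, member_indices]
    for i, paths in enumerate(map(tag_paths, pages)):
        for g in groups:
            if len(g[0] & paths) / len(g[0] | paths) > threshold:
                g[0] &= paths          # template shrinks to the intersection
                g[1].append(i)
                break
        else:
            groups.append([set(paths), [i]])
    return groups

pages = [
    "<html><body><div><ul><li>a</li></ul></div></body></html>",
    "<html><body><div><ul><li>b</li></ul></div></body></html>",
    "<html><body><table><tr><td>x</td></tr></table></html>",
]
groups = detect_templates(pages)
```

Here the first two pages share all their tag paths and form one cluster, while the table-based page falls below the threshold and starts a second cluster.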
Wikiometrics: a Wikipedia based ranking system
Springer Science and Business Media LLC - Volume 20 - Pages 1153-1177 - 2017
We present a new concept, Wikiometrics: the derivation of metrics and indicators from Wikipedia. Wikipedia provides an accurate representation of the real world due to its size, structure, editing policy and popularity. We demonstrate an innovative "mining" methodology, in which different elements of Wikipedia (content, structure, editorial actions and reader reviews) are used to rank items in a manner which is by no means inferior to rankings produced by experts or other methods. We test our proposed method by applying it to two real-world ranking problems: top world universities and academic journals. Our proposed ranking methods were compared to leading and widely accepted benchmarks and were found to be highly correlated with them, with the advantage that the underlying data are publicly available.
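The validation step mentioned above, comparing a Wikipedia-derived ranking to an accepted benchmark, is typically done with a rank correlation coefficient. A minimal sketch with invented scores (the metric values below are purely hypothetical, not data from the paper):

```python
# Spearman's rank correlation between a Wikipedia-derived score and a
# benchmark score; 1.0 means identical orderings.
def rankdata(values):
    """Ranks (1 = largest); no tie handling, for brevity."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks

def spearman(xs, ys):
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rankdata(xs), rankdata(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical example: page-view counts vs. an expert benchmark score
wiki_scores = [980, 750, 640, 400, 120]
benchmark   = [100,  95,  80,  85,  60]
rho = spearman(wiki_scores, benchmark)  # one swapped pair -> rho = 0.9
```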
Guest editorial: web applications and techniques
Springer Science and Business Media LLC - Volume 18 - Pages 1391-1392 - 2015
Enhancing decision-making in user-centered web development: a methodology for card-sorting analysis
Springer Science and Business Media LLC - Volume 24 - Pages 2099-2137 - 2021
The World Wide Web has become a common platform for interactive software development. Most web applications feature custom user interfaces used by millions of people every day. Information architecture addresses the structural design of information to build quality web applications with improved usability of content, navigation, and findability. One of the most frequently utilized information architecture methods is card sorting—an affordable, user-centered approach for eliciting and evaluating categories and navigable items. Card sorting facilitates decision-making during the development process based on users’ mental models of a given application domain. However, although the qualitative analysis of card sorts has become common practice in information architecture, the quantitative analysis of card sorting is less widely applied. The reason for this gap is that quantitative analysis often requires the use of customized techniques to extract meaningful information for decision-making. To facilitate this process and support the structuring of information, we propose a methodology for the quantitative analysis of card-sorting results in this paper. The suggested approach can be systematically applied to provide clues and support for decisions. These might significantly impact the design and, thus, the final quality of the web application. Therefore, the approach includes proper goodness values that enable comparisons among the results of the methods and techniques used and ensure the suitability of the analyses performed. Two publicly available datasets were used to demonstrate the key issues related to the interpretation of card sorting results and the overall suitability and validity of the proposed methodology.
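One building block of most quantitative card-sorting analyses is a card-to-card similarity matrix derived from the sorts. The sketch below (a generic technique, not the specific methodology proposed in the paper; the participant data are invented) computes, for each pair of cards, the fraction of participants who placed both in the same group:

```python
# Pairwise co-occurrence similarity from card-sorting results.
from itertools import combinations

def cooccurrence(sorts, cards):
    """sorts: one dict per participant mapping card -> group label.
    Returns {(card_a, card_b): fraction of participants grouping them together}."""
    sim = {}
    for a, b in combinations(cards, 2):
        same = sum(1 for s in sorts if s[a] == s[b])
        sim[(a, b)] = same / len(sorts)
    return sim

# Three hypothetical participants sorting four navigation items
sorts = [
    {"login": "account", "logout": "account", "search": "find",  "filter": "find"},
    {"login": "auth",    "logout": "auth",    "search": "nav",   "filter": "nav"},
    {"login": "user",    "logout": "user",    "search": "tools", "filter": "user"},
]
sim = cooccurrence(sorts, ["login", "logout", "search", "filter"])
```

High-similarity pairs (here, login/logout at 1.0) are candidates for the same navigation category; the matrix can then feed hierarchical clustering or multidimensional scaling.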
Automating Content Extraction of HTML Documents
Springer Science and Business Media LLC - Volume 8 - Pages 179-224 - 2005
Web pages often contain clutter (such as unnecessary images and extraneous links) around the body of an article that distracts a user from actual content. Extraction of “useful and relevant” content from web pages has many applications, including cell phone and PDA browsing, speech rendering for the visually impaired, and text summarization. Most approaches to making content more readable involve changing font size or removing HTML and data components such as images, which takes away from a webpage’s inherent look and feel. Unlike “Content Reformatting,” which aims to reproduce the entire webpage in a more convenient form, our solution directly addresses “Content Extraction.” We have developed a framework that employs an easily extensible set of techniques. It incorporates advantages of previous work on content extraction. Our key insight is to work with DOM trees, a W3C specified interface that allows programs to dynamically access document structure, rather than with raw HTML markup. We have implemented our approach in a publicly available Web proxy to extract content from HTML web pages. This proxy can be used both centrally, administered for groups of users, as well as by individuals for personal browsers. We have also, after receiving feedback from users about the proxy, created a revised version with improved performance and accessibility in mind.
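The key insight above, operating on the document tree rather than raw markup, can be illustrated with a toy extractor. This is far cruder than the paper's framework: it simply drops a fixed set of clutter elements (an assumption for illustration) and keeps the remaining text.

```python
# Toy DOM-style content extraction: skip clutter subtrees, keep the text.
from html.parser import HTMLParser

CLUTTER = {"script", "style", "nav", "aside"}

class ContentExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # >0 while inside a clutter subtree
        self.chunks = []
    def handle_starttag(self, tag, attrs):
        if tag in CLUTTER or self.skip_depth:
            self.skip_depth += 1
    def handle_endtag(self, tag):
        if self.skip_depth:
            self.skip_depth -= 1
    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())

def extract(html):
    p = ContentExtractor()
    p.feed(html)
    return " ".join(p.chunks)

html = ("<html><body><script>var x=1;</script>"
        "<p>Hello <b>world</b></p><nav><a>Home</a></nav></body></html>")
text = extract(html)  # script code and navigation links are removed
```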
Assessing the authenticity of subjective information in the blockchain: a survey and open issues
Springer Science and Business Media LLC - Volume 24 - Pages 483-509 - 2020
Blockchain, with its ever-increasing maturity and popularity, is being used in many different applied computing domains. To document the advancements made, researchers have surveyed blockchain from many different viewpoints. However, one perspective missing from such surveys is how to assess the authenticity of subjective information and account for it in blockchain processing. The goal of this article is to highlight this gap as an open issue. We do so by surveying articles from both the academic and grey literature and proposing a taxonomy that classifies the literature as focusing on either the inherent or add-on characteristics of a blockchain. We then focus on works that aim to build authenticity into blockchain applications and determine whether they address the gap of considering subjective information in their analysis. Based on our findings from the survey, we propose future research directions in this area that are essential to achieving the envisaged view of blockchain as the single source of truth for all types of information.
An efficient multidimensional $L_{\infty }$ wavelet method and its application to approximate query processing
Springer Science and Business Media LLC - Volume 24 - Pages 105-133 - 2020
Approximate query processing (AQP) has been an effective approach for real-time and online query processing in today's query systems. It provides approximate but fast query results to users. In wavelet-based AQP, queries are executed against the wavelet synopsis, a lossy, compressed representation of the original data returned by a specific wavelet method. A wavelet synopsis optimized for the $L_{\infty}$-norm error can guarantee the approximation error of each individual element, and thus can provide error-guaranteed query results for many queries. However, most algorithms for building a one-dimensional $L_{\infty}$ synopsis have super-linear complexity, which makes extending them to the multidimensional case challenging. In this paper, we propose an efficient multidimensional wavelet method for constructing an $L_{\infty}$ synopsis and apply it to AQP. The proposed wavelet method bounds the approximation error of each individual element, has linear time complexity, and can also provide fast AQP. These properties are all verified theoretically. Extensive experiments on both synthetic and real-life datasets show its effectiveness and efficiency for data compression and AQP.
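To make the wavelet-synopsis setting concrete, here is a minimal one-dimensional Haar sketch. It is not the paper's algorithm: it uses naive largest-coefficient thresholding and only measures the resulting per-element (L-infinity) error after the fact, whereas the proposed method guarantees the bound in linear time.

```python
# Haar wavelet synopsis: transform, keep the largest coefficients,
# reconstruct, and measure the maximum per-element error.
def haar(data):
    """Unnormalized Haar decomposition of a length-2^k list."""
    data, out = list(data), []
    while len(data) > 1:
        avgs = [(a + b) / 2 for a, b in zip(data[::2], data[1::2])]
        dets = [(a - b) / 2 for a, b in zip(data[::2], data[1::2])]
        out = dets + out
        data = avgs
    return data + out  # [overall average, details coarse -> fine]

def inverse_haar(coeffs):
    data, i = coeffs[:1], 1
    while i < len(coeffs):
        dets = coeffs[i:i + len(data)]
        data = [v for a, d in zip(data, dets) for v in (a + d, a - d)]
        i += len(dets)
    return data

def synopsis(data, keep):
    """Keep the `keep` largest-magnitude coefficients, zero the rest."""
    c = haar(data)
    top = set(sorted(range(len(c)), key=lambda i: -abs(c[i]))[:keep])
    return [v if i in top else 0.0 for i, v in enumerate(c)]

data = [2.0, 2.0, 6.0, 6.0, 3.0, 3.0, 3.0, 3.0]
approx = inverse_haar(synopsis(data, 3))
max_err = max(abs(a - b) for a, b in zip(data, approx))
```

For this piecewise-constant input only three coefficients are nonzero, so keeping three reconstructs the data exactly; keeping fewer introduces an error that an $L_{\infty}$-optimized synopsis would bound per element.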
A web-based Internet Java Phone for real-time voice communication
Springer Science and Business Media LLC - 2000
Although a wide range of Internet telephony applications, such as VocalTec's Internet Phone and Microsoft's NetMeeting, have been developed to support real-time voice communication over the Internet, they are predominantly written for the Windows 95/NT platform and are not compatible with other operating systems. Moreover, most Internet telephony software operates as a stand-alone application that must be downloaded and installed prior to use, which causes great inconvenience. To resolve this, a web-based Internet Java Phone (or IJPhone), which can be downloaded from the Internet and run in a standard Java-compliant web browser, is proposed in this paper. The IJPhone system consists of two main components: the Web-based Telephone Exchange for user connection and the Internet Java Phone Applet for establishing real-time Internet voice communication. In addition, as Java applets have certain security restrictions placed on them, the paper also discusses the method proposed to overcome these restrictions.
AO4BPEL: An Aspect-oriented Extension to BPEL
Springer Science and Business Media LLC - Volume 10 - Pages 309-344 - 2007
Process-oriented composition languages such as BPEL allow Web Services to be composed into more sophisticated services using a workflow process. However, such languages exhibit some limitations with respect to modularity and flexibility. They do not provide means for a well-modularized specification of crosscutting concerns such as logging, persistence, auditing, and security. They also do not support the dynamic adaptation of composition at runtime. In this paper, we advocate an aspect-oriented approach to Web Service composition and present the design and implementation of AO4BPEL, an aspect-oriented extension to BPEL. We illustrate through examples how AO4BPEL makes the composition specification more modular and the composition itself more flexible and adaptable.
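The crosscutting-concern problem above is the classic motivation for aspect orientation. As a language-neutral analogy (this is not AO4BPEL syntax, and the service names are invented), the same idea in Python: a logging "aspect" woven around service invocations without touching their code.

```python
# A logging aspect as a decorator: "advice" runs before and after each
# decorated activity, keeping the logging concern out of the process logic.
import functools

LOG = []

def logging_aspect(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        LOG.append(f"before {func.__name__}")   # before-advice
        result = func(*args, **kwargs)
        LOG.append(f"after {func.__name__}")    # after-advice
        return result
    return wrapper

@logging_aspect
def invoke_payment_service(amount):
    # hypothetical composed Web Service invocation
    return {"status": "paid", "amount": amount}

result = invoke_payment_service(42)
```

AO4BPEL goes further than this static analogy: aspects can be deployed and undeployed at runtime, which is what makes the composition dynamically adaptable.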
Durable queries over non-synchronized temporal data
Springer Science and Business Media LLC - Volume 26 - Pages 2099-2113 - 2022
Temporal data are ubiquitous nowadays, and efficient management of temporal data is of key importance. A temporal dataset typically describes the evolution of an object over time. One of the most useful queries over temporal data is the durable top-k query: given a time window, it finds the objects that are frequently among the best. Existing solutions to durable top-k queries assume that all temporal data are sampled at the same time points (i.e., at any time, there is a corresponding observed value for every temporal series). However, in many practical applications, temporal data are collected from multiple data sources with different sampling rates. In this light, we investigate the efficient processing of durable top-k queries over temporal data with different sampling rates. We propose an efficient sweep-line algorithm to process durable top-k queries over non-synchronized temporal data. We conduct extensive experiments on two real datasets to test the performance of the proposed method. The results show that our method outperforms the baseline solutions by a large margin.
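A simplified sketch of the query setting (much cruder than the paper's sweep-line algorithm, and with invented data): each object is treated as a step function over its own samples, and sweeping the union of all timestamps counts how often each object ranks in the top-k.

```python
# Durable top-k over non-synchronized series: an object qualifies if it is
# in the top-k at more than a fraction tau of the swept event points.
import bisect

def value_at(samples, t):
    """Last observed value at time t (samples sorted by time)."""
    times = [s[0] for s in samples]
    i = bisect.bisect_right(times, t) - 1
    return samples[i][1] if i >= 0 else float("-inf")

def durable_topk(series, k, tau):
    events = sorted({t for samples in series.values() for t, _ in samples})
    hits = dict.fromkeys(series, 0)
    for t in events:
        ranked = sorted(series, key=lambda n: -value_at(series[n], t))
        for name in ranked[:k]:
            hits[name] += 1
    return {n for n, h in hits.items() if h / len(events) > tau}

series = {
    "a": [(0, 5), (10, 9)],          # sampled at t=0 and t=10
    "b": [(0, 7), (5, 8), (10, 1)],  # a different sampling schedule
    "c": [(0, 1), (5, 2)],
}
durable = durable_topk(series, k=1, tau=0.5)  # "b" leads at t=0 and t=5
```

Note the non-synchronized aspect: at t=5 object "a" has no sample, so its last observed value (from t=0) is used, which is exactly the kind of gap the paper's algorithm handles efficiently.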
Total: 937