WEBIST 2017 Abstracts


Area 1 - Internet Technology

Full Papers
Paper Nr: 9
Title:

The Web as a Software Platform: Ten Years Later

Authors:

Antero Taivalsaari and Tommi Mikkonen

Abstract: In the past ten years, the Web has become a dominant deployment environment for new software systems and applications. In view of its current popularity, it is easy to forget that only 10-15 years ago hardly any developer would write serious software applications for the Web. Today, the use of the web browser as a software platform is commonplace, and JavaScript has become one of the most popular programming languages in the world. In this paper we revisit some predictions that were made over ten years ago when the Lively Kernel project was started back in 2006. Ten years later, most of the elements of the original vision have been fulfilled, although not entirely in the fashion we originally envisioned. We look back at the Lively Kernel vision, comparing our original goals with the state of the art in web programming today.

Paper Nr: 28
Title:

Personalized Hotlink Assignment using Social Networks

Authors:

Christos Makris, Konstantinos Siaterlis and Pantelis Vikatos

Abstract: In this paper, we introduce a novel methodology for personalized website reconstruction. We combine the context and popularity of web pages with information about users’ interests from social media. In contrast to previous studies, we present an efficient automatic web restructuring method that places suitable hotlinks between nodes of the generated website’s graph using social media information. In addition, our methodology includes an innovative personalization scheme that applies a topic modeling approach to the texts of social media users to create a graph of categories. We evaluate our approach by collecting users’ feedback on the ordering and relevance of links on a website.

Paper Nr: 36
Title:

Can Matrix Factorization Improve the Accuracy of Recommendations Provided to Grey Sheep Users?

Authors:

Benjamin Gras, Armelle Brun and Anne Boyer

Abstract: Although Matrix Factorization (MF)-based recommender systems provide accurate recommendations on average, they consistently fail on some users. The literature has shown that this can be explained by the characteristics of the preferences of these users, who only partially agree with others. These users are referred to as Grey Sheep Users (GSU). This paper studies whether it is possible to design an MF-based recommender that improves the accuracy of the recommendations provided to GSU. We introduce three MF-based models that focus on original ways to exploit the ratings of GSU during the training phase (by selecting, weighting, etc.). The experiments conducted on a state-of-the-art dataset show that it is actually possible to design an MF-based model that significantly improves the accuracy of the recommendations for most GSU.
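The general mechanism the abstract mentions, changing how individual ratings are exploited during training, can be illustrated with a per-rating weighting in a plain SGD matrix factorization. This is a minimal sketch of the generic technique only, not any of the authors' three models; the toy data, weights and hyper-parameters are all illustrative:

```python
import numpy as np

def weighted_mf(ratings, weights, n_users, n_items, k=8, lr=0.05, reg=0.02,
                epochs=500, seed=0):
    """SGD matrix factorization where each rating carries a weight.

    ratings: list of (user, item, value); weights: same-length floats.
    Down- or up-weighting selected users' ratings is one generic way to
    change how much they influence the learned latent factors.
    """
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((n_users, k))  # user factors
    Q = 0.1 * rng.standard_normal((n_items, k))  # item factors
    for _ in range(epochs):
        for (u, i, r), w in zip(ratings, weights):
            p_old = P[u].copy()
            err = r - P[u] @ Q[i]
            P[u] += lr * (w * err * Q[i] - reg * P[u])
            Q[i] += lr * (w * err * p_old - reg * Q[i])
    return P, Q

# Toy data: user 2 plays the role of a grey sheep user whose ratings
# are down-weighted during training.
ratings = [(0, 0, 5), (0, 1, 1), (1, 0, 4), (1, 1, 2), (2, 0, 1), (2, 1, 5)]
weights = [1.0, 1.0, 1.0, 1.0, 0.3, 0.3]
P, Q = weighted_mf(ratings, weights, n_users=3, n_items=2)
```

After training, the predicted preference orderings of the full-weight users are recovered by the dot products of their factor vectors.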

Paper Nr: 44
Title:

Comparative Analysis of Web Platform Assessment Tools

Authors:

Solange Paz and Jorge Bernardino

Abstract: Search engines are used daily all over the world. Although they regularly use updated indexes to run quickly and efficiently, they sometimes fail to keep the user on their page for a long time. As such, it is important that their response time is as low as possible. Therefore, it is essential to understand what load each search engine supports by conducting load testing. These tests have the objective of optimizing the performance of the application being tested, thus verifying the maximum amount of data that can be processed. In this paper we conduct a comparative analysis of the four most popular web platform assessment tools, Apache JMeter, Apache Flood, The Grinder and Gatling, and select the best one. In the experimental evaluation the search engines used are Google, Bing, Ask and Aol Search.

Paper Nr: 62
Title:

Concatenation, Embedding and Sharding: Do HTTP/1 Performance Best Practices Make Sense in HTTP/2?

Authors:

Robin Marx, Peter Quax, Axel Faes and Wim Lamotte

Abstract: Web page performance is becoming increasingly important for end users but also more difficult for web developers to provide, in part because of the limitations of the legacy HTTP/1 protocol. The new HTTP/2 protocol was designed with performance in mind, but existing work comparing its improvements to HTTP/1 often shows contradictory results. It is unclear for developers how to profit from HTTP/2 and whether current HTTP/1 best practices such as resource concatenation, resource embedding, and hostname sharding should still be used. In this work we introduce the Speeder framework, which uses established tools and software to easily and reproducibly test various setup permutations. We compare and discuss results over many parameters (e.g., network conditions, browsers, metrics), both from synthetic and realistic test cases. We find that in most non-extreme cases HTTP/2 is on a par with HTTP/1 and that most HTTP/1 best practices are applicable to HTTP/2. We show that situations in which HTTP/2 currently underperforms are mainly caused by inefficiencies in implementations, not by shortcomings in the protocol itself.

Short Papers
Paper Nr: 6
Title:

A Prime Number Approach to Matching an XML Twig Pattern including Parent-Child Edges

Authors:

Shtwai Alsubai and Siobhán North

Abstract: Twig pattern matching is a core operation in XML query processing because it is how all the occurrences of a twig pattern in an XML document are found. In the past decade, many algorithms have been proposed to perform twig pattern matching. They rely on labelling schemes to determine relationships between elements corresponding to query nodes in constant time, thereby improving processing time. In this paper, a new algorithm, TwigStackPrime, is proposed as an improvement to TwigStack (Bruno et al., 2002). To reduce the memory consumption and computational overhead of twig pattern matching algorithms when Parent-Child (P-C) edges are involved, TwigStackPrime efficiently filters out a tremendous number of irrelevant elements and avoids unnecessary computations by introducing a new labelling scheme called Child Prime Labels (CPL). Extensive performance studies on various real-world and artificial datasets were conducted to demonstrate the significant improvement of CPL over previous indexing and querying techniques. The experimental results show that the new technique has superior performance to previous approaches.
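The intuition behind prime-number child labelling can be sketched as follows (a simplified illustration of the general idea, not the paper's exact CPL scheme; tag names and the prime assignment are hypothetical): if every distinct element tag is mapped to a distinct prime and a node stores the product of the primes of its children's tags, then "does this node have a child with tag t" becomes a single divisibility test.

```python
# Illustrative mapping of each distinct element tag to a distinct prime.
TAG_PRIME = {"book": 2, "title": 3, "author": 5, "year": 7}

def child_label(child_tags):
    """Label of a node: the product of the primes of its children's tags."""
    label = 1
    for tag in child_tags:
        label *= TAG_PRIME[tag]
    return label

def has_child(label, tag):
    """Parent-child edge test in O(1): divisibility by the tag's prime."""
    return label % TAG_PRIME[tag] == 0

# A <book> element with <title> and <author> children, but no <year>:
label = child_label(["title", "author"])
```

Because prime factorizations are unique, repeated children simply contribute the same prime more than once and the divisibility test still holds.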

Paper Nr: 21
Title:

Efficient Processing of Semantically Represented Sensor Data

Authors:

Farah Karim, Maria-Esther Vidal and Sören Auer

Abstract: Large collections of sensor data are semantically described using ontologies, e.g., the Semantic Sensor Network (SSN) ontology. Semantic sensor data are RDF descriptions of sensor observations from related sampling frames or sensors at multiple points in time, e.g., climate sensor data. Sensor values can be repeated in a sampling frame, e.g., a particular temperature value can be repeated several times, resulting in a considerable increase in data volume. We devise a factorized, compact representation of semantic sensor data using linked data technologies to reduce the repetition of identical sensor values, and propose algorithms to generate collections of factorized semantic sensor data that can be managed by existing RDF triple stores. We empirically study the effectiveness of the proposed factorized representation of semantic sensor data. We show that the size of semantic sensor data is reduced by more than 50% on average without loss of information. Further, we have evaluated the impact of this factorized representation of semantic sensor data on query execution. Results suggest that query optimizers can be empowered with semantics from factorized representations to generate query plans that effectively speed up query execution time on factorized semantic sensor data.
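The factorization idea, replacing repeated literal values by references to one shared value node, can be sketched generically (predicate names and blank-node identifiers here are illustrative, not the SSN or the authors' vocabulary):

```python
def factorize_observations(observations):
    """Split (observation_id, value) pairs into two triple sets: each
    distinct value becomes one shared value node, and observations point
    to that node instead of repeating the literal."""
    value_node = {}
    obs_triples = []
    for obs_id, value in observations:
        node = value_node.setdefault(value, "_:v{}".format(len(value_node)))
        obs_triples.append((obs_id, "ex:hasValueNode", node))
    value_triples = [(n, "ex:literalValue", v) for v, n in value_node.items()]
    return obs_triples, value_triples

# Five observations but only two distinct temperature readings:
obs = [("o1", 21.5), ("o2", 21.5), ("o3", 22.0), ("o4", 21.5), ("o5", 22.0)]
obs_triples, value_triples = factorize_observations(obs)
```

The more often a value repeats within a sampling frame, the larger the saving: the literal is stored once, and all observations of it share the same node.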

Paper Nr: 25
Title:

Towards a Bio-inspired Approach to Match Heterogeneous Documents

Authors:

Nourelhouda Yahi, Hacene Belhadef, Mathieu Roche and Amer Draa

Abstract: Matching heterogeneous text documents coming from different sources means matching data extracted from these documents, generally structured in the form of vectors. The accuracy of matching directly depends on the right choice of the content of these vectors, which is why we need to select the best features. In this paper, we present a new approach to select the minimum set of features that represents the semantics of a set of text documents, using a quantum-inspired genetic algorithm. Among the different Vs characterizing big data, we focus on the ‘Variety’ criterion; therefore, we used three sets of documents from different sources that are semantically similar to retrieve the best features describing the semantics of the corpus. In the matching phase, our approach shows significant improvement compared with the classic ‘Bag-of-words’ approach.

Paper Nr: 53
Title:

Progressive Web Apps: The Possible Web-native Unifier for Mobile Development

Authors:

Andreas Biørn-Hansen, Tim A. Majchrzak and Tor-Morten Grønli

Abstract: Recent advancements of the mobile web have enabled features previously only found in natively developed apps. Until then, arduous development for several platforms or the use of cross-platform approaches was required. The novel approach, coined Progressive Web Apps, can be implemented through a set of concepts and technologies on any web site that meets certain requirements. In this paper, we argue for Progressive Web Apps as a possibly unifying technology for web apps and native apps. After an introduction of their features, we scrutinize their performance. Two cross-platform mobile apps and one Progressive Web App have been developed for comparison purposes, and are provided in an open-source repository so that the validity of the results can be verified. We aim to spark interest in the academic community, as a lack of academic involvement was identified during the literature search.

Paper Nr: 66
Title:

Diamond - A Cube Model Proposal based on a Centric Architecture Approach to Enhance Liquid Software Model Approaches

Authors:

Clay Palmeira da Silva, Nizar Messai, Yacine Sam and Thomas Devogele

Abstract: The adoption of multiple connected devices in our lives is a reality that the available technology is not yet able to deal with. The concept of Liquid Software emerged at the end of the 90s; however, its full potential, a unified interface that can drift between different connected devices and carry its behavior and complexities with it, is still not fully realized. Thus, an enhancement of current Web application architecture is required, in other words, a new approach able to deal with our technology requirements. In this context, we propose a centric-based architecture to deal with Liquid Software principles and constraints. The CUBE, once built, should be able to deal with all these requirements, making use of best practices from different technologies.

Posters
Paper Nr: 19
Title:

An Ontological Model for Assessment Analytics

Authors:

Azer Nouira, Lilia Cheniti-Belcadhi and Rafik Braham

Abstract: Today, there is a growing interest in data and analytics in learning environments, resulting in highly qualified research concerning models, methods, tools, technologies and analytics. This research area is referred to as learning analytics. As metadata has become an important item in e-learning systems, many learning analytics models are currently being developed; they use metadata to tag learning materials, learning resources and learning activities. In this paper, we first give a detailed inspection of the existing learning analytics models in the literature. We particularly observed that there is a lack of models dedicated to conceiving and analyzing assessment data. That is why our objective in this paper is to propose an assessment analytics model inspired by the Experience API data model. Hence, an assessment analytics ontology model is developed that supports the analytics of assessment data by tracking the assessment activities, assessment results and assessment context of the learner.

Paper Nr: 22
Title:

A Web Integration Framework for Cheap Flight Fares

Authors:

Manuel Sánchez, Juncal Gutiérrez-Artacho and Jorge Bernardino

Abstract: Travel agencies offer their services via the Internet, which creates new methods of communication and connection between customers and third-party companies. Due to the difficulty of managing a large volume of flight routes, it is necessary to capture the information provided by airlines through a variety of services, providing end customers with competitive fares. In this paper, we analyse the information sources of flight fares offered by airlines, studying the difficulties, limitations and costs involved in accessing these data. We present a framework that explores the possibilities of finding "hidden" flight fares that result in much cheaper options in comparison to the average price of each flight route.

Paper Nr: 55
Title:

A Security Approach using SIP Protocol in Embedded Systems

Authors:

Toniclay Andrade Nogueira, Adauto Cavalcante Menezes, Admilson De Ribamar Lima Ribeiro and Edward David Moreno Ordonez

Abstract: Voice over IP communication will dominate the world. However, given the growing demand for reliable and secure voice and data communication, and the fact that attacks occur frequently in communication networks, this work focuses on verifying security and analyzing risks and vulnerabilities, examining the attacks and proposing a security measure for voice over IP communication on embedded devices.

Paper Nr: 56
Title:

Evaluation of Firewall Open Source Software

Authors:

Diogo Sampaio and Jorge Bernardino

Abstract: Computer systems are present in virtually every area of our lives, but their use carries several risks. This is particularly relevant for small businesses that are beginning to rely on information systems for all their activities, and where a breach of security can have catastrophic consequences. Most risks or security vulnerabilities, besides inadvertent errors, originate from criminal activity, which anonymously thrives on the Web and can strike any organization, mainly for profit but sometimes just for the challenge of doing it. Consequently, creating and managing a security system is often the main form of precaution and is the solution that guarantees better success rates. In this paper, we are interested in software with a lower financial cost; therefore our focus is on Free and Open Source Software. To this end, the following types of security tools are analyzed: Firewalls and Web Application Firewalls (WAF).

Paper Nr: 68
Title:

A System for Aspect-based Opinion Mining of Hotel Reviews

Authors:

Isidoros Perikos, Konstantinos Kovas, Foteini Grivokostopoulou and Ioannis Hatzilygeroudis

Abstract: Online travel portals are becoming important venues for sharing travel information. User-generated content and information in user reviews are valuable to both travel companies and other people, and can have a substantial impact on their decision-making process. The automatic analysis of user-generated reviews can provide a deeper understanding of users’ attitudes and opinions. In this paper, we present work on the automatic analysis of user reviews on the booking.com portal and the automatic extraction and visualization of information. An aspect-based approach is followed, where Latent Dirichlet Allocation is utilized to model topic opinion, and natural language processing techniques are used to specify the dependencies at the sentence level and determine interactions between words and aspects. Then, the Naïve Bayes machine learning method is used to recognize the polarity of the user’s opinion, utilizing the sentence’s dependency triples. To evaluate the performance of our method, we collected a wide set of reviews for a series of hotels from booking.com. The results of the evaluation study are very encouraging: they indicate that the system is fast, scalable and, most of all, accurate in analyzing user reviews and in specifying users’ opinions and stance towards the characteristics of the hotels, and that it can provide comprehensive hotel information.

Area 2 - Service based Information Systems

Full Papers
Paper Nr: 13
Title:

Revealing Fake Profiles in Social Networks by Longitudinal Data Analysis

Authors:

Aleksei Romanov, Alexander Semenov and Jari Veijalainen

Abstract: The goal of the current research is to detect fake identities among newly registered users of vk.com. Ego networks in vk.com for about 200,000 of the most recently registered profiles were gathered and analyzed longitudinally. The reason is that a certain percentage of new user accounts are fake, and the fake accounts and normal accounts have different behavioural patterns; thus, the former can be detected already within the first few days. Social graph metrics were calculated and an analysis was performed that allowed us to reveal outlying suspicious profiles, some of which turned out to be legitimate celebrities, but some were fake profiles involved in social media marketing and other malicious activities, such as participation in friend farms.

Posters
Paper Nr: 15
Title:

Simulating User Interactions: A Model and Tool for Semi-realistic Load Testing of Social App Backend Web Services

Authors:

Philipp Brune

Abstract: Many mobile apps today support interactions between their users and/or the provider within the app. To do so, these apps commonly call a web service backend system hosted by the app provider. For the implementation of such service backends, load tests are required to ensure their performance and scalability. However, existing tools like JMeter are not able to simulate "out of the box" a load distribution with the complex time evolution of heterogeneous, real, interacting users of a social app, which would be necessary, e.g., to detect critical performance bottlenecks. Therefore, in this paper a probabilistic model for simulating interacting users of a social app is proposed and evaluated by implementing it in a prototype load testing tool and using it to test the backend of a new real-world social app currently under development.

Paper Nr: 52
Title:

Towards Model-driven Hypermedia Testing for RESTful Systems

Authors:

Henry Vu, Tobias Fertig and Peter Braun

Abstract: Testing RESTful systems is a missing topic in the literature; hypermedia testing in particular is not mentioned at all. We discuss the challenges of hypermedia testing that were discovered within our research. We distinguish between client-side and server-side challenges, since REpresentational State Transfer (REST) describes a client-server system and both sides have to be considered. Hypermedia tests for the server have to ensure that there is no response without hypermedia links. However, the client also has to be hypermedia compliant; thus, we propose to simulate a server update to check whether the client breaks. Since we use Model-driven Software Development (MDSD) to generate RESTful systems, we also propose a model-driven approach for hypermedia testing. This allows us to generate tests for a server based on its underlying model. Moreover, we can build a crawler to verify our generated servers and to test all hypermedia links for different user roles. Any modification to the model can result in a server update, which can be used to test hypermedia clients.

Area 3 - Web Interfaces

Full Papers
Paper Nr: 58
Title:

Helping Non-programmers to Understand the Functionality of Composite Web Applications

Authors:

Carsten Radeck and Klaus Meißner

Abstract: The mashup paradigm allows end users to build their own web applications by combining components in order to fulfill specific needs. Mashup development and usage are still cumbersome tasks for non-programmers, for instance, when it comes to understanding the composite nature of mashups and their functionality. Non-programmers may struggle to use components as intended, especially if the latter provide capabilities in combination, and may lack awareness of inter-widget communication (IWC). Prevalent mashup approaches provide no or limited concepts for these aspects, resulting in more or less successful trial-and-error strategies by users. In this paper, we present our proposal for assisting non-programmers in understanding and leveraging the functionality of components and their interplay in a mashup. Based on annotated component descriptions, interactive explanations and step-wise instructions are generated and presented directly in the context of a component’s user interface (UI). In addition, active IWC is visualized to foster user awareness. We describe the iterative design which led us from early approaches towards our current solution. The concepts are implemented in our mashup platform and evaluated by means of a user study. The results indicate that our solutions help non-programmers to better understand the functionality of composite web applications (CWA).

Paper Nr: 67
Title:

Private Data in Collaborative Web Mashups

Authors:

Gregor Blichmann, Andreas Rümpel, Martin Schrader, Carsten Radeck and Klaus Meißner

Abstract: Situational development and utilization of long-tail Web applications often involves scenarios with multiple interacting persons. Keeping private data under control comes into focus when using applications in sensitive domains, such as financial management. The main problems comprise the lack of data restriction capabilities at an adequate granularity, missing awareness of which data is restricted, and missing visual representations both for previewing such private data and for replacing non-shared data on the invitee’s side. To this end, we present an innovative sharing process with fine-grained control of private data, sharing awareness and impression management. Further, policy compliance for private data enables corporate use, fostering the utilization of the collaborative Web mashup paradigm in business application scenarios. A user study shows the suitability of the corresponding interaction features for the target group of Web users with no programming skills.

Short Papers
Paper Nr: 7
Title:

A Revisit to Web Browsing on Wearable Devices

Authors:

Jinwoo Song, Hyunjune Kim, Ming Jin and Honguk Woo

Abstract: Wearable devices and smartwatches have become prevalent in recent years, yet consuming web content on those devices is not common, mainly due to their restricted IO capabilities. In this paper, we revisit the web browser model and propose the notion of fast access browsing, which incorporates lightweight, always-on web snippets, namely the widget view, into web applications. This allows smartwatch users to rapidly access web content (i.e., within 200 ms), similarly to how they interact with notifications. To do so, we analyse about 90 smartwatch applications, identify the quick preview pattern, and then define constrained web specifications for smartwatches. Our implementation, the wearable device toolkit for fast access browsing, is now being tested and deployed on commercialized products, and the developer tool for building widget-view-enabled web applications will soon be available as smartwatch SDK extensions.

Paper Nr: 43
Title:

Columbus: A Tool for Discovering User Interface Models in Component-based Web Applications

Authors:

Adrian Hernandez-Mendez, Andreas Tielitz and Florian Matthes

Abstract: The process of replacing, maintaining or adapting the existing user interfaces in Component-based Web Applications to new conditions requires a significant amount of effort and resources for coordinating their different stakeholders. Additionally, there are many design alternatives, which can vary according to the context of use. Therefore, understanding the structure and composition of UIs and their contained elements can provide valuable insights for future adaptations. In this paper, we present a tool for discovering UI models in the source code of Component-based Web Applications, which can be used to support the reverse engineering process. Subsequently, we evaluated its User Interface model extraction capabilities using the open-source project TodoMVC. The evaluation process shows the main limitations of JavaScript frameworks for creating an abstract UI model (i.e., a technology-independent model) for Web Applications.

Paper Nr: 69
Title:

Digital Assisted Communication

Authors:

Paula Escudeiro, Nuno Escudeiro, Marcelo Norberto, Jorge Lopes and Fernando Soares

Abstract: Communication with the deaf community can prove to be very challenging without the use of sign language. There is a considerable difference between sign and written language, as they differ in both syntax and semantics. The work described in this paper addresses the development of a bidirectional translator between several sign languages and their respective written forms, as well as the evaluation methods and results for those tools. A multiplayer game that uses the translator is also described in this paper. The translator from sign language to text employs two devices, namely the Microsoft Kinect and 5DT Sensor Gloves, in order to gather data about the motion and shape of the hands. This translator is being adapted to allow communication with the blind as well. The Quantitative Evaluation Framework (QEF) and ten-fold cross-validation were used to evaluate the project and show promising results. The product also goes through a validation process by sign language experts and deaf users, who provide their feedback by answering a questionnaire. The translator exhibits a precision higher than 90%, and the project's overall quality rating is close to 90% based on the QEF.

Posters
Paper Nr: 5
Title:

Modelling Agile Requirements using Context-based Persona Stories

Authors:

Jorge Sedeño, Eva-Maria Schön, Carlos Torrecilla-Salinas, Jörg Thomaschewski, Maria José Escalona and Manuel Mejías

Abstract: In recent years hybrid approaches focusing on user needs by integrating Agile methodologies (e.g. Scrum, Kanban or Extreme Programming) with Human-Centered Design (HCD) have proven to be particularly suitable for the development of Web systems. On the one hand, HCD techniques are used for requirements elicitation and, on the other hand, they can be utilized to elicit navigation relationships in Web projects. Navigation is one of the basic pillars of Web systems and also a fundamental element for the methodologies within the Model-Driven Web Engineering (MDWE) field. This paper presents an approach to model Agile requirements by means of integrating HCD techniques into Agile software development. We contribute to the software development body of knowledge by creating the concept of a Context-based Persona Story (CBPS) and formalizing it through a metamodel. Our approach covers the modelling of users and stakeholders by personas as well as the visualization of the context of use by storyboards. The attributes of the context of use enable us to elicit acceptance criteria for describing the scope of an Agile requirement.

Paper Nr: 29
Title:

SIAS: Suicidal Intentions Alerting System

Authors:

Georgios Domalis, Christos Makris, Pantelis Vikatos, Anastasios Papathanasiou, Efterpi Paraskevoulakou and Manos Sfakianakis

Abstract: In this paper, we present an alerting system based on an efficient classification model for detecting suicidal people using natural language processing and data mining techniques. The model uses linguistic features derived from an analysis of handwritten and electronic messages/notes. The model was trained and validated with fully anonymised real data provided by the Cyber Crime Division of the Greek Police, as well as available suicide notes from social media. The alerting system is intended as a prevention and management tool for the automatic detection of suicidal intentions.

Area 4 - Web Intelligence

Full Papers
Paper Nr: 4
Title:

Detecting Hacked Twitter Accounts based on Behavioural Change

Authors:

Meike Nauta, Mena Habib and Maurice van Keulen

Abstract: Social media accounts are valuable for hackers for spreading phishing links, malware and spam. Furthermore, some people deliberately hack an acquaintance to damage his or her image. This paper describes a classification model for detecting hacked Twitter accounts. The model is mainly based on features associated with behavioural change, such as changes in language, source, URLs, retweets, frequency and time. We experiment with a Twitter data set containing tweets of more than 100 Dutch users, including 37 who were hacked. The model detects 99% of the malicious tweets, which proves that behavioural changes can reveal a hack and that anomaly-based features perform better than regular features. Our approach can be used by social media systems such as Twitter to automatically detect a hack of an account only a short time after the fact, allowing the legitimate owner of the account to be warned or protected, preventing reputational damage and annoyance.

Paper Nr: 8
Title:

A Serendipity-Oriented Greedy Algorithm for Recommendations

Authors:

Denis Kotkov, Jari Veijalainen and Shuaiqiang Wang

Abstract: Most recommender systems suggest items to a user that are popular among all users and similar to items the user usually consumes. As a result, a user receives recommendations that she/he is already familiar with or would find anyway, leading to low satisfaction. To overcome this problem, a recommender system should suggest novel, relevant and unexpected, i.e. serendipitous items. In this paper, we propose a serendipity-oriented algorithm, which improves serendipity through feature diversification and helps overcome the overspecialization problem. To evaluate our algorithm and compare it with others, we employ a serendipity metric that captures each component of serendipity, unlike the most common metric.
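The feature-diversification idea mentioned in the abstract can be illustrated with a generic greedy re-ranker that trades relevance against overlap with features already covered by the selection. This is a sketch of the general technique only, not the authors' algorithm; the trade-off parameter and the toy catalogue are illustrative:

```python
def greedy_diversify(candidates, k, alpha=0.7):
    """Greedily pick k items, balancing relevance against feature overlap.

    candidates: dict item -> (relevance, set_of_features).
    alpha weighs relevance vs. dissimilarity to already-selected items.
    """
    selected, covered = [], set()
    pool = dict(candidates)
    for _ in range(min(k, len(pool))):
        def score(item):
            rel, feats = pool[item]
            overlap = len(feats & covered) / max(len(feats), 1)
            return alpha * rel + (1 - alpha) * (1 - overlap)
        best = max(pool, key=score)
        selected.append(best)
        covered |= pool[best][1]
        del pool[best]
    return selected

# Toy catalogue: three popular action movies and one relevant drama.
candidates = {
    "action1": (0.9, {"action"}),
    "action2": (0.85, {"action"}),
    "drama1": (0.8, {"drama"}),
    "action3": (0.8, {"action"}),
}
top2 = greedy_diversify(candidates, k=2)
```

Once the first action movie is selected, the remaining action movies are penalized for covering the same feature, so the drama is picked second despite its lower raw relevance.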

Paper Nr: 17
Title:

Reentrancy and Scoping for Multitenant Rule Engines

Authors:

Kennedy Kambona, Thierry Renaux and Wolfgang De Meuter

Abstract: Multitenant web systems can share one application instance across many clients distributed over multiple devices. These systems need to manage the shared knowledge base reused by the various users and applications they support. Rather than hard-coding all the shared knowledge and ontologies, developers often encode this knowledge in the form of rules to program server-side business logic. In such situations, a modern rule engine can be used to accommodate the knowledge for tenants of a multitenant system. Existing rule engines, however, were not conceptually designed to support or cope with the knowledge of the rules of multiple applications and clients at the same time. They are not fit for multitenant setups since one has to manually hard-code the modularity of the knowledge for the various applications and clients, which quickly becomes complex and fallible. We present Serena, a rule-based framework for supporting multitenant reactive web applications. The distinctive feature of Serena is the notion of reentrancy and scoping in its Rete-based rule engine, which is the key solution in making it multitenant. We validate our work through a simulated case study and a comparison with a similar common-place approach, showing that our flexible approach improves computational efficiency in the engine.

Paper Nr: 33
Title:

Target-dependent Sentiment Analysis of Tweets using a Bi-directional Gated Recurrent Unit

Authors:

Mohammed Jabreel and Antonio Moreno

Abstract: Targeted sentiment analysis classifies the sentiment polarity towards a certain target in a given text. In this paper, we propose a target-dependent bidirectional gated recurrent unit (TD-biGRU) for target-dependent sentiment analysis of tweets. The proposed model has the ability to represent the interaction between the targets and their contexts. We have evaluated the effectiveness of the proposed model on a benchmark dataset from Twitter. The experiments show that our proposed model outperforms the state-of-the-art methods for target-dependent sentiment analysis.

Paper Nr: 37
Title:

Enhancing JSON to RDF Data Conversion with Entity Type Recognition

Authors:

Fellipe Freire, Crishane Freire and Damires Souza

Abstract: Nowadays, many Web data sources and APIs make their data available on the Web in semi-structured formats such as JSON. However, JSON data cannot be directly used in the Web of data, where principles such as URIs and semantically named links are essential. Thus, it is necessary to convert JSON data into RDF data. To this end, we have to consider semantics in order to provide data reference according to domain vocabularies. To help matters, we present an approach which identifies JSON metadata, aligns them with domain vocabulary terms and converts the data into RDF. In addition, along with the data conversion process, we identify the semantically most appropriate entity types for the JSON objects. We present the definitions underlying our approach and the results obtained in its evaluation.

Paper Nr: 41
Title:

Automatic Integration of Spatial Data into the Semantic Web

Authors:

Claire Prudhomme, Timo Homburg, Jean-Jacques Ponciano, Frank Boochs, Ana Roxin and Christophe Cruz

Abstract: For several years, many researchers have tried to semantically integrate geospatial datasets into the semantic web. Although there are many general means of integrating interconnected relational datasets (e.g. R2RML), importing schema-less relational geospatial data remains a major challenge in the semantic web community. In our project SemGIS we face significant challenges in importing schema-less geodatasets in various data formats without relations to the semantic web. We therefore developed an automatic semantification process for such data, using among others the geometry of spatial objects. We combine Natural Language Processing with geographic and semantic tools in order to extract semantic information from spatial data into a local ontology linked to existing semantic web resources. For our experiments, we used the LinkedGeoData and Geonames ontologies to link semantic spatial information, and compared links with DBpedia and Wikidata for other types of information. The aim of the experiments presented in this paper is to examine the feasibility and limits of an automated integration of spatial data into a semantic knowledge base, and to assess its correctness against different open datasets. Other ways to link these open datasets have been applied, and we used the different results to evaluate our automatic approach.

Paper Nr: 46
Title:

Building a Query Engine for a Corpus of Open Data

Authors:

Mauro Pelucchi, Giuseppe Psaila and Maurizio Toccu

Abstract: Public Administrations openly publish many data sets concerning citizens and territories in order to increase the amount of information made available to people, firms and public administrators. As a result, Open Data corpora have become so huge that it is impossible to deal with them by hand; consequently, it is necessary to use tools with innovative techniques for querying them. In this paper, we present a technique to select open data sets containing specific pieces of information, and to retrieve them from a corpus published by an open data portal. In particular, users can formulate structured queries that are blindly submitted to our search engine prototype (i.e., without being aware of the actual structure of the data sets). Our approach reinterprets and mixes several known information retrieval approaches, while at the same time giving a database view of the problem. We implemented this technique within a prototype, which we tested on a corpus containing more than 2000 data sets. We observed that our technique provides more focused results than the baseline experiments performed with Apache Solr.

Short Papers
Paper Nr: 3
Title:

Truth Assessment of Objective Facts Extracted from Tweets: A Case Study on World Cup 2014 Game Facts

Authors:

Bas Janssen, Mena Habib and Maurice van Keulen

Abstract: Given the tremendous opportunities of working with social media data and the acknowledged negative effects social media messages can have, a way of assessing the truth of claims on social media would be not only interesting but also very valuable. Such an ability could support applications that use social media data, or serve as a selection tool in research on the spread of false rumors or ’fake news’. In this paper, we show that we can determine truth by using a statistical classifier supported by an architecture of three preprocessing phases. We base our research on a dataset of Twitter messages about the FIFA World Cup 2014. We determine the truth of a tweet using 7 popular fact types (involving events in the matches of the tournament, such as scoring a goal), and we show that we can achieve an F1-score of 0.988 for the first class (tweets that contain no false facts) and an F1-score of 0.818 for the second class (tweets that contain one or more false facts).

Paper Nr: 12
Title:

Ontology Development for Classification: Spirals - A Case Study in Space Object Classification

Authors:

Bin Liu, Li Yao, Junfeng Wu and Zheyuan Ding

Abstract: Ontology-based classification (OBC) has been used extensively. Classification ontologies are the foundation of OBC systems, and a method to guide their development is urgently needed in order to obtain better OBC performance. We propose a method for developing classification ontologies named Spirals, taking the development of an ontology for space object classification named OntoStar as an example. Firstly, soft sensing data and hard sensing data are collected. Then, various kinds of human knowledge and knowledge obtained by machine learning are combined to build the ontology. Finally, data-driven evaluation and promotion is deployed to assess and improve the ontology. Experiments with the OBC system built upon OntoStar show that the data-driven evaluation and promotion in Spirals increases the accuracy of space object classification by 4.1%. OBC is also more robust than baseline classifiers with respect to a missing feature in the test data: when classifying space objects with the feature “size” missing from the test data, OBC maintains its false-positive (FP) rate, while that of the baseline classifiers increases by between 3.9% and 35.5%; the loss of accuracy of OBC is 0.2%, while that of the baseline classifiers ranges from 1.1% to 69.5%.

Paper Nr: 24
Title:

Time Weight Content-based Extensions of Temporal Graphs for Personalized Recommendation

Authors:

Armel Jacques Nzekon Nzeko'o, Maurice Tchuente and Matthieu Latapy

Abstract: Recommender systems are an answer to information overload on the web. They filter and present to customers small subsets of items that they are most likely to be interested in. Users’ interests may change over time, and accurately capturing these dynamics in such systems is important. Sugiyama, Hatano and Yoshikawa proposed to take into account the user’s browsing history. Ding and Li were among the first to address this problem, by assigning weights that decrease with the age of the data. Other authors, such as Billsus and Pazzani, and Li, Yang, Wang and Kitsuregawa, proposed to capture long- and short-term preferences and combine them for personalized search or news access. The Session-based Temporal Graph (STG) is a general model proposed by Xiang et al. to provide temporal recommendations by combining long- and short-term preferences. Later, Yu, Shen and Yang introduced Topic-STG, which takes into account topic information extracted from the data. In this paper, we propose the Time Weight Content-based STG, which generalizes Topic-STG. Experiments show that, using the Time-Averaged Hit Ratio as measure, this time weight content-based extension of STG leads to performance increases of 4%, 6% and 9% on the CiteUlike, Delicious and Last.fm datasets respectively, in comparison to STG.

Paper Nr: 27
Title:

Bringing Scientific Blogs to Digital Libraries

Authors:

Fidan Limani, Atif Latif and Klaus Tochtermann

Abstract: Research publication via scientific blogging is gaining momentum, with an ever-increasing number of researchers accepting it as their main or complementary research dissemination channel. This development has prompted both scientific bloggers and Digital Libraries (DL) to explore the potential of streamlining these resources alongside DL collections for increased and complementary user selection. In this paper we explore a methodology for integrating DL and blog post collections, together with some use case scenarios that demonstrate the value and capabilities of this integration.

Paper Nr: 30
Title:

Connected Closet - A Semantically Enriched Mobile Recommender System for Smart Closets

Authors:

Anders Kolstad, Özlem Özgöbek, Jon Atle Gulla and Simon Litlehamar

Abstract: A common problem for many people is deciding on an outfit from a vastly overloaded wardrobe. In this paper, we present Connected Closet, a semantically enriched Internet of Things solution of a smart closet with a corresponding mobile application for recommending daily outfits and suggesting garments for recycling or donation. This paper describes the whole design and architecture of the system, including the physical closet, the recommender algorithms, the mobile application, and the backend comprising microservices implemented using container technology. We show how users can benefit from the system, which supports them in organizing their wardrobe and provides daily personalized outfit suggestions. Moreover, with its recycling suggestions, the system can be beneficial for the sustainability of the environment and the economy.

Paper Nr: 70
Title:

Graph Community Discovery Algorithms in Neo4j with a Regularization-based Evaluation Metric

Authors:

Andreas Kanavos, Georgios Drakopoulos and Athanasios Tsakalidis

Abstract: Community discovery is central to social network analysis as it provides a natural way of decomposing a social graph into smaller ones based on the interactions among individuals. Communities do not need to be disjoint and often exhibit recursive structure. The latter has been established as a distinctive characteristic of large social graphs, indicating a modularity in the way humans build societies. This paper presents the implementation of four established community discovery algorithms in the form of Neo4j higher order analytics with the Twitter4j Java API and their application to two real Twitter graphs with diverse structural properties. In order to evaluate the results obtained from each algorithm, a regularization-like metric is proposed that balances global and local graph self-similarity, akin to the way it is done in signal processing.

Posters
Paper Nr: 34
Title:

Samsung and Volkswagen Crisis Communication in Facebook and Twitter - A Comparative Study

Authors:

Boyang Zhang, Jari Veijalainen and Denis Kotkov

Abstract: Since September 2015 at least two major crises have emerged involving major industrial companies producing consumer products. In September 2015, diesel cars manufactured by Volkswagen turned out to be equipped with cheating software that reduced NO2 and other emission values to acceptable levels during testing, compared to the real, unacceptable values in normal use. In August 2016, reports began to appear that the battery of a new smartphone produced by Samsung, the Galaxy Note7, could begin to burn, or even explode, while the device was on. In November 2016, 34 washing machine models were also reported to have caused damage due to disintegration. In all cases, the companies have experienced substantial financial losses, their shares have lost value, and their reputation has suffered among consumers and other stakeholders. In this paper, we study the commonalities and differences in the crisis management strategies of the companies, mostly concentrating on the crisis communication aspects. We draw on Situational Crisis Communication Theory (SCCT). We analyse the communication behaviour of the companies and various stakeholders during the crises by investigating the official web sites of the companies and their communication on their own Twitter and Facebook accounts. We also collected streaming data from Twitter in which Samsung and the troubled smartphone or washing machines were mentioned. For VW we also collected streaming data in which the emission scandal or its ramifications were mentioned, and performed several analyses, including sentiment analysis.

Paper Nr: 57
Title:

Detection of Fake Profiles in Social Media - Literature Review

Authors:

Aleksei Romanov, Alexander Semenov, Oleksiy Mazhelis and Jari Veijalainen

Abstract: False identities play an important role in advanced persistent threats and are also involved in other malicious activities. The present article focuses on a literature review of the state-of-the-art research aimed at detecting fake profiles in social media. The approaches to detecting fake social media accounts can be classified into those aimed at analysing individual accounts and those capturing coordinated activities spanning a large group of accounts. The article sheds light on the role of fake identities in advanced persistent threats and covers both of these approaches to detecting fake social media accounts.

Paper Nr: 60
Title:

Search Engine Query Recommendation - Using SNA over Query Logs with User Profiles

Authors:

Lule Ahmedi and Dardan Shabani

Abstract: Recommending an adequate search engine query for a specific user on the web remains a challenge, even for today’s recommender systems that incorporate social networks. In this paper we present a query recommender that, in addition to relying on the similarity of the actual query posted by the current user to queries in the search engine’s query log, also uses social network analysis (SNA) to first find the users most similar to the current user based on their profiles, and then recommend their most similar queries to the current user. The calculation of user similarity follows an existing approach for Points of Interest (POIs) recommendation, which applies certain SNA ranking algorithms over concurrent users based on their social profiles in the login session.

Paper Nr: 64
Title:

E-Shop - A Vertical Search Engine for Domain of Online Shopping

Authors:

Vigan Abdurrahmani, Lule Ahmedi and Korab Rrmoku

Abstract: Along with general search engines that search the entire web, vertical search engines focused on specific domains have also gained in popularity in web search. In our case, we focus on the domain of online shopping. We treat the main processes of search engines, which include crawling, indexing and query processing. Each process is developed based on general search engines and adapted to tackle the problems of the specified domain. Specifically, a search result ranking algorithm called Warehouse, which may be adapted to rank results of any specific domain, is introduced.

Area 5 - Mobile Information Systems

Full Papers
Paper Nr: 54
Title:

An Intercepting API-Based Access Control Approach for Mobile Applications

Authors:

Yaira K. Rivera Sánchez, Steven A. Demurjian and Lukas Gnirke

Abstract: Mobile device users employ mobile applications to realize tasks once limited to desktop devices, e.g., web browsing, media (audio, video), managing health and fitness data, etc. While almost all of these applications require a degree of authentication and authorization, some involve highly sensitive data (PII and PHI) that must be strictly controlled as it is exchanged back and forth between the mobile application and its server-side repository/database. Role-based access control (RBAC) is a candidate for protecting the highly sensitive data of such applications. Recent research on authorization in mobile computing has focused on extending RBAC to provide finer-grained access control. However, most of these approaches attempt to apply RBAC at the application level of the mobile device and/or require modifications to the mobile OS. In contrast, the research presented in this paper focuses on applying RBAC to the business layer of a mobile application, specifically to the API(s) that a mobile application utilizes to manage data. To support this, we propose an API-based approach to RBAC for permission definition and enforcement that intercepts API service calls to alter the information delivered to or stored by the app. The proposed intercepting API-based approach is demonstrated via an existing mHealth application.

Short Papers
Paper Nr: 51
Title:

Conquering the Mobile Device Jungle: Towards a Taxonomy for App-enabled Devices

Authors:

Christoph Rieger and Tim A. Majchrzak

Abstract: Applications for mobile devices (apps) have created an ecosystem that facilitated a trend towards task-oriented, interoperable software. Following smartphones and tablets, many further kinds of devices became (and still become) app-enabled. Examples of this trend are smart TVs and cars. Additionally, new types of devices have appeared, such as wearables. App-enabled devices typically share some characteristics, and many ways exist to develop for them. So far, for smartphones and tablets alone, issues such as device fragmentation are discussed and technology for cross-platform development is scrutinized. Increasingly, app-enabled devices appear to be a jungle: it becomes harder to keep an overview, to distinguish and categorize devices, and to investigate similarities and differences. We thus set out with this position paper to close this gap. In our view, a taxonomy for app-enabled devices is required. This paper presents the first steps towards this taxonomy and thereby invites discussion.

Posters
Paper Nr: 31
Title:

Mobile Devices and Cyber Security - An Exploratory Study on User’s Response to Cyber Security Challenges

Authors:

Kanthithasan Kauthamy, Noushin Ashrafi and Jean-Pierre Kuilboer

Abstract: In today’s increasingly connected, global, and fast-paced computing environment, sophisticated security threats are common occurrences and detrimental to users at home as well as in business. The first and most important step against computer security attacks is awareness and understanding of the nature of the threats and their consequences. Although the users of mobile devices and laptops are often the target of security threats, many of them, specifically millennials, seem oblivious to such threats. A survey of college students reveals that despite all the hype about cybersecurity and its potential damages, the respondents are using their mobile devices without much apprehension or thought about threats, potential damages, and safeguarding against them. This study is based on the premise that as the use of mobile devices increases exponentially among millennials, their laid-back attitude and behaviour in response to cybersecurity is alarming and not to be overlooked. Simple solutions, such as the availability of useful information, should be considered.