Computer science is a very broad field and, at the same time, quite abstract for many people. What do your working tools look like?
That is true. I choose tools, programming languages and technologies depending on the task at hand. Recently, while working on anomaly detection in databases, I used, among others: the C# programming language, .NET Core, the Neo4j database management system, the Microsoft Visual Studio integrated development environment and Neo4j.Driver, the driver needed to connect to the Neo4j database.
What does the process of writing algorithms look like?
Analysing and creating algorithms are not simple tasks. I usually start by reviewing existing algorithms and methods. I look for their flaws, weaknesses and advantages. Then I try to develop a method without these disadvantages. The skills needed for this kind of work can be divided into soft and hard. Soft skills include, among others: logical thinking, creativity, innovation, goal orientation and determination. Hard skills include, among others: programming and knowledge of the basic definitions in the fields of artificial intelligence, databases and software engineering.
What is the topic of your doctoral dissertation?
Methods of analysis and representation of non-relational data sets. This topic concerns such areas and fields of computer science as: artificial intelligence, graph databases, linguistic summaries of databases, fuzzy logic, fuzzy rules and decision support systems.
Recently, as part of our research work, together with Prof. Adam Niewiadomski, Dr. Marcin Kacprowicz and Dr. Agnieszka Duraj, we have prepared scientific articles on detecting exceptions in databases, also called anomalies or strange, rare records. Detecting them makes it possible to check whether they have been generated by an external mechanism. Exceptions should be treated properly, as they may have a negative impact on the procedures and results of data analysis: they can blur or even distort the general picture of the analysed collections. When identified properly, however, they can provide unique information about break-ins into computer networks, illegal use of credit cards, attacks on a bank's transaction service, rapid changes in the parameters of medical devices showing the health status of patients, etc.
Databases - what is meant by this concept?
A database is a systematized collection of data organized according to set rules. It comprises digital data collected in accordance with the rules adopted for a given computer program that specializes in collecting and processing such data. Such a program is called a ‘database management system’. Structured Query Language (SQL) is a mechanism that allows you to read data from a database. We use SQL queries to specify what data is to be read from a database.
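To make the idea of an SQL query concrete, here is a minimal sketch in Python using the standard sqlite3 module and an in-memory database. The table name, column names and rows are hypothetical, invented purely for illustration; they are not taken from the research described here.

```python
import sqlite3

# In-memory database; the table, columns and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE complaints ("
    "id INTEGER PRIMARY KEY, country TEXT, customer_age INTEGER)"
)
conn.executemany(
    "INSERT INTO complaints (country, customer_age) VALUES (?, ?)",
    [("Poland", 67), ("Germany", 34), ("Poland", 71)],
)

# The SQL query specifies which data is to be read:
# only complaints made by customers aged 65 or over.
rows = conn.execute(
    "SELECT country, customer_age FROM complaints "
    "WHERE customer_age >= 65 ORDER BY customer_age"
).fetchall()
conn.close()

print(rows)
```

The WHERE clause is where the query states which records are wanted; the database management system decides how to find them.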
In my doctoral dissertation, we operate on enormous data sets, the size of which is often expressed in petabytes. This guarantees proper verification of the methods and well-executed tests.
You have focused on the linguistic analysis of databases - what is the advantage of this method over others?
The linguistic analysis of huge data sets enables us to convey a clear, coherent, short message to the user, and above all, expressed in a natural (close to human) language.
People are used to conveying messages in natural language, so an important goal of IT systems should be to define and use such procedures as clearly as possible. This guarantees results that all users can interpret easily. To meet this expectation, we chose methods that represent linguistic data through fuzzy sets. As a result, instead of the sentence ‘1,950 of the 2,000 women in the database are in the 15-20 age group’, we get the phrase ‘Most women are young’.
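As an illustration only, the summary ‘Most women are young’ can be sketched in Python. The membership function for ‘young’, the shape of the quantifier ‘most’ and the ages of the 50 remaining women are all assumptions made for this sketch, and the formula used is a commonly cited one (the quantifier applied to the mean membership degree), not necessarily the exact method used in the research.

```python
def young(age: float) -> float:
    """Hypothetical fuzzy set 'young': 1 up to age 20, falling linearly to 0 at 35."""
    if age <= 20:
        return 1.0
    if age >= 35:
        return 0.0
    return (35 - age) / 15


def most(proportion: float) -> float:
    """Hypothetical fuzzy quantifier 'most': 0 up to 0.3, 1 from 0.8, linear between."""
    if proportion <= 0.3:
        return 0.0
    if proportion >= 0.8:
        return 1.0
    return (proportion - 0.3) / 0.5


def truth_of_summary(ages, summarizer, quantifier) -> float:
    """Degree of truth of 'Q objects are S': Q applied to the mean membership in S."""
    mean_membership = sum(summarizer(a) for a in ages) / len(ages)
    return quantifier(mean_membership)


# 1,950 women aged 15-20 (represented here by age 18) and, as an assumption,
# 50 women aged 50, giving 2,000 records in total.
ages = [18] * 1950 + [50] * 50
print(truth_of_summary(ages, young, most))  # prints 1.0
```

With these assumed definitions, 97.5% of the records fully belong to ‘young’, so the quantifier ‘most’ is fully satisfied and the summary receives the maximum degree of truth.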
It is worth mentioning that linguistic summaries have a lower complexity than other popular solutions that deal with data interpretation and aggregation.
Furthermore, less time is needed to analyse a huge database compared to using more complex algorithms.
Where is your tool for more effective reading of databases used?
Linguistic summaries are used as a tool supporting the analysis of the obtained data sets. This method has a wide range of applications: from generating headlines for newspapers, through creating article summaries, to decision support systems.
I will try to explain the last of these applications. Let us imagine we have a huge dataset of customer complaints. The database contains information including, among others: the date of the complaint, the country of origin of the product complained about and the age of the customer making the complaint. On the basis of this data, using predefined fuzzy sets, we are able to generate linguistic summaries, e.g. ‘Most of the complaints made in the summer period by elderly people were accepted’. Then we can calculate the degree of truth of such a summary. The degree of truth x takes values in the range x ∈ [0, 1]. The closer the value is to 1, the truer the summary. If the calculation shows that the summary is true, we can suggest a specific decision (to accept or reject a complaint) to the company's complaint handling department (CRM).
The decision support system fits into a win-win strategy (both parties are satisfied). The department in a given company benefits because it can decide on a complaint faster, and the customer benefits because they can check in advance whether the complaint is likely to be accepted. Linguistic summaries are a universal method and could be used in a decision support system based on any other data, e.g. medical. Such a system could suggest whether a given person has the flu.
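The complaint example can be sketched in Python as follows. This is a hedged illustration, not the system described here: the record fields, the fuzzy sets ‘summer’ and ‘elderly’, the quantifier ‘most’ and the sample data are all hypothetical, and the formula is a commonly used one for summaries with a qualifier, where only the records matching ‘made in summer by elderly people’ are weighed.

```python
from dataclasses import dataclass


@dataclass
class Complaint:
    month: int          # 1-12, month in which the complaint was made
    customer_age: int
    accepted: bool


def summer(month: int) -> float:
    """Hypothetical fuzzy set 'summer period': full in June-August, partial in May and September."""
    if month in (6, 7, 8):
        return 1.0
    if month in (5, 9):
        return 0.5
    return 0.0


def elderly(age: float) -> float:
    """Hypothetical fuzzy set 'elderly': 0 up to age 55, rising linearly to 1 at 70."""
    if age <= 55:
        return 0.0
    if age >= 70:
        return 1.0
    return (age - 55) / 15


def most(proportion: float) -> float:
    """Hypothetical fuzzy quantifier 'most': 0 up to 0.3, 1 from 0.8, linear between."""
    if proportion <= 0.3:
        return 0.0
    if proportion >= 0.8:
        return 1.0
    return (proportion - 0.3) / 0.5


def truth_of_qualified_summary(complaints) -> float:
    """Truth of 'Most complaints made in summer by elderly people were accepted'."""
    matched_and_accepted = 0.0
    matched = 0.0
    for c in complaints:
        # Degree to which the record fits the qualifier (summer AND elderly, via min).
        w = min(summer(c.month), elderly(c.customer_age))
        s = 1.0 if c.accepted else 0.0
        matched_and_accepted += min(w, s)
        matched += w
    if matched == 0.0:
        return 0.0  # no record fits the qualifier at all
    return most(matched_and_accepted / matched)


# Invented sample data, for illustration only.
complaints = [
    Complaint(month=7, customer_age=72, accepted=True),
    Complaint(month=8, customer_age=80, accepted=True),
    Complaint(month=6, customer_age=65, accepted=True),
    Complaint(month=7, customer_age=75, accepted=False),
    Complaint(month=1, customer_age=30, accepted=False),
]
truth = truth_of_qualified_summary(complaints)
print(round(truth, 2))  # prints 0.85
```

In this sketch a degree of truth of about 0.85 would count as a fairly true summary; what threshold justifies suggesting a decision is a design choice left to the system's designers.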
The effectiveness of the method depends on the quality and quantity of the data in the database.
Why is it worth being a scientist?
It gives us the opportunity to improve our skills, innovate, constantly challenge ourselves and fulfil ourselves by exploring the areas of our interest. Scientific work is a constant challenge, the results of which allow us to achieve a great reward in the form of immense satisfaction.