Page 2 of 2

Making Sense of Box Plots: A Beginner’s Guide to Interpreting Data

Introduction

Are you feeling overwhelmed by box plots? If so, you’re not alone! Interpreting data can be a daunting task, but it doesn’t have to be. With this beginner’s guide to making sense of box plots, you’ll be able to make sense of data in no time. This guide will provide you with the tools you need to understand box plots and interpret data quickly and easily. So don’t worry, you’ll be a box plot pro in no time!

Making sense of box plots can be a daunting task for beginners. But with a little bit of practice, you can learn to interpret data quickly and accurately.

Box plots are a great way to visualize data, and can help you make decisions about how to proceed with your project.

The first step is to understand the different components of a box plot. The box itself represents the middle 50% of the data, while the whiskers represent the rest of the data. The line in the middle of the box is the median, which is the middle value of the data set. The dots outside the box are outliers, which are values that are significantly higher or lower than the rest of the data.  Once you understand the components of a box plot, you can start to interpret the data.

To interpret a box plot, look at the size of the box and the position of the median. If the box is large, it means that the data is spread out, and if the median is close to the center of the box, it means that the data is evenly. If the box small and the median close to one side, it means that the data is clustered around one value.

You can also look at the outliers to see if there are any extreme values that could be affecting the data. With a little practice, you’ll be able to make sense of box plots quickly and accurately.

Introduction to Box Plot Visualizations (microstrategy.com)

Box plots in JavaScript (plotly.com)

Gantt Charts: A Must-Have Tool for Every Project Manager

Introduction

As a project manager, I know how important it is to stay organized and on top of things. Gantt charts are a must-have tool for any project manager who wants to keep their projects running smoothly. Gantt charts provide a visual representation of a project timeline, making it easy to see which tasks need to be completed and when. With Gantt charts, you can easily identify potential problems and make adjustments to ensure your project is completed on time and within budget. Plus, Gantt charts are easy to use and don’t require any special software or training. So if you’re looking for a simple yet effective way to manage your projects, Gantt charts are the way to go.

Gantt charts are an essential tool for every project manager. They provide a visual representation of the timeline of a project, allowing project managers to easily track progress and identify any potential issues. With a Gantt chart, project managers can quickly see what tasks need to be completed and when, and can easily adjust the timeline as needed.

Gantt charts are also great for keeping stakeholders informed about the progress of a project. They provide a clear picture of the timeline, which makes it easy for stakeholders to understand where the project is at and what needs to be done. This helps to ensure that everyone is on the same page and that the project is running smoothly.

Overall, Gantt charts are an invaluable tool for project managers. They provide a clear and concise timeline of the project, making it easy to track progress and identify any potential issues. They also help to keep stakeholders informed, which is essential for successful project management. If you’re a project manager, Gantt charts are a must-have tool for your toolbox.

Excerpt

Gantt charts are a must-have tool for every project manager. They help you plan and track the progress of your project, and make sure that you stay on track. They’re easy to use and understand, and they can save you a lot of time and hassle. So if you’re a project manager, make sure you get yourself a Gantt chart!

https://project-management.com/gantt-chart-software-lp/

How Line Charts Helped Me Visualize Data Better

Introduction

As someone who loves working with data, I’ve always been a fan of visualizing it in a way that makes it easier to understand. Recently, I started using line charts to do just that and it has made a huge difference in how I understand the data I’m working with. Line charts are a great way to visualize data because they provide a clear and concise picture of how a certain set of data points are related. They also allow me to easily identify trends and changes over time. By using line charts, I’m able to better understand the data I’m working with and make more informed decisions.

The Power of Line Charts for Data Analysis

When it comes to data analysis, line charts are a powerful tool. They can be used to track trends over time, compare different sets of data, and identify patterns. Line charts are easy to read and interpret, making them a great choice for visualizing data.

Line charts are a type of graph that plots data points along a line. Each data point is connected by a line, making it easy to see the overall trend. The x-axis of the graph is usually the independent variable, while the y-axis is the dependent variable. This means that the x-axis is usually a timeline, while the y-axis is the value of the data being tracked.

Line charts are a great way to visualize data because they make it easy to identify patterns and trends. For example, if you’re tracking sales over time, a line chart can quickly show you whether sales are increasing or decreasing. It can also show you how sales are changing from month to month.

Line charts can also be used to compare different sets of data. For example, if you’re tracking sales for two different products, you can use a line chart to compare the performance of each product over time. This can help you identify which product is performing better and why.

Line charts are also useful for forecasting. By looking at the trend of the data, you can make predictions about what will happen in the future. This can be especially helpful for businesses that need to plan ahead.

Overall, line charts are a great tool for data analysis. They make it easy to track trends, compare different sets of data, and make predictions. If you’re looking for a way to visualize your data, line charts are a great choice.

 

Excerpt

I’m a data enthusiast, and I’ve found that line charts are a great way to visualize my data. They help me see trends and patterns more clearly, and make it easier to understand the information I’m looking at. Line charts are my go-to for data visualization, and I’m grateful for how much they’ve helped me better understand my data.

Line Chart (microstrategy.com)

The Bar Chart

There is an amusing anecdote about a bar chart depicting the pros and cons of coffee versus tea. One day, two colleagues met in the coffee kitchen and noticed the chart on the wall. One colleague was an inveterate coffee drinker, while the other preferred tea.

When the coffee drinker looked at the chart and saw that coffee had more benefits than tea, he began to smile triumphantly and said, „See, I told you coffee was better than tea!“ The tea drinker, however, not a fan of statistics and charts, just shook his head and replied, „That may be, but I’ll stick with tea anyway.“

The anecdote illustrates that while bar charts are a useful and informative way to present data, ultimately the decision of what to consume or do depends on personal tastes and preferences. Nevertheless, the bar chart remains one of the most important chart types to the human eye because it provides a clear and quick visualization of data and is easy to understand, even for people without extensive knowledge of statistics.

Learn more about Bar Charts (microstrategy.com)

Oracle APEX App Multi-Tenant Capable

How do I turn a single-user app into a multi-tenant app?

yaitcon – YET ANOTHER IT CONSULTANT

Task:

Tom, I want to use your app, but only with my data!

MICHAEL

There are a few ways to achieve this (here only 2):

  1. Clone the app in a workspace and mount it on its own Oracle schema, create all the necessary objects (which you can now define right inside Oracle Apex with the install scripts).

    not all sideeffects:
  • Update of the app means more complex app lifecycle management.
  • Maybe other processes have to be adapted which also write to the new database or schema.
  • There might be common data that should be used by all

2. Don’t clone the app and change the data model and customize the APP

not all side effects:

  • This means you have to filter in the Apex App depending on APP_USER or client of a user.
  • Depending on how the data is connected to Regions, IG, IR etc., you might have to make an adjustment in each page.

to point 2:

Here we can proceed very effectively because we can use the option of Oracle VPD-Virtual Private Database. We do not need to customize a single view or Apex SQL + PL/SQL query and that is really a huge advantage.

What was to be done?

In my case I had to extend the APP_USER in the relevant tables (of course not in attached views), I call the column MANDANT:

ALTER TABLE CARLOG_DATA add MANDANT VARCHAR2(100); 

Next I had to tell the asynchronous load process to assign the data to the user. That means the DML (Insert , Update, Delete) in this table had to be extended by the column MANDANT. It would also have been possible to use a DML trigger that could assign the data correctly. But this was not an „easier“ option for me so I decided to customize the loading process.

As soon as the data of the other client was in the database, I saw my data aggregated with his in the Apex APP. For example, in this specific case, my car drove 133km instead of 100km that day.

Now the Oracle VPD comes into play:

First, create a function that allows to manage more complex criteria in one place: Here I restrict that the check only takes place if the session is in the context of the Apex application 100.

The APP_USER is my client. Oracle Apex automatically sets the APP_USER to the session context after login. Nice…

create or replace function get_mandant_criteria 
(
 object_schema IN VARCHAR2, 
 object_name VARCHAR2
) 
return varchar2 DETERMINISTIC as
l_criteria varchar2(4000);
begin
  -- Only apply VPD to specific APEX applications
  if sys_context('APEX$SESSION', 'APP_ID') in (100)
  then
   l_criteria:=q'#(MANDANT=sys_context('APEX$SESSION', 'APP_USER'))#';
  end if;

  return l_criteria;
end get_mandant_criteria;
/

The function must of course also be used in the VPD therefore.
Using Oracle Virtual Private Database to Control Data Access

begin
  dbms_rls.add_policy
    (object_schema=>'CARLOG'  
    ,object_name=>'CARLOG_DATA' -- für jede Tabelle anlegen wo gebr.
    ,policy_name=>'CARE_APP_100'
    ,function_schema=>'ADMIN' 
    ,policy_function=>'get_mandant_criteria'
    ,statement_Types=>'SELECT'
    );
end;
/

And already APEX App 100 is filtered to APP_USER client from login time.

Yes! YAITCON

On the subject of multi-tenant and more complex security concepts such as row-level and column-level security, we recommend our SCURTY!

SPHINX.at

When the time is not right

DATETIME(2) – Transfer from SQL Server to Exasol

Customer reports that the time does not arrive correctly from SQL Server to Exasol – basically Daylight Saving Times

SPHINX.AT

He even sends a test case that can be reproduced 1 to 1.

Here is a small excerpt from the preparation:

Sql Server: (Create view for testing)

create view datetimetest as
select dt,convert(varchar,dt, 21) as vardt
from (
select try_convert(DATETIME,'1980-04-05 23:59:59.000',102) as dt 
union all
select try_convert(DATETIME,'1980-04-06 00:00:00.000',102) as dt 
union all
select try_convert(DATETIME,'1980-04-06 00:10:00.000',102) as dt 
union all
select try_convert(DATETIME,'1980-04-06 00:59:00.000',102) as dt 
union all
select try_convert(DATETIME,'1980-04-06 01:00:00.000',102) as dt 
union all
select try_convert(DATETIME,'1980-04-07 00:00:00.000',102) as dt 
union all
select try_convert(DATETIME,'1981-03-29 02:00:00.000',102) as dt 
union all
select try_convert(DATETIME,'1981-03-29 02:01:00.000',102) as dt
) a;

Exasol: (Query the data via Exasol)

SELECT * 
FROM ( IMPORT FROM jdbc at con_mssql STATEMENT 
       'select * from dbo.datetimetest')
SQL Server Result

Compare – 2nd row and last Row

Exasol Result

Now the question arises:

Do we have outdated JDBC drivers?

JDBC Driver not really up2date..

Download – JDBC Driver for SQL Server | Microsoft Learn

Problem solved?

NOT AT ALL

Why can a date actually become a timestamp?
Data type of SQL Server becomes Timestamp in jdbc driver.
=> Because it converts Java/JDBC!

SQL datatyp to Java Type

Which setting must now be set how, so that we take over the DATETIME(2) correctly? It is actually only converted if it does not correspond to the default UTC.

And in ExaOperation we see (not easy to find) the answer:
in Extra Database Parameters the user.timezone is set to Europe/Vienna?!

„The exact String is: -etlJdbcJavaEnv=-Duser.timezone=Europe/Vienna“

You barely change this setting and reboot the machine:

USE UTC !

Lo and behold Exasol query now returns the same result as SQL Server:

YES!

Problem solved?

unfortunately NOT – THE PROBLEM IS NOW AT POSTGRES !
IN CASE OF TIMESTAMP WITH TIME ZONE!

After further analysis we found out that due to a bug with Postgres this setting was set to Europe/Vienna. (from 2020)

Now we have to find out if this bug still exists…

In fact there was or is a bug with Postgres datatype timestamp with TIME ZONE… and lo and behold… the bug is still there almost 3 years later.

Postgres: (create testview in Postgres SQL)


create or replace view datetimetest as
select  dt dtwotz, dt at time zone 'Europe/Vienna' as dt,cast(dt as text) as vardt
from (
select 
cast('1980-04-05 23:59:59.000' as timestamp without time zone) as dt 
union all
select 
cast('1980-04-06 00:00:00.000' as timestamp without time zone) as dt 
union all
select 
cast('1980-04-06 00:10:00.000' as timestamp without time zone) as dt 
union all
select 
cast('1980-04-06 00:59:00.000' as timestamp without time zone) as dt 
union all
select 
cast('1980-04-06 01:00:00.000' as timestamp without time zone)  as dt 
union all
select 
cast('1980-04-07 00:00:00.000' as timestamp without time zone) as dt 
union all
select 
cast('1981-03-29 02:00:00.000' as timestamp without time zone)  as dt 
union all
select 
cast('1981-03-29 02:01:00.000' as timestamp without time zone)  as dt 
) a;
Posgres Query Result

Now still using user.timezone = UTC from Exasol to query on Postgres:

OMG – in EXASOL -1
and -2 diffs?

If we switch back to Europe/Vienna we have the problem with SQL Server.

Therefore we have to decide:

  • Postgres vs. SQL Server => which system is more important 🙂
  • Urge Exasol to fix the bug, it is open since 2020.
  • implement workaround => load it into varchar and then convert it into timestamp with TZ (for SQL Server or for Postgres is basically the same)

We are happy to help with such analyses.

Yes! SERGO + YAITCON

Related to the topic:

Daylight saving time 2023: Tips to cope with lost hour of sleep – cleveland.com

Our Top-Technical-Pains

  1. Certificate Handling
  2. LDAP – Active Directory structures
  3. Restricted environments (proxies, VPNs, …) that just get in the way instead of providing additional security
  4. Missing autodocumenter – old Confluence documentation
  5. Case sensitive column names
  6. Language support for tools
  7. Handling of time zones Daylight Saving Time

Exasol one year experience

Exasol and Oracle are both powerful relational database management systems that can be used for different use cases.

SPHINX.AT

Below are what I consider to be the top 5 features of Exasol compared to an Oracle database:

  1. Exasol’s shared-nothing architecture means that each node in the cluster has its own resources such as CPU, RAM and disk space and operates independently of other nodes. This ensures high scalability and resilience, since if a node fails, only its data is affected and not the entire system.
  2. In addition, Exasol provides automatic data distribution and replication to minimize data access time and increase availability. In-memory technology ensures that typically 10-15% of the data is quickly available in memory, while the rest is offloaded to disk. The high level of parallelization and scalability makes it possible to process large amounts of data in a short time, making Exasol a popular solution for data analysis and business intelligence.
  3. Scalability: Exasol can scale horizontally and vertically and thus offers high scalability. Oracle, on the other hand, is not as easy to scale and usually requires more effort.
  4. Easy administration: The administration of Exasol databases is easier than the administration of Oracle databases. Since there is simply not that much to manage 😉 We don’t need to configure and monitor RAC or Data Guard.
  5. Real-time processing: Exasol can process data in real time, enabling real-time analysis and decision making. Oracle also offers this feature, but not at the same speed and efficiency as Exasol.

e.g.: If during the execution of a query it is determined that a certain execution plan is not optimal, the optimizer can generate a new plan and use it to process the query faster and more efficiently. (Adaptive Plan at Oracle)

Another example of dynamic optimization is adaptive indexing. Here the optimizer can decide which indexes are needed to make a query more efficient and create or remove these indexes at runtime. And that is really ingenious!

No light without shadow, but this is also communicated fairly and transparently by Exasol, this database is not suitable for an OLTP system. (SQL parsing and single inserts/updates are much slower than with an Oracle for example). There is also no point-in-time recovery, because there are no archive logs and therefore the last backup and its time are decisive which data can be recovered. And this is hard to argue for a normal OLTP application if this data was lost.

For me as a DBA an Exasol is a dream and as a developer by the integration of Python and R and other Languages anyway incredible.

As an Oracle developer I miss features like the Oracle Scheduler , Oracle VPD and of course the low code tool like Oracle Apex. However, it is possible to combine these two worlds to get the best out of it for our customers.

I also find Oracle PL/SQL more convenient than Exasol LUA. But I would prefer everything to run in Python only 😉

Yes! YAITCON

Newer posts »

© 2023 Sphinx Blog

Theme by Anders NorenUp ↑