Interview Questions And Answers Set - 10
MySQL - Using Your Own Benchmarks
You should definitely benchmark your application and database to find
out where the bottlenecks are. By fixing a bottleneck (or by replacing it with
a 'dummy module') you can then easily identify the next one (and so on).
Even if the overall performance of your application is sufficient, you should
at least make a plan for each bottleneck and decide how to solve it if someday
you really need the extra performance.
For an example of portable benchmark programs, look at the MySQL benchmark suite.
You can take any program from this suite and modify it for your needs. By doing
this, you can try different solutions to your problem and test which is really
the fastest solution for you.
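The fix-one-bottleneck-then-find-the-next loop described above depends on being able to time candidate solutions against each other. A minimal, generic timing harness (an illustrative Python sketch only; this is not part of the MySQL benchmark suite, and all names here are made up) might look like:

```python
import time

def benchmark(label, fn, iterations=10000):
    # Run `fn` repeatedly and report the total wall-clock time,
    # so two candidate solutions can be compared head to head.
    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.4f}s for {iterations} iterations")
    return elapsed

# Illustrative candidates: two ways of building the same string.
t1 = benchmark("generator join", lambda: "".join(str(i) for i in range(100)))
t2 = benchmark("list join", lambda: "".join([str(i) for i in range(100)]))
```

Whichever candidate wins under a load that resembles production is the one worth keeping; the same loop then moves on to the next bottleneck.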
It is very common for some problems to occur only when the system is very heavily
loaded. We have had many customers contact us when they have a (tested)
system in production and have encountered load problems. In every one of these
cases so far, the problem has been either basic design (table scans are NOT good
at high load) or OS/library issues. Most of these would be a LOT easier to fix if
the systems were not already in production.
To avoid problems like this, you should put some effort into benchmarking your
whole application under the worst possible load! You can use Sasha's recent hack
for this - super-smack. As the name suggests, it can bring your system to
its knees if you ask it to, so make sure to use it only on your development systems.
MySQL - Design Choices
MySQL keeps row data and index data in separate files. Many (almost all) other
databases mix row and index data in the same file. We believe that the MySQL
choice is better for a very wide range of modern systems.
Another way to store the row data is to keep the information for each column in
a separate area (examples are SDBM and Focus). This will cause a performance hit
for every query that accesses more than one column. Because this degenerates so
quickly when more than one column is accessed, we believe that this model is not
good for general purpose databases.
The more common case is that the index and data are stored together (as in
Oracle/Sybase et al). In this case you will find the row information in the leaf
page of the index. The good thing about this layout is that, depending on how
well the index is cached, it often saves a disk read. The bad things
with this layout are:
Table scanning is much slower because you have to read through the indexes to
get at the data.
You can't use only the index table to retrieve data for a query.
You lose a lot of space, as you must duplicate indexes from the nodes (as you
can't store the row in the nodes).
Deletes will degenerate the table over time (as indexes in nodes are usually not
updated on delete).
It's harder to cache ONLY the index data.
Because MySQL uses extremely fast table locking (multiple readers /
single writer), the biggest remaining problem is a mix of a steady stream of
inserts and slow selects on the same table.
We believe that for a huge number of systems the extremely fast performance in
other cases makes this choice a win. This case is usually also possible to solve
by having multiple copies of the table, but it takes more effort and hardware.
We are also working on some extensions to solve this problem for some common cases.
MySQL - Portability
Because all SQL servers implement different parts of SQL, it takes work to write
portable SQL applications. For very simple selects/inserts it is very easy, but
the more you need the harder it gets. If you want an application that is fast
with many databases it becomes even harder!
To make a complex application portable you need to choose a number of SQL
servers that it should work with.
You can use the MySQL crash-me program/web-page
http://www.mysql.com/information/crash-me.php to find functions, types, and
limits you can use with a selection of database servers. Crash-me is far from
testing everything possible, but it is still comprehensive, with about 450 things tested.
For example, you shouldn't have column names longer than 18 characters if you
want to be able to use Informix or DB2.
Both the MySQL benchmarks and the crash-me program are very database-independent.
By taking a look at how we have handled this, you can get a feeling for what you
have to do to make your own application database-independent. The benchmarks
themselves can be found in the `sql-bench' directory in the MySQL source
distribution. They are written in Perl with the DBI database interface (which
solves the access part of the problem).
See http://www.mysql.com/information/benchmarks.html for the results from this benchmark.
As you can see in these results, all databases have some weak points. That is,
they have different design compromises that lead to different behavior.
If you strive for database independence, you need to get a good feeling for each
SQL server's bottlenecks. MySQL is VERY fast in retrieving and updating things,
but will have a problem in mixing slow readers/writers on the same table.
Oracle, on the other hand, has a big problem when you try to access rows that
you have recently updated (until they are flushed to disk). Transaction
databases in general are not very good at generating summary tables from log
tables, as in this case row locking is almost useless.
To get your application really database-independent, you need to define an easily
extendable interface through which you manipulate your data. As C++ is available
on most systems, it makes sense to use a C++ class interface to the databases.
If you use a feature specific to some database (like the REPLACE command in
MySQL), you should code a method for the other SQL servers that implements the
same feature (but more slowly). With MySQL you can use the /*! */ syntax to add
MySQL-specific keywords to a query. The code inside /*! */ will be treated as a
comment (ignored) by most other SQL servers.
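The suggestion above - coding a slower equivalent of a MySQL-specific feature for the other servers - can be sketched as follows. This is an illustrative Python sketch, not a real portability layer: `execute` stands in for any DB-API-style `execute(sql, params)` callable, and the `dialect` flag is made up. It emulates MySQL's REPLACE (remove any old row with the same key, then insert the new one) on servers that lack the command:

```python
def replace_row(execute, table, pk_col, row, dialect="mysql"):
    # `row` is a dict mapping column names to values.
    cols = ", ".join(row)
    marks = ", ".join(["%s"] * len(row))
    values = tuple(row.values())
    if dialect == "mysql":
        # MySQL understands REPLACE natively.
        execute(f"REPLACE INTO {table} ({cols}) VALUES ({marks})", values)
    else:
        # Portable (slower) fallback: delete any existing row with
        # the same key, then insert the new row.
        execute(f"DELETE FROM {table} WHERE {pk_col} = %s", (row[pk_col],))
        execute(f"INSERT INTO {table} ({cols}) VALUES ({marks})", values)
```

A real interface would also have to handle identifier quoting and run the fallback inside a transaction; the point is only that the application calls one method and never sees the difference between servers.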
If REAL high performance is more important than exactness, as in some Web
applications, a possibility is to create an application layer that caches all
results to give you even higher performance. By letting old results 'expire'
after a while, you can keep the cache reasonably fresh. This is quite nice under
extremely high load, in which case you can dynamically increase the
cache and set the expiry timeout higher until things get back to normal.
In this case the table creation information should contain information about the
initial size of the cache and how often the table should normally be refreshed.
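The expiring result cache described above can be sketched in a few lines. This is an illustrative Python sketch (the class name and API are made up); `run_query` stands in for whatever actually talks to the database:

```python
import time

class QueryCache:
    # Minimal expiring result cache: results older than `ttl` seconds
    # are recomputed on the next request.
    def __init__(self, ttl=60.0):
        self.ttl = ttl
        self._store = {}  # query -> (timestamp, result)

    def get(self, query, run_query):
        now = time.monotonic()
        hit = self._store.get(query)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]          # still fresh: serve from cache
        result = run_query(query)  # missing or expired: recompute
        self._store[query] = (now, result)
        return result
```

Raising `ttl` while the system is overloaded is exactly the "set the expire timeout higher until things get back to normal" knob from the text.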
What Have We Used MySQL For?
During MySQL's initial development, the features of MySQL were made to fit our
largest customer. They handle data warehousing for a couple of the biggest
retailers in Sweden.
From all stores, we get weekly summaries of all bonus card transactions, and we
are expected to provide useful information for the store owners to help them
find out how their advertising campaigns are affecting their customers.
The data is quite large (about 7 million summary transactions per month), and we
have data for 4-10 years that we need to present to the users. We get weekly
requests from customers who want 'instant' access to new reports from this data.
We solved this by storing all information per month in compressed 'transaction'
tables. We have a set of simple macros (script) that generates summary tables
grouped by different criteria (product group, customer id, store ...) from the
transaction tables. The reports are Web pages that are dynamically generated by
a small Perl script that parses a Web page, executes the SQL statements in it,
and inserts the results. We would have used PHP or mod_perl instead but they
were not available at that time.
For graphical data we wrote a simple tool in C that can produce GIFs based on
the result of a SQL query (with some processing of the result). This is also
dynamically executed from the Perl script that parses the HTML files.
In most cases a new report can simply be done by copying an existing script and
modifying the SQL query in it. In some cases, we will need to add more fields to
an existing summary table or generate a new one, but this is also quite simple,
as we keep all transaction tables on disk. (Currently we have at least 50G of
transaction tables and 200G of other customer data.)
We also let our customers access the summary tables directly with ODBC so that
the advanced users can themselves experiment with the data.
We haven't had any problems handling this with a quite modest Sun Ultra
SPARCstation (2 x 200 MHz). We recently upgraded one of our servers to a 2-CPU 400
MHz UltraSPARC, and we are now planning to start handling transactions at the
product level, which would mean a ten-fold increase in data. We think we can
keep up with this by just adding more disks to our systems.
We are also experimenting with Intel-Linux to be able to get more CPU power
cheaper. Now that we have the binary portable database format (new in Version
3.23), we will start to use this for some parts of the application.
Our initial feeling is that Linux will perform much better at low-to-medium
load and Solaris will perform better when you start to get a high load with
extreme disk I/O, but we don't yet have anything conclusive about this. After
some discussion with a Linux kernel developer, this might be a side effect of
Linux giving so many resources to the batch job that interactive performance
gets very low. This makes the machine feel very slow and unresponsive while big
batches are running. Hopefully this will be handled better in future Linux kernels.
What is the difference between mysql_fetch_array and mysql_fetch_object?
mysql_fetch_array - fetches a result row as an associative array, a numeric
array, or both, so a column can be accessed as $row['name'] or $row[0].
mysql_fetch_object - fetches a result row as an object, so a column is
accessed as $row->name.
What are the different table types present in MySQL?
MyISAM: The default table type, based on the Indexed Sequential Access Method.
A plain CREATE TABLE statement creates a MyISAM table.
ISAM: Similar to MyISAM; the older format that MyISAM replaces.
HEAP: Fast in-memory data access, but the data is lost if there is a crash.
Cannot have BLOB, TEXT, or AUTO_INCREMENT columns.
BDB: Supports transactions using COMMIT and ROLLBACK. Slower than the others.
InnoDB: Like BDB, supports transactions.