What's the best way to optimize this MySQL query?

2👍

✅

First of all, the SQL is badly formatted. The most obvious error is the line splitting before each AS clause. The second obvious problem is the use of implicit joins instead of explicit INNER JOIN ... ON ... clauses.

Now to answer the actual question.

Without knowing the data or the environment, the first thing I’d look at would be some of the MySQL server settings, such as sort_buffer and key_buffer (sort_buffer_size and key_buffer_size in current versions). If you haven’t changed any of these, go read up on them. The defaults are extremely conservative and can often be raised to more than ten times their default values, particularly on big iron like yours.

Having reviewed that, I’d run pieces of the query to see how fast each one is and what EXPLAIN says. The effect of indexing can be profound, but MySQL has a “fingers-and-toes” problem: it generally uses only one index per table in a query. A JOIN with filtering can need two, so MySQL has to fall back to scanning rows for the other check. That said, dicing up the query and trying different combinations will show you where it starts stumbling.

Now you will have an idea where a “tipping point” might be: this is where a small increase in some raw data size, such as how much the query needs to extract, results in a big loss of performance because some internal structure gets too big. At this point, you will probably want to raise the temporary table size limits (tmp_table_size and max_heap_table_size). Beware that this kind of optimization is a bit of a black art. 🙂

However, there is another approach: denormalization. In a simple implementation, regularly scheduled scripts run this expensive query from time to time and poke the data into a separate table whose structure is much closer to what you want to display. There are multiple variations of this approach. It may be possible to keep this table up to date on the fly, either in the application or using table triggers. At the other extreme, you could allow your application to run the expensive query occasionally, but cache the result for a little while. This is most effective if a lot of people will call it often: even a 2-second cache on a request that is run 15 times a second will show a visible improvement.
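
To make the short-lived cache idea concrete, here is a minimal Python sketch. The names `TTLCache`, `expensive_query` and `simulate` are hypothetical, and the stand-in function only counts how often it is called; a real application would run the statistics query there instead.

```python
import time
from typing import Any, Callable, Optional


class TTLCache:
    """Cache a single query result for a short, fixed time window."""

    def __init__(self, loader: Callable[[], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader      # hypothetical: wraps the expensive query
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so the behaviour can be replayed
        self._value: Any = None
        self._expires_at: Optional[float] = None

    def get(self) -> Any:
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            # Cache is empty or stale: run the expensive query once.
            self._value = self._loader()
            self._expires_at = now + self._ttl
        return self._value


calls = {"count": 0}


def expensive_query() -> list:
    # Stand-in for the real statistics query (assumption).
    calls["count"] += 1
    return [("ann", 42)]


def simulate(requests_per_second: int = 15, seconds: int = 4,
             ttl: float = 2.0) -> int:
    """Replay evenly spaced requests on a fake clock; return query runs."""
    calls["count"] = 0
    fake_now = [0.0]
    cache = TTLCache(expensive_query, ttl, clock=lambda: fake_now[0])
    for i in range(requests_per_second * seconds):
        fake_now[0] = i / requests_per_second
        cache.get()
    return calls["count"]
```

With the defaults above (15 requests a second for 4 seconds, 2-second cache), the 60 simulated requests run the query only twice. The price is that readers may see results up to 2 seconds old.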

You could find ways of producing the same data by running half a dozen queries that each return some of the data, and then post-processing the results. You could also run a version of your original query that returns more data (which is likely to be much faster because it does less filtering) and post-process that. I have found several times that five simpler, smaller queries can be much faster (an order of magnitude, sometimes two) than one big query that is trying to do it all.
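
A rough Python sketch of the "several small queries plus post-processing" idea follows. It uses an in-memory SQLite database as a stand-in for MySQL, and the cut-down schema (a `team` column instead of the colour logic, no scores table) is an assumption made for illustration, not the schema from the question.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    # SQLite in memory stands in for MySQL; the simplified schema is assumed.
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE dotagames (gameid INTEGER PRIMARY KEY, winner INTEGER);
        CREATE TABLE dotaplayers (gameid INTEGER, name TEXT, team INTEGER,
                                  kills INTEGER, deaths INTEGER);
        INSERT INTO dotagames VALUES (1, 1), (2, 2), (3, 1);
        INSERT INTO dotaplayers VALUES
            (1, 'ann', 1, 4, 2), (1, 'bob', 2, 1, 4),
            (2, 'ann', 1, 2, 6), (2, 'bob', 2, 5, 1),
            (3, 'ann', 2, 3, 3), (3, 'bob', 1, 6, 2);
    """)
    return db


def player_stats(db: sqlite3.Connection) -> dict:
    # Query 1: plain per-player aggregates from a single table, no joins.
    totals = db.execute(
        "SELECT name, COUNT(*), SUM(kills), SUM(deaths) "
        "FROM dotaplayers GROUP BY name").fetchall()

    # Query 2: the winner of every game, as a gameid -> winner lookup.
    winners = dict(db.execute("SELECT gameid, winner FROM dotagames"))

    # Query 3: each player's team per game; wins are counted in Python.
    wins: dict = {}
    for gameid, name, team in db.execute(
            "SELECT gameid, name, team FROM dotaplayers"):
        if winners.get(gameid) == team:
            wins[name] = wins.get(name, 0) + 1

    # Post-processing: derive the columns the big query computed in SQL.
    stats = {}
    for name, games, kills, deaths in totals:
        won = wins.get(name, 0)
        stats[name] = {
            "totgames": games,
            "wins": won,
            "losses": games - won,
            "killdeathratio": kills / deaths if deaths else None,
        }
    return stats
```

Calling `player_stats(build_demo_db())` returns one dictionary per player. Whether this beats a single joined query depends on data volume and indexes, so it is worth timing both.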

👤staticsan

0👍

No index will help you since you are scanning entire tables.
As your database grows the query will always get slower.

Consider accumulating the stats: after every game, insert the row for that game and also increment counters in the player’s row. Then you don’t need count() and sum(), because the information is already available.
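
One possible shape for this, sketched in Python with in-memory SQLite standing in for MySQL. The tables `game_results` and `player_totals` and their columns are hypothetical names chosen for the example.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    # In-memory SQLite stands in for MySQL; table and column names are assumed.
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE game_results (
            gameid INTEGER, name TEXT, kills INTEGER, deaths INTEGER,
            won INTEGER);
        CREATE TABLE player_totals (
            name TEXT PRIMARY KEY,
            totgames INTEGER NOT NULL, wins INTEGER NOT NULL,
            kills INTEGER NOT NULL, deaths INTEGER NOT NULL);
    """)
    return db


def record_game(db, gameid, name, kills, deaths, won):
    """Insert the per-game row and bump the player's running counters."""
    with db:  # both writes commit together or roll back together
        db.execute("INSERT INTO game_results VALUES (?, ?, ?, ?, ?)",
                   (gameid, name, kills, deaths, int(won)))
        cur = db.execute(
            "UPDATE player_totals SET totgames = totgames + 1, "
            "wins = wins + ?, kills = kills + ?, deaths = deaths + ? "
            "WHERE name = ?",
            (int(won), kills, deaths, name))
        if cur.rowcount == 0:
            # First game for this player: create the counter row.
            db.execute("INSERT INTO player_totals VALUES (?, 1, ?, ?, ?)",
                       (name, int(won), kills, deaths))


def read_totals(db, name):
    """Return (totgames, wins, kills, deaths) without COUNT() or SUM()."""
    return db.execute(
        "SELECT totgames, wins, kills, deaths FROM player_totals "
        "WHERE name = ?", (name,)).fetchone()
```

Reading a player's stats is then a single primary-key lookup. In MySQL itself, the update-or-insert step is commonly written as one INSERT ... ON DUPLICATE KEY UPDATE statement.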

👤bobflux

0👍

  • select * is bad most of the time – select only the columns you need
  • break the select into multiple simple selects, and use temporary tables when needed
  • the sum(case ...) part could be done with a subselect
  • MySQL often performs badly with OR expressions; use two selects and UNION them together
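
As a sketch of the last point, the snippet below rewrites the win condition from the question as two selects combined with UNION ALL and checks that both forms return the same rows. In-memory SQLite stands in for MySQL, the two tables are cut down to the columns involved, and the sample rows are invented for the demonstration.

```python
import sqlite3

# Original style: one select with an OR between the two win conditions.
OR_QUERY = """
    SELECT dp.name, dp.gameid
    FROM dotaplayers AS dp
    INNER JOIN dotagames AS dg ON dg.gameid = dp.gameid
    WHERE (dg.winner = 1 AND dp.newcolour < 6)
       OR (dg.winner = 2 AND dp.newcolour > 6)
"""

# Rewritten style: one select per condition. UNION ALL is safe here
# because the two branches can never match the same row.
UNION_QUERY = """
    SELECT dp.name, dp.gameid
    FROM dotaplayers AS dp
    INNER JOIN dotagames AS dg ON dg.gameid = dp.gameid
    WHERE dg.winner = 1 AND dp.newcolour < 6
    UNION ALL
    SELECT dp.name, dp.gameid
    FROM dotaplayers AS dp
    INNER JOIN dotagames AS dg ON dg.gameid = dp.gameid
    WHERE dg.winner = 2 AND dp.newcolour > 6
"""


def build_demo_db() -> sqlite3.Connection:
    # Invented sample data; game 3 has no winner (winner = 0).
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE dotagames (gameid INTEGER PRIMARY KEY, winner INTEGER);
        CREATE TABLE dotaplayers (gameid INTEGER, name TEXT,
                                  newcolour INTEGER);
        INSERT INTO dotagames VALUES (1, 1), (2, 2), (3, 0);
        INSERT INTO dotaplayers VALUES
            (1, 'ann', 1), (1, 'bob', 7),
            (2, 'ann', 2), (2, 'bob', 8),
            (3, 'ann', 1), (3, 'bob', 7);
    """)
    return db


def winning_rows(db: sqlite3.Connection, query: str) -> list:
    """Run one of the queries and return its rows in a stable order."""
    return sorted(db.execute(query).fetchall())
```

Both queries select the same winning (name, gameid) pairs. Whether the UNION form is actually faster depends on the indexes and the MySQL version, so compare the two with EXPLAIN before committing to either.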
👤codymanix

0👍

Small Improvement

select *,
       (kills / deaths) as killdeathratio,
       (totgames - wins) as losses
from (select gp.name as name,
             gp.gameid as gameid,
             gp.colour as colour,
             Avg(dp.courierkills) as courierkills,
             Avg(dp.raxkills) as raxkills,
             Avg(dp.towerkills) as towerkills,
             Avg(dp.assists) as assists,
             Avg(dp.creepdenies) as creepdenies,
             Avg(dp.creepkills) as creepkills,
             Avg(dp.neutralkills) as neutralkills,
             Avg(dp.deaths) as deaths,
             Avg(dp.kills) as kills,
             sc.score as totalscore,
             Count(1) as totgames,
             Sum(case
                   when ((dg.winner = 1 and dp.newcolour < 6) or
                         (dg.winner = 2 and dp.newcolour > 6))
                   then 1
                   else 0
                 end) as wins
      from gameplayers as gp,
           (select * from dotagames dg1 where dg1.winner <> 0) as dg,
           games as ga,
           dotaplayers as dp,
           scores as sc
      where dp.gameid = gp.gameid
        and dg.gameid = dp.gameid
        and dp.gameid = ga.id
        and gp.gameid = dg.gameid
        and gp.colour = dp.colour
        and sc.name = gp.name
      group by gp.name
      having totgames >= 30
     ) as h
order by totalscore desc

Changes:
1. count(*) changed to count(1).
2. In the FROM clause, the number of rows is reduced by filtering dotagames (winner <> 0) in a derived table before the join.

👤copperstone
