Showing posts with label performance.

Dec 18, 2012

Sep 30, 2012

Jul 11, 2012

Pgtune - postgresql.conf tuning wizard

pgtune takes the wimpy default postgresql.conf and expands it so the database server is as powerful as the hardware it's being deployed on.
sudo apt-get install pgtune
pgtune -i /etc/postgresql/8.4/main/postgresql.conf -T Web -c 100 -M 96000000
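To get a feel for the kind of values such tuning produces, here is a sketch of two rules of thumb that are commonly quoted for a dedicated PostgreSQL server: shared_buffers at about a quarter of RAM and effective_cache_size at about three quarters. This is an assumption-laden approximation, not pgtune's actual algorithm, and `rule_of_thumb_settings` is a hypothetical helper.

```python
# Two commonly quoted rules of thumb for a *dedicated* PostgreSQL server.
# Assumption: this is NOT pgtune's algorithm, just a starting point to compare against.

def rule_of_thumb_settings(total_ram_bytes: int) -> dict:
    """Suggest postgresql.conf memory settings (in MB) from total RAM in bytes."""
    ram_mb = total_ram_bytes // (1024 * 1024)
    return {
        # PostgreSQL's own buffer cache: roughly a quarter of RAM
        "shared_buffers": f"{ram_mb // 4}MB",
        # Planner hint for how much data the OS page cache can hold: ~3/4 of RAM
        "effective_cache_size": f"{ram_mb * 3 // 4}MB",
    }


if __name__ == "__main__":
    # An 8 GiB machine
    print(rule_of_thumb_settings(8 * 1024**3))
    # {'shared_buffers': '2048MB', 'effective_cache_size': '6144MB'}
```

Compare the result with what pgtune writes for the same machine before trusting either.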

Jun 27, 2012

apachetop

sudo apachetop -f logs/apache_access.log

Mar 6, 2012

Scalability panel (djangocon.eu)

http://lanyrd.com/2011/djangocon-europe/sfpth/


What are the common mistakes you see? Things that take a long time (“sending an email”) and lock up a process. Using the filesystem for caching. Testing locally, pushing it live and seeing it fall over.
Calling an external API can also take a long time. Use timeouts to guard against it.
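A minimal sketch of the timeout advice using only the standard library. `fetch_with_timeout` is a hypothetical helper, and returning None on failure is just one possible fallback policy (you might instead serve cached data or show a placeholder).

```python
import socket
import urllib.error
import urllib.request


def fetch_with_timeout(url: str, timeout: float = 2.0):
    """Call an external HTTP API but give up after `timeout` seconds.

    Returns the response body as bytes, or None if the service is slow,
    unreachable, or returns an HTTP error, so the calling request can
    degrade gracefully instead of tying up a worker.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read()
    except (urllib.error.URLError, socket.timeout):
        # URLError covers connection failures and HTTP errors;
        # socket.timeout covers a server that accepts but never answers.
        return None
```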
Scaling wise, what should you think about before scaling becomes an actual issue? And what should you definitely leave until the moment comes you need to scale? First things first: use a queue like celery, even on small sites. Get used to such a queue and have it in place, that’ll help a lot with performance later.
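A minimal sketch of the "get slow work out of the request" idea. It uses an in-process queue and a worker thread as a stand-in for Celery; the assumption is that a real deployment would run Celery workers and call something like `send_email.delay(...)`. `send_email` and `signup_view` are hypothetical.

```python
import queue
import threading

# In-process stand-in for a task queue such as Celery (assumption: a real
# deployment would run Celery workers and call e.g. send_email.delay(...)).
tasks = queue.Queue()
sent = []  # hypothetical record of delivered emails, for illustration


def send_email(to: str, subject: str) -> None:
    """Hypothetical slow job (e.g. talking to an SMTP server)."""
    sent.append((to, subject))


def worker() -> None:
    # Pull jobs forever; each job is a (callable, args) pair.
    while True:
        func, args = tasks.get()
        try:
            func(*args)
        finally:
            tasks.task_done()


threading.Thread(target=worker, daemon=True).start()


def signup_view(email: str) -> str:
    # Enqueue the slow work and respond immediately.
    tasks.put((send_email, (email, "Welcome!")))
    return "OK"
```

Unlike a real broker-backed queue, jobs here vanish if the process dies, which is one reason to set up Celery early rather than rely on threads.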
Make sure your database schema is mostly OK. It doesn’t have to be perfect right away, but it should at least be mostly OK.
Expect something to change and assume you’ll have to swap something out later to improve the performance.
One important aspect of scaling is measurement and profiling. What are the best practices and good tools for doing that in production? Bitbucket has a middleware that switches on with a special query string; it starts the Python cProfile profiler and gives them profiling data for that request.
The debug toolbar is a great help in development. For real-time stats, graphite and statsd are an option. Or munin or cacti for real-time generic server information graphs.
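The Bitbucket middleware itself isn't shown in the panel notes; below is a hypothetical WSGI-level sketch of the same idea. It profiles a request with cProfile only when the query string contains `profile`, and returns the pstats report instead of the page. `ProfileMiddleware` is a made-up name, and in production you would also restrict it to staff or internal IPs.

```python
import cProfile
import io
import pstats
from urllib.parse import parse_qs


class ProfileMiddleware:
    """WSGI middleware: profile a request only when the query string has `profile`.

    The page itself is discarded and replaced by the top of the pstats report.
    """

    def __init__(self, app, limit: int = 20):
        self.app = app
        self.limit = limit  # number of rows of the report to show

    def __call__(self, environ, start_response):
        params = parse_qs(environ.get("QUERY_STRING", ""), keep_blank_values=True)
        if "profile" not in params:
            return self.app(environ, start_response)  # normal, unprofiled path

        def discard_start_response(status, headers, exc_info=None):
            return lambda data: None  # ignore the real response's headers/writes

        def run_app():
            result = self.app(environ, discard_start_response)
            try:
                return b"".join(result)  # consume the body inside the profiler
            finally:
                if hasattr(result, "close"):
                    result.close()

        profiler = cProfile.Profile()
        profiler.runcall(run_app)

        report = io.StringIO()
        pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(self.limit)
        start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
        return [report.getvalue().encode("utf-8")]
```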
Logging. Always set up logging. Look at the logfiles and figure out what happened.
OpenNMS, Pingdom, Munin, Nagios and django-kong were mentioned as monitoring tools.
Puppet vs Chef vs Whatever for provisioning servers in a Django stack. Fight! Puppet is good. Chef is good. Puppet is alright. (So: not much of a fight :-) )
Django ORM: how much of an issue is this going to be when I want to scale? It is much less an issue than it used to be.
You’ll only get to know the hot spots for YOUR application when you run into them. If you optimize beforehand, the points will be different from those you’ll really hit. And then there are ways to solve them: caching, asynchronous processing, fewer joins, splitting things, etc. You can denormalize, too.
Simple: check your indexes. Do you have the right ones? Are you missing ones?
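A quick way to check whether a query uses an index is to ask the database for its plan. The sketch below uses SQLite only because it ships with Python; on PostgreSQL you would run EXPLAIN or EXPLAIN ANALYZE on the slow query. The table and index names are made up.

```python
import sqlite3

# SQLite stands in for the production database here; the habit is the same:
# look at the query plan before and after adding an index.


def query_plan(conn, sql, params=()):
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the plan text


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)

lookup = "SELECT * FROM users WHERE email = ?"
before = query_plan(conn, lookup, ("user42@example.com",))
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = query_plan(conn, lookup, ("user42@example.com",))
print("before:", before)  # a full table scan
print("after: ", after)   # a search using idx_users_email
```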
Also, changing your database server's default configuration values can make a lot of difference. Spend two days figuring out all the options. And check postgres for ridiculously low default memory values.
Incremental roll-outs help with detecting problems. When all your 15 new instances suddenly die, you know you need to change something.
Considering that using a caching proxy, like for instance Varnish, is commonly used for improving performance/scalability, are there any options out there for Django which handle cache invalidation in a good manner that you know of? Use etags.
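ETags let a client or a caching proxy revalidate a page cheaply: the server sends a tag with the response and answers a matching If-None-Match header with 304 Not Modified. A minimal sketch, assuming the tag is a hash of the response body; the helper names are hypothetical. Django also provides `condition` and `etag` view decorators in `django.views.decorators.http`.

```python
import hashlib
from typing import List, Optional, Tuple


def etag_for(body: bytes) -> str:
    # Strong ETag from a hash of the content (ETags are quoted strings in HTTP).
    return '"' + hashlib.sha1(body).hexdigest() + '"'


def _opaque(tag: str) -> str:
    # If-None-Match uses weak comparison, so a W/ prefix is ignored.
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def conditional_response(
    body: bytes, if_none_match: Optional[str] = None
) -> Tuple[str, List[Tuple[str, str]], bytes]:
    """Return (status, headers, body); 304 with an empty body if the client is current."""
    tag = etag_for(body)
    headers = [("ETag", tag)]
    if if_none_match is not None:
        client_tags = [_opaque(t) for t in if_none_match.split(",")]
        if "*" in client_tags or tag in client_tags:
            return "304 Not Modified", headers, b""
    return "200 OK", headers, body
```

Hashing the body still means rendering the page; the saving is bandwidth and proxy storage. A tag derived from something cheaper (for example a last-modified timestamp) avoids the rendering too.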
Most caching is dependent fully on your individual app. So something generic is virtually impossible.
Varnish gives you lots of control. You can invalidate pages from your python code. So set up a couple of proper database triggers.
Is Django fast enough? Should more attention be on speed and benchmark tests? Yes and yes. It is fast enough, but we should watch it.
Django is fast enough. If you want to scale, scale over multiple boxes instead of building out one single box.
But: watch out that Django doesn’t get any slower!
Code deployment to web workers: there are lots of different ways, can we get the groups thoughts on the best practices?
  • By hand.
  • Pip. But it is a bit slow. Now they use github (with a local git mirror for their sites).
  • Fabric.
  • Simple bash script that SSHes into the server and updates everything.
  • HAProxy helps in getting a server offline and getting it transparently back up after the update.
If you were starting a new project today, which Python VMs would you consider? Probably CPython, as an ops person would probably not allow us to run PyPy. But I’m watching PyPy and it looks good.
What’s the worst scalability failure story you’ve ever heard?
  • Running postgres with 32MB of memory (a default setting...).
  • A sysadmin that, to prove his valid point, pulled the electricity plug out of the live machine. He won.
  • Returning a string instead of an iterator in a wsgi script that was getting lots of hits. One character at a time...
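Why returning a string from a WSGI app hurts: the server iterates over the return value and writes each item as its own chunk. The sketch below counts those chunks; `count_chunks` is a hypothetical stand-in for a server's write loop. In Python 2, where this story comes from, iterating a str yielded one-character strings; in Python 3 iterating bytes yields integers, which a strict server such as wsgiref rejects outright.

```python
# A WSGI server iterates over the app's return value and writes each item as a
# separate chunk, so the type of the return value decides the chunk count.


def bad_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return b"Hello, world!"  # WRONG: iterated item by item (13 items here)


def good_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello, world!"]  # RIGHT: an iterable holding one chunk


def count_chunks(app) -> int:
    """Mimic a server's write loop and count how many writes it would make."""
    body = app({}, lambda status, headers, exc_info=None: None)
    return sum(1 for _ in body)
```

Here `count_chunks(bad_app)` is 13 and `count_chunks(good_app)` is 1.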
What do you use to find slow sql queries? Django debug toolbar. Another trick is to evaluate querysets early to better see what’s going on (as querysets are lazily evaluated).
Use mysql/postgres’s configuration option to log slow queries.
How do you mimic a large load, and can you simulate it? Use apachebench, but keep in mind that it won’t be a “perfect” worst-case load.
The other answers were mostly “we can’t do that”. Incremental roll-outs help. Key question: can you respond quickly? Can you deploy quickly?
How do you handle database rollbacks when you roll back a release? Most either don’t use South or they don’t do rollbacks. A migration can only ADD columns or tables. They’re never removed. Never. Addition-only. This way the old code can talk just fine to the new database structure.
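The addition-only rule, sketched with SQLite: the old release names the columns it needs, so it keeps working after the new release adds a column. Table and column names are hypothetical, and the sketch assumes new columns are nullable or have defaults so that the old code's INSERTs still succeed.

```python
import sqlite3


def old_release_list_users(conn):
    # Old code names its columns explicitly instead of relying on SELECT *.
    return conn.execute("SELECT id, email FROM users ORDER BY id").fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")

# New release's migration: ADD a column with a default, never drop or rename one.
conn.execute("ALTER TABLE users ADD COLUMN nickname TEXT DEFAULT ''")
conn.execute("INSERT INTO users (email, nickname) VALUES ('b@example.com', 'bee')")

# Rolling back the code (not the schema) is safe: the old query still works.
print(old_release_list_users(conn))  # [(1, 'a@example.com'), (2, 'b@example.com')]
```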
Suggested reading: always ship trunk.
Which WSGI runner do you use? mod_wsgi is not out of date or slow in any way; it works just fine.
Gunicorn is awesome. Especially the built-in asynchronous mode and eventlet can help a lot if you use it.
What’s your best experience regarding scalability?
  • A PHP site: from 0.5 MB of traffic to 100 MB of traffic within a month. It taught him a lot.
  • A plone site 8 years ago. Plone was two request-a-second at that time. They had squid in front. It was the oxfam site that was used for post-tsunami donations. He was Real Happy with squid that day.
  • Some multimedia website. From 0 to 8 million users in one year. Learning on the job!
How do you deal with backfilling data? After adding tables, you sometimes have to fill them with default data. How do you do it without killing your server? Use celery, and use a management command to slowly push small batches onto your task queue.
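A sketch of the batching side of a backfill, assuming an integer-primary-key `users` table with a new `nickname` column (both hypothetical). Keyset pagination (`WHERE id > last_id`) keeps each batch query cheap; in the setup described above, the management command would enqueue each batch as a Celery task instead of running it inline.

```python
import sqlite3


def batched_ids(conn, batch_size):
    """Yield lists of primary keys, batch_size at a time (keyset pagination)."""
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT id FROM users WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return
        ids = [row[0] for row in rows]
        yield ids
        last_id = ids[-1]


def backfill_batch(conn, ids):
    """One small unit of work; with Celery this would be the body of a task."""
    placeholders = ",".join("?" * len(ids))
    conn.execute(
        f"UPDATE users SET nickname = 'user' || id WHERE id IN ({placeholders})",
        ids,
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, nickname TEXT)")
    conn.executemany("INSERT INTO users (nickname) VALUES (?)", [(None,)] * 25)
    for ids in batched_ids(conn, batch_size=10):
        backfill_batch(conn, ids)  # a management command would enqueue this instead
    print(conn.execute("SELECT COUNT(*) FROM users WHERE nickname IS NULL").fetchone())  # (0,)
```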

Nov 2, 2011

Python / Django Performance profiling

https://code.djangoproject.com/wiki/ProfilingDjango
http://ianozsvald.com/2012/03/18/high-performance-python-1-from-pycon-2012-slides-video-src/
Python profiling and debugging tools

Django

django-extensions - this is a repository for collecting global custom management extensions for the Django Framework
python manage.py runprofileserver --use-cprofile --prof-path=/tmp/output
django-perftools
  • QueryCountLoggingMiddleware - Logs requests which exceed a maximum number of queries.
  • RemoteProfilingMiddleware - Profiles a request and saves the results to disk.
  • SlowRequestLoggingMiddleware - Perftools includes a logger that monitors request execution time. Once a request hits the defined threshold, it is logged to the named perftools logger, including the metadata for the request (as defined by Sentry's logging spec).
django-profiler is a utility for profiling Python code, mainly in Django projects, though it can also be used on ordinary Python code. It counts SQL queries and measures code execution time. It logs its output via the standard Python logging library, using the profiling logger. If your profiler name doesn't contain any spaces, e.g. Profiler('Profiler1'), django-profiler will log all the output to the profiling.Profiler logger. The @profilehook decorator uses the profilehooks Python package to gather code execution stats; besides logging via the standard Python logging library, it also writes the stats directly to sys.stdout.
from profiling import profile

@profile
def complex_computations():
    # some complex computations
    pass

django-processinfo - application to collect information about the running server processes.

dogslow - Dogslow is a Django watchdog middleware class that logs tracebacks of slow requests.
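dogslow's approach can be sketched with the standard library: start a timer when a request begins, and if it is still running when the timer fires, grab that thread's current stack and log it. `SlowCallWatchdog` is a hypothetical name, and `sys._current_frames()` is a CPython implementation detail.

```python
import logging
import sys
import threading
import traceback

logger = logging.getLogger("watchdog")


class SlowCallWatchdog:
    """Log the stack of the current thread if the wrapped block runs too long."""

    def __init__(self, threshold: float):
        self.threshold = threshold  # seconds

    def __enter__(self):
        thread_id = threading.get_ident()  # the thread doing the "request"
        self._timer = threading.Timer(self.threshold, self._dump, args=(thread_id,))
        self._timer.daemon = True
        self._timer.start()
        return self

    def __exit__(self, *exc):
        self._timer.cancel()  # finished in time: nothing is logged
        return False

    def _dump(self, thread_id):
        # Runs in the timer thread while the slow block is still executing.
        frame = sys._current_frames().get(thread_id)
        if frame is not None:
            stack = "".join(traceback.format_stack(frame))
            logger.warning("Still running after %.1fs:\n%s", self.threshold, stack)
```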

Python

line_profiler - is a module for doing line-by-line profiling of functions. kernprof is a convenient script for running either line_profiler or the Python standard library's cProfile or profile modules, depending on what is available.
runsnakerun - is a small GUI utility that allows you to view (Python) cProfile or Profile profiler dumps in a sortable GUI view. It allows you to explore the profiler information using a "square map" visualization or sortable tables of data. It also (experimentally) allows you to view the output of the Meliae "memory analysis" tool using the same basic visualisations.
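The dumps that RunSnakeRun reads are ordinary cProfile stats files. A minimal sketch of producing one from code and reading it back with pstats; `work` is a placeholder function and the output path is arbitrary.

```python
import cProfile
import os
import pstats
import tempfile


def work(n=100_000):
    """Placeholder function to profile."""
    return sum(i * i for i in range(n))


profiler = cProfile.Profile()
result = profiler.runcall(work)

# Save the binary stats dump; this is the kind of file RunSnakeRun visualises.
path = os.path.join(tempfile.gettempdir(), "work.prof")
profiler.dump_stats(path)

# The same dump can be inspected from the terminal with pstats.
pstats.Stats(path).sort_stats("cumulative").print_stats(5)
```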
dis - The dis module supports the analysis of CPython bytecode by disassembling it. The CPython bytecode which this module takes as an input is defined in the file Include/opcode.h and used by the compiler and the interpreter.
import dis

def myfunc(alist):
    return len(alist)

>>> dis.dis(myfunc)    # output from CPython 2.x; newer versions use different offsets and opcodes
  2           0 LOAD_GLOBAL              0 (len)
              3 LOAD_FAST                0 (alist)
              6 CALL_FUNCTION            1
              9 RETURN_VALUE

plop - Plop is a stack-sampling profiler for Python. Profile collection can be turned on and off in a live process with minimal performance impact.
statprof.py - This package provides a simple statistical profiler for Python.
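The idea behind plop and statprof is sampling: instead of tracing every call, look at what the program is doing at regular intervals and count. A toy sketch, assuming CPython (it relies on `sys._current_frames()`) and using a helper thread as the sampler; statprof-style tools typically use a timer signal instead, and plop records whole stacks rather than only the innermost function. `sample_function_names` is hypothetical.

```python
import collections
import sys
import threading
import time


def sample_function_names(target, interval=0.005):
    """Run target() in this thread while a helper thread samples it.

    Every `interval` seconds the sampler records the name of the innermost
    Python function being executed. Returns a Counter: name -> samples.
    """
    counts = collections.Counter()
    main_id = threading.get_ident()
    done = threading.Event()

    def sampler():
        while not done.is_set():
            frame = sys._current_frames().get(main_id)
            if frame is not None:
                counts[frame.f_code.co_name] += 1
            time.sleep(interval)

    t = threading.Thread(target=sampler, daemon=True)
    t.start()
    try:
        target()
    finally:
        done.set()
        t.join()
    return counts
```

`counts.most_common(5)` then shows where most samples, and so most time, landed.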
pytrace - a fast Python tracer. It records function calls, arguments and return values, and can be used for debugging and profiling.

WSGI

wsgi-shell - The 'ispyd' package provides an in process shell for introspecting a running process. It was primarily intended for investigating running WSGI application processes, specifically to determine what a process is doing when it hangs, but has many other features as well. This includes being able to start an embedded interactive Python interpreter session, set debugger probe points to record tracebacks for exceptions and then later run 'pdb' in post mortem mode on those exceptions.

Linux

top -H
1 - show the load of each CPU separately
P - sort by CPU usage
M - sort by memory usage
k - kill a process
c - show the full command path

vmstat - reports on process, memory, swapping, block I/O, interrupt and CPU activity
w - who is logged in and what they are doing
free - memory usage
pstree - processes shown as a tree

ps
ps -u someusername -o pid,%cpu,%mem,start_time,size=-size-,state,cmd
ps -u someusername -o pid,%cpu,%mem,start_time,size=-size-,state,comm | grep runfastcgi.fcgi

Disk usage
df -h