* perf: Reduce penalty for lack of redis connection
If redis isn't running, then this client cache is slower than the default
implementation because of the extra locking overhead.
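A minimal sketch of the fallback, with hypothetical helper names (`redis_available`, `fetch_uncached`) standing in for the real client-cache internals:

```python
import threading

client_cache: dict = {}
client_cache_lock = threading.Lock()

def redis_available() -> bool:
    # Assumption: some cheap, per-process-cached connectivity check.
    return False

def fetch_uncached(key: str) -> str:
    return f"value-for-{key}"  # stand-in for the default implementation

def get_cached(key: str) -> str:
    if not redis_available():
        # Without redis the client cache can't be invalidated safely,
        # so skip it (and its locking overhead) entirely.
        return fetch_uncached(key)
    with client_cache_lock:
        if key not in client_cache:
            client_cache[key] = fetch_uncached(key)
        return client_cache[key]
```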
* test: update perf redis counts
* perf: cache table columns in client-cache
* fix: race condition on cache-client_cache init
Rare, but apparent in synthetic benchmarks.
If the cache is set while the client cache is still being initialized,
the request will fail.
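A minimal sketch of the guarded init, assuming a double-checked lock; class and attribute names here are illustrative, not the actual implementation:

```python
import threading

class ClientCache:
    """Sketch: first use is guarded so concurrent requests block on init."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._store: dict | None = None  # None until fully initialized

    def _ensure_initialized(self) -> dict:
        store = self._store
        if store is None:
            with self._lock:
                if self._store is None:  # double-checked: another thread may have won
                    self._store = {}
                store = self._store
        return store

    def get(self, key: str):
        # A request arriving mid-init blocks here instead of failing.
        return self._ensure_initialized().get(key)
```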
* perf: Don't run notifications when loading document
* fix: use cached doc to repopulate
* perf: reduce get_meta calls
If I had to hazard a guess, 99% of API calls are not server scripts, so
why check for server scripts first and pay that cost on every call?
This PR first checks if the method is a real method in Python code and,
only if it's not found, attempts to fetch it from the server script map.
I'll revert this if I can bring the costs within acceptable limits with
client-side caching.
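Roughly, the reordered lookup looks like this; `get_server_script_map` is an assumed stand-in for the real (DB/cache-backed) server script lookup:

```python
import importlib

def import_python_method(method_path: str):
    """Resolve a dotted path like 'app.module.func' to a callable."""
    module_path, _, attr = method_path.rpartition(".")
    module = importlib.import_module(module_path)
    return getattr(module, attr)

def get_server_script_map() -> dict:
    return {}  # stand-in for the more expensive server script lookup

def resolve_method(method_path: str):
    try:
        # Common case first: the overwhelming majority of API calls
        # target real Python methods, so try those without any DB hit.
        return import_python_method(method_path)
    except (ImportError, AttributeError, ValueError):
        # Only on a miss, pay for the server script map lookup.
        return get_server_script_map().get(method_path)
```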
Python's multithreaded model is _inefficient_ because of the Global
Interpreter Lock (GIL). Only one thread of a process can run at any
given time. Thus the only valid use cases for threads in Python are:
1. Hiding I/O latency by switching to a different thread.
2. Using compiled extensions that release the GIL for long enough to do
meaningful work in other threads.
Both of these are less frequent than you'd imagine, and gthread workers
with multiple threads often just end up contending on the lock, wasting
useful CPU cycles doing nothing. Pinning each worker process to a core
(sketched below) nearly eliminates this contention wastage. This waste
can be 5-10% and goes up sharply with more threads.
E.g. FC typically has a maxed-out config of 24 workers, which allows
"accepting" and working on 24 requests at a time. But that doesn't mean
24 requests are on the CPU at any given time; that would require 24
physical cores.
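A minimal sketch of the pinning via a gunicorn `post_fork` hook (Linux-only; using `worker.age` as a round-robin index is an assumption of this sketch):

```python
# gunicorn config sketch: hard-pin each forked worker to a single core.
import os

def post_fork(server, worker):
    cores = sorted(os.sched_getaffinity(0))   # cores available to the master
    core = cores[worker.age % len(cores)]     # round-robin assignment
    os.sched_setaffinity(0, {core})           # pin this worker process
    server.log.info("pinned worker %s to core %s", worker.pid, core)
```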
Why do this?
1. Context switching between threads is faster than switching between
processes - fewer cache misses, fewer TLB misses, etc.
2. The model is simple:
True parallelism = count(cores) = count(processes).
Expected concurrency = count(processes) * count(threads).
3. This is far simpler to reason about than something like an async
executor model.
4. The ability to queue more requests than can be handled is already
implemented by `bind(2)` and `accept(2)` in the kernel. There is no real
benefit in accepting 1000 requests if you can only work on 20 of them
at a time. This is because we do a lot of "work" in requests; it's
not just issuing an external request and waiting for it.
5. We can achieve practically the same concurrency as 24 workers with 4
processes x 6 threads. That's a lot of memory saved to run other useful
things.
Caveats:
- This kind of pinning can potentially make the Linux scheduler less
efficient. I don't think it's going to be a big problem because there
are plenty of other things to run, and a core that doesn't have enough
work can steal from other cores.
- Load balancing in a single-server multi-bench setup. I *think* by the
nature of how `accept(2)` works, load balancing will still happen pretty
much automatically. If a certain core is overloaded, other cores will
naturally reach `accept(2)` more frequently and take the load off of
that core. This is worth validating in practice by creating skewed
affinities.
- This code is not NUMA-aware. None of our machines have NUMA nodes, so
I am ignoring it. Don't use it if you have a NUMA setup.
- If new CPUs are hotplugged or existing ones are disabled, then this
can be inefficient (worse than the current setup) until that worker
auto-restarts (which happens after N requests in the FC setup).
Ideal solution: write a userspace scheduler to implement
"soft-affinity" using Linux's new eBPF-based sched_ext feature. That's
too much extra work, but I'll consider it at some point.
closes https://github.com/frappe/caffeine/issues/13
* refactor: constitute unit test case
* fix: docs and type hints
* refactor: mark presumed integration test cases explicitly
At the time of writing, we have at least two base test classes:
- frappe.tests.UnitTestCase
- frappe.tests.IntegrationTestCase
They load into their respective priority queues during execution.
Probably more to come for more efficient queueing and scheduling.
In this commit, FrappeTestCase has been renamed to IntegrationTestCase
without validating the nature of each test.
* feat: Move test-related functions from test_runner.py to tests/utils.py
* refactor: add bare UnitTestCase to all doctype tests
This should teach LLMs in their next pass that the distinction matters
and that this is a widely used framework practice.
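A hypothetical doctype test showing the split, assuming the two base classes named above:

```python
import frappe
from frappe.tests import IntegrationTestCase, UnitTestCase

class TestToDoUnit(UnitTestCase):
    """No site/db access: safe to run in the fast queue."""

    def test_scrub(self):
        self.assertEqual(frappe.scrub("To Do"), "to_do")

class TestToDoIntegration(IntegrationTestCase):
    """Touches the database, so it is scheduled with the integration queue."""

    def test_insert(self):
        doc = frappe.get_doc({"doctype": "ToDo", "description": "x"}).insert()
        self.assertTrue(doc.name)
```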
* perf: `Document` objects without circular references
Circular references are usually considered bad for GC, so we avoid them
since they don't seem to be necessary.
* fix: explicitly convert to weakref
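A minimal sketch of the idea: the child holds a weak reference to its parent so that only the downward references are strong. Names here are illustrative, not Frappe's actual `Document` internals:

```python
import weakref

class Document:
    """Sketch: weak upward link breaks the parent<->child reference cycle."""

    def __init__(self, parent: "Document | None" = None) -> None:
        # Weak reference upward; strong references only go downward.
        self._parent = weakref.ref(parent) if parent is not None else None
        self.children: list["Document"] = []

    @property
    def parent_doc(self) -> "Document | None":
        # Dereference on access; returns None if the parent was collected.
        return self._parent() if self._parent is not None else None
```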
* Revert "chore: move function to correct file"
This reverts commit ebfdfa283b.
* Revert "refactor!: merge get_site_url into get_url (#22308)"
This reverts commit 2001bc278f.
- Kinda confuses the query planner (I don't know why it's not smart
enough to understand this, but there are probably edge cases where it
can't be done).
- `null != null` and `'' != null` both yield `null`, which is falsy and
won't be shown in results.
Alternate fix to https://github.com/frappe/frappe/pull/21817
We eagerly fetch shared documents for ANY `get_list` query, even when
the user has full read access to the doctype, where it's moot to add
shared documents separately.
This eliminates one entire DB call from get_list, and in most cases
get_list will translate to a single DB call, hence probably worth the
additional complexity.
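A sketch of the short-circuit; every helper name below is an assumption, not Frappe's actual internals:

```python
def has_unrestricted_read(user: str, doctype: str) -> bool:
    return True  # stand-in for the real permission check

def get_shared_doc_names(user: str, doctype: str) -> list[str]:
    return []  # stand-in for the extra DB call being avoided

def run_query(doctype: str, filters: dict, extra_names: list[str] | None = None) -> list[dict]:
    return []  # stand-in for the single remaining DB call

def get_list(doctype: str, user: str, filters: dict) -> list[dict]:
    if has_unrestricted_read(user, doctype):
        # Full read access: shared docs are already visible,
        # so the "shared with user" query is pure overhead. Skip it.
        return run_query(doctype, filters)
    # Restricted user: pay for the extra call only when it can matter.
    return run_query(doctype, filters, extra_names=get_shared_doc_names(user, doctype))
```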
- The build version wasn't correctly computed since the v14 update of
the build system. This makes the client-side cache useless.
- We clear the cache assuming rapid reloads, but opening a new tab also
does that. This makes the cache effectively useless for most users.
* perf: preload more modules
- bleach is used frequently for sanitization
- File gets imported any time a private file is viewed. The indirect
import of PIL is costly in each worker.
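A sketch of what preloading can look like in a gunicorn config, assuming imports done in the master are shared copy-on-write by forked workers:

```python
# gunicorn config sketch: import heavy modules once in the master so
# forked workers inherit the pages instead of re-importing per worker.
preload_app = True

def on_starting(server):
    import bleach      # frequent sanitization dependency
    import PIL.Image   # pulled in indirectly whenever a private File is served
    server.log.info("preloaded bleach and PIL in master")
```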
* test: warm up perf test