* upstream/develop: (1373 commits)
perf: cache dynamic links map in Redis (#28878)
fix: Never query `flag_print_sql` in `developer_mode=0` (#28884)
fix(restore): remove MariaDB view security definers
fix: sanitize user input during setup wizard
feat(sanitize_column): improve check
refactor: make optimizations.py private entirely (#28872)
fix(site_cache): site cache thread safety (#28870)
chore(printview): change error message
perf: speedup `frappe.call` by ~8x (#28866)
test: reduce noise in test output (#28862)
chore: spelling_invalid_values (#28858)
fix: Remove misleading os.O_NONBLOCK flag (#28859)
fix: string replacement in error logger
perf(gthread): Pin web workers to a single core (#28854)
fix: MariaDBDatabase.get_tables() should not query the entire database schema (#28846)
fix: add strings and fields to translation
fix: typo in test controller boilerplate
perf: faster add_to_date (#28843)
perf(version): Make get_versions fast for autoincrement doctypes (#28847)
refactor: log in monitor as well
...
Note about correctness: once a site has seen enough usage, this map will
rarely change, so the problem of "cache inconsistency" is very rare. Still,
care is taken to avoid possible cache inconsistencies.
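A rough sketch of the read-through pattern being described — the cache key,
builder stub, and invalidation hook below are illustrative assumptions, not
the actual code from #28878:
```
import frappe

CACHE_KEY = "dynamic_link_map"  # hypothetical key name

def build_dynamic_link_map() -> dict:
    # Stand-in for the expensive scan over Dynamic Link fields.
    return {"Note": ["Comment"], "File": ["Comment", "Note"]}

def get_dynamic_link_map_cached() -> dict:
    # Serve from Redis when possible; build once and store on a miss.
    link_map = frappe.cache().get_value(CACHE_KEY)
    if link_map is None:
        link_map = build_dynamic_link_map()
        frappe.cache().set_value(CACHE_KEY, link_map)
    return link_map

def clear_dynamic_link_map_cache() -> None:
    # Called whenever DocTypes / Dynamic Link fields change, so the rare
    # "map changed" case doesn't keep serving a stale copy.
    frappe.cache().delete_value(CACHE_KEY)
```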
Unnecessary overhead, and I need to disable this every time I want to get
realistic performance numbers out.
All performance-affecting toggles should be controlled directly by
`developer_mode` alone.
Identified two cases where the site cache can break:
1. Another thread clears the cache using `clear_cache` because of TTL or
manual eviction.
2. Another thread pops the element we are about to read because of the
`maxsize` limit.
This change should fix both and even make it a little faster.
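A minimal sketch of the race and the single-lookup style of fix; the
`_site_cache` structure and names here are assumptions for illustration, not
the actual frappe internals:
```
_site_cache: dict[str, dict] = {}

def get_cached_unsafe(site: str, builder) -> dict:
    # Racy: another thread may clear the dict (case 1) or evict `site`
    # due to maxsize (case 2) between the membership check and the read.
    if site in _site_cache:
        return _site_cache[site]
    _site_cache[site] = builder()
    return _site_cache[site]

def get_cached_safe(site: str, builder) -> dict:
    # Single lookup: we either get the value in one step or rebuild it,
    # so a concurrent clear/eviction can no longer raise KeyError mid-read.
    value = _site_cache.get(site)
    if value is None:
        value = _site_cache[site] = builder()
    return value
```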
* chore: remove verbose output from test runner
This is the same output that's already shared by the test runner in a
different format? It makes it annoying to scroll through when just running
a single test locally.
* fix: Remove clutter from test output
Test records don't change after the first run, and tests are executed many
times locally.
* test: retry flaky postgres backup tests
Python's multithreaded model is _inefficient_ because of the Global
Interpreter Lock (GIL). Only one thread of a process can run at any given
time. Thus the only valid use cases for threads in Python are:
1. Hiding I/O latency by switching to a different thread.
2. Using compiled extensions that release the GIL for long enough to do
meaningful work in other threads.
Both of these are not as frequent as you'd imagine, and a gthread worker
with multiple threads often just ends up contending on the lock, wasting
useful CPU cycles doing nothing. Pinning a worker process to a core nearly
eliminates this contention wastage. The waste can be 5-10% and goes up
sharply with more threads.
E.g. FC typically has a maxed-out config of 24 workers, which allows
"accepting" and working on 24 requests at a time. But that doesn't mean
24 requests are on CPU at any given time; that would require 24 physical
cores.
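A minimal sketch of what per-worker pinning can look like in a
`gunicorn.conf.py` `post_fork` hook, assuming Linux and
`os.sched_setaffinity`; this is an illustration of the idea, not necessarily
the upstream implementation:
```
import os

def post_fork(server, worker):
    # Pin each freshly forked gthread worker to exactly one core, chosen
    # round-robin from its spawn order, so its GIL-bound threads stop
    # migrating between cores and contending there.
    core = worker.age % (os.cpu_count() or 1)
    os.sched_setaffinity(0, {core})  # 0 = the current (worker) process
    server.log.info("worker %s pinned to core %s", worker.pid, core)
```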
Why do this?
1. Context switching between threads is faster than switching between
processes - fewer cache misses, fewer TLB misses, etc.
2. The model is simple:
True parallelism = count(cores) = count(processes).
Expected concurrency = count(processes) * count(threads).
3. This is far simpler to reason about than something like an async
executor model.
4. The ability to queue more requests than can be handled is already
implemented by `listen(2)` and `accept(2)` in the kernel. There is no real
benefit in accepting 1000 requests if you can only work on 20 of them
at a time, because we do a lot of "work" in requests; it's not just
issuing an external request and waiting for it.
5. We can achieve practically the same concurrency as 24 workers with
4 processes x 6 threads (see the config sketch below). That's a lot of
memory saved to run other useful things.
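For point 5, a sketch of the corresponding `gunicorn.conf.py` shape (the
numbers are illustrative and would be tuned per machine):
```
worker_class = "gthread"
workers = 4   # true parallelism: roughly one process per physical core
threads = 6   # concurrency per process while other threads wait on I/O
# expected concurrency = workers * threads = 24, with far fewer copies of
# the application resident in memory than 24 separate worker processes
```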
Caveats:
- This kind of pinning can potentially make the Linux scheduler less
efficient. I don't think it's going to be a big problem, because there
are plenty of other things to run which a core can steal from another
core if it doesn't have enough work.
- Load balancing in a single-server multi-bench setup. I *think*, by the
nature of how `accept(2)` works, load balancing will still happen pretty
much automatically. If a certain core is overloaded, other cores will
naturally reach `accept(2)` more frequently and take the load off that
core. This is worth validating in practice by creating skewed affinities.
- This code is not NUMA-aware. None of our machines have NUMA nodes, so
I am ignoring it. Don't use it if you have a NUMA setup.
- If new CPUs are hotplugged or existing ones are disabled, then it can
be inefficient (worse than the current behaviour) until that worker
auto-restarts (which happens after N requests in the FC setup).
Ideal solution: write a userspace scheduler that implements
"soft affinity" using Linux's new eBPF-based sched_ext feature. That's
too much extra work, but I'll consider it at some point.
closes https://github.com/frappe/caffeine/issues/13
* perf: resolve rounding method once
When the rounding method is explicitly specified, it's 1.4x faster.
* perf: reorder checks
Banker's rounding is the default and the most common method now.
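An illustrative sketch of the two optimizations combined (resolve the method
to a callable once, and check the default banker's case first); the function
names and method strings are assumptions, not frappe's actual code:
```
import math

def _round_bankers(value: float, precision: int) -> float:
    # Python's built-in round() already does banker's rounding
    # (round half to even).
    return round(value, precision)

def _round_half_up(value: float, precision: int) -> float:
    # Simplified "commercial" rounding, for positive values only.
    factor = 10 ** precision
    return math.floor(value * factor + 0.5) / factor

def resolve_rounding_method(name: str | None):
    # Default / most common case is checked first so the usual path is a
    # single comparison; the result is a plain callable.
    if not name or name == "Banker's Rounding":
        return _round_bankers
    if name == "Commercial Rounding":
        return _round_half_up
    raise ValueError(f"Unknown rounding method: {name}")

# Resolve once, then reuse in the hot path instead of re-checking strings.
rounder = resolve_rounding_method("Banker's Rounding")
rounded = [rounder(v, 2) for v in (2.675, 2.665, 1.005)]
```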
* perf: speedup get_system_settings
* fix: fallback for always printing tracebacks
I don't recall ever answering "no" to this prompt; it's of no use to me.
Also, the prompt makes automated scripts not really automated.
* revert: prompting for exceptions
Always print the full exception.
In some cases, while running in Docker, we end up with:
```
[Errno 18] Invalid cross-device link: 'tmp<hash>' -> './assets/frappe'
```
Using `shutil.move` fixes this as it supports moving across filesystems; `os.replace` doesn't.
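A minimal illustration of the difference (paths are placeholders):
`os.replace` is a rename and fails with `EXDEV` when the two paths are on
different filesystems, while `shutil.move` falls back to copy-and-delete:
```
import shutil

src = "tmp_build_dir"    # e.g. scratch dir on the container's filesystem
dst = "./assets/frappe"  # target (assumed not to exist) on another filesystem

# os.replace(src, dst)   # -> OSError: [Errno 18] Invalid cross-device link
shutil.move(src, dst)    # copies then deletes when a rename isn't possible
```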
Signed-off-by: Akhil Narang <me@akhilnarang.dev>
* perf: Restore dict's flat overrides
Using `super()` is an unnecessary cost. This class is used A LOT. Ref: https://github.com/frappe/frappe/pull/16449/
Please consider performance while adding types; it's almost always possible to achieve good typing without this.
Also, `frappe._dict` is almost always used as `dict[Any, Any]` or
`dict[str, Any]`, so type annotations are useless here.
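A simplified sketch of the "flat override" style this restores: attribute
access is bound directly to the `dict` methods instead of being routed
through `super()`, saving a Python-level call per access (not the full
`frappe._dict`):
```
class _dict(dict):
    __getattr__ = dict.get           # d.missing -> None, not AttributeError
    __setattr__ = dict.__setitem__   # d.x = 1   -> d["x"] = 1
    __delattr__ = dict.__delitem__   # del d.x   -> del d["x"]

    def copy(self):
        return _dict(self)

d = _dict(name="Note")
d.owner = "Administrator"
assert d["owner"] == "Administrator" and d.missing is None
```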
* ci: ugh wait for processes to exit
* perf: Use latest pickle protocol
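Concretely, that usually just means passing an explicit protocol instead of
relying on the (lower) default; the payload below is a placeholder:
```
import pickle

payload = {"doctype": "Note", "values": list(range(10))}
data = pickle.dumps(payload, protocol=pickle.HIGHEST_PROTOCOL)
assert pickle.loads(data) == payload
```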
* perf: pop flags from cached documents
This is also the right thing to do; things like `doc.flags.for_update`
shouldn't be "cached".
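An illustrative sketch only (the helper and key names are hypothetical):
transient runtime state is stripped before the document dict goes into the
cache, so flags like `for_update` never come back on a cache hit:
```
import copy

def put_document_in_cache(cache: dict, key: str, doc_dict: dict) -> None:
    cacheable = copy.deepcopy(doc_dict)
    cacheable.pop("flags", None)  # runtime-only state, must not be cached
    cache[key] = cacheable
```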
Developers can easily enable `can_cache` without knowing what it
entails. A public cache means the proxy can cache responses without
talking to the backend.
Obviously, many endpoints which can be cached on the client side should
probably not be cached in the proxy.
E.g. the PR linked to the PR that added this feature suggests caching
the notification log for a short time... we don't want to leak one user's
cached notifications to another user.
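A rough illustration of the concern using werkzeug-style responses (frappe
responses are werkzeug `Response` objects); the header values are examples,
not the actual PR:
```
from werkzeug.wrappers import Response

per_user = Response("your notifications ...")
# Only the end user's own browser may cache this.
per_user.headers["Cache-Control"] = "private, max-age=60"

shared = Response("public website asset ...")
# A shared proxy may store this and replay it to *other* users -
# fine for public assets, dangerous for per-user data.
shared.headers["Cache-Control"] = "public, max-age=300"
```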
I don't buy that developers should have to know about the cache
implementation to know whether it's secure or correct to enable it on a
certain endpoint. In addition to that, we have very few mechanisms to
bust the cache inside the proxy. An end user hitting ctrl+shift+r won't
do anything if the proxy wants to serve a stale response.
We should figure out a better way to instruct FW about the final
cache-control headers than hardcoding it, IMO.