I do my development in Ubuntu-22.04 Linux running on the Windows Subsystem for Linux. I recently got a laptop refresh and the latest software doesn’t run. The fix is obscure, so I thought I’d document it.
sbcl runs fine out of the box in WSL2, but I’m encountering a bug where TCP connections to one particular server are being left in the CLOSE_WAIT state indefinitely. After several minutes, I hit the limit on the number of open files.
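One way to watch a leak like this grow from a shell is to count the sockets sitting in that state. The following is a minimal sketch, assuming the ss tool from iproute2 is installed (it prints the state as CLOSE-WAIT, with a hyphen); count_close_wait is a hypothetical helper name used only for illustration.

```shell
# Count TCP sockets in CLOSE-WAIT, reading `ss -tan` style output on stdin.
# count_close_wait is a hypothetical helper name, not something from the post.
count_close_wait() {
    # Skip the header line; the first column of each row is the socket state.
    awk 'NR > 1 && $1 == "CLOSE-WAIT" { n++ } END { print n + 0 }'
}

# Typical use (assumes ss from iproute2 is available):
#   ss -tan | count_close_wait
```

Running it every few seconds while the program is talking to the server should show the count climbing toward the open-file limit.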
The “right thing” would be to track down who isn’t closing the connection properly, but it’s only a few hundred connections. It appears that the ulimit on open files is set to 1024, which is pretty easy to hit with this bug. Bumping the ulimit to something more reasonable is a lazy workaround. It isn’t a solution, since I’m still leaking open files, but I’ll be able to leak thousands of them without having problems.
But increasing nofile turned out to be a problem. I edited all the magic files in /etc until they all said I could have 131071 open files. When I restarted WSL, all the ways I could start a shell agreed that the ulimit was 131071, yet when I started sbcl and ran this:
(uiop:run-program "prlimit" :output *standard-output*)
RESOURCE   DESCRIPTION                        SOFT      HARD      UNITS
AS         address space limit                unlimited unlimited bytes
CORE       max core file size                 0         unlimited bytes
CPU        CPU time                           unlimited unlimited seconds
DATA       max data size                      unlimited unlimited bytes
FSIZE      max file size                      unlimited unlimited bytes
LOCKS      max number of file locks held      unlimited unlimited locks
MEMLOCK    max locked-in-memory address space 67108864  67108864  bytes
MSGQUEUE   max bytes in POSIX mqueues         819200    819200    bytes
NICE       max nice prio allowed to raise     0         0
NOFILE     max number of open files           1024      1048576   files
NPROC      max number of processes            62828     62828     processes
RSS        max resident set size              unlimited unlimited bytes
RTPRIO     max real-time priority             0         0
RTTIME     timeout for real-time tasks        unlimited unlimited microsecs
SIGPENDING max number of pending signals      62828     62828     signals
STACK      max stack size                     8388608   unlimited bytes
NIL
NIL
0 (0 bits, #x0, #o0, #b0)
The soft limit was still at the old value of 1024.
WSL launched sbcl directly, without a shell, so the ulimit statements were never being run.
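The limit a process actually received can also be checked from outside it, whatever a fresh shell reports. The sketch below assumes the Linux /proc filesystem is available; nofile_limits is a hypothetical helper name, and picking the process with pgrep is only one way to find its PID.

```shell
# Print the soft and hard "Max open files" values from the text of
# /proc/<pid>/limits supplied on stdin.
# nofile_limits is a hypothetical helper name used for illustration.
nofile_limits() {
    # Fields on the matching line: Max open files <soft> <hard> files
    awk '/^Max open files/ { print $4, $5 }'
}

# Typical use, assuming the newest sbcl process is the one of interest:
#   nofile_limits < "/proc/$(pgrep -n sbcl)/limits"
```

If the first number printed is 1024 while an interactive shell reports 131071, the process was started by a route that never ran the ulimit statements.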
The solution is easy, but it took me a long time to figure it out. Not only do you need to edit all the magic in /etc and add ulimit statements to your .bashrc, you should also add ulimit statements to your .profile, and then instruct wsl to launch your program under a login shell:
(require 'sly)
(setq sly-lisp-implementations
      '((sbcl ("C:\\Program Files\\WSL\\wsl.exe"
               "--distribution-id" "{df4f07a6-2142-405c-8a6a-63f1ca3a7e8d}"
               "--cd" "~"
               "--shell-type" "login"
               "/usr/local/bin/sbcl"))))
This bit of insanity allows me to run sbcl with 131071 open files in Linux as my inferior lisp program in a Windows Emacs running SLY. (Running Emacs under Windows gives me a way to use a modified Dvorak keyboard. I could run Emacs in the Linux subsystem, but the Wayland server is in a container and doesn’t let you modify the keyboard.)
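For reference, the ulimit statement in .profile could look something like the fragment below. This is a sketch under assumptions, not the exact text of my files: clamp_nofile is a hypothetical helper, and the clamping is there only because a soft limit cannot be raised above the hard limit.

```shell
# Hypothetical ~/.profile fragment: raise the soft open-file limit.
# clamp_nofile WANT HARD prints WANT, or HARD when HARD is a smaller number.
clamp_nofile() {
    cn_want=$1
    cn_hard=$2
    if [ "$cn_hard" != "unlimited" ] && [ "$cn_hard" -lt "$cn_want" ]; then
        echo "$cn_hard"
    else
        echo "$cn_want"
    fi
}

# Set the soft limit to 131071 (the value used above), or to the hard
# limit if that is lower. Warn instead of aborting the login on failure.
ulimit -Sn "$(clamp_nofile 131071 "$(ulimit -Hn)")" ||
    echo "warning: could not set the open-file limit" >&2
```

Because a login shell reads .profile, launching sbcl with the login shell type is what makes this statement take effect.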
3 comments:
This bug sounds like it could belong to an Embrace, Extend, Exterminate strategy of some "leading" company. I doubt that Canonical would distribute an Ubuntu version with such a bug. Anyway, the Emacs-Windows thing: why use Wayland when Xorg would do? Just because Wayland is the new thing, which claims to be a better architecture? Xorg is more mature, faster, has fewer bugs and better tools. :)
Seems super complicated. Not sure why folks claim Linux is hard to use when Windows requires all these hoops.
It is complicated. But I wouldn't attribute it all to Windows. It is me trying to run both at the same time. The fact that it works at all is pretty remarkable.
Update: I found the socket leak. The server I was contacting was sending a JSON reply, but was sending extra characters after the JSON. After reading the JSON, I closed the stream, but since there were unread characters pending, the OS kept the socket open. I added some code to drain the stream after reading the JSON and all the sockets in CLOSE_WAIT state disappeared.
I'm not sure if this is a bug. If you close an input stream (or abandon it to garbage collection) and it still has pending characters, maybe the system should drain the stream for you. There is no way to drain it yourself after you've closed or abandoned it.