Event timeout issue-- scripters are revolting! ;D

Post by **Snoots Dwagon** » Mon Mar 22, 2021 8:12 pm

Uh okay, scripters may or may not be revolting... but I thought I'd open up this can o' worms.

Kitely has a 30-second event timeout. I have heard over the past 2 months from 3 scripters that have been confused and irked by this setting. The typical comment: "I brought in a script from xxxx grid and it works just fine there... but on Kitely it has an event timeout."

I understand that Opensim has two default settings:
1) 30 second event timeout
2) It hides timeout errors. The script fails... but the system doesn't let you know. Kitely lets us know.

What I'm going to question though... is that timeout really necessary? I mean, what if that timeout were set to an hour rather than 30 seconds? I don't know the behind-the-scenes implications, but I do know that a lot of scripts (especially those that use timer events) can enter an event and bounce back and forth within that event for much longer than 30 seconds. While there are ways to code around that... those ways are a pain in the hiney scales.

So thought I'd mention this for general information... and to see if there might be an option to a script failing just because it's been using an event structure longer than 30 seconds.

Thanks!

Post by **Ilan Tochner** » Mon Mar 22, 2021 8:28 pm

Hi Snoots,

OpenSim has a script timeout, we just made it so that scripters could actually see the timeout error when it happens instead of things failing silently (which can cause a hell of a time when you want to figure out why something doesn't work). There are various errors that are hidden by default in OpenSim which we've elected to show. It's simply better to know when an error occurs than have it be hidden (try debugging a script when you can't see the errors that are thrown).

In other words, scripts that are throwing errors in Kitely are ones that are encountering a problem. It's almost certain that they are also encountering that same problem when running on other OpenSim grids but you couldn't tell because those grids are just hiding those errors.

The timeout exists because of how the scripting engine works. If scripts didn't time out you could easily exhaust all the threads that deal with scripts (IIRC, XEngine uses only 15 threads for running scripts).

Post by **Snoots Dwagon** » Mon Mar 22, 2021 9:25 pm

So everyone knows... Ilan and I had already chatted about this, and I told him I was gonna put it on forums for everyone to see and understand, and he said okay.

For those who are having this problem, the workaround to an event timeout can be tricky, but the logic is solid:

Bounce between two events.

For example, let's say you're doing something over and over, but in between that something you have to do another something that is external to your something and then come back to your something.

That's clear, right?

The time within a single event might easily exceed 30 seconds-- such as when you're reading a notecard and conducting a 30 minute tour.

So what you do is "bounce between two events". For example, have a link_message() event that calls the timer() event which when it's finished calls the link_message() event again... until your task is complete. It's not easy to explain and even more complex to do... but this will give scripters an idea of how to bypass this timeout problem. (Note: there are other ways to do this. All of them-- including this one-- require re-writing a large chunk of your script. This is just one concept of like, maybe two.)

Be aware that like all such constructs, it is easy to lose track of your logic and create an eternal loop. So when writing the script it might be good to include a "tracker" variable to tell you how the script is progressing. Otherwise if it enters an eternal loop, you won't be aware until much later.
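As a rough illustration of the "bounce" idea above, here is a minimal LSL sketch. The marker value `BOUNCE`, the step count `TOTAL_STEPS`, and the one-second pause are all made up for the example; the `step` variable doubles as the "tracker" mentioned above. Each event does one short slice of work and then returns, so no single event should get anywhere near the 30-second limit.

```lsl
// Sketch only: BOUNCE, TOTAL_STEPS and the 1.0 s pause are hypothetical values.
integer BOUNCE = 9001;      // arbitrary marker identifying our own link messages
integer TOTAL_STEPS = 20;   // how many slices of work to do
integer step;               // tracker: which slice we are on

default
{
    touch_start(integer total_number)
    {
        step = 0;
        llMessageLinked(LINK_THIS, BOUNCE, "", NULL_KEY); // start the bounce
    }

    link_message(integer sender_num, integer num, string str, key id)
    {
        if (num != BOUNCE) return;           // not ours, ignore it
        llOwnerSay("step " + (string)step);  // tracker output; replace with real work
        ++step;
        if (step < TOTAL_STEPS)
            llSetTimerEvent(1.0);            // hand over to timer(), then leave this event
    }

    timer()
    {
        llSetTimerEvent(0.0);                // one-shot: stop the timer
        llMessageLinked(LINK_THIS, BOUNCE, "", NULL_KEY); // bounce back to link_message()
    }
}
```

Because the bounce stops as soon as `step` reaches `TOTAL_STEPS`, the chain cannot run forever, and the `llOwnerSay` line shows where it got to if something does go wrong.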

So now if your friends hit this error, you can send them here!

Thanks for helping on this Ilan. Sometimes the forums are the easiest way to get the word out on a widespread question. : )

Post by **John Mela** » Tue Mar 23, 2021 11:49 pm

Just adding to what Snoots has said ...

You don't need to use two different events. I favour keeping everything inside a single event, specifically the link_message() event. At the end of your chunk of processing, send a message to yourself using llMessageLinked() to retrigger the event. To keep it separate from other link messages, use a specific value for the integer portion and test for that at the beginning of the event. For example:

link_message(integer sender_num, integer num, string str, key id) {
    if (num == 77665544) {
        // do stuff 
        // ...
        // end of stuff
        if (allFinished()) return; // don't process any more if we're finished
        // trigger next batch
        llMessageLinked(LINK_THIS, 77665544, "", NULL_KEY);
    }
}
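For anyone who wants something they can drop into a prim and watch run, here is one possible way to flesh that fragment out into a whole script. This is only a sketch: `BATCH`, `TOTAL`, `cursor`, and the body of `allFinished()` are hypothetical stand-ins, not part of the original snippet, and the "work" is just a counter.

```lsl
// Sketch only: BATCH, TOTAL, cursor and this allFinished() are hypothetical.
integer CONTINUE = 77665544; // marker value taken from the snippet above
integer BATCH = 50;          // items handled per link_message event
integer TOTAL = 1000;        // total items to handle
integer cursor;              // index of the next unprocessed item

integer allFinished()
{
    return cursor >= TOTAL;
}

default
{
    state_entry()
    {
        cursor = 0;
        llMessageLinked(LINK_THIS, CONTINUE, "", NULL_KEY); // kick off the first batch
    }

    link_message(integer sender_num, integer num, string str, key id)
    {
        if (num != CONTINUE) return;          // some other link message
        integer limit = cursor + BATCH;
        if (limit > TOTAL) limit = TOTAL;
        while (cursor < limit)
        {
            // process item number `cursor` here
            ++cursor;
        }
        if (allFinished())
        {
            llOwnerSay("All " + (string)TOTAL + " items processed.");
            return;                            // stop retriggering
        }
        llMessageLinked(LINK_THIS, CONTINUE, "", NULL_KEY); // queue the next batch
    }
}
```

The batch size is the knob to tune: pick it so each batch finishes comfortably within the timeout, since every batch runs as its own fresh event.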

Post by **Snoots Dwagon** » Thu Mar 25, 2021 6:24 pm

Ohhh what a neat trick. I didn't even think of calling a link_message from within itself as I thought that would just loop within the event. But it appears to create a new link_message event... which is exactly what's needed. I'm glad to hear that works. Nifty!

Post by **Snoots Dwagon** » Thu Mar 25, 2021 7:25 pm

BTW, Alexina and I discovered yesterday something to watch for as well. It was an easy "fix" but I'm sure this could affect other areas of scripting.

If you enter an event, and one function within that event takes longer than 30 seconds, that can cause an event timeout.

Since there is no way to script around such a situation, the concept may be to shorten the needs of that function. For example, if the scripted object is moving from point A to point B and that single move takes longer than 30 seconds, chop up that one long move into four shorter moves.

So it's a balance between making sure the script works correctly... and making sure that how the script is used follows the rules as well. There is more than one way to work around this Event Timeout problem (as we've seen above). The most important thing to realize is that it exists, and on some scripts can be a real show-stopper. It is good that Kitely has decided to make this error visible rather than keeping it hidden as standard Opensim code does. Nothing worse than a script-stopper when the scripter has no idea what just happened.

That said, what I'd really love to see is some indication of which event has timed out and if possible... where it timed out at. That would be really neat.

Post by **Christine Nyn** » Thu Mar 25, 2021 9:24 pm

I'm wondering why a script would need to be in an event for longer than 30 seconds, or why a function called from within an event would need longer than 30 seconds to complete. It does rather suggest that somewhere there might be a function which is essentially marking time within itself and so hogging the thread rather than giving it back for the use of other scripts that might need it. Admittedly there are some occasions when large amounts of data might need to be processed but such scripts tend to be fairly specialised and not commonly encountered.

Snoots Dwagon wrote: ↑
Thu Mar 25, 2021 7:25 pm
If you enter an event, and one function within that event takes longer than 30 seconds, that can cause an event timeout.
Since there is no way to script around such a situation....

Actually you don't "script around" such situations. You look ahead when designing a script and if large delays look possible you find ways of tracking where a process is up to so that you can interrupt that process and resume it seamlessly, thus sidestepping any inherent delay.

John Mela showed an approach for doing this kind of thing that he uses in his RezMela system, link here:

viewtopic.php?p=28435#p28435

Use of the timer event to break up this kind of processing is severely limited by the usual OpenSim minimum timer of 0.5 seconds, which is why John favours using calls to link_message to retrigger extensive batch processing. If the limitation being dealt with involves waiting for a result rather than dealing with a large amount of data another possible approach is to hand off the waiting and checking to a helper script using llMessageLinked and then responding when that script signals it has detected the desired result.
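To make the helper-script idea a bit more concrete, here is a hedged two-script sketch. Both scripts go in the same prim. The message numbers `WAIT_REQ` and `WAIT_DONE`, the two-second polling interval, and the placeholder test inside `conditionMet()` are all assumptions for illustration; swap in whatever check your own script is actually waiting on.

Helper script (does the waiting and checking):

```lsl
// Sketch only: message numbers, interval and the condition are hypothetical.
integer WAIT_REQ  = 5501;  // main -> helper: "start watching"
integer WAIT_DONE = 5502;  // helper -> main: "the result has arrived"

integer conditionMet()
{
    // Placeholder condition: at least one avatar is in the region.
    return llGetRegionAgentCount() > 0;
}

default
{
    link_message(integer sender_num, integer num, string str, key id)
    {
        if (num == WAIT_REQ) llSetTimerEvent(2.0); // start polling
    }

    timer()
    {
        if (!conditionMet()) return;  // not yet; this event ends immediately
        llSetTimerEvent(0.0);         // stop polling
        llMessageLinked(LINK_THIS, WAIT_DONE, "", NULL_KEY);
    }
}
```

Main script (never waits inside an event):

```lsl
// Sketch only: uses the same hypothetical message numbers as the helper.
integer WAIT_REQ  = 5501;
integer WAIT_DONE = 5502;

default
{
    touch_start(integer total_number)
    {
        llMessageLinked(LINK_THIS, WAIT_REQ, "", NULL_KEY); // hand off the waiting
        // This event ends right away, so it cannot time out.
    }

    link_message(integer sender_num, integer num, string str, key id)
    {
        if (num == WAIT_DONE) llOwnerSay("Condition met; carrying on.");
    }
}
```

Each script also sees the message it sends itself, which is why both check `num` before acting.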

Post by **Snoots Dwagon** » Fri Mar 26, 2021 3:08 pm

You are correct on all counts Christine... if one knows of this limitation to start with. But as Kitely realized, many scripters aren't aware of this limitation because Opensim by default doesn't notify of the timeout problem. Kitely wrote that notification in there themselves for the specific purpose of helping scripters. They had to add another notice regarding "too many events".

Do only "specialized" scripts have timeout issues? One not-uncommon example is a non-physical move function. The scripter wishes to move a non-physical vehicle from point A to point B in increments of a meter or less... and in the middle of that trip say something to the passenger. So they create a loop to perform that move. That totally logical process (say, on a long straightaway) takes longer than 30 seconds and the event times out. It has nothing to do with a script "hogging" anything. It's someone trying to use standard, everyday coding logic to perform a task... but Opensim not allowing it. So a "workaround" is required. Scripting threads exist all over SL and OS forums... to discuss similar scripting problems and possible fixes.

The point I made in "work arounds" involved situations in which no amount of scripting is going to reach a desired goal. In such case, the concept itself must be changed to "work around" the issue.

Basically, event timeouts can be problematic. If one has been deep-scripting for years and is aware of all the caveats (and there are lots of caveats), then they can plan ahead for such, as you say. But for those learning scripting, or experienced scripters who have not come across that particular issue before, caveats can be real head-scratchers. I've spoken with some long-time scripters lately who were unaware of the event timeout issue, or the timer .5 second delay, or the erratic llSleep() script-killing failure (and uncounted more issues).

Thus forums posts like this. : )

Post by **Christine Nyn** » Fri Mar 26, 2021 10:09 pm

Essentially, the root of most scripting "problems" where LSL is the scripting language stems from insufficient understanding of exactly how an event driven script engine works, what shared resources are available to it, and what strategies should be adopted to ensure everyone gets a more or less equal chance to be able to use those resources.

In the example you give (of creating a loop to move a non-physical vehicle from point A to point B over a time scale which conceivably could be longer than 30 seconds) someone with a background in basic programming will, as you say, set up a loop to perform the task. That loop will almost certainly be a closed loop that runs until the task reaches its desired endpoint. Not only will that task fail because the script engine will suspend it for dawdling overlong in whatever function it was in but if several such scripts were running they would deny other scripts timely access to the script engine's resources. What follows is lag or quite possibly a sim on which it is almost impossible to get anything done within a reasonable time scale.

An alternative approach might be to calculate the distance from A to B before starting, figure out how many individual steps at a given speed it would take to complete the move (we'll ignore the method of movement here) and then allow a timer to regulate the implementation of each move towards completion. The basic thinking here is different in nature to the "make a loop" approach as it recognises that resources are limited and instead of the script grabbing a chunk of processor time for itself and sitting on it it time-slices its requirements and allows other scripts access to the processor in between times.
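A minimal sketch of that timer-regulated approach might look like the script below. The destination, speed, and tick length are invented for the example, and `llSetPos` is used only as a stand-in for whatever movement method you prefer. The 0.5-second tick matches the usual OpenSim timer minimum mentioned earlier in the thread.

```lsl
// Sketch only: DEST, SPEED and TICK are hypothetical values.
vector  DEST  = <128.0, 128.0, 25.0>; // destination in region coordinates
float   SPEED = 2.0;                  // metres per second
float   TICK  = 0.5;                  // seconds between moves
vector  start;                        // where the trip began
integer steps;                        // number of moves for the whole trip
integer done;                         // moves completed so far

default
{
    touch_start(integer total_number)
    {
        start = llGetPos();
        float dist = llVecDist(start, DEST);
        steps = llCeil(dist / (SPEED * TICK)); // work out the step count up front
        if (steps < 1) steps = 1;
        done = 0;
        llSetTimerEvent(TICK);                 // let the timer pace the trip
    }

    timer()
    {
        ++done;
        // Interpolate along the straight line from start to DEST.
        vector pos = start + (DEST - start) * ((float)done / (float)steps);
        llSetPos(pos);
        if (done == steps / 2) llSay(0, "Halfway there.");
        if (done >= steps) llSetTimerEvent(0.0); // arrived: stop the timer
    }
}
```

Each `timer()` event makes one small move and returns, so the trip can last minutes without any single event running long, and the mid-trip message to the passenger becomes a simple check on the step counter.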

There's no criticism intended or implied here, I'm just looking at a given example and agreeing with you that one apparently logical approach can lead to problems and a different approach can avoid those problems. When driving a car it's perfectly possible to get from A to B while performing all the necessary actions to do that, but you wouldn't pass a test and get a driving licence if that's all you could do. Getting your licence would involve showing an awareness of road conditions and other road users, and I do feel that although there is no licence required scripters really need to script with their heads up, knowing the conditions they are working within and being aware that theirs are not the only needs being served.

Post by **Snoots Dwagon** » Sat Mar 27, 2021 4:19 am

I agree with you Christine... in theory. Ideally in a Utopian society, all scripters would know precisely how every action of a script affects every aspect of world performance, how it interacts with the asset server, and how it affects the viewer.

The reality however, is that 99.9% of scripters aren't aware of all that. (That figure just off the top of my head.) The average scripter simply scripts, according to standard coding logic that would work just about anywhere else. And anywhere else... a loop function would be the vehicle to perform a repetitious task.

However, the manner in which a script affects behind the scenes operations of the instance server is invisible, for the most part un-trackable (without writing extensive debugging code), and unmeasured during script operation. The average coder would have no reason to expect LsL to be so very badly behaved that one script can affect every other script on the system by simply using a logical loop. That it does so indicates inefficient system coding and time sharing rather than scripter error or lack of understanding of coding logic.

I've been coding LsL since early 2005. I coded professionally about 20 years prior to that. Yet to this day I see things in LsL that astound me in lack of logic and planning in the environment of a virtual world. As a result, scripters have to write work-arounds quite often. If anyone doesn't believe it, ask Balpien Hammerer about his and my experiences with coding music players over 15+ years... and how many times we had to change code that had previously worked perfectly... because of some invisible and unknown system shift "behind the scenes". We had to update code so many times (I stopped counting at a dozen significant updates) we both finally just gave up on it and moved on to other things. At one time he and I both had the best music players on SL, players that worked flawlessly. Now we don't. Nobody does.

Every language has its quirks and issues... but LsL seems to excel at the prevalence of "caveats" and things that just don't make sense logically. It is a common trait of deep-level techs to forget how non-techs think... or even how non-LsL coders think. They just believe everyone should know this stuff. However, these issues are not obvious, are ill-documented, and what documentation there is consists of so much technobabble that even a professional often has difficulty understanding it.

To test LsL event logic out, today I wrote and tested a short program to see how an event calling itself operated under several conditions. From a standard-coder's standpoint, the results were illogical (the logic was inconsistent). So when a coder with over 3 decades of experience (myself) stumbles across such things in LsL, and when the Kitely company has to write special error notices to let scripters know these things are happening, imagine how such issues affect a new coder, or an experienced coder trying to adapt to LsL and its myriad concepts.

Prominent example: after all these years, why does llSleep()-- one of the most basic concepts of scripting-- have an unpredictable and intermittent flaw that can not only stop a script, but cause it to fail to the point of requiring replacement? And what is a scripter supposed to use in its place, since llSleep() is a failure-prone function? Is a scripter who unwittingly uses llSleep() to blame... or does the blame rest with the failure of LsL to properly implement that function?

That is why we regularly write work-arounds. I had to write a work-around to llSleep() that I've shared with other coders. (It's not a pretty hack, but it works.)

That said, I believe you are correct that some scripting problems occur because people don't understand what goes on behind-the-scenes in the script engine, or the internal / invisible logic of the LsL language, and don't know all the caveats. The problem is... those people comprise the vast majority of LsL scripters.