Welcome! Log In Create A New Profile

Advanced

[PATCH] BUG/MEDIUM: email-alert: don't set server check status from a email-alert task

Posted by PiBa-NL 
Hi List, Simon and Baptiste,

Sending to both of you guys as its both tcp-check and email related and
you are the maintainers of those parts.
Patch subject+content basically says it all (i hope.).

It is intended to fixes yesterdays report:
https://www.mail-archive.com/[email protected]/msg28158.html

Please let me know if it is OK, or should be done differently.

Thanks in advance,
PiBa-NL / Pieter
From bf80b0398c08f94bebec30feaaddda422cb87ba1 Mon Sep 17 00:00:00 2001
From: PiBa-NL <[email protected]>
Date: Wed, 6 Dec 2017 01:35:43 +0100
Subject: [PATCH] BUG/MEDIUM: email-alert: don't set server check status from a
email-alert task

This avoids possible 100% cpu usage deadlock on a EMAIL_ALERTS_LOCK and avoids sending lots of emails when 'option log-health-checks' is used.
It is avoided to change the server state and possibly queue a new email while processing the email alert by checking if the check task is being processed for the process_email_alert struct.

This needs to be backported to 1.8.
---
src/checks.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/src/checks.c b/src/checks.c
index eaf84a2..55bfde2 100644
--- a/src/checks.c
+++ b/src/checks.c
@@ -72,6 +72,7 @@ static int tcpcheck_main(struct check *);

static struct pool_head *pool_head_email_alert = NULL;
static struct pool_head *pool_head_tcpcheck_rule = NULL;
+static struct task *process_email_alert(struct task *t);


static const struct check_status check_statuses[HCHK_STATUS_SIZE] = {
@@ -198,6 +199,9 @@ const char *get_analyze_status(short analyze_status) {
*/
static void set_server_check_status(struct check *check, short status, const char *desc)
{
+ if (check->task->process == process_email_alert)
+ return; // email alerts should not change the status of the server
+
struct server *s = check->server;
short prev_status = check->status;
int report = 0;
--
2.10.1.windows.1
Hi Pieter,

CCing Emeric since these parts have changed a bit for threads and
there may be some subtle things we oversee.

thanks for this!
Willy

On Wed, Dec 06, 2017 at 02:11:53AM +0100, PiBa-NL wrote:
> Hi List, Simon and Baptiste,
>
> Sending to both of you guys as its both tcp-check and email related and you
> are the maintainers of those parts.
> Patch subject+content basically says it all (i hope.).
>
> It is intended to fixes yesterdays report:
> https://www.mail-archive.com/[email protected]/msg28158.html
>
> Please let me know if it is OK, or should be done differently.
>
> Thanks in advance,
> PiBa-NL / Pieter

> From bf80b0398c08f94bebec30feaaddda422cb87ba1 Mon Sep 17 00:00:00 2001
> From: PiBa-NL <[email protected]>
> Date: Wed, 6 Dec 2017 01:35:43 +0100
> Subject: [PATCH] BUG/MEDIUM: email-alert: don't set server check status from a
> email-alert task
>
> This avoids possible 100% cpu usage deadlock on a EMAIL_ALERTS_LOCK and avoids sending lots of emails when 'option log-health-checks' is used.
> It is avoided to change the server state and possibly queue a new email while processing the email alert by checking if the check task is being processed for the process_email_alert struct.
>
> This needs to be backported to 1.8.
> ---
> src/checks.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/src/checks.c b/src/checks.c
> index eaf84a2..55bfde2 100644
> --- a/src/checks.c
> +++ b/src/checks.c
> @@ -72,6 +72,7 @@ static int tcpcheck_main(struct check *);
>
> static struct pool_head *pool_head_email_alert = NULL;
> static struct pool_head *pool_head_tcpcheck_rule = NULL;
> +static struct task *process_email_alert(struct task *t);
>
>
> static const struct check_status check_statuses[HCHK_STATUS_SIZE] = {
> @@ -198,6 +199,9 @@ const char *get_analyze_status(short analyze_status) {
> */
> static void set_server_check_status(struct check *check, short status, const char *desc)
> {
> + if (check->task->process == process_email_alert)
> + return; // email alerts should not change the status of the server
> +
> struct server *s = check->server;
> short prev_status = check->status;
> int report = 0;
> --
> 2.10.1.windows.1
>
Hi Pieter,

I'm CCing Christopher, he did some test on your patch.

R,
Emeric

On 12/06/2017 07:06 AM, Willy Tarreau wrote:
> Hi Pieter,
>
> CCing Emeric since these parts have changed a bit for threads and
> there may be some subtle things we oversee.
>
> thanks for this!
> Willy
>
> On Wed, Dec 06, 2017 at 02:11:53AM +0100, PiBa-NL wrote:
>> Hi List, Simon and Baptiste,
>>
>> Sending to both of you guys as its both tcp-check and email related and you
>> are the maintainers of those parts.
>> Patch subject+content basically says it all (i hope.).
>>
>> It is intended to fixes yesterdays report:
>> https://www.mail-archive.com/[email protected]/msg28158.html
>>
>> Please let me know if it is OK, or should be done differently.
>>
>> Thanks in advance,
>> PiBa-NL / Pieter
>
>> From bf80b0398c08f94bebec30feaaddda422cb87ba1 Mon Sep 17 00:00:00 2001
>> From: PiBa-NL <[email protected]>
>> Date: Wed, 6 Dec 2017 01:35:43 +0100
>> Subject: [PATCH] BUG/MEDIUM: email-alert: don't set server check status from a
>> email-alert task
>>
>> This avoids possible 100% cpu usage deadlock on a EMAIL_ALERTS_LOCK and avoids sending lots of emails when 'option log-health-checks' is used.
>> It is avoided to change the server state and possibly queue a new email while processing the email alert by checking if the check task is being processed for the process_email_alert struct.
>>
>> This needs to be backported to 1.8.
>> ---
>> src/checks.c | 4 ++++
>> 1 file changed, 4 insertions(+)
>>
>> diff --git a/src/checks.c b/src/checks.c
>> index eaf84a2..55bfde2 100644
>> --- a/src/checks.c
>> +++ b/src/checks.c
>> @@ -72,6 +72,7 @@ static int tcpcheck_main(struct check *);
>>
>> static struct pool_head *pool_head_email_alert = NULL;
>> static struct pool_head *pool_head_tcpcheck_rule = NULL;
>> +static struct task *process_email_alert(struct task *t);
>>
>>
>> static const struct check_status check_statuses[HCHK_STATUS_SIZE] = {
>> @@ -198,6 +199,9 @@ const char *get_analyze_status(short analyze_status) {
>> */
>> static void set_server_check_status(struct check *check, short status, const char *desc)
>> {
>> + if (check->task->process == process_email_alert)
>> + return; // email alerts should not change the status of the server
>> +
>> struct server *s = check->server;
>> short prev_status = check->status;
>> int report = 0;
>> --
>> 2.10.1.windows.1
>>
>
Le 06/12/2017 à 02:11, PiBa-NL a écrit :
> Hi List, Simon and Baptiste,
>
> Sending to both of you guys as its both tcp-check and email related and
> you are the maintainers of those parts.
> Patch subject+content basically says it all (i hope.).
>
> It is intended to fixes yesterdays report:
> https://www.mail-archive.com/[email protected]/msg28158.html
>
> Please let me know if it is OK, or should be done differently.
>

Hi,

I'm able to reproduce the bug and your patch fixes it.

The difference between 1.7 and 1.8 is that, when an email alert is
created, the check->status field is uninitialized in 1.7. In 1.8, the
status is set to HCHK_STATUS_INI.

So another way to fix the bug is to leave check->status field
uninitialized, as for 1.7. I amended your patch.

Honestly, I don't know which version is the best. Email alerts should
probably be rewritten to not use the checks. This was the only solution
to do connections by hand when Simon implemented it. That's not true
anymore.

Anyway, the bug must be fixed. Thanks.

--
Christopher Faulet
On Thu, Dec 07, 2017 at 04:27:16PM +0100, Christopher Faulet wrote:
> Honestly, I don't know which version is the best.

Just let me know guys :-)

> Email alerts should
> probably be rewritten to not use the checks. This was the only solution to
> do connections by hand when Simon implemented it. That's not true anymore.

I agree and I think I was the one asking Simon to do it like this by then
eventhough he didn't like this solution. That was an acceptable tradeoff
in my opinion, with very limited impact on existing code. Now with applets
being much more flexible we could easily reimplement a more complete and
robust SMTP engine not relying on hijacking the tcp-check engine anymore.

Willy
Hi Christopher, Willy,

Op 7-12-2017 om 19:33 schreef Willy Tarreau:
> On Thu, Dec 07, 2017 at 04:27:16PM +0100, Christopher Faulet wrote:
>> Honestly, I don't know which version is the best.
> Just let me know guys :-)
imho Christopher's patch is smaller and probably easier to maintain and
eventually remove without adding (unneeded) code to the
set_server_check_status(). Though it is a bit less obvious to me that it
will have the same effect, i works just as well.
>
>> Email alerts should
>> probably be rewritten to not use the checks. This was the only solution to
>> do connections by hand when Simon implemented it. That's not true anymore.
> I agree and I think I was the one asking Simon to do it like this by then
> eventhough he didn't like this solution. That was an acceptable tradeoff
> in my opinion, with very limited impact on existing code. Now with applets
> being much more flexible we could easily reimplement a more complete and
> robust SMTP engine not relying on hijacking the tcp-check engine anymore.
>
> Willy

A 'smtp engine' for sending email-alert's might be nice eventually but
that is not easily done 'today'. (not by me anyhow) (Would it group
messages together if multiple are created within a short time-span?)

As for the current issue / patch, i prefer the solution Christopher
found/made.

Made a new version of it with a bit of extra comments inside the code,
removed a unrelated white-space change, and added a matching patch
description.
Or perhaps Christopher can create it under his own name? Either way is
fine for me. :)

Regards,
PiBa-NL / Pieter

From 3129e1ae21e41a026f6d067b3658f6643835974c Mon Sep 17 00:00:00 2001
From: PiBa-NL <[email protected]>
Date: Wed, 6 Dec 2017 01:35:43 +0100
Subject: [PATCH] BUG/MEDIUM: email-alert: don't set server check status from a
email-alert task

This avoids possible 100% cpu usage deadlock on a EMAIL_ALERTS_LOCK and avoids sending lots of emails when 'option log-health-checks' is used. It is avoided to change the server state and possibly queue a new email while
processing the email alert by setting check->status to HCHK_STATUS_UNKNOWN which will exit the set_server_check_status(..) early.

This needs to be backported to 1.8.
---
src/checks.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/checks.c b/src/checks.c
index eaf84a2..3a6f020 100644
--- a/src/checks.c
+++ b/src/checks.c
@@ -3145,7 +3145,7 @@ static struct task *process_email_alert(struct task *t)
t->expire = now_ms;
check->server = alert->srv;
check->tcpcheck_rules = &alert->tcpcheck_rules;
- check->status = HCHK_STATUS_INI;
+ check->status = HCHK_STATUS_UNKNOWN; // the UNKNOWN status is used to exit set_server_check_status(.) early
check->state |= CHK_ST_ENABLED;
}

--
2.10.1.windows.1
Hi Pieter,

On Thu, Dec 07, 2017 at 11:02:13PM +0100, PiBa-NL wrote:
> Made a new version of it with a bit of extra comments inside the code,
> removed a unrelated white-space change, and added a matching patch
> description.

OK, now applied, thank you!

Willy
Sorry, only registered users may post in this forum.

Click here to login