Upload 10 files
- attached_assets/Pasted--Does-the-current-PD-model-align-with-the-latest-IFRS9-regulatory-requirements-Based-on-the-pr-1743447418785.txt +184 -0
- attached_assets/Pasted-Document-Preview-File-report-test-txt-Size-5-07-KB-File-Type-TXT-Preview-document-content--1745328656867.txt +88 -0
- attached_assets/Pasted-Document-Preview-File-report-test-txt-Size-5-07-KB-File-Type-TXT-Preview-document-content--1745328760272.txt +88 -0
- attached_assets/Pasted-import-os-import-uuid-import-logging-from-pathlib-import-Path-from-typing-import-List-Dict-from-pyd-1743447350035.txt +227 -0
- attached_assets/__init__.py +1 -0
- attached_assets/consolidated_analysis.py +205 -0
- attached_assets/file_handler.py +37 -0
- attached_assets/gap_analysis.py +77 -0
- attached_assets/ifrs9_analysis.py +146 -0
- attached_assets/pdf_generator.py +201 -0
attached_assets/Pasted--Does-the-current-PD-model-align-with-the-latest-IFRS9-regulatory-requirements-Based-on-the-pr-1743447418785.txt
ADDED
@@ -0,0 +1,184 @@
🔹 Does the current PD model align with the latest IFRS9 regulatory requirements?

➡️ Based on the provided original document and best practice evidence, there is insufficient data to conclusively determine whether the current PD model aligns with the latest IFRS 9 regulatory requirements. The original document lacks specific details about the current PD model's structure, validation processes, and alignment with IFRS 9 standards.

However, the best practice evidence highlights several key areas that should be addressed for compliance, such as the need for robust validation frameworks, regular model validation, and the challenges posed by intense data requirements and ambiguous regulatory standards. It also emphasizes the importance of maintaining independence, completeness, adequacy, and soundness in validation processes.

The gap lies in the absence of detailed information in the original document regarding how the current PD model addresses these best practice elements and challenges. Without this information, it is not possible to assess whether the model follows best practices or identify specific areas of non-compliance. To bridge this gap, a comprehensive review of the current PD model's validation framework, data usage, and alignment with IFRS 9 requirements is necessary.

🔹 Are the data sources used for PD calculation comprehensive and up-to-date?

➡️ To perform a GAP analysis on the query regarding the comprehensiveness and currency of data sources used for Probability of Default (PD) calculation, we need to compare the original document against the best practice evidence provided.

**Best Practice Evidence Summary:**
1. Historical data for PD calculation should cover 5-10 years.
2. External data can be used if historical data is insufficient.
3. PDs should incorporate cyclical and non-cyclical, systematic, and obligor-specific information.
4. Industry-specific factors and macroeconomic indicators should be used to enhance predictive power.
5. Frequent re-rating of obligors is necessary to capture changes in PDs.
6. Data used in pooled models must meet specific data requirements or be adjusted accordingly.

**Original Document Analysis:**
- The original document must be assessed to determine if it includes:
  - Historical data spanning 5-10 years or the use of external data if such historical data is unavailable.
  - Consideration of cyclical and non-cyclical, systematic, and obligor-specific information.
  - Utilization of industry-specific factors and macroeconomic indicators.
  - Procedures for frequent re-rating of obligors.
  - Verification that pooled data meets the necessary data requirements or is adjusted.

**GAP Analysis:**
- **Historical Data:** If the original document does not specify the use of 5-10 years of historical data or the use of external data when historical data is lacking, there is a gap.
- **Incorporation of Information:** If the document does not mention the inclusion of cyclical, non-cyclical, systematic, and obligor-specific information, there is a gap.
- **Industry and Macroeconomic Factors:** A gap exists if the document does not address the use of industry-specific factors and macroeconomic indicators.
- **Frequent Re-rating:** If the document lacks procedures for frequent re-rating of obligors, this is a gap.
- **Pooled Data Verification:** If the document does not ensure that pooled data meets data requirements or is adjusted, there is a gap.

**Conclusion:**
Without the specific content of the original document, it is not possible to definitively identify gaps. If the document does not address the points outlined in the best practice evidence, it does not follow best practices, and the gaps should be addressed accordingly. If the document does cover these aspects, it aligns with best practices. If the original document is not provided, there is insufficient data to perform a complete GAP analysis.

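As an illustration of how the first best-practice point above (5-10 years of history) could be checked in practice, a minimal sketch is shown below; the file name and column names are assumptions, not taken from the validation document:

import pandas as pd

# Hypothetical default history, one row per obligor-year observation
history = pd.read_csv("default_history.csv", parse_dates=["observation_date"])

years_covered = (history["observation_date"].max() - history["observation_date"].min()).days / 365.25

# Best practice cited above: 5-10 years of internal history, external data otherwise
if years_covered < 5:
    print(f"Only {years_covered:.1f} years of internal history - consider supplementing with external data")
else:
    print(f"{years_covered:.1f} years of internal default history available")
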
🔹 Is there a robust process in place for the regular back-testing of PD models?

➡️ Based on the original validation document and the best practice evidence provided, there appears to be a gap in the process for the regular back-testing of PD models. The best practice evidence emphasizes the importance of regular back-testing activities to assess the accuracy of the term structure of PD models. It suggests that validation should focus on minimizing predictive error using historical and economic forecast data, and should include measures of forecasting error and assessments of the discriminatory and predictive power of PDs. Additionally, it highlights the need for adequate governance, including policies and procedures, to ensure model accuracy and consistency.

The original document, however, does not provide sufficient detail on whether these best practices are being followed. There is no mention of regular back-testing activities, measures of forecasting error, or assessments of the discriminatory and predictive power of PDs. Furthermore, the document lacks information on the governance framework in place to ensure the accuracy and consistency of the models.

Therefore, the gap identified is the lack of evidence in the original document regarding the implementation of a robust process for regular back-testing of PD models, as well as the absence of a detailed governance framework to support this process. To align with best practices, the document should include specific details on the back-testing activities conducted, the measures used to assess model performance, and the governance policies and procedures established to ensure model accuracy and consistency.

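To make the back-testing measures referred to above concrete, a minimal one-period back-test sketch is shown below: it compares realised defaults per rating pool against the predicted PD with a binomial test (requires SciPy 1.7+; the pool figures are purely illustrative):

from scipy.stats import binomtest

# Hypothetical rating pools: (name, predicted PD, number of obligors, observed defaults)
pools = [
    ("A", 0.005, 1200, 4),
    ("B", 0.020, 800, 22),
    ("C", 0.060, 300, 31),
]

for name, pd_est, n, observed in pools:
    # Two-sided test of the observed default count against the predicted PD
    result = binomtest(observed, n, pd_est)
    flag = "REVIEW" if result.pvalue < 0.05 else "ok"
    print(f"Pool {name}: {observed}/{n} defaults vs predicted PD {pd_est:.1%}, p-value {result.pvalue:.3f} -> {flag}")
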
🔹 How is the segmentation of portfolios handled in the PD model, and is it appropriate?

➡️ Based on the original document and best practice evidence provided, the GAP analysis for the segmentation of portfolios in the PD model is as follows:

1. **Segmentation Definition and Objectives**: The original document provides a clear definition of SME portfolio segmentation and outlines its objectives, which align with best practices. The focus on optimizing credit portfolio management and considering credit risk is appropriate.

2. **Segmentation Characteristics**: The document lists meaningful dimensions for segmentation, such as market segment, origination channel, geographic location, cohort, and underwriting criteria. These dimensions are consistent with best practices, as they allow for a comprehensive understanding of different portfolio segments.

3. **Issues and Challenges**: The document acknowledges challenges such as the lack of an intrinsic size metric for SMEs and the need for objective and reproducible segmentation criteria. It also emphasizes the importance of having a statistically meaningful number of segments, which aligns with best practices.

4. **Consideration of Behavioural Lives and Historical Information**: The best practice evidence suggests that segmentation should reflect different behavioural lives for different portfolio segments and consider whether historical behavioural information captures current conditions and forward-looking information. The original document does not explicitly address these aspects, indicating a gap in ensuring that segmentation reflects behavioural differences and incorporates relevant historical and forward-looking data.

**GAP**: The original document lacks explicit guidance on ensuring that segmentation reflects different behavioural lives of portfolio segments and the integration of historical and forward-looking information. This gap suggests that the segmentation approach may not fully capture the dynamic nature of credit risk and behavioural changes over time, which is crucial for accurate PD modeling.

**Recommendation**: To align with best practices, the document should include strategies for incorporating behavioural differences and adjusting historical data to reflect current and future conditions in the segmentation process. This would enhance the accuracy and relevance of the PD model.

🔹 Are the assumptions used in the PD model clearly documented and justified?

➡️ Based on the original validation document provided, it is unclear whether the assumptions used in the Probability of Default (PD) model are clearly documented and justified. The best practice evidence suggests that a well-documented model facilitates validation and periodic review. However, the original document does not provide sufficient information to determine if the assumptions are explicitly stated and supported with appropriate justification. This represents a gap in documentation, as best practices require clear articulation and rationale for all assumptions to ensure transparency and ease of validation. To address this gap, the original document should be revised to include a comprehensive section detailing each assumption, along with the reasoning and evidence supporting them.

🔹 What is the process for incorporating forward-looking information into the PD model?

➡️ To perform a GAP analysis on the process for incorporating forward-looking information into the Probability of Default (PD) model, we need to compare the original document's approach with the best practice evidence provided.

**Original Document:**
The original document mentions that the Group formulates a 'base case' view of future economic variables and a range of other possible forecast scenarios. These forecasts are then used to adjust estimates of PDs. However, the document does not provide detailed information on how forward-looking information is integrated into the PD model, nor does it specify the methodologies or considerations involved in this process.

**Best Practice Evidence:**
The best practice evidence outlines several key considerations for incorporating forward-looking information into Expected Credit Loss (ECL) models under IFRS 9:
1. Use of sound judgment and generally accepted methods for economic analysis and forecasting.
2. Demonstration of how relevant, reasonable, and supportable information is considered in the ECL assessment.
3. Application of experienced credit judgment in considering future scenarios and their potential impacts.
4. Inclusion of information even if the likelihood of an event is low or its impact is uncertain, unless it is not reasonable and supportable.
5. Documentation and justification for excluding information in exceptional circumstances.
6. Unbiased consideration of relevant factors affecting creditworthiness and cash shortfalls.

**GAP Analysis:**
The original document lacks specific details on several best practice elements:
- It does not describe the methodologies or frameworks used for economic analysis and forecasting.
- There is no mention of how the Group ensures the information is reasonable and supportable.
- The document does not discuss the application of credit judgment in scenario analysis or the consideration of low-likelihood events.
- There is no information on the documentation and justification process for excluding certain information.

**Conclusion:**
The original document does not fully align with best practices for incorporating forward-looking information into the PD model. It lacks detailed descriptions of the methodologies, judgment applications, and documentation processes that are considered best practice under IFRS 9. To close this gap, the document should include comprehensive details on these aspects to ensure a robust and transparent approach to integrating forward-looking information into the PD model.

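For orientation only, the 'base case plus alternative scenarios' mechanism described above typically feeds a probability-weighted ECL; a minimal sketch with assumed scenario weights, PDs, LGD and EAD is:

# Illustrative probability-weighted 12-month ECL across macroeconomic scenarios
scenarios = {
    "base":     (0.60, 0.020),   # (scenario weight, scenario-adjusted PD)
    "upside":   (0.15, 0.012),
    "downside": (0.25, 0.045),
}
lgd = 0.40          # loss given default (assumed)
ead = 1_000_000     # exposure at default (assumed)

ecl = sum(weight * pd_s * lgd * ead for weight, pd_s in scenarios.values())
print(f"Probability-weighted ECL: {ecl:,.0f}")
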
🔹 Is there a mechanism for ongoing monitoring and validation of the PD model's performance?

➡️ Based on the original validation document provided, there is insufficient data to determine whether there is a mechanism for ongoing monitoring and validation of the PD model's performance. The document does not explicitly mention any established governance, policies, or procedures for ensuring the accuracy and consistency of the model, nor does it detail any periodic validation or monitoring processes.

Best practice evidence suggests that there should be a structured approach to model validation, including periodic validations and ongoing monitoring to assess the model's performance and identify any potential deterioration. This includes comparing the model's performance at development with its performance over subsequent periods and using monitoring as an early warning indicator.

The gap identified is the lack of explicit mention of these best practices in the original document. To align with best practices, the document should include detailed procedures for periodic validation and ongoing monitoring, as well as governance structures to ensure the model's continued accuracy and relevance.

🔹 How are overrides and expert judgments applied in the PD model, and are they documented?

➡️ To perform a GAP analysis on the application of overrides and expert judgments in the PD model, we need to compare the original validation document against the best practice evidence provided. However, the original document is not provided in your query, which makes it impossible to directly assess whether it follows best practices. Therefore, I will outline the best practices based on the evidence provided and identify potential gaps that could exist if these practices are not followed.

### Best Practices for Overrides and Expert Judgments:

1. **Framework and Guidelines**: There should be a clear framework with detailed guidelines and procedures for applying human judgment, including the use of pre-defined questionnaires.

2. **Documentation**: The use of human judgment must be documented to ensure that the rating assignment can be understood and replicated by a third party.

3. **Criteria for Qualitative Inputs**: Institutions should have clear criteria for using qualitative model inputs and ensure consistent application across personnel.

4. **Policies for Overrides**: There should be specific policies for using overrides in the rating assignment process, covering both inputs and outputs. These policies should be conservative, limiting potential decreases in estimates.

5. **Documentation of Overrides**: Each override should be documented with its scale and rationale, including a predefined list of justifications, the date, and the person responsible.

6. **Monitoring and Analysis**: Institutions should regularly monitor override levels and justifications, specifying maximum acceptable rates and taking measures if these are breached. They should also analyze the performance of exposures with overrides.

### Potential Gaps:

- **Lack of Framework**: If the original document does not establish a clear framework for applying human judgment, this would be a significant gap.

- **Inadequate Documentation**: Failure to document the use of human judgment or overrides comprehensively would not align with best practices.

- **Unclear Criteria**: If there are no clear criteria for qualitative inputs or inconsistent application, this would be a gap.

- **Absence of Override Policies**: Not having specific, conservative policies for overrides would be a deviation from best practices.

- **Insufficient Monitoring**: If the document does not outline regular monitoring and analysis of overrides, this would be a gap.

Without the original validation document, it is not possible to definitively identify specific gaps. If you can provide the original document or more details, a more precise GAP analysis can be conducted.

🔹 Are there clear governance and oversight structures for PD model development and validation?

➡️ Based on the original validation document and the best practice evidence provided, there appears to be a gap in the governance and oversight structures for PD model development and validation. The best practice evidence outlines several key components that should be present in an adequate governance framework, including the establishment of policies and procedures to ensure model accuracy, consistency, and risk component estimation. It also emphasizes the need for model validation at both the development stage and through periodic validation and monitoring thereafter. Additionally, the best practices highlight the importance of having a governance framework that includes transparency between accounting and risk interpretations, a forum for challenging and approving impairments, and a strong focus on data integrity and model validation.

The original document, however, does not provide sufficient information to confirm the presence of these elements. There is no mention of specific policies or procedures in place to ensure model accuracy and consistency, nor is there evidence of a structured process for periodic validation and monitoring. Furthermore, the document lacks details on the governance framework over model design, development, and maintenance, as well as the existence of forums for review and approval of impairments. The absence of these critical components suggests a gap between the current practices and the best practice standards. Therefore, it is recommended that the organization establish a comprehensive governance framework that includes clear policies, procedures, and oversight mechanisms to align with best practices.

🔹 What are the key risk factors considered in the PD model, and how are they quantified?

➡️ To perform a GAP analysis, we need to compare the original document's content regarding the key risk factors considered in the Probability of Default (PD) model and their quantification against the best practice evidence provided.

**Best Practice Evidence Summary:**
1. PD models should incorporate both quantitative and qualitative factors indicative of credit risk.
2. If qualitative factors cannot be included in the quantitative assessment, they should be considered separately.
3. PD models can be developed based on historical default data and should reflect current conditions (PIT PD) or average conditions over an economic cycle (TTC PD).
4. PD estimates should be based on statistical models using both internal data and, where available, market data.

**Original Document Analysis:**
The original document is not provided in the query, so we cannot directly assess its content. However, the best practice evidence outlines clear expectations for PD models, including the integration of qualitative factors, the use of historical data, and the application of statistical models.

**GAP Analysis:**
Without the original document, we cannot definitively determine if there is a gap. However, the analysis should focus on whether the original document:
- Includes both quantitative and qualitative factors in the PD model.
- Addresses the recalibration or adjustment of PDs when qualitative factors cannot be quantitatively assessed.
- Describes the use of historical default data and the types of PD models (PIT PD, TTC PD).
- Utilizes statistical models and considers market data for large corporate counterparties.

If the original document lacks any of these elements, it would not fully align with best practices, indicating a gap. If the document does not provide sufficient detail on these aspects, it would also suggest a gap in meeting best practice standards.

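As a sketch of the 'statistical models using internal data' point above, a 12-month PD could be estimated with a simple logistic regression; the data file and feature names below are assumptions for illustration only:

import pandas as pd
from sklearn.linear_model import LogisticRegression

data = pd.read_csv("obligor_history.csv")
features = ["leverage", "interest_coverage", "sector_default_rate", "gdp_growth"]

# Fit on a binary 12-month default flag taken from internal history
model = LogisticRegression(max_iter=1000)
model.fit(data[features], data["defaulted_within_12m"])

# Predicted probabilities serve as point-in-time 12-month PDs per obligor
data["pd_12m"] = model.predict_proba(data[features])[:, 1]
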
🔹 Is there a process for recalibrating the PD model in response to changing economic conditions?

➡️ The original document does not provide sufficient information to determine whether there is a process for recalibrating the PD model in response to changing economic conditions. The best practice evidence suggests that under a Point-in-Time (PIT) Probability of Default (PD) approach, PDs should be frequently updated to reflect changes in both cyclical and non-cyclical factors, including industry-specific and macroeconomic indicators. This requires a process for frequent re-rating of obligors to capture changes in their PDs.

The original document mentions the use of forecasts to adjust PD estimates based on advice from the Group Market Risk Committee and economic experts, but it does not explicitly describe a process for recalibrating the PD model in response to changing economic conditions. Specifically, it lacks details on how often recalibration occurs, what specific factors are considered, and how the recalibration process is implemented.

Therefore, the gap lies in the absence of a detailed and explicit process for recalibrating the PD model in response to changing economic conditions, as recommended by best practice evidence. To align with best practices, the document should include a clear description of the recalibration process, including the frequency of updates, the factors considered, and the methodology used to adjust PD estimates.

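Purely as an illustration of the recalibration step discussed above, a crude point-in-time adjustment of a through-the-cycle PD using a current default-rate estimate might look like this (all figures assumed; a production framework would use a fitted macroeconomic model):

ttc_pd = 0.025                 # long-run average PD for the segment (assumed)
long_run_default_rate = 0.022  # central tendency over the cycle (assumed)
current_default_rate = 0.031   # latest observed or forecast default rate (assumed)

# Simple scalar recalibration toward current conditions
pit_pd = ttc_pd * (current_default_rate / long_run_default_rate)
print(f"PIT-adjusted PD: {pit_pd:.3%}")
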
🔹 How is model risk managed and mitigated in the context of PD estimation under IFRS9?

➡️ Based on the original validation document provided, there are several gaps in the management and mitigation of model risk in the context of PD estimation under IFRS 9 when compared to best practice evidence.

1. **Lack of Comprehensive Model Validation Framework**: The document mentions the need for validation and monitoring tests to assess credit scoring models' quality but does not provide a detailed framework or methodology for how this validation should be conducted. Best practices would include a comprehensive validation framework that outlines specific tests, frequency of validation, and criteria for model performance assessment.

2. **Insufficient Detail on Model Risk Management**: While the document acknowledges increased model risk over long time horizons, it does not elaborate on specific strategies or controls to manage and mitigate this risk. Best practices would involve detailed risk management strategies, including stress testing, back-testing, and scenario analysis to ensure model robustness.

3. **Data Quality and Governance**: The document highlights the importance of underlying data quality but lacks a discussion on data governance practices. Best practices would include a robust data governance framework to ensure data accuracy, completeness, and timeliness, which are critical for reliable PD estimation.

4. **Ongoing Monitoring and Reporting**: There is no mention of ongoing monitoring and reporting mechanisms to track model performance over time. Best practices would include regular monitoring and reporting processes to identify model drift and recalibrate models as necessary.

5. **Documentation and Transparency**: The document does not provide sufficient detail on the documentation and transparency of the model development and validation process. Best practices would require thorough documentation of model assumptions, limitations, and validation results to ensure transparency and facilitate regulatory compliance.

6. **Integration with Risk Management Framework**: The document does not discuss how PD models are integrated into the broader risk management framework of the institution. Best practices would involve aligning PD models with the institution's overall risk appetite and risk management strategies.

In summary, the original document lacks detailed guidance on several key aspects of model risk management and mitigation, which are essential for aligning with best practices in the context of PD estimation under IFRS 9.

attached_assets/Pasted-Document-Preview-File-report-test-txt-Size-5-07-KB-File-Type-TXT-Preview-document-content--1745328656867.txt
ADDED
@@ -0,0 +1,88 @@
Document Preview
File: report_test.txt
Size: 5.07 KB
File Type: TXT
Preview document content

❌ Error during GAP analysis: Error code: 401 - {'error': {'message': 'Incorrect API key provided: sk-rU153***********************************************************************************A1cA. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}

AuthenticationError: Error code: 401 - {'error': {'message': 'Incorrect API key provided: sk-rU153***********************************************************************************A1cA. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}
Traceback:
File "/home/runner/workspace/app.py", line 162, in main
    results = run_gap_analysis_pipeline(file_contents)
File "/home/runner/workspace/attached_assets/ifrs9_analysis.py", line 141, in run_gap_analysis_pipeline
    final_state = graph.invoke(initial_state)
File "/home/runner/workspace/.pythonlibs/lib/python3.11/site-packages/langgraph/pregel/__init__.py", line 2683, in invoke
    for chunk in self.stream(
File "/home/runner/workspace/.pythonlibs/lib/python3.11/site-packages/langgraph/pregel/__init__.py", line 2331, in stream
    for _ in runner.tick(
File "/home/runner/workspace/.pythonlibs/lib/python3.11/site-packages/langgraph/pregel/runner.py", line 146, in tick
    run_with_retry(
File "/home/runner/workspace/.pythonlibs/lib/python3.11/site-packages/langgraph/pregel/retry.py", line 40, in run_with_retry
    return task.proc.invoke(task.input, config)
File "/home/runner/workspace/.pythonlibs/lib/python3.11/site-packages/langgraph/utils/runnable.py", line 606, in invoke
    input = step.invoke(input, config, **kwargs)
File "/home/runner/workspace/.pythonlibs/lib/python3.11/site-packages/langgraph/utils/runnable.py", line 371, in invoke
    ret = context.run(self.func, *args, **kwargs)
File "/home/runner/workspace/attached_assets/ifrs9_analysis.py", line 83, in generate_queries
    response = query_chain.run(doc=state.original_doc, query_count=query_count)
File "/home/runner/workspace/.pythonlibs/lib/python3.11/site-packages/langchain_core/_api/deprecation.py", line 181, in warning_emitting_wrapper
    return wrapped(*args, **kwargs)
File "/home/runner/workspace/.pythonlibs/lib/python3.11/site-packages/langchain/chains/base.py", line 611, in run
    return self(kwargs, callbacks=callbacks, tags=tags, metadata=metadata)[
File "/home/runner/workspace/.pythonlibs/lib/python3.11/site-packages/langchain_core/_api/deprecation.py", line 181, in warning_emitting_wrapper
    return wrapped(*args, **kwargs)
File "/home/runner/workspace/.pythonlibs/lib/python3.11/site-packages/langchain/chains/base.py", line 389, in __call__
    return self.invoke(
File "/home/runner/workspace/.pythonlibs/lib/python3.11/site-packages/langchain/chains/base.py", line 170, in invoke
    raise e
File "/home/runner/workspace/.pythonlibs/lib/python3.11/site-packages/langchain/chains/base.py", line 160, in invoke
    self._call(inputs, run_manager=run_manager)
File "/home/runner/workspace/.pythonlibs/lib/python3.11/site-packages/langchain/chains/llm.py", line 126, in _call
    response = self.generate([inputs], run_manager=run_manager)
File "/home/runner/workspace/.pythonlibs/lib/python3.11/site-packages/langchain/chains/llm.py", line 138, in generate
    return self.llm.generate_prompt(
File "/home/runner/workspace/.pythonlibs/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 843, in generate_prompt
    return self.generate(prompt_messages, stop=stop, callbacks=callbacks, **kwargs)
File "/home/runner/workspace/.pythonlibs/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 683, in generate
    self._generate_with_cache(
File "/home/runner/workspace/.pythonlibs/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 908, in _generate_with_cache
    result = self._generate(
File "/home/runner/workspace/.pythonlibs/lib/python3.11/site-packages/langchain_community/chat_models/openai.py", line 476, in _generate
    response = self.completion_with_retry(
File "/home/runner/workspace/.pythonlibs/lib/python3.11/site-packages/langchain_community/chat_models/openai.py", line 387, in completion_with_retry
    return self.client.create(**kwargs)
File "/home/runner/workspace/.pythonlibs/lib/python3.11/site-packages/openai/_utils/_utils.py", line 279, in wrapper
    return func(*args, **kwargs)
File "/home/runner/workspace/.pythonlibs/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 914, in create
    return self._post(
File "/home/runner/workspace/.pythonlibs/lib/python3.11/site-packages/openai/_base_client.py", line 1242, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
File "/home/runner/workspace/.pythonlibs/lib/python3.11/site-packages/openai/_base_client.py", line 919, in request
    return self._request(
File "/home/runner/workspace/.pythonlibs/lib/python3.11/site-packages/openai/_base_client.py", line 1023, in _request
    raise self._make_status_error_from_response(err.response) from None
attached_assets/Pasted-Document-Preview-File-report-test-txt-Size-5-07-KB-File-Type-TXT-Preview-document-content--1745328760272.txt
ADDED
@@ -0,0 +1,88 @@
Document Preview
File: report_test.txt
Size: 5.07 KB
File Type: TXT
Preview document content

❌ Error during GAP analysis: Error code: 401 - {'error': {'message': 'Incorrect API key provided: sk-rU153***********************************************************************************A1cA. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}

AuthenticationError: Error code: 401 - {'error': {'message': 'Incorrect API key provided: sk-rU153***********************************************************************************A1cA. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}
Traceback:
File "/home/runner/workspace/app.py", line 162, in main
    results = run_gap_analysis_pipeline(file_contents)
File "/home/runner/workspace/attached_assets/ifrs9_analysis.py", line 141, in run_gap_analysis_pipeline
    final_state = graph.invoke(initial_state)
File "/home/runner/workspace/.pythonlibs/lib/python3.11/site-packages/langgraph/pregel/__init__.py", line 2683, in invoke
    for chunk in self.stream(
File "/home/runner/workspace/.pythonlibs/lib/python3.11/site-packages/langgraph/pregel/__init__.py", line 2331, in stream
    for _ in runner.tick(
File "/home/runner/workspace/.pythonlibs/lib/python3.11/site-packages/langgraph/pregel/runner.py", line 146, in tick
    run_with_retry(
File "/home/runner/workspace/.pythonlibs/lib/python3.11/site-packages/langgraph/pregel/retry.py", line 40, in run_with_retry
    return task.proc.invoke(task.input, config)
File "/home/runner/workspace/.pythonlibs/lib/python3.11/site-packages/langgraph/utils/runnable.py", line 606, in invoke
    input = step.invoke(input, config, **kwargs)
File "/home/runner/workspace/.pythonlibs/lib/python3.11/site-packages/langgraph/utils/runnable.py", line 371, in invoke
    ret = context.run(self.func, *args, **kwargs)
File "/home/runner/workspace/attached_assets/ifrs9_analysis.py", line 83, in generate_queries
    response = query_chain.run(doc=state.original_doc, query_count=query_count)
File "/home/runner/workspace/.pythonlibs/lib/python3.11/site-packages/langchain_core/_api/deprecation.py", line 181, in warning_emitting_wrapper
    return wrapped(*args, **kwargs)
File "/home/runner/workspace/.pythonlibs/lib/python3.11/site-packages/langchain/chains/base.py", line 611, in run
    return self(kwargs, callbacks=callbacks, tags=tags, metadata=metadata)[
File "/home/runner/workspace/.pythonlibs/lib/python3.11/site-packages/langchain_core/_api/deprecation.py", line 181, in warning_emitting_wrapper
    return wrapped(*args, **kwargs)
File "/home/runner/workspace/.pythonlibs/lib/python3.11/site-packages/langchain/chains/base.py", line 389, in __call__
    return self.invoke(
File "/home/runner/workspace/.pythonlibs/lib/python3.11/site-packages/langchain/chains/base.py", line 170, in invoke
    raise e
File "/home/runner/workspace/.pythonlibs/lib/python3.11/site-packages/langchain/chains/base.py", line 160, in invoke
    self._call(inputs, run_manager=run_manager)
File "/home/runner/workspace/.pythonlibs/lib/python3.11/site-packages/langchain/chains/llm.py", line 126, in _call
    response = self.generate([inputs], run_manager=run_manager)
File "/home/runner/workspace/.pythonlibs/lib/python3.11/site-packages/langchain/chains/llm.py", line 138, in generate
    return self.llm.generate_prompt(
File "/home/runner/workspace/.pythonlibs/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 843, in generate_prompt
    return self.generate(prompt_messages, stop=stop, callbacks=callbacks, **kwargs)
File "/home/runner/workspace/.pythonlibs/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 683, in generate
    self._generate_with_cache(
File "/home/runner/workspace/.pythonlibs/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py", line 908, in _generate_with_cache
    result = self._generate(
File "/home/runner/workspace/.pythonlibs/lib/python3.11/site-packages/langchain_community/chat_models/openai.py", line 476, in _generate
    response = self.completion_with_retry(
File "/home/runner/workspace/.pythonlibs/lib/python3.11/site-packages/langchain_community/chat_models/openai.py", line 387, in completion_with_retry
    return self.client.create(**kwargs)
File "/home/runner/workspace/.pythonlibs/lib/python3.11/site-packages/openai/_utils/_utils.py", line 279, in wrapper
    return func(*args, **kwargs)
File "/home/runner/workspace/.pythonlibs/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 914, in create
    return self._post(
File "/home/runner/workspace/.pythonlibs/lib/python3.11/site-packages/openai/_base_client.py", line 1242, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
File "/home/runner/workspace/.pythonlibs/lib/python3.11/site-packages/openai/_base_client.py", line 919, in request
    return self._request(
File "/home/runner/workspace/.pythonlibs/lib/python3.11/site-packages/openai/_base_client.py", line 1023, in _request
    raise self._make_status_error_from_response(err.response) from None
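
Both logs above fail with the same 401: the OpenAI client rejected the configured key. Since the pipeline below reads the key from os.environ["OPENAI_API_KEY"], a minimal pre-flight check before invoking the pipeline could look like this sketch (the prefix check is only a heuristic):

import os
import sys

key = os.environ.get("OPENAI_API_KEY", "")
if not key:
    sys.exit("OPENAI_API_KEY is not set - add it to the environment / secrets")
if not key.startswith("sk-"):
    sys.exit("OPENAI_API_KEY does not look like an OpenAI key - check for truncation or typos")
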
attached_assets/Pasted-import-os-import-uuid-import-logging-from-pathlib-import-Path-from-typing-import-List-Dict-from-pyd-1743447350035.txt
ADDED
@@ -0,0 +1,227 @@
import os
import uuid
import logging
from pathlib import Path
from typing import List, Dict
from pydantic import BaseModel, Field

from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.output_parsers import PydanticOutputParser
from langchain.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_core.documents import Document

from langgraph.graph import StateGraph, END

# ========================
# Logging Setup
# ========================
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s"
)
logger = logging.getLogger(__name__)

LLM_LOG_DIR = Path("logs/llm_logs")
LLM_LOG_DIR.mkdir(parents=True, exist_ok=True)

def log_llm_io(step: str, input_text: str, output_text: str):
    filename = LLM_LOG_DIR / f"{step}_{uuid.uuid4().hex[:8]}.txt"
    with open(filename, "w", encoding="utf-8") as f:
        f.write("=== INPUT ===\n")
        f.write(input_text.strip() + "\n\n")
        f.write("=== OUTPUT ===\n")
        f.write(output_text.strip())
    logger.debug(f"📝 Logged LLM I/O to: {filename}")

# ========================
# LLM Setup (GPT-4o)
# ========================
llm = ChatOpenAI(
    model="gpt-4o",
    temperature=0,
    api_key=os.environ["OPENAI_API_KEY"]
)

# ========================
# Embeddings & Vector Store
# ========================
embedding_model = HuggingFaceEmbeddings(
    model_name="intfloat/e5-large-v2",
    model_kwargs={"device": "cpu"}
)

vector_store = FAISS.load_local(
    "gpu_vectorstore",
    embeddings=embedding_model,
    allow_dangerous_deserialization=True
)

retriever_catA = vector_store.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 10, "filter": {"category": "catA"}}
)

retriever_catB = vector_store.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 10, "filter": {"category": "catB"}}
)

# ========================
# State Definition
# ========================
class GapAnalysisState(BaseModel):
    original_doc: str
    queries: List[str] = []
    raw_results: Dict[str, Dict[str, List[Document]]] = {}
    filtered_results: Dict[str, List[Document]] = {}
    gap_analyses: Dict[str, str] = {}

# ========================
# Step 1: Query Generation
# ========================
class QueryListOutput(BaseModel):
    queries: List[str] = Field(..., description="List of GAP analysis questions")

query_parser = PydanticOutputParser(pydantic_object=QueryListOutput)

query_prompt = PromptTemplate.from_template(
    "You are an expert in IFRS9 PD validation. Based on the following document, extract 12 critical questions to perform a GAP analysis.\n\n{doc}\n\n{format_instructions}"
).partial(format_instructions=query_parser.get_format_instructions())

query_chain = LLMChain(llm=llm, prompt=query_prompt)

def generate_queries(state: GapAnalysisState) -> GapAnalysisState:
    logger.info("🟦 Generating queries from original document...")
    llm_input = query_prompt.format(doc=state.original_doc)
    response = query_chain.run(doc=state.original_doc)
    log_llm_io("01_generate_queries", llm_input, response)
    state.queries = query_parser.parse(response).queries
    logger.info(f"✅ Generated {len(state.queries)} queries.")
    return state

# ========================
# Step 2: Vector Retrieval
# ========================
def run_retrieval(state: GapAnalysisState) -> GapAnalysisState:
    logger.info("🟦 Running vector store retrieval for all queries...")
    state.raw_results = {}
    for query in state.queries:
        docsA = retriever_catA.get_relevant_documents(query)[:10]
        docsB = retriever_catB.get_relevant_documents(query)[:10]
        state.raw_results[query] = {
            "catA": docsA,
            "catB": docsB
        }
        logger.debug(f"🔍 Retrieved {len(docsA)} catA and {len(docsB)} catB docs for query: '{query}'")
    logger.info("✅ Retrieval complete.")
    return state

# ========================
# Step 3: Relevance Filtering
# ========================
class RelevantDocSelector(BaseModel):
    relevant_indices: List[int] = Field(..., description="Indices of relevant excerpts")

filter_parser = PydanticOutputParser(pydantic_object=RelevantDocSelector)

filter_prompt = PromptTemplate.from_template(
    "You are reviewing documents for this GAP query:\n\nQuery: {query}\n\n"
    "Original Validation Document (truncated):\n{original_doc}\n\n"
    "Candidate Excerpts:\n{excerpts}\n\n"
    "{format_instructions}"
).partial(format_instructions=filter_parser.get_format_instructions())

filter_chain = LLMChain(llm=llm, prompt=filter_prompt)

def filter_docs(state: GapAnalysisState) -> GapAnalysisState:
    logger.info("🟦 Filtering relevant documents for each query...")
    for query in state.queries:
        docs = state.raw_results[query]["catA"] + state.raw_results[query]["catB"]
        excerpt_texts = "\n".join([f"[{i}] {doc.page_content}" for i, doc in enumerate(docs)])

        llm_input = filter_prompt.format(
            query=query,
            original_doc=state.original_doc[:3000],
            excerpts=excerpt_texts
        )
        response = filter_chain.run(
            query=query,
            original_doc=state.original_doc[:3000],
            excerpts=excerpt_texts
        )
        log_llm_io(f"02_filter_docs_{state.queries.index(query)+1}", llm_input, response)

        try:
            relevant_indices = filter_parser.parse(response).relevant_indices
        except Exception as e:
            logger.warning(f"⚠️ Failed to parse relevant indices for query '{query}': {e}")
            relevant_indices = []

        state.filtered_results[query] = [docs[i] for i in relevant_indices if i < len(docs)]
    logger.info("✅ Document filtering complete.")
    return state

# ========================
# Step 4: GAP Analysis
# ========================
gap_prompt = PromptTemplate.from_template(
    "You are performing a GAP analysis for this query:\n\n{query}\n\n"
    "Based on the original validation document and the best practice evidence, determine whether the document follows best practices. "
    "If not, explain the gap. If insufficient data, say so.\n\n"
    "Original Document:\n{original_doc}\n\n"
    "Best Practice Evidence:\n{best_practices}\n\n"
    "GAP Analysis Paragraph:"
)

gap_chain = LLMChain(llm=llm, prompt=gap_prompt)

def generate_gap_analysis(state: GapAnalysisState) -> GapAnalysisState:
    logger.info("🟦 Generating GAP analysis for each query...")
    for query in state.queries:
        relevant_texts = "\n\n".join([doc.page_content for doc in state.filtered_results.get(query, [])])
        llm_input = gap_prompt.format(
            query=query,
            original_doc=state.original_doc[:3000],
            best_practices=relevant_texts[:3000]
        )
        response = gap_chain.run(
            query=query,
            original_doc=state.original_doc[:3000],
            best_practices=relevant_texts[:3000]
        )
        log_llm_io(f"03_gap_analysis_{state.queries.index(query)+1}", llm_input, response)
        state.gap_analyses[query] = response.strip()
    logger.info("✅ GAP analysis generation complete.")
    return state

# ========================
# LangGraph Pipeline
# ========================
builder = StateGraph(GapAnalysisState)

builder.add_node("generate_queries", generate_queries)
builder.add_node("retrieve_docs", run_retrieval)
builder.add_node("filter_docs", filter_docs)
builder.add_node("generate_gap", generate_gap_analysis)

builder.set_entry_point("generate_queries")
builder.add_edge("generate_queries", "retrieve_docs")
builder.add_edge("retrieve_docs", "filter_docs")
builder.add_edge("filter_docs", "generate_gap")
builder.add_edge("generate_gap", END)

graph = builder.compile()

# ========================
# Entrypoint
# ========================
def run_gap_analysis_pipeline(original_doc: str):
    logger.info("🚀 Starting GAP analysis pipeline...")
    initial_state = GapAnalysisState(original_doc=original_doc)
    final_state = graph.invoke(initial_state)
    logger.info("🎉 GAP analysis pipeline completed successfully.")
    return final_state
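
A minimal driver for this pipeline might look like the sketch below; the file name mirrors the report_test.txt preview above, and the import path assumes the module is attached_assets/ifrs9_analysis.py as in the traceback (both are assumptions):

from pathlib import Path
from attached_assets.ifrs9_analysis import run_gap_analysis_pipeline

# Requires OPENAI_API_KEY to be set and the "gpu_vectorstore" FAISS index to exist
doc_text = Path("report_test.txt").read_text(encoding="utf-8")
final_state = run_gap_analysis_pipeline(doc_text)

# langgraph's invoke() returns the final state as a mapping of field names
for query, analysis in final_state["gap_analyses"].items():
    print(f"🔹 {query}\n➡️ {analysis}\n")
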
attached_assets/__init__.py
ADDED
@@ -0,0 +1 @@
# This file makes the attached_assets directory a Python package
attached_assets/consolidated_analysis.py
ADDED
@@ -0,0 +1,205 @@
| 1 |
+
import os
|
| 2 |
+
import numpy as np
|
| 3 |
+
import plotly.express as px
|
| 4 |
+
import plotly.graph_objects as go
|
| 5 |
+
import pandas as pd
|
| 6 |
+
import streamlit as st
|
| 7 |
+
from typing import Dict, List, Tuple
|
| 8 |
+
from langchain_community.chat_models import ChatOpenAI
|
| 9 |
+
from langchain.prompts import PromptTemplate
|
| 10 |
+
from langchain.chains import LLMChain
|
| 11 |
+
from pydantic import BaseModel, Field
|
| 12 |
+
import logging
|
| 13 |
+
|
| 14 |
+
# Set up logging
|
| 15 |
+
logger = logging.getLogger(__name__)
|
| 16 |
+
|
| 17 |
+
class ConsolidatedAnalysisOutput(BaseModel):
|
| 18 |
+
"""Output schema for the consolidated analysis."""
|
| 19 |
+
compliance_percentage: float = Field(..., description="Overall compliance percentage with IFRS9 standards (0-100)")
|
| 20 |
+
executive_summary: str = Field(..., description="Executive summary of the key findings and main areas for improvement")
|
| 21 |
+
key_areas: List[Dict] = Field(..., description="List of key areas with their compliance scores and descriptions")
|
| 22 |
+
detailed_analysis: str = Field(..., description="Detailed consolidated gap analysis")
|
| 23 |
+
|
| 24 |
+
def generate_consolidated_analysis(queries: List[str], gap_analyses: Dict[str, str]) -> ConsolidatedAnalysisOutput:
|
| 25 |
+
"""Generate a consolidated analysis from individual gap analyses."""
|
| 26 |
+
logger.info("Generating consolidated analysis...")
|
| 27 |
+
|
| 28 |
+
# Initialize OpenAI model
|
| 29 |
+
llm = ChatOpenAI(
|
| 30 |
+
temperature=0,
|
| 31 |
+
model="gpt-4o-mini",
|
| 32 |
+
)
|
| 33 |
+
|
| 34 |
+
# Create a combined input from all queries and analyses
|
| 35 |
+
combined_input = "\n\n".join([f"Question: {q}\nAnalysis: {gap_analyses.get(q, '')}" for q in queries])
|
| 36 |
+
|
| 37 |
+
# Create prompt for consolidated analysis
|
| 38 |
+
consolidation_prompt = PromptTemplate.from_template(
|
| 39 |
+
"You are an expert IFRS9 consultant tasked with creating a consolidated gap analysis report.\n\n"
|
| 40 |
+
"Below are individual gap analyses for different aspects of an IFRS9 PD model implementation:\n\n"
|
| 41 |
+
"{combined_analyses}\n\n"
|
| 42 |
+
"Based on these individual analyses, create a consolidated report with the following components:\n"
|
| 43 |
+
"1. An overall compliance percentage (0-100%) that represents how well the current implementation adheres to IFRS9 standards\n"
|
| 44 |
+
"2. An executive summary highlighting the 3-5 most critical findings and areas for improvement\n"
|
| 45 |
+
"3. A list of key areas with individual compliance scores and brief descriptions\n"
|
| 46 |
+
"4. A detailed consolidated analysis that brings together all findings into a cohesive narrative\n\n"
|
| 47 |
+
"Format your response exactly as follows:\n"
|
| 48 |
+
"COMPLIANCE_PERCENTAGE: [overall percentage as a number between 0-100]\n\n"
|
| 49 |
+
"EXECUTIVE_SUMMARY:\n[Your executive summary text here]\n\n"
|
| 50 |
+
"KEY_AREAS:\n[area1_name]|[score1]|[area1_description]\n[area2_name]|[score2]|[area2_description]\n[etc.]\n\n"
|
| 51 |
+
"DETAILED_ANALYSIS:\n[Your detailed consolidated analysis here]"
|
| 52 |
+
)
|
| 53 |
+
|
| 54 |
+
# Create chain for consolidated analysis
|
| 55 |
+
consolidation_chain = LLMChain(llm=llm, prompt=consolidation_prompt)
|
| 56 |
+
|
| 57 |
+
# Run the chain
|
| 58 |
+
response = consolidation_chain.run(combined_analyses=combined_input)
|
| 59 |
+
|
| 60 |
+
# Parse the response
|
| 61 |
+
try:
|
| 62 |
+
sections = {}
|
| 63 |
+
current_section = None
|
| 64 |
+
section_content = []
|
| 65 |
+
|
| 66 |
+
for line in response.split("\n"):
|
| 67 |
+
if line.startswith("COMPLIANCE_PERCENTAGE:"):
|
| 68 |
+
sections["compliance_percentage"] = float(line.split(":", 1)[1].strip())
|
| 69 |
+
elif line.startswith("EXECUTIVE_SUMMARY:"):
|
| 70 |
+
current_section = "executive_summary"
|
| 71 |
+
section_content = []
|
| 72 |
+
elif line.startswith("KEY_AREAS:"):
|
| 73 |
+
if current_section:
|
| 74 |
+
sections[current_section] = "\n".join(section_content).strip()
|
| 75 |
+
current_section = "key_areas"
|
| 76 |
+
section_content = []
|
| 77 |
+
elif line.startswith("DETAILED_ANALYSIS:"):
|
| 78 |
+
if current_section:
|
| 79 |
+
sections[current_section] = "\n".join(section_content).strip()
|
| 80 |
+
current_section = "detailed_analysis"
|
| 81 |
+
section_content = []
|
| 82 |
+
else:
|
| 83 |
+
section_content.append(line)
|
| 84 |
+
|
| 85 |
+
if current_section:
|
| 86 |
+
sections[current_section] = "\n".join(section_content).strip()
|
| 87 |
+
|
| 88 |
+
# Process key areas
|
| 89 |
+
key_areas = []
|
| 90 |
+
for line in sections.get("key_areas", "").split("\n"):
|
| 91 |
+
if "|" in line:
|
| 92 |
+
parts = line.split("|")
|
| 93 |
+
if len(parts) >= 3:
|
| 94 |
+
key_areas.append({
|
| 95 |
+
"name": parts[0].strip(),
|
| 96 |
+
"score": float(parts[1].strip()),
|
| 97 |
+
"description": parts[2].strip()
|
| 98 |
+
})
|
| 99 |
+
|
| 100 |
+
return ConsolidatedAnalysisOutput(
|
| 101 |
+
compliance_percentage=sections.get("compliance_percentage", 0.0),
|
| 102 |
+
executive_summary=sections.get("executive_summary", ""),
|
| 103 |
+
key_areas=key_areas,
|
| 104 |
+
detailed_analysis=sections.get("detailed_analysis", "")
|
| 105 |
+
)
|
| 106 |
+
except Exception as e:
|
| 107 |
+
logger.error(f"Error parsing consolidated analysis: {str(e)}")
|
| 108 |
+
raise
|
| 109 |
+
|
| 110 |
+
def create_compliance_gauge(compliance_percentage: float):
|
| 111 |
+
"""Create a gauge chart for overall compliance percentage."""
|
| 112 |
+
fig = go.Figure(go.Indicator(
|
| 113 |
+
mode="gauge+number",
|
| 114 |
+
value=compliance_percentage,
|
| 115 |
+
domain={'x': [0, 1], 'y': [0, 1]},
|
| 116 |
+
title={'text': "IFRS9 Compliance Score", 'font': {'size': 24}},
|
| 117 |
+
gauge={
|
| 118 |
+
'axis': {'range': [0, 100], 'tickwidth': 1, 'tickcolor': "darkblue"},
|
| 119 |
+
'bar': {'color': "darkblue"},
|
| 120 |
+
'bgcolor': "white",
|
| 121 |
+
'borderwidth': 2,
|
| 122 |
+
'bordercolor': "gray",
|
| 123 |
+
'steps': [
|
| 124 |
+
{'range': [0, 50], 'color': 'red'},
|
| 125 |
+
{'range': [50, 75], 'color': 'orange'},
|
| 126 |
+
{'range': [75, 90], 'color': 'yellow'},
|
| 127 |
+
{'range': [90, 100], 'color': 'green'}
|
| 128 |
+
],
|
| 129 |
+
}
|
| 130 |
+
))
|
| 131 |
+
|
| 132 |
+
fig.update_layout(
|
| 133 |
+
height=300,
|
| 134 |
+
margin=dict(l=20, r=20, t=50, b=20),
|
| 135 |
+
paper_bgcolor="white",
|
| 136 |
+
font={'color': "darkblue", 'family': "Arial"}
|
| 137 |
+
)
|
| 138 |
+
|
| 139 |
+
return fig
|
| 140 |
+
|
| 141 |
+
def create_compliance_areas_chart(key_areas):
|
| 142 |
+
"""Create a horizontal bar chart for key area compliance scores."""
|
| 143 |
+
# Extract data from key areas
|
| 144 |
+
area_names = [area["name"] for area in key_areas]
|
| 145 |
+
scores = [area["score"] for area in key_areas]
|
| 146 |
+
|
| 147 |
+
# Generate color scale based on scores
|
| 148 |
+
colors = ['red' if score < 50 else 'orange' if score < 75 else 'yellow' if score < 90 else 'green'
|
| 149 |
+
for score in scores]
|
| 150 |
+
|
| 151 |
+
fig = go.Figure(go.Bar(
|
| 152 |
+
x=scores,
|
| 153 |
+
y=area_names,
|
| 154 |
+
orientation='h',
|
| 155 |
+
marker_color=colors,
|
| 156 |
+
text=scores,
|
| 157 |
+
textposition='auto',
|
| 158 |
+
))
|
| 159 |
+
|
| 160 |
+
fig.update_layout(
|
| 161 |
+
title="Compliance by Key Areas",
|
| 162 |
+
xaxis_title="Compliance Score (%)",
|
| 163 |
+
yaxis_title="Key Areas",
|
| 164 |
+
height=400,
|
| 165 |
+
margin=dict(l=20, r=20, t=50, b=20),
|
| 166 |
+
paper_bgcolor="white",
|
| 167 |
+
plot_bgcolor="white",
|
| 168 |
+
font={'color': "darkblue", 'family': "Arial"},
|
| 169 |
+
xaxis=dict(range=[0, 100])
|
| 170 |
+
)
|
| 171 |
+
|
| 172 |
+
return fig
|
| 173 |
+
|
| 174 |
+
def display_consolidated_analysis(consolidated_result):
|
| 175 |
+
"""Display the consolidated analysis in streamlit."""
|
| 176 |
+
# Overall compliance score gauge
|
| 177 |
+
st.header("📊 IFRS9 Compliance Analysis")
|
| 178 |
+
|
| 179 |
+
col1, col2 = st.columns([1, 2])
|
| 180 |
+
|
| 181 |
+
with col1:
|
| 182 |
+
gauge_chart = create_compliance_gauge(consolidated_result.compliance_percentage)
|
| 183 |
+
st.plotly_chart(gauge_chart, use_container_width=True)
|
| 184 |
+
|
| 185 |
+
with col2:
|
| 186 |
+
st.subheader("📝 Executive Summary")
|
| 187 |
+
st.write(consolidated_result.executive_summary)
|
| 188 |
+
|
| 189 |
+
# Key areas visualization
|
| 190 |
+
st.subheader("🔑 Key Areas for Improvement")
|
| 191 |
+
areas_chart = create_compliance_areas_chart(consolidated_result.key_areas)
|
| 192 |
+
st.plotly_chart(areas_chart, use_container_width=True)
|
| 193 |
+
|
| 194 |
+
# Display key areas details
|
| 195 |
+
cols = st.columns(len(consolidated_result.key_areas) if len(consolidated_result.key_areas) <= 3 else 3)
|
| 196 |
+
|
| 197 |
+
for i, area in enumerate(consolidated_result.key_areas):
|
| 198 |
+
with cols[i % 3]:
|
| 199 |
+
color = "red" if area["score"] < 50 else "orange" if area["score"] < 75 else "yellow" if area["score"] < 90 else "green"
|
| 200 |
+
st.markdown(f"<h4 style='color:{color};'>{area['name']} ({area['score']}%)</h4>", unsafe_allow_html=True)
|
| 201 |
+
st.write(area["description"])
|
| 202 |
+
|
| 203 |
+
# Detailed analysis
|
| 204 |
+
with st.expander("📋 Detailed Analysis", expanded=False):
|
| 205 |
+
st.write(consolidated_result.detailed_analysis)
|
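A rough sketch of how these helpers could be wired together inside a Streamlit page (not the app's actual entry script): run_gap_analysis_pipeline and doc_text are assumed to come from the other modules in this upload, and the attribute names match the GapAnalysisState defined in gap_analysis.py below.

# Illustrative wiring only; doc_text is previously extracted document text.
from attached_assets.gap_analysis import run_gap_analysis_pipeline
from attached_assets.consolidated_analysis import generate_consolidated_analysis, display_consolidated_analysis

state = run_gap_analysis_pipeline(doc_text)
consolidated = generate_consolidated_analysis(state.queries, state.gap_analyses)
display_consolidated_analysis(consolidated)   # renders the gauge, bar chart and details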
attached_assets/file_handler.py
ADDED
@@ -0,0 +1,37 @@
import io
import docx
from PyPDF2 import PdfReader

def read_text_file(file_content):
    """Read content from a text file."""
    return file_content.decode("utf-8")

def read_pdf_file(file_content):
    """Extract text from a PDF file."""
    pdf_reader = PdfReader(io.BytesIO(file_content))
    text = ""
    for page in pdf_reader.pages:
        text += page.extract_text() + "\n"
    return text

def read_docx_file(file_content):
    """Extract text from a Word document."""
    doc = docx.Document(io.BytesIO(file_content))
    full_text = []
    for para in doc.paragraphs:
        full_text.append(para.text)
    return '\n'.join(full_text)

def extract_text_from_file(uploaded_file):
    """Extract text from the uploaded file based on its type."""
    file_extension = uploaded_file.name.split('.')[-1].lower()
    file_content = uploaded_file.getvalue()

    if file_extension == 'txt':
        return read_text_file(file_content)
    elif file_extension == 'pdf':
        return read_pdf_file(file_content)
    elif file_extension in ['docx', 'doc']:
        return read_docx_file(file_content)
    else:
        raise ValueError(f"Unsupported file type: {file_extension}")
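An illustrative use of this handler with a Streamlit uploader (Streamlit uploaded-file objects expose .name and .getvalue(), which is all extract_text_from_file relies on):

# Illustrative sketch, not part of the upload.
import streamlit as st
from attached_assets.file_handler import extract_text_from_file

uploaded = st.file_uploader("Upload a validation document", type=["txt", "pdf", "docx"])
if uploaded is not None:
    document_text = extract_text_from_file(uploaded)
    st.text_area("Preview", document_text[:2000])

Note that legacy .doc uploads are routed to read_docx_file, and python-docx generally handles only the .docx format, so such files may raise an error.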
attached_assets/gap_analysis.py
ADDED
@@ -0,0 +1,77 @@
import os
import logging
import re
from pathlib import Path
from typing import List, Dict, Optional

# Simple data class for the state
class GapAnalysisState:
    def __init__(self, original_doc: str):
        self.original_doc = original_doc
        self.queries: List[str] = []
        self.gap_analyses: Dict[str, str] = {}

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s"
)
logger = logging.getLogger(__name__)

def run_gap_analysis_pipeline(original_doc: str) -> GapAnalysisState:
    """Simulated gap analysis pipeline that returns example results from the provided file."""
    logger.info("🚀 Starting GAP analysis pipeline...")

    # Initialize state
    state = GapAnalysisState(original_doc=original_doc)

    # Define the standard queries used in the GAP analysis
    state.queries = [
        "Does the current PD model align with the latest IFRS9 regulatory requirements?",
        "Are the data sources used for PD calculation comprehensive and up-to-date?",
        "Is there a robust process in place for the regular back-testing of PD models?",
        "How is the segmentation of portfolios handled in the PD model, and is it appropriate?",
        "Are the assumptions used in the PD model clearly documented and justified?",
        "What is the process for incorporating forward-looking information into the PD model?",
        "Is there a mechanism for ongoing monitoring and validation of the PD model's performance?",
        "How are overrides and expert judgments applied in the PD model, and are they documented?",
        "What methods are used to ensure appropriate granularity in the PD model?",
        "How are model uncertainties and limitations addressed in the PD estimation?",
        "Is the PD model integration with other IFRS9 components (LGD, EAD) well-designed?",
        "How is the PD model governance structured, and does it ensure proper oversight?"
    ]

    # Load example gap analysis results from the provided file
    try:
        logger.info("📄 Loading example GAP analysis results...")
        example_file = Path('attached_assets/Pasted--Does-the-current-PD-model-align-with-the-latest-IFRS9-regulatory-requirements-Based-on-the-pr-1743447418785.txt')

        if example_file.exists():
            with open(example_file, 'r') as f:
                example_text = f.read()

            # Parse the example text into our structure
            # Pattern matches each query-response pair in the example file
            pattern = r"🔹 (.+?)\n➡️ (.+?)(?=\n\n🔹|\Z)"
            matches = re.findall(pattern, example_text, re.DOTALL)

            for query, analysis in matches:
                query = query.strip()
                analysis = analysis.strip()
                state.gap_analyses[query] = analysis

            logger.info(f"✅ Successfully parsed {len(state.gap_analyses)} GAP analysis results")
        else:
            logger.warning("⚠️ Example file not found, using placeholder data")
            # If the file is not found, provide a placeholder message
            for query in state.queries:
                state.gap_analyses[query] = "Analysis would be performed on the uploaded document in a real implementation."

    except Exception as e:
        logger.error(f"❌ Error parsing example GAP analysis: {str(e)}")
        # In case of any error, provide a generic message
        for query in state.queries:
            state.gap_analyses[query] = f"Error processing analysis for this query: {str(e)}"

    logger.info("🎉 GAP analysis pipeline completed")
    return state
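For reference, the query/answer pairs are recovered with a non-greedy regex keyed on the 🔹 and ➡️ markers used in the example file; a small self-contained illustration of the same pattern:

# Quick illustration of the parsing pattern on a two-entry string.
import re

sample = "🔹 Question A?\n➡️ Answer A.\n\n🔹 Question B?\n➡️ Answer B."
pattern = r"🔹 (.+?)\n➡️ (.+?)(?=\n\n🔹|\Z)"
print(re.findall(pattern, sample, re.DOTALL))
# -> [('Question A?', 'Answer A.'), ('Question B?', 'Answer B.')]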
attached_assets/ifrs9_analysis.py
ADDED
@@ -0,0 +1,146 @@
import os
from uuid6 import uuid7
import logging
from pathlib import Path
from typing import List, Dict
from pydantic import BaseModel, Field

from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.output_parsers import PydanticOutputParser

from langgraph.graph import StateGraph, END

# ========================
# Logging Setup
# ========================
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s"
)
logger = logging.getLogger(__name__)

LLM_LOG_DIR = Path("logs/llm_logs")
LLM_LOG_DIR.mkdir(parents=True, exist_ok=True)


# We use uuid7 to generate a unique identifier for each log file.
# The identifiers sort by creation time and are effectively (though not strictly guaranteed to be) unique.
def log_llm_io(step: str, input_text: str, output_text: str):
    filename = LLM_LOG_DIR / f"{step}_{uuid7().hex}.txt"
    with open(filename, "w", encoding="utf-8") as f:
        f.write("=== INPUT ===\n")
        f.write(input_text.strip() + "\n\n")
        f.write("=== OUTPUT ===\n")
        f.write(output_text.strip())
    logger.debug(f"📝 Logged LLM I/O to: {filename}")


# ========================
# State Definition
# ========================
class GapAnalysisState(BaseModel):
    original_doc: str
    queries: List[str] = []
    gap_analyses: Dict[str, str] = {}

# ========================
# LLM Setup (GPT-4o)
# ========================
llm = ChatOpenAI(
    model="gpt-4o",
    temperature=0,
    api_key=os.environ["OPENAI_API_KEY"]
)

# ========================
# Step 1: Query Generation
# ========================
class QueryListOutput(BaseModel):
    queries: List[str] = Field(..., description="List of GAP analysis questions")

query_parser = PydanticOutputParser(pydantic_object=QueryListOutput)

def calculate_query_count(doc_length: int) -> int:
    """Calculate number of queries based on document length."""
    token_estimate = doc_length / 4  # Rough estimate of tokens (4 chars per token)
    if token_estimate <= 4000:
        return max(2, int(token_estimate / 400) * 2)  # 2 queries per 400 tokens
    else:
        base_queries = 20  # 2 queries/400 tokens for the first 4000 tokens
        additional_queries = int((token_estimate - 4000) / 400)  # 1 query/400 tokens after that
        return min(30, base_queries + additional_queries)  # Cap at 30 queries

query_prompt = PromptTemplate.from_template(
    "You are an expert in IFRS9 PD validation. Based on the following document, extract {query_count} critical questions to perform a GAP analysis.\n\n{doc}\n\n{format_instructions}"
).partial(format_instructions=query_parser.get_format_instructions())

query_chain = LLMChain(llm=llm, prompt=query_prompt)

def generate_queries(state: GapAnalysisState) -> GapAnalysisState:
    logger.info("🟦 Generating queries from original document...")
    query_count = calculate_query_count(len(state.original_doc))
    logger.info(f"📊 Generating {query_count} queries based on document length...")
    llm_input = query_prompt.format(doc=state.original_doc, query_count=query_count)
    response = query_chain.run(doc=state.original_doc, query_count=query_count)
    log_llm_io("01_generate_queries", llm_input, response)
    state.queries = query_parser.parse(response).queries
    logger.info(f"✅ Generated {len(state.queries)} queries.")
    return state

# ========================
# Step 2: GAP Analysis
# ========================
gap_prompt = PromptTemplate.from_template(
    "You are performing a GAP analysis for this query:\n\n{query}\n\n"
    "Based on the original validation document, determine whether the IFRS9 PD model follows best practices. "
    "If not, explain the gap. If insufficient data, say so.\n\n"
    "Original Document:\n{original_doc}\n\n"
    "GAP Analysis Paragraph:"
)

gap_chain = LLMChain(llm=llm, prompt=gap_prompt)

def generate_gap_analysis(state: GapAnalysisState) -> GapAnalysisState:
    logger.info("🟦 Generating GAP analysis for each query...")
    for query in state.queries:
        llm_input = gap_prompt.format(
            query=query,
            original_doc=state.original_doc[:3000],
        )
        response = gap_chain.run(
            query=query,
            original_doc=state.original_doc[:3000],
        )
        log_llm_io(f"02_gap_analysis_{state.queries.index(query)+1}", llm_input, response)
        state.gap_analyses[query] = response.strip()
    logger.info("✅ GAP analysis generation complete.")
    return state

# ========================
# LangGraph Pipeline
# ========================
def create_graph():
    builder = StateGraph(GapAnalysisState)

    builder.add_node("generate_queries", generate_queries)
    builder.add_node("generate_gap", generate_gap_analysis)

    builder.set_entry_point("generate_queries")
    builder.add_edge("generate_queries", "generate_gap")
    builder.add_edge("generate_gap", END)

    return builder.compile()

graph = create_graph()

# ========================
# Entrypoint
# ========================
def run_gap_analysis_pipeline(original_doc: str):
    logger.info("🚀 Starting GAP analysis pipeline...")
    initial_state = GapAnalysisState(original_doc=original_doc)
    final_state = graph.invoke(initial_state)
    logger.info("🎉 GAP analysis pipeline completed successfully.")
    return final_state
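A short worked example of the document-length heuristic above (a sketch only: importing attached_assets.ifrs9_analysis constructs the ChatOpenAI client at import time, so OPENAI_API_KEY must be set in the environment):

# Worked examples for calculate_query_count, using the 4-characters-per-token estimate.
from attached_assets.ifrs9_analysis import calculate_query_count

print(calculate_query_count(6_000))    # ≈1,500 tokens  -> max(2, int(1500 / 400) * 2) = 6
print(calculate_query_count(40_000))   # ≈10,000 tokens -> min(30, 20 + int(6000 / 400)) = 30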
attached_assets/pdf_generator.py
ADDED
@@ -0,0 +1,201 @@
import os
from fpdf import FPDF
import textwrap
import tempfile
import matplotlib.pyplot as plt
import numpy as np
import io
from typing import Dict, List, Optional

class GAPReportPDF(FPDF):
    def __init__(self, title="IFRS9 GAP Analysis Report"):
        super().__init__()
        self.set_auto_page_break(auto=True, margin=15)
        self.add_page()
        self.set_font("helvetica", "B", 16)
        self.cell(0, 10, title, ln=True, align="C")
        self.ln(10)

    def add_section_header(self, title, size=14):
        """Add a section header to the PDF."""
        self.set_font("helvetica", "B", size)
        self.cell(0, 10, title, ln=True)
        self.ln(2)

    def add_paragraph(self, text, size=11):
        """Add a paragraph to the PDF."""
        self.set_font("helvetica", "", size)
        wrapped_text = textwrap.fill(text, width=90)
        self.multi_cell(0, 8, wrapped_text)
        self.ln(5)

    def add_compliance_gauge(self, compliance_percentage):
        """Add compliance gauge chart to the PDF."""
        # Create matplotlib figure for gauge
        fig, ax = plt.subplots(figsize=(6, 3), subplot_kw=dict(polar=True))

        # Set the limits
        ax.set_theta_offset(np.pi / 2)
        ax.set_theta_direction(-1)

        # Set the limit for the gauge
        ax.set_thetamin(0)
        ax.set_thetamax(180)

        # Draw gauge background
        ax.set_ylim(0, 10)
        ax.set_yticks([])
        ax.set_xticks(np.linspace(0, np.pi, 5))
        ax.set_xticklabels(['0%', '25%', '50%', '75%', '100%'])

        # Add colored ranges
        theta = np.linspace(0, np.pi, 100)
        ax.fill_between(theta, 5, 9, color='red', alpha=0.3, where=(theta <= np.pi * 0.5))
        ax.fill_between(theta, 5, 9, color='orange', alpha=0.3, where=((theta > np.pi * 0.5) & (theta <= np.pi * 0.75)))
        ax.fill_between(theta, 5, 9, color='yellow', alpha=0.3, where=((theta > np.pi * 0.75) & (theta <= np.pi * 0.9)))
        ax.fill_between(theta, 5, 9, color='green', alpha=0.3, where=(theta > np.pi * 0.9))

        # Add needle
        needle_angle = np.pi * compliance_percentage / 100
        ax.plot([0, needle_angle], [0, 7], color='black', linewidth=2)
        ax.plot([0], [0], marker='o', markersize=10, color='black')

        # Add compliance percentage text
        ax.text(np.pi/2, 3, f"{compliance_percentage:.1f}%", ha='center', va='center', fontsize=16, fontweight='bold')

        # Add title
        plt.title("IFRS9 Compliance Score", pad=20)

        # Save figure to buffer
        buf = io.BytesIO()
        plt.savefig(buf, format='png', dpi=100, bbox_inches='tight')
        plt.close(fig)

        # Add image to PDF
        buf.seek(0)
        self.image(buf, x=50, y=None, w=100)
        self.ln(5)

    def add_key_areas_chart(self, key_areas):
        """Add key areas chart to the PDF."""
        # Sort areas by score
        sorted_areas = sorted(key_areas, key=lambda x: x['score'])

        # Create horizontal bar chart
        fig, ax = plt.subplots(figsize=(7, 3 + 0.5 * len(sorted_areas)))

        area_names = [area['name'] for area in sorted_areas]
        scores = [area['score'] for area in sorted_areas]

        # Generate colors based on scores
        colors = ['red' if score < 50 else 'orange' if score < 75 else 'yellow' if score < 90 else 'green'
                  for score in scores]

        y_pos = np.arange(len(area_names))
        ax.barh(y_pos, scores, color=colors)
        ax.set_yticks(y_pos)
        ax.set_yticklabels(area_names)
        ax.set_xlabel('Compliance Score (%)')
        ax.set_xlim(0, 100)
        ax.set_title('Compliance by Key Areas')

        # Add score labels
        for i, score in enumerate(scores):
            ax.text(score + 1, i, f"{score:.1f}%", va='center')

        plt.tight_layout()

        # Save figure to buffer
        buf = io.BytesIO()
        plt.savefig(buf, format='png', dpi=100, bbox_inches='tight')
        plt.close(fig)

        # Add image to PDF
        buf.seek(0)
        self.image(buf, x=15, y=None, w=180)
        self.ln(5)

    def add_key_areas_details(self, key_areas):
        """Add key areas details to the PDF."""
        for area in key_areas:
            self.set_font("helvetica", "B", 12)
            self.cell(0, 10, f"{area['name']} ({area['score']:.1f}%)", ln=True)
            self.set_font("helvetica", "", 11)
            wrapped_desc = textwrap.fill(area['description'], width=90)
            self.multi_cell(0, 8, wrapped_desc)
            self.ln(5)

    def add_query_section(self, query: str, gap_analysis: str):
        """Add a query and its analysis to the PDF."""
        # Query header
        self.set_font("helvetica", "B", 12)
        wrapped_query = textwrap.fill(query, width=90)
        self.multi_cell(0, 10, wrapped_query)
        self.ln(5)

        # GAP Analysis content
        self.set_font("helvetica", "", 11)
        wrapped_analysis = textwrap.fill(gap_analysis, width=90)
        self.multi_cell(0, 8, wrapped_analysis)
        self.ln(10)

def generate_consolidated_pdf(consolidated_result, queries: list = None, gap_analyses: dict = None, output_path: str = "gap_analysis_report.pdf"):
    """Generate a consolidated PDF report with visualizations."""
    try:
        pdf = GAPReportPDF("IFRS9 Consolidated GAP Analysis Report")

        # Add compliance gauge
        pdf.add_compliance_gauge(consolidated_result.compliance_percentage)

        # Add executive summary
        pdf.add_section_header("Executive Summary")
        pdf.add_paragraph(consolidated_result.executive_summary)

        # Add key areas visualization
        pdf.add_section_header("Key Areas for Improvement")
        pdf.add_key_areas_chart(consolidated_result.key_areas)

        # Add key areas details
        pdf.add_section_header("Key Areas Details")
        pdf.add_key_areas_details(consolidated_result.key_areas)

        # Add detailed analysis
        pdf.add_section_header("Detailed Analysis")
        pdf.add_paragraph(consolidated_result.detailed_analysis)

        # Add individual query analyses if provided
        if queries and gap_analyses:
            pdf.add_page()
            pdf.add_section_header("Individual Query Analyses")
            for query in queries:
                if query in gap_analyses:
                    pdf.add_query_section(query, gap_analyses[query])

        # Use a temporary file path
        temp_dir = tempfile.gettempdir()
        temp_path = os.path.join(temp_dir, output_path)

        pdf.output(temp_path)
        return temp_path
    except Exception as e:
        print(f"Error generating consolidated PDF: {str(e)}")
        raise

def generate_gap_analysis_pdf(queries: list, gap_analyses: dict, output_path: str = "gap_analysis_report.pdf"):
    """Generate a PDF with individual query analyses."""
    try:
        pdf = GAPReportPDF()

        for query in queries:
            if query in gap_analyses:
                pdf.add_query_section(query, gap_analyses[query])

        # Use a temporary file path
        temp_dir = tempfile.gettempdir()
        temp_path = os.path.join(temp_dir, output_path)

        pdf.output(temp_path)
        return temp_path
    except Exception as e:
        print(f"Error generating PDF: {str(e)}")
        raise
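An illustrative end-to-end sketch tying the report generator to the earlier modules (consolidated, state and st are assumed to exist as in the previous sketches; the output filename is a placeholder):

# Generate the consolidated PDF and offer it for download in Streamlit.
from attached_assets.pdf_generator import generate_consolidated_pdf

report_path = generate_consolidated_pdf(
    consolidated,
    queries=state.queries,
    gap_analyses=state.gap_analyses,
    output_path="ifrs9_gap_report.pdf",
)
with open(report_path, "rb") as f:
    st.download_button("Download GAP analysis report", f, file_name="ifrs9_gap_report.pdf")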