Free AWS AIP-C01 Actual Exam Questions - Question 5 Discussion
Scenario: A research team needs a mechanism to represent user queries and internal documents as semantic embeddings to capture contextual relationships. The solution must be fully managed, scalable, and integrate easily with Bedrock AI agents for downstream RAG workflows. Question: Which approach best satisfies these requirements? Options:
B imo, since Titan Text Embeddings are designed specifically for generating semantic vectors, and pairing them with OpenSearch allows scalable, context-aware retrieval. This setup feels more flexible for embedding workflows than A.
Option A stands out because Amazon Kendra is purpose-built for semantic search with built-in ranking and retrieval, fully managed, and scales automatically. It also integrates smoothly with AI services, so you don’t have to worry about managing embeddings or storage yourself. Compared to B, which needs you to handle vector storage and search in OpenSearch, Kendra feels like a more turnkey solution for this scenario. The question emphasizes ease of integration and scalability, which Kendra nails without extra setup or overhead.
Not D imo. SageMaker JumpStart focuses more on fine-tuning models for generation than on managing scalable embeddings for search, so it seems less aligned with the requirement for a fully managed, scalable embedding solution.
It’s B for me. Using Titan Text Embeddings in Bedrock means you get native embedding generation fully managed by AWS, which fits the requirement for scalability and easy integration. Storing embeddings in OpenSearch isn’t too complex since it supports vector search now, so it can handle retrieval efficiently. Plus, you get more flexibility with how embeddings are created and used downstream compared to Kendra’s more fixed setup. A is fully managed, but B offers a tighter integration with Bedrock embeddings and RAG workflows, which seems crucial here.
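For anyone who hasn't called the embedding model directly, here is a minimal sketch of what "native embedding generation" in B could look like with boto3. The model ID `amazon.titan-embed-text-v2:0` and the helper names (`build_titan_request`, `embed`, `cosine_similarity`) are assumptions for illustration, not something the question specifies; the live call needs boto3, AWS credentials, and Bedrock model access.

```python
import json
import math

# Assumed model ID for Titan Text Embeddings V2; check which models are enabled in your account.
TITAN_MODEL_ID = "amazon.titan-embed-text-v2:0"


def build_titan_request(text):
    """Serialize the JSON request body for a Titan text-embedding call."""
    return json.dumps({"inputText": text})


def embed(text, client=None):
    """Return the embedding vector for `text` as a list of floats.

    `client` is a boto3 "bedrock-runtime" client; one is created if omitted.
    """
    if client is None:
        import boto3  # imported lazily so the pure helpers work without AWS access
        client = boto3.client("bedrock-runtime")
    response = client.invoke_model(
        modelId=TITAN_MODEL_ID,
        body=build_titan_request(text),
        contentType="application/json",
        accept="application/json",
    )
    payload = json.loads(response["body"].read())
    return payload["embedding"]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Queries and documents go through the same `embed` call, and similarity between the resulting vectors is what the vector store ranks on at retrieval time.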
B looks interesting since it directly uses Titan embeddings, which might offer more control over the embedding process compared to Kendra. But does storing vectors in OpenSearch add extra management overhead?
I don’t think C or D fit since they’re more about preprocessing or fine-tuning models, not about directly handling embeddings for semantic search. Between A and B, A’s advantage is it’s fully managed and purpose-built for semantic search, which means less setup hassle. B might give more control but adds complexity with OpenSearch. The question stresses fully managed and easy integration, which points me to A over B.
B/D? I get why A seems like the easy pick since it’s fully managed semantic search, but the question mentions representing queries and docs as embeddings specifically. B uses Titan embeddings in Bedrock, which fits that part perfectly and is also scalable. Yeah, you’d have to handle OpenSearch, but that gives more control over vector storage and retrieval. D’s about fine-tuning models for summarization, which might be useful downstream but doesn’t directly handle embeddings or semantic retrieval as cleanly as B. C seems off since it’s more about preprocessing features, not true semantic embeddings.
Maybe A works best since it's fully managed and built specifically for semantic search, so no extra hassle with indexing or storage. B sounds more complex with OpenSearch maintenance, which might not be ideal here.
Maybe A makes sense here too. It’s a fully managed service designed specifically for semantic search and natural language queries, so it handles indexing and retrieving relevant docs without extra setup. Plus, it integrates well with AI agents, which fits the downstream RAG workflow requirement. Unlike option B, which involves managing OpenSearch yourself, A seems more out-of-the-box and scalable. The question highlights fully managed and easy integration, so that’s why A stands out as a solid choice.
Option B fits best since it directly converts text into embeddings and uses OpenSearch for scalable retrieval, making it easier to integrate with Bedrock AI agents for semantic search workflows.
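On the OpenSearch side of B, a rough sketch of what the vector index and the retrieval query might look like, written as plain request bodies. The field names `embedding` and `text`, the helper names, and the 1024 dimension are assumptions for illustration; the dimension has to match whatever the embedding model actually returns.

```python
def knn_index_body(dimension, field="embedding"):
    """Index settings and mapping for a k-NN enabled OpenSearch index."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                field: {"type": "knn_vector", "dimension": dimension},
                "text": {"type": "text"},  # original passage, returned to the agent as context
            }
        },
    }


def knn_query(vector, k=5, field="embedding"):
    """Search body asking for the k stored vectors nearest to `vector`."""
    return {
        "size": k,
        "query": {"knn": {field: {"vector": list(vector), "k": k}}},
    }


# Hypothetical example: a 1024-dimensional index and a top-3 lookup.
index_body = knn_index_body(1024)
search_body = knn_query([0.0] * 1024, k=3)
```

These dictionaries would be sent as the body of the index-creation and search requests. The retrieved `text` fields are what a RAG workflow passes on to the model as context.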
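I think D makes the most sense here. Decreasing max_depth limits how complex each tree can get, which should help reduce overfitting. Increasing max_depth as in A would likely make overfitting worse. I'd double-check B, though: a higher min_child_weight normally makes the trees more conservative, so it's lowering it that would add to overfitting. Not sure about C either, since increasing colsample_bytree lets each tree see more features, which might also increase variance.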
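To put a number on "how complex each tree can get": a binary tree of depth d has at most 2^d leaves, so every extra level of max_depth doubles the number of distinct regions a single tree can carve out. A quick sketch (the helper name `max_leaves` is just for illustration):

```python
def max_leaves(max_depth):
    """Upper bound on the number of leaves in a binary tree of the given depth."""
    return 2 ** max_depth


# Compare the leaf budget of a shallow tree with deeper ones.
for depth in (3, 6, 10):
    print(f"max_depth={depth}: up to {max_leaves(depth)} leaves per tree")
```

Going from depth 10 down to depth 3 shrinks the per-tree budget from 1024 leaves to 8, which is why lowering max_depth is a common first move against overfitting.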